public inbox for dev@dpdk.org
* [RFC PATCH 00/27] combine multiple Intel scalar Tx paths
@ 2025-12-19 17:25 Bruce Richardson
  2025-12-19 17:25 ` [RFC PATCH 01/27] net/intel: create common Tx descriptor structure Bruce Richardson
                   ` (30 more replies)
  0 siblings, 31 replies; 274+ messages in thread
From: Bruce Richardson @ 2025-12-19 17:25 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson

The scalar Tx paths, with support for offloads and multiple mbufs
per packet, are almost identical across the ice, i40e and iavf drivers
and the single-queue mode of idpf. Therefore, we can rework these
code paths into a single function parameterized by compile-time
constants. This saves code and gives us a single path to optimize
and maintain - apart from edge cases such as IPsec support in iavf.

The ixgbe driver has a number of similarities too, which we take
advantage of where we can, but the overall descriptor format is
sufficiently different that its main scalar code path is kept
separate.

Bruce Richardson (27):
  net/intel: create common Tx descriptor structure
  net/intel: use common tx ring structure
  net/intel: create common post-Tx cleanup function
  net/intel: consolidate definitions for Tx desc fields
  net/intel: create separate header for Tx scalar fns
  net/intel: add common fn to calculate needed descriptors
  net/ice: refactor context descriptor handling
  net/i40e: refactor context descriptor handling
  net/idpf: refactor context descriptor handling
  net/intel: consolidate checksum mask definition
  net/intel: create common checksum Tx offload function
  net/intel: create a common scalar Tx function
  net/i40e: use common scalar Tx function
  net/intel: add IPSec hooks to common Tx function
  net/intel: support configurable VLAN tag insertion on Tx
  net/iavf: use common scalar Tx function
  net/i40e: document requirement for QinQ support
  net/idpf: use common scalar Tx function
  net/intel: avoid writing the final pkt descriptor twice
  net/intel: write descriptors using non-volatile pointers
  net/intel: remove unnecessary flag clearing
  net/intel: mark mid-burst ring cleanup as unlikely
  net/intel: add special handling for single desc packets
  net/intel: use separate array for desc status tracking
  net/ixgbe: use separate array for desc status tracking
  net/intel: drop unused Tx queue used count
  net/intel: remove index for tracking end of packet

 doc/guides/nics/i40e.rst                      |  18 +
 drivers/net/intel/common/tx.h                 | 101 ++-
 drivers/net/intel/common/tx_scalar_fns.h      | 441 ++++++++++++
 drivers/net/intel/cpfl/cpfl_rxtx.c            |   4 +-
 drivers/net/intel/i40e/i40e_fdir.c            |  34 +-
 drivers/net/intel/i40e/i40e_rxtx.c            | 500 +++-----------
 drivers/net/intel/i40e/i40e_rxtx.h            |  17 +-
 .../net/intel/i40e/i40e_rxtx_vec_altivec.c    |  23 +-
 drivers/net/intel/i40e/i40e_rxtx_vec_avx2.c   |  34 +-
 drivers/net/intel/i40e/i40e_rxtx_vec_avx512.c |  50 +-
 drivers/net/intel/i40e/i40e_rxtx_vec_common.h |   6 +-
 drivers/net/intel/i40e/i40e_rxtx_vec_neon.c   |  23 +-
 drivers/net/intel/i40e/i40e_rxtx_vec_sse.c    |  23 +-
 drivers/net/intel/iavf/iavf_rxtx.c            | 642 ++++--------------
 drivers/net/intel/iavf/iavf_rxtx.h            |  30 +-
 drivers/net/intel/iavf/iavf_rxtx_vec_avx2.c   |  53 +-
 drivers/net/intel/iavf/iavf_rxtx_vec_avx512.c | 104 ++-
 drivers/net/intel/iavf/iavf_rxtx_vec_common.h |  36 +-
 drivers/net/intel/iavf/iavf_rxtx_vec_sse.c    |  27 +-
 drivers/net/intel/ice/ice_dcf_ethdev.c        |  10 +-
 drivers/net/intel/ice/ice_rxtx.c              | 590 +++++-----------
 drivers/net/intel/ice/ice_rxtx.h              |  15 +-
 drivers/net/intel/ice/ice_rxtx_vec_avx2.c     |  53 +-
 drivers/net/intel/ice/ice_rxtx_vec_avx512.c   |  51 +-
 drivers/net/intel/ice/ice_rxtx_vec_common.h   |  43 +-
 drivers/net/intel/ice/ice_rxtx_vec_sse.c      |  24 +-
 drivers/net/intel/idpf/idpf_common_rxtx.c     | 298 ++------
 drivers/net/intel/idpf/idpf_common_rxtx.h     |  21 +-
 .../net/intel/idpf/idpf_common_rxtx_avx2.c    |  53 +-
 .../net/intel/idpf/idpf_common_rxtx_avx512.c  |  53 +-
 drivers/net/intel/idpf/idpf_rxtx.c            |  17 +-
 drivers/net/intel/idpf/idpf_rxtx_vec_common.h |   6 +-
 drivers/net/intel/ixgbe/ixgbe_rxtx.c          | 103 ++-
 .../net/intel/ixgbe/ixgbe_rxtx_vec_common.c   |   1 -
 34 files changed, 1394 insertions(+), 2110 deletions(-)
 create mode 100644 drivers/net/intel/common/tx_scalar_fns.h

--
2.51.0



* [RFC PATCH 01/27] net/intel: create common Tx descriptor structure
  2025-12-19 17:25 [RFC PATCH 00/27] combine multiple Intel scalar Tx paths Bruce Richardson
@ 2025-12-19 17:25 ` Bruce Richardson
  2025-12-19 17:25 ` [RFC PATCH 02/27] net/intel: use common tx ring structure Bruce Richardson
                   ` (29 subsequent siblings)
  30 siblings, 0 replies; 274+ messages in thread
From: Bruce Richardson @ 2025-12-19 17:25 UTC (permalink / raw)
  To: dev
  Cc: Bruce Richardson, Praveen Shetty, Vladimir Medvedkin,
	Anatoly Burakov, Jingjing Wu

The Tx descriptors used by the i40e, iavf, ice and idpf drivers are all
identical 16-byte descriptors, so define a common struct for them. Since
the original struct definitions are in base code, leave them in place,
but use only the new struct in DPDK code.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/net/intel/common/tx.h                 | 16 ++++++---
 drivers/net/intel/cpfl/cpfl_rxtx.c            |  2 +-
 drivers/net/intel/i40e/i40e_fdir.c            |  4 +--
 drivers/net/intel/i40e/i40e_rxtx.c            | 26 +++++++-------
 .../net/intel/i40e/i40e_rxtx_vec_altivec.c    |  6 ++--
 drivers/net/intel/i40e/i40e_rxtx_vec_avx2.c   |  6 ++--
 drivers/net/intel/i40e/i40e_rxtx_vec_avx512.c |  6 ++--
 drivers/net/intel/i40e/i40e_rxtx_vec_neon.c   |  6 ++--
 drivers/net/intel/i40e/i40e_rxtx_vec_sse.c    |  6 ++--
 drivers/net/intel/iavf/iavf_rxtx.c            | 16 ++++-----
 drivers/net/intel/iavf/iavf_rxtx.h            |  2 +-
 drivers/net/intel/iavf/iavf_rxtx_vec_avx2.c   |  6 ++--
 drivers/net/intel/iavf/iavf_rxtx_vec_avx512.c | 12 +++----
 drivers/net/intel/iavf/iavf_rxtx_vec_sse.c    |  6 ++--
 drivers/net/intel/ice/ice_dcf_ethdev.c        |  2 +-
 drivers/net/intel/ice/ice_rxtx.c              | 36 +++++++++----------
 drivers/net/intel/ice/ice_rxtx_vec_avx2.c     |  6 ++--
 drivers/net/intel/ice/ice_rxtx_vec_avx512.c   |  6 ++--
 drivers/net/intel/ice/ice_rxtx_vec_sse.c      |  6 ++--
 drivers/net/intel/idpf/idpf_common_rxtx.c     | 20 +++++------
 drivers/net/intel/idpf/idpf_common_rxtx.h     |  2 +-
 .../net/intel/idpf/idpf_common_rxtx_avx2.c    |  8 ++---
 .../net/intel/idpf/idpf_common_rxtx_avx512.c  |  8 ++---
 drivers/net/intel/idpf/idpf_rxtx.c            |  2 +-
 drivers/net/intel/idpf/idpf_rxtx_vec_common.h |  2 +-
 25 files changed, 113 insertions(+), 105 deletions(-)

diff --git a/drivers/net/intel/common/tx.h b/drivers/net/intel/common/tx.h
index b259d98904..722f87a70c 100644
--- a/drivers/net/intel/common/tx.h
+++ b/drivers/net/intel/common/tx.h
@@ -10,6 +10,14 @@
 #include <rte_ethdev.h>
 #include <rte_vect.h>
 
+/*
+ * Structure of a 16-byte Tx descriptor common across i40e, ice, iavf and idpf drivers
+ */
+struct ci_tx_desc {
+	uint64_t buffer_addr; /* Address of descriptor's data buf */
+	uint64_t cmd_type_offset_bsz;
+};
+
 /* forward declaration of the common intel (ci) queue structure */
 struct ci_tx_queue;
 
@@ -33,10 +41,10 @@ typedef void (*ice_tx_release_mbufs_t)(struct ci_tx_queue *txq);
 
 struct ci_tx_queue {
 	union { /* TX ring virtual address */
-		volatile struct i40e_tx_desc *i40e_tx_ring;
-		volatile struct iavf_tx_desc *iavf_tx_ring;
-		volatile struct ice_tx_desc *ice_tx_ring;
-		volatile struct idpf_base_tx_desc *idpf_tx_ring;
+		volatile struct ci_tx_desc *i40e_tx_ring;
+		volatile struct ci_tx_desc *iavf_tx_ring;
+		volatile struct ci_tx_desc *ice_tx_ring;
+		volatile struct ci_tx_desc *idpf_tx_ring;
 		volatile union ixgbe_adv_tx_desc *ixgbe_tx_ring;
 	};
 	volatile uint8_t *qtx_tail;               /* register address of tail */
diff --git a/drivers/net/intel/cpfl/cpfl_rxtx.c b/drivers/net/intel/cpfl/cpfl_rxtx.c
index 2e4cf3b875..57c6f6e736 100644
--- a/drivers/net/intel/cpfl/cpfl_rxtx.c
+++ b/drivers/net/intel/cpfl/cpfl_rxtx.c
@@ -131,7 +131,7 @@ cpfl_dma_zone_reserve(struct rte_eth_dev *dev, uint16_t queue_idx,
 			ring_size = RTE_ALIGN(len * sizeof(struct idpf_flex_tx_sched_desc),
 					      CPFL_DMA_MEM_ALIGN);
 		else
-			ring_size = RTE_ALIGN(len * sizeof(struct idpf_base_tx_desc),
+			ring_size = RTE_ALIGN(len * sizeof(struct ci_tx_desc),
 					      CPFL_DMA_MEM_ALIGN);
 		memcpy(ring_name, "cpfl Tx ring", sizeof("cpfl Tx ring"));
 		break;
diff --git a/drivers/net/intel/i40e/i40e_fdir.c b/drivers/net/intel/i40e/i40e_fdir.c
index 55d18c5d4a..605df73c9e 100644
--- a/drivers/net/intel/i40e/i40e_fdir.c
+++ b/drivers/net/intel/i40e/i40e_fdir.c
@@ -1377,7 +1377,7 @@ i40e_find_available_buffer(struct rte_eth_dev *dev)
 	 */
 	if (fdir_info->txq_available_buf_count <= 0) {
 		uint16_t tmp_tail;
-		volatile struct i40e_tx_desc *tmp_txdp;
+		volatile struct ci_tx_desc *tmp_txdp;
 
 		tmp_tail = txq->tx_tail;
 		tmp_txdp = &txq->i40e_tx_ring[tmp_tail + 1];
@@ -1628,7 +1628,7 @@ i40e_flow_fdir_filter_programming(struct i40e_pf *pf,
 	struct ci_tx_queue *txq = pf->fdir.txq;
 	struct ci_rx_queue *rxq = pf->fdir.rxq;
 	const struct i40e_fdir_action *fdir_action = &filter->action;
-	volatile struct i40e_tx_desc *txdp;
+	volatile struct ci_tx_desc *txdp;
 	volatile struct i40e_filter_program_desc *fdirdp;
 	uint32_t td_cmd;
 	uint16_t vsi_id;
diff --git a/drivers/net/intel/i40e/i40e_rxtx.c b/drivers/net/intel/i40e/i40e_rxtx.c
index 2db58c6b24..6307e9809f 100644
--- a/drivers/net/intel/i40e/i40e_rxtx.c
+++ b/drivers/net/intel/i40e/i40e_rxtx.c
@@ -384,7 +384,7 @@ static inline int
 i40e_xmit_cleanup(struct ci_tx_queue *txq)
 {
 	struct ci_tx_entry *sw_ring = txq->sw_ring;
-	volatile struct i40e_tx_desc *txd = txq->i40e_tx_ring;
+	volatile struct ci_tx_desc *txd = txq->i40e_tx_ring;
 	uint16_t last_desc_cleaned = txq->last_desc_cleaned;
 	uint16_t nb_tx_desc = txq->nb_tx_desc;
 	uint16_t desc_to_clean_to;
@@ -1088,8 +1088,8 @@ i40e_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 	struct ci_tx_queue *txq;
 	struct ci_tx_entry *sw_ring;
 	struct ci_tx_entry *txe, *txn;
-	volatile struct i40e_tx_desc *txd;
-	volatile struct i40e_tx_desc *txr;
+	volatile struct ci_tx_desc *txd;
+	volatile struct ci_tx_desc *txr;
 	struct rte_mbuf *tx_pkt;
 	struct rte_mbuf *m_seg;
 	uint32_t cd_tunneling_params;
@@ -1389,7 +1389,7 @@ i40e_tx_free_bufs(struct ci_tx_queue *txq)
 
 /* Populate 4 descriptors with data from 4 mbufs */
 static inline void
-tx4(volatile struct i40e_tx_desc *txdp, struct rte_mbuf **pkts)
+tx4(volatile struct ci_tx_desc *txdp, struct rte_mbuf **pkts)
 {
 	uint64_t dma_addr;
 	uint32_t i;
@@ -1405,7 +1405,7 @@ tx4(volatile struct i40e_tx_desc *txdp, struct rte_mbuf **pkts)
 
 /* Populate 1 descriptor with data from 1 mbuf */
 static inline void
-tx1(volatile struct i40e_tx_desc *txdp, struct rte_mbuf **pkts)
+tx1(volatile struct ci_tx_desc *txdp, struct rte_mbuf **pkts)
 {
 	uint64_t dma_addr;
 
@@ -1422,7 +1422,7 @@ i40e_tx_fill_hw_ring(struct ci_tx_queue *txq,
 		     struct rte_mbuf **pkts,
 		     uint16_t nb_pkts)
 {
-	volatile struct i40e_tx_desc *txdp = &txq->i40e_tx_ring[txq->tx_tail];
+	volatile struct ci_tx_desc *txdp = &txq->i40e_tx_ring[txq->tx_tail];
 	struct ci_tx_entry *txep = &txq->sw_ring[txq->tx_tail];
 	const int N_PER_LOOP = 4;
 	const int N_PER_LOOP_MASK = N_PER_LOOP - 1;
@@ -1450,7 +1450,7 @@ tx_xmit_pkts(struct ci_tx_queue *txq,
 	     struct rte_mbuf **tx_pkts,
 	     uint16_t nb_pkts)
 {
-	volatile struct i40e_tx_desc *txr = txq->i40e_tx_ring;
+	volatile struct ci_tx_desc *txr = txq->i40e_tx_ring;
 	uint16_t n = 0;
 
 	/**
@@ -2604,7 +2604,7 @@ i40e_dev_tx_queue_setup(struct rte_eth_dev *dev,
 	}
 
 	/* Allocate TX hardware ring descriptors. */
-	ring_size = sizeof(struct i40e_tx_desc) * I40E_MAX_RING_DESC;
+	ring_size = sizeof(struct ci_tx_desc) * I40E_MAX_RING_DESC;
 	ring_size = RTE_ALIGN(ring_size, I40E_DMA_MEM_ALIGN);
 	tz = rte_eth_dma_zone_reserve(dev, "i40e_tx_ring", queue_idx,
 			      ring_size, I40E_RING_BASE_ALIGN, socket_id);
@@ -2626,7 +2626,7 @@ i40e_dev_tx_queue_setup(struct rte_eth_dev *dev,
 	txq->tx_deferred_start = tx_conf->tx_deferred_start;
 
 	txq->tx_ring_dma = tz->iova;
-	txq->i40e_tx_ring = (struct i40e_tx_desc *)tz->addr;
+	txq->i40e_tx_ring = (struct ci_tx_desc *)tz->addr;
 
 	/* Allocate software ring */
 	txq->sw_ring =
@@ -2899,13 +2899,13 @@ i40e_reset_tx_queue(struct ci_tx_queue *txq)
 	}
 
 	txe = txq->sw_ring;
-	size = sizeof(struct i40e_tx_desc) * txq->nb_tx_desc;
+	size = sizeof(struct ci_tx_desc) * txq->nb_tx_desc;
 	for (i = 0; i < size; i++)
 		((volatile char *)txq->i40e_tx_ring)[i] = 0;
 
 	prev = (uint16_t)(txq->nb_tx_desc - 1);
 	for (i = 0; i < txq->nb_tx_desc; i++) {
-		volatile struct i40e_tx_desc *txd = &txq->i40e_tx_ring[i];
+		volatile struct ci_tx_desc *txd = &txq->i40e_tx_ring[i];
 
 		txd->cmd_type_offset_bsz =
 			rte_cpu_to_le_64(I40E_TX_DESC_DTYPE_DESC_DONE);
@@ -3207,7 +3207,7 @@ i40e_fdir_setup_tx_resources(struct i40e_pf *pf)
 	}
 
 	/* Allocate TX hardware ring descriptors. */
-	ring_size = sizeof(struct i40e_tx_desc) * I40E_FDIR_NUM_TX_DESC;
+	ring_size = sizeof(struct ci_tx_desc) * I40E_FDIR_NUM_TX_DESC;
 	ring_size = RTE_ALIGN(ring_size, I40E_DMA_MEM_ALIGN);
 
 	tz = rte_eth_dma_zone_reserve(dev, "fdir_tx_ring",
@@ -3226,7 +3226,7 @@ i40e_fdir_setup_tx_resources(struct i40e_pf *pf)
 	txq->i40e_vsi = pf->fdir.fdir_vsi;
 
 	txq->tx_ring_dma = tz->iova;
-	txq->i40e_tx_ring = (struct i40e_tx_desc *)tz->addr;
+	txq->i40e_tx_ring = (struct ci_tx_desc *)tz->addr;
 
 	/*
 	 * don't need to allocate software ring and reset for the fdir
diff --git a/drivers/net/intel/i40e/i40e_rxtx_vec_altivec.c b/drivers/net/intel/i40e/i40e_rxtx_vec_altivec.c
index bbb6d907cf..ef5b252898 100644
--- a/drivers/net/intel/i40e/i40e_rxtx_vec_altivec.c
+++ b/drivers/net/intel/i40e/i40e_rxtx_vec_altivec.c
@@ -446,7 +446,7 @@ i40e_recv_scattered_pkts_vec(void *rx_queue, struct rte_mbuf **rx_pkts,
 }
 
 static inline void
-vtx1(volatile struct i40e_tx_desc *txdp,
+vtx1(volatile struct ci_tx_desc *txdp,
 	struct rte_mbuf *pkt, uint64_t flags)
 {
 	uint64_t high_qw = (I40E_TX_DESC_DTYPE_DATA |
@@ -459,7 +459,7 @@ vtx1(volatile struct i40e_tx_desc *txdp,
 }
 
 static inline void
-vtx(volatile struct i40e_tx_desc *txdp,
+vtx(volatile struct ci_tx_desc *txdp,
 	struct rte_mbuf **pkt, uint16_t nb_pkts,  uint64_t flags)
 {
 	int i;
@@ -473,7 +473,7 @@ i40e_xmit_fixed_burst_vec(void *tx_queue, struct rte_mbuf **tx_pkts,
 			  uint16_t nb_pkts)
 {
 	struct ci_tx_queue *txq = (struct ci_tx_queue *)tx_queue;
-	volatile struct i40e_tx_desc *txdp;
+	volatile struct ci_tx_desc *txdp;
 	struct ci_tx_entry_vec *txep;
 	uint16_t n, nb_commit, tx_id;
 	uint64_t flags = I40E_TD_CMD;
diff --git a/drivers/net/intel/i40e/i40e_rxtx_vec_avx2.c b/drivers/net/intel/i40e/i40e_rxtx_vec_avx2.c
index aeb2756e7a..b3ce08c039 100644
--- a/drivers/net/intel/i40e/i40e_rxtx_vec_avx2.c
+++ b/drivers/net/intel/i40e/i40e_rxtx_vec_avx2.c
@@ -681,7 +681,7 @@ i40e_recv_scattered_pkts_vec_avx2(void *rx_queue, struct rte_mbuf **rx_pkts,
 
 
 static inline void
-vtx1(volatile struct i40e_tx_desc *txdp,
+vtx1(volatile struct ci_tx_desc *txdp,
 		struct rte_mbuf *pkt, uint64_t flags)
 {
 	uint64_t high_qw = (I40E_TX_DESC_DTYPE_DATA |
@@ -694,7 +694,7 @@ vtx1(volatile struct i40e_tx_desc *txdp,
 }
 
 static inline void
-vtx(volatile struct i40e_tx_desc *txdp,
+vtx(volatile struct ci_tx_desc *txdp,
 		struct rte_mbuf **pkt, uint16_t nb_pkts,  uint64_t flags)
 {
 	const uint64_t hi_qw_tmpl = (I40E_TX_DESC_DTYPE_DATA |
@@ -739,7 +739,7 @@ i40e_xmit_fixed_burst_vec_avx2(void *tx_queue, struct rte_mbuf **tx_pkts,
 			  uint16_t nb_pkts)
 {
 	struct ci_tx_queue *txq = (struct ci_tx_queue *)tx_queue;
-	volatile struct i40e_tx_desc *txdp;
+	volatile struct ci_tx_desc *txdp;
 	struct ci_tx_entry_vec *txep;
 	uint16_t n, nb_commit, tx_id;
 	uint64_t flags = I40E_TD_CMD;
diff --git a/drivers/net/intel/i40e/i40e_rxtx_vec_avx512.c b/drivers/net/intel/i40e/i40e_rxtx_vec_avx512.c
index 571987d27a..6971488750 100644
--- a/drivers/net/intel/i40e/i40e_rxtx_vec_avx512.c
+++ b/drivers/net/intel/i40e/i40e_rxtx_vec_avx512.c
@@ -750,7 +750,7 @@ i40e_recv_scattered_pkts_vec_avx512(void *rx_queue,
 }
 
 static inline void
-vtx1(volatile struct i40e_tx_desc *txdp, struct rte_mbuf *pkt, uint64_t flags)
+vtx1(volatile struct ci_tx_desc *txdp, struct rte_mbuf *pkt, uint64_t flags)
 {
 	uint64_t high_qw = (I40E_TX_DESC_DTYPE_DATA |
 		((uint64_t)flags  << I40E_TXD_QW1_CMD_SHIFT) |
@@ -762,7 +762,7 @@ vtx1(volatile struct i40e_tx_desc *txdp, struct rte_mbuf *pkt, uint64_t flags)
 }
 
 static inline void
-vtx(volatile struct i40e_tx_desc *txdp,
+vtx(volatile struct ci_tx_desc *txdp,
 	struct rte_mbuf **pkt, uint16_t nb_pkts,  uint64_t flags)
 {
 	const uint64_t hi_qw_tmpl = (I40E_TX_DESC_DTYPE_DATA |
@@ -807,7 +807,7 @@ i40e_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pkts,
 				 uint16_t nb_pkts)
 {
 	struct ci_tx_queue *txq = (struct ci_tx_queue *)tx_queue;
-	volatile struct i40e_tx_desc *txdp;
+	volatile struct ci_tx_desc *txdp;
 	struct ci_tx_entry_vec *txep;
 	uint16_t n, nb_commit, tx_id;
 	uint64_t flags = I40E_TD_CMD;
diff --git a/drivers/net/intel/i40e/i40e_rxtx_vec_neon.c b/drivers/net/intel/i40e/i40e_rxtx_vec_neon.c
index b5be0c1b59..6404b70c56 100644
--- a/drivers/net/intel/i40e/i40e_rxtx_vec_neon.c
+++ b/drivers/net/intel/i40e/i40e_rxtx_vec_neon.c
@@ -597,7 +597,7 @@ i40e_recv_scattered_pkts_vec(void *rx_queue, struct rte_mbuf **rx_pkts,
 }
 
 static inline void
-vtx1(volatile struct i40e_tx_desc *txdp,
+vtx1(volatile struct ci_tx_desc *txdp,
 		struct rte_mbuf *pkt, uint64_t flags)
 {
 	uint64_t high_qw = (I40E_TX_DESC_DTYPE_DATA |
@@ -609,7 +609,7 @@ vtx1(volatile struct i40e_tx_desc *txdp,
 }
 
 static inline void
-vtx(volatile struct i40e_tx_desc *txdp, struct rte_mbuf **pkt,
+vtx(volatile struct ci_tx_desc *txdp, struct rte_mbuf **pkt,
 		uint16_t nb_pkts,  uint64_t flags)
 {
 	int i;
@@ -623,7 +623,7 @@ i40e_xmit_fixed_burst_vec(void *__rte_restrict tx_queue,
 	struct rte_mbuf **__rte_restrict tx_pkts, uint16_t nb_pkts)
 {
 	struct ci_tx_queue *txq = (struct ci_tx_queue *)tx_queue;
-	volatile struct i40e_tx_desc *txdp;
+	volatile struct ci_tx_desc *txdp;
 	struct ci_tx_entry_vec *txep;
 	uint16_t n, nb_commit, tx_id;
 	uint64_t flags = I40E_TD_CMD;
diff --git a/drivers/net/intel/i40e/i40e_rxtx_vec_sse.c b/drivers/net/intel/i40e/i40e_rxtx_vec_sse.c
index c209135890..2a360c18ad 100644
--- a/drivers/net/intel/i40e/i40e_rxtx_vec_sse.c
+++ b/drivers/net/intel/i40e/i40e_rxtx_vec_sse.c
@@ -604,7 +604,7 @@ i40e_recv_scattered_pkts_vec(void *rx_queue, struct rte_mbuf **rx_pkts,
 }
 
 static inline void
-vtx1(volatile struct i40e_tx_desc *txdp,
+vtx1(volatile struct ci_tx_desc *txdp,
 		struct rte_mbuf *pkt, uint64_t flags)
 {
 	uint64_t high_qw = (I40E_TX_DESC_DTYPE_DATA |
@@ -617,7 +617,7 @@ vtx1(volatile struct i40e_tx_desc *txdp,
 }
 
 static inline void
-vtx(volatile struct i40e_tx_desc *txdp,
+vtx(volatile struct ci_tx_desc *txdp,
 		struct rte_mbuf **pkt, uint16_t nb_pkts,  uint64_t flags)
 {
 	int i;
@@ -631,7 +631,7 @@ i40e_xmit_fixed_burst_vec(void *tx_queue, struct rte_mbuf **tx_pkts,
 			  uint16_t nb_pkts)
 {
 	struct ci_tx_queue *txq = (struct ci_tx_queue *)tx_queue;
-	volatile struct i40e_tx_desc *txdp;
+	volatile struct ci_tx_desc *txdp;
 	struct ci_tx_entry_vec *txep;
 	uint16_t n, nb_commit, tx_id;
 	uint64_t flags = I40E_TD_CMD;
diff --git a/drivers/net/intel/iavf/iavf_rxtx.c b/drivers/net/intel/iavf/iavf_rxtx.c
index ee53e6e802..c5e469a1ae 100644
--- a/drivers/net/intel/iavf/iavf_rxtx.c
+++ b/drivers/net/intel/iavf/iavf_rxtx.c
@@ -277,7 +277,7 @@ reset_tx_queue(struct ci_tx_queue *txq)
 	}
 
 	txe = txq->sw_ring;
-	size = sizeof(struct iavf_tx_desc) * txq->nb_tx_desc;
+	size = sizeof(struct ci_tx_desc) * txq->nb_tx_desc;
 	for (i = 0; i < size; i++)
 		((volatile char *)txq->iavf_tx_ring)[i] = 0;
 
@@ -828,7 +828,7 @@ iavf_dev_tx_queue_setup(struct rte_eth_dev *dev,
 	}
 
 	/* Allocate TX hardware ring descriptors. */
-	ring_size = sizeof(struct iavf_tx_desc) * IAVF_MAX_RING_DESC;
+	ring_size = sizeof(struct ci_tx_desc) * IAVF_MAX_RING_DESC;
 	ring_size = RTE_ALIGN(ring_size, IAVF_DMA_MEM_ALIGN);
 	mz = rte_eth_dma_zone_reserve(dev, "iavf_tx_ring", queue_idx,
 				      ring_size, IAVF_RING_BASE_ALIGN,
@@ -840,7 +840,7 @@ iavf_dev_tx_queue_setup(struct rte_eth_dev *dev,
 		return -ENOMEM;
 	}
 	txq->tx_ring_dma = mz->iova;
-	txq->iavf_tx_ring = (struct iavf_tx_desc *)mz->addr;
+	txq->iavf_tx_ring = (struct ci_tx_desc *)mz->addr;
 
 	txq->mz = mz;
 	reset_tx_queue(txq);
@@ -2334,7 +2334,7 @@ iavf_xmit_cleanup(struct ci_tx_queue *txq)
 	uint16_t desc_to_clean_to;
 	uint16_t nb_tx_to_clean;
 
-	volatile struct iavf_tx_desc *txd = txq->iavf_tx_ring;
+	volatile struct ci_tx_desc *txd = txq->iavf_tx_ring;
 
 	desc_to_clean_to = (uint16_t)(last_desc_cleaned + txq->tx_rs_thresh);
 	if (desc_to_clean_to >= nb_tx_desc)
@@ -2724,7 +2724,7 @@ iavf_calc_pkt_desc(struct rte_mbuf *tx_pkt)
 }
 
 static inline void
-iavf_fill_data_desc(volatile struct iavf_tx_desc *desc,
+iavf_fill_data_desc(volatile struct ci_tx_desc *desc,
 	uint64_t desc_template,	uint16_t buffsz,
 	uint64_t buffer_addr)
 {
@@ -2757,7 +2757,7 @@ uint16_t
 iavf_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 {
 	struct ci_tx_queue *txq = tx_queue;
-	volatile struct iavf_tx_desc *txr = txq->iavf_tx_ring;
+	volatile struct ci_tx_desc *txr = txq->iavf_tx_ring;
 	struct ci_tx_entry *txe_ring = txq->sw_ring;
 	struct ci_tx_entry *txe, *txn;
 	struct rte_mbuf *mb, *mb_seg;
@@ -2775,7 +2775,7 @@ iavf_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 	txe = &txe_ring[desc_idx];
 
 	for (idx = 0; idx < nb_pkts; idx++) {
-		volatile struct iavf_tx_desc *ddesc;
+		volatile struct ci_tx_desc *ddesc;
 		struct iavf_ipsec_crypto_pkt_metadata *ipsec_md;
 
 		uint16_t nb_desc_ctx, nb_desc_ipsec;
@@ -2896,7 +2896,7 @@ iavf_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 		mb_seg = mb;
 
 		do {
-			ddesc = (volatile struct iavf_tx_desc *)
+			ddesc = (volatile struct ci_tx_desc *)
 					&txr[desc_idx];
 
 			txn = &txe_ring[txe->next_id];
diff --git a/drivers/net/intel/iavf/iavf_rxtx.h b/drivers/net/intel/iavf/iavf_rxtx.h
index bff456e509..14580c5b8b 100644
--- a/drivers/net/intel/iavf/iavf_rxtx.h
+++ b/drivers/net/intel/iavf/iavf_rxtx.h
@@ -681,7 +681,7 @@ void iavf_dump_tx_descriptor(const struct ci_tx_queue *txq,
 			    const volatile void *desc, uint16_t tx_id)
 {
 	const char *name;
-	const volatile struct iavf_tx_desc *tx_desc = desc;
+	const volatile struct ci_tx_desc *tx_desc = desc;
 	enum iavf_tx_desc_dtype_value type;
 
 
diff --git a/drivers/net/intel/iavf/iavf_rxtx_vec_avx2.c b/drivers/net/intel/iavf/iavf_rxtx_vec_avx2.c
index e417257086..c3d7083230 100644
--- a/drivers/net/intel/iavf/iavf_rxtx_vec_avx2.c
+++ b/drivers/net/intel/iavf/iavf_rxtx_vec_avx2.c
@@ -1630,7 +1630,7 @@ iavf_recv_scattered_pkts_vec_avx2_flex_rxd_offload(void *rx_queue,
 
 
 static __rte_always_inline void
-iavf_vtx1(volatile struct iavf_tx_desc *txdp,
+iavf_vtx1(volatile struct ci_tx_desc *txdp,
 	  struct rte_mbuf *pkt, uint64_t flags, bool offload, uint8_t vlan_flag)
 {
 	uint64_t high_qw =
@@ -1646,7 +1646,7 @@ iavf_vtx1(volatile struct iavf_tx_desc *txdp,
 }
 
 static __rte_always_inline void
-iavf_vtx(volatile struct iavf_tx_desc *txdp,
+iavf_vtx(volatile struct ci_tx_desc *txdp,
 	 struct rte_mbuf **pkt, uint16_t nb_pkts,  uint64_t flags, bool offload, uint8_t vlan_flag)
 {
 	const uint64_t hi_qw_tmpl = (IAVF_TX_DESC_DTYPE_DATA |
@@ -1713,7 +1713,7 @@ iavf_xmit_fixed_burst_vec_avx2(void *tx_queue, struct rte_mbuf **tx_pkts,
 			       uint16_t nb_pkts, bool offload)
 {
 	struct ci_tx_queue *txq = (struct ci_tx_queue *)tx_queue;
-	volatile struct iavf_tx_desc *txdp;
+	volatile struct ci_tx_desc *txdp;
 	struct ci_tx_entry_vec *txep;
 	uint16_t n, nb_commit, tx_id;
 	/* bit2 is reserved and must be set to 1 according to Spec */
diff --git a/drivers/net/intel/iavf/iavf_rxtx_vec_avx512.c b/drivers/net/intel/iavf/iavf_rxtx_vec_avx512.c
index 7c0907b7cf..d79d96c7b7 100644
--- a/drivers/net/intel/iavf/iavf_rxtx_vec_avx512.c
+++ b/drivers/net/intel/iavf/iavf_rxtx_vec_avx512.c
@@ -1840,7 +1840,7 @@ tx_backlog_entry_avx512(struct ci_tx_entry_vec *txep,
 }
 
 static __rte_always_inline void
-iavf_vtx1(volatile struct iavf_tx_desc *txdp,
+iavf_vtx1(volatile struct ci_tx_desc *txdp,
 	  struct rte_mbuf *pkt, uint64_t flags,
 	  bool offload, uint8_t vlan_flag)
 {
@@ -1859,7 +1859,7 @@ iavf_vtx1(volatile struct iavf_tx_desc *txdp,
 #define IAVF_TX_LEN_MASK 0xAA
 #define IAVF_TX_OFF_MASK 0x55
 static __rte_always_inline void
-iavf_vtx(volatile struct iavf_tx_desc *txdp,
+iavf_vtx(volatile struct ci_tx_desc *txdp,
 		struct rte_mbuf **pkt, uint16_t nb_pkts,  uint64_t flags,
 		bool offload, uint8_t vlan_flag)
 {
@@ -2068,7 +2068,7 @@ iavf_fill_ctx_desc_tunnelling_field(volatile uint64_t *qw0,
 }
 
 static __rte_always_inline void
-ctx_vtx1(volatile struct iavf_tx_desc *txdp, struct rte_mbuf *pkt,
+ctx_vtx1(volatile struct ci_tx_desc *txdp, struct rte_mbuf *pkt,
 		uint64_t flags, bool offload, uint8_t vlan_flag)
 {
 	uint64_t high_ctx_qw = IAVF_TX_DESC_DTYPE_CONTEXT;
@@ -2106,7 +2106,7 @@ ctx_vtx1(volatile struct iavf_tx_desc *txdp, struct rte_mbuf *pkt,
 }
 
 static __rte_always_inline void
-ctx_vtx(volatile struct iavf_tx_desc *txdp,
+ctx_vtx(volatile struct ci_tx_desc *txdp,
 		struct rte_mbuf **pkt, uint16_t nb_pkts,  uint64_t flags,
 		bool offload, uint8_t vlan_flag)
 {
@@ -2203,7 +2203,7 @@ iavf_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pkts,
 				 uint16_t nb_pkts, bool offload)
 {
 	struct ci_tx_queue *txq = (struct ci_tx_queue *)tx_queue;
-	volatile struct iavf_tx_desc *txdp;
+	volatile struct ci_tx_desc *txdp;
 	struct ci_tx_entry_vec *txep;
 	uint16_t n, nb_commit, tx_id;
 	/* bit2 is reserved and must be set to 1 according to Spec */
@@ -2271,7 +2271,7 @@ iavf_xmit_fixed_burst_vec_avx512_ctx(void *tx_queue, struct rte_mbuf **tx_pkts,
 				 uint16_t nb_pkts, bool offload)
 {
 	struct ci_tx_queue *txq = (struct ci_tx_queue *)tx_queue;
-	volatile struct iavf_tx_desc *txdp;
+	volatile struct ci_tx_desc *txdp;
 	struct ci_tx_entry_vec *txep;
 	uint16_t n, nb_commit, nb_mbuf, tx_id;
 	/* bit2 is reserved and must be set to 1 according to Spec */
diff --git a/drivers/net/intel/iavf/iavf_rxtx_vec_sse.c b/drivers/net/intel/iavf/iavf_rxtx_vec_sse.c
index 9dae0a79bf..cb086cd352 100644
--- a/drivers/net/intel/iavf/iavf_rxtx_vec_sse.c
+++ b/drivers/net/intel/iavf/iavf_rxtx_vec_sse.c
@@ -1242,7 +1242,7 @@ iavf_recv_scattered_pkts_vec_flex_rxd(void *rx_queue,
 }
 
 static inline void
-vtx1(volatile struct iavf_tx_desc *txdp, struct rte_mbuf *pkt, uint64_t flags)
+vtx1(volatile struct ci_tx_desc *txdp, struct rte_mbuf *pkt, uint64_t flags)
 {
 	uint64_t high_qw =
 			(IAVF_TX_DESC_DTYPE_DATA |
@@ -1256,7 +1256,7 @@ vtx1(volatile struct iavf_tx_desc *txdp, struct rte_mbuf *pkt, uint64_t flags)
 }
 
 static inline void
-iavf_vtx(volatile struct iavf_tx_desc *txdp, struct rte_mbuf **pkt,
+iavf_vtx(volatile struct ci_tx_desc *txdp, struct rte_mbuf **pkt,
 	uint16_t nb_pkts,  uint64_t flags)
 {
 	int i;
@@ -1270,7 +1270,7 @@ iavf_xmit_fixed_burst_vec(void *tx_queue, struct rte_mbuf **tx_pkts,
 			 uint16_t nb_pkts)
 {
 	struct ci_tx_queue *txq = (struct ci_tx_queue *)tx_queue;
-	volatile struct iavf_tx_desc *txdp;
+	volatile struct ci_tx_desc *txdp;
 	struct ci_tx_entry_vec *txep;
 	uint16_t n, nb_commit, tx_id;
 	uint64_t flags = IAVF_TX_DESC_CMD_EOP | 0x04;  /* bit 2 must be set */
diff --git a/drivers/net/intel/ice/ice_dcf_ethdev.c b/drivers/net/intel/ice/ice_dcf_ethdev.c
index 81da5a4656..ab1d499cef 100644
--- a/drivers/net/intel/ice/ice_dcf_ethdev.c
+++ b/drivers/net/intel/ice/ice_dcf_ethdev.c
@@ -399,7 +399,7 @@ reset_tx_queue(struct ci_tx_queue *txq)
 	}
 
 	txe = txq->sw_ring;
-	size = sizeof(struct ice_tx_desc) * txq->nb_tx_desc;
+	size = sizeof(struct ci_tx_desc) * txq->nb_tx_desc;
 	for (i = 0; i < size; i++)
 		((volatile char *)txq->ice_tx_ring)[i] = 0;
 
diff --git a/drivers/net/intel/ice/ice_rxtx.c b/drivers/net/intel/ice/ice_rxtx.c
index f5d484c1e6..7358a95ce1 100644
--- a/drivers/net/intel/ice/ice_rxtx.c
+++ b/drivers/net/intel/ice/ice_rxtx.c
@@ -1111,13 +1111,13 @@ ice_reset_tx_queue(struct ci_tx_queue *txq)
 	}
 
 	txe = txq->sw_ring;
-	size = sizeof(struct ice_tx_desc) * txq->nb_tx_desc;
+	size = sizeof(struct ci_tx_desc) * txq->nb_tx_desc;
 	for (i = 0; i < size; i++)
 		((volatile char *)txq->ice_tx_ring)[i] = 0;
 
 	prev = (uint16_t)(txq->nb_tx_desc - 1);
 	for (i = 0; i < txq->nb_tx_desc; i++) {
-		volatile struct ice_tx_desc *txd = &txq->ice_tx_ring[i];
+		volatile struct ci_tx_desc *txd = &txq->ice_tx_ring[i];
 
 		txd->cmd_type_offset_bsz =
 			rte_cpu_to_le_64(ICE_TX_DESC_DTYPE_DESC_DONE);
@@ -1609,7 +1609,7 @@ ice_tx_queue_setup(struct rte_eth_dev *dev,
 	}
 
 	/* Allocate TX hardware ring descriptors. */
-	ring_size = sizeof(struct ice_tx_desc) * ICE_MAX_NUM_DESC_BY_MAC(hw);
+	ring_size = sizeof(struct ci_tx_desc) * ICE_MAX_NUM_DESC_BY_MAC(hw);
 	ring_size = RTE_ALIGN(ring_size, ICE_DMA_MEM_ALIGN);
 	tz = rte_eth_dma_zone_reserve(dev, "ice_tx_ring", queue_idx,
 				      ring_size, ICE_RING_BASE_ALIGN,
@@ -2611,7 +2611,7 @@ ice_fdir_setup_tx_resources(struct ice_pf *pf)
 	}
 
 	/* Allocate TX hardware ring descriptors. */
-	ring_size = sizeof(struct ice_tx_desc) * ICE_FDIR_NUM_TX_DESC;
+	ring_size = sizeof(struct ci_tx_desc) * ICE_FDIR_NUM_TX_DESC;
 	ring_size = RTE_ALIGN(ring_size, ICE_DMA_MEM_ALIGN);
 
 	tz = rte_eth_dma_zone_reserve(dev, "fdir_tx_ring",
@@ -2630,7 +2630,7 @@ ice_fdir_setup_tx_resources(struct ice_pf *pf)
 	txq->ice_vsi = pf->fdir.fdir_vsi;
 
 	txq->tx_ring_dma = tz->iova;
-	txq->ice_tx_ring = (struct ice_tx_desc *)tz->addr;
+	txq->ice_tx_ring = (struct ci_tx_desc *)tz->addr;
 	/*
 	 * don't need to allocate software ring and reset for the fdir
 	 * program queue just set the queue has been configured.
@@ -3019,7 +3019,7 @@ static inline int
 ice_xmit_cleanup(struct ci_tx_queue *txq)
 {
 	struct ci_tx_entry *sw_ring = txq->sw_ring;
-	volatile struct ice_tx_desc *txd = txq->ice_tx_ring;
+	volatile struct ci_tx_desc *txd = txq->ice_tx_ring;
 	uint16_t last_desc_cleaned = txq->last_desc_cleaned;
 	uint16_t nb_tx_desc = txq->nb_tx_desc;
 	uint16_t desc_to_clean_to;
@@ -3140,8 +3140,8 @@ uint16_t
 ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 {
 	struct ci_tx_queue *txq;
-	volatile struct ice_tx_desc *ice_tx_ring;
-	volatile struct ice_tx_desc *txd;
+	volatile struct ci_tx_desc *ice_tx_ring;
+	volatile struct ci_tx_desc *txd;
 	struct ci_tx_entry *sw_ring;
 	struct ci_tx_entry *txe, *txn;
 	struct rte_mbuf *tx_pkt;
@@ -3304,7 +3304,7 @@ ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 
 			while ((ol_flags & (RTE_MBUF_F_TX_TCP_SEG | RTE_MBUF_F_TX_UDP_SEG)) &&
 				unlikely(slen > ICE_MAX_DATA_PER_TXD)) {
-				txd->buf_addr = rte_cpu_to_le_64(buf_dma_addr);
+				txd->buffer_addr = rte_cpu_to_le_64(buf_dma_addr);
 				txd->cmd_type_offset_bsz =
 				rte_cpu_to_le_64(ICE_TX_DESC_DTYPE_DATA |
 				((uint64_t)td_cmd << ICE_TXD_QW1_CMD_S) |
@@ -3323,7 +3323,7 @@ ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 				txn = &sw_ring[txe->next_id];
 			}
 
-			txd->buf_addr = rte_cpu_to_le_64(buf_dma_addr);
+			txd->buffer_addr = rte_cpu_to_le_64(buf_dma_addr);
 			txd->cmd_type_offset_bsz =
 				rte_cpu_to_le_64(ICE_TX_DESC_DTYPE_DATA |
 				((uint64_t)td_cmd << ICE_TXD_QW1_CMD_S) |
@@ -3544,14 +3544,14 @@ ice_tx_done_cleanup(void *txq, uint32_t free_cnt)
 
 /* Populate 4 descriptors with data from 4 mbufs */
 static inline void
-tx4(volatile struct ice_tx_desc *txdp, struct rte_mbuf **pkts)
+tx4(volatile struct ci_tx_desc *txdp, struct rte_mbuf **pkts)
 {
 	uint64_t dma_addr;
 	uint32_t i;
 
 	for (i = 0; i < 4; i++, txdp++, pkts++) {
 		dma_addr = rte_mbuf_data_iova(*pkts);
-		txdp->buf_addr = rte_cpu_to_le_64(dma_addr);
+		txdp->buffer_addr = rte_cpu_to_le_64(dma_addr);
 		txdp->cmd_type_offset_bsz =
 			ice_build_ctob((uint32_t)ICE_TD_CMD, 0,
 				       (*pkts)->data_len, 0);
@@ -3560,12 +3560,12 @@ tx4(volatile struct ice_tx_desc *txdp, struct rte_mbuf **pkts)
 
 /* Populate 1 descriptor with data from 1 mbuf */
 static inline void
-tx1(volatile struct ice_tx_desc *txdp, struct rte_mbuf **pkts)
+tx1(volatile struct ci_tx_desc *txdp, struct rte_mbuf **pkts)
 {
 	uint64_t dma_addr;
 
 	dma_addr = rte_mbuf_data_iova(*pkts);
-	txdp->buf_addr = rte_cpu_to_le_64(dma_addr);
+	txdp->buffer_addr = rte_cpu_to_le_64(dma_addr);
 	txdp->cmd_type_offset_bsz =
 		ice_build_ctob((uint32_t)ICE_TD_CMD, 0,
 			       (*pkts)->data_len, 0);
@@ -3575,7 +3575,7 @@ static inline void
 ice_tx_fill_hw_ring(struct ci_tx_queue *txq, struct rte_mbuf **pkts,
 		    uint16_t nb_pkts)
 {
-	volatile struct ice_tx_desc *txdp = &txq->ice_tx_ring[txq->tx_tail];
+	volatile struct ci_tx_desc *txdp = &txq->ice_tx_ring[txq->tx_tail];
 	struct ci_tx_entry *txep = &txq->sw_ring[txq->tx_tail];
 	const int N_PER_LOOP = 4;
 	const int N_PER_LOOP_MASK = N_PER_LOOP - 1;
@@ -3608,7 +3608,7 @@ tx_xmit_pkts(struct ci_tx_queue *txq,
 	     struct rte_mbuf **tx_pkts,
 	     uint16_t nb_pkts)
 {
-	volatile struct ice_tx_desc *txr = txq->ice_tx_ring;
+	volatile struct ci_tx_desc *txr = txq->ice_tx_ring;
 	uint16_t n = 0;
 
 	/**
@@ -4891,7 +4891,7 @@ ice_fdir_programming(struct ice_pf *pf, struct ice_fltr_desc *fdir_desc)
 	struct ci_tx_queue *txq = pf->fdir.txq;
 	struct ci_rx_queue *rxq = pf->fdir.rxq;
 	volatile struct ice_fltr_desc *fdirdp;
-	volatile struct ice_tx_desc *txdp;
+	volatile struct ci_tx_desc *txdp;
 	uint32_t td_cmd;
 	uint16_t i;
 
@@ -4901,7 +4901,7 @@ ice_fdir_programming(struct ice_pf *pf, struct ice_fltr_desc *fdir_desc)
 	fdirdp->dtype_cmd_vsi_fdid = fdir_desc->dtype_cmd_vsi_fdid;
 
 	txdp = &txq->ice_tx_ring[txq->tx_tail + 1];
-	txdp->buf_addr = rte_cpu_to_le_64(pf->fdir.dma_addr);
+	txdp->buffer_addr = rte_cpu_to_le_64(pf->fdir.dma_addr);
 	td_cmd = ICE_TX_DESC_CMD_EOP |
 		ICE_TX_DESC_CMD_RS  |
 		ICE_TX_DESC_CMD_DUMMY;
diff --git a/drivers/net/intel/ice/ice_rxtx_vec_avx2.c b/drivers/net/intel/ice/ice_rxtx_vec_avx2.c
index b952b8dddc..95c4f4569c 100644
--- a/drivers/net/intel/ice/ice_rxtx_vec_avx2.c
+++ b/drivers/net/intel/ice/ice_rxtx_vec_avx2.c
@@ -774,7 +774,7 @@ ice_recv_scattered_pkts_vec_avx2_offload(void *rx_queue,
 }
 
 static __rte_always_inline void
-ice_vtx1(volatile struct ice_tx_desc *txdp,
+ice_vtx1(volatile struct ci_tx_desc *txdp,
 	 struct rte_mbuf *pkt, uint64_t flags, bool offload)
 {
 	uint64_t high_qw =
@@ -789,7 +789,7 @@ ice_vtx1(volatile struct ice_tx_desc *txdp,
 }
 
 static __rte_always_inline void
-ice_vtx(volatile struct ice_tx_desc *txdp,
+ice_vtx(volatile struct ci_tx_desc *txdp,
 	struct rte_mbuf **pkt, uint16_t nb_pkts,  uint64_t flags, bool offload)
 {
 	const uint64_t hi_qw_tmpl = (ICE_TX_DESC_DTYPE_DATA |
@@ -852,7 +852,7 @@ ice_xmit_fixed_burst_vec_avx2(void *tx_queue, struct rte_mbuf **tx_pkts,
 			      uint16_t nb_pkts, bool offload)
 {
 	struct ci_tx_queue *txq = (struct ci_tx_queue *)tx_queue;
-	volatile struct ice_tx_desc *txdp;
+	volatile struct ci_tx_desc *txdp;
 	struct ci_tx_entry_vec *txep;
 	uint16_t n, nb_commit, tx_id;
 	uint64_t flags = ICE_TD_CMD;
diff --git a/drivers/net/intel/ice/ice_rxtx_vec_avx512.c b/drivers/net/intel/ice/ice_rxtx_vec_avx512.c
index 7c6fe82072..1f6bf5fc8e 100644
--- a/drivers/net/intel/ice/ice_rxtx_vec_avx512.c
+++ b/drivers/net/intel/ice/ice_rxtx_vec_avx512.c
@@ -847,7 +847,7 @@ ice_recv_scattered_pkts_vec_avx512_offload(void *rx_queue,
 }
 
 static __rte_always_inline void
-ice_vtx1(volatile struct ice_tx_desc *txdp,
+ice_vtx1(volatile struct ci_tx_desc *txdp,
 	 struct rte_mbuf *pkt, uint64_t flags, bool do_offload)
 {
 	uint64_t high_qw =
@@ -863,7 +863,7 @@ ice_vtx1(volatile struct ice_tx_desc *txdp,
 }
 
 static __rte_always_inline void
-ice_vtx(volatile struct ice_tx_desc *txdp, struct rte_mbuf **pkt,
+ice_vtx(volatile struct ci_tx_desc *txdp, struct rte_mbuf **pkt,
 	uint16_t nb_pkts,  uint64_t flags, bool do_offload)
 {
 	const uint64_t hi_qw_tmpl = (ICE_TX_DESC_DTYPE_DATA |
@@ -916,7 +916,7 @@ ice_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pkts,
 				uint16_t nb_pkts, bool do_offload)
 {
 	struct ci_tx_queue *txq = (struct ci_tx_queue *)tx_queue;
-	volatile struct ice_tx_desc *txdp;
+	volatile struct ci_tx_desc *txdp;
 	struct ci_tx_entry_vec *txep;
 	uint16_t n, nb_commit, tx_id;
 	uint64_t flags = ICE_TD_CMD;
diff --git a/drivers/net/intel/ice/ice_rxtx_vec_sse.c b/drivers/net/intel/ice/ice_rxtx_vec_sse.c
index 4fc1b7e881..44f3fc0fa5 100644
--- a/drivers/net/intel/ice/ice_rxtx_vec_sse.c
+++ b/drivers/net/intel/ice/ice_rxtx_vec_sse.c
@@ -596,7 +596,7 @@ ice_recv_scattered_pkts_vec(void *rx_queue, struct rte_mbuf **rx_pkts,
 }
 
 static inline void
-ice_vtx1(volatile struct ice_tx_desc *txdp, struct rte_mbuf *pkt,
+ice_vtx1(volatile struct ci_tx_desc *txdp, struct rte_mbuf *pkt,
 	 uint64_t flags)
 {
 	uint64_t high_qw =
@@ -609,7 +609,7 @@ ice_vtx1(volatile struct ice_tx_desc *txdp, struct rte_mbuf *pkt,
 }
 
 static inline void
-ice_vtx(volatile struct ice_tx_desc *txdp, struct rte_mbuf **pkt,
+ice_vtx(volatile struct ci_tx_desc *txdp, struct rte_mbuf **pkt,
 	uint16_t nb_pkts, uint64_t flags)
 {
 	int i;
@@ -623,7 +623,7 @@ ice_xmit_fixed_burst_vec(void *tx_queue, struct rte_mbuf **tx_pkts,
 			 uint16_t nb_pkts)
 {
 	struct ci_tx_queue *txq = (struct ci_tx_queue *)tx_queue;
-	volatile struct ice_tx_desc *txdp;
+	volatile struct ci_tx_desc *txdp;
 	struct ci_tx_entry_vec *txep;
 	uint16_t n, nb_commit, tx_id;
 	uint64_t flags = ICE_TD_CMD;
diff --git a/drivers/net/intel/idpf/idpf_common_rxtx.c b/drivers/net/intel/idpf/idpf_common_rxtx.c
index 797ee515dd..be3c1ef216 100644
--- a/drivers/net/intel/idpf/idpf_common_rxtx.c
+++ b/drivers/net/intel/idpf/idpf_common_rxtx.c
@@ -264,13 +264,13 @@ idpf_qc_single_tx_queue_reset(struct ci_tx_queue *txq)
 	}
 
 	txe = txq->sw_ring;
-	size = sizeof(struct idpf_base_tx_desc) * txq->nb_tx_desc;
+	size = sizeof(struct ci_tx_desc) * txq->nb_tx_desc;
 	for (i = 0; i < size; i++)
 		((volatile char *)txq->idpf_tx_ring)[i] = 0;
 
 	prev = (uint16_t)(txq->nb_tx_desc - 1);
 	for (i = 0; i < txq->nb_tx_desc; i++) {
-		txq->idpf_tx_ring[i].qw1 =
+		txq->idpf_tx_ring[i].cmd_type_offset_bsz =
 			rte_cpu_to_le_64(IDPF_TX_DESC_DTYPE_DESC_DONE);
 		txe[i].mbuf =  NULL;
 		txe[i].last_id = i;
@@ -1335,14 +1335,14 @@ idpf_xmit_cleanup(struct ci_tx_queue *txq)
 	uint16_t desc_to_clean_to;
 	uint16_t nb_tx_to_clean;
 
-	volatile struct idpf_base_tx_desc *txd = txq->idpf_tx_ring;
+	volatile struct ci_tx_desc *txd = txq->idpf_tx_ring;
 
 	desc_to_clean_to = (uint16_t)(last_desc_cleaned + txq->tx_rs_thresh);
 	if (desc_to_clean_to >= nb_tx_desc)
 		desc_to_clean_to = (uint16_t)(desc_to_clean_to - nb_tx_desc);
 
 	desc_to_clean_to = sw_ring[desc_to_clean_to].last_id;
-	if ((txd[desc_to_clean_to].qw1 &
+	if ((txd[desc_to_clean_to].cmd_type_offset_bsz &
 	     rte_cpu_to_le_64(IDPF_TXD_QW1_DTYPE_M)) !=
 	    rte_cpu_to_le_64(IDPF_TX_DESC_DTYPE_DESC_DONE)) {
 		TX_LOG(DEBUG, "TX descriptor %4u is not done "
@@ -1358,7 +1358,7 @@ idpf_xmit_cleanup(struct ci_tx_queue *txq)
 		nb_tx_to_clean = (uint16_t)(desc_to_clean_to -
 					    last_desc_cleaned);
 
-	txd[desc_to_clean_to].qw1 = 0;
+	txd[desc_to_clean_to].cmd_type_offset_bsz = 0;
 
 	txq->last_desc_cleaned = desc_to_clean_to;
 	txq->nb_tx_free = (uint16_t)(txq->nb_tx_free + nb_tx_to_clean);
@@ -1372,8 +1372,8 @@ uint16_t
 idpf_dp_singleq_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 			  uint16_t nb_pkts)
 {
-	volatile struct idpf_base_tx_desc *txd;
-	volatile struct idpf_base_tx_desc *txr;
+	volatile struct ci_tx_desc *txd;
+	volatile struct ci_tx_desc *txr;
 	union idpf_tx_offload tx_offload = {0};
 	struct ci_tx_entry *txe, *txn;
 	struct ci_tx_entry *sw_ring;
@@ -1491,8 +1491,8 @@ idpf_dp_singleq_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 			/* Setup TX Descriptor */
 			slen = m_seg->data_len;
 			buf_dma_addr = rte_mbuf_data_iova(m_seg);
-			txd->buf_addr = rte_cpu_to_le_64(buf_dma_addr);
-			txd->qw1 = rte_cpu_to_le_64(IDPF_TX_DESC_DTYPE_DATA |
+			txd->buffer_addr = rte_cpu_to_le_64(buf_dma_addr);
+			txd->cmd_type_offset_bsz = rte_cpu_to_le_64(IDPF_TX_DESC_DTYPE_DATA |
 				((uint64_t)td_cmd  << IDPF_TXD_QW1_CMD_S) |
 				((uint64_t)td_offset << IDPF_TXD_QW1_OFFSET_S) |
 				((uint64_t)slen << IDPF_TXD_QW1_TX_BUF_SZ_S));
@@ -1519,7 +1519,7 @@ idpf_dp_singleq_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 			txq->nb_tx_used = 0;
 		}
 
-		txd->qw1 |= rte_cpu_to_le_16(td_cmd << IDPF_TXD_QW1_CMD_S);
+		txd->cmd_type_offset_bsz |= rte_cpu_to_le_16(td_cmd << IDPF_TXD_QW1_CMD_S);
 	}
 
 end_of_tx:
diff --git a/drivers/net/intel/idpf/idpf_common_rxtx.h b/drivers/net/intel/idpf/idpf_common_rxtx.h
index 7c6ff5d047..2f2fa153b2 100644
--- a/drivers/net/intel/idpf/idpf_common_rxtx.h
+++ b/drivers/net/intel/idpf/idpf_common_rxtx.h
@@ -182,7 +182,7 @@ union idpf_tx_offload {
 };
 
 union idpf_tx_desc {
-	struct idpf_base_tx_desc *tx_ring;
+	struct ci_tx_desc *tx_ring;
 	struct idpf_flex_tx_sched_desc *desc_ring;
 	struct idpf_splitq_tx_compl_desc *compl_ring;
 };
diff --git a/drivers/net/intel/idpf/idpf_common_rxtx_avx2.c b/drivers/net/intel/idpf/idpf_common_rxtx_avx2.c
index 21c8f79254..5f5d538dcb 100644
--- a/drivers/net/intel/idpf/idpf_common_rxtx_avx2.c
+++ b/drivers/net/intel/idpf/idpf_common_rxtx_avx2.c
@@ -483,7 +483,7 @@ idpf_dp_singleq_recv_pkts_avx2(void *rx_queue, struct rte_mbuf **rx_pkts, uint16
 }
 
 static inline void
-idpf_singleq_vtx1(volatile struct idpf_base_tx_desc *txdp,
+idpf_singleq_vtx1(volatile struct ci_tx_desc *txdp,
 		  struct rte_mbuf *pkt, uint64_t flags)
 {
 	uint64_t high_qw =
@@ -497,7 +497,7 @@ idpf_singleq_vtx1(volatile struct idpf_base_tx_desc *txdp,
 }
 
 static inline void
-idpf_singleq_vtx(volatile struct idpf_base_tx_desc *txdp,
+idpf_singleq_vtx(volatile struct ci_tx_desc *txdp,
 		 struct rte_mbuf **pkt, uint16_t nb_pkts,  uint64_t flags)
 {
 	const uint64_t hi_qw_tmpl = (IDPF_TX_DESC_DTYPE_DATA |
@@ -556,7 +556,7 @@ idpf_singleq_xmit_fixed_burst_vec_avx2(void *tx_queue, struct rte_mbuf **tx_pkts
 				       uint16_t nb_pkts)
 {
 	struct ci_tx_queue *txq = (struct ci_tx_queue *)tx_queue;
-	volatile struct idpf_base_tx_desc *txdp;
+	volatile struct ci_tx_desc *txdp;
 	struct ci_tx_entry_vec *txep;
 	uint16_t n, nb_commit, tx_id;
 	uint64_t flags = IDPF_TX_DESC_CMD_EOP;
@@ -604,7 +604,7 @@ idpf_singleq_xmit_fixed_burst_vec_avx2(void *tx_queue, struct rte_mbuf **tx_pkts
 
 	tx_id = (uint16_t)(tx_id + nb_commit);
 	if (tx_id > txq->tx_next_rs) {
-		txq->idpf_tx_ring[txq->tx_next_rs].qw1 |=
+		txq->idpf_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
 			rte_cpu_to_le_64(((uint64_t)IDPF_TX_DESC_CMD_RS) <<
 					 IDPF_TXD_QW1_CMD_S);
 		txq->tx_next_rs =
diff --git a/drivers/net/intel/idpf/idpf_common_rxtx_avx512.c b/drivers/net/intel/idpf/idpf_common_rxtx_avx512.c
index bc2cadd738..c1ec3d1222 100644
--- a/drivers/net/intel/idpf/idpf_common_rxtx_avx512.c
+++ b/drivers/net/intel/idpf/idpf_common_rxtx_avx512.c
@@ -1000,7 +1000,7 @@ idpf_dp_splitq_recv_pkts_avx512(void *rx_queue, struct rte_mbuf **rx_pkts,
 }
 
 static __rte_always_inline void
-idpf_singleq_vtx1(volatile struct idpf_base_tx_desc *txdp,
+idpf_singleq_vtx1(volatile struct ci_tx_desc *txdp,
 	  struct rte_mbuf *pkt, uint64_t flags)
 {
 	uint64_t high_qw =
@@ -1016,7 +1016,7 @@ idpf_singleq_vtx1(volatile struct idpf_base_tx_desc *txdp,
 #define IDPF_TX_LEN_MASK 0xAA
 #define IDPF_TX_OFF_MASK 0x55
 static __rte_always_inline void
-idpf_singleq_vtx(volatile struct idpf_base_tx_desc *txdp,
+idpf_singleq_vtx(volatile struct ci_tx_desc *txdp,
 	 struct rte_mbuf **pkt, uint16_t nb_pkts,  uint64_t flags)
 {
 	const uint64_t hi_qw_tmpl = (IDPF_TX_DESC_DTYPE_DATA  |
@@ -1072,7 +1072,7 @@ idpf_singleq_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pk
 					 uint16_t nb_pkts)
 {
 	struct ci_tx_queue *txq = tx_queue;
-	volatile struct idpf_base_tx_desc *txdp;
+	volatile struct ci_tx_desc *txdp;
 	struct ci_tx_entry_vec *txep;
 	uint16_t n, nb_commit, tx_id;
 	uint64_t flags = IDPF_TX_DESC_CMD_EOP;
@@ -1123,7 +1123,7 @@ idpf_singleq_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pk
 
 	tx_id = (uint16_t)(tx_id + nb_commit);
 	if (tx_id > txq->tx_next_rs) {
-		txq->idpf_tx_ring[txq->tx_next_rs].qw1 |=
+		txq->idpf_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
 			rte_cpu_to_le_64(((uint64_t)IDPF_TX_DESC_CMD_RS) <<
 					 IDPF_TXD_QW1_CMD_S);
 		txq->tx_next_rs =
diff --git a/drivers/net/intel/idpf/idpf_rxtx.c b/drivers/net/intel/idpf/idpf_rxtx.c
index 47f8347b41..9b63e44341 100644
--- a/drivers/net/intel/idpf/idpf_rxtx.c
+++ b/drivers/net/intel/idpf/idpf_rxtx.c
@@ -72,7 +72,7 @@ idpf_dma_zone_reserve(struct rte_eth_dev *dev, uint16_t queue_idx,
 			ring_size = RTE_ALIGN(len * sizeof(struct idpf_flex_tx_sched_desc),
 					      IDPF_DMA_MEM_ALIGN);
 		else
-			ring_size = RTE_ALIGN(len * sizeof(struct idpf_base_tx_desc),
+			ring_size = RTE_ALIGN(len * sizeof(struct ci_tx_desc),
 					      IDPF_DMA_MEM_ALIGN);
 		rte_memcpy(ring_name, "idpf Tx ring", sizeof("idpf Tx ring"));
 		break;
diff --git a/drivers/net/intel/idpf/idpf_rxtx_vec_common.h b/drivers/net/intel/idpf/idpf_rxtx_vec_common.h
index 425f0792a1..4702061484 100644
--- a/drivers/net/intel/idpf/idpf_rxtx_vec_common.h
+++ b/drivers/net/intel/idpf/idpf_rxtx_vec_common.h
@@ -31,7 +31,7 @@ idpf_tx_desc_done(struct ci_tx_queue *txq, uint16_t idx)
 	if (txq->complq != NULL)
 		return 1;
 
-	return (txq->idpf_tx_ring[idx].qw1 &
+	return (txq->idpf_tx_ring[idx].cmd_type_offset_bsz &
 			rte_cpu_to_le_64(IDPF_TXD_QW1_DTYPE_M)) ==
 				rte_cpu_to_le_64(IDPF_TX_DESC_DTYPE_DESC_DONE);
 }
-- 
2.51.0



* [RFC PATCH 02/27] net/intel: use common tx ring structure
  2025-12-19 17:25 [RFC PATCH 00/27] combine multiple Intel scalar Tx paths Bruce Richardson
  2025-12-19 17:25 ` [RFC PATCH 01/27] net/intel: create common Tx descriptor structure Bruce Richardson
@ 2025-12-19 17:25 ` Bruce Richardson
  2025-12-19 17:25 ` [RFC PATCH 03/27] net/intel: create common post-Tx cleanup function Bruce Richardson
                   ` (28 subsequent siblings)
  30 siblings, 0 replies; 274+ messages in thread
From: Bruce Richardson @ 2025-12-19 17:25 UTC (permalink / raw)
  To: dev
  Cc: Bruce Richardson, Praveen Shetty, Vladimir Medvedkin,
	Anatoly Burakov, Jingjing Wu

Now that we have a common descriptor type, the separate per-driver ring
pointers in the union are no longer needed: we can merge all of them,
apart from the ixgbe pointer, into a single member.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/net/intel/common/tx.h                 |  5 +--
 drivers/net/intel/cpfl/cpfl_rxtx.c            |  2 +-
 drivers/net/intel/i40e/i40e_fdir.c            |  6 ++--
 drivers/net/intel/i40e/i40e_rxtx.c            | 22 ++++++------
 .../net/intel/i40e/i40e_rxtx_vec_altivec.c    |  6 ++--
 drivers/net/intel/i40e/i40e_rxtx_vec_avx2.c   |  6 ++--
 drivers/net/intel/i40e/i40e_rxtx_vec_avx512.c |  6 ++--
 drivers/net/intel/i40e/i40e_rxtx_vec_common.h |  2 +-
 drivers/net/intel/i40e/i40e_rxtx_vec_neon.c   |  6 ++--
 drivers/net/intel/i40e/i40e_rxtx_vec_sse.c    |  6 ++--
 drivers/net/intel/iavf/iavf_rxtx.c            | 14 ++++----
 drivers/net/intel/iavf/iavf_rxtx_vec_avx2.c   |  6 ++--
 drivers/net/intel/iavf/iavf_rxtx_vec_avx512.c | 12 +++----
 drivers/net/intel/iavf/iavf_rxtx_vec_common.h |  2 +-
 drivers/net/intel/iavf/iavf_rxtx_vec_sse.c    |  6 ++--
 drivers/net/intel/ice/ice_dcf_ethdev.c        |  4 +--
 drivers/net/intel/ice/ice_rxtx.c              | 34 +++++++++----------
 drivers/net/intel/ice/ice_rxtx_vec_avx2.c     |  6 ++--
 drivers/net/intel/ice/ice_rxtx_vec_avx512.c   |  6 ++--
 drivers/net/intel/ice/ice_rxtx_vec_common.h   |  2 +-
 drivers/net/intel/ice/ice_rxtx_vec_sse.c      |  6 ++--
 drivers/net/intel/idpf/idpf_common_rxtx.c     |  8 ++---
 .../net/intel/idpf/idpf_common_rxtx_avx2.c    |  6 ++--
 .../net/intel/idpf/idpf_common_rxtx_avx512.c  |  6 ++--
 drivers/net/intel/idpf/idpf_rxtx.c            |  2 +-
 drivers/net/intel/idpf/idpf_rxtx_vec_common.h |  2 +-
 26 files changed, 93 insertions(+), 96 deletions(-)

diff --git a/drivers/net/intel/common/tx.h b/drivers/net/intel/common/tx.h
index 722f87a70c..a9ff3bebd5 100644
--- a/drivers/net/intel/common/tx.h
+++ b/drivers/net/intel/common/tx.h
@@ -41,10 +41,7 @@ typedef void (*ice_tx_release_mbufs_t)(struct ci_tx_queue *txq);
 
 struct ci_tx_queue {
 	union { /* TX ring virtual address */
-		volatile struct ci_tx_desc *i40e_tx_ring;
-		volatile struct ci_tx_desc *iavf_tx_ring;
-		volatile struct ci_tx_desc *ice_tx_ring;
-		volatile struct ci_tx_desc *idpf_tx_ring;
+		volatile struct ci_tx_desc *ci_tx_ring;
 		volatile union ixgbe_adv_tx_desc *ixgbe_tx_ring;
 	};
 	volatile uint8_t *qtx_tail;               /* register address of tail */
diff --git a/drivers/net/intel/cpfl/cpfl_rxtx.c b/drivers/net/intel/cpfl/cpfl_rxtx.c
index 57c6f6e736..a3127e7c97 100644
--- a/drivers/net/intel/cpfl/cpfl_rxtx.c
+++ b/drivers/net/intel/cpfl/cpfl_rxtx.c
@@ -594,7 +594,7 @@ cpfl_tx_queue_setup(struct rte_eth_dev *dev, uint16_t queue_idx,
 	}
 
 	if (!is_splitq) {
-		txq->idpf_tx_ring = mz->addr;
+		txq->ci_tx_ring = mz->addr;
 		idpf_qc_single_tx_queue_reset(txq);
 	} else {
 		txq->desc_ring = mz->addr;
diff --git a/drivers/net/intel/i40e/i40e_fdir.c b/drivers/net/intel/i40e/i40e_fdir.c
index 605df73c9e..8a01aec0e2 100644
--- a/drivers/net/intel/i40e/i40e_fdir.c
+++ b/drivers/net/intel/i40e/i40e_fdir.c
@@ -1380,7 +1380,7 @@ i40e_find_available_buffer(struct rte_eth_dev *dev)
 		volatile struct ci_tx_desc *tmp_txdp;
 
 		tmp_tail = txq->tx_tail;
-		tmp_txdp = &txq->i40e_tx_ring[tmp_tail + 1];
+		tmp_txdp = &txq->ci_tx_ring[tmp_tail + 1];
 
 		do {
 			if ((tmp_txdp->cmd_type_offset_bsz &
@@ -1637,7 +1637,7 @@ i40e_flow_fdir_filter_programming(struct i40e_pf *pf,
 
 	PMD_DRV_LOG(INFO, "filling filter programming descriptor.");
 	fdirdp = (volatile struct i40e_filter_program_desc *)
-				(&txq->i40e_tx_ring[txq->tx_tail]);
+				(&txq->ci_tx_ring[txq->tx_tail]);
 
 	fdirdp->qindex_flex_ptype_vsi =
 			rte_cpu_to_le_32((fdir_action->rx_queue <<
@@ -1707,7 +1707,7 @@ i40e_flow_fdir_filter_programming(struct i40e_pf *pf,
 	fdirdp->fd_id = rte_cpu_to_le_32(filter->soft_id);
 
 	PMD_DRV_LOG(INFO, "filling transmit descriptor.");
-	txdp = &txq->i40e_tx_ring[txq->tx_tail + 1];
+	txdp = &txq->ci_tx_ring[txq->tx_tail + 1];
 	txdp->buffer_addr = rte_cpu_to_le_64(pf->fdir.dma_addr[txq->tx_tail >> 1]);
 
 	td_cmd = I40E_TX_DESC_CMD_EOP |
diff --git a/drivers/net/intel/i40e/i40e_rxtx.c b/drivers/net/intel/i40e/i40e_rxtx.c
index 6307e9809f..2af3098f81 100644
--- a/drivers/net/intel/i40e/i40e_rxtx.c
+++ b/drivers/net/intel/i40e/i40e_rxtx.c
@@ -384,7 +384,7 @@ static inline int
 i40e_xmit_cleanup(struct ci_tx_queue *txq)
 {
 	struct ci_tx_entry *sw_ring = txq->sw_ring;
-	volatile struct ci_tx_desc *txd = txq->i40e_tx_ring;
+	volatile struct ci_tx_desc *txd = txq->ci_tx_ring;
 	uint16_t last_desc_cleaned = txq->last_desc_cleaned;
 	uint16_t nb_tx_desc = txq->nb_tx_desc;
 	uint16_t desc_to_clean_to;
@@ -1108,7 +1108,7 @@ i40e_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 
 	txq = tx_queue;
 	sw_ring = txq->sw_ring;
-	txr = txq->i40e_tx_ring;
+	txr = txq->ci_tx_ring;
 	tx_id = txq->tx_tail;
 	txe = &sw_ring[tx_id];
 
@@ -1343,7 +1343,7 @@ i40e_tx_free_bufs(struct ci_tx_queue *txq)
 	const uint16_t k = RTE_ALIGN_FLOOR(tx_rs_thresh, I40E_TX_MAX_FREE_BUF_SZ);
 	const uint16_t m = tx_rs_thresh % I40E_TX_MAX_FREE_BUF_SZ;
 
-	if ((txq->i40e_tx_ring[txq->tx_next_dd].cmd_type_offset_bsz &
+	if ((txq->ci_tx_ring[txq->tx_next_dd].cmd_type_offset_bsz &
 			rte_cpu_to_le_64(I40E_TXD_QW1_DTYPE_MASK)) !=
 			rte_cpu_to_le_64(I40E_TX_DESC_DTYPE_DESC_DONE))
 		return 0;
@@ -1422,7 +1422,7 @@ i40e_tx_fill_hw_ring(struct ci_tx_queue *txq,
 		     struct rte_mbuf **pkts,
 		     uint16_t nb_pkts)
 {
-	volatile struct ci_tx_desc *txdp = &txq->i40e_tx_ring[txq->tx_tail];
+	volatile struct ci_tx_desc *txdp = &txq->ci_tx_ring[txq->tx_tail];
 	struct ci_tx_entry *txep = &txq->sw_ring[txq->tx_tail];
 	const int N_PER_LOOP = 4;
 	const int N_PER_LOOP_MASK = N_PER_LOOP - 1;
@@ -1450,7 +1450,7 @@ tx_xmit_pkts(struct ci_tx_queue *txq,
 	     struct rte_mbuf **tx_pkts,
 	     uint16_t nb_pkts)
 {
-	volatile struct ci_tx_desc *txr = txq->i40e_tx_ring;
+	volatile struct ci_tx_desc *txr = txq->ci_tx_ring;
 	uint16_t n = 0;
 
 	/**
@@ -2409,7 +2409,7 @@ i40e_dev_tx_descriptor_status(void *tx_queue, uint16_t offset)
 			desc -= txq->nb_tx_desc;
 	}
 
-	status = &txq->i40e_tx_ring[desc].cmd_type_offset_bsz;
+	status = &txq->ci_tx_ring[desc].cmd_type_offset_bsz;
 	mask = rte_le_to_cpu_64(I40E_TXD_QW1_DTYPE_MASK);
 	expect = rte_cpu_to_le_64(
 		I40E_TX_DESC_DTYPE_DESC_DONE << I40E_TXD_QW1_DTYPE_SHIFT);
@@ -2606,7 +2606,7 @@ i40e_dev_tx_queue_setup(struct rte_eth_dev *dev,
 	/* Allocate TX hardware ring descriptors. */
 	ring_size = sizeof(struct ci_tx_desc) * I40E_MAX_RING_DESC;
 	ring_size = RTE_ALIGN(ring_size, I40E_DMA_MEM_ALIGN);
-	tz = rte_eth_dma_zone_reserve(dev, "i40e_tx_ring", queue_idx,
+	tz = rte_eth_dma_zone_reserve(dev, "ci_tx_ring", queue_idx,
 			      ring_size, I40E_RING_BASE_ALIGN, socket_id);
 	if (!tz) {
 		i40e_tx_queue_release(txq);
@@ -2626,7 +2626,7 @@ i40e_dev_tx_queue_setup(struct rte_eth_dev *dev,
 	txq->tx_deferred_start = tx_conf->tx_deferred_start;
 
 	txq->tx_ring_dma = tz->iova;
-	txq->i40e_tx_ring = (struct ci_tx_desc *)tz->addr;
+	txq->ci_tx_ring = (struct ci_tx_desc *)tz->addr;
 
 	/* Allocate software ring */
 	txq->sw_ring =
@@ -2901,11 +2901,11 @@ i40e_reset_tx_queue(struct ci_tx_queue *txq)
 	txe = txq->sw_ring;
 	size = sizeof(struct ci_tx_desc) * txq->nb_tx_desc;
 	for (i = 0; i < size; i++)
-		((volatile char *)txq->i40e_tx_ring)[i] = 0;
+		((volatile char *)txq->ci_tx_ring)[i] = 0;
 
 	prev = (uint16_t)(txq->nb_tx_desc - 1);
 	for (i = 0; i < txq->nb_tx_desc; i++) {
-		volatile struct ci_tx_desc *txd = &txq->i40e_tx_ring[i];
+		volatile struct ci_tx_desc *txd = &txq->ci_tx_ring[i];
 
 		txd->cmd_type_offset_bsz =
 			rte_cpu_to_le_64(I40E_TX_DESC_DTYPE_DESC_DONE);
@@ -3226,7 +3226,7 @@ i40e_fdir_setup_tx_resources(struct i40e_pf *pf)
 	txq->i40e_vsi = pf->fdir.fdir_vsi;
 
 	txq->tx_ring_dma = tz->iova;
-	txq->i40e_tx_ring = (struct ci_tx_desc *)tz->addr;
+	txq->ci_tx_ring = (struct ci_tx_desc *)tz->addr;
 
 	/*
 	 * don't need to allocate software ring and reset for the fdir
diff --git a/drivers/net/intel/i40e/i40e_rxtx_vec_altivec.c b/drivers/net/intel/i40e/i40e_rxtx_vec_altivec.c
index ef5b252898..81e9e2bc0b 100644
--- a/drivers/net/intel/i40e/i40e_rxtx_vec_altivec.c
+++ b/drivers/net/intel/i40e/i40e_rxtx_vec_altivec.c
@@ -489,7 +489,7 @@ i40e_xmit_fixed_burst_vec(void *tx_queue, struct rte_mbuf **tx_pkts,
 		return 0;
 
 	tx_id = txq->tx_tail;
-	txdp = &txq->i40e_tx_ring[tx_id];
+	txdp = &txq->ci_tx_ring[tx_id];
 	txep = &txq->sw_ring_vec[tx_id];
 
 	txq->nb_tx_free = (uint16_t)(txq->nb_tx_free - nb_pkts);
@@ -509,7 +509,7 @@ i40e_xmit_fixed_burst_vec(void *tx_queue, struct rte_mbuf **tx_pkts,
 		txq->tx_next_rs = (uint16_t)(txq->tx_rs_thresh - 1);
 
 		/* avoid reach the end of ring */
-		txdp = &txq->i40e_tx_ring[tx_id];
+		txdp = &txq->ci_tx_ring[tx_id];
 		txep = &txq->sw_ring_vec[tx_id];
 	}
 
@@ -519,7 +519,7 @@ i40e_xmit_fixed_burst_vec(void *tx_queue, struct rte_mbuf **tx_pkts,
 
 	tx_id = (uint16_t)(tx_id + nb_commit);
 	if (tx_id > txq->tx_next_rs) {
-		txq->i40e_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
+		txq->ci_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
 			rte_cpu_to_le_64(((uint64_t)I40E_TX_DESC_CMD_RS) <<
 						I40E_TXD_QW1_CMD_SHIFT);
 		txq->tx_next_rs =
diff --git a/drivers/net/intel/i40e/i40e_rxtx_vec_avx2.c b/drivers/net/intel/i40e/i40e_rxtx_vec_avx2.c
index b3ce08c039..b25b05d79d 100644
--- a/drivers/net/intel/i40e/i40e_rxtx_vec_avx2.c
+++ b/drivers/net/intel/i40e/i40e_rxtx_vec_avx2.c
@@ -753,7 +753,7 @@ i40e_xmit_fixed_burst_vec_avx2(void *tx_queue, struct rte_mbuf **tx_pkts,
 		return 0;
 
 	tx_id = txq->tx_tail;
-	txdp = &txq->i40e_tx_ring[tx_id];
+	txdp = &txq->ci_tx_ring[tx_id];
 	txep = &txq->sw_ring_vec[tx_id];
 
 	txq->nb_tx_free = (uint16_t)(txq->nb_tx_free - nb_pkts);
@@ -774,7 +774,7 @@ i40e_xmit_fixed_burst_vec_avx2(void *tx_queue, struct rte_mbuf **tx_pkts,
 		txq->tx_next_rs = (uint16_t)(txq->tx_rs_thresh - 1);
 
 		/* avoid reach the end of ring */
-		txdp = &txq->i40e_tx_ring[tx_id];
+		txdp = &txq->ci_tx_ring[tx_id];
 		txep = &txq->sw_ring_vec[tx_id];
 	}
 
@@ -784,7 +784,7 @@ i40e_xmit_fixed_burst_vec_avx2(void *tx_queue, struct rte_mbuf **tx_pkts,
 
 	tx_id = (uint16_t)(tx_id + nb_commit);
 	if (tx_id > txq->tx_next_rs) {
-		txq->i40e_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
+		txq->ci_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
 			rte_cpu_to_le_64(((uint64_t)I40E_TX_DESC_CMD_RS) <<
 						I40E_TXD_QW1_CMD_SHIFT);
 		txq->tx_next_rs =
diff --git a/drivers/net/intel/i40e/i40e_rxtx_vec_avx512.c b/drivers/net/intel/i40e/i40e_rxtx_vec_avx512.c
index 6971488750..9a967faeee 100644
--- a/drivers/net/intel/i40e/i40e_rxtx_vec_avx512.c
+++ b/drivers/net/intel/i40e/i40e_rxtx_vec_avx512.c
@@ -821,7 +821,7 @@ i40e_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pkts,
 		return 0;
 
 	tx_id = txq->tx_tail;
-	txdp = &txq->i40e_tx_ring[tx_id];
+	txdp = &txq->ci_tx_ring[tx_id];
 	txep = (void *)txq->sw_ring;
 	txep += tx_id;
 
@@ -843,7 +843,7 @@ i40e_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pkts,
 		txq->tx_next_rs = (uint16_t)(txq->tx_rs_thresh - 1);
 
 		/* avoid reach the end of ring */
-		txdp = txq->i40e_tx_ring;
+		txdp = txq->ci_tx_ring;
 		txep = (void *)txq->sw_ring;
 	}
 
@@ -853,7 +853,7 @@ i40e_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pkts,
 
 	tx_id = (uint16_t)(tx_id + nb_commit);
 	if (tx_id > txq->tx_next_rs) {
-		txq->i40e_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
+		txq->ci_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
 			rte_cpu_to_le_64(((uint64_t)I40E_TX_DESC_CMD_RS) <<
 						I40E_TXD_QW1_CMD_SHIFT);
 		txq->tx_next_rs =
diff --git a/drivers/net/intel/i40e/i40e_rxtx_vec_common.h b/drivers/net/intel/i40e/i40e_rxtx_vec_common.h
index 14651f2f06..1fd7fc75bf 100644
--- a/drivers/net/intel/i40e/i40e_rxtx_vec_common.h
+++ b/drivers/net/intel/i40e/i40e_rxtx_vec_common.h
@@ -15,7 +15,7 @@
 static inline int
 i40e_tx_desc_done(struct ci_tx_queue *txq, uint16_t idx)
 {
-	return (txq->i40e_tx_ring[idx].cmd_type_offset_bsz &
+	return (txq->ci_tx_ring[idx].cmd_type_offset_bsz &
 			rte_cpu_to_le_64(I40E_TXD_QW1_DTYPE_MASK)) ==
 				rte_cpu_to_le_64(I40E_TX_DESC_DTYPE_DESC_DONE);
 }
diff --git a/drivers/net/intel/i40e/i40e_rxtx_vec_neon.c b/drivers/net/intel/i40e/i40e_rxtx_vec_neon.c
index 6404b70c56..0b95152232 100644
--- a/drivers/net/intel/i40e/i40e_rxtx_vec_neon.c
+++ b/drivers/net/intel/i40e/i40e_rxtx_vec_neon.c
@@ -638,7 +638,7 @@ i40e_xmit_fixed_burst_vec(void *__rte_restrict tx_queue,
 		return 0;
 
 	tx_id = txq->tx_tail;
-	txdp = &txq->i40e_tx_ring[tx_id];
+	txdp = &txq->ci_tx_ring[tx_id];
 	txep = &txq->sw_ring_vec[tx_id];
 
 	txq->nb_tx_free = (uint16_t)(txq->nb_tx_free - nb_pkts);
@@ -658,7 +658,7 @@ i40e_xmit_fixed_burst_vec(void *__rte_restrict tx_queue,
 		txq->tx_next_rs = (uint16_t)(txq->tx_rs_thresh - 1);
 
 		/* avoid reach the end of ring */
-		txdp = &txq->i40e_tx_ring[tx_id];
+		txdp = &txq->ci_tx_ring[tx_id];
 		txep = &txq->sw_ring_vec[tx_id];
 	}
 
@@ -668,7 +668,7 @@ i40e_xmit_fixed_burst_vec(void *__rte_restrict tx_queue,
 
 	tx_id = (uint16_t)(tx_id + nb_commit);
 	if (tx_id > txq->tx_next_rs) {
-		txq->i40e_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
+		txq->ci_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
 			rte_cpu_to_le_64(((uint64_t)I40E_TX_DESC_CMD_RS) <<
 						I40E_TXD_QW1_CMD_SHIFT);
 		txq->tx_next_rs =
diff --git a/drivers/net/intel/i40e/i40e_rxtx_vec_sse.c b/drivers/net/intel/i40e/i40e_rxtx_vec_sse.c
index 2a360c18ad..2a3baa415e 100644
--- a/drivers/net/intel/i40e/i40e_rxtx_vec_sse.c
+++ b/drivers/net/intel/i40e/i40e_rxtx_vec_sse.c
@@ -646,7 +646,7 @@ i40e_xmit_fixed_burst_vec(void *tx_queue, struct rte_mbuf **tx_pkts,
 		return 0;
 
 	tx_id = txq->tx_tail;
-	txdp = &txq->i40e_tx_ring[tx_id];
+	txdp = &txq->ci_tx_ring[tx_id];
 	txep = &txq->sw_ring_vec[tx_id];
 
 	txq->nb_tx_free = (uint16_t)(txq->nb_tx_free - nb_pkts);
@@ -666,7 +666,7 @@ i40e_xmit_fixed_burst_vec(void *tx_queue, struct rte_mbuf **tx_pkts,
 		txq->tx_next_rs = (uint16_t)(txq->tx_rs_thresh - 1);
 
 		/* avoid reach the end of ring */
-		txdp = &txq->i40e_tx_ring[tx_id];
+		txdp = &txq->ci_tx_ring[tx_id];
 		txep = &txq->sw_ring_vec[tx_id];
 	}
 
@@ -676,7 +676,7 @@ i40e_xmit_fixed_burst_vec(void *tx_queue, struct rte_mbuf **tx_pkts,
 
 	tx_id = (uint16_t)(tx_id + nb_commit);
 	if (tx_id > txq->tx_next_rs) {
-		txq->i40e_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
+		txq->ci_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
 			rte_cpu_to_le_64(((uint64_t)I40E_TX_DESC_CMD_RS) <<
 						I40E_TXD_QW1_CMD_SHIFT);
 		txq->tx_next_rs =
diff --git a/drivers/net/intel/iavf/iavf_rxtx.c b/drivers/net/intel/iavf/iavf_rxtx.c
index c5e469a1ae..2ed778a872 100644
--- a/drivers/net/intel/iavf/iavf_rxtx.c
+++ b/drivers/net/intel/iavf/iavf_rxtx.c
@@ -279,11 +279,11 @@ reset_tx_queue(struct ci_tx_queue *txq)
 	txe = txq->sw_ring;
 	size = sizeof(struct ci_tx_desc) * txq->nb_tx_desc;
 	for (i = 0; i < size; i++)
-		((volatile char *)txq->iavf_tx_ring)[i] = 0;
+		((volatile char *)txq->ci_tx_ring)[i] = 0;
 
 	prev = (uint16_t)(txq->nb_tx_desc - 1);
 	for (i = 0; i < txq->nb_tx_desc; i++) {
-		txq->iavf_tx_ring[i].cmd_type_offset_bsz =
+		txq->ci_tx_ring[i].cmd_type_offset_bsz =
 			rte_cpu_to_le_64(IAVF_TX_DESC_DTYPE_DESC_DONE);
 		txe[i].mbuf =  NULL;
 		txe[i].last_id = i;
@@ -830,7 +830,7 @@ iavf_dev_tx_queue_setup(struct rte_eth_dev *dev,
 	/* Allocate TX hardware ring descriptors. */
 	ring_size = sizeof(struct ci_tx_desc) * IAVF_MAX_RING_DESC;
 	ring_size = RTE_ALIGN(ring_size, IAVF_DMA_MEM_ALIGN);
-	mz = rte_eth_dma_zone_reserve(dev, "iavf_tx_ring", queue_idx,
+	mz = rte_eth_dma_zone_reserve(dev, "ci_tx_ring", queue_idx,
 				      ring_size, IAVF_RING_BASE_ALIGN,
 				      socket_id);
 	if (!mz) {
@@ -840,7 +840,7 @@ iavf_dev_tx_queue_setup(struct rte_eth_dev *dev,
 		return -ENOMEM;
 	}
 	txq->tx_ring_dma = mz->iova;
-	txq->iavf_tx_ring = (struct ci_tx_desc *)mz->addr;
+	txq->ci_tx_ring = (struct ci_tx_desc *)mz->addr;
 
 	txq->mz = mz;
 	reset_tx_queue(txq);
@@ -2334,7 +2334,7 @@ iavf_xmit_cleanup(struct ci_tx_queue *txq)
 	uint16_t desc_to_clean_to;
 	uint16_t nb_tx_to_clean;
 
-	volatile struct ci_tx_desc *txd = txq->iavf_tx_ring;
+	volatile struct ci_tx_desc *txd = txq->ci_tx_ring;
 
 	desc_to_clean_to = (uint16_t)(last_desc_cleaned + txq->tx_rs_thresh);
 	if (desc_to_clean_to >= nb_tx_desc)
@@ -2757,7 +2757,7 @@ uint16_t
 iavf_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 {
 	struct ci_tx_queue *txq = tx_queue;
-	volatile struct ci_tx_desc *txr = txq->iavf_tx_ring;
+	volatile struct ci_tx_desc *txr = txq->ci_tx_ring;
 	struct ci_tx_entry *txe_ring = txq->sw_ring;
 	struct ci_tx_entry *txe, *txn;
 	struct rte_mbuf *mb, *mb_seg;
@@ -4504,7 +4504,7 @@ iavf_dev_tx_desc_status(void *tx_queue, uint16_t offset)
 			desc -= txq->nb_tx_desc;
 	}
 
-	status = &txq->iavf_tx_ring[desc].cmd_type_offset_bsz;
+	status = &txq->ci_tx_ring[desc].cmd_type_offset_bsz;
 	mask = rte_le_to_cpu_64(IAVF_TXD_QW1_DTYPE_MASK);
 	expect = rte_cpu_to_le_64(
 		 IAVF_TX_DESC_DTYPE_DESC_DONE << IAVF_TXD_QW1_DTYPE_SHIFT);
diff --git a/drivers/net/intel/iavf/iavf_rxtx_vec_avx2.c b/drivers/net/intel/iavf/iavf_rxtx_vec_avx2.c
index c3d7083230..82861b8398 100644
--- a/drivers/net/intel/iavf/iavf_rxtx_vec_avx2.c
+++ b/drivers/net/intel/iavf/iavf_rxtx_vec_avx2.c
@@ -1729,7 +1729,7 @@ iavf_xmit_fixed_burst_vec_avx2(void *tx_queue, struct rte_mbuf **tx_pkts,
 	nb_commit = nb_pkts;
 
 	tx_id = txq->tx_tail;
-	txdp = &txq->iavf_tx_ring[tx_id];
+	txdp = &txq->ci_tx_ring[tx_id];
 	txep = &txq->sw_ring_vec[tx_id];
 
 	txq->nb_tx_free = (uint16_t)(txq->nb_tx_free - nb_pkts);
@@ -1750,7 +1750,7 @@ iavf_xmit_fixed_burst_vec_avx2(void *tx_queue, struct rte_mbuf **tx_pkts,
 		txq->tx_next_rs = (uint16_t)(txq->tx_rs_thresh - 1);
 
 		/* avoid reach the end of ring */
-		txdp = &txq->iavf_tx_ring[tx_id];
+		txdp = &txq->ci_tx_ring[tx_id];
 		txep = &txq->sw_ring_vec[tx_id];
 	}
 
@@ -1760,7 +1760,7 @@ iavf_xmit_fixed_burst_vec_avx2(void *tx_queue, struct rte_mbuf **tx_pkts,
 
 	tx_id = (uint16_t)(tx_id + nb_commit);
 	if (tx_id > txq->tx_next_rs) {
-		txq->iavf_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
+		txq->ci_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
 			rte_cpu_to_le_64(((uint64_t)IAVF_TX_DESC_CMD_RS) <<
 					 IAVF_TXD_QW1_CMD_SHIFT);
 		txq->tx_next_rs =
diff --git a/drivers/net/intel/iavf/iavf_rxtx_vec_avx512.c b/drivers/net/intel/iavf/iavf_rxtx_vec_avx512.c
index d79d96c7b7..ad1b0b90cd 100644
--- a/drivers/net/intel/iavf/iavf_rxtx_vec_avx512.c
+++ b/drivers/net/intel/iavf/iavf_rxtx_vec_avx512.c
@@ -2219,7 +2219,7 @@ iavf_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pkts,
 	nb_commit = nb_pkts;
 
 	tx_id = txq->tx_tail;
-	txdp = &txq->iavf_tx_ring[tx_id];
+	txdp = &txq->ci_tx_ring[tx_id];
 	txep = (void *)txq->sw_ring;
 	txep += tx_id;
 
@@ -2241,7 +2241,7 @@ iavf_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pkts,
 		txq->tx_next_rs = (uint16_t)(txq->tx_rs_thresh - 1);
 
 		/* avoid reach the end of ring */
-		txdp = &txq->iavf_tx_ring[tx_id];
+		txdp = &txq->ci_tx_ring[tx_id];
 		txep = (void *)txq->sw_ring;
 		txep += tx_id;
 	}
@@ -2252,7 +2252,7 @@ iavf_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pkts,
 
 	tx_id = (uint16_t)(tx_id + nb_commit);
 	if (tx_id > txq->tx_next_rs) {
-		txq->iavf_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
+		txq->ci_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
 			rte_cpu_to_le_64(((uint64_t)IAVF_TX_DESC_CMD_RS) <<
 					 IAVF_TXD_QW1_CMD_SHIFT);
 		txq->tx_next_rs =
@@ -2288,7 +2288,7 @@ iavf_xmit_fixed_burst_vec_avx512_ctx(void *tx_queue, struct rte_mbuf **tx_pkts,
 
 	nb_pkts = nb_commit >> 1;
 	tx_id = txq->tx_tail;
-	txdp = &txq->iavf_tx_ring[tx_id];
+	txdp = &txq->ci_tx_ring[tx_id];
 	txep = (void *)txq->sw_ring;
 	txep += (tx_id >> 1);
 
@@ -2309,7 +2309,7 @@ iavf_xmit_fixed_burst_vec_avx512_ctx(void *tx_queue, struct rte_mbuf **tx_pkts,
 		txq->tx_next_rs = (uint16_t)(txq->tx_rs_thresh - 1);
 		tx_id = 0;
 		/* avoid reach the end of ring */
-		txdp = txq->iavf_tx_ring;
+		txdp = txq->ci_tx_ring;
 		txep = (void *)txq->sw_ring;
 	}
 
@@ -2320,7 +2320,7 @@ iavf_xmit_fixed_burst_vec_avx512_ctx(void *tx_queue, struct rte_mbuf **tx_pkts,
 	tx_id = (uint16_t)(tx_id + nb_commit);
 
 	if (tx_id > txq->tx_next_rs) {
-		txq->iavf_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
+		txq->ci_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
 			rte_cpu_to_le_64(((uint64_t)IAVF_TX_DESC_CMD_RS) <<
 					 IAVF_TXD_QW1_CMD_SHIFT);
 		txq->tx_next_rs =
diff --git a/drivers/net/intel/iavf/iavf_rxtx_vec_common.h b/drivers/net/intel/iavf/iavf_rxtx_vec_common.h
index f1ea57034f..1832b76f89 100644
--- a/drivers/net/intel/iavf/iavf_rxtx_vec_common.h
+++ b/drivers/net/intel/iavf/iavf_rxtx_vec_common.h
@@ -14,7 +14,7 @@
 static inline int
 iavf_tx_desc_done(struct ci_tx_queue *txq, uint16_t idx)
 {
-	return (txq->iavf_tx_ring[idx].cmd_type_offset_bsz &
+	return (txq->ci_tx_ring[idx].cmd_type_offset_bsz &
 			rte_cpu_to_le_64(IAVF_TXD_QW1_DTYPE_MASK)) ==
 				rte_cpu_to_le_64(IAVF_TX_DESC_DTYPE_DESC_DONE);
 }
diff --git a/drivers/net/intel/iavf/iavf_rxtx_vec_sse.c b/drivers/net/intel/iavf/iavf_rxtx_vec_sse.c
index cb086cd352..89ec05fa5d 100644
--- a/drivers/net/intel/iavf/iavf_rxtx_vec_sse.c
+++ b/drivers/net/intel/iavf/iavf_rxtx_vec_sse.c
@@ -1286,7 +1286,7 @@ iavf_xmit_fixed_burst_vec(void *tx_queue, struct rte_mbuf **tx_pkts,
 	nb_commit = nb_pkts;
 
 	tx_id = txq->tx_tail;
-	txdp = &txq->iavf_tx_ring[tx_id];
+	txdp = &txq->ci_tx_ring[tx_id];
 	txep = &txq->sw_ring_vec[tx_id];
 
 	txq->nb_tx_free = (uint16_t)(txq->nb_tx_free - nb_pkts);
@@ -1306,7 +1306,7 @@ iavf_xmit_fixed_burst_vec(void *tx_queue, struct rte_mbuf **tx_pkts,
 		txq->tx_next_rs = (uint16_t)(txq->tx_rs_thresh - 1);
 
 		/* avoid reach the end of ring */
-		txdp = &txq->iavf_tx_ring[tx_id];
+		txdp = &txq->ci_tx_ring[tx_id];
 		txep = &txq->sw_ring_vec[tx_id];
 	}
 
@@ -1316,7 +1316,7 @@ iavf_xmit_fixed_burst_vec(void *tx_queue, struct rte_mbuf **tx_pkts,
 
 	tx_id = (uint16_t)(tx_id + nb_commit);
 	if (tx_id > txq->tx_next_rs) {
-		txq->iavf_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
+		txq->ci_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
 			rte_cpu_to_le_64(((uint64_t)IAVF_TX_DESC_CMD_RS) <<
 					 IAVF_TXD_QW1_CMD_SHIFT);
 		txq->tx_next_rs =
diff --git a/drivers/net/intel/ice/ice_dcf_ethdev.c b/drivers/net/intel/ice/ice_dcf_ethdev.c
index ab1d499cef..5f537b4c12 100644
--- a/drivers/net/intel/ice/ice_dcf_ethdev.c
+++ b/drivers/net/intel/ice/ice_dcf_ethdev.c
@@ -401,11 +401,11 @@ reset_tx_queue(struct ci_tx_queue *txq)
 	txe = txq->sw_ring;
 	size = sizeof(struct ci_tx_desc) * txq->nb_tx_desc;
 	for (i = 0; i < size; i++)
-		((volatile char *)txq->ice_tx_ring)[i] = 0;
+		((volatile char *)txq->ci_tx_ring)[i] = 0;
 
 	prev = (uint16_t)(txq->nb_tx_desc - 1);
 	for (i = 0; i < txq->nb_tx_desc; i++) {
-		txq->ice_tx_ring[i].cmd_type_offset_bsz =
+		txq->ci_tx_ring[i].cmd_type_offset_bsz =
 			rte_cpu_to_le_64(IAVF_TX_DESC_DTYPE_DESC_DONE);
 		txe[i].mbuf =  NULL;
 		txe[i].last_id = i;
diff --git a/drivers/net/intel/ice/ice_rxtx.c b/drivers/net/intel/ice/ice_rxtx.c
index 7358a95ce1..4aded194ce 100644
--- a/drivers/net/intel/ice/ice_rxtx.c
+++ b/drivers/net/intel/ice/ice_rxtx.c
@@ -1113,11 +1113,11 @@ ice_reset_tx_queue(struct ci_tx_queue *txq)
 	txe = txq->sw_ring;
 	size = sizeof(struct ci_tx_desc) * txq->nb_tx_desc;
 	for (i = 0; i < size; i++)
-		((volatile char *)txq->ice_tx_ring)[i] = 0;
+		((volatile char *)txq->ci_tx_ring)[i] = 0;
 
 	prev = (uint16_t)(txq->nb_tx_desc - 1);
 	for (i = 0; i < txq->nb_tx_desc; i++) {
-		volatile struct ci_tx_desc *txd = &txq->ice_tx_ring[i];
+		volatile struct ci_tx_desc *txd = &txq->ci_tx_ring[i];
 
 		txd->cmd_type_offset_bsz =
 			rte_cpu_to_le_64(ICE_TX_DESC_DTYPE_DESC_DONE);
@@ -1611,7 +1611,7 @@ ice_tx_queue_setup(struct rte_eth_dev *dev,
 	/* Allocate TX hardware ring descriptors. */
 	ring_size = sizeof(struct ci_tx_desc) * ICE_MAX_NUM_DESC_BY_MAC(hw);
 	ring_size = RTE_ALIGN(ring_size, ICE_DMA_MEM_ALIGN);
-	tz = rte_eth_dma_zone_reserve(dev, "ice_tx_ring", queue_idx,
+	tz = rte_eth_dma_zone_reserve(dev, "ci_tx_ring", queue_idx,
 				      ring_size, ICE_RING_BASE_ALIGN,
 				      socket_id);
 	if (!tz) {
@@ -1633,7 +1633,7 @@ ice_tx_queue_setup(struct rte_eth_dev *dev,
 	txq->tx_deferred_start = tx_conf->tx_deferred_start;
 
 	txq->tx_ring_dma = tz->iova;
-	txq->ice_tx_ring = tz->addr;
+	txq->ci_tx_ring = tz->addr;
 
 	/* Allocate software ring */
 	txq->sw_ring =
@@ -2547,7 +2547,7 @@ ice_tx_descriptor_status(void *tx_queue, uint16_t offset)
 			desc -= txq->nb_tx_desc;
 	}
 
-	status = &txq->ice_tx_ring[desc].cmd_type_offset_bsz;
+	status = &txq->ci_tx_ring[desc].cmd_type_offset_bsz;
 	mask = rte_cpu_to_le_64(ICE_TXD_QW1_DTYPE_M);
 	expect = rte_cpu_to_le_64(ICE_TX_DESC_DTYPE_DESC_DONE <<
 				  ICE_TXD_QW1_DTYPE_S);
@@ -2630,7 +2630,7 @@ ice_fdir_setup_tx_resources(struct ice_pf *pf)
 	txq->ice_vsi = pf->fdir.fdir_vsi;
 
 	txq->tx_ring_dma = tz->iova;
-	txq->ice_tx_ring = (struct ci_tx_desc *)tz->addr;
+	txq->ci_tx_ring = (struct ci_tx_desc *)tz->addr;
 	/*
 	 * don't need to allocate software ring and reset for the fdir
 	 * program queue just set the queue has been configured.
@@ -3019,7 +3019,7 @@ static inline int
 ice_xmit_cleanup(struct ci_tx_queue *txq)
 {
 	struct ci_tx_entry *sw_ring = txq->sw_ring;
-	volatile struct ci_tx_desc *txd = txq->ice_tx_ring;
+	volatile struct ci_tx_desc *txd = txq->ci_tx_ring;
 	uint16_t last_desc_cleaned = txq->last_desc_cleaned;
 	uint16_t nb_tx_desc = txq->nb_tx_desc;
 	uint16_t desc_to_clean_to;
@@ -3140,7 +3140,7 @@ uint16_t
 ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 {
 	struct ci_tx_queue *txq;
-	volatile struct ci_tx_desc *ice_tx_ring;
+	volatile struct ci_tx_desc *ci_tx_ring;
 	volatile struct ci_tx_desc *txd;
 	struct ci_tx_entry *sw_ring;
 	struct ci_tx_entry *txe, *txn;
@@ -3163,7 +3163,7 @@ ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 
 	txq = tx_queue;
 	sw_ring = txq->sw_ring;
-	ice_tx_ring = txq->ice_tx_ring;
+	ci_tx_ring = txq->ci_tx_ring;
 	tx_id = txq->tx_tail;
 	txe = &sw_ring[tx_id];
 
@@ -3249,7 +3249,7 @@ ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 			/* Setup TX context descriptor if required */
 			volatile struct ice_tx_ctx_desc *ctx_txd =
 				(volatile struct ice_tx_ctx_desc *)
-					&ice_tx_ring[tx_id];
+					&ci_tx_ring[tx_id];
 			uint16_t cd_l2tag2 = 0;
 			uint64_t cd_type_cmd_tso_mss = ICE_TX_DESC_DTYPE_CTX;
 
@@ -3291,7 +3291,7 @@ ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 		m_seg = tx_pkt;
 
 		do {
-			txd = &ice_tx_ring[tx_id];
+			txd = &ci_tx_ring[tx_id];
 			txn = &sw_ring[txe->next_id];
 
 			if (txe->mbuf)
@@ -3319,7 +3319,7 @@ ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 				txe->last_id = tx_last;
 				tx_id = txe->next_id;
 				txe = txn;
-				txd = &ice_tx_ring[tx_id];
+				txd = &ci_tx_ring[tx_id];
 				txn = &sw_ring[txe->next_id];
 			}
 
@@ -3402,7 +3402,7 @@ ice_tx_free_bufs(struct ci_tx_queue *txq)
 	struct ci_tx_entry *txep;
 	uint16_t i;
 
-	if ((txq->ice_tx_ring[txq->tx_next_dd].cmd_type_offset_bsz &
+	if ((txq->ci_tx_ring[txq->tx_next_dd].cmd_type_offset_bsz &
 	     rte_cpu_to_le_64(ICE_TXD_QW1_DTYPE_M)) !=
 	    rte_cpu_to_le_64(ICE_TX_DESC_DTYPE_DESC_DONE))
 		return 0;
@@ -3575,7 +3575,7 @@ static inline void
 ice_tx_fill_hw_ring(struct ci_tx_queue *txq, struct rte_mbuf **pkts,
 		    uint16_t nb_pkts)
 {
-	volatile struct ci_tx_desc *txdp = &txq->ice_tx_ring[txq->tx_tail];
+	volatile struct ci_tx_desc *txdp = &txq->ci_tx_ring[txq->tx_tail];
 	struct ci_tx_entry *txep = &txq->sw_ring[txq->tx_tail];
 	const int N_PER_LOOP = 4;
 	const int N_PER_LOOP_MASK = N_PER_LOOP - 1;
@@ -3608,7 +3608,7 @@ tx_xmit_pkts(struct ci_tx_queue *txq,
 	     struct rte_mbuf **tx_pkts,
 	     uint16_t nb_pkts)
 {
-	volatile struct ci_tx_desc *txr = txq->ice_tx_ring;
+	volatile struct ci_tx_desc *txr = txq->ci_tx_ring;
 	uint16_t n = 0;
 
 	/**
@@ -4896,11 +4896,11 @@ ice_fdir_programming(struct ice_pf *pf, struct ice_fltr_desc *fdir_desc)
 	uint16_t i;
 
 	fdirdp = (volatile struct ice_fltr_desc *)
-		(&txq->ice_tx_ring[txq->tx_tail]);
+		(&txq->ci_tx_ring[txq->tx_tail]);
 	fdirdp->qidx_compq_space_stat = fdir_desc->qidx_compq_space_stat;
 	fdirdp->dtype_cmd_vsi_fdid = fdir_desc->dtype_cmd_vsi_fdid;
 
-	txdp = &txq->ice_tx_ring[txq->tx_tail + 1];
+	txdp = &txq->ci_tx_ring[txq->tx_tail + 1];
 	txdp->buffer_addr = rte_cpu_to_le_64(pf->fdir.dma_addr);
 	td_cmd = ICE_TX_DESC_CMD_EOP |
 		ICE_TX_DESC_CMD_RS  |
diff --git a/drivers/net/intel/ice/ice_rxtx_vec_avx2.c b/drivers/net/intel/ice/ice_rxtx_vec_avx2.c
index 95c4f4569c..d553c438f8 100644
--- a/drivers/net/intel/ice/ice_rxtx_vec_avx2.c
+++ b/drivers/net/intel/ice/ice_rxtx_vec_avx2.c
@@ -869,7 +869,7 @@ ice_xmit_fixed_burst_vec_avx2(void *tx_queue, struct rte_mbuf **tx_pkts,
 		return 0;
 
 	tx_id = txq->tx_tail;
-	txdp = &txq->ice_tx_ring[tx_id];
+	txdp = &txq->ci_tx_ring[tx_id];
 	txep = &txq->sw_ring_vec[tx_id];
 
 	txq->nb_tx_free = (uint16_t)(txq->nb_tx_free - nb_pkts);
@@ -890,7 +890,7 @@ ice_xmit_fixed_burst_vec_avx2(void *tx_queue, struct rte_mbuf **tx_pkts,
 		txq->tx_next_rs = (uint16_t)(txq->tx_rs_thresh - 1);
 
 		/* avoid reach the end of ring */
-		txdp = &txq->ice_tx_ring[tx_id];
+		txdp = &txq->ci_tx_ring[tx_id];
 		txep = &txq->sw_ring_vec[tx_id];
 	}
 
@@ -900,7 +900,7 @@ ice_xmit_fixed_burst_vec_avx2(void *tx_queue, struct rte_mbuf **tx_pkts,
 
 	tx_id = (uint16_t)(tx_id + nb_commit);
 	if (tx_id > txq->tx_next_rs) {
-		txq->ice_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
+		txq->ci_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
 			rte_cpu_to_le_64(((uint64_t)ICE_TX_DESC_CMD_RS) <<
 					 ICE_TXD_QW1_CMD_S);
 		txq->tx_next_rs =
diff --git a/drivers/net/intel/ice/ice_rxtx_vec_avx512.c b/drivers/net/intel/ice/ice_rxtx_vec_avx512.c
index 1f6bf5fc8e..d42f41461f 100644
--- a/drivers/net/intel/ice/ice_rxtx_vec_avx512.c
+++ b/drivers/net/intel/ice/ice_rxtx_vec_avx512.c
@@ -933,7 +933,7 @@ ice_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pkts,
 		return 0;
 
 	tx_id = txq->tx_tail;
-	txdp = &txq->ice_tx_ring[tx_id];
+	txdp = &txq->ci_tx_ring[tx_id];
 	txep = (void *)txq->sw_ring;
 	txep += tx_id;
 
@@ -955,7 +955,7 @@ ice_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pkts,
 		txq->tx_next_rs = (uint16_t)(txq->tx_rs_thresh - 1);
 
 		/* avoid reach the end of ring */
-		txdp = txq->ice_tx_ring;
+		txdp = txq->ci_tx_ring;
 		txep = (void *)txq->sw_ring;
 	}
 
@@ -965,7 +965,7 @@ ice_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pkts,
 
 	tx_id = (uint16_t)(tx_id + nb_commit);
 	if (tx_id > txq->tx_next_rs) {
-		txq->ice_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
+		txq->ci_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
 			rte_cpu_to_le_64(((uint64_t)ICE_TX_DESC_CMD_RS) <<
 					 ICE_TXD_QW1_CMD_S);
 		txq->tx_next_rs =
diff --git a/drivers/net/intel/ice/ice_rxtx_vec_common.h b/drivers/net/intel/ice/ice_rxtx_vec_common.h
index ff46a8fb49..8ba591e403 100644
--- a/drivers/net/intel/ice/ice_rxtx_vec_common.h
+++ b/drivers/net/intel/ice/ice_rxtx_vec_common.h
@@ -11,7 +11,7 @@
 static inline int
 ice_tx_desc_done(struct ci_tx_queue *txq, uint16_t idx)
 {
-	return (txq->ice_tx_ring[idx].cmd_type_offset_bsz &
+	return (txq->ci_tx_ring[idx].cmd_type_offset_bsz &
 			rte_cpu_to_le_64(ICE_TXD_QW1_DTYPE_M)) ==
 				rte_cpu_to_le_64(ICE_TX_DESC_DTYPE_DESC_DONE);
 }
diff --git a/drivers/net/intel/ice/ice_rxtx_vec_sse.c b/drivers/net/intel/ice/ice_rxtx_vec_sse.c
index 44f3fc0fa5..c65240d659 100644
--- a/drivers/net/intel/ice/ice_rxtx_vec_sse.c
+++ b/drivers/net/intel/ice/ice_rxtx_vec_sse.c
@@ -642,7 +642,7 @@ ice_xmit_fixed_burst_vec(void *tx_queue, struct rte_mbuf **tx_pkts,
 		return 0;
 
 	tx_id = txq->tx_tail;
-	txdp = &txq->ice_tx_ring[tx_id];
+	txdp = &txq->ci_tx_ring[tx_id];
 	txep = &txq->sw_ring_vec[tx_id];
 
 	txq->nb_tx_free = (uint16_t)(txq->nb_tx_free - nb_pkts);
@@ -662,7 +662,7 @@ ice_xmit_fixed_burst_vec(void *tx_queue, struct rte_mbuf **tx_pkts,
 		txq->tx_next_rs = (uint16_t)(txq->tx_rs_thresh - 1);
 
 		/* avoid reach the end of ring */
-		txdp = &txq->ice_tx_ring[tx_id];
+		txdp = &txq->ci_tx_ring[tx_id];
 		txep = &txq->sw_ring_vec[tx_id];
 	}
 
@@ -672,7 +672,7 @@ ice_xmit_fixed_burst_vec(void *tx_queue, struct rte_mbuf **tx_pkts,
 
 	tx_id = (uint16_t)(tx_id + nb_commit);
 	if (tx_id > txq->tx_next_rs) {
-		txq->ice_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
+		txq->ci_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
 			rte_cpu_to_le_64(((uint64_t)ICE_TX_DESC_CMD_RS) <<
 					 ICE_TXD_QW1_CMD_S);
 		txq->tx_next_rs =
diff --git a/drivers/net/intel/idpf/idpf_common_rxtx.c b/drivers/net/intel/idpf/idpf_common_rxtx.c
index be3c1ef216..51074bda3a 100644
--- a/drivers/net/intel/idpf/idpf_common_rxtx.c
+++ b/drivers/net/intel/idpf/idpf_common_rxtx.c
@@ -266,11 +266,11 @@ idpf_qc_single_tx_queue_reset(struct ci_tx_queue *txq)
 	txe = txq->sw_ring;
 	size = sizeof(struct ci_tx_desc) * txq->nb_tx_desc;
 	for (i = 0; i < size; i++)
-		((volatile char *)txq->idpf_tx_ring)[i] = 0;
+		((volatile char *)txq->ci_tx_ring)[i] = 0;
 
 	prev = (uint16_t)(txq->nb_tx_desc - 1);
 	for (i = 0; i < txq->nb_tx_desc; i++) {
-		txq->idpf_tx_ring[i].cmd_type_offset_bsz =
+		txq->ci_tx_ring[i].cmd_type_offset_bsz =
 			rte_cpu_to_le_64(IDPF_TX_DESC_DTYPE_DESC_DONE);
 		txe[i].mbuf =  NULL;
 		txe[i].last_id = i;
@@ -1335,7 +1335,7 @@ idpf_xmit_cleanup(struct ci_tx_queue *txq)
 	uint16_t desc_to_clean_to;
 	uint16_t nb_tx_to_clean;
 
-	volatile struct ci_tx_desc *txd = txq->idpf_tx_ring;
+	volatile struct ci_tx_desc *txd = txq->ci_tx_ring;
 
 	desc_to_clean_to = (uint16_t)(last_desc_cleaned + txq->tx_rs_thresh);
 	if (desc_to_clean_to >= nb_tx_desc)
@@ -1398,7 +1398,7 @@ idpf_dp_singleq_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 		return nb_tx;
 
 	sw_ring = txq->sw_ring;
-	txr = txq->idpf_tx_ring;
+	txr = txq->ci_tx_ring;
 	tx_id = txq->tx_tail;
 	txe = &sw_ring[tx_id];
 
diff --git a/drivers/net/intel/idpf/idpf_common_rxtx_avx2.c b/drivers/net/intel/idpf/idpf_common_rxtx_avx2.c
index 5f5d538dcb..04efee3722 100644
--- a/drivers/net/intel/idpf/idpf_common_rxtx_avx2.c
+++ b/drivers/net/intel/idpf/idpf_common_rxtx_avx2.c
@@ -573,7 +573,7 @@ idpf_singleq_xmit_fixed_burst_vec_avx2(void *tx_queue, struct rte_mbuf **tx_pkts
 		return 0;
 
 	tx_id = txq->tx_tail;
-	txdp = &txq->idpf_tx_ring[tx_id];
+	txdp = &txq->ci_tx_ring[tx_id];
 	txep = &txq->sw_ring_vec[tx_id];
 
 	txq->nb_tx_free = (uint16_t)(txq->nb_tx_free - nb_pkts);
@@ -594,7 +594,7 @@ idpf_singleq_xmit_fixed_burst_vec_avx2(void *tx_queue, struct rte_mbuf **tx_pkts
 		txq->tx_next_rs = (uint16_t)(txq->tx_rs_thresh - 1);
 
 		/* avoid reach the end of ring */
-		txdp = &txq->idpf_tx_ring[tx_id];
+		txdp = &txq->ci_tx_ring[tx_id];
 		txep = &txq->sw_ring_vec[tx_id];
 	}
 
@@ -604,7 +604,7 @@ idpf_singleq_xmit_fixed_burst_vec_avx2(void *tx_queue, struct rte_mbuf **tx_pkts
 
 	tx_id = (uint16_t)(tx_id + nb_commit);
 	if (tx_id > txq->tx_next_rs) {
-		txq->idpf_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
+		txq->ci_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
 			rte_cpu_to_le_64(((uint64_t)IDPF_TX_DESC_CMD_RS) <<
 					 IDPF_TXD_QW1_CMD_S);
 		txq->tx_next_rs =
diff --git a/drivers/net/intel/idpf/idpf_common_rxtx_avx512.c b/drivers/net/intel/idpf/idpf_common_rxtx_avx512.c
index c1ec3d1222..d5e5a2ca5f 100644
--- a/drivers/net/intel/idpf/idpf_common_rxtx_avx512.c
+++ b/drivers/net/intel/idpf/idpf_common_rxtx_avx512.c
@@ -1090,7 +1090,7 @@ idpf_singleq_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pk
 		return 0;
 
 	tx_id = txq->tx_tail;
-	txdp = &txq->idpf_tx_ring[tx_id];
+	txdp = &txq->ci_tx_ring[tx_id];
 	txep = (void *)txq->sw_ring;
 	txep += tx_id;
 
@@ -1112,7 +1112,7 @@ idpf_singleq_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pk
 		txq->tx_next_rs = (uint16_t)(txq->tx_rs_thresh - 1);
 
 		/* avoid reach the end of ring */
-		txdp = &txq->idpf_tx_ring[tx_id];
+		txdp = &txq->ci_tx_ring[tx_id];
 		txep = (void *)txq->sw_ring;
 		txep += tx_id;
 	}
@@ -1123,7 +1123,7 @@ idpf_singleq_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pk
 
 	tx_id = (uint16_t)(tx_id + nb_commit);
 	if (tx_id > txq->tx_next_rs) {
-		txq->idpf_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
+		txq->ci_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
 			rte_cpu_to_le_64(((uint64_t)IDPF_TX_DESC_CMD_RS) <<
 					 IDPF_TXD_QW1_CMD_S);
 		txq->tx_next_rs =
diff --git a/drivers/net/intel/idpf/idpf_rxtx.c b/drivers/net/intel/idpf/idpf_rxtx.c
index 9b63e44341..e974eb44b0 100644
--- a/drivers/net/intel/idpf/idpf_rxtx.c
+++ b/drivers/net/intel/idpf/idpf_rxtx.c
@@ -469,7 +469,7 @@ idpf_tx_queue_setup(struct rte_eth_dev *dev, uint16_t queue_idx,
 	}
 
 	if (!is_splitq) {
-		txq->idpf_tx_ring = mz->addr;
+		txq->ci_tx_ring = mz->addr;
 		idpf_qc_single_tx_queue_reset(txq);
 	} else {
 		txq->desc_ring = mz->addr;
diff --git a/drivers/net/intel/idpf/idpf_rxtx_vec_common.h b/drivers/net/intel/idpf/idpf_rxtx_vec_common.h
index 4702061484..b5e8574667 100644
--- a/drivers/net/intel/idpf/idpf_rxtx_vec_common.h
+++ b/drivers/net/intel/idpf/idpf_rxtx_vec_common.h
@@ -31,7 +31,7 @@ idpf_tx_desc_done(struct ci_tx_queue *txq, uint16_t idx)
 	if (txq->complq != NULL)
 		return 1;
 
-	return (txq->idpf_tx_ring[idx].cmd_type_offset_bsz &
+	return (txq->ci_tx_ring[idx].cmd_type_offset_bsz &
 			rte_cpu_to_le_64(IDPF_TXD_QW1_DTYPE_M)) ==
 				rte_cpu_to_le_64(IDPF_TX_DESC_DTYPE_DESC_DONE);
 }
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [RFC PATCH 03/27] net/intel: create common post-Tx cleanup function
  2025-12-19 17:25 [RFC PATCH 00/27] combine multiple Intel scalar Tx paths Bruce Richardson
  2025-12-19 17:25 ` [RFC PATCH 01/27] net/intel: create common Tx descriptor structure Bruce Richardson
  2025-12-19 17:25 ` [RFC PATCH 02/27] net/intel: use common tx ring structure Bruce Richardson
@ 2025-12-19 17:25 ` Bruce Richardson
  2025-12-19 17:25 ` [RFC PATCH 04/27] net/intel: consolidate definitions for Tx desc fields Bruce Richardson
                   ` (27 subsequent siblings)
  30 siblings, 0 replies; 274+ messages in thread
From: Bruce Richardson @ 2025-12-19 17:25 UTC (permalink / raw)
  To: dev
  Cc: Bruce Richardson, Vladimir Medvedkin, Anatoly Burakov,
	Jingjing Wu, Praveen Shetty

The code used in the ice, iavf, idpf and i40e drivers for cleaning up mbufs
after they had been transmitted was identical. Therefore deduplicate it by
moving it to the common code and removing the driver-specific versions.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/net/intel/common/tx.h             | 53 ++++++++++++++++++++
 drivers/net/intel/i40e/i40e_rxtx.c        | 49 ++----------------
 drivers/net/intel/iavf/iavf_rxtx.c        | 50 ++-----------------
 drivers/net/intel/ice/ice_rxtx.c          | 60 ++---------------------
 drivers/net/intel/idpf/idpf_common_rxtx.c | 46 ++---------------
 5 files changed, 71 insertions(+), 187 deletions(-)

diff --git a/drivers/net/intel/common/tx.h b/drivers/net/intel/common/tx.h
index a9ff3bebd5..5b87c15da0 100644
--- a/drivers/net/intel/common/tx.h
+++ b/drivers/net/intel/common/tx.h
@@ -249,6 +249,59 @@ ci_tx_free_bufs_vec(struct ci_tx_queue *txq, ci_desc_done_fn desc_done, bool ctx
 	return txq->tx_rs_thresh;
 }
 
+/*
+ * Common transmit descriptor cleanup function for Intel drivers.
+ * Used by ice, i40e, iavf, and idpf drivers.
+ *
+ * Returns:
+ *   0 on success
+ *  -1 if cleanup cannot proceed (descriptors not yet processed by HW)
+ */
+static __rte_always_inline int
+ci_tx_xmit_cleanup(struct ci_tx_queue *txq)
+{
+	struct ci_tx_entry *sw_ring = txq->sw_ring;
+	volatile struct ci_tx_desc *txd = txq->ci_tx_ring;
+	uint16_t last_desc_cleaned = txq->last_desc_cleaned;
+	uint16_t nb_tx_desc = txq->nb_tx_desc;
+	uint16_t desc_to_clean_to;
+	uint16_t nb_tx_to_clean;
+
+	/* Determine the last descriptor needing to be cleaned */
+	desc_to_clean_to = (uint16_t)(last_desc_cleaned + txq->tx_rs_thresh);
+	if (desc_to_clean_to >= nb_tx_desc)
+		desc_to_clean_to = (uint16_t)(desc_to_clean_to - nb_tx_desc);
+
+	/* Check to make sure the last descriptor to clean is done */
+	desc_to_clean_to = sw_ring[desc_to_clean_to].last_id;
+
+	/* Check if descriptor is done - all drivers use 0xF as done value in bits 3:0 */
+	if ((txd[desc_to_clean_to].cmd_type_offset_bsz & rte_cpu_to_le_64(0xFUL)) !=
+			rte_cpu_to_le_64(0xFUL)) {
+		/* Descriptor not yet processed by hardware */
+		return -1;
+	}
+
+	/* Figure out how many descriptors will be cleaned */
+	if (last_desc_cleaned > desc_to_clean_to)
+		nb_tx_to_clean = (uint16_t)((nb_tx_desc - last_desc_cleaned) + desc_to_clean_to);
+	else
+		nb_tx_to_clean = (uint16_t)(desc_to_clean_to - last_desc_cleaned);
+
+	/* The last descriptor to clean is done, so that means all the
+	 * descriptors from the last descriptor that was cleaned
+	 * up to the last descriptor with the RS bit set
+	 * are done. Only reset the threshold descriptor.
+	 */
+	txd[desc_to_clean_to].cmd_type_offset_bsz = 0;
+
+	/* Update the txq to reflect the last descriptor that was cleaned */
+	txq->last_desc_cleaned = desc_to_clean_to;
+	txq->nb_tx_free = (uint16_t)(txq->nb_tx_free + nb_tx_to_clean);
+
+	return 0;
+}
+
 static inline void
 ci_txq_release_all_mbufs(struct ci_tx_queue *txq, bool use_ctx)
 {
diff --git a/drivers/net/intel/i40e/i40e_rxtx.c b/drivers/net/intel/i40e/i40e_rxtx.c
index 2af3098f81..880013a515 100644
--- a/drivers/net/intel/i40e/i40e_rxtx.c
+++ b/drivers/net/intel/i40e/i40e_rxtx.c
@@ -380,45 +380,6 @@ i40e_build_ctob(uint32_t td_cmd,
 			((uint64_t)td_tag  << I40E_TXD_QW1_L2TAG1_SHIFT));
 }
 
-static inline int
-i40e_xmit_cleanup(struct ci_tx_queue *txq)
-{
-	struct ci_tx_entry *sw_ring = txq->sw_ring;
-	volatile struct ci_tx_desc *txd = txq->ci_tx_ring;
-	uint16_t last_desc_cleaned = txq->last_desc_cleaned;
-	uint16_t nb_tx_desc = txq->nb_tx_desc;
-	uint16_t desc_to_clean_to;
-	uint16_t nb_tx_to_clean;
-
-	desc_to_clean_to = (uint16_t)(last_desc_cleaned + txq->tx_rs_thresh);
-	if (desc_to_clean_to >= nb_tx_desc)
-		desc_to_clean_to = (uint16_t)(desc_to_clean_to - nb_tx_desc);
-
-	desc_to_clean_to = sw_ring[desc_to_clean_to].last_id;
-	if ((txd[desc_to_clean_to].cmd_type_offset_bsz &
-			rte_cpu_to_le_64(I40E_TXD_QW1_DTYPE_MASK)) !=
-			rte_cpu_to_le_64(I40E_TX_DESC_DTYPE_DESC_DONE)) {
-		PMD_TX_LOG(DEBUG, "TX descriptor %4u is not done "
-			   "(port=%d queue=%d)", desc_to_clean_to,
-			   txq->port_id, txq->queue_id);
-		return -1;
-	}
-
-	if (last_desc_cleaned > desc_to_clean_to)
-		nb_tx_to_clean = (uint16_t)((nb_tx_desc - last_desc_cleaned) +
-							desc_to_clean_to);
-	else
-		nb_tx_to_clean = (uint16_t)(desc_to_clean_to -
-					last_desc_cleaned);
-
-	txd[desc_to_clean_to].cmd_type_offset_bsz = 0;
-
-	txq->last_desc_cleaned = desc_to_clean_to;
-	txq->nb_tx_free = (uint16_t)(txq->nb_tx_free + nb_tx_to_clean);
-
-	return 0;
-}
-
 static inline int
 #ifdef RTE_LIBRTE_I40E_RX_ALLOW_BULK_ALLOC
 check_rx_burst_bulk_alloc_preconditions(struct ci_rx_queue *rxq)
@@ -1114,7 +1075,7 @@ i40e_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 
 	/* Check if the descriptor ring needs to be cleaned. */
 	if (txq->nb_tx_free < txq->tx_free_thresh)
-		(void)i40e_xmit_cleanup(txq);
+		(void)ci_tx_xmit_cleanup(txq);
 
 	for (nb_tx = 0; nb_tx < nb_pkts; nb_tx++) {
 		td_cmd = 0;
@@ -1155,14 +1116,14 @@ i40e_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 			tx_last = (uint16_t)(tx_last - txq->nb_tx_desc);
 
 		if (nb_used > txq->nb_tx_free) {
-			if (i40e_xmit_cleanup(txq) != 0) {
+			if (ci_tx_xmit_cleanup(txq) != 0) {
 				if (nb_tx == 0)
 					return 0;
 				goto end_of_tx;
 			}
 			if (unlikely(nb_used > txq->tx_rs_thresh)) {
 				while (nb_used > txq->nb_tx_free) {
-					if (i40e_xmit_cleanup(txq) != 0) {
+					if (ci_tx_xmit_cleanup(txq) != 0) {
 						if (nb_tx == 0)
 							return 0;
 						goto end_of_tx;
@@ -2794,7 +2755,7 @@ i40e_tx_done_cleanup_full(struct ci_tx_queue *txq,
 	tx_last = txq->tx_tail;
 	tx_id  = swr_ring[tx_last].next_id;
 
-	if (txq->nb_tx_free == 0 && i40e_xmit_cleanup(txq))
+	if (txq->nb_tx_free == 0 && ci_tx_xmit_cleanup(txq))
 		return 0;
 
 	nb_tx_to_clean = txq->nb_tx_free;
@@ -2828,7 +2789,7 @@ i40e_tx_done_cleanup_full(struct ci_tx_queue *txq,
 			break;
 
 		if (pkt_cnt < free_cnt) {
-			if (i40e_xmit_cleanup(txq))
+			if (ci_tx_xmit_cleanup(txq))
 				break;
 
 			nb_tx_to_clean = txq->nb_tx_free - nb_tx_free_last;
diff --git a/drivers/net/intel/iavf/iavf_rxtx.c b/drivers/net/intel/iavf/iavf_rxtx.c
index 2ed778a872..4605523673 100644
--- a/drivers/net/intel/iavf/iavf_rxtx.c
+++ b/drivers/net/intel/iavf/iavf_rxtx.c
@@ -2325,46 +2325,6 @@ iavf_recv_pkts_bulk_alloc(void *rx_queue,
 	return nb_rx;
 }
 
-static inline int
-iavf_xmit_cleanup(struct ci_tx_queue *txq)
-{
-	struct ci_tx_entry *sw_ring = txq->sw_ring;
-	uint16_t last_desc_cleaned = txq->last_desc_cleaned;
-	uint16_t nb_tx_desc = txq->nb_tx_desc;
-	uint16_t desc_to_clean_to;
-	uint16_t nb_tx_to_clean;
-
-	volatile struct ci_tx_desc *txd = txq->ci_tx_ring;
-
-	desc_to_clean_to = (uint16_t)(last_desc_cleaned + txq->tx_rs_thresh);
-	if (desc_to_clean_to >= nb_tx_desc)
-		desc_to_clean_to = (uint16_t)(desc_to_clean_to - nb_tx_desc);
-
-	desc_to_clean_to = sw_ring[desc_to_clean_to].last_id;
-	if ((txd[desc_to_clean_to].cmd_type_offset_bsz &
-			rte_cpu_to_le_64(IAVF_TXD_QW1_DTYPE_MASK)) !=
-			rte_cpu_to_le_64(IAVF_TX_DESC_DTYPE_DESC_DONE)) {
-		PMD_TX_LOG(DEBUG, "TX descriptor %4u is not done "
-			   "(port=%d queue=%d)", desc_to_clean_to,
-			   txq->port_id, txq->queue_id);
-		return -1;
-	}
-
-	if (last_desc_cleaned > desc_to_clean_to)
-		nb_tx_to_clean = (uint16_t)((nb_tx_desc - last_desc_cleaned) +
-							desc_to_clean_to);
-	else
-		nb_tx_to_clean = (uint16_t)(desc_to_clean_to -
-					last_desc_cleaned);
-
-	txd[desc_to_clean_to].cmd_type_offset_bsz = 0;
-
-	txq->last_desc_cleaned = desc_to_clean_to;
-	txq->nb_tx_free = (uint16_t)(txq->nb_tx_free + nb_tx_to_clean);
-
-	return 0;
-}
-
 /* Check if the context descriptor is needed for TX offloading */
 static inline uint16_t
 iavf_calc_context_desc(struct rte_mbuf *mb, uint8_t vlan_flag)
@@ -2769,7 +2729,7 @@ iavf_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 
 	/* Check if the descriptor ring needs to be cleaned. */
 	if (txq->nb_tx_free < txq->tx_free_thresh)
-		iavf_xmit_cleanup(txq);
+		ci_tx_xmit_cleanup(txq);
 
 	desc_idx = txq->tx_tail;
 	txe = &txe_ring[desc_idx];
@@ -2824,14 +2784,14 @@ iavf_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 			txq->port_id, txq->queue_id, desc_idx, desc_idx_last);
 
 		if (nb_desc_required > txq->nb_tx_free) {
-			if (iavf_xmit_cleanup(txq)) {
+			if (ci_tx_xmit_cleanup(txq)) {
 				if (idx == 0)
 					return 0;
 				goto end_of_tx;
 			}
 			if (unlikely(nb_desc_required > txq->tx_rs_thresh)) {
 				while (nb_desc_required > txq->nb_tx_free) {
-					if (iavf_xmit_cleanup(txq)) {
+					if (ci_tx_xmit_cleanup(txq)) {
 						if (idx == 0)
 							return 0;
 						goto end_of_tx;
@@ -4342,7 +4302,7 @@ iavf_tx_done_cleanup_full(struct ci_tx_queue *txq,
 	tx_id = txq->tx_tail;
 	tx_last = tx_id;
 
-	if (txq->nb_tx_free == 0 && iavf_xmit_cleanup(txq))
+	if (txq->nb_tx_free == 0 && ci_tx_xmit_cleanup(txq))
 		return 0;
 
 	nb_tx_to_clean = txq->nb_tx_free;
@@ -4374,7 +4334,7 @@ iavf_tx_done_cleanup_full(struct ci_tx_queue *txq,
 			break;
 
 		if (pkt_cnt < free_cnt) {
-			if (iavf_xmit_cleanup(txq))
+			if (ci_tx_xmit_cleanup(txq))
 				break;
 
 			nb_tx_to_clean = txq->nb_tx_free - nb_tx_free_last;
diff --git a/drivers/net/intel/ice/ice_rxtx.c b/drivers/net/intel/ice/ice_rxtx.c
index 4aded194ce..0a6ca993c6 100644
--- a/drivers/net/intel/ice/ice_rxtx.c
+++ b/drivers/net/intel/ice/ice_rxtx.c
@@ -3015,56 +3015,6 @@ ice_txd_enable_checksum(uint64_t ol_flags,
 	}
 }
 
-static inline int
-ice_xmit_cleanup(struct ci_tx_queue *txq)
-{
-	struct ci_tx_entry *sw_ring = txq->sw_ring;
-	volatile struct ci_tx_desc *txd = txq->ci_tx_ring;
-	uint16_t last_desc_cleaned = txq->last_desc_cleaned;
-	uint16_t nb_tx_desc = txq->nb_tx_desc;
-	uint16_t desc_to_clean_to;
-	uint16_t nb_tx_to_clean;
-
-	/* Determine the last descriptor needing to be cleaned */
-	desc_to_clean_to = (uint16_t)(last_desc_cleaned + txq->tx_rs_thresh);
-	if (desc_to_clean_to >= nb_tx_desc)
-		desc_to_clean_to = (uint16_t)(desc_to_clean_to - nb_tx_desc);
-
-	/* Check to make sure the last descriptor to clean is done */
-	desc_to_clean_to = sw_ring[desc_to_clean_to].last_id;
-	if (!(txd[desc_to_clean_to].cmd_type_offset_bsz &
-	    rte_cpu_to_le_64(ICE_TX_DESC_DTYPE_DESC_DONE))) {
-		PMD_TX_LOG(DEBUG, "TX descriptor %4u is not done "
-			   "(port=%d queue=%d) value=0x%"PRIx64,
-			   desc_to_clean_to,
-			   txq->port_id, txq->queue_id,
-			   txd[desc_to_clean_to].cmd_type_offset_bsz);
-		/* Failed to clean any descriptors */
-		return -1;
-	}
-
-	/* Figure out how many descriptors will be cleaned */
-	if (last_desc_cleaned > desc_to_clean_to)
-		nb_tx_to_clean = (uint16_t)((nb_tx_desc - last_desc_cleaned) +
-					    desc_to_clean_to);
-	else
-		nb_tx_to_clean = (uint16_t)(desc_to_clean_to -
-					    last_desc_cleaned);
-
-	/* The last descriptor to clean is done, so that means all the
-	 * descriptors from the last descriptor that was cleaned
-	 * up to the last descriptor with the RS bit set
-	 * are done. Only reset the threshold descriptor.
-	 */
-	txd[desc_to_clean_to].cmd_type_offset_bsz = 0;
-
-	/* Update the txq to reflect the last descriptor that was cleaned */
-	txq->last_desc_cleaned = desc_to_clean_to;
-	txq->nb_tx_free = (uint16_t)(txq->nb_tx_free + nb_tx_to_clean);
-
-	return 0;
-}
-
 /* Construct the tx flags */
 static inline uint64_t
 ice_build_ctob(uint32_t td_cmd,
@@ -3172,7 +3122,7 @@ ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 
 	/* Check if the descriptor ring needs to be cleaned. */
 	if (txq->nb_tx_free < txq->tx_free_thresh)
-		(void)ice_xmit_cleanup(txq);
+		(void)ci_tx_xmit_cleanup(txq);
 
 	for (nb_tx = 0; nb_tx < nb_pkts; nb_tx++) {
 		tx_pkt = *tx_pkts++;
@@ -3209,14 +3159,14 @@ ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 			tx_last = (uint16_t)(tx_last - txq->nb_tx_desc);
 
 		if (nb_used > txq->nb_tx_free) {
-			if (ice_xmit_cleanup(txq) != 0) {
+			if (ci_tx_xmit_cleanup(txq) != 0) {
 				if (nb_tx == 0)
 					return 0;
 				goto end_of_tx;
 			}
 			if (unlikely(nb_used > txq->tx_rs_thresh)) {
 				while (nb_used > txq->nb_tx_free) {
-					if (ice_xmit_cleanup(txq) != 0) {
+					if (ci_tx_xmit_cleanup(txq) != 0) {
 						if (nb_tx == 0)
 							return 0;
 						goto end_of_tx;
@@ -3446,7 +3396,7 @@ ice_tx_done_cleanup_full(struct ci_tx_queue *txq,
 	tx_last = txq->tx_tail;
 	tx_id  = swr_ring[tx_last].next_id;
 
-	if (txq->nb_tx_free == 0 && ice_xmit_cleanup(txq))
+	if (txq->nb_tx_free == 0 && ci_tx_xmit_cleanup(txq))
 		return 0;
 
 	nb_tx_to_clean = txq->nb_tx_free;
@@ -3480,7 +3430,7 @@ ice_tx_done_cleanup_full(struct ci_tx_queue *txq,
 			break;
 
 		if (pkt_cnt < free_cnt) {
-			if (ice_xmit_cleanup(txq))
+			if (ci_tx_xmit_cleanup(txq))
 				break;
 
 			nb_tx_to_clean = txq->nb_tx_free - nb_tx_free_last;
diff --git a/drivers/net/intel/idpf/idpf_common_rxtx.c b/drivers/net/intel/idpf/idpf_common_rxtx.c
index 51074bda3a..23666539ab 100644
--- a/drivers/net/intel/idpf/idpf_common_rxtx.c
+++ b/drivers/net/intel/idpf/idpf_common_rxtx.c
@@ -1326,46 +1326,6 @@ idpf_dp_singleq_recv_scatter_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
 	return nb_rx;
 }
 
-static inline int
-idpf_xmit_cleanup(struct ci_tx_queue *txq)
-{
-	uint16_t last_desc_cleaned = txq->last_desc_cleaned;
-	struct ci_tx_entry *sw_ring = txq->sw_ring;
-	uint16_t nb_tx_desc = txq->nb_tx_desc;
-	uint16_t desc_to_clean_to;
-	uint16_t nb_tx_to_clean;
-
-	volatile struct ci_tx_desc *txd = txq->ci_tx_ring;
-
-	desc_to_clean_to = (uint16_t)(last_desc_cleaned + txq->tx_rs_thresh);
-	if (desc_to_clean_to >= nb_tx_desc)
-		desc_to_clean_to = (uint16_t)(desc_to_clean_to - nb_tx_desc);
-
-	desc_to_clean_to = sw_ring[desc_to_clean_to].last_id;
-	if ((txd[desc_to_clean_to].cmd_type_offset_bsz &
-	     rte_cpu_to_le_64(IDPF_TXD_QW1_DTYPE_M)) !=
-	    rte_cpu_to_le_64(IDPF_TX_DESC_DTYPE_DESC_DONE)) {
-		TX_LOG(DEBUG, "TX descriptor %4u is not done "
-		       "(port=%d queue=%d)", desc_to_clean_to,
-		       txq->port_id, txq->queue_id);
-		return -1;
-	}
-
-	if (last_desc_cleaned > desc_to_clean_to)
-		nb_tx_to_clean = (uint16_t)((nb_tx_desc - last_desc_cleaned) +
-					    desc_to_clean_to);
-	else
-		nb_tx_to_clean = (uint16_t)(desc_to_clean_to -
-					    last_desc_cleaned);
-
-	txd[desc_to_clean_to].cmd_type_offset_bsz = 0;
-
-	txq->last_desc_cleaned = desc_to_clean_to;
-	txq->nb_tx_free = (uint16_t)(txq->nb_tx_free + nb_tx_to_clean);
-
-	return 0;
-}
-
 /* TX function */
 RTE_EXPORT_INTERNAL_SYMBOL(idpf_dp_singleq_xmit_pkts)
 uint16_t
@@ -1404,7 +1364,7 @@ idpf_dp_singleq_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 
 	/* Check if the descriptor ring needs to be cleaned. */
 	if (txq->nb_tx_free < txq->tx_free_thresh)
-		(void)idpf_xmit_cleanup(txq);
+		(void)ci_tx_xmit_cleanup(txq);
 
 	for (nb_tx = 0; nb_tx < nb_pkts; nb_tx++) {
 		td_cmd = 0;
@@ -1437,14 +1397,14 @@ idpf_dp_singleq_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 		       txq->port_id, txq->queue_id, tx_id, tx_last);
 
 		if (nb_used > txq->nb_tx_free) {
-			if (idpf_xmit_cleanup(txq) != 0) {
+			if (ci_tx_xmit_cleanup(txq) != 0) {
 				if (nb_tx == 0)
 					return 0;
 				goto end_of_tx;
 			}
 			if (unlikely(nb_used > txq->tx_rs_thresh)) {
 				while (nb_used > txq->nb_tx_free) {
-					if (idpf_xmit_cleanup(txq) != 0) {
+					if (ci_tx_xmit_cleanup(txq) != 0) {
 						if (nb_tx == 0)
 							return 0;
 						goto end_of_tx;
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 274+ messages in thread
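The per-driver cleanup functions removed above (ice_xmit_cleanup, idpf_xmit_cleanup, and the iavf equivalent) all implement the same algorithm, which is why a single ci_tx_xmit_cleanup can replace them. As an illustration only, the logic can be sketched in a standalone form; the struct and names below mirror the ci_tx_queue fields but are hypothetical stand-ins, not the actual DPDK structures, and endianness conversion is omitted:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical ring state; field names mirror ci_tx_queue, but this is a
 * self-contained sketch, not the DPDK structure. */
struct sketch_txq {
	uint16_t nb_tx_desc;        /* ring size */
	uint16_t tx_rs_thresh;      /* descriptors cleaned per batch */
	uint16_t last_desc_cleaned;
	uint16_t nb_tx_free;
	uint64_t *desc_qw1;         /* stands in for cmd_type_offset_bsz */
	uint16_t *last_id;          /* stands in for sw_ring[].last_id */
};

#define SKETCH_DTYPE_M   0xFULL  /* bits 3:0 of QW1 */
#define SKETCH_DESC_DONE 0xFULL  /* written back by HW when done */

/* Advance by tx_rs_thresh (with wraparound), check the descriptor-done
 * pattern on the batch's last descriptor, and reclaim the whole batch. */
static int
sketch_xmit_cleanup(struct sketch_txq *q)
{
	uint16_t to = (uint16_t)(q->last_desc_cleaned + q->tx_rs_thresh);

	if (to >= q->nb_tx_desc)
		to = (uint16_t)(to - q->nb_tx_desc);
	to = q->last_id[to];

	if ((q->desc_qw1[to] & SKETCH_DTYPE_M) != SKETCH_DESC_DONE)
		return -1;  /* hardware has not finished this batch yet */

	uint16_t n = (q->last_desc_cleaned > to)
		? (uint16_t)((q->nb_tx_desc - q->last_desc_cleaned) + to)
		: (uint16_t)(to - q->last_desc_cleaned);

	q->desc_qw1[to] = 0;  /* reset only the threshold descriptor */
	q->last_desc_cleaned = to;
	q->nb_tx_free = (uint16_t)(q->nb_tx_free + n);
	return 0;
}
```

Because the only driver-specific part of the original copies was the exact done-pattern test, parameterizing that check is enough to share one function across i40e, iavf, ice and idpf.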

* [RFC PATCH 04/27] net/intel: consolidate definitions for Tx desc fields
  2025-12-19 17:25 [RFC PATCH 00/27] combine multiple Intel scalar Tx paths Bruce Richardson
                   ` (2 preceding siblings ...)
  2025-12-19 17:25 ` [RFC PATCH 03/27] net/intel: create common post-Tx cleanup function Bruce Richardson
@ 2025-12-19 17:25 ` Bruce Richardson
  2025-12-19 17:25 ` [RFC PATCH 05/27] net/intel: create separate header for Tx scalar fns Bruce Richardson
                   ` (26 subsequent siblings)
  30 siblings, 0 replies; 274+ messages in thread
From: Bruce Richardson @ 2025-12-19 17:25 UTC (permalink / raw)
  To: dev
  Cc: Bruce Richardson, Vladimir Medvedkin, Anatoly Burakov,
	Jingjing Wu, Praveen Shetty

The offsets of the various fields within the Tx descriptors are common
for i40e, iavf, ice and idpf, so put a single set of defines in tx.h and
use those throughout all drivers. (NOTE: there was a small difference in
the mask of the CMD field between drivers, depending on whether or not
reserved fields were included. This can be ignored, as those bits are
unused in the drivers for which they are reserved.) Similarly, the
various flag fields, such as End-of-packet (EOP) and Report-status (RS),
are the same, as are the offload definitions, so consolidate them too.

The original definitions are in the base code and are left in place for
that reason, but are now unused.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/net/intel/common/tx.h                 |  64 +++++++-
 drivers/net/intel/i40e/i40e_fdir.c            |  24 +--
 drivers/net/intel/i40e/i40e_rxtx.c            |  92 ++++++------
 drivers/net/intel/i40e/i40e_rxtx.h            |  17 +--
 .../net/intel/i40e/i40e_rxtx_vec_altivec.c    |  11 +-
 drivers/net/intel/i40e/i40e_rxtx_vec_avx2.c   |  22 ++-
 drivers/net/intel/i40e/i40e_rxtx_vec_avx512.c |  38 ++---
 drivers/net/intel/i40e/i40e_rxtx_vec_common.h |   4 +-
 drivers/net/intel/i40e/i40e_rxtx_vec_neon.c   |  11 +-
 drivers/net/intel/i40e/i40e_rxtx_vec_sse.c    |  11 +-
 drivers/net/intel/iavf/iavf_rxtx.c            |  68 +++++----
 drivers/net/intel/iavf/iavf_rxtx.h            |  20 +--
 drivers/net/intel/iavf/iavf_rxtx_vec_avx2.c   |  41 ++----
 drivers/net/intel/iavf/iavf_rxtx_vec_avx512.c |  80 ++++------
 drivers/net/intel/iavf/iavf_rxtx_vec_common.h |  34 ++---
 drivers/net/intel/iavf/iavf_rxtx_vec_sse.c    |  15 +-
 drivers/net/intel/ice/ice_dcf_ethdev.c        |   2 +-
 drivers/net/intel/ice/ice_rxtx.c              | 137 ++++++++----------
 drivers/net/intel/ice/ice_rxtx.h              |  15 +-
 drivers/net/intel/ice/ice_rxtx_vec_avx2.c     |  41 ++----
 drivers/net/intel/ice/ice_rxtx_vec_avx512.c   |  39 ++---
 drivers/net/intel/ice/ice_rxtx_vec_common.h   |  41 +++---
 drivers/net/intel/ice/ice_rxtx_vec_sse.c      |  12 +-
 drivers/net/intel/idpf/idpf_common_rxtx.c     |  22 +--
 drivers/net/intel/idpf/idpf_common_rxtx.h     |  12 --
 .../net/intel/idpf/idpf_common_rxtx_avx2.c    |  41 ++----
 .../net/intel/idpf/idpf_common_rxtx_avx512.c  |  41 ++----
 drivers/net/intel/idpf/idpf_rxtx_vec_common.h |   4 +-
 28 files changed, 424 insertions(+), 535 deletions(-)
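As a quick sanity check of the consolidated definitions, the following standalone sketch reproduces a subset of the new CI_* QW1 macros from tx.h together with a host-endian variant of the build_ctob helper (the real drivers apply rte_cpu_to_le_64, omitted here for clarity):

```c
#include <assert.h>
#include <stdint.h>

/* Field definitions as consolidated into common/tx.h by this patch. */
#define CI_TXD_QW1_DTYPE_S      0
#define CI_TXD_QW1_DTYPE_M      (0xFUL << CI_TXD_QW1_DTYPE_S)
#define CI_TXD_QW1_CMD_S        4
#define CI_TXD_QW1_OFFSET_S     16
#define CI_TXD_QW1_TX_BUF_SZ_S  34
#define CI_TXD_QW1_TX_BUF_SZ_M  (0x3FFFULL << CI_TXD_QW1_TX_BUF_SZ_S)
#define CI_TXD_QW1_L2TAG1_S     48

#define CI_TX_DESC_DTYPE_DATA   0x0
#define CI_TX_DESC_CMD_EOP      0x0001
#define CI_TX_DESC_CMD_RS       0x0002

/* HW requires a Tx buffer size from 1B up to (16K-1)B. */
#define CI_MAX_DATA_PER_TXD  (CI_TXD_QW1_TX_BUF_SZ_M >> CI_TXD_QW1_TX_BUF_SZ_S)

/* Host-endian sketch of the build_ctob helpers: pack descriptor type,
 * command flags, header offsets, buffer size and VLAN tag into QW1. */
static uint64_t
build_qw1(uint32_t td_cmd, uint32_t td_offset, unsigned int size, uint32_t td_tag)
{
	return CI_TX_DESC_DTYPE_DATA |
		((uint64_t)td_cmd    << CI_TXD_QW1_CMD_S) |
		((uint64_t)td_offset << CI_TXD_QW1_OFFSET_S) |
		((uint64_t)size      << CI_TXD_QW1_TX_BUF_SZ_S) |
		((uint64_t)td_tag    << CI_TXD_QW1_L2TAG1_S);
}
```

Since every driver packs QW1 with identical shifts and masks, the round trip of packing and re-extracting each field is the property the shared macros must preserve.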

diff --git a/drivers/net/intel/common/tx.h b/drivers/net/intel/common/tx.h
index 5b87c15da0..3d3d9ad8e3 100644
--- a/drivers/net/intel/common/tx.h
+++ b/drivers/net/intel/common/tx.h
@@ -10,6 +10,66 @@
 #include <rte_ethdev.h>
 #include <rte_vect.h>
 
+/* Common TX Descriptor QW1 Field Definitions */
+#define CI_TXD_QW1_DTYPE_S      0
+#define CI_TXD_QW1_DTYPE_M      (0xFUL << CI_TXD_QW1_DTYPE_S)
+#define CI_TXD_QW1_CMD_S        4
+#define CI_TXD_QW1_CMD_M        (0xFFFUL << CI_TXD_QW1_CMD_S)
+#define CI_TXD_QW1_OFFSET_S     16
+#define CI_TXD_QW1_OFFSET_M     (0x3FFFFULL << CI_TXD_QW1_OFFSET_S)
+#define CI_TXD_QW1_TX_BUF_SZ_S  34
+#define CI_TXD_QW1_TX_BUF_SZ_M  (0x3FFFULL << CI_TXD_QW1_TX_BUF_SZ_S)
+#define CI_TXD_QW1_L2TAG1_S     48
+#define CI_TXD_QW1_L2TAG1_M     (0xFFFFULL << CI_TXD_QW1_L2TAG1_S)
+
+/* Common Descriptor Types */
+#define CI_TX_DESC_DTYPE_DATA           0x0
+#define CI_TX_DESC_DTYPE_CTX            0x1
+#define CI_TX_DESC_DTYPE_DESC_DONE      0xF
+
+/* Common TX Descriptor Command Flags */
+#define CI_TX_DESC_CMD_EOP              0x0001
+#define CI_TX_DESC_CMD_RS               0x0002
+#define CI_TX_DESC_CMD_ICRC             0x0004
+#define CI_TX_DESC_CMD_IL2TAG1          0x0008
+#define CI_TX_DESC_CMD_DUMMY            0x0010
+#define CI_TX_DESC_CMD_IIPT_IPV6        0x0020
+#define CI_TX_DESC_CMD_IIPT_IPV4        0x0040
+#define CI_TX_DESC_CMD_IIPT_IPV4_CSUM   0x0060
+#define CI_TX_DESC_CMD_L4T_EOFT_TCP     0x0100
+#define CI_TX_DESC_CMD_L4T_EOFT_SCTP    0x0200
+#define CI_TX_DESC_CMD_L4T_EOFT_UDP     0x0300
+
+/* Common TX Context Descriptor Commands */
+#define CI_TX_CTX_DESC_TSO              0x01
+#define CI_TX_CTX_DESC_TSYN             0x02
+#define CI_TX_CTX_DESC_IL2TAG2          0x04
+
+/* Common TX Descriptor Length Field Shifts */
+#define CI_TX_DESC_LEN_MACLEN_S         0  /* 7 BITS */
+#define CI_TX_DESC_LEN_IPLEN_S          7  /* 7 BITS */
+#define CI_TX_DESC_LEN_L4_LEN_S         14 /* 4 BITS */
+
+/* Common maximum data per TX descriptor */
+#define CI_MAX_DATA_PER_TXD     (CI_TXD_QW1_TX_BUF_SZ_M >> CI_TXD_QW1_TX_BUF_SZ_S)
+
+/**
+ * Common TX offload union for Intel drivers.
+ * Supports both basic offloads (l2_len, l3_len, l4_len, tso_segsz) and
+ * extended offloads (outer_l2_len, outer_l3_len) for tunneling support.
+ */
+union ci_tx_offload {
+	uint64_t data;
+	struct {
+		uint64_t l2_len:7;        /**< L2 (MAC) Header Length. */
+		uint64_t l3_len:9;        /**< L3 (IP) Header Length. */
+		uint64_t l4_len:8;        /**< L4 Header Length. */
+		uint64_t tso_segsz:16;    /**< TCP TSO segment size */
+		uint64_t outer_l2_len:8;  /**< outer L2 Header Length */
+		uint64_t outer_l3_len:16; /**< outer L3 Header Length */
+	};
+};
+
 /*
  * Structure of a 16-byte Tx descriptor common across i40e, ice, iavf and idpf drivers
  */
@@ -276,8 +336,8 @@ ci_tx_xmit_cleanup(struct ci_tx_queue *txq)
 	desc_to_clean_to = sw_ring[desc_to_clean_to].last_id;
 
 	/* Check if descriptor is done - all drivers use 0xF as done value in bits 3:0 */
-	if ((txd[desc_to_clean_to].cmd_type_offset_bsz & rte_cpu_to_le_64(0xFUL)) !=
-			rte_cpu_to_le_64(0xFUL)) {
+	if ((txd[desc_to_clean_to].cmd_type_offset_bsz & rte_cpu_to_le_64(CI_TXD_QW1_DTYPE_M)) !=
+			rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE)) {
 		/* Descriptor not yet processed by hardware */
 		return -1;
 	}
diff --git a/drivers/net/intel/i40e/i40e_fdir.c b/drivers/net/intel/i40e/i40e_fdir.c
index 8a01aec0e2..3b099d5a9e 100644
--- a/drivers/net/intel/i40e/i40e_fdir.c
+++ b/drivers/net/intel/i40e/i40e_fdir.c
@@ -916,11 +916,11 @@ i40e_build_ctob(uint32_t td_cmd,
 		unsigned int size,
 		uint32_t td_tag)
 {
-	return rte_cpu_to_le_64(I40E_TX_DESC_DTYPE_DATA |
-			((uint64_t)td_cmd  << I40E_TXD_QW1_CMD_SHIFT) |
-			((uint64_t)td_offset << I40E_TXD_QW1_OFFSET_SHIFT) |
-			((uint64_t)size  << I40E_TXD_QW1_TX_BUF_SZ_SHIFT) |
-			((uint64_t)td_tag  << I40E_TXD_QW1_L2TAG1_SHIFT));
+	return rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DATA |
+			((uint64_t)td_cmd  << CI_TXD_QW1_CMD_S) |
+			((uint64_t)td_offset << CI_TXD_QW1_OFFSET_S) |
+			((uint64_t)size  << CI_TXD_QW1_TX_BUF_SZ_S) |
+			((uint64_t)td_tag  << CI_TXD_QW1_L2TAG1_S));
 }
 
 /*
@@ -1384,8 +1384,8 @@ i40e_find_available_buffer(struct rte_eth_dev *dev)
 
 		do {
 			if ((tmp_txdp->cmd_type_offset_bsz &
-					rte_cpu_to_le_64(I40E_TXD_QW1_DTYPE_MASK)) ==
-					rte_cpu_to_le_64(I40E_TX_DESC_DTYPE_DESC_DONE))
+					rte_cpu_to_le_64(CI_TXD_QW1_DTYPE_M)) ==
+					rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE))
 				fdir_info->txq_available_buf_count++;
 			else
 				break;
@@ -1710,9 +1710,9 @@ i40e_flow_fdir_filter_programming(struct i40e_pf *pf,
 	txdp = &txq->ci_tx_ring[txq->tx_tail + 1];
 	txdp->buffer_addr = rte_cpu_to_le_64(pf->fdir.dma_addr[txq->tx_tail >> 1]);
 
-	td_cmd = I40E_TX_DESC_CMD_EOP |
-		 I40E_TX_DESC_CMD_RS  |
-		 I40E_TX_DESC_CMD_DUMMY;
+	td_cmd = CI_TX_DESC_CMD_EOP |
+		 CI_TX_DESC_CMD_RS  |
+		 CI_TX_DESC_CMD_DUMMY;
 
 	txdp->cmd_type_offset_bsz =
 		i40e_build_ctob(td_cmd, 0, I40E_FDIR_PKT_LEN, 0);
@@ -1731,8 +1731,8 @@ i40e_flow_fdir_filter_programming(struct i40e_pf *pf,
 	if (wait_status) {
 		for (i = 0; i < I40E_FDIR_MAX_WAIT_US; i++) {
 			if ((txdp->cmd_type_offset_bsz &
-					rte_cpu_to_le_64(I40E_TXD_QW1_DTYPE_MASK)) ==
-					rte_cpu_to_le_64(I40E_TX_DESC_DTYPE_DESC_DONE))
+					rte_cpu_to_le_64(CI_TXD_QW1_DTYPE_M)) ==
+					rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE))
 				break;
 			rte_delay_us(1);
 		}
diff --git a/drivers/net/intel/i40e/i40e_rxtx.c b/drivers/net/intel/i40e/i40e_rxtx.c
index 880013a515..892069372f 100644
--- a/drivers/net/intel/i40e/i40e_rxtx.c
+++ b/drivers/net/intel/i40e/i40e_rxtx.c
@@ -41,7 +41,7 @@
 /* Base address of the HW descriptor ring should be 128B aligned. */
 #define I40E_RING_BASE_ALIGN	128
 
-#define I40E_TXD_CMD (I40E_TX_DESC_CMD_EOP | I40E_TX_DESC_CMD_RS)
+#define I40E_TXD_CMD (CI_TX_DESC_CMD_EOP | CI_TX_DESC_CMD_RS)
 
 #ifdef RTE_LIBRTE_IEEE1588
 #define I40E_TX_IEEE1588_TMST RTE_MBUF_F_TX_IEEE1588_TMST
@@ -256,7 +256,7 @@ i40e_rxd_build_fdir(volatile union ci_rx_desc *rxdp, struct rte_mbuf *mb)
 
 static inline void
 i40e_parse_tunneling_params(uint64_t ol_flags,
-			    union i40e_tx_offload tx_offload,
+			    union ci_tx_offload tx_offload,
 			    uint32_t *cd_tunneling)
 {
 	/* EIPT: External (outer) IP header type */
@@ -315,51 +315,51 @@ static inline void
 i40e_txd_enable_checksum(uint64_t ol_flags,
 			uint32_t *td_cmd,
 			uint32_t *td_offset,
-			union i40e_tx_offload tx_offload)
+			union ci_tx_offload tx_offload)
 {
 	/* Set MACLEN */
 	if (!(ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK))
 		*td_offset |= (tx_offload.l2_len >> 1)
-			<< I40E_TX_DESC_LENGTH_MACLEN_SHIFT;
+			<< CI_TX_DESC_LEN_MACLEN_S;
 
 	/* Enable L3 checksum offloads */
 	if (ol_flags & RTE_MBUF_F_TX_IP_CKSUM) {
-		*td_cmd |= I40E_TX_DESC_CMD_IIPT_IPV4_CSUM;
+		*td_cmd |= CI_TX_DESC_CMD_IIPT_IPV4_CSUM;
 		*td_offset |= (tx_offload.l3_len >> 2)
-				<< I40E_TX_DESC_LENGTH_IPLEN_SHIFT;
+				<< CI_TX_DESC_LEN_IPLEN_S;
 	} else if (ol_flags & RTE_MBUF_F_TX_IPV4) {
-		*td_cmd |= I40E_TX_DESC_CMD_IIPT_IPV4;
+		*td_cmd |= CI_TX_DESC_CMD_IIPT_IPV4;
 		*td_offset |= (tx_offload.l3_len >> 2)
-				<< I40E_TX_DESC_LENGTH_IPLEN_SHIFT;
+				<< CI_TX_DESC_LEN_IPLEN_S;
 	} else if (ol_flags & RTE_MBUF_F_TX_IPV6) {
-		*td_cmd |= I40E_TX_DESC_CMD_IIPT_IPV6;
+		*td_cmd |= CI_TX_DESC_CMD_IIPT_IPV6;
 		*td_offset |= (tx_offload.l3_len >> 2)
-				<< I40E_TX_DESC_LENGTH_IPLEN_SHIFT;
+				<< CI_TX_DESC_LEN_IPLEN_S;
 	}
 
 	if (ol_flags & RTE_MBUF_F_TX_TCP_SEG) {
-		*td_cmd |= I40E_TX_DESC_CMD_L4T_EOFT_TCP;
+		*td_cmd |= CI_TX_DESC_CMD_L4T_EOFT_TCP;
 		*td_offset |= (tx_offload.l4_len >> 2)
-			<< I40E_TX_DESC_LENGTH_L4_FC_LEN_SHIFT;
+			<< CI_TX_DESC_LEN_L4_LEN_S;
 		return;
 	}
 
 	/* Enable L4 checksum offloads */
 	switch (ol_flags & RTE_MBUF_F_TX_L4_MASK) {
 	case RTE_MBUF_F_TX_TCP_CKSUM:
-		*td_cmd |= I40E_TX_DESC_CMD_L4T_EOFT_TCP;
+		*td_cmd |= CI_TX_DESC_CMD_L4T_EOFT_TCP;
 		*td_offset |= (sizeof(struct rte_tcp_hdr) >> 2) <<
-				I40E_TX_DESC_LENGTH_L4_FC_LEN_SHIFT;
+				CI_TX_DESC_LEN_L4_LEN_S;
 		break;
 	case RTE_MBUF_F_TX_SCTP_CKSUM:
-		*td_cmd |= I40E_TX_DESC_CMD_L4T_EOFT_SCTP;
+		*td_cmd |= CI_TX_DESC_CMD_L4T_EOFT_SCTP;
 		*td_offset |= (sizeof(struct rte_sctp_hdr) >> 2) <<
-				I40E_TX_DESC_LENGTH_L4_FC_LEN_SHIFT;
+				CI_TX_DESC_LEN_L4_LEN_S;
 		break;
 	case RTE_MBUF_F_TX_UDP_CKSUM:
-		*td_cmd |= I40E_TX_DESC_CMD_L4T_EOFT_UDP;
+		*td_cmd |= CI_TX_DESC_CMD_L4T_EOFT_UDP;
 		*td_offset |= (sizeof(struct rte_udp_hdr) >> 2) <<
-				I40E_TX_DESC_LENGTH_L4_FC_LEN_SHIFT;
+				CI_TX_DESC_LEN_L4_LEN_S;
 		break;
 	default:
 		break;
@@ -373,11 +373,11 @@ i40e_build_ctob(uint32_t td_cmd,
 		unsigned int size,
 		uint32_t td_tag)
 {
-	return rte_cpu_to_le_64(I40E_TX_DESC_DTYPE_DATA |
-			((uint64_t)td_cmd  << I40E_TXD_QW1_CMD_SHIFT) |
-			((uint64_t)td_offset << I40E_TXD_QW1_OFFSET_SHIFT) |
-			((uint64_t)size  << I40E_TXD_QW1_TX_BUF_SZ_SHIFT) |
-			((uint64_t)td_tag  << I40E_TXD_QW1_L2TAG1_SHIFT));
+	return rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DATA |
+			((uint64_t)td_cmd  << CI_TXD_QW1_CMD_S) |
+			((uint64_t)td_offset << CI_TXD_QW1_OFFSET_S) |
+			((uint64_t)size  << CI_TXD_QW1_TX_BUF_SZ_S) |
+			((uint64_t)td_tag  << CI_TXD_QW1_L2TAG1_S));
 }
 
 static inline int
@@ -1000,7 +1000,7 @@ i40e_calc_context_desc(uint64_t flags)
 
 /* set i40e TSO context descriptor */
 static inline uint64_t
-i40e_set_tso_ctx(struct rte_mbuf *mbuf, union i40e_tx_offload tx_offload)
+i40e_set_tso_ctx(struct rte_mbuf *mbuf, union ci_tx_offload tx_offload)
 {
 	uint64_t ctx_desc = 0;
 	uint32_t cd_cmd, hdr_len, cd_tso_len;
@@ -1025,9 +1025,6 @@ i40e_set_tso_ctx(struct rte_mbuf *mbuf, union i40e_tx_offload tx_offload)
 	return ctx_desc;
 }
 
-/* HW requires that Tx buffer size ranges from 1B up to (16K-1)B. */
-#define I40E_MAX_DATA_PER_TXD \
-	(I40E_TXD_QW1_TX_BUF_SZ_MASK >> I40E_TXD_QW1_TX_BUF_SZ_SHIFT)
 /* Calculate the number of TX descriptors needed for each pkt */
 static inline uint16_t
 i40e_calc_pkt_desc(struct rte_mbuf *tx_pkt)
@@ -1036,7 +1033,7 @@ i40e_calc_pkt_desc(struct rte_mbuf *tx_pkt)
 	uint16_t count = 0;
 
 	while (txd != NULL) {
-		count += DIV_ROUND_UP(txd->data_len, I40E_MAX_DATA_PER_TXD);
+		count += DIV_ROUND_UP(txd->data_len, CI_MAX_DATA_PER_TXD);
 		txd = txd->next;
 	}
 
@@ -1065,7 +1062,7 @@ i40e_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 	uint16_t tx_last;
 	uint16_t slen;
 	uint64_t buf_dma_addr;
-	union i40e_tx_offload tx_offload = {0};
+	union ci_tx_offload tx_offload = {0};
 
 	txq = tx_queue;
 	sw_ring = txq->sw_ring;
@@ -1134,18 +1131,18 @@ i40e_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 
 		/* Descriptor based VLAN insertion */
 		if (ol_flags & (RTE_MBUF_F_TX_VLAN | RTE_MBUF_F_TX_QINQ)) {
-			td_cmd |= I40E_TX_DESC_CMD_IL2TAG1;
+			td_cmd |= CI_TX_DESC_CMD_IL2TAG1;
 			td_tag = tx_pkt->vlan_tci;
 		}
 
 		/* Always enable CRC offload insertion */
-		td_cmd |= I40E_TX_DESC_CMD_ICRC;
+		td_cmd |= CI_TX_DESC_CMD_ICRC;
 
 		/* Fill in tunneling parameters if necessary */
 		cd_tunneling_params = 0;
 		if (ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK) {
 			td_offset |= (tx_offload.outer_l2_len >> 1)
-					<< I40E_TX_DESC_LENGTH_MACLEN_SHIFT;
+					<< CI_TX_DESC_LEN_MACLEN_S;
 			i40e_parse_tunneling_params(ol_flags, tx_offload,
 						    &cd_tunneling_params);
 		}
@@ -1225,16 +1222,16 @@ i40e_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 			buf_dma_addr = rte_mbuf_data_iova(m_seg);
 
 			while ((ol_flags & RTE_MBUF_F_TX_TCP_SEG) &&
-				unlikely(slen > I40E_MAX_DATA_PER_TXD)) {
+				unlikely(slen > CI_MAX_DATA_PER_TXD)) {
 				txd->buffer_addr =
 					rte_cpu_to_le_64(buf_dma_addr);
 				txd->cmd_type_offset_bsz =
 					i40e_build_ctob(td_cmd,
-					td_offset, I40E_MAX_DATA_PER_TXD,
+					td_offset, CI_MAX_DATA_PER_TXD,
 					td_tag);
 
-				buf_dma_addr += I40E_MAX_DATA_PER_TXD;
-				slen -= I40E_MAX_DATA_PER_TXD;
+				buf_dma_addr += CI_MAX_DATA_PER_TXD;
+				slen -= CI_MAX_DATA_PER_TXD;
 
 				txe->last_id = tx_last;
 				tx_id = txe->next_id;
@@ -1261,7 +1258,7 @@ i40e_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 		} while (m_seg != NULL);
 
 		/* The last packet data descriptor needs End Of Packet (EOP) */
-		td_cmd |= I40E_TX_DESC_CMD_EOP;
+		td_cmd |= CI_TX_DESC_CMD_EOP;
 		txq->nb_tx_used = (uint16_t)(txq->nb_tx_used + nb_used);
 		txq->nb_tx_free = (uint16_t)(txq->nb_tx_free - nb_used);
 
@@ -1271,15 +1268,14 @@ i40e_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 				   "%4u (port=%d queue=%d)",
 				   tx_last, txq->port_id, txq->queue_id);
 
-			td_cmd |= I40E_TX_DESC_CMD_RS;
+			td_cmd |= CI_TX_DESC_CMD_RS;
 
 			/* Update txq RS bit counters */
 			txq->nb_tx_used = 0;
 		}
 
 		txd->cmd_type_offset_bsz |=
-			rte_cpu_to_le_64(((uint64_t)td_cmd) <<
-					I40E_TXD_QW1_CMD_SHIFT);
+			rte_cpu_to_le_64(((uint64_t)td_cmd) << CI_TXD_QW1_CMD_S);
 	}
 
 end_of_tx:
@@ -1305,8 +1301,8 @@ i40e_tx_free_bufs(struct ci_tx_queue *txq)
 	const uint16_t m = tx_rs_thresh % I40E_TX_MAX_FREE_BUF_SZ;
 
 	if ((txq->ci_tx_ring[txq->tx_next_dd].cmd_type_offset_bsz &
-			rte_cpu_to_le_64(I40E_TXD_QW1_DTYPE_MASK)) !=
-			rte_cpu_to_le_64(I40E_TX_DESC_DTYPE_DESC_DONE))
+			rte_cpu_to_le_64(CI_TXD_QW1_DTYPE_M)) !=
+			rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE))
 		return 0;
 
 	txep = &txq->sw_ring[txq->tx_next_dd - (tx_rs_thresh - 1)];
@@ -1432,8 +1428,7 @@ tx_xmit_pkts(struct ci_tx_queue *txq,
 		n = (uint16_t)(txq->nb_tx_desc - txq->tx_tail);
 		i40e_tx_fill_hw_ring(txq, tx_pkts, n);
 		txr[txq->tx_next_rs].cmd_type_offset_bsz |=
-			rte_cpu_to_le_64(((uint64_t)I40E_TX_DESC_CMD_RS) <<
-						I40E_TXD_QW1_CMD_SHIFT);
+			rte_cpu_to_le_64(((uint64_t)CI_TX_DESC_CMD_RS) << CI_TXD_QW1_CMD_S);
 		txq->tx_next_rs = (uint16_t)(txq->tx_rs_thresh - 1);
 		txq->tx_tail = 0;
 	}
@@ -1445,8 +1440,7 @@ tx_xmit_pkts(struct ci_tx_queue *txq,
 	/* Determine if RS bit needs to be set */
 	if (txq->tx_tail > txq->tx_next_rs) {
 		txr[txq->tx_next_rs].cmd_type_offset_bsz |=
-			rte_cpu_to_le_64(((uint64_t)I40E_TX_DESC_CMD_RS) <<
-						I40E_TXD_QW1_CMD_SHIFT);
+			rte_cpu_to_le_64(((uint64_t)CI_TX_DESC_CMD_RS) << CI_TXD_QW1_CMD_S);
 		txq->tx_next_rs =
 			(uint16_t)(txq->tx_next_rs + txq->tx_rs_thresh);
 		if (txq->tx_next_rs >= txq->nb_tx_desc)
@@ -2371,9 +2365,9 @@ i40e_dev_tx_descriptor_status(void *tx_queue, uint16_t offset)
 	}
 
 	status = &txq->ci_tx_ring[desc].cmd_type_offset_bsz;
-	mask = rte_le_to_cpu_64(I40E_TXD_QW1_DTYPE_MASK);
+	mask = rte_le_to_cpu_64(CI_TXD_QW1_DTYPE_M);
 	expect = rte_cpu_to_le_64(
-		I40E_TX_DESC_DTYPE_DESC_DONE << I40E_TXD_QW1_DTYPE_SHIFT);
+		CI_TX_DESC_DTYPE_DESC_DONE << CI_TXD_QW1_DTYPE_S);
 	if ((*status & mask) == expect)
 		return RTE_ETH_TX_DESC_DONE;
 
@@ -2869,7 +2863,7 @@ i40e_reset_tx_queue(struct ci_tx_queue *txq)
 		volatile struct ci_tx_desc *txd = &txq->ci_tx_ring[i];
 
 		txd->cmd_type_offset_bsz =
-			rte_cpu_to_le_64(I40E_TX_DESC_DTYPE_DESC_DONE);
+			rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE);
 		txe[i].mbuf =  NULL;
 		txe[i].last_id = i;
 		txe[prev].next_id = i;
diff --git a/drivers/net/intel/i40e/i40e_rxtx.h b/drivers/net/intel/i40e/i40e_rxtx.h
index ed173d8f17..307ffa3049 100644
--- a/drivers/net/intel/i40e/i40e_rxtx.h
+++ b/drivers/net/intel/i40e/i40e_rxtx.h
@@ -47,8 +47,8 @@
 #define I40E_RX_DESC_EXT_STATUS_FLEXBL_MASK   0x03
 #define I40E_RX_DESC_EXT_STATUS_FLEXBL_FLEX   0x01
 
-#define I40E_TD_CMD (I40E_TX_DESC_CMD_ICRC |\
-		     I40E_TX_DESC_CMD_EOP)
+#define I40E_TD_CMD (CI_TX_DESC_CMD_ICRC |\
+		     CI_TX_DESC_CMD_EOP)
 
 enum i40e_header_split_mode {
 	i40e_header_split_none = 0,
@@ -110,19 +110,6 @@ enum i40e_header_split_mode {
 
 #define I40E_TX_VECTOR_OFFLOADS RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE
 
-/** Offload features */
-union i40e_tx_offload {
-	uint64_t data;
-	struct {
-		uint64_t l2_len:7; /**< L2 (MAC) Header Length. */
-		uint64_t l3_len:9; /**< L3 (IP) Header Length. */
-		uint64_t l4_len:8; /**< L4 Header Length. */
-		uint64_t tso_segsz:16; /**< TCP TSO segment size */
-		uint64_t outer_l2_len:8; /**< outer L2 Header Length */
-		uint64_t outer_l3_len:16; /**< outer L3 Header Length */
-	};
-};
-
 int i40e_dev_rx_queue_start(struct rte_eth_dev *dev, uint16_t rx_queue_id);
 int i40e_dev_rx_queue_stop(struct rte_eth_dev *dev, uint16_t rx_queue_id);
 int i40e_dev_tx_queue_start(struct rte_eth_dev *dev, uint16_t tx_queue_id);
diff --git a/drivers/net/intel/i40e/i40e_rxtx_vec_altivec.c b/drivers/net/intel/i40e/i40e_rxtx_vec_altivec.c
index 81e9e2bc0b..9196916a04 100644
--- a/drivers/net/intel/i40e/i40e_rxtx_vec_altivec.c
+++ b/drivers/net/intel/i40e/i40e_rxtx_vec_altivec.c
@@ -449,9 +449,9 @@ static inline void
 vtx1(volatile struct ci_tx_desc *txdp,
 	struct rte_mbuf *pkt, uint64_t flags)
 {
-	uint64_t high_qw = (I40E_TX_DESC_DTYPE_DATA |
-		((uint64_t)flags  << I40E_TXD_QW1_CMD_SHIFT) |
-		((uint64_t)pkt->data_len << I40E_TXD_QW1_TX_BUF_SZ_SHIFT));
+	uint64_t high_qw = (CI_TX_DESC_DTYPE_DATA |
+		((uint64_t)flags  << CI_TXD_QW1_CMD_S) |
+		((uint64_t)pkt->data_len << CI_TXD_QW1_TX_BUF_SZ_S));
 
 	__vector unsigned long descriptor = (__vector unsigned long){
 		pkt->buf_iova + pkt->data_off, high_qw};
@@ -477,7 +477,7 @@ i40e_xmit_fixed_burst_vec(void *tx_queue, struct rte_mbuf **tx_pkts,
 	struct ci_tx_entry_vec *txep;
 	uint16_t n, nb_commit, tx_id;
 	uint64_t flags = I40E_TD_CMD;
-	uint64_t rs = I40E_TX_DESC_CMD_RS | I40E_TD_CMD;
+	uint64_t rs = CI_TX_DESC_CMD_RS | I40E_TD_CMD;
 	int i;
 
 	if (txq->nb_tx_free < txq->tx_free_thresh)
@@ -520,8 +520,7 @@ i40e_xmit_fixed_burst_vec(void *tx_queue, struct rte_mbuf **tx_pkts,
 	tx_id = (uint16_t)(tx_id + nb_commit);
 	if (tx_id > txq->tx_next_rs) {
 		txq->ci_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
-			rte_cpu_to_le_64(((uint64_t)I40E_TX_DESC_CMD_RS) <<
-						I40E_TXD_QW1_CMD_SHIFT);
+			rte_cpu_to_le_64(((uint64_t)CI_TX_DESC_CMD_RS) << CI_TXD_QW1_CMD_S);
 		txq->tx_next_rs =
 			(uint16_t)(txq->tx_next_rs + txq->tx_rs_thresh);
 	}
diff --git a/drivers/net/intel/i40e/i40e_rxtx_vec_avx2.c b/drivers/net/intel/i40e/i40e_rxtx_vec_avx2.c
index b25b05d79d..012283d3ca 100644
--- a/drivers/net/intel/i40e/i40e_rxtx_vec_avx2.c
+++ b/drivers/net/intel/i40e/i40e_rxtx_vec_avx2.c
@@ -684,9 +684,9 @@ static inline void
 vtx1(volatile struct ci_tx_desc *txdp,
 		struct rte_mbuf *pkt, uint64_t flags)
 {
-	uint64_t high_qw = (I40E_TX_DESC_DTYPE_DATA |
-			((uint64_t)flags  << I40E_TXD_QW1_CMD_SHIFT) |
-			((uint64_t)pkt->data_len << I40E_TXD_QW1_TX_BUF_SZ_SHIFT));
+	uint64_t high_qw = (CI_TX_DESC_DTYPE_DATA |
+			((uint64_t)flags  << CI_TXD_QW1_CMD_S) |
+			((uint64_t)pkt->data_len << CI_TXD_QW1_TX_BUF_SZ_S));
 
 	__m128i descriptor = _mm_set_epi64x(high_qw,
 				pkt->buf_iova + pkt->data_off);
@@ -697,8 +697,7 @@ static inline void
 vtx(volatile struct ci_tx_desc *txdp,
 		struct rte_mbuf **pkt, uint16_t nb_pkts,  uint64_t flags)
 {
-	const uint64_t hi_qw_tmpl = (I40E_TX_DESC_DTYPE_DATA |
-			((uint64_t)flags  << I40E_TXD_QW1_CMD_SHIFT));
+	const uint64_t hi_qw_tmpl = (CI_TX_DESC_DTYPE_DATA | ((uint64_t)flags  << CI_TXD_QW1_CMD_S));
 
 	/* if unaligned on 32-bit boundary, do one to align */
 	if (((uintptr_t)txdp & 0x1F) != 0 && nb_pkts != 0) {
@@ -709,13 +708,13 @@ vtx(volatile struct ci_tx_desc *txdp,
 	/* do two at a time while possible, in bursts */
 	for (; nb_pkts > 3; txdp += 4, pkt += 4, nb_pkts -= 4) {
 		uint64_t hi_qw3 = hi_qw_tmpl |
-				((uint64_t)pkt[3]->data_len << I40E_TXD_QW1_TX_BUF_SZ_SHIFT);
+				((uint64_t)pkt[3]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
 		uint64_t hi_qw2 = hi_qw_tmpl |
-				((uint64_t)pkt[2]->data_len << I40E_TXD_QW1_TX_BUF_SZ_SHIFT);
+				((uint64_t)pkt[2]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
 		uint64_t hi_qw1 = hi_qw_tmpl |
-				((uint64_t)pkt[1]->data_len << I40E_TXD_QW1_TX_BUF_SZ_SHIFT);
+				((uint64_t)pkt[1]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
 		uint64_t hi_qw0 = hi_qw_tmpl |
-				((uint64_t)pkt[0]->data_len << I40E_TXD_QW1_TX_BUF_SZ_SHIFT);
+				((uint64_t)pkt[0]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
 
 		__m256i desc2_3 = _mm256_set_epi64x(
 				hi_qw3, pkt[3]->buf_iova + pkt[3]->data_off,
@@ -743,7 +742,7 @@ i40e_xmit_fixed_burst_vec_avx2(void *tx_queue, struct rte_mbuf **tx_pkts,
 	struct ci_tx_entry_vec *txep;
 	uint16_t n, nb_commit, tx_id;
 	uint64_t flags = I40E_TD_CMD;
-	uint64_t rs = I40E_TX_DESC_CMD_RS | I40E_TD_CMD;
+	uint64_t rs = CI_TX_DESC_CMD_RS | I40E_TD_CMD;
 
 	if (txq->nb_tx_free < txq->tx_free_thresh)
 		ci_tx_free_bufs_vec(txq, i40e_tx_desc_done, false);
@@ -785,8 +784,7 @@ i40e_xmit_fixed_burst_vec_avx2(void *tx_queue, struct rte_mbuf **tx_pkts,
 	tx_id = (uint16_t)(tx_id + nb_commit);
 	if (tx_id > txq->tx_next_rs) {
 		txq->ci_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
-			rte_cpu_to_le_64(((uint64_t)I40E_TX_DESC_CMD_RS) <<
-						I40E_TXD_QW1_CMD_SHIFT);
+			rte_cpu_to_le_64(((uint64_t)CI_TX_DESC_CMD_RS) << CI_TXD_QW1_CMD_S);
 		txq->tx_next_rs =
 			(uint16_t)(txq->tx_next_rs + txq->tx_rs_thresh);
 	}
diff --git a/drivers/net/intel/i40e/i40e_rxtx_vec_avx512.c b/drivers/net/intel/i40e/i40e_rxtx_vec_avx512.c
index 9a967faeee..def03e14e3 100644
--- a/drivers/net/intel/i40e/i40e_rxtx_vec_avx512.c
+++ b/drivers/net/intel/i40e/i40e_rxtx_vec_avx512.c
@@ -752,9 +752,9 @@ i40e_recv_scattered_pkts_vec_avx512(void *rx_queue,
 static inline void
 vtx1(volatile struct ci_tx_desc *txdp, struct rte_mbuf *pkt, uint64_t flags)
 {
-	uint64_t high_qw = (I40E_TX_DESC_DTYPE_DATA |
-		((uint64_t)flags  << I40E_TXD_QW1_CMD_SHIFT) |
-		((uint64_t)pkt->data_len << I40E_TXD_QW1_TX_BUF_SZ_SHIFT));
+	uint64_t high_qw = (CI_TX_DESC_DTYPE_DATA |
+		((uint64_t)flags  << CI_TXD_QW1_CMD_S) |
+		((uint64_t)pkt->data_len << CI_TXD_QW1_TX_BUF_SZ_S));
 
 	__m128i descriptor = _mm_set_epi64x(high_qw,
 				pkt->buf_iova + pkt->data_off);
@@ -765,26 +765,17 @@ static inline void
 vtx(volatile struct ci_tx_desc *txdp,
 	struct rte_mbuf **pkt, uint16_t nb_pkts,  uint64_t flags)
 {
-	const uint64_t hi_qw_tmpl = (I40E_TX_DESC_DTYPE_DATA |
-			((uint64_t)flags  << I40E_TXD_QW1_CMD_SHIFT));
+	const uint64_t hi_qw_tmpl = (CI_TX_DESC_DTYPE_DATA | ((uint64_t)flags  << CI_TXD_QW1_CMD_S));
 
 	for (; nb_pkts > 3; txdp += 4, pkt += 4, nb_pkts -= 4) {
-		uint64_t hi_qw3 =
-			hi_qw_tmpl |
-			((uint64_t)pkt[3]->data_len <<
-			 I40E_TXD_QW1_TX_BUF_SZ_SHIFT);
-		uint64_t hi_qw2 =
-			hi_qw_tmpl |
-			((uint64_t)pkt[2]->data_len <<
-			 I40E_TXD_QW1_TX_BUF_SZ_SHIFT);
-		uint64_t hi_qw1 =
-			hi_qw_tmpl |
-			((uint64_t)pkt[1]->data_len <<
-			 I40E_TXD_QW1_TX_BUF_SZ_SHIFT);
-		uint64_t hi_qw0 =
-			hi_qw_tmpl |
-			((uint64_t)pkt[0]->data_len <<
-			 I40E_TXD_QW1_TX_BUF_SZ_SHIFT);
+		uint64_t hi_qw3 = hi_qw_tmpl |
+			((uint64_t)pkt[3]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
+		uint64_t hi_qw2 = hi_qw_tmpl |
+			((uint64_t)pkt[2]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
+		uint64_t hi_qw1 = hi_qw_tmpl |
+			((uint64_t)pkt[1]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
+		uint64_t hi_qw0 = hi_qw_tmpl |
+			((uint64_t)pkt[0]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
 
 		__m512i desc0_3 =
 			_mm512_set_epi64
@@ -811,7 +802,7 @@ i40e_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pkts,
 	struct ci_tx_entry_vec *txep;
 	uint16_t n, nb_commit, tx_id;
 	uint64_t flags = I40E_TD_CMD;
-	uint64_t rs = I40E_TX_DESC_CMD_RS | I40E_TD_CMD;
+	uint64_t rs = CI_TX_DESC_CMD_RS | I40E_TD_CMD;
 
 	if (txq->nb_tx_free < txq->tx_free_thresh)
 		ci_tx_free_bufs_vec(txq, i40e_tx_desc_done, false);
@@ -854,8 +845,7 @@ i40e_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pkts,
 	tx_id = (uint16_t)(tx_id + nb_commit);
 	if (tx_id > txq->tx_next_rs) {
 		txq->ci_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
-			rte_cpu_to_le_64(((uint64_t)I40E_TX_DESC_CMD_RS) <<
-						I40E_TXD_QW1_CMD_SHIFT);
+			rte_cpu_to_le_64(((uint64_t)CI_TX_DESC_CMD_RS) << CI_TXD_QW1_CMD_S);
 		txq->tx_next_rs =
 			(uint16_t)(txq->tx_next_rs + txq->tx_rs_thresh);
 	}
diff --git a/drivers/net/intel/i40e/i40e_rxtx_vec_common.h b/drivers/net/intel/i40e/i40e_rxtx_vec_common.h
index 1fd7fc75bf..292a39501e 100644
--- a/drivers/net/intel/i40e/i40e_rxtx_vec_common.h
+++ b/drivers/net/intel/i40e/i40e_rxtx_vec_common.h
@@ -16,8 +16,8 @@ static inline int
 i40e_tx_desc_done(struct ci_tx_queue *txq, uint16_t idx)
 {
 	return (txq->ci_tx_ring[idx].cmd_type_offset_bsz &
-			rte_cpu_to_le_64(I40E_TXD_QW1_DTYPE_MASK)) ==
-				rte_cpu_to_le_64(I40E_TX_DESC_DTYPE_DESC_DONE);
+			rte_cpu_to_le_64(CI_TXD_QW1_DTYPE_M)) ==
+				rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE);
 }
 
 static inline void
diff --git a/drivers/net/intel/i40e/i40e_rxtx_vec_neon.c b/drivers/net/intel/i40e/i40e_rxtx_vec_neon.c
index 0b95152232..839e53e93e 100644
--- a/drivers/net/intel/i40e/i40e_rxtx_vec_neon.c
+++ b/drivers/net/intel/i40e/i40e_rxtx_vec_neon.c
@@ -600,9 +600,9 @@ static inline void
 vtx1(volatile struct ci_tx_desc *txdp,
 		struct rte_mbuf *pkt, uint64_t flags)
 {
-	uint64_t high_qw = (I40E_TX_DESC_DTYPE_DATA |
-			((uint64_t)flags  << I40E_TXD_QW1_CMD_SHIFT) |
-			((uint64_t)pkt->data_len << I40E_TXD_QW1_TX_BUF_SZ_SHIFT));
+	uint64_t high_qw = (CI_TX_DESC_DTYPE_DATA |
+			((uint64_t)flags  << CI_TXD_QW1_CMD_S) |
+			((uint64_t)pkt->data_len << CI_TXD_QW1_TX_BUF_SZ_S));
 
 	uint64x2_t descriptor = {pkt->buf_iova + pkt->data_off, high_qw};
 	vst1q_u64(RTE_CAST_PTR(uint64_t *, txdp), descriptor);
@@ -627,7 +627,7 @@ i40e_xmit_fixed_burst_vec(void *__rte_restrict tx_queue,
 	struct ci_tx_entry_vec *txep;
 	uint16_t n, nb_commit, tx_id;
 	uint64_t flags = I40E_TD_CMD;
-	uint64_t rs = I40E_TX_DESC_CMD_RS | I40E_TD_CMD;
+	uint64_t rs = CI_TX_DESC_CMD_RS | I40E_TD_CMD;
 	int i;
 
 	if (txq->nb_tx_free < txq->tx_free_thresh)
@@ -669,8 +669,7 @@ i40e_xmit_fixed_burst_vec(void *__rte_restrict tx_queue,
 	tx_id = (uint16_t)(tx_id + nb_commit);
 	if (tx_id > txq->tx_next_rs) {
 		txq->ci_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
-			rte_cpu_to_le_64(((uint64_t)I40E_TX_DESC_CMD_RS) <<
-						I40E_TXD_QW1_CMD_SHIFT);
+			rte_cpu_to_le_64(((uint64_t)CI_TX_DESC_CMD_RS) << CI_TXD_QW1_CMD_S);
 		txq->tx_next_rs =
 			(uint16_t)(txq->tx_next_rs + txq->tx_rs_thresh);
 	}
diff --git a/drivers/net/intel/i40e/i40e_rxtx_vec_sse.c b/drivers/net/intel/i40e/i40e_rxtx_vec_sse.c
index 2a3baa415e..6b9a291173 100644
--- a/drivers/net/intel/i40e/i40e_rxtx_vec_sse.c
+++ b/drivers/net/intel/i40e/i40e_rxtx_vec_sse.c
@@ -607,9 +607,9 @@ static inline void
 vtx1(volatile struct ci_tx_desc *txdp,
 		struct rte_mbuf *pkt, uint64_t flags)
 {
-	uint64_t high_qw = (I40E_TX_DESC_DTYPE_DATA |
-			((uint64_t)flags  << I40E_TXD_QW1_CMD_SHIFT) |
-			((uint64_t)pkt->data_len << I40E_TXD_QW1_TX_BUF_SZ_SHIFT));
+	uint64_t high_qw = (CI_TX_DESC_DTYPE_DATA |
+			((uint64_t)flags  << CI_TXD_QW1_CMD_S) |
+			((uint64_t)pkt->data_len << CI_TXD_QW1_TX_BUF_SZ_S));
 
 	__m128i descriptor = _mm_set_epi64x(high_qw,
 				pkt->buf_iova + pkt->data_off);
@@ -635,7 +635,7 @@ i40e_xmit_fixed_burst_vec(void *tx_queue, struct rte_mbuf **tx_pkts,
 	struct ci_tx_entry_vec *txep;
 	uint16_t n, nb_commit, tx_id;
 	uint64_t flags = I40E_TD_CMD;
-	uint64_t rs = I40E_TX_DESC_CMD_RS | I40E_TD_CMD;
+	uint64_t rs = CI_TX_DESC_CMD_RS | I40E_TD_CMD;
 	int i;
 
 	if (txq->nb_tx_free < txq->tx_free_thresh)
@@ -677,8 +677,7 @@ i40e_xmit_fixed_burst_vec(void *tx_queue, struct rte_mbuf **tx_pkts,
 	tx_id = (uint16_t)(tx_id + nb_commit);
 	if (tx_id > txq->tx_next_rs) {
 		txq->ci_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
-			rte_cpu_to_le_64(((uint64_t)I40E_TX_DESC_CMD_RS) <<
-						I40E_TXD_QW1_CMD_SHIFT);
+			rte_cpu_to_le_64(((uint64_t)CI_TX_DESC_CMD_RS) << CI_TXD_QW1_CMD_S);
 		txq->tx_next_rs =
 			(uint16_t)(txq->tx_next_rs + txq->tx_rs_thresh);
 	}
diff --git a/drivers/net/intel/iavf/iavf_rxtx.c b/drivers/net/intel/iavf/iavf_rxtx.c
index 4605523673..9946e112e8 100644
--- a/drivers/net/intel/iavf/iavf_rxtx.c
+++ b/drivers/net/intel/iavf/iavf_rxtx.c
@@ -284,7 +284,7 @@ reset_tx_queue(struct ci_tx_queue *txq)
 	prev = (uint16_t)(txq->nb_tx_desc - 1);
 	for (i = 0; i < txq->nb_tx_desc; i++) {
 		txq->ci_tx_ring[i].cmd_type_offset_bsz =
-			rte_cpu_to_le_64(IAVF_TX_DESC_DTYPE_DESC_DONE);
+			rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE);
 		txe[i].mbuf =  NULL;
 		txe[i].last_id = i;
 		txe[prev].next_id = i;
@@ -2352,12 +2352,12 @@ iavf_fill_ctx_desc_cmd_field(volatile uint64_t *field, struct rte_mbuf *m,
 
 	/* TSO enabled */
 	if (m->ol_flags & (RTE_MBUF_F_TX_TCP_SEG | RTE_MBUF_F_TX_UDP_SEG))
-		cmd = IAVF_TX_CTX_DESC_TSO << IAVF_TXD_CTX_QW1_CMD_SHIFT;
+		cmd = CI_TX_CTX_DESC_TSO << IAVF_TXD_CTX_QW1_CMD_SHIFT;
 
 	if ((m->ol_flags & RTE_MBUF_F_TX_VLAN &&
 			vlan_flag & IAVF_TX_FLAGS_VLAN_TAG_LOC_L2TAG2) ||
 			m->ol_flags & RTE_MBUF_F_TX_QINQ) {
-		cmd |= IAVF_TX_CTX_DESC_IL2TAG2
+		cmd |= CI_TX_CTX_DESC_IL2TAG2
 			<< IAVF_TXD_CTX_QW1_CMD_SHIFT;
 	}
 
@@ -2578,20 +2578,20 @@ iavf_build_data_desc_cmd_offset_fields(volatile uint64_t *qw1,
 	uint64_t offset = 0;
 	uint64_t l2tag1 = 0;
 
-	*qw1 = IAVF_TX_DESC_DTYPE_DATA;
+	*qw1 = CI_TX_DESC_DTYPE_DATA;
 
-	command = (uint64_t)IAVF_TX_DESC_CMD_ICRC;
+	command = (uint64_t)CI_TX_DESC_CMD_ICRC;
 
 	/* Descriptor based VLAN insertion */
 	if ((vlan_flag & IAVF_TX_FLAGS_VLAN_TAG_LOC_L2TAG1) &&
 			m->ol_flags & RTE_MBUF_F_TX_VLAN) {
-		command |= (uint64_t)IAVF_TX_DESC_CMD_IL2TAG1;
+		command |= (uint64_t)CI_TX_DESC_CMD_IL2TAG1;
 		l2tag1 |= m->vlan_tci;
 	}
 
 	/* Descriptor based QinQ insertion. vlan_flag specifies outer tag location. */
 	if (m->ol_flags & RTE_MBUF_F_TX_QINQ) {
-		command |= (uint64_t)IAVF_TX_DESC_CMD_IL2TAG1;
+		command |= (uint64_t)CI_TX_DESC_CMD_IL2TAG1;
 		l2tag1 = vlan_flag & IAVF_TX_FLAGS_VLAN_TAG_LOC_L2TAG1 ? m->vlan_tci_outer :
 									m->vlan_tci;
 	}
@@ -2604,32 +2604,32 @@ iavf_build_data_desc_cmd_offset_fields(volatile uint64_t *qw1,
 	if (m->ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK &&
 			!(m->ol_flags & RTE_MBUF_F_TX_SEC_OFFLOAD))
 		offset |= (m->outer_l2_len >> 1)
-			<< IAVF_TX_DESC_LENGTH_MACLEN_SHIFT;
+			<< CI_TX_DESC_LEN_MACLEN_S;
 	else
 		offset |= (m->l2_len >> 1)
-			<< IAVF_TX_DESC_LENGTH_MACLEN_SHIFT;
+			<< CI_TX_DESC_LEN_MACLEN_S;
 
 	/* Enable L3 checksum offloading inner */
 	if (m->ol_flags & RTE_MBUF_F_TX_IP_CKSUM) {
 		if (m->ol_flags & RTE_MBUF_F_TX_IPV4) {
-			command |= IAVF_TX_DESC_CMD_IIPT_IPV4_CSUM;
-			offset |= (m->l3_len >> 2) << IAVF_TX_DESC_LENGTH_IPLEN_SHIFT;
+			command |= CI_TX_DESC_CMD_IIPT_IPV4_CSUM;
+			offset |= (m->l3_len >> 2) << CI_TX_DESC_LEN_IPLEN_S;
 		}
 	} else if (m->ol_flags & RTE_MBUF_F_TX_IPV4) {
-		command |= IAVF_TX_DESC_CMD_IIPT_IPV4;
-		offset |= (m->l3_len >> 2) << IAVF_TX_DESC_LENGTH_IPLEN_SHIFT;
+		command |= CI_TX_DESC_CMD_IIPT_IPV4;
+		offset |= (m->l3_len >> 2) << CI_TX_DESC_LEN_IPLEN_S;
 	} else if (m->ol_flags & RTE_MBUF_F_TX_IPV6) {
-		command |= IAVF_TX_DESC_CMD_IIPT_IPV6;
-		offset |= (m->l3_len >> 2) << IAVF_TX_DESC_LENGTH_IPLEN_SHIFT;
+		command |= CI_TX_DESC_CMD_IIPT_IPV6;
+		offset |= (m->l3_len >> 2) << CI_TX_DESC_LEN_IPLEN_S;
 	}
 
 	if (m->ol_flags & (RTE_MBUF_F_TX_TCP_SEG | RTE_MBUF_F_TX_UDP_SEG)) {
 		if (m->ol_flags & RTE_MBUF_F_TX_TCP_SEG)
-			command |= IAVF_TX_DESC_CMD_L4T_EOFT_TCP;
+			command |= CI_TX_DESC_CMD_L4T_EOFT_TCP;
 		else
-			command |= IAVF_TX_DESC_CMD_L4T_EOFT_UDP;
+			command |= CI_TX_DESC_CMD_L4T_EOFT_UDP;
 		offset |= (m->l4_len >> 2) <<
-			      IAVF_TX_DESC_LENGTH_L4_FC_LEN_SHIFT;
+			      CI_TX_DESC_LEN_L4_LEN_S;
 
 		*qw1 = rte_cpu_to_le_64((((uint64_t)command <<
 			IAVF_TXD_DATA_QW1_CMD_SHIFT) & IAVF_TXD_DATA_QW1_CMD_MASK) |
@@ -2643,19 +2643,19 @@ iavf_build_data_desc_cmd_offset_fields(volatile uint64_t *qw1,
 	/* Enable L4 checksum offloads */
 	switch (m->ol_flags & RTE_MBUF_F_TX_L4_MASK) {
 	case RTE_MBUF_F_TX_TCP_CKSUM:
-		command |= IAVF_TX_DESC_CMD_L4T_EOFT_TCP;
+		command |= CI_TX_DESC_CMD_L4T_EOFT_TCP;
 		offset |= (sizeof(struct rte_tcp_hdr) >> 2) <<
-				IAVF_TX_DESC_LENGTH_L4_FC_LEN_SHIFT;
+				CI_TX_DESC_LEN_L4_LEN_S;
 		break;
 	case RTE_MBUF_F_TX_SCTP_CKSUM:
-		command |= IAVF_TX_DESC_CMD_L4T_EOFT_SCTP;
+		command |= CI_TX_DESC_CMD_L4T_EOFT_SCTP;
 		offset |= (sizeof(struct rte_sctp_hdr) >> 2) <<
-				IAVF_TX_DESC_LENGTH_L4_FC_LEN_SHIFT;
+				CI_TX_DESC_LEN_L4_LEN_S;
 		break;
 	case RTE_MBUF_F_TX_UDP_CKSUM:
-		command |= IAVF_TX_DESC_CMD_L4T_EOFT_UDP;
+		command |= CI_TX_DESC_CMD_L4T_EOFT_UDP;
 		offset |= (sizeof(struct rte_udp_hdr) >> 2) <<
-				IAVF_TX_DESC_LENGTH_L4_FC_LEN_SHIFT;
+				CI_TX_DESC_LEN_L4_LEN_S;
 		break;
 	}
 
@@ -2675,8 +2675,7 @@ iavf_calc_pkt_desc(struct rte_mbuf *tx_pkt)
 	uint16_t count = 0;
 
 	while (txd != NULL) {
-		count += (txd->data_len + IAVF_MAX_DATA_PER_TXD - 1) /
-			IAVF_MAX_DATA_PER_TXD;
+		count += (txd->data_len + CI_MAX_DATA_PER_TXD - 1) / CI_MAX_DATA_PER_TXD;
 		txd = txd->next;
 	}
 
@@ -2882,14 +2881,14 @@ iavf_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 			buf_dma_addr = rte_mbuf_data_iova(mb_seg);
 			while ((mb_seg->ol_flags & (RTE_MBUF_F_TX_TCP_SEG |
 					RTE_MBUF_F_TX_UDP_SEG)) &&
-					unlikely(slen > IAVF_MAX_DATA_PER_TXD)) {
+					unlikely(slen > CI_MAX_DATA_PER_TXD)) {
 				iavf_fill_data_desc(ddesc, ddesc_template,
-					IAVF_MAX_DATA_PER_TXD, buf_dma_addr);
+					CI_MAX_DATA_PER_TXD, buf_dma_addr);
 
 				IAVF_DUMP_TX_DESC(txq, ddesc, desc_idx);
 
-				buf_dma_addr += IAVF_MAX_DATA_PER_TXD;
-				slen -= IAVF_MAX_DATA_PER_TXD;
+				buf_dma_addr += CI_MAX_DATA_PER_TXD;
+				slen -= CI_MAX_DATA_PER_TXD;
 
 				txe->last_id = desc_idx_last;
 				desc_idx = txe->next_id;
@@ -2910,7 +2909,7 @@ iavf_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 		} while (mb_seg);
 
 		/* The last packet data descriptor needs End Of Packet (EOP) */
-		ddesc_cmd = IAVF_TX_DESC_CMD_EOP;
+		ddesc_cmd = CI_TX_DESC_CMD_EOP;
 
 		txq->nb_tx_used = (uint16_t)(txq->nb_tx_used + nb_desc_required);
 		txq->nb_tx_free = (uint16_t)(txq->nb_tx_free - nb_desc_required);
@@ -2920,7 +2919,7 @@ iavf_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 				   "%4u (port=%d queue=%d)",
 				   desc_idx_last, txq->port_id, txq->queue_id);
 
-			ddesc_cmd |= IAVF_TX_DESC_CMD_RS;
+			ddesc_cmd |= CI_TX_DESC_CMD_RS;
 
 			/* Update txq RS bit counters */
 			txq->nb_tx_used = 0;
@@ -4465,9 +4464,8 @@ iavf_dev_tx_desc_status(void *tx_queue, uint16_t offset)
 	}
 
 	status = &txq->ci_tx_ring[desc].cmd_type_offset_bsz;
-	mask = rte_le_to_cpu_64(IAVF_TXD_QW1_DTYPE_MASK);
-	expect = rte_cpu_to_le_64(
-		 IAVF_TX_DESC_DTYPE_DESC_DONE << IAVF_TXD_QW1_DTYPE_SHIFT);
+	mask = rte_le_to_cpu_64(CI_TXD_QW1_DTYPE_M);
+	expect = rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE << CI_TXD_QW1_DTYPE_S);
 	if ((*status & mask) == expect)
 		return RTE_ETH_TX_DESC_DONE;
 
diff --git a/drivers/net/intel/iavf/iavf_rxtx.h b/drivers/net/intel/iavf/iavf_rxtx.h
index 14580c5b8b..86281aa965 100644
--- a/drivers/net/intel/iavf/iavf_rxtx.h
+++ b/drivers/net/intel/iavf/iavf_rxtx.h
@@ -162,10 +162,6 @@
 #define IAVF_TX_OFFLOAD_NOTSUP_MASK \
 		(RTE_MBUF_F_TX_OFFLOAD_MASK ^ IAVF_TX_OFFLOAD_MASK)
 
-/* HW requires that TX buffer size ranges from 1B up to (16K-1)B. */
-#define IAVF_MAX_DATA_PER_TXD \
-	(IAVF_TXD_QW1_TX_BUF_SZ_MASK >> IAVF_TXD_QW1_TX_BUF_SZ_SHIFT)
-
 #define IAVF_TX_LLDP_DYNFIELD "intel_pmd_dynfield_tx_lldp"
 #define IAVF_CHECK_TX_LLDP(m) \
 	((rte_pmd_iavf_tx_lldp_dynfield_offset > 0) && \
@@ -195,18 +191,6 @@ struct iavf_rx_queue_stats {
 	struct iavf_ipsec_crypto_stats ipsec_crypto;
 };
 
-/* Offload features */
-union iavf_tx_offload {
-	uint64_t data;
-	struct {
-		uint64_t l2_len:7; /* L2 (MAC) Header Length. */
-		uint64_t l3_len:9; /* L3 (IP) Header Length. */
-		uint64_t l4_len:8; /* L4 Header Length. */
-		uint64_t tso_segsz:16; /* TCP TSO segment size */
-		/* uint64_t unused : 24; */
-	};
-};
-
 /* Rx Flex Descriptor
  * RxDID Profile ID 16-21
  * Flex-field 0: RSS hash lower 16-bits
@@ -410,7 +394,7 @@ enum iavf_rx_flex_desc_ipsec_crypto_status {
 
 
 #define IAVF_TXD_DATA_QW1_DTYPE_SHIFT	(0)
-#define IAVF_TXD_DATA_QW1_DTYPE_MASK	(0xFUL << IAVF_TXD_QW1_DTYPE_SHIFT)
+#define IAVF_TXD_DATA_QW1_DTYPE_MASK	(0xFUL << CI_TXD_QW1_DTYPE_S)
 
 #define IAVF_TXD_DATA_QW1_CMD_SHIFT	(4)
 #define IAVF_TXD_DATA_QW1_CMD_MASK	(0x3FFUL << IAVF_TXD_DATA_QW1_CMD_SHIFT)
@@ -689,7 +673,7 @@ void iavf_dump_tx_descriptor(const struct ci_tx_queue *txq,
 		rte_le_to_cpu_64(tx_desc->cmd_type_offset_bsz &
 			rte_cpu_to_le_64(IAVF_TXD_DATA_QW1_DTYPE_MASK));
 	switch (type) {
-	case IAVF_TX_DESC_DTYPE_DATA:
+	case CI_TX_DESC_DTYPE_DATA:
 		name = "Tx_data_desc";
 		break;
 	case IAVF_TX_DESC_DTYPE_CONTEXT:
diff --git a/drivers/net/intel/iavf/iavf_rxtx_vec_avx2.c b/drivers/net/intel/iavf/iavf_rxtx_vec_avx2.c
index 82861b8398..e92a84a51a 100644
--- a/drivers/net/intel/iavf/iavf_rxtx_vec_avx2.c
+++ b/drivers/net/intel/iavf/iavf_rxtx_vec_avx2.c
@@ -1633,10 +1633,9 @@ static __rte_always_inline void
 iavf_vtx1(volatile struct ci_tx_desc *txdp,
 	  struct rte_mbuf *pkt, uint64_t flags, bool offload, uint8_t vlan_flag)
 {
-	uint64_t high_qw =
-		(IAVF_TX_DESC_DTYPE_DATA |
-		 ((uint64_t)flags  << IAVF_TXD_QW1_CMD_SHIFT) |
-		 ((uint64_t)pkt->data_len << IAVF_TXD_QW1_TX_BUF_SZ_SHIFT));
+	uint64_t high_qw = (CI_TX_DESC_DTYPE_DATA |
+		 ((uint64_t)flags  << CI_TXD_QW1_CMD_S) |
+		 ((uint64_t)pkt->data_len << CI_TXD_QW1_TX_BUF_SZ_S));
 	if (offload)
 		iavf_txd_enable_offload(pkt, &high_qw, vlan_flag);
 
@@ -1649,8 +1648,7 @@ static __rte_always_inline void
 iavf_vtx(volatile struct ci_tx_desc *txdp,
 	 struct rte_mbuf **pkt, uint16_t nb_pkts,  uint64_t flags, bool offload, uint8_t vlan_flag)
 {
-	const uint64_t hi_qw_tmpl = (IAVF_TX_DESC_DTYPE_DATA |
-			((uint64_t)flags  << IAVF_TXD_QW1_CMD_SHIFT));
+	const uint64_t hi_qw_tmpl = (CI_TX_DESC_DTYPE_DATA | ((uint64_t)flags  << CI_TXD_QW1_CMD_S));
 
 	/* if unaligned on 32-bit boundary, do one to align */
 	if (((uintptr_t)txdp & 0x1F) != 0 && nb_pkts != 0) {
@@ -1660,28 +1658,20 @@ iavf_vtx(volatile struct ci_tx_desc *txdp,
 
 	/* do two at a time while possible, in bursts */
 	for (; nb_pkts > 3; txdp += 4, pkt += 4, nb_pkts -= 4) {
-		uint64_t hi_qw3 =
-			hi_qw_tmpl |
-			((uint64_t)pkt[3]->data_len <<
-			 IAVF_TXD_QW1_TX_BUF_SZ_SHIFT);
+		uint64_t hi_qw3 = hi_qw_tmpl |
+			((uint64_t)pkt[3]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
 		if (offload)
 			iavf_txd_enable_offload(pkt[3], &hi_qw3, vlan_flag);
-		uint64_t hi_qw2 =
-			hi_qw_tmpl |
-			((uint64_t)pkt[2]->data_len <<
-			 IAVF_TXD_QW1_TX_BUF_SZ_SHIFT);
+		uint64_t hi_qw2 = hi_qw_tmpl |
+			((uint64_t)pkt[2]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
 		if (offload)
 			iavf_txd_enable_offload(pkt[2], &hi_qw2, vlan_flag);
-		uint64_t hi_qw1 =
-			hi_qw_tmpl |
-			((uint64_t)pkt[1]->data_len <<
-			 IAVF_TXD_QW1_TX_BUF_SZ_SHIFT);
+		uint64_t hi_qw1 = hi_qw_tmpl |
+			((uint64_t)pkt[1]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
 		if (offload)
 			iavf_txd_enable_offload(pkt[1], &hi_qw1, vlan_flag);
-		uint64_t hi_qw0 =
-			hi_qw_tmpl |
-			((uint64_t)pkt[0]->data_len <<
-			 IAVF_TXD_QW1_TX_BUF_SZ_SHIFT);
+		uint64_t hi_qw0 = hi_qw_tmpl |
+			((uint64_t)pkt[0]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
 		if (offload)
 			iavf_txd_enable_offload(pkt[0], &hi_qw0, vlan_flag);
 
@@ -1717,8 +1707,8 @@ iavf_xmit_fixed_burst_vec_avx2(void *tx_queue, struct rte_mbuf **tx_pkts,
 	struct ci_tx_entry_vec *txep;
 	uint16_t n, nb_commit, tx_id;
 	/* bit2 is reserved and must be set to 1 according to Spec */
-	uint64_t flags = IAVF_TX_DESC_CMD_EOP | IAVF_TX_DESC_CMD_ICRC;
-	uint64_t rs = IAVF_TX_DESC_CMD_RS | flags;
+	uint64_t flags = CI_TX_DESC_CMD_EOP | CI_TX_DESC_CMD_ICRC;
+	uint64_t rs = CI_TX_DESC_CMD_RS | flags;
 
 	if (txq->nb_tx_free < txq->tx_free_thresh)
 		ci_tx_free_bufs_vec(txq, iavf_tx_desc_done, false);
@@ -1761,8 +1751,7 @@ iavf_xmit_fixed_burst_vec_avx2(void *tx_queue, struct rte_mbuf **tx_pkts,
 	tx_id = (uint16_t)(tx_id + nb_commit);
 	if (tx_id > txq->tx_next_rs) {
 		txq->ci_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
-			rte_cpu_to_le_64(((uint64_t)IAVF_TX_DESC_CMD_RS) <<
-					 IAVF_TXD_QW1_CMD_SHIFT);
+			rte_cpu_to_le_64(((uint64_t)CI_TX_DESC_CMD_RS) << CI_TXD_QW1_CMD_S);
 		txq->tx_next_rs =
 			(uint16_t)(txq->tx_next_rs + txq->tx_rs_thresh);
 	}
diff --git a/drivers/net/intel/iavf/iavf_rxtx_vec_avx512.c b/drivers/net/intel/iavf/iavf_rxtx_vec_avx512.c
index ad1b0b90cd..ff9d3c009a 100644
--- a/drivers/net/intel/iavf/iavf_rxtx_vec_avx512.c
+++ b/drivers/net/intel/iavf/iavf_rxtx_vec_avx512.c
@@ -1844,10 +1844,9 @@ iavf_vtx1(volatile struct ci_tx_desc *txdp,
 	  struct rte_mbuf *pkt, uint64_t flags,
 	  bool offload, uint8_t vlan_flag)
 {
-	uint64_t high_qw =
-		(IAVF_TX_DESC_DTYPE_DATA |
-		 ((uint64_t)flags  << IAVF_TXD_QW1_CMD_SHIFT) |
-		 ((uint64_t)pkt->data_len << IAVF_TXD_QW1_TX_BUF_SZ_SHIFT));
+	uint64_t high_qw = (CI_TX_DESC_DTYPE_DATA |
+		 ((uint64_t)flags  << CI_TXD_QW1_CMD_S) |
+		 ((uint64_t)pkt->data_len << CI_TXD_QW1_TX_BUF_SZ_S));
 	if (offload)
 		iavf_txd_enable_offload(pkt, &high_qw, vlan_flag);
 
@@ -1863,8 +1862,7 @@ iavf_vtx(volatile struct ci_tx_desc *txdp,
 		struct rte_mbuf **pkt, uint16_t nb_pkts,  uint64_t flags,
 		bool offload, uint8_t vlan_flag)
 {
-	const uint64_t hi_qw_tmpl = (IAVF_TX_DESC_DTYPE_DATA |
-			((uint64_t)flags  << IAVF_TXD_QW1_CMD_SHIFT));
+	const uint64_t hi_qw_tmpl = (CI_TX_DESC_DTYPE_DATA | ((uint64_t)flags  << CI_TXD_QW1_CMD_S));
 
 	/* if unaligned on 32-bit boundary, do one to align */
 	if (((uintptr_t)txdp & 0x1F) != 0 && nb_pkts != 0) {
@@ -1874,22 +1872,14 @@ iavf_vtx(volatile struct ci_tx_desc *txdp,
 
 	/* do 4 at a time while possible, in bursts */
 	for (; nb_pkts > 3; txdp += 4, pkt += 4, nb_pkts -= 4) {
-		uint64_t hi_qw3 =
-			hi_qw_tmpl |
-			((uint64_t)pkt[3]->data_len <<
-			 IAVF_TXD_QW1_TX_BUF_SZ_SHIFT);
-		uint64_t hi_qw2 =
-			hi_qw_tmpl |
-			((uint64_t)pkt[2]->data_len <<
-			 IAVF_TXD_QW1_TX_BUF_SZ_SHIFT);
-		uint64_t hi_qw1 =
-			hi_qw_tmpl |
-			((uint64_t)pkt[1]->data_len <<
-			 IAVF_TXD_QW1_TX_BUF_SZ_SHIFT);
-		uint64_t hi_qw0 =
-			hi_qw_tmpl |
-			((uint64_t)pkt[0]->data_len <<
-			 IAVF_TXD_QW1_TX_BUF_SZ_SHIFT);
+		uint64_t hi_qw3 = hi_qw_tmpl |
+			((uint64_t)pkt[3]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
+		uint64_t hi_qw2 = hi_qw_tmpl |
+			((uint64_t)pkt[2]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
+		uint64_t hi_qw1 = hi_qw_tmpl |
+			((uint64_t)pkt[1]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
+		uint64_t hi_qw0 = hi_qw_tmpl |
+			((uint64_t)pkt[0]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
 		if (offload) {
 			iavf_txd_enable_offload(pkt[3], &hi_qw3, vlan_flag);
 			iavf_txd_enable_offload(pkt[2], &hi_qw2, vlan_flag);
@@ -2093,9 +2083,9 @@ ctx_vtx1(volatile struct ci_tx_desc *txdp, struct rte_mbuf *pkt,
 	if (IAVF_CHECK_TX_LLDP(pkt))
 		high_ctx_qw |= IAVF_TX_CTX_DESC_SWTCH_UPLINK
 			<< IAVF_TXD_CTX_QW1_CMD_SHIFT;
-	uint64_t high_data_qw = (IAVF_TX_DESC_DTYPE_DATA |
-				((uint64_t)flags  << IAVF_TXD_QW1_CMD_SHIFT) |
-				((uint64_t)pkt->data_len << IAVF_TXD_QW1_TX_BUF_SZ_SHIFT));
+	uint64_t high_data_qw = (CI_TX_DESC_DTYPE_DATA |
+				((uint64_t)flags  << CI_TXD_QW1_CMD_S) |
+				((uint64_t)pkt->data_len << CI_TXD_QW1_TX_BUF_SZ_S));
 	if (offload)
 		iavf_txd_enable_offload(pkt, &high_data_qw, vlan_flag);
 
@@ -2110,8 +2100,7 @@ ctx_vtx(volatile struct ci_tx_desc *txdp,
 		struct rte_mbuf **pkt, uint16_t nb_pkts,  uint64_t flags,
 		bool offload, uint8_t vlan_flag)
 {
-	uint64_t hi_data_qw_tmpl = (IAVF_TX_DESC_DTYPE_DATA |
-					((uint64_t)flags  << IAVF_TXD_QW1_CMD_SHIFT));
+	uint64_t hi_data_qw_tmpl = (CI_TX_DESC_DTYPE_DATA | ((uint64_t)flags  << CI_TXD_QW1_CMD_S));
 
 	/* if unaligned on 32-bit boundary, do one to align */
 	if (((uintptr_t)txdp & 0x1F) != 0 && nb_pkts != 0) {
@@ -2128,11 +2117,9 @@ ctx_vtx(volatile struct ci_tx_desc *txdp,
 		uint64_t hi_data_qw0 = 0;
 
 		hi_data_qw1 = hi_data_qw_tmpl |
-				((uint64_t)pkt[1]->data_len <<
-					IAVF_TXD_QW1_TX_BUF_SZ_SHIFT);
+				((uint64_t)pkt[1]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
 		hi_data_qw0 = hi_data_qw_tmpl |
-				((uint64_t)pkt[0]->data_len <<
-					IAVF_TXD_QW1_TX_BUF_SZ_SHIFT);
+				((uint64_t)pkt[0]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
 
 #ifdef IAVF_TX_VLAN_QINQ_OFFLOAD
 		if (offload) {
@@ -2140,13 +2127,11 @@ ctx_vtx(volatile struct ci_tx_desc *txdp,
 				uint64_t qinq_tag = vlan_flag & IAVF_TX_FLAGS_VLAN_TAG_LOC_L2TAG2 ?
 					(uint64_t)pkt[1]->vlan_tci :
 					(uint64_t)pkt[1]->vlan_tci_outer;
-				hi_ctx_qw1 |= IAVF_TX_CTX_DESC_IL2TAG2 <<
-						IAVF_TXD_CTX_QW1_CMD_SHIFT;
+				hi_ctx_qw1 |= CI_TX_CTX_DESC_IL2TAG2 << CI_TXD_QW1_CMD_S;
 				low_ctx_qw1 |= qinq_tag << IAVF_TXD_CTX_QW0_L2TAG2_PARAM;
 			} else if (pkt[1]->ol_flags & RTE_MBUF_F_TX_VLAN &&
 					vlan_flag & IAVF_TX_FLAGS_VLAN_TAG_LOC_L2TAG2) {
-				hi_ctx_qw1 |=
-					IAVF_TX_CTX_DESC_IL2TAG2 << IAVF_TXD_CTX_QW1_CMD_SHIFT;
+				hi_ctx_qw1 |= CI_TX_CTX_DESC_IL2TAG2 << CI_TXD_QW1_CMD_S;
 				low_ctx_qw1 |=
 					(uint64_t)pkt[1]->vlan_tci << IAVF_TXD_CTX_QW0_L2TAG2_PARAM;
 			}
@@ -2154,7 +2139,7 @@ ctx_vtx(volatile struct ci_tx_desc *txdp,
 #endif
 		if (IAVF_CHECK_TX_LLDP(pkt[1]))
 			hi_ctx_qw1 |= IAVF_TX_CTX_DESC_SWTCH_UPLINK
-				<< IAVF_TXD_CTX_QW1_CMD_SHIFT;
+				<< CI_TXD_QW1_CMD_S;
 
 #ifdef IAVF_TX_VLAN_QINQ_OFFLOAD
 		if (offload) {
@@ -2162,21 +2147,18 @@ ctx_vtx(volatile struct ci_tx_desc *txdp,
 				uint64_t qinq_tag = vlan_flag & IAVF_TX_FLAGS_VLAN_TAG_LOC_L2TAG2 ?
 					(uint64_t)pkt[0]->vlan_tci :
 					(uint64_t)pkt[0]->vlan_tci_outer;
-				hi_ctx_qw0 |= IAVF_TX_CTX_DESC_IL2TAG2 <<
-						IAVF_TXD_CTX_QW1_CMD_SHIFT;
+				hi_ctx_qw0 |= CI_TX_CTX_DESC_IL2TAG2 << CI_TXD_QW1_CMD_S;
 				low_ctx_qw0 |= qinq_tag << IAVF_TXD_CTX_QW0_L2TAG2_PARAM;
 			} else if (pkt[0]->ol_flags & RTE_MBUF_F_TX_VLAN &&
 					vlan_flag & IAVF_TX_FLAGS_VLAN_TAG_LOC_L2TAG2) {
-				hi_ctx_qw0 |=
-					IAVF_TX_CTX_DESC_IL2TAG2 << IAVF_TXD_CTX_QW1_CMD_SHIFT;
+				hi_ctx_qw0 |= CI_TX_CTX_DESC_IL2TAG2 << CI_TXD_QW1_CMD_S;
 				low_ctx_qw0 |=
 					(uint64_t)pkt[0]->vlan_tci << IAVF_TXD_CTX_QW0_L2TAG2_PARAM;
 			}
 		}
 #endif
 		if (IAVF_CHECK_TX_LLDP(pkt[0]))
-			hi_ctx_qw0 |= IAVF_TX_CTX_DESC_SWTCH_UPLINK
-				<< IAVF_TXD_CTX_QW1_CMD_SHIFT;
+			hi_ctx_qw0 |= IAVF_TX_CTX_DESC_SWTCH_UPLINK << CI_TXD_QW1_CMD_S;
 
 		if (offload) {
 			iavf_txd_enable_offload(pkt[1], &hi_data_qw1, vlan_flag);
@@ -2207,8 +2189,8 @@ iavf_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pkts,
 	struct ci_tx_entry_vec *txep;
 	uint16_t n, nb_commit, tx_id;
 	/* bit2 is reserved and must be set to 1 according to Spec */
-	uint64_t flags = IAVF_TX_DESC_CMD_EOP | IAVF_TX_DESC_CMD_ICRC;
-	uint64_t rs = IAVF_TX_DESC_CMD_RS | flags;
+	uint64_t flags = CI_TX_DESC_CMD_EOP | CI_TX_DESC_CMD_ICRC;
+	uint64_t rs = CI_TX_DESC_CMD_RS | flags;
 
 	if (txq->nb_tx_free < txq->tx_free_thresh)
 		ci_tx_free_bufs_vec(txq, iavf_tx_desc_done, false);
@@ -2253,8 +2235,7 @@ iavf_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pkts,
 	tx_id = (uint16_t)(tx_id + nb_commit);
 	if (tx_id > txq->tx_next_rs) {
 		txq->ci_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
-			rte_cpu_to_le_64(((uint64_t)IAVF_TX_DESC_CMD_RS) <<
-					 IAVF_TXD_QW1_CMD_SHIFT);
+			rte_cpu_to_le_64(((uint64_t)CI_TX_DESC_CMD_RS) << CI_TXD_QW1_CMD_S);
 		txq->tx_next_rs =
 			(uint16_t)(txq->tx_next_rs + txq->tx_rs_thresh);
 	}
@@ -2275,8 +2256,8 @@ iavf_xmit_fixed_burst_vec_avx512_ctx(void *tx_queue, struct rte_mbuf **tx_pkts,
 	struct ci_tx_entry_vec *txep;
 	uint16_t n, nb_commit, nb_mbuf, tx_id;
 	/* bit2 is reserved and must be set to 1 according to Spec */
-	uint64_t flags = IAVF_TX_DESC_CMD_EOP | IAVF_TX_DESC_CMD_ICRC;
-	uint64_t rs = IAVF_TX_DESC_CMD_RS | flags;
+	uint64_t flags = CI_TX_DESC_CMD_EOP | CI_TX_DESC_CMD_ICRC;
+	uint64_t rs = CI_TX_DESC_CMD_RS | flags;
 
 	if (txq->nb_tx_free < txq->tx_free_thresh)
 		ci_tx_free_bufs_vec(txq, iavf_tx_desc_done, true);
@@ -2321,8 +2302,7 @@ iavf_xmit_fixed_burst_vec_avx512_ctx(void *tx_queue, struct rte_mbuf **tx_pkts,
 
 	if (tx_id > txq->tx_next_rs) {
 		txq->ci_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
-			rte_cpu_to_le_64(((uint64_t)IAVF_TX_DESC_CMD_RS) <<
-					 IAVF_TXD_QW1_CMD_SHIFT);
+			rte_cpu_to_le_64(((uint64_t)CI_TX_DESC_CMD_RS) << CI_TXD_QW1_CMD_S);
 		txq->tx_next_rs =
 			(uint16_t)(txq->tx_next_rs + txq->tx_rs_thresh);
 	}
diff --git a/drivers/net/intel/iavf/iavf_rxtx_vec_common.h b/drivers/net/intel/iavf/iavf_rxtx_vec_common.h
index 1832b76f89..1538a44892 100644
--- a/drivers/net/intel/iavf/iavf_rxtx_vec_common.h
+++ b/drivers/net/intel/iavf/iavf_rxtx_vec_common.h
@@ -15,8 +15,8 @@ static inline int
 iavf_tx_desc_done(struct ci_tx_queue *txq, uint16_t idx)
 {
 	return (txq->ci_tx_ring[idx].cmd_type_offset_bsz &
-			rte_cpu_to_le_64(IAVF_TXD_QW1_DTYPE_MASK)) ==
-				rte_cpu_to_le_64(IAVF_TX_DESC_DTYPE_DESC_DONE);
+			rte_cpu_to_le_64(CI_TXD_QW1_DTYPE_M)) ==
+				rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE);
 }
 
 static inline void
@@ -147,26 +147,26 @@ iavf_txd_enable_offload(__rte_unused struct rte_mbuf *tx_pkt,
 	/* Set MACLEN */
 	if (ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK)
 		td_offset |= (tx_pkt->outer_l2_len >> 1)
-			<< IAVF_TX_DESC_LENGTH_MACLEN_SHIFT;
+			<< CI_TX_DESC_LEN_MACLEN_S;
 	else
 		td_offset |= (tx_pkt->l2_len >> 1)
-			<< IAVF_TX_DESC_LENGTH_MACLEN_SHIFT;
+			<< CI_TX_DESC_LEN_MACLEN_S;
 
 	/* Enable L3 checksum offloads */
 	if (ol_flags & RTE_MBUF_F_TX_IP_CKSUM) {
 		if (ol_flags & RTE_MBUF_F_TX_IPV4) {
-			td_cmd |= IAVF_TX_DESC_CMD_IIPT_IPV4_CSUM;
+			td_cmd |= CI_TX_DESC_CMD_IIPT_IPV4_CSUM;
 			td_offset |= (tx_pkt->l3_len >> 2) <<
-				     IAVF_TX_DESC_LENGTH_IPLEN_SHIFT;
+				     CI_TX_DESC_LEN_IPLEN_S;
 		}
 	} else if (ol_flags & RTE_MBUF_F_TX_IPV4) {
-		td_cmd |= IAVF_TX_DESC_CMD_IIPT_IPV4;
+		td_cmd |= CI_TX_DESC_CMD_IIPT_IPV4;
 		td_offset |= (tx_pkt->l3_len >> 2) <<
-			     IAVF_TX_DESC_LENGTH_IPLEN_SHIFT;
+			     CI_TX_DESC_LEN_IPLEN_S;
 	} else if (ol_flags & RTE_MBUF_F_TX_IPV6) {
-		td_cmd |= IAVF_TX_DESC_CMD_IIPT_IPV6;
+		td_cmd |= CI_TX_DESC_CMD_IIPT_IPV6;
 		td_offset |= (tx_pkt->l3_len >> 2) <<
-			     IAVF_TX_DESC_LENGTH_IPLEN_SHIFT;
+			     CI_TX_DESC_LEN_IPLEN_S;
 	}
 
 	/* Enable L4 checksum offloads */
@@ -190,7 +190,7 @@ iavf_txd_enable_offload(__rte_unused struct rte_mbuf *tx_pkt,
 		break;
 	}
 
-	*txd_hi |= ((uint64_t)td_offset) << IAVF_TXD_QW1_OFFSET_SHIFT;
+	*txd_hi |= ((uint64_t)td_offset) << CI_TXD_QW1_OFFSET_S;
 #endif
 
 #ifdef IAVF_TX_VLAN_QINQ_OFFLOAD
@@ -198,17 +198,15 @@ iavf_txd_enable_offload(__rte_unused struct rte_mbuf *tx_pkt,
 		td_cmd |= IAVF_TX_DESC_CMD_IL2TAG1;
 		/* vlan_flag specifies outer tag location for QinQ. */
 		if (vlan_flag & IAVF_TX_FLAGS_VLAN_TAG_LOC_L2TAG1)
-			*txd_hi |= ((uint64_t)tx_pkt->vlan_tci_outer <<
-					IAVF_TXD_QW1_L2TAG1_SHIFT);
+			*txd_hi |= ((uint64_t)tx_pkt->vlan_tci_outer << CI_TXD_QW1_L2TAG1_S);
 		else
-			*txd_hi |= ((uint64_t)tx_pkt->vlan_tci <<
-					IAVF_TXD_QW1_L2TAG1_SHIFT);
+			*txd_hi |= ((uint64_t)tx_pkt->vlan_tci << CI_TXD_QW1_L2TAG1_S);
 	} else if (ol_flags & RTE_MBUF_F_TX_VLAN && vlan_flag & IAVF_TX_FLAGS_VLAN_TAG_LOC_L2TAG1) {
-		td_cmd |= IAVF_TX_DESC_CMD_IL2TAG1;
-		*txd_hi |= ((uint64_t)tx_pkt->vlan_tci << IAVF_TXD_QW1_L2TAG1_SHIFT);
+		td_cmd |= CI_TX_DESC_CMD_IL2TAG1;
+		*txd_hi |= ((uint64_t)tx_pkt->vlan_tci << CI_TXD_QW1_L2TAG1_S);
 	}
 #endif
 
-	*txd_hi |= ((uint64_t)td_cmd) << IAVF_TXD_QW1_CMD_SHIFT;
+	*txd_hi |= ((uint64_t)td_cmd) << CI_TXD_QW1_CMD_S;
 }
 #endif
diff --git a/drivers/net/intel/iavf/iavf_rxtx_vec_sse.c b/drivers/net/intel/iavf/iavf_rxtx_vec_sse.c
index 89ec05fa5d..7c65ce0873 100644
--- a/drivers/net/intel/iavf/iavf_rxtx_vec_sse.c
+++ b/drivers/net/intel/iavf/iavf_rxtx_vec_sse.c
@@ -1244,11 +1244,9 @@ iavf_recv_scattered_pkts_vec_flex_rxd(void *rx_queue,
 static inline void
 vtx1(volatile struct ci_tx_desc *txdp, struct rte_mbuf *pkt, uint64_t flags)
 {
-	uint64_t high_qw =
-			(IAVF_TX_DESC_DTYPE_DATA |
-			 ((uint64_t)flags  << IAVF_TXD_QW1_CMD_SHIFT) |
-			 ((uint64_t)pkt->data_len <<
-			  IAVF_TXD_QW1_TX_BUF_SZ_SHIFT));
+	uint64_t high_qw = (CI_TX_DESC_DTYPE_DATA |
+			 ((uint64_t)flags  << CI_TXD_QW1_CMD_S) |
+			 ((uint64_t)pkt->data_len << CI_TXD_QW1_TX_BUF_SZ_S));
 
 	__m128i descriptor = _mm_set_epi64x(high_qw,
 					    pkt->buf_iova + pkt->data_off);
@@ -1273,8 +1271,8 @@ iavf_xmit_fixed_burst_vec(void *tx_queue, struct rte_mbuf **tx_pkts,
 	volatile struct ci_tx_desc *txdp;
 	struct ci_tx_entry_vec *txep;
 	uint16_t n, nb_commit, tx_id;
-	uint64_t flags = IAVF_TX_DESC_CMD_EOP | 0x04;  /* bit 2 must be set */
-	uint64_t rs = IAVF_TX_DESC_CMD_RS | flags;
+	uint64_t flags = CI_TX_DESC_CMD_EOP | CI_TX_DESC_CMD_ICRC;  /* bit 2 (ICRC) must be set */
+	uint64_t rs = CI_TX_DESC_CMD_RS | flags;
 	int i;
 
 	if (txq->nb_tx_free < txq->tx_free_thresh)
@@ -1317,8 +1315,7 @@ iavf_xmit_fixed_burst_vec(void *tx_queue, struct rte_mbuf **tx_pkts,
 	tx_id = (uint16_t)(tx_id + nb_commit);
 	if (tx_id > txq->tx_next_rs) {
 		txq->ci_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
-			rte_cpu_to_le_64(((uint64_t)IAVF_TX_DESC_CMD_RS) <<
-					 IAVF_TXD_QW1_CMD_SHIFT);
+			rte_cpu_to_le_64(((uint64_t)CI_TX_DESC_CMD_RS) << CI_TXD_QW1_CMD_S);
 		txq->tx_next_rs =
 			(uint16_t)(txq->tx_next_rs + txq->tx_rs_thresh);
 	}
diff --git a/drivers/net/intel/ice/ice_dcf_ethdev.c b/drivers/net/intel/ice/ice_dcf_ethdev.c
index 5f537b4c12..4ceecc15c6 100644
--- a/drivers/net/intel/ice/ice_dcf_ethdev.c
+++ b/drivers/net/intel/ice/ice_dcf_ethdev.c
@@ -406,7 +406,7 @@ reset_tx_queue(struct ci_tx_queue *txq)
 	prev = (uint16_t)(txq->nb_tx_desc - 1);
 	for (i = 0; i < txq->nb_tx_desc; i++) {
 		txq->ci_tx_ring[i].cmd_type_offset_bsz =
-			rte_cpu_to_le_64(IAVF_TX_DESC_DTYPE_DESC_DONE);
+			rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE);
 		txe[i].mbuf =  NULL;
 		txe[i].last_id = i;
 		txe[prev].next_id = i;
diff --git a/drivers/net/intel/ice/ice_rxtx.c b/drivers/net/intel/ice/ice_rxtx.c
index 0a6ca993c6..5864238092 100644
--- a/drivers/net/intel/ice/ice_rxtx.c
+++ b/drivers/net/intel/ice/ice_rxtx.c
@@ -1120,7 +1120,7 @@ ice_reset_tx_queue(struct ci_tx_queue *txq)
 		volatile struct ci_tx_desc *txd = &txq->ci_tx_ring[i];
 
 		txd->cmd_type_offset_bsz =
-			rte_cpu_to_le_64(ICE_TX_DESC_DTYPE_DESC_DONE);
+			rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE);
 		txe[i].mbuf =  NULL;
 		txe[i].last_id = i;
 		txe[prev].next_id = i;
@@ -2548,9 +2548,8 @@ ice_tx_descriptor_status(void *tx_queue, uint16_t offset)
 	}
 
 	status = &txq->ci_tx_ring[desc].cmd_type_offset_bsz;
-	mask = rte_cpu_to_le_64(ICE_TXD_QW1_DTYPE_M);
-	expect = rte_cpu_to_le_64(ICE_TX_DESC_DTYPE_DESC_DONE <<
-				  ICE_TXD_QW1_DTYPE_S);
+	mask = rte_cpu_to_le_64(CI_TXD_QW1_DTYPE_M);
+	expect = rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE << CI_TXD_QW1_DTYPE_S);
 	if ((*status & mask) == expect)
 		return RTE_ETH_TX_DESC_DONE;
 
@@ -2896,7 +2895,7 @@ ice_recv_pkts(void *rx_queue,
 
 static inline void
 ice_parse_tunneling_params(uint64_t ol_flags,
-			    union ice_tx_offload tx_offload,
+			    union ci_tx_offload tx_offload,
 			    uint32_t *cd_tunneling)
 {
 	/* EIPT: External (outer) IP header type */
@@ -2957,58 +2956,58 @@ static inline void
 ice_txd_enable_checksum(uint64_t ol_flags,
 			uint32_t *td_cmd,
 			uint32_t *td_offset,
-			union ice_tx_offload tx_offload)
+			union ci_tx_offload tx_offload)
 {
 	/* Set MACLEN */
 	if (!(ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK))
 		*td_offset |= (tx_offload.l2_len >> 1)
-			<< ICE_TX_DESC_LEN_MACLEN_S;
+			<< CI_TX_DESC_LEN_MACLEN_S;
 
 	/* Enable L3 checksum offloads */
 	if (ol_flags & RTE_MBUF_F_TX_IP_CKSUM) {
-		*td_cmd |= ICE_TX_DESC_CMD_IIPT_IPV4_CSUM;
+		*td_cmd |= CI_TX_DESC_CMD_IIPT_IPV4_CSUM;
 		*td_offset |= (tx_offload.l3_len >> 2) <<
-			ICE_TX_DESC_LEN_IPLEN_S;
+			CI_TX_DESC_LEN_IPLEN_S;
 	} else if (ol_flags & RTE_MBUF_F_TX_IPV4) {
-		*td_cmd |= ICE_TX_DESC_CMD_IIPT_IPV4;
+		*td_cmd |= CI_TX_DESC_CMD_IIPT_IPV4;
 		*td_offset |= (tx_offload.l3_len >> 2) <<
-			ICE_TX_DESC_LEN_IPLEN_S;
+			CI_TX_DESC_LEN_IPLEN_S;
 	} else if (ol_flags & RTE_MBUF_F_TX_IPV6) {
-		*td_cmd |= ICE_TX_DESC_CMD_IIPT_IPV6;
+		*td_cmd |= CI_TX_DESC_CMD_IIPT_IPV6;
 		*td_offset |= (tx_offload.l3_len >> 2) <<
-			ICE_TX_DESC_LEN_IPLEN_S;
+			CI_TX_DESC_LEN_IPLEN_S;
 	}
 
 	if (ol_flags & RTE_MBUF_F_TX_TCP_SEG) {
-		*td_cmd |= ICE_TX_DESC_CMD_L4T_EOFT_TCP;
+		*td_cmd |= CI_TX_DESC_CMD_L4T_EOFT_TCP;
 		*td_offset |= (tx_offload.l4_len >> 2) <<
-			      ICE_TX_DESC_LEN_L4_LEN_S;
+			      CI_TX_DESC_LEN_L4_LEN_S;
 		return;
 	}
 
 	if (ol_flags & RTE_MBUF_F_TX_UDP_SEG) {
-		*td_cmd |= ICE_TX_DESC_CMD_L4T_EOFT_UDP;
+		*td_cmd |= CI_TX_DESC_CMD_L4T_EOFT_UDP;
 		*td_offset |= (tx_offload.l4_len >> 2) <<
-			      ICE_TX_DESC_LEN_L4_LEN_S;
+			      CI_TX_DESC_LEN_L4_LEN_S;
 		return;
 	}
 
 	/* Enable L4 checksum offloads */
 	switch (ol_flags & RTE_MBUF_F_TX_L4_MASK) {
 	case RTE_MBUF_F_TX_TCP_CKSUM:
-		*td_cmd |= ICE_TX_DESC_CMD_L4T_EOFT_TCP;
+		*td_cmd |= CI_TX_DESC_CMD_L4T_EOFT_TCP;
 		*td_offset |= (sizeof(struct rte_tcp_hdr) >> 2) <<
-			      ICE_TX_DESC_LEN_L4_LEN_S;
+			      CI_TX_DESC_LEN_L4_LEN_S;
 		break;
 	case RTE_MBUF_F_TX_SCTP_CKSUM:
-		*td_cmd |= ICE_TX_DESC_CMD_L4T_EOFT_SCTP;
+		*td_cmd |= CI_TX_DESC_CMD_L4T_EOFT_SCTP;
 		*td_offset |= (sizeof(struct rte_sctp_hdr) >> 2) <<
-			      ICE_TX_DESC_LEN_L4_LEN_S;
+			      CI_TX_DESC_LEN_L4_LEN_S;
 		break;
 	case RTE_MBUF_F_TX_UDP_CKSUM:
-		*td_cmd |= ICE_TX_DESC_CMD_L4T_EOFT_UDP;
+		*td_cmd |= CI_TX_DESC_CMD_L4T_EOFT_UDP;
 		*td_offset |= (sizeof(struct rte_udp_hdr) >> 2) <<
-			      ICE_TX_DESC_LEN_L4_LEN_S;
+			      CI_TX_DESC_LEN_L4_LEN_S;
 		break;
 	default:
 		break;
@@ -3022,11 +3021,11 @@ ice_build_ctob(uint32_t td_cmd,
 	       uint16_t size,
 	       uint32_t td_tag)
 {
-	return rte_cpu_to_le_64(ICE_TX_DESC_DTYPE_DATA |
-				((uint64_t)td_cmd << ICE_TXD_QW1_CMD_S) |
-				((uint64_t)td_offset << ICE_TXD_QW1_OFFSET_S) |
-				((uint64_t)size << ICE_TXD_QW1_TX_BUF_SZ_S) |
-				((uint64_t)td_tag << ICE_TXD_QW1_L2TAG1_S));
+	return rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DATA |
+				((uint64_t)td_cmd << CI_TXD_QW1_CMD_S) |
+				((uint64_t)td_offset << CI_TXD_QW1_OFFSET_S) |
+				((uint64_t)size << CI_TXD_QW1_TX_BUF_SZ_S) |
+				((uint64_t)td_tag << CI_TXD_QW1_L2TAG1_S));
 }
 
 /* Check if the context descriptor is needed for TX offloading */
@@ -3045,7 +3044,7 @@ ice_calc_context_desc(uint64_t flags)
 
 /* set ice TSO context descriptor */
 static inline uint64_t
-ice_set_tso_ctx(struct rte_mbuf *mbuf, union ice_tx_offload tx_offload)
+ice_set_tso_ctx(struct rte_mbuf *mbuf, union ci_tx_offload tx_offload)
 {
 	uint64_t ctx_desc = 0;
 	uint32_t cd_cmd, hdr_len, cd_tso_len;
@@ -3059,18 +3058,15 @@ ice_set_tso_ctx(struct rte_mbuf *mbuf, union ice_tx_offload tx_offload)
 	hdr_len += (mbuf->ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK) ?
 		   tx_offload.outer_l2_len + tx_offload.outer_l3_len : 0;
 
-	cd_cmd = ICE_TX_CTX_DESC_TSO;
+	cd_cmd = CI_TX_CTX_DESC_TSO;
 	cd_tso_len = mbuf->pkt_len - hdr_len;
-	ctx_desc |= ((uint64_t)cd_cmd << ICE_TXD_CTX_QW1_CMD_S) |
+	ctx_desc |= ((uint64_t)cd_cmd << CI_TXD_QW1_CMD_S) |
 		    ((uint64_t)cd_tso_len << ICE_TXD_CTX_QW1_TSO_LEN_S) |
 		    ((uint64_t)mbuf->tso_segsz << ICE_TXD_CTX_QW1_MSS_S);
 
 	return ctx_desc;
 }
 
-/* HW requires that TX buffer size ranges from 1B up to (16K-1)B. */
-#define ICE_MAX_DATA_PER_TXD \
-	(ICE_TXD_QW1_TX_BUF_SZ_M >> ICE_TXD_QW1_TX_BUF_SZ_S)
 /* Calculate the number of TX descriptors needed for each pkt */
 static inline uint16_t
 ice_calc_pkt_desc(struct rte_mbuf *tx_pkt)
@@ -3079,7 +3075,7 @@ ice_calc_pkt_desc(struct rte_mbuf *tx_pkt)
 	uint16_t count = 0;
 
 	while (txd != NULL) {
-		count += DIV_ROUND_UP(txd->data_len, ICE_MAX_DATA_PER_TXD);
+		count += DIV_ROUND_UP(txd->data_len, CI_MAX_DATA_PER_TXD);
 		txd = txd->next;
 	}
 
@@ -3109,7 +3105,7 @@ ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 	uint16_t slen;
 	uint64_t buf_dma_addr;
 	uint64_t ol_flags;
-	union ice_tx_offload tx_offload = {0};
+	union ci_tx_offload tx_offload = {0};
 
 	txq = tx_queue;
 	sw_ring = txq->sw_ring;
@@ -3177,7 +3173,7 @@ ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 
 		/* Descriptor based VLAN insertion */
 		if (ol_flags & (RTE_MBUF_F_TX_VLAN | RTE_MBUF_F_TX_QINQ)) {
-			td_cmd |= ICE_TX_DESC_CMD_IL2TAG1;
+			td_cmd |= CI_TX_DESC_CMD_IL2TAG1;
 			td_tag = tx_pkt->vlan_tci;
 		}
 
@@ -3185,7 +3181,7 @@ ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 		cd_tunneling_params = 0;
 		if (ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK) {
 			td_offset |= (tx_offload.outer_l2_len >> 1)
-				<< ICE_TX_DESC_LEN_MACLEN_S;
+				<< CI_TX_DESC_LEN_MACLEN_S;
 			ice_parse_tunneling_params(ol_flags, tx_offload,
 						   &cd_tunneling_params);
 		}
@@ -3215,8 +3211,8 @@ ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 					ice_set_tso_ctx(tx_pkt, tx_offload);
 			else if (ol_flags & RTE_MBUF_F_TX_IEEE1588_TMST)
 				cd_type_cmd_tso_mss |=
-					((uint64_t)ICE_TX_CTX_DESC_TSYN <<
-					ICE_TXD_CTX_QW1_CMD_S) |
+					((uint64_t)CI_TX_CTX_DESC_TSYN <<
+					CI_TXD_QW1_CMD_S) |
 					 (((uint64_t)txq->ice_vsi->adapter->ptp_tx_index <<
 					 ICE_TXD_CTX_QW1_TSYN_S) & ICE_TXD_CTX_QW1_TSYN_M);
 
@@ -3227,8 +3223,8 @@ ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 			if (ol_flags & RTE_MBUF_F_TX_QINQ) {
 				cd_l2tag2 = tx_pkt->vlan_tci_outer;
 				cd_type_cmd_tso_mss |=
-					((uint64_t)ICE_TX_CTX_DESC_IL2TAG2 <<
-					 ICE_TXD_CTX_QW1_CMD_S);
+					((uint64_t)CI_TX_CTX_DESC_IL2TAG2 <<
+					 CI_TXD_QW1_CMD_S);
 			}
 			ctx_txd->l2tag2 = rte_cpu_to_le_16(cd_l2tag2);
 			ctx_txd->qw1 =
@@ -3253,18 +3249,16 @@ ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 			buf_dma_addr = rte_mbuf_data_iova(m_seg);
 
 			while ((ol_flags & (RTE_MBUF_F_TX_TCP_SEG | RTE_MBUF_F_TX_UDP_SEG)) &&
-				unlikely(slen > ICE_MAX_DATA_PER_TXD)) {
+					unlikely(slen > CI_MAX_DATA_PER_TXD)) {
 				txd->buffer_addr = rte_cpu_to_le_64(buf_dma_addr);
-				txd->cmd_type_offset_bsz =
-				rte_cpu_to_le_64(ICE_TX_DESC_DTYPE_DATA |
-				((uint64_t)td_cmd << ICE_TXD_QW1_CMD_S) |
-				((uint64_t)td_offset << ICE_TXD_QW1_OFFSET_S) |
-				((uint64_t)ICE_MAX_DATA_PER_TXD <<
-				 ICE_TXD_QW1_TX_BUF_SZ_S) |
-				((uint64_t)td_tag << ICE_TXD_QW1_L2TAG1_S));
+				txd->cmd_type_offset_bsz = rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DATA |
+					((uint64_t)td_cmd << CI_TXD_QW1_CMD_S) |
+					((uint64_t)td_offset << CI_TXD_QW1_OFFSET_S) |
+					((uint64_t)CI_MAX_DATA_PER_TXD << CI_TXD_QW1_TX_BUF_SZ_S) |
+					((uint64_t)td_tag << CI_TXD_QW1_L2TAG1_S));
 
-				buf_dma_addr += ICE_MAX_DATA_PER_TXD;
-				slen -= ICE_MAX_DATA_PER_TXD;
+				buf_dma_addr += CI_MAX_DATA_PER_TXD;
+				slen -= CI_MAX_DATA_PER_TXD;
 
 				txe->last_id = tx_last;
 				tx_id = txe->next_id;
@@ -3274,12 +3268,11 @@ ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 			}
 
 			txd->buffer_addr = rte_cpu_to_le_64(buf_dma_addr);
-			txd->cmd_type_offset_bsz =
-				rte_cpu_to_le_64(ICE_TX_DESC_DTYPE_DATA |
-				((uint64_t)td_cmd << ICE_TXD_QW1_CMD_S) |
-				((uint64_t)td_offset << ICE_TXD_QW1_OFFSET_S) |
-				((uint64_t)slen << ICE_TXD_QW1_TX_BUF_SZ_S) |
-				((uint64_t)td_tag << ICE_TXD_QW1_L2TAG1_S));
+			txd->cmd_type_offset_bsz = rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DATA |
+				((uint64_t)td_cmd << CI_TXD_QW1_CMD_S) |
+				((uint64_t)td_offset << CI_TXD_QW1_OFFSET_S) |
+				((uint64_t)slen << CI_TXD_QW1_TX_BUF_SZ_S) |
+				((uint64_t)td_tag << CI_TXD_QW1_L2TAG1_S));
 
 			txe->last_id = tx_last;
 			tx_id = txe->next_id;
@@ -3288,7 +3281,7 @@ ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 		} while (m_seg);
 
 		/* fill the last descriptor with End of Packet (EOP) bit */
-		td_cmd |= ICE_TX_DESC_CMD_EOP;
+		td_cmd |= CI_TX_DESC_CMD_EOP;
 		txq->nb_tx_used = (uint16_t)(txq->nb_tx_used + nb_used);
 		txq->nb_tx_free = (uint16_t)(txq->nb_tx_free - nb_used);
 
@@ -3299,14 +3292,13 @@ ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 				   "%4u (port=%d queue=%d)",
 				   tx_last, txq->port_id, txq->queue_id);
 
-			td_cmd |= ICE_TX_DESC_CMD_RS;
+			td_cmd |= CI_TX_DESC_CMD_RS;
 
 			/* Update txq RS bit counters */
 			txq->nb_tx_used = 0;
 		}
 		txd->cmd_type_offset_bsz |=
-			rte_cpu_to_le_64(((uint64_t)td_cmd) <<
-					 ICE_TXD_QW1_CMD_S);
+			rte_cpu_to_le_64(((uint64_t)td_cmd) << CI_TXD_QW1_CMD_S);
 
 		if (txq->tsq != NULL && txq->tsq->ts_flag > 0) {
 			uint64_t txtime = *RTE_MBUF_DYNFIELD(tx_pkt,
@@ -3353,8 +3345,8 @@ ice_tx_free_bufs(struct ci_tx_queue *txq)
 	uint16_t i;
 
 	if ((txq->ci_tx_ring[txq->tx_next_dd].cmd_type_offset_bsz &
-	     rte_cpu_to_le_64(ICE_TXD_QW1_DTYPE_M)) !=
-	    rte_cpu_to_le_64(ICE_TX_DESC_DTYPE_DESC_DONE))
+	     rte_cpu_to_le_64(CI_TXD_QW1_DTYPE_M)) !=
+	    rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE))
 		return 0;
 
 	txep = &txq->sw_ring[txq->tx_next_dd - (txq->tx_rs_thresh - 1)];
@@ -3579,8 +3571,7 @@ tx_xmit_pkts(struct ci_tx_queue *txq,
 		n = (uint16_t)(txq->nb_tx_desc - txq->tx_tail);
 		ice_tx_fill_hw_ring(txq, tx_pkts, n);
 		txr[txq->tx_next_rs].cmd_type_offset_bsz |=
-			rte_cpu_to_le_64(((uint64_t)ICE_TX_DESC_CMD_RS) <<
-					 ICE_TXD_QW1_CMD_S);
+			rte_cpu_to_le_64(((uint64_t)CI_TX_DESC_CMD_RS) << CI_TXD_QW1_CMD_S);
 		txq->tx_next_rs = (uint16_t)(txq->tx_rs_thresh - 1);
 		txq->tx_tail = 0;
 	}
@@ -3592,8 +3583,7 @@ tx_xmit_pkts(struct ci_tx_queue *txq,
 	/* Determine if RS bit needs to be set */
 	if (txq->tx_tail > txq->tx_next_rs) {
 		txr[txq->tx_next_rs].cmd_type_offset_bsz |=
-			rte_cpu_to_le_64(((uint64_t)ICE_TX_DESC_CMD_RS) <<
-					 ICE_TXD_QW1_CMD_S);
+			rte_cpu_to_le_64(((uint64_t)CI_TX_DESC_CMD_RS) << CI_TXD_QW1_CMD_S);
 		txq->tx_next_rs =
 			(uint16_t)(txq->tx_next_rs + txq->tx_rs_thresh);
 		if (txq->tx_next_rs >= txq->nb_tx_desc)
@@ -4852,9 +4842,9 @@ ice_fdir_programming(struct ice_pf *pf, struct ice_fltr_desc *fdir_desc)
 
 	txdp = &txq->ci_tx_ring[txq->tx_tail + 1];
 	txdp->buffer_addr = rte_cpu_to_le_64(pf->fdir.dma_addr);
-	td_cmd = ICE_TX_DESC_CMD_EOP |
-		ICE_TX_DESC_CMD_RS  |
-		ICE_TX_DESC_CMD_DUMMY;
+	td_cmd = CI_TX_DESC_CMD_EOP |
+		CI_TX_DESC_CMD_RS  |
+		CI_TX_DESC_CMD_DUMMY;
 
 	txdp->cmd_type_offset_bsz =
 		ice_build_ctob(td_cmd, 0, ICE_FDIR_PKT_LEN, 0);
@@ -4865,9 +4855,8 @@ ice_fdir_programming(struct ice_pf *pf, struct ice_fltr_desc *fdir_desc)
 	/* Update the tx tail register */
 	ICE_PCI_REG_WRITE(txq->qtx_tail, txq->tx_tail);
 	for (i = 0; i < ICE_FDIR_MAX_WAIT_US; i++) {
-		if ((txdp->cmd_type_offset_bsz &
-		     rte_cpu_to_le_64(ICE_TXD_QW1_DTYPE_M)) ==
-		    rte_cpu_to_le_64(ICE_TX_DESC_DTYPE_DESC_DONE))
+		if ((txdp->cmd_type_offset_bsz & rte_cpu_to_le_64(CI_TXD_QW1_DTYPE_M)) ==
+		    rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE))
 			break;
 		rte_delay_us(1);
 	}
diff --git a/drivers/net/intel/ice/ice_rxtx.h b/drivers/net/intel/ice/ice_rxtx.h
index d7e8c1b0c4..3462196f6f 100644
--- a/drivers/net/intel/ice/ice_rxtx.h
+++ b/drivers/net/intel/ice/ice_rxtx.h
@@ -46,7 +46,7 @@
 
 #define ICE_SUPPORT_CHAIN_NUM 5
 
-#define ICE_TD_CMD                      ICE_TX_DESC_CMD_EOP
+#define ICE_TD_CMD                      CI_TX_DESC_CMD_EOP
 
 #define ICE_VPMD_RX_BURST            CI_VPMD_RX_BURST
 #define ICE_VPMD_TX_BURST            32
@@ -169,19 +169,6 @@ struct ice_txtime {
 	const struct rte_memzone *ts_mz;
 };
 
-/* Offload features */
-union ice_tx_offload {
-	uint64_t data;
-	struct {
-		uint64_t l2_len:7; /* L2 (MAC) Header Length. */
-		uint64_t l3_len:9; /* L3 (IP) Header Length. */
-		uint64_t l4_len:8; /* L4 Header Length. */
-		uint64_t tso_segsz:16; /* TCP TSO segment size */
-		uint64_t outer_l2_len:8; /* outer L2 Header Length */
-		uint64_t outer_l3_len:16; /* outer L3 Header Length */
-	};
-};
-
 /* Rx Flex Descriptor for Comms Package Profile
  * RxDID Profile ID 22 (swap Hash and FlowID)
  * Flex-field 0: Flow ID lower 16-bits
diff --git a/drivers/net/intel/ice/ice_rxtx_vec_avx2.c b/drivers/net/intel/ice/ice_rxtx_vec_avx2.c
index d553c438f8..d0237a0c82 100644
--- a/drivers/net/intel/ice/ice_rxtx_vec_avx2.c
+++ b/drivers/net/intel/ice/ice_rxtx_vec_avx2.c
@@ -777,10 +777,9 @@ static __rte_always_inline void
 ice_vtx1(volatile struct ci_tx_desc *txdp,
 	 struct rte_mbuf *pkt, uint64_t flags, bool offload)
 {
-	uint64_t high_qw =
-		(ICE_TX_DESC_DTYPE_DATA |
-		 ((uint64_t)flags  << ICE_TXD_QW1_CMD_S) |
-		 ((uint64_t)pkt->data_len << ICE_TXD_QW1_TX_BUF_SZ_S));
+	uint64_t high_qw = (CI_TX_DESC_DTYPE_DATA |
+		 ((uint64_t)flags  << CI_TXD_QW1_CMD_S) |
+		 ((uint64_t)pkt->data_len << CI_TXD_QW1_TX_BUF_SZ_S));
 	if (offload)
 		ice_txd_enable_offload(pkt, &high_qw);
 
@@ -792,8 +791,7 @@ static __rte_always_inline void
 ice_vtx(volatile struct ci_tx_desc *txdp,
 	struct rte_mbuf **pkt, uint16_t nb_pkts,  uint64_t flags, bool offload)
 {
-	const uint64_t hi_qw_tmpl = (ICE_TX_DESC_DTYPE_DATA |
-			((uint64_t)flags  << ICE_TXD_QW1_CMD_S));
+	const uint64_t hi_qw_tmpl = (CI_TX_DESC_DTYPE_DATA | (flags  << CI_TXD_QW1_CMD_S));
 
 	/* if unaligned on 32-bit boundary, do one to align */
 	if (((uintptr_t)txdp & 0x1F) != 0 && nb_pkts != 0) {
@@ -801,30 +799,22 @@ ice_vtx(volatile struct ci_tx_desc *txdp,
 		nb_pkts--, txdp++, pkt++;
 	}
 
-	/* do two at a time while possible, in bursts */
+	/* do four at a time while possible, in bursts */
 	for (; nb_pkts > 3; txdp += 4, pkt += 4, nb_pkts -= 4) {
-		uint64_t hi_qw3 =
-			hi_qw_tmpl |
-			((uint64_t)pkt[3]->data_len <<
-			 ICE_TXD_QW1_TX_BUF_SZ_S);
+		uint64_t hi_qw3 = hi_qw_tmpl |
+			((uint64_t)pkt[3]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
 		if (offload)
 			ice_txd_enable_offload(pkt[3], &hi_qw3);
-		uint64_t hi_qw2 =
-			hi_qw_tmpl |
-			((uint64_t)pkt[2]->data_len <<
-			 ICE_TXD_QW1_TX_BUF_SZ_S);
+		uint64_t hi_qw2 = hi_qw_tmpl |
+			((uint64_t)pkt[2]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
 		if (offload)
 			ice_txd_enable_offload(pkt[2], &hi_qw2);
-		uint64_t hi_qw1 =
-			hi_qw_tmpl |
-			((uint64_t)pkt[1]->data_len <<
-			 ICE_TXD_QW1_TX_BUF_SZ_S);
+		uint64_t hi_qw1 = hi_qw_tmpl |
+			((uint64_t)pkt[1]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
 		if (offload)
 			ice_txd_enable_offload(pkt[1], &hi_qw1);
-		uint64_t hi_qw0 =
-			hi_qw_tmpl |
-			((uint64_t)pkt[0]->data_len <<
-			 ICE_TXD_QW1_TX_BUF_SZ_S);
+		uint64_t hi_qw0 = hi_qw_tmpl |
+			((uint64_t)pkt[0]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
 		if (offload)
 			ice_txd_enable_offload(pkt[0], &hi_qw0);
 
@@ -856,7 +846,7 @@ ice_xmit_fixed_burst_vec_avx2(void *tx_queue, struct rte_mbuf **tx_pkts,
 	struct ci_tx_entry_vec *txep;
 	uint16_t n, nb_commit, tx_id;
 	uint64_t flags = ICE_TD_CMD;
-	uint64_t rs = ICE_TX_DESC_CMD_RS | ICE_TD_CMD;
+	uint64_t rs = CI_TX_DESC_CMD_RS | ICE_TD_CMD;
 
 	/* cross rx_thresh boundary is not allowed */
 	nb_pkts = RTE_MIN(nb_pkts, txq->tx_rs_thresh);
@@ -901,8 +891,7 @@ ice_xmit_fixed_burst_vec_avx2(void *tx_queue, struct rte_mbuf **tx_pkts,
 	tx_id = (uint16_t)(tx_id + nb_commit);
 	if (tx_id > txq->tx_next_rs) {
 		txq->ci_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
-			rte_cpu_to_le_64(((uint64_t)ICE_TX_DESC_CMD_RS) <<
-					 ICE_TXD_QW1_CMD_S);
+			rte_cpu_to_le_64(((uint64_t)CI_TX_DESC_CMD_RS) << CI_TXD_QW1_CMD_S);
 		txq->tx_next_rs =
 			(uint16_t)(txq->tx_next_rs + txq->tx_rs_thresh);
 	}
diff --git a/drivers/net/intel/ice/ice_rxtx_vec_avx512.c b/drivers/net/intel/ice/ice_rxtx_vec_avx512.c
index d42f41461f..9ef0777b9b 100644
--- a/drivers/net/intel/ice/ice_rxtx_vec_avx512.c
+++ b/drivers/net/intel/ice/ice_rxtx_vec_avx512.c
@@ -850,10 +850,9 @@ static __rte_always_inline void
 ice_vtx1(volatile struct ci_tx_desc *txdp,
 	 struct rte_mbuf *pkt, uint64_t flags, bool do_offload)
 {
-	uint64_t high_qw =
-		(ICE_TX_DESC_DTYPE_DATA |
-		 ((uint64_t)flags  << ICE_TXD_QW1_CMD_S) |
-		 ((uint64_t)pkt->data_len << ICE_TXD_QW1_TX_BUF_SZ_S));
+	uint64_t high_qw = (CI_TX_DESC_DTYPE_DATA |
+		 ((uint64_t)flags  << CI_TXD_QW1_CMD_S) |
+		 ((uint64_t)pkt->data_len << CI_TXD_QW1_TX_BUF_SZ_S));
 
 	if (do_offload)
 		ice_txd_enable_offload(pkt, &high_qw);
@@ -866,32 +865,23 @@ static __rte_always_inline void
 ice_vtx(volatile struct ci_tx_desc *txdp, struct rte_mbuf **pkt,
 	uint16_t nb_pkts,  uint64_t flags, bool do_offload)
 {
-	const uint64_t hi_qw_tmpl = (ICE_TX_DESC_DTYPE_DATA |
-			((uint64_t)flags  << ICE_TXD_QW1_CMD_S));
+	const uint64_t hi_qw_tmpl = (CI_TX_DESC_DTYPE_DATA | (flags  << CI_TXD_QW1_CMD_S));
 
 	for (; nb_pkts > 3; txdp += 4, pkt += 4, nb_pkts -= 4) {
-		uint64_t hi_qw3 =
-			hi_qw_tmpl |
-			((uint64_t)pkt[3]->data_len <<
-			 ICE_TXD_QW1_TX_BUF_SZ_S);
+		uint64_t hi_qw3 = hi_qw_tmpl |
+			((uint64_t)pkt[3]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
 		if (do_offload)
 			ice_txd_enable_offload(pkt[3], &hi_qw3);
-		uint64_t hi_qw2 =
-			hi_qw_tmpl |
-			((uint64_t)pkt[2]->data_len <<
-			 ICE_TXD_QW1_TX_BUF_SZ_S);
+		uint64_t hi_qw2 = hi_qw_tmpl |
+			((uint64_t)pkt[2]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
 		if (do_offload)
 			ice_txd_enable_offload(pkt[2], &hi_qw2);
-		uint64_t hi_qw1 =
-			hi_qw_tmpl |
-			((uint64_t)pkt[1]->data_len <<
-			 ICE_TXD_QW1_TX_BUF_SZ_S);
+		uint64_t hi_qw1 = hi_qw_tmpl |
+			((uint64_t)pkt[1]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
 		if (do_offload)
 			ice_txd_enable_offload(pkt[1], &hi_qw1);
-		uint64_t hi_qw0 =
-			hi_qw_tmpl |
-			((uint64_t)pkt[0]->data_len <<
-			 ICE_TXD_QW1_TX_BUF_SZ_S);
+		uint64_t hi_qw0 = hi_qw_tmpl |
+			((uint64_t)pkt[0]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
 		if (do_offload)
 			ice_txd_enable_offload(pkt[0], &hi_qw0);
 
@@ -920,7 +910,7 @@ ice_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pkts,
 	struct ci_tx_entry_vec *txep;
 	uint16_t n, nb_commit, tx_id;
 	uint64_t flags = ICE_TD_CMD;
-	uint64_t rs = ICE_TX_DESC_CMD_RS | ICE_TD_CMD;
+	uint64_t rs = CI_TX_DESC_CMD_RS | ICE_TD_CMD;
 
 	/* cross rx_thresh boundary is not allowed */
 	nb_pkts = RTE_MIN(nb_pkts, txq->tx_rs_thresh);
@@ -966,8 +956,7 @@ ice_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pkts,
 	tx_id = (uint16_t)(tx_id + nb_commit);
 	if (tx_id > txq->tx_next_rs) {
 		txq->ci_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
-			rte_cpu_to_le_64(((uint64_t)ICE_TX_DESC_CMD_RS) <<
-					 ICE_TXD_QW1_CMD_S);
+			rte_cpu_to_le_64(((uint64_t)CI_TX_DESC_CMD_RS) << CI_TXD_QW1_CMD_S);
 		txq->tx_next_rs =
 			(uint16_t)(txq->tx_next_rs + txq->tx_rs_thresh);
 	}
diff --git a/drivers/net/intel/ice/ice_rxtx_vec_common.h b/drivers/net/intel/ice/ice_rxtx_vec_common.h
index 8ba591e403..1d83a087cc 100644
--- a/drivers/net/intel/ice/ice_rxtx_vec_common.h
+++ b/drivers/net/intel/ice/ice_rxtx_vec_common.h
@@ -12,8 +12,8 @@ static inline int
 ice_tx_desc_done(struct ci_tx_queue *txq, uint16_t idx)
 {
 	return (txq->ci_tx_ring[idx].cmd_type_offset_bsz &
-			rte_cpu_to_le_64(ICE_TXD_QW1_DTYPE_M)) ==
-				rte_cpu_to_le_64(ICE_TX_DESC_DTYPE_DESC_DONE);
+			rte_cpu_to_le_64(CI_TXD_QW1_DTYPE_M)) ==
+				rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE);
 }
 
 static inline void
@@ -124,53 +124,52 @@ ice_txd_enable_offload(struct rte_mbuf *tx_pkt,
 	/* Tx Checksum Offload */
 	/* SET MACLEN */
 	td_offset |= (tx_pkt->l2_len >> 1) <<
-		ICE_TX_DESC_LEN_MACLEN_S;
+		CI_TX_DESC_LEN_MACLEN_S;
 
 	/* Enable L3 checksum offload */
 	if (ol_flags & RTE_MBUF_F_TX_IP_CKSUM) {
-		td_cmd |= ICE_TX_DESC_CMD_IIPT_IPV4_CSUM;
+		td_cmd |= CI_TX_DESC_CMD_IIPT_IPV4_CSUM;
 		td_offset |= (tx_pkt->l3_len >> 2) <<
-			ICE_TX_DESC_LEN_IPLEN_S;
+			CI_TX_DESC_LEN_IPLEN_S;
 	} else if (ol_flags & RTE_MBUF_F_TX_IPV4) {
-		td_cmd |= ICE_TX_DESC_CMD_IIPT_IPV4;
+		td_cmd |= CI_TX_DESC_CMD_IIPT_IPV4;
 		td_offset |= (tx_pkt->l3_len >> 2) <<
-			ICE_TX_DESC_LEN_IPLEN_S;
+			CI_TX_DESC_LEN_IPLEN_S;
 	} else if (ol_flags & RTE_MBUF_F_TX_IPV6) {
-		td_cmd |= ICE_TX_DESC_CMD_IIPT_IPV6;
+		td_cmd |= CI_TX_DESC_CMD_IIPT_IPV6;
 		td_offset |= (tx_pkt->l3_len >> 2) <<
-			ICE_TX_DESC_LEN_IPLEN_S;
+			CI_TX_DESC_LEN_IPLEN_S;
 	}
 
 	/* Enable L4 checksum offloads */
 	switch (ol_flags & RTE_MBUF_F_TX_L4_MASK) {
 	case RTE_MBUF_F_TX_TCP_CKSUM:
-		td_cmd |= ICE_TX_DESC_CMD_L4T_EOFT_TCP;
+		td_cmd |= CI_TX_DESC_CMD_L4T_EOFT_TCP;
 		td_offset |= (sizeof(struct rte_tcp_hdr) >> 2) <<
-			ICE_TX_DESC_LEN_L4_LEN_S;
+			CI_TX_DESC_LEN_L4_LEN_S;
 		break;
 	case RTE_MBUF_F_TX_SCTP_CKSUM:
-		td_cmd |= ICE_TX_DESC_CMD_L4T_EOFT_SCTP;
+		td_cmd |= CI_TX_DESC_CMD_L4T_EOFT_SCTP;
 		td_offset |= (sizeof(struct rte_sctp_hdr) >> 2) <<
-			ICE_TX_DESC_LEN_L4_LEN_S;
+			CI_TX_DESC_LEN_L4_LEN_S;
 		break;
 	case RTE_MBUF_F_TX_UDP_CKSUM:
-		td_cmd |= ICE_TX_DESC_CMD_L4T_EOFT_UDP;
+		td_cmd |= CI_TX_DESC_CMD_L4T_EOFT_UDP;
 		td_offset |= (sizeof(struct rte_udp_hdr) >> 2) <<
-			ICE_TX_DESC_LEN_L4_LEN_S;
+			CI_TX_DESC_LEN_L4_LEN_S;
 		break;
 	default:
 		break;
 	}
 
-	*txd_hi |= ((uint64_t)td_offset) << ICE_TXD_QW1_OFFSET_S;
+	*txd_hi |= ((uint64_t)td_offset) << CI_TXD_QW1_OFFSET_S;
 
-	/* Tx VLAN insertion Offload */
+	/* Tx VLAN/QINQ insertion Offload */
 	if (ol_flags & RTE_MBUF_F_TX_VLAN) {
-		td_cmd |= ICE_TX_DESC_CMD_IL2TAG1;
-		*txd_hi |= ((uint64_t)tx_pkt->vlan_tci <<
-				ICE_TXD_QW1_L2TAG1_S);
+		td_cmd |= CI_TX_DESC_CMD_IL2TAG1;
+		*txd_hi |= ((uint64_t)tx_pkt->vlan_tci << CI_TXD_QW1_L2TAG1_S);
 	}
 
-	*txd_hi |= ((uint64_t)td_cmd) << ICE_TXD_QW1_CMD_S;
+	*txd_hi |= ((uint64_t)td_cmd) << CI_TXD_QW1_CMD_S;
 }
 #endif
diff --git a/drivers/net/intel/ice/ice_rxtx_vec_sse.c b/drivers/net/intel/ice/ice_rxtx_vec_sse.c
index c65240d659..c4920a1360 100644
--- a/drivers/net/intel/ice/ice_rxtx_vec_sse.c
+++ b/drivers/net/intel/ice/ice_rxtx_vec_sse.c
@@ -599,10 +599,9 @@ static inline void
 ice_vtx1(volatile struct ci_tx_desc *txdp, struct rte_mbuf *pkt,
 	 uint64_t flags)
 {
-	uint64_t high_qw =
-		(ICE_TX_DESC_DTYPE_DATA |
-		 ((uint64_t)flags  << ICE_TXD_QW1_CMD_S) |
-		 ((uint64_t)pkt->data_len << ICE_TXD_QW1_TX_BUF_SZ_S));
+	uint64_t high_qw = (CI_TX_DESC_DTYPE_DATA |
+		 ((uint64_t)flags  << CI_TXD_QW1_CMD_S) |
+		 ((uint64_t)pkt->data_len << CI_TXD_QW1_TX_BUF_SZ_S));
 
 	__m128i descriptor = _mm_set_epi64x(high_qw, rte_pktmbuf_iova(pkt));
 	_mm_store_si128(RTE_CAST_PTR(__m128i *, txdp), descriptor);
@@ -627,7 +626,7 @@ ice_xmit_fixed_burst_vec(void *tx_queue, struct rte_mbuf **tx_pkts,
 	struct ci_tx_entry_vec *txep;
 	uint16_t n, nb_commit, tx_id;
 	uint64_t flags = ICE_TD_CMD;
-	uint64_t rs = ICE_TX_DESC_CMD_RS | ICE_TD_CMD;
+	uint64_t rs = CI_TX_DESC_CMD_RS | ICE_TD_CMD;
 	int i;
 
 	/* cross rx_thresh boundary is not allowed */
@@ -673,8 +672,7 @@ ice_xmit_fixed_burst_vec(void *tx_queue, struct rte_mbuf **tx_pkts,
 	tx_id = (uint16_t)(tx_id + nb_commit);
 	if (tx_id > txq->tx_next_rs) {
 		txq->ci_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
-			rte_cpu_to_le_64(((uint64_t)ICE_TX_DESC_CMD_RS) <<
-					 ICE_TXD_QW1_CMD_S);
+			rte_cpu_to_le_64(((uint64_t)CI_TX_DESC_CMD_RS) << CI_TXD_QW1_CMD_S);
 		txq->tx_next_rs =
 			(uint16_t)(txq->tx_next_rs + txq->tx_rs_thresh);
 	}
diff --git a/drivers/net/intel/idpf/idpf_common_rxtx.c b/drivers/net/intel/idpf/idpf_common_rxtx.c
index 23666539ab..587871b54a 100644
--- a/drivers/net/intel/idpf/idpf_common_rxtx.c
+++ b/drivers/net/intel/idpf/idpf_common_rxtx.c
@@ -271,7 +271,7 @@ idpf_qc_single_tx_queue_reset(struct ci_tx_queue *txq)
 	prev = (uint16_t)(txq->nb_tx_desc - 1);
 	for (i = 0; i < txq->nb_tx_desc; i++) {
 		txq->ci_tx_ring[i].cmd_type_offset_bsz =
-			rte_cpu_to_le_64(IDPF_TX_DESC_DTYPE_DESC_DONE);
+			rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE);
 		txe[i].mbuf =  NULL;
 		txe[i].last_id = i;
 		txe[prev].next_id = i;
@@ -849,7 +849,7 @@ idpf_calc_context_desc(uint64_t flags)
  */
 static inline void
 idpf_set_splitq_tso_ctx(struct rte_mbuf *mbuf,
-			union idpf_tx_offload tx_offload,
+			union ci_tx_offload tx_offload,
 			volatile union idpf_flex_tx_ctx_desc *ctx_desc)
 {
 	uint16_t cmd_dtype;
@@ -887,7 +887,7 @@ idpf_dp_splitq_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 	volatile struct idpf_flex_tx_sched_desc *txr;
 	volatile struct idpf_flex_tx_sched_desc *txd;
 	struct ci_tx_entry *sw_ring;
-	union idpf_tx_offload tx_offload = {0};
+	union ci_tx_offload tx_offload = {0};
 	struct ci_tx_entry *txe, *txn;
 	uint16_t nb_used, tx_id, sw_id;
 	struct rte_mbuf *tx_pkt;
@@ -1334,7 +1334,7 @@ idpf_dp_singleq_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 {
 	volatile struct ci_tx_desc *txd;
 	volatile struct ci_tx_desc *txr;
-	union idpf_tx_offload tx_offload = {0};
+	union ci_tx_offload tx_offload = {0};
 	struct ci_tx_entry *txe, *txn;
 	struct ci_tx_entry *sw_ring;
 	struct ci_tx_queue *txq;
@@ -1452,10 +1452,10 @@ idpf_dp_singleq_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 			slen = m_seg->data_len;
 			buf_dma_addr = rte_mbuf_data_iova(m_seg);
 			txd->buffer_addr = rte_cpu_to_le_64(buf_dma_addr);
-			txd->cmd_type_offset_bsz = rte_cpu_to_le_64(IDPF_TX_DESC_DTYPE_DATA |
-				((uint64_t)td_cmd  << IDPF_TXD_QW1_CMD_S) |
-				((uint64_t)td_offset << IDPF_TXD_QW1_OFFSET_S) |
-				((uint64_t)slen << IDPF_TXD_QW1_TX_BUF_SZ_S));
+			txd->cmd_type_offset_bsz = rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DATA |
+				((uint64_t)td_cmd  << CI_TXD_QW1_CMD_S) |
+				((uint64_t)td_offset << CI_TXD_QW1_OFFSET_S) |
+				((uint64_t)slen << CI_TXD_QW1_TX_BUF_SZ_S));
 
 			txe->last_id = tx_last;
 			tx_id = txe->next_id;
@@ -1464,7 +1464,7 @@ idpf_dp_singleq_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 		} while (m_seg);
 
 		/* The last packet data descriptor needs End Of Packet (EOP) */
-		td_cmd |= IDPF_TX_DESC_CMD_EOP;
+		td_cmd |= CI_TX_DESC_CMD_EOP;
 		txq->nb_tx_used = (uint16_t)(txq->nb_tx_used + nb_used);
 		txq->nb_tx_free = (uint16_t)(txq->nb_tx_free - nb_used);
 
@@ -1473,13 +1473,13 @@ idpf_dp_singleq_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 			       "%4u (port=%d queue=%d)",
 			       tx_last, txq->port_id, txq->queue_id);
 
-			td_cmd |= IDPF_TX_DESC_CMD_RS;
+			td_cmd |= CI_TX_DESC_CMD_RS;
 
 			/* Update txq RS bit counters */
 			txq->nb_tx_used = 0;
 		}
 
-		txd->cmd_type_offset_bsz |= rte_cpu_to_le_16(td_cmd << IDPF_TXD_QW1_CMD_S);
+		txd->cmd_type_offset_bsz |= rte_cpu_to_le_64(((uint64_t)td_cmd) << CI_TXD_QW1_CMD_S);
 	}
 
 end_of_tx:
diff --git a/drivers/net/intel/idpf/idpf_common_rxtx.h b/drivers/net/intel/idpf/idpf_common_rxtx.h
index 2f2fa153b2..b88a87402d 100644
--- a/drivers/net/intel/idpf/idpf_common_rxtx.h
+++ b/drivers/net/intel/idpf/idpf_common_rxtx.h
@@ -169,18 +169,6 @@ struct idpf_rx_queue {
 	uint32_t hw_register_set;
 };
 
-/* Offload features */
-union idpf_tx_offload {
-	uint64_t data;
-	struct {
-		uint64_t l2_len:7; /* L2 (MAC) Header Length. */
-		uint64_t l3_len:9; /* L3 (IP) Header Length. */
-		uint64_t l4_len:8; /* L4 Header Length. */
-		uint64_t tso_segsz:16; /* TCP TSO segment size */
-		/* uint64_t unused : 24; */
-	};
-};
-
 union idpf_tx_desc {
 	struct ci_tx_desc *tx_ring;
 	struct idpf_flex_tx_sched_desc *desc_ring;
diff --git a/drivers/net/intel/idpf/idpf_common_rxtx_avx2.c b/drivers/net/intel/idpf/idpf_common_rxtx_avx2.c
index 04efee3722..b6bf7fca76 100644
--- a/drivers/net/intel/idpf/idpf_common_rxtx_avx2.c
+++ b/drivers/net/intel/idpf/idpf_common_rxtx_avx2.c
@@ -486,10 +486,9 @@ static inline void
 idpf_singleq_vtx1(volatile struct ci_tx_desc *txdp,
 		  struct rte_mbuf *pkt, uint64_t flags)
 {
-	uint64_t high_qw =
-		(IDPF_TX_DESC_DTYPE_DATA |
-		 ((uint64_t)flags  << IDPF_TXD_QW1_CMD_S) |
-		 ((uint64_t)pkt->data_len << IDPF_TXD_QW1_TX_BUF_SZ_S));
+	uint64_t high_qw = (CI_TX_DESC_DTYPE_DATA |
+		 ((uint64_t)flags  << CI_TXD_QW1_CMD_S) |
+		 ((uint64_t)pkt->data_len << CI_TXD_QW1_TX_BUF_SZ_S));
 
 	__m128i descriptor = _mm_set_epi64x(high_qw,
 				pkt->buf_iova + pkt->data_off);
@@ -500,8 +499,7 @@ static inline void
 idpf_singleq_vtx(volatile struct ci_tx_desc *txdp,
 		 struct rte_mbuf **pkt, uint16_t nb_pkts,  uint64_t flags)
 {
-	const uint64_t hi_qw_tmpl = (IDPF_TX_DESC_DTYPE_DATA |
-			((uint64_t)flags  << IDPF_TXD_QW1_CMD_S));
+	const uint64_t hi_qw_tmpl = (CI_TX_DESC_DTYPE_DATA | (flags  << CI_TXD_QW1_CMD_S));
 
 	/* if unaligned on 32-bit boundary, do one to align */
 	if (((uintptr_t)txdp & 0x1F) != 0 && nb_pkts != 0) {
@@ -511,22 +509,14 @@ idpf_singleq_vtx(volatile struct ci_tx_desc *txdp,
 
-	/* do two at a time while possible, in bursts */
+	/* do four at a time while possible, in bursts */
 	for (; nb_pkts > 3; txdp += 4, pkt += 4, nb_pkts -= 4) {
-		uint64_t hi_qw3 =
-			hi_qw_tmpl |
-			((uint64_t)pkt[3]->data_len <<
-			 IDPF_TXD_QW1_TX_BUF_SZ_S);
-		uint64_t hi_qw2 =
-			hi_qw_tmpl |
-			((uint64_t)pkt[2]->data_len <<
-			 IDPF_TXD_QW1_TX_BUF_SZ_S);
-		uint64_t hi_qw1 =
-			hi_qw_tmpl |
-			((uint64_t)pkt[1]->data_len <<
-			 IDPF_TXD_QW1_TX_BUF_SZ_S);
-		uint64_t hi_qw0 =
-			hi_qw_tmpl |
-			((uint64_t)pkt[0]->data_len <<
-			 IDPF_TXD_QW1_TX_BUF_SZ_S);
+		uint64_t hi_qw3 = hi_qw_tmpl |
+			((uint64_t)pkt[3]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
+		uint64_t hi_qw2 = hi_qw_tmpl |
+			((uint64_t)pkt[2]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
+		uint64_t hi_qw1 = hi_qw_tmpl |
+			((uint64_t)pkt[1]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
+		uint64_t hi_qw0 = hi_qw_tmpl |
+			((uint64_t)pkt[0]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
 
 		__m256i desc2_3 =
 			_mm256_set_epi64x
@@ -559,8 +549,8 @@ idpf_singleq_xmit_fixed_burst_vec_avx2(void *tx_queue, struct rte_mbuf **tx_pkts
 	volatile struct ci_tx_desc *txdp;
 	struct ci_tx_entry_vec *txep;
 	uint16_t n, nb_commit, tx_id;
-	uint64_t flags = IDPF_TX_DESC_CMD_EOP;
-	uint64_t rs = IDPF_TX_DESC_CMD_RS | flags;
+	uint64_t flags = CI_TX_DESC_CMD_EOP;
+	uint64_t rs = CI_TX_DESC_CMD_RS | flags;
 
 	/* cross rx_thresh boundary is not allowed */
 	nb_pkts = RTE_MIN(nb_pkts, txq->tx_rs_thresh);
@@ -605,8 +595,7 @@ idpf_singleq_xmit_fixed_burst_vec_avx2(void *tx_queue, struct rte_mbuf **tx_pkts
 	tx_id = (uint16_t)(tx_id + nb_commit);
 	if (tx_id > txq->tx_next_rs) {
 		txq->ci_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
-			rte_cpu_to_le_64(((uint64_t)IDPF_TX_DESC_CMD_RS) <<
-					 IDPF_TXD_QW1_CMD_S);
+			rte_cpu_to_le_64(((uint64_t)CI_TX_DESC_CMD_RS) << CI_TXD_QW1_CMD_S);
 		txq->tx_next_rs =
 			(uint16_t)(txq->tx_next_rs + txq->tx_rs_thresh);
 	}
diff --git a/drivers/net/intel/idpf/idpf_common_rxtx_avx512.c b/drivers/net/intel/idpf/idpf_common_rxtx_avx512.c
index d5e5a2ca5f..fcdec3a4d5 100644
--- a/drivers/net/intel/idpf/idpf_common_rxtx_avx512.c
+++ b/drivers/net/intel/idpf/idpf_common_rxtx_avx512.c
@@ -1003,10 +1003,9 @@ static __rte_always_inline void
 idpf_singleq_vtx1(volatile struct ci_tx_desc *txdp,
 	  struct rte_mbuf *pkt, uint64_t flags)
 {
-	uint64_t high_qw =
-		(IDPF_TX_DESC_DTYPE_DATA |
-		 ((uint64_t)flags  << IDPF_TXD_QW1_CMD_S) |
-		 ((uint64_t)pkt->data_len << IDPF_TXD_QW1_TX_BUF_SZ_S));
+	uint64_t high_qw = (CI_TX_DESC_DTYPE_DATA |
+		 ((uint64_t)flags  << CI_TXD_QW1_CMD_S) |
+		 ((uint64_t)pkt->data_len << CI_TXD_QW1_TX_BUF_SZ_S));
 
 	__m128i descriptor = _mm_set_epi64x(high_qw,
 					    pkt->buf_iova + pkt->data_off);
@@ -1019,8 +1018,7 @@ static __rte_always_inline void
 idpf_singleq_vtx(volatile struct ci_tx_desc *txdp,
 	 struct rte_mbuf **pkt, uint16_t nb_pkts,  uint64_t flags)
 {
-	const uint64_t hi_qw_tmpl = (IDPF_TX_DESC_DTYPE_DATA  |
-			((uint64_t)flags  << IDPF_TXD_QW1_CMD_S));
+	const uint64_t hi_qw_tmpl = (CI_TX_DESC_DTYPE_DATA  | (flags  << CI_TXD_QW1_CMD_S));
 
 	/* if unaligned on 32-bit boundary, do one to align */
 	if (((uintptr_t)txdp & 0x1F) != 0 && nb_pkts != 0) {
@@ -1030,22 +1028,14 @@ idpf_singleq_vtx(volatile struct ci_tx_desc *txdp,
 
 	/* do 4 at a time while possible, in bursts */
 	for (; nb_pkts > 3; txdp += 4, pkt += 4, nb_pkts -= 4) {
-		uint64_t hi_qw3 =
-			hi_qw_tmpl |
-			((uint64_t)pkt[3]->data_len <<
-			 IDPF_TXD_QW1_TX_BUF_SZ_S);
-		uint64_t hi_qw2 =
-			hi_qw_tmpl |
-			((uint64_t)pkt[2]->data_len <<
-			 IDPF_TXD_QW1_TX_BUF_SZ_S);
-		uint64_t hi_qw1 =
-			hi_qw_tmpl |
-			((uint64_t)pkt[1]->data_len <<
-			 IDPF_TXD_QW1_TX_BUF_SZ_S);
-		uint64_t hi_qw0 =
-			hi_qw_tmpl |
-			((uint64_t)pkt[0]->data_len <<
-			 IDPF_TXD_QW1_TX_BUF_SZ_S);
+		uint64_t hi_qw3 = hi_qw_tmpl |
+			((uint64_t)pkt[3]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
+		uint64_t hi_qw2 = hi_qw_tmpl |
+			((uint64_t)pkt[2]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
+		uint64_t hi_qw1 = hi_qw_tmpl |
+			((uint64_t)pkt[1]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
+		uint64_t hi_qw0 = hi_qw_tmpl |
+			((uint64_t)pkt[0]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
 
 		__m512i desc0_3 =
 			_mm512_set_epi64
@@ -1075,8 +1065,8 @@ idpf_singleq_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pk
 	volatile struct ci_tx_desc *txdp;
 	struct ci_tx_entry_vec *txep;
 	uint16_t n, nb_commit, tx_id;
-	uint64_t flags = IDPF_TX_DESC_CMD_EOP;
-	uint64_t rs = IDPF_TX_DESC_CMD_RS | flags;
+	uint64_t flags = CI_TX_DESC_CMD_EOP;
+	uint64_t rs = CI_TX_DESC_CMD_RS | flags;
 
 	/* cross rx_thresh boundary is not allowed */
 	nb_pkts = RTE_MIN(nb_pkts, txq->tx_rs_thresh);
@@ -1124,8 +1114,7 @@ idpf_singleq_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pk
 	tx_id = (uint16_t)(tx_id + nb_commit);
 	if (tx_id > txq->tx_next_rs) {
 		txq->ci_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
-			rte_cpu_to_le_64(((uint64_t)IDPF_TX_DESC_CMD_RS) <<
-					 IDPF_TXD_QW1_CMD_S);
+			rte_cpu_to_le_64(((uint64_t)CI_TX_DESC_CMD_RS) << CI_TXD_QW1_CMD_S);
 		txq->tx_next_rs =
 			(uint16_t)(txq->tx_next_rs + txq->tx_rs_thresh);
 	}
diff --git a/drivers/net/intel/idpf/idpf_rxtx_vec_common.h b/drivers/net/intel/idpf/idpf_rxtx_vec_common.h
index b5e8574667..a43d8f78e2 100644
--- a/drivers/net/intel/idpf/idpf_rxtx_vec_common.h
+++ b/drivers/net/intel/idpf/idpf_rxtx_vec_common.h
@@ -32,8 +32,8 @@ idpf_tx_desc_done(struct ci_tx_queue *txq, uint16_t idx)
 		return 1;
 
 	return (txq->ci_tx_ring[idx].cmd_type_offset_bsz &
-			rte_cpu_to_le_64(IDPF_TXD_QW1_DTYPE_M)) ==
-				rte_cpu_to_le_64(IDPF_TX_DESC_DTYPE_DESC_DONE);
+			rte_cpu_to_le_64(CI_TXD_QW1_DTYPE_M)) ==
+				rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE);
 }
 
 static inline int
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [RFC PATCH 05/27] net/intel: create separate header for Tx scalar fns
  2025-12-19 17:25 [RFC PATCH 00/27] combine multiple Intel scalar Tx paths Bruce Richardson
                   ` (3 preceding siblings ...)
  2025-12-19 17:25 ` [RFC PATCH 04/27] net/intel: consolidate definitions for Tx desc fields Bruce Richardson
@ 2025-12-19 17:25 ` Bruce Richardson
  2025-12-19 17:25 ` [RFC PATCH 06/27] net/intel: add common fn to calculate needed descriptors Bruce Richardson
                   ` (25 subsequent siblings)
  30 siblings, 0 replies; 274+ messages in thread
From: Bruce Richardson @ 2025-12-19 17:25 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson

Rather than having all Tx code in the one file, which could start
getting rather long, move the scalar datapath functions to a new header
file.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/net/intel/common/tx.h            | 58 ++------------------
 drivers/net/intel/common/tx_scalar_fns.h | 67 ++++++++++++++++++++++++
 2 files changed, 72 insertions(+), 53 deletions(-)
 create mode 100644 drivers/net/intel/common/tx_scalar_fns.h

diff --git a/drivers/net/intel/common/tx.h b/drivers/net/intel/common/tx.h
index 3d3d9ad8e3..320ab0b8e0 100644
--- a/drivers/net/intel/common/tx.h
+++ b/drivers/net/intel/common/tx.h
@@ -309,59 +309,6 @@ ci_tx_free_bufs_vec(struct ci_tx_queue *txq, ci_desc_done_fn desc_done, bool ctx
 	return txq->tx_rs_thresh;
 }
 
-/*
- * Common transmit descriptor cleanup function for Intel drivers.
- * Used by ice, i40e, iavf, and idpf drivers.
- *
- * Returns:
- *   0 on success
- *  -1 if cleanup cannot proceed (descriptors not yet processed by HW)
- */
-static __rte_always_inline int
-ci_tx_xmit_cleanup(struct ci_tx_queue *txq)
-{
-	struct ci_tx_entry *sw_ring = txq->sw_ring;
-	volatile struct ci_tx_desc *txd = txq->ci_tx_ring;
-	uint16_t last_desc_cleaned = txq->last_desc_cleaned;
-	uint16_t nb_tx_desc = txq->nb_tx_desc;
-	uint16_t desc_to_clean_to;
-	uint16_t nb_tx_to_clean;
-
-	/* Determine the last descriptor needing to be cleaned */
-	desc_to_clean_to = (uint16_t)(last_desc_cleaned + txq->tx_rs_thresh);
-	if (desc_to_clean_to >= nb_tx_desc)
-		desc_to_clean_to = (uint16_t)(desc_to_clean_to - nb_tx_desc);
-
-	/* Check to make sure the last descriptor to clean is done */
-	desc_to_clean_to = sw_ring[desc_to_clean_to].last_id;
-
-	/* Check if descriptor is done - all drivers use 0xF as done value in bits 3:0 */
-	if ((txd[desc_to_clean_to].cmd_type_offset_bsz & rte_cpu_to_le_64(CI_TXD_QW1_DTYPE_M)) !=
-			rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE)) {
-		/* Descriptor not yet processed by hardware */
-		return -1;
-	}
-
-	/* Figure out how many descriptors will be cleaned */
-	if (last_desc_cleaned > desc_to_clean_to)
-		nb_tx_to_clean = (uint16_t)((nb_tx_desc - last_desc_cleaned) + desc_to_clean_to);
-	else
-		nb_tx_to_clean = (uint16_t)(desc_to_clean_to - last_desc_cleaned);
-
-	/* The last descriptor to clean is done, so that means all the
-	 * descriptors from the last descriptor that was cleaned
-	 * up to the last descriptor with the RS bit set
-	 * are done. Only reset the threshold descriptor.
-	 */
-	txd[desc_to_clean_to].cmd_type_offset_bsz = 0;
-
-	/* Update the txq to reflect the last descriptor that was cleaned */
-	txq->last_desc_cleaned = desc_to_clean_to;
-	txq->nb_tx_free = (uint16_t)(txq->nb_tx_free + nb_tx_to_clean);
-
-	return 0;
-}
-
 static inline void
 ci_txq_release_all_mbufs(struct ci_tx_queue *txq, bool use_ctx)
 {
@@ -480,4 +427,9 @@ ci_tx_path_select(const struct ci_tx_path_features *req_features,
 	return idx;
 }
 
+/* include the scalar functions at the end, so they can use the common definitions.
+ * This is done so drivers can use all functions just by including tx.h
+ */
+#include "tx_scalar_fns.h"
+
 #endif /* _COMMON_INTEL_TX_H_ */
diff --git a/drivers/net/intel/common/tx_scalar_fns.h b/drivers/net/intel/common/tx_scalar_fns.h
new file mode 100644
index 0000000000..c79210d084
--- /dev/null
+++ b/drivers/net/intel/common/tx_scalar_fns.h
@@ -0,0 +1,67 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2025 Intel Corporation
+ */
+
+#ifndef _COMMON_INTEL_TX_SCALAR_FNS_H_
+#define _COMMON_INTEL_TX_SCALAR_FNS_H_
+
+#include <stdint.h>
+#include <rte_byteorder.h>
+
+/* depends on common Tx definitions. */
+#include "tx.h"
+
+/*
+ * Common transmit descriptor cleanup function for Intel drivers.
+ * Used by ice, i40e, iavf, and idpf drivers.
+ *
+ * Returns:
+ *   0 on success
+ *  -1 if cleanup cannot proceed (descriptors not yet processed by HW)
+ */
+static __rte_always_inline int
+ci_tx_xmit_cleanup(struct ci_tx_queue *txq)
+{
+	struct ci_tx_entry *sw_ring = txq->sw_ring;
+	volatile struct ci_tx_desc *txd = txq->ci_tx_ring;
+	uint16_t last_desc_cleaned = txq->last_desc_cleaned;
+	uint16_t nb_tx_desc = txq->nb_tx_desc;
+	uint16_t desc_to_clean_to;
+	uint16_t nb_tx_to_clean;
+
+	/* Determine the last descriptor needing to be cleaned */
+	desc_to_clean_to = (uint16_t)(last_desc_cleaned + txq->tx_rs_thresh);
+	if (desc_to_clean_to >= nb_tx_desc)
+		desc_to_clean_to = (uint16_t)(desc_to_clean_to - nb_tx_desc);
+
+	/* Check to make sure the last descriptor to clean is done */
+	desc_to_clean_to = sw_ring[desc_to_clean_to].last_id;
+
+	/* Check if descriptor is done - all drivers use 0xF as done value in bits 3:0 */
+	if ((txd[desc_to_clean_to].cmd_type_offset_bsz & rte_cpu_to_le_64(CI_TXD_QW1_DTYPE_M)) !=
+			rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE)) {
+		/* Descriptor not yet processed by hardware */
+		return -1;
+	}
+
+	/* Figure out how many descriptors will be cleaned */
+	if (last_desc_cleaned > desc_to_clean_to)
+		nb_tx_to_clean = (uint16_t)((nb_tx_desc - last_desc_cleaned) + desc_to_clean_to);
+	else
+		nb_tx_to_clean = (uint16_t)(desc_to_clean_to - last_desc_cleaned);
+
+	/* The last descriptor to clean is done, so that means all the
+	 * descriptors from the last descriptor that was cleaned
+	 * up to the last descriptor with the RS bit set
+	 * are done. Only reset the threshold descriptor.
+	 */
+	txd[desc_to_clean_to].cmd_type_offset_bsz = 0;
+
+	/* Update the txq to reflect the last descriptor that was cleaned */
+	txq->last_desc_cleaned = desc_to_clean_to;
+	txq->nb_tx_free = (uint16_t)(txq->nb_tx_free + nb_tx_to_clean);
+
+	return 0;
+}
+
+#endif /* _COMMON_INTEL_TX_SCALAR_FNS_H_ */
-- 
2.51.0




* [RFC PATCH 06/27] net/intel: add common fn to calculate needed descriptors
  2025-12-19 17:25 [RFC PATCH 00/27] combine multiple Intel scalar Tx paths Bruce Richardson
                   ` (4 preceding siblings ...)
  2025-12-19 17:25 ` [RFC PATCH 05/27] net/intel: create separate header for Tx scalar fns Bruce Richardson
@ 2025-12-19 17:25 ` Bruce Richardson
  2025-12-19 17:25 ` [RFC PATCH 07/27] net/ice: refactor context descriptor handling Bruce Richardson
                   ` (24 subsequent siblings)
  30 siblings, 0 replies; 274+ messages in thread
From: Bruce Richardson @ 2025-12-19 17:25 UTC (permalink / raw)
  To: dev
  Cc: Bruce Richardson, Vladimir Medvedkin, Anatoly Burakov,
	Jingjing Wu, Praveen Shetty

Multiple drivers used the same logic to calculate how many Tx data
descriptors were needed. Move that calculation to common code. In the
process of updating the drivers, also fix the idpf driver's calculation
for the TSO case.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/net/intel/common/tx_scalar_fns.h  | 21 +++++++++++++++++++++
 drivers/net/intel/i40e/i40e_rxtx.c        | 18 +-----------------
 drivers/net/intel/iavf/iavf_rxtx.c        | 17 +----------------
 drivers/net/intel/ice/ice_rxtx.c          | 18 +-----------------
 drivers/net/intel/idpf/idpf_common_rxtx.c | 21 +++++++++++++++++----
 5 files changed, 41 insertions(+), 54 deletions(-)

diff --git a/drivers/net/intel/common/tx_scalar_fns.h b/drivers/net/intel/common/tx_scalar_fns.h
index c79210d084..f894cea616 100644
--- a/drivers/net/intel/common/tx_scalar_fns.h
+++ b/drivers/net/intel/common/tx_scalar_fns.h
@@ -64,4 +64,25 @@ ci_tx_xmit_cleanup(struct ci_tx_queue *txq)
 	return 0;
 }
 
+static inline uint16_t
+ci_div_roundup16(uint16_t x, uint16_t y)
+{
+	return (uint16_t)((x + y - 1) / y);
+}
+
+/* Calculate the number of TX descriptors needed for each pkt */
+static inline uint16_t
+ci_calc_pkt_desc(const struct rte_mbuf *tx_pkt)
+{
+	uint16_t count = 0;
+
+	while (tx_pkt != NULL) {
+		count += ci_div_roundup16(tx_pkt->data_len, CI_MAX_DATA_PER_TXD);
+		tx_pkt = tx_pkt->next;
+	}
+
+	return count;
+}
+
+
 #endif /* _COMMON_INTEL_TX_SCALAR_FNS_H_ */
diff --git a/drivers/net/intel/i40e/i40e_rxtx.c b/drivers/net/intel/i40e/i40e_rxtx.c
index 892069372f..886be06a89 100644
--- a/drivers/net/intel/i40e/i40e_rxtx.c
+++ b/drivers/net/intel/i40e/i40e_rxtx.c
@@ -1025,21 +1025,6 @@ i40e_set_tso_ctx(struct rte_mbuf *mbuf, union ci_tx_offload tx_offload)
 	return ctx_desc;
 }
 
-/* Calculate the number of TX descriptors needed for each pkt */
-static inline uint16_t
-i40e_calc_pkt_desc(struct rte_mbuf *tx_pkt)
-{
-	struct rte_mbuf *txd = tx_pkt;
-	uint16_t count = 0;
-
-	while (txd != NULL) {
-		count += DIV_ROUND_UP(txd->data_len, CI_MAX_DATA_PER_TXD);
-		txd = txd->next;
-	}
-
-	return count;
-}
-
 uint16_t
 i40e_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 {
@@ -1102,8 +1087,7 @@ i40e_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 		 * per tx desc.
 		 */
 		if (ol_flags & RTE_MBUF_F_TX_TCP_SEG)
-			nb_used = (uint16_t)(i40e_calc_pkt_desc(tx_pkt) +
-					     nb_ctx);
+			nb_used = (uint16_t)(ci_calc_pkt_desc(tx_pkt) + nb_ctx);
 		else
 			nb_used = (uint16_t)(tx_pkt->nb_segs + nb_ctx);
 		tx_last = (uint16_t)(tx_id + nb_used - 1);
diff --git a/drivers/net/intel/iavf/iavf_rxtx.c b/drivers/net/intel/iavf/iavf_rxtx.c
index 9946e112e8..ecf954a2c2 100644
--- a/drivers/net/intel/iavf/iavf_rxtx.c
+++ b/drivers/net/intel/iavf/iavf_rxtx.c
@@ -2667,21 +2667,6 @@ iavf_build_data_desc_cmd_offset_fields(volatile uint64_t *qw1,
 		((uint64_t)l2tag1 << IAVF_TXD_DATA_QW1_L2TAG1_SHIFT));
 }
 
-/* Calculate the number of TX descriptors needed for each pkt */
-static inline uint16_t
-iavf_calc_pkt_desc(struct rte_mbuf *tx_pkt)
-{
-	struct rte_mbuf *txd = tx_pkt;
-	uint16_t count = 0;
-
-	while (txd != NULL) {
-		count += (txd->data_len + CI_MAX_DATA_PER_TXD - 1) / CI_MAX_DATA_PER_TXD;
-		txd = txd->next;
-	}
-
-	return count;
-}
-
 static inline void
 iavf_fill_data_desc(volatile struct ci_tx_desc *desc,
 	uint64_t desc_template,	uint16_t buffsz,
@@ -2767,7 +2752,7 @@ iavf_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 		 * per tx desc.
 		 */
 		if (mb->ol_flags & RTE_MBUF_F_TX_TCP_SEG)
-			nb_desc_required = iavf_calc_pkt_desc(mb) + nb_desc_ctx + nb_desc_ipsec;
+			nb_desc_required = ci_calc_pkt_desc(mb) + nb_desc_ctx + nb_desc_ipsec;
 		else
 			nb_desc_required = nb_desc_data + nb_desc_ctx + nb_desc_ipsec;
 
diff --git a/drivers/net/intel/ice/ice_rxtx.c b/drivers/net/intel/ice/ice_rxtx.c
index 5864238092..c2a38b1a13 100644
--- a/drivers/net/intel/ice/ice_rxtx.c
+++ b/drivers/net/intel/ice/ice_rxtx.c
@@ -3067,21 +3067,6 @@ ice_set_tso_ctx(struct rte_mbuf *mbuf, union ci_tx_offload tx_offload)
 	return ctx_desc;
 }
 
-/* Calculate the number of TX descriptors needed for each pkt */
-static inline uint16_t
-ice_calc_pkt_desc(struct rte_mbuf *tx_pkt)
-{
-	struct rte_mbuf *txd = tx_pkt;
-	uint16_t count = 0;
-
-	while (txd != NULL) {
-		count += DIV_ROUND_UP(txd->data_len, CI_MAX_DATA_PER_TXD);
-		txd = txd->next;
-	}
-
-	return count;
-}
-
 uint16_t
 ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 {
@@ -3144,8 +3129,7 @@ ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 		 * per tx desc.
 		 */
 		if (ol_flags & (RTE_MBUF_F_TX_TCP_SEG | RTE_MBUF_F_TX_UDP_SEG))
-			nb_used = (uint16_t)(ice_calc_pkt_desc(tx_pkt) +
-					     nb_ctx);
+			nb_used = (uint16_t)(ci_calc_pkt_desc(tx_pkt) + nb_ctx);
 		else
 			nb_used = (uint16_t)(tx_pkt->nb_segs + nb_ctx);
 		tx_last = (uint16_t)(tx_id + nb_used - 1);
diff --git a/drivers/net/intel/idpf/idpf_common_rxtx.c b/drivers/net/intel/idpf/idpf_common_rxtx.c
index 587871b54a..11d6848430 100644
--- a/drivers/net/intel/idpf/idpf_common_rxtx.c
+++ b/drivers/net/intel/idpf/idpf_common_rxtx.c
@@ -934,7 +934,16 @@ idpf_dp_splitq_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 		tx_offload.tso_segsz = tx_pkt->tso_segsz;
 		/* Calculate the number of context descriptors needed. */
 		nb_ctx = idpf_calc_context_desc(ol_flags);
-		nb_used = tx_pkt->nb_segs + nb_ctx;
+
+		/* Calculate the number of TX descriptors needed for
+		 * each packet. For TSO packets, use ci_calc_pkt_desc as
+		 * the mbuf data size might exceed max data size that hw allows
+		 * per tx desc.
+		 */
+		if (ol_flags & RTE_MBUF_F_TX_TCP_SEG)
+			nb_used = ci_calc_pkt_desc(tx_pkt) + nb_ctx;
+		else
+			nb_used = tx_pkt->nb_segs + nb_ctx;
 
 		if (ol_flags & IDPF_TX_CKSUM_OFFLOAD_MASK)
 			cmd_dtype = IDPF_TXD_FLEX_FLOW_CMD_CS_EN;
@@ -1382,10 +1391,14 @@ idpf_dp_singleq_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 		nb_ctx = idpf_calc_context_desc(ol_flags);
 
 		/* The number of descriptors that must be allocated for
-		 * a packet equals to the number of the segments of that
-		 * packet plus 1 context descriptor if needed.
+		 * a packet. For TSO packets, use ci_calc_pkt_desc as
+		 * the mbuf data size might exceed max data size that hw allows
+		 * per tx desc.
 		 */
-		nb_used = (uint16_t)(tx_pkt->nb_segs + nb_ctx);
+		if (ol_flags & RTE_MBUF_F_TX_TCP_SEG)
+			nb_used = (uint16_t)(ci_calc_pkt_desc(tx_pkt) + nb_ctx);
+		else
+			nb_used = (uint16_t)(tx_pkt->nb_segs + nb_ctx);
 		tx_last = (uint16_t)(tx_id + nb_used - 1);
 
 		/* Circular ring */
-- 
2.51.0



* [RFC PATCH 07/27] net/ice: refactor context descriptor handling
  2025-12-19 17:25 [RFC PATCH 00/27] combine multiple Intel scalar Tx paths Bruce Richardson
                   ` (5 preceding siblings ...)
  2025-12-19 17:25 ` [RFC PATCH 06/27] net/intel: add common fn to calculate needed descriptors Bruce Richardson
@ 2025-12-19 17:25 ` Bruce Richardson
  2025-12-19 17:25 ` [RFC PATCH 08/27] net/i40e: " Bruce Richardson
                   ` (23 subsequent siblings)
  30 siblings, 0 replies; 274+ messages in thread
From: Bruce Richardson @ 2025-12-19 17:25 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson, Anatoly Burakov

Create a single function to manage all context descriptor handling. It
returns 0 or 1 depending on whether a context descriptor is needed,
and, when one is, also returns the descriptor contents directly.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/net/intel/ice/ice_rxtx.c | 96 ++++++++++++++++++--------------
 1 file changed, 55 insertions(+), 41 deletions(-)

diff --git a/drivers/net/intel/ice/ice_rxtx.c b/drivers/net/intel/ice/ice_rxtx.c
index c2a38b1a13..b90a1b4ec4 100644
--- a/drivers/net/intel/ice/ice_rxtx.c
+++ b/drivers/net/intel/ice/ice_rxtx.c
@@ -3044,7 +3044,7 @@ ice_calc_context_desc(uint64_t flags)
 
 /* set ice TSO context descriptor */
 static inline uint64_t
-ice_set_tso_ctx(struct rte_mbuf *mbuf, union ci_tx_offload tx_offload)
+ice_set_tso_ctx(uint64_t ol_flags, const struct rte_mbuf *mbuf, union ci_tx_offload tx_offload)
 {
 	uint64_t ctx_desc = 0;
 	uint32_t cd_cmd, hdr_len, cd_tso_len;
@@ -3055,7 +3055,7 @@ ice_set_tso_ctx(struct rte_mbuf *mbuf, union ci_tx_offload tx_offload)
 	}
 
 	hdr_len = tx_offload.l2_len + tx_offload.l3_len + tx_offload.l4_len;
-	hdr_len += (mbuf->ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK) ?
+	hdr_len += (ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK) ?
 		   tx_offload.outer_l2_len + tx_offload.outer_l3_len : 0;
 
 	cd_cmd = CI_TX_CTX_DESC_TSO;
@@ -3067,6 +3067,51 @@ ice_set_tso_ctx(struct rte_mbuf *mbuf, union ci_tx_offload tx_offload)
 	return ctx_desc;
 }
 
+/* compute a context descriptor if one is necessary based on the ol_flags
+ *
+ * Returns 0 if no descriptor is necessary.
+ * Returns 1 if one is necessary and the contents of the descriptor are returned
+ *   in the values pointed to by qw0 and qw1. td_offset may also be modified.
+ */
+static __rte_always_inline uint16_t
+get_context_desc(uint64_t ol_flags, const struct rte_mbuf *tx_pkt,
+	const union ci_tx_offload *tx_offload, const struct ci_tx_queue *txq,
+	uint32_t *td_offset, uint64_t *qw0, uint64_t *qw1)
+{
+	uint16_t cd_l2tag2 = 0;
+	uint64_t cd_type_cmd_tso_mss = ICE_TX_DESC_DTYPE_CTX;
+	uint32_t cd_tunneling_params = 0;
+	uint64_t ptp_tx_index = txq->ice_vsi->adapter->ptp_tx_index;
+
+	if (ice_calc_context_desc(ol_flags) == 0)
+		return 0;
+
+	if (ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK) {
+		*td_offset |= (tx_offload->outer_l2_len >> 1) << CI_TX_DESC_LEN_MACLEN_S;
+		ice_parse_tunneling_params(ol_flags, *tx_offload, &cd_tunneling_params);
+	}
+
+	if (ol_flags & (RTE_MBUF_F_TX_TCP_SEG | RTE_MBUF_F_TX_UDP_SEG))
+		cd_type_cmd_tso_mss |= ice_set_tso_ctx(ol_flags, tx_pkt, *tx_offload);
+	else if (ol_flags & RTE_MBUF_F_TX_IEEE1588_TMST)
+		cd_type_cmd_tso_mss |=
+			((uint64_t)CI_TX_CTX_DESC_TSYN << CI_TXD_QW1_CMD_S) |
+			((ptp_tx_index << ICE_TXD_CTX_QW1_TSYN_S) & ICE_TXD_CTX_QW1_TSYN_M);
+
+
+	/* TX context descriptor based double VLAN insert */
+	if (ol_flags & RTE_MBUF_F_TX_QINQ) {
+		cd_l2tag2 = tx_pkt->vlan_tci_outer;
+		cd_type_cmd_tso_mss |= ((uint64_t)CI_TX_CTX_DESC_IL2TAG2 << CI_TXD_QW1_CMD_S);
+	}
+
+	*qw0 = rte_cpu_to_le_32(cd_tunneling_params) |
+		((uint64_t)rte_cpu_to_le_16(cd_l2tag2) << 32);
+	*qw1 = rte_cpu_to_le_64(cd_type_cmd_tso_mss);
+
+	return 1;
+}
+
 uint16_t
 ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 {
@@ -3077,7 +3122,6 @@ ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 	struct ci_tx_entry *txe, *txn;
 	struct rte_mbuf *tx_pkt;
 	struct rte_mbuf *m_seg;
-	uint32_t cd_tunneling_params;
 	uint16_t tx_id;
 	uint16_t ts_id = -1;
 	uint16_t nb_tx;
@@ -3106,20 +3150,24 @@ ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 		(void)ci_tx_xmit_cleanup(txq);
 
 	for (nb_tx = 0; nb_tx < nb_pkts; nb_tx++) {
+		uint64_t cd_qw0, cd_qw1;
 		tx_pkt = *tx_pkts++;
 
 		td_cmd = 0;
 		td_tag = 0;
 		td_offset = 0;
 		ol_flags = tx_pkt->ol_flags;
+
 		tx_offload.l2_len = tx_pkt->l2_len;
 		tx_offload.l3_len = tx_pkt->l3_len;
 		tx_offload.outer_l2_len = tx_pkt->outer_l2_len;
 		tx_offload.outer_l3_len = tx_pkt->outer_l3_len;
 		tx_offload.l4_len = tx_pkt->l4_len;
 		tx_offload.tso_segsz = tx_pkt->tso_segsz;
+
 		/* Calculate the number of context descriptors needed. */
-		nb_ctx = ice_calc_context_desc(ol_flags);
+		nb_ctx = get_context_desc(ol_flags, tx_pkt, &tx_offload,
+			txq, &td_offset, &cd_qw0, &cd_qw1);
 
 		/* The number of descriptors that must be allocated for
 		 * a packet equals to the number of the segments of that
@@ -3161,15 +3209,6 @@ ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 			td_tag = tx_pkt->vlan_tci;
 		}
 
-		/* Fill in tunneling parameters if necessary */
-		cd_tunneling_params = 0;
-		if (ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK) {
-			td_offset |= (tx_offload.outer_l2_len >> 1)
-				<< CI_TX_DESC_LEN_MACLEN_S;
-			ice_parse_tunneling_params(ol_flags, tx_offload,
-						   &cd_tunneling_params);
-		}
-
 		/* Enable checksum offloading */
 		if (ol_flags & ICE_TX_CKSUM_OFFLOAD_MASK)
 			ice_txd_enable_checksum(ol_flags, &td_cmd,
@@ -3177,11 +3216,7 @@ ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 
 		if (nb_ctx) {
 			/* Setup TX context descriptor if required */
-			volatile struct ice_tx_ctx_desc *ctx_txd =
-				(volatile struct ice_tx_ctx_desc *)
-					&ci_tx_ring[tx_id];
-			uint16_t cd_l2tag2 = 0;
-			uint64_t cd_type_cmd_tso_mss = ICE_TX_DESC_DTYPE_CTX;
+			uint64_t *ctx_txd = RTE_CAST_PTR(uint64_t *, &ci_tx_ring[tx_id]);
 
 			txn = &sw_ring[txe->next_id];
 			RTE_MBUF_PREFETCH_TO_FREE(txn->mbuf);
@@ -3190,29 +3225,8 @@ ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 				txe->mbuf = NULL;
 			}
 
-			if (ol_flags & (RTE_MBUF_F_TX_TCP_SEG | RTE_MBUF_F_TX_UDP_SEG))
-				cd_type_cmd_tso_mss |=
-					ice_set_tso_ctx(tx_pkt, tx_offload);
-			else if (ol_flags & RTE_MBUF_F_TX_IEEE1588_TMST)
-				cd_type_cmd_tso_mss |=
-					((uint64_t)CI_TX_CTX_DESC_TSYN <<
-					CI_TXD_QW1_CMD_S) |
-					 (((uint64_t)txq->ice_vsi->adapter->ptp_tx_index <<
-					 ICE_TXD_CTX_QW1_TSYN_S) & ICE_TXD_CTX_QW1_TSYN_M);
-
-			ctx_txd->tunneling_params =
-				rte_cpu_to_le_32(cd_tunneling_params);
-
-			/* TX context descriptor based double VLAN insert */
-			if (ol_flags & RTE_MBUF_F_TX_QINQ) {
-				cd_l2tag2 = tx_pkt->vlan_tci_outer;
-				cd_type_cmd_tso_mss |=
-					((uint64_t)CI_TX_CTX_DESC_IL2TAG2 <<
-					 CI_TXD_QW1_CMD_S);
-			}
-			ctx_txd->l2tag2 = rte_cpu_to_le_16(cd_l2tag2);
-			ctx_txd->qw1 =
-				rte_cpu_to_le_64(cd_type_cmd_tso_mss);
+			ctx_txd[0] = cd_qw0;
+			ctx_txd[1] = cd_qw1;
 
 			txe->last_id = tx_last;
 			tx_id = txe->next_id;
-- 
2.51.0



* [RFC PATCH 08/27] net/i40e: refactor context descriptor handling
  2025-12-19 17:25 [RFC PATCH 00/27] combine multiple Intel scalar Tx paths Bruce Richardson
                   ` (6 preceding siblings ...)
  2025-12-19 17:25 ` [RFC PATCH 07/27] net/ice: refactor context descriptor handling Bruce Richardson
@ 2025-12-19 17:25 ` Bruce Richardson
  2025-12-19 17:25 ` [RFC PATCH 09/27] net/idpf: " Bruce Richardson
                   ` (22 subsequent siblings)
  30 siblings, 0 replies; 274+ messages in thread
From: Bruce Richardson @ 2025-12-19 17:25 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson

Move all context descriptor handling into a single function, as was
done for the ice driver, and use the same function signature as that
driver.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/net/intel/i40e/i40e_rxtx.c | 109 +++++++++++++++--------------
 1 file changed, 58 insertions(+), 51 deletions(-)

diff --git a/drivers/net/intel/i40e/i40e_rxtx.c b/drivers/net/intel/i40e/i40e_rxtx.c
index 886be06a89..82c4c6017b 100644
--- a/drivers/net/intel/i40e/i40e_rxtx.c
+++ b/drivers/net/intel/i40e/i40e_rxtx.c
@@ -1000,7 +1000,7 @@ i40e_calc_context_desc(uint64_t flags)
 
 /* set i40e TSO context descriptor */
 static inline uint64_t
-i40e_set_tso_ctx(struct rte_mbuf *mbuf, union ci_tx_offload tx_offload)
+i40e_set_tso_ctx(uint64_t ol_flags, const struct rte_mbuf *mbuf, union ci_tx_offload tx_offload)
 {
 	uint64_t ctx_desc = 0;
 	uint32_t cd_cmd, hdr_len, cd_tso_len;
@@ -1011,7 +1011,7 @@ i40e_set_tso_ctx(struct rte_mbuf *mbuf, union ci_tx_offload tx_offload)
 	}
 
 	hdr_len = tx_offload.l2_len + tx_offload.l3_len + tx_offload.l4_len;
-	hdr_len += (mbuf->ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK) ?
+	hdr_len += (ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK) ?
 		   tx_offload.outer_l2_len + tx_offload.outer_l3_len : 0;
 
 	cd_cmd = I40E_TX_CTX_DESC_TSO;
@@ -1025,6 +1025,53 @@ i40e_set_tso_ctx(struct rte_mbuf *mbuf, union ci_tx_offload tx_offload)
 	return ctx_desc;
 }
 
+/* compute a context descriptor if one is necessary based on the ol_flags
+ *
+ * Returns 0 if no descriptor is necessary.
+ * Returns 1 if one is necessary and the contents of the descriptor are returned
+ *   in the values pointed to by qw0 and qw1. td_offset may also be modified.
+ */
+static __rte_always_inline uint16_t
+get_context_desc(uint64_t ol_flags, const struct rte_mbuf *tx_pkt,
+		 const union ci_tx_offload *tx_offload,
+		 const struct ci_tx_queue *txq __rte_unused,
+		 uint32_t *td_offset, uint64_t *qw0, uint64_t *qw1)
+{
+	uint16_t cd_l2tag2 = 0;
+	uint64_t cd_type_cmd_tso_mss = I40E_TX_DESC_DTYPE_CONTEXT;
+	uint32_t cd_tunneling_params = 0;
+
+	if (i40e_calc_context_desc(ol_flags) == 0)
+		return 0;
+
+	if (ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK) {
+		*td_offset |= (tx_offload->outer_l2_len >> 1) << CI_TX_DESC_LEN_MACLEN_S;
+		i40e_parse_tunneling_params(ol_flags, *tx_offload, &cd_tunneling_params);
+	}
+
+	if (ol_flags & RTE_MBUF_F_TX_TCP_SEG)
+		cd_type_cmd_tso_mss |= i40e_set_tso_ctx(ol_flags, tx_pkt, *tx_offload);
+	else {
+#ifdef RTE_LIBRTE_IEEE1588
+		if (ol_flags & RTE_MBUF_F_TX_IEEE1588_TMST)
+			cd_type_cmd_tso_mss |=
+				((uint64_t)I40E_TX_CTX_DESC_TSYN << I40E_TXD_CTX_QW1_CMD_SHIFT);
+#endif
+	}
+
+	/* TX context descriptor based double VLAN insert */
+	if (ol_flags & RTE_MBUF_F_TX_QINQ) {
+		cd_l2tag2 = tx_pkt->vlan_tci_outer;
+		cd_type_cmd_tso_mss |= ((uint64_t)I40E_TX_CTX_DESC_IL2TAG2 << I40E_TXD_CTX_QW1_CMD_SHIFT);
+	}
+
+	*qw0 = rte_cpu_to_le_32(cd_tunneling_params) |
+		((uint64_t)rte_cpu_to_le_16(cd_l2tag2) << 32);
+	*qw1 = rte_cpu_to_le_64(cd_type_cmd_tso_mss);
+
+	return 1;
+}
+
 uint16_t
 i40e_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 {
@@ -1035,7 +1082,6 @@ i40e_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 	volatile struct ci_tx_desc *txr;
 	struct rte_mbuf *tx_pkt;
 	struct rte_mbuf *m_seg;
-	uint32_t cd_tunneling_params;
 	uint16_t tx_id;
 	uint16_t nb_tx;
 	uint32_t td_cmd;
@@ -1076,7 +1122,9 @@ i40e_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 		tx_offload.tso_segsz = tx_pkt->tso_segsz;
 
 		/* Calculate the number of context descriptors needed. */
-		nb_ctx = i40e_calc_context_desc(ol_flags);
+		uint64_t cd_qw0 = 0, cd_qw1 = 0;
+		nb_ctx = get_context_desc(ol_flags, tx_pkt, &tx_offload, txq, &td_offset,
+				&cd_qw0, &cd_qw1);
 
 		/**
 		 * The number of descriptors that must be allocated for
@@ -1122,14 +1170,6 @@ i40e_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 		/* Always enable CRC offload insertion */
 		td_cmd |= CI_TX_DESC_CMD_ICRC;
 
-		/* Fill in tunneling parameters if necessary */
-		cd_tunneling_params = 0;
-		if (ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK) {
-			td_offset |= (tx_offload.outer_l2_len >> 1)
-					<< CI_TX_DESC_LEN_MACLEN_S;
-			i40e_parse_tunneling_params(ol_flags, tx_offload,
-						    &cd_tunneling_params);
-		}
 		/* Enable checksum offloading */
 		if (ol_flags & I40E_TX_CKSUM_OFFLOAD_MASK)
 			i40e_txd_enable_checksum(ol_flags, &td_cmd,
@@ -1137,12 +1177,7 @@ i40e_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 
 		if (nb_ctx) {
 			/* Setup TX context descriptor if required */
-			volatile struct i40e_tx_context_desc *ctx_txd =
-				(volatile struct i40e_tx_context_desc *)\
-							&txr[tx_id];
-			uint16_t cd_l2tag2 = 0;
-			uint64_t cd_type_cmd_tso_mss =
-				I40E_TX_DESC_DTYPE_CONTEXT;
+			uint64_t *desc = RTE_CAST_PTR(uint64_t *, &txr[tx_id]);
 
 			txn = &sw_ring[txe->next_id];
 			RTE_MBUF_PREFETCH_TO_FREE(txn->mbuf);
@@ -1151,41 +1186,13 @@ i40e_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 				txe->mbuf = NULL;
 			}
 
-			/* TSO enabled means no timestamp */
-			if (ol_flags & RTE_MBUF_F_TX_TCP_SEG)
-				cd_type_cmd_tso_mss |=
-					i40e_set_tso_ctx(tx_pkt, tx_offload);
-			else {
-#ifdef RTE_LIBRTE_IEEE1588
-				if (ol_flags & RTE_MBUF_F_TX_IEEE1588_TMST)
-					cd_type_cmd_tso_mss |=
-						((uint64_t)I40E_TX_CTX_DESC_TSYN <<
-						 I40E_TXD_CTX_QW1_CMD_SHIFT);
-#endif
-			}
-
-			ctx_txd->tunneling_params =
-				rte_cpu_to_le_32(cd_tunneling_params);
-			if (ol_flags & RTE_MBUF_F_TX_QINQ) {
-				cd_l2tag2 = tx_pkt->vlan_tci_outer;
-				cd_type_cmd_tso_mss |=
-					((uint64_t)I40E_TX_CTX_DESC_IL2TAG2 <<
-						I40E_TXD_CTX_QW1_CMD_SHIFT);
-			}
-			ctx_txd->l2tag2 = rte_cpu_to_le_16(cd_l2tag2);
-			ctx_txd->type_cmd_tso_mss =
-				rte_cpu_to_le_64(cd_type_cmd_tso_mss);
+			desc[0] = cd_qw0;
+			desc[1] = cd_qw1;
 
 			PMD_TX_LOG(DEBUG, "mbuf: %p, TCD[%u]: "
-				"tunneling_params: %#x; "
-				"l2tag2: %#hx; "
-				"rsvd: %#hx; "
-				"type_cmd_tso_mss: %#"PRIx64";",
-				tx_pkt, tx_id,
-				ctx_txd->tunneling_params,
-				ctx_txd->l2tag2,
-				ctx_txd->rsvd,
-				ctx_txd->type_cmd_tso_mss);
+				"qw0: %#"PRIx64"; "
+				"qw1: %#"PRIx64";",
+				tx_pkt, tx_id, cd_qw0, cd_qw1);
 
 			txe->last_id = tx_last;
 			tx_id = txe->next_id;
-- 
2.51.0



* [RFC PATCH 09/27] net/idpf: refactor context descriptor handling
  2025-12-19 17:25 [RFC PATCH 00/27] combine multiple Intel scalar Tx paths Bruce Richardson
                   ` (7 preceding siblings ...)
  2025-12-19 17:25 ` [RFC PATCH 08/27] net/i40e: " Bruce Richardson
@ 2025-12-19 17:25 ` Bruce Richardson
  2025-12-19 17:25 ` [RFC PATCH 10/27] net/intel: consolidate checksum mask definition Bruce Richardson
                   ` (21 subsequent siblings)
  30 siblings, 0 replies; 274+ messages in thread
From: Bruce Richardson @ 2025-12-19 17:25 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson, Jingjing Wu, Praveen Shetty

Move all context descriptor handling into a single function, as was
done for the ice driver.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/net/intel/idpf/idpf_common_rxtx.c | 61 +++++++++++------------
 1 file changed, 28 insertions(+), 33 deletions(-)

diff --git a/drivers/net/intel/idpf/idpf_common_rxtx.c b/drivers/net/intel/idpf/idpf_common_rxtx.c
index 11d6848430..9219ad9047 100644
--- a/drivers/net/intel/idpf/idpf_common_rxtx.c
+++ b/drivers/net/intel/idpf/idpf_common_rxtx.c
@@ -845,37 +845,36 @@ idpf_calc_context_desc(uint64_t flags)
 	return 0;
 }
 
-/* set TSO context descriptor
+/* set TSO context descriptor; returns 0 if no context needed, 1 if context set
  */
-static inline void
-idpf_set_splitq_tso_ctx(struct rte_mbuf *mbuf,
+static inline uint16_t
+idpf_set_tso_ctx(uint64_t ol_flags, struct rte_mbuf *mbuf,
 			union ci_tx_offload tx_offload,
-			volatile union idpf_flex_tx_ctx_desc *ctx_desc)
+			uint64_t *qw0, uint64_t *qw1)
 {
-	uint16_t cmd_dtype;
+	uint16_t cmd_dtype = IDPF_TX_DESC_DTYPE_FLEX_TSO_CTX | IDPF_TX_FLEX_CTX_DESC_CMD_TSO;
+	uint16_t tso_segsz = mbuf->tso_segsz;
 	uint32_t tso_len;
 	uint8_t hdr_len;
 
+	if (idpf_calc_context_desc(ol_flags) == 0)
+		return 0;
+
+	/* TSO context descriptor setup */
 	if (tx_offload.l4_len == 0) {
 		TX_LOG(DEBUG, "L4 length set to 0");
-		return;
+		return 0;
 	}
 
-	hdr_len = tx_offload.l2_len +
-		tx_offload.l3_len +
-		tx_offload.l4_len;
-	cmd_dtype = IDPF_TX_DESC_DTYPE_FLEX_TSO_CTX |
-		IDPF_TX_FLEX_CTX_DESC_CMD_TSO;
+	hdr_len = tx_offload.l2_len + tx_offload.l3_len + tx_offload.l4_len;
 	tso_len = mbuf->pkt_len - hdr_len;
 
-	ctx_desc->tso.qw1.cmd_dtype = rte_cpu_to_le_16(cmd_dtype);
-	ctx_desc->tso.qw0.hdr_len = hdr_len;
-	ctx_desc->tso.qw0.mss_rt =
-		rte_cpu_to_le_16((uint16_t)mbuf->tso_segsz &
-				 IDPF_TXD_FLEX_CTX_MSS_RT_M);
-	ctx_desc->tso.qw0.flex_tlen =
-		rte_cpu_to_le_32(tso_len &
-				 IDPF_TXD_FLEX_CTX_MSS_RT_M);
+	*qw0 = rte_cpu_to_le_32(tso_len & IDPF_TXD_FLEX_CTX_MSS_RT_M) |
+	       ((uint64_t)rte_cpu_to_le_16(tso_segsz & IDPF_TXD_FLEX_CTX_MSS_RT_M) << 32) |
+	       ((uint64_t)hdr_len << 48);
+	*qw1 = rte_cpu_to_le_16(cmd_dtype);
+
+	return 1;
 }
 
 RTE_EXPORT_INTERNAL_SYMBOL(idpf_dp_splitq_xmit_pkts)
@@ -933,7 +932,8 @@ idpf_dp_splitq_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 		tx_offload.l4_len = tx_pkt->l4_len;
 		tx_offload.tso_segsz = tx_pkt->tso_segsz;
 		/* Calculate the number of context descriptors needed. */
-		nb_ctx = idpf_calc_context_desc(ol_flags);
+		uint64_t cd_qw0, cd_qw1;
+		nb_ctx = idpf_set_tso_ctx(ol_flags, tx_pkt, tx_offload, &cd_qw0, &cd_qw1);
 
 		/* Calculate the number of TX descriptors needed for
 		 * each packet. For TSO packets, use ci_calc_pkt_desc as
@@ -950,12 +950,10 @@ idpf_dp_splitq_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 
 		/* context descriptor */
 		if (nb_ctx != 0) {
-			volatile union idpf_flex_tx_ctx_desc *ctx_desc =
-				(volatile union idpf_flex_tx_ctx_desc *)&txr[tx_id];
+			uint64_t *ctx_desc = RTE_CAST_PTR(uint64_t *, &txr[tx_id]);
 
-			if ((ol_flags & RTE_MBUF_F_TX_TCP_SEG) != 0)
-				idpf_set_splitq_tso_ctx(tx_pkt, tx_offload,
-							ctx_desc);
+			ctx_desc[0] = cd_qw0;
+			ctx_desc[1] = cd_qw1;
 
 			tx_id++;
 			if (tx_id == txq->nb_tx_desc)
@@ -1388,7 +1386,8 @@ idpf_dp_singleq_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 		tx_offload.l4_len = tx_pkt->l4_len;
 		tx_offload.tso_segsz = tx_pkt->tso_segsz;
 		/* Calculate the number of context descriptors needed. */
-		nb_ctx = idpf_calc_context_desc(ol_flags);
+		uint64_t cd_qw0, cd_qw1;
+		nb_ctx = idpf_set_tso_ctx(ol_flags, tx_pkt, tx_offload, &cd_qw0, &cd_qw1);
 
 		/* The number of descriptors that must be allocated for
 		 * a packet. For TSO packets, use ci_calc_pkt_desc as
@@ -1431,9 +1430,7 @@ idpf_dp_singleq_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 
 		if (nb_ctx != 0) {
 			/* Setup TX context descriptor if required */
-			volatile union idpf_flex_tx_ctx_desc *ctx_txd =
-				(volatile union idpf_flex_tx_ctx_desc *)
-				&txr[tx_id];
+			uint64_t *ctx_txd = RTE_CAST_PTR(uint64_t *, &txr[tx_id]);
 
 			txn = &sw_ring[txe->next_id];
 			RTE_MBUF_PREFETCH_TO_FREE(txn->mbuf);
@@ -1442,10 +1439,8 @@ idpf_dp_singleq_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 				txe->mbuf = NULL;
 			}
 
-			/* TSO enabled */
-			if ((ol_flags & RTE_MBUF_F_TX_TCP_SEG) != 0)
-				idpf_set_splitq_tso_ctx(tx_pkt, tx_offload,
-							ctx_txd);
+			ctx_txd[0] = cd_qw0;
+			ctx_txd[1] = cd_qw1;
 
 			txe->last_id = tx_last;
 			tx_id = txe->next_id;
-- 
2.51.0



* [RFC PATCH 10/27] net/intel: consolidate checksum mask definition
  2025-12-19 17:25 [RFC PATCH 00/27] combine multiple Intel scalar Tx paths Bruce Richardson
                   ` (8 preceding siblings ...)
  2025-12-19 17:25 ` [RFC PATCH 09/27] net/idpf: " Bruce Richardson
@ 2025-12-19 17:25 ` Bruce Richardson
  2025-12-19 17:25 ` [RFC PATCH 11/27] net/intel: create common checksum Tx offload function Bruce Richardson
                   ` (20 subsequent siblings)
  30 siblings, 0 replies; 274+ messages in thread
From: Bruce Richardson @ 2025-12-19 17:25 UTC (permalink / raw)
  To: dev
  Cc: Bruce Richardson, Vladimir Medvedkin, Anatoly Burakov,
	Jingjing Wu, Praveen Shetty

Create a common definition for checksum masks across iavf, idpf, i40e
and ice drivers.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/net/intel/common/tx.h             | 7 +++++++
 drivers/net/intel/i40e/i40e_rxtx.c        | 7 +------
 drivers/net/intel/iavf/iavf_rxtx.c        | 2 +-
 drivers/net/intel/iavf/iavf_rxtx.h        | 8 --------
 drivers/net/intel/ice/ice_rxtx.c          | 7 +------
 drivers/net/intel/idpf/idpf_common_rxtx.c | 4 ++--
 drivers/net/intel/idpf/idpf_common_rxtx.h | 7 +------
 7 files changed, 13 insertions(+), 29 deletions(-)

diff --git a/drivers/net/intel/common/tx.h b/drivers/net/intel/common/tx.h
index 320ab0b8e0..a71b98f119 100644
--- a/drivers/net/intel/common/tx.h
+++ b/drivers/net/intel/common/tx.h
@@ -53,6 +53,13 @@
 /* Common maximum data per TX descriptor */
 #define CI_MAX_DATA_PER_TXD     (CI_TXD_QW1_TX_BUF_SZ_M >> CI_TXD_QW1_TX_BUF_SZ_S)
 
+/* Checksum offload mask to identify packets requesting offload */
+#define CI_TX_CKSUM_OFFLOAD_MASK (RTE_MBUF_F_TX_IP_CKSUM |		 \
+				   RTE_MBUF_F_TX_L4_MASK |		 \
+				   RTE_MBUF_F_TX_TCP_SEG |		 \
+				   RTE_MBUF_F_TX_OUTER_IP_CKSUM |	 \
+				   RTE_MBUF_F_TX_OUTER_UDP_CKSUM)
+
 /**
  * Common TX offload union for Intel drivers.
  * Supports both basic offloads (l2_len, l3_len, l4_len, tso_segsz) and
diff --git a/drivers/net/intel/i40e/i40e_rxtx.c b/drivers/net/intel/i40e/i40e_rxtx.c
index 82c4c6017b..e1964eab97 100644
--- a/drivers/net/intel/i40e/i40e_rxtx.c
+++ b/drivers/net/intel/i40e/i40e_rxtx.c
@@ -49,11 +49,6 @@
 #define I40E_TX_IEEE1588_TMST 0
 #endif
 
-#define I40E_TX_CKSUM_OFFLOAD_MASK (RTE_MBUF_F_TX_IP_CKSUM |		 \
-		RTE_MBUF_F_TX_L4_MASK |		 \
-		RTE_MBUF_F_TX_TCP_SEG |		 \
-		RTE_MBUF_F_TX_OUTER_IP_CKSUM)
-
 #define I40E_TX_OFFLOAD_MASK (RTE_MBUF_F_TX_OUTER_IPV4 |	\
 		RTE_MBUF_F_TX_OUTER_IPV6 |	\
 		RTE_MBUF_F_TX_IPV4 |		\
@@ -1171,7 +1166,7 @@ i40e_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 		td_cmd |= CI_TX_DESC_CMD_ICRC;
 
 		/* Enable checksum offloading */
-		if (ol_flags & I40E_TX_CKSUM_OFFLOAD_MASK)
+		if (ol_flags & CI_TX_CKSUM_OFFLOAD_MASK)
 			i40e_txd_enable_checksum(ol_flags, &td_cmd,
 						 &td_offset, tx_offload);
 
diff --git a/drivers/net/intel/iavf/iavf_rxtx.c b/drivers/net/intel/iavf/iavf_rxtx.c
index ecf954a2c2..9ce978e69c 100644
--- a/drivers/net/intel/iavf/iavf_rxtx.c
+++ b/drivers/net/intel/iavf/iavf_rxtx.c
@@ -2597,7 +2597,7 @@ iavf_build_data_desc_cmd_offset_fields(volatile uint64_t *qw1,
 	}
 
 	if ((m->ol_flags &
-	    (IAVF_TX_CKSUM_OFFLOAD_MASK | RTE_MBUF_F_TX_SEC_OFFLOAD)) == 0)
+	    (CI_TX_CKSUM_OFFLOAD_MASK | RTE_MBUF_F_TX_SEC_OFFLOAD)) == 0)
 		goto skip_cksum;
 
 	/* Set MACLEN */
diff --git a/drivers/net/intel/iavf/iavf_rxtx.h b/drivers/net/intel/iavf/iavf_rxtx.h
index 86281aa965..4080184b3b 100644
--- a/drivers/net/intel/iavf/iavf_rxtx.h
+++ b/drivers/net/intel/iavf/iavf_rxtx.h
@@ -136,14 +136,6 @@
 
 #define IAVF_TX_MIN_PKT_LEN 17
 
-#define IAVF_TX_CKSUM_OFFLOAD_MASK (		 \
-		RTE_MBUF_F_TX_IP_CKSUM |		 \
-		RTE_MBUF_F_TX_L4_MASK |		 \
-		RTE_MBUF_F_TX_TCP_SEG |          \
-		RTE_MBUF_F_TX_UDP_SEG |          \
-		RTE_MBUF_F_TX_OUTER_IP_CKSUM |   \
-		RTE_MBUF_F_TX_OUTER_UDP_CKSUM)
-
 #define IAVF_TX_OFFLOAD_MASK (  \
 		RTE_MBUF_F_TX_OUTER_IPV6 |		 \
 		RTE_MBUF_F_TX_OUTER_IPV4 |		 \
diff --git a/drivers/net/intel/ice/ice_rxtx.c b/drivers/net/intel/ice/ice_rxtx.c
index b90a1b4ec4..e102eb9bcc 100644
--- a/drivers/net/intel/ice/ice_rxtx.c
+++ b/drivers/net/intel/ice/ice_rxtx.c
@@ -9,11 +9,6 @@
 #include "ice_rxtx.h"
 #include "ice_rxtx_vec_common.h"
 
-#define ICE_TX_CKSUM_OFFLOAD_MASK (RTE_MBUF_F_TX_IP_CKSUM |		 \
-		RTE_MBUF_F_TX_L4_MASK |		 \
-		RTE_MBUF_F_TX_TCP_SEG |		 \
-		RTE_MBUF_F_TX_UDP_SEG |		 \
-		RTE_MBUF_F_TX_OUTER_IP_CKSUM)
 
 /**
  * The mbuf dynamic field pointer for protocol extraction metadata.
@@ -3210,7 +3205,7 @@ ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 		}
 
 		/* Enable checksum offloading */
-		if (ol_flags & ICE_TX_CKSUM_OFFLOAD_MASK)
+		if (ol_flags & CI_TX_CKSUM_OFFLOAD_MASK)
 			ice_txd_enable_checksum(ol_flags, &td_cmd,
 						&td_offset, tx_offload);
 
diff --git a/drivers/net/intel/idpf/idpf_common_rxtx.c b/drivers/net/intel/idpf/idpf_common_rxtx.c
index 9219ad9047..b34d545a0a 100644
--- a/drivers/net/intel/idpf/idpf_common_rxtx.c
+++ b/drivers/net/intel/idpf/idpf_common_rxtx.c
@@ -945,7 +945,7 @@ idpf_dp_splitq_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 		else
 			nb_used = tx_pkt->nb_segs + nb_ctx;
 
-		if (ol_flags & IDPF_TX_CKSUM_OFFLOAD_MASK)
+		if (ol_flags & CI_TX_CKSUM_OFFLOAD_MASK)
 			cmd_dtype = IDPF_TXD_FLEX_FLOW_CMD_CS_EN;
 
 		/* context descriptor */
@@ -1425,7 +1425,7 @@ idpf_dp_singleq_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 			}
 		}
 
-		if (ol_flags & IDPF_TX_CKSUM_OFFLOAD_MASK)
+		if (ol_flags & CI_TX_CKSUM_OFFLOAD_MASK)
 			td_cmd |= IDPF_TX_FLEX_DESC_CMD_CS_EN;
 
 		if (nb_ctx != 0) {
diff --git a/drivers/net/intel/idpf/idpf_common_rxtx.h b/drivers/net/intel/idpf/idpf_common_rxtx.h
index b88a87402d..fe7094d434 100644
--- a/drivers/net/intel/idpf/idpf_common_rxtx.h
+++ b/drivers/net/intel/idpf/idpf_common_rxtx.h
@@ -39,13 +39,8 @@
 #define IDPF_RLAN_CTX_DBUF_S	7
 #define IDPF_RX_MAX_DATA_BUF_SIZE	(16 * 1024 - 128)
 
-#define IDPF_TX_CKSUM_OFFLOAD_MASK (		\
-		RTE_MBUF_F_TX_IP_CKSUM |	\
-		RTE_MBUF_F_TX_L4_MASK |		\
-		RTE_MBUF_F_TX_TCP_SEG)
-
 #define IDPF_TX_OFFLOAD_MASK (			\
-		IDPF_TX_CKSUM_OFFLOAD_MASK |	\
+		CI_TX_CKSUM_OFFLOAD_MASK |	\
 		RTE_MBUF_F_TX_IPV4 |		\
 		RTE_MBUF_F_TX_IPV6)
 
-- 
2.51.0



* [RFC PATCH 11/27] net/intel: create common checksum Tx offload function
  2025-12-19 17:25 [RFC PATCH 00/27] combine multiple Intel scalar Tx paths Bruce Richardson
                   ` (9 preceding siblings ...)
  2025-12-19 17:25 ` [RFC PATCH 10/27] net/intel: consolidate checksum mask definition Bruce Richardson
@ 2025-12-19 17:25 ` Bruce Richardson
  2025-12-19 17:25 ` [RFC PATCH 12/27] net/intel: create a common scalar Tx function Bruce Richardson
                   ` (19 subsequent siblings)
  30 siblings, 0 replies; 274+ messages in thread
From: Bruce Richardson @ 2025-12-19 17:25 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson, Anatoly Burakov

Since i40e and ice share the same checksum offload logic, merge their
two functions into one. Future rework should enable more drivers to use
it as well.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/net/intel/common/tx_scalar_fns.h | 63 +++++++++++++++++++++++
 drivers/net/intel/i40e/i40e_rxtx.c       | 57 +--------------------
 drivers/net/intel/ice/ice_rxtx.c         | 64 +-----------------------
 3 files changed, 65 insertions(+), 119 deletions(-)

diff --git a/drivers/net/intel/common/tx_scalar_fns.h b/drivers/net/intel/common/tx_scalar_fns.h
index f894cea616..95ee7dc35f 100644
--- a/drivers/net/intel/common/tx_scalar_fns.h
+++ b/drivers/net/intel/common/tx_scalar_fns.h
@@ -64,6 +64,69 @@ ci_tx_xmit_cleanup(struct ci_tx_queue *txq)
 	return 0;
 }
 
+/* Common checksum enable function for Intel drivers (ice, i40e, etc.) */
+static inline void
+ci_txd_enable_checksum(uint64_t ol_flags,
+		       uint32_t *td_cmd,
+		       uint32_t *td_offset,
+		       union ci_tx_offload tx_offload)
+{
+	/* Set MACLEN */
+	if (!(ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK))
+		*td_offset |= (tx_offload.l2_len >> 1)
+			<< CI_TX_DESC_LEN_MACLEN_S;
+
+	/* Enable L3 checksum offloads */
+	if (ol_flags & RTE_MBUF_F_TX_IP_CKSUM) {
+		*td_cmd |= CI_TX_DESC_CMD_IIPT_IPV4_CSUM;
+		*td_offset |= (tx_offload.l3_len >> 2) <<
+			CI_TX_DESC_LEN_IPLEN_S;
+	} else if (ol_flags & RTE_MBUF_F_TX_IPV4) {
+		*td_cmd |= CI_TX_DESC_CMD_IIPT_IPV4;
+		*td_offset |= (tx_offload.l3_len >> 2) <<
+			CI_TX_DESC_LEN_IPLEN_S;
+	} else if (ol_flags & RTE_MBUF_F_TX_IPV6) {
+		*td_cmd |= CI_TX_DESC_CMD_IIPT_IPV6;
+		*td_offset |= (tx_offload.l3_len >> 2) <<
+			CI_TX_DESC_LEN_IPLEN_S;
+	}
+
+	if (ol_flags & RTE_MBUF_F_TX_TCP_SEG) {
+		*td_cmd |= CI_TX_DESC_CMD_L4T_EOFT_TCP;
+		*td_offset |= (tx_offload.l4_len >> 2) <<
+			      CI_TX_DESC_LEN_L4_LEN_S;
+		return;
+	}
+
+	if (ol_flags & RTE_MBUF_F_TX_UDP_SEG) {
+		*td_cmd |= CI_TX_DESC_CMD_L4T_EOFT_UDP;
+		*td_offset |= (tx_offload.l4_len >> 2) <<
+			      CI_TX_DESC_LEN_L4_LEN_S;
+		return;
+	}
+
+	/* Enable L4 checksum offloads */
+	switch (ol_flags & RTE_MBUF_F_TX_L4_MASK) {
+	case RTE_MBUF_F_TX_TCP_CKSUM:
+		*td_cmd |= CI_TX_DESC_CMD_L4T_EOFT_TCP;
+		*td_offset |= (sizeof(struct rte_tcp_hdr) >> 2) <<
+			      CI_TX_DESC_LEN_L4_LEN_S;
+		break;
+	case RTE_MBUF_F_TX_SCTP_CKSUM:
+		*td_cmd |= CI_TX_DESC_CMD_L4T_EOFT_SCTP;
+		*td_offset |= (sizeof(struct rte_sctp_hdr) >> 2) <<
+			      CI_TX_DESC_LEN_L4_LEN_S;
+		break;
+	case RTE_MBUF_F_TX_UDP_CKSUM:
+		*td_cmd |= CI_TX_DESC_CMD_L4T_EOFT_UDP;
+		*td_offset |= (sizeof(struct rte_udp_hdr) >> 2) <<
+			      CI_TX_DESC_LEN_L4_LEN_S;
+		break;
+	default:
+		break;
+	}
+}
+
 static inline uint16_t
 ci_div_roundup16(uint16_t x, uint16_t y)
 {
diff --git a/drivers/net/intel/i40e/i40e_rxtx.c b/drivers/net/intel/i40e/i40e_rxtx.c
index e1964eab97..5d1b2e4217 100644
--- a/drivers/net/intel/i40e/i40e_rxtx.c
+++ b/drivers/net/intel/i40e/i40e_rxtx.c
@@ -306,61 +306,6 @@ i40e_parse_tunneling_params(uint64_t ol_flags,
 		*cd_tunneling |= I40E_TXD_CTX_QW0_L4T_CS_MASK;
 }
 
-static inline void
-i40e_txd_enable_checksum(uint64_t ol_flags,
-			uint32_t *td_cmd,
-			uint32_t *td_offset,
-			union ci_tx_offload tx_offload)
-{
-	/* Set MACLEN */
-	if (!(ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK))
-		*td_offset |= (tx_offload.l2_len >> 1)
-			<< CI_TX_DESC_LEN_MACLEN_S;
-
-	/* Enable L3 checksum offloads */
-	if (ol_flags & RTE_MBUF_F_TX_IP_CKSUM) {
-		*td_cmd |= CI_TX_DESC_CMD_IIPT_IPV4_CSUM;
-		*td_offset |= (tx_offload.l3_len >> 2)
-				<< CI_TX_DESC_LEN_IPLEN_S;
-	} else if (ol_flags & RTE_MBUF_F_TX_IPV4) {
-		*td_cmd |= CI_TX_DESC_CMD_IIPT_IPV4;
-		*td_offset |= (tx_offload.l3_len >> 2)
-				<< CI_TX_DESC_LEN_IPLEN_S;
-	} else if (ol_flags & RTE_MBUF_F_TX_IPV6) {
-		*td_cmd |= CI_TX_DESC_CMD_IIPT_IPV6;
-		*td_offset |= (tx_offload.l3_len >> 2)
-				<< CI_TX_DESC_LEN_IPLEN_S;
-	}
-
-	if (ol_flags & RTE_MBUF_F_TX_TCP_SEG) {
-		*td_cmd |= CI_TX_DESC_CMD_L4T_EOFT_TCP;
-		*td_offset |= (tx_offload.l4_len >> 2)
-			<< CI_TX_DESC_LEN_L4_LEN_S;
-		return;
-	}
-
-	/* Enable L4 checksum offloads */
-	switch (ol_flags & RTE_MBUF_F_TX_L4_MASK) {
-	case RTE_MBUF_F_TX_TCP_CKSUM:
-		*td_cmd |= CI_TX_DESC_CMD_L4T_EOFT_TCP;
-		*td_offset |= (sizeof(struct rte_tcp_hdr) >> 2) <<
-				CI_TX_DESC_LEN_L4_LEN_S;
-		break;
-	case RTE_MBUF_F_TX_SCTP_CKSUM:
-		*td_cmd |= CI_TX_DESC_CMD_L4T_EOFT_SCTP;
-		*td_offset |= (sizeof(struct rte_sctp_hdr) >> 2) <<
-				CI_TX_DESC_LEN_L4_LEN_S;
-		break;
-	case RTE_MBUF_F_TX_UDP_CKSUM:
-		*td_cmd |= CI_TX_DESC_CMD_L4T_EOFT_UDP;
-		*td_offset |= (sizeof(struct rte_udp_hdr) >> 2) <<
-				CI_TX_DESC_LEN_L4_LEN_S;
-		break;
-	default:
-		break;
-	}
-}
-
 /* Construct the tx flags */
 static inline uint64_t
 i40e_build_ctob(uint32_t td_cmd,
@@ -1167,7 +1112,7 @@ i40e_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 
 		/* Enable checksum offloading */
 		if (ol_flags & CI_TX_CKSUM_OFFLOAD_MASK)
-			i40e_txd_enable_checksum(ol_flags, &td_cmd,
+			ci_txd_enable_checksum(ol_flags, &td_cmd,
 						 &td_offset, tx_offload);
 
 		if (nb_ctx) {
diff --git a/drivers/net/intel/ice/ice_rxtx.c b/drivers/net/intel/ice/ice_rxtx.c
index e102eb9bcc..0b0179e1fa 100644
--- a/drivers/net/intel/ice/ice_rxtx.c
+++ b/drivers/net/intel/ice/ice_rxtx.c
@@ -2947,68 +2947,6 @@ ice_parse_tunneling_params(uint64_t ol_flags,
 		*cd_tunneling |= ICE_TXD_CTX_QW0_L4T_CS_M;
 }
 
-static inline void
-ice_txd_enable_checksum(uint64_t ol_flags,
-			uint32_t *td_cmd,
-			uint32_t *td_offset,
-			union ci_tx_offload tx_offload)
-{
-	/* Set MACLEN */
-	if (!(ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK))
-		*td_offset |= (tx_offload.l2_len >> 1)
-			<< CI_TX_DESC_LEN_MACLEN_S;
-
-	/* Enable L3 checksum offloads */
-	if (ol_flags & RTE_MBUF_F_TX_IP_CKSUM) {
-		*td_cmd |= CI_TX_DESC_CMD_IIPT_IPV4_CSUM;
-		*td_offset |= (tx_offload.l3_len >> 2) <<
-			CI_TX_DESC_LEN_IPLEN_S;
-	} else if (ol_flags & RTE_MBUF_F_TX_IPV4) {
-		*td_cmd |= CI_TX_DESC_CMD_IIPT_IPV4;
-		*td_offset |= (tx_offload.l3_len >> 2) <<
-			CI_TX_DESC_LEN_IPLEN_S;
-	} else if (ol_flags & RTE_MBUF_F_TX_IPV6) {
-		*td_cmd |= CI_TX_DESC_CMD_IIPT_IPV6;
-		*td_offset |= (tx_offload.l3_len >> 2) <<
-			CI_TX_DESC_LEN_IPLEN_S;
-	}
-
-	if (ol_flags & RTE_MBUF_F_TX_TCP_SEG) {
-		*td_cmd |= CI_TX_DESC_CMD_L4T_EOFT_TCP;
-		*td_offset |= (tx_offload.l4_len >> 2) <<
-			      CI_TX_DESC_LEN_L4_LEN_S;
-		return;
-	}
-
-	if (ol_flags & RTE_MBUF_F_TX_UDP_SEG) {
-		*td_cmd |= CI_TX_DESC_CMD_L4T_EOFT_UDP;
-		*td_offset |= (tx_offload.l4_len >> 2) <<
-			      CI_TX_DESC_LEN_L4_LEN_S;
-		return;
-	}
-
-	/* Enable L4 checksum offloads */
-	switch (ol_flags & RTE_MBUF_F_TX_L4_MASK) {
-	case RTE_MBUF_F_TX_TCP_CKSUM:
-		*td_cmd |= CI_TX_DESC_CMD_L4T_EOFT_TCP;
-		*td_offset |= (sizeof(struct rte_tcp_hdr) >> 2) <<
-			      CI_TX_DESC_LEN_L4_LEN_S;
-		break;
-	case RTE_MBUF_F_TX_SCTP_CKSUM:
-		*td_cmd |= CI_TX_DESC_CMD_L4T_EOFT_SCTP;
-		*td_offset |= (sizeof(struct rte_sctp_hdr) >> 2) <<
-			      CI_TX_DESC_LEN_L4_LEN_S;
-		break;
-	case RTE_MBUF_F_TX_UDP_CKSUM:
-		*td_cmd |= CI_TX_DESC_CMD_L4T_EOFT_UDP;
-		*td_offset |= (sizeof(struct rte_udp_hdr) >> 2) <<
-			      CI_TX_DESC_LEN_L4_LEN_S;
-		break;
-	default:
-		break;
-	}
-}
-
 /* Construct the tx flags */
 static inline uint64_t
 ice_build_ctob(uint32_t td_cmd,
@@ -3206,7 +3144,7 @@ ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 
 		/* Enable checksum offloading */
 		if (ol_flags & CI_TX_CKSUM_OFFLOAD_MASK)
-			ice_txd_enable_checksum(ol_flags, &td_cmd,
+			ci_txd_enable_checksum(ol_flags, &td_cmd,
 						&td_offset, tx_offload);
 
 		if (nb_ctx) {
-- 
2.51.0



* [RFC PATCH 12/27] net/intel: create a common scalar Tx function
  2025-12-19 17:25 [RFC PATCH 00/27] combine multiple Intel scalar Tx paths Bruce Richardson
                   ` (10 preceding siblings ...)
  2025-12-19 17:25 ` [RFC PATCH 11/27] net/intel: create common checksum Tx offload function Bruce Richardson
@ 2025-12-19 17:25 ` Bruce Richardson
  2025-12-19 17:25 ` [RFC PATCH 13/27] net/i40e: use " Bruce Richardson
                   ` (18 subsequent siblings)
  30 siblings, 0 replies; 274+ messages in thread
From: Bruce Richardson @ 2025-12-19 17:25 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson, Anatoly Burakov

Given the similarities between the transmit functions across various
Intel drivers, make a start on consolidating them by moving the ice Tx
function into common, for reuse by other drivers.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/net/intel/common/tx_scalar_fns.h | 215 ++++++++++++++++++
 drivers/net/intel/ice/ice_rxtx.c         | 268 +++++------------------
 2 files changed, 267 insertions(+), 216 deletions(-)

diff --git a/drivers/net/intel/common/tx_scalar_fns.h b/drivers/net/intel/common/tx_scalar_fns.h
index 95ee7dc35f..70b22f1da0 100644
--- a/drivers/net/intel/common/tx_scalar_fns.h
+++ b/drivers/net/intel/common/tx_scalar_fns.h
@@ -6,6 +6,7 @@
 #define _COMMON_INTEL_TX_SCALAR_FNS_H_
 
 #include <stdint.h>
+#include <rte_io.h>
 #include <rte_byteorder.h>
 
 /* depends on common Tx definitions. */
@@ -147,5 +148,219 @@ ci_calc_pkt_desc(const struct rte_mbuf *tx_pkt)
 	return count;
 }
 
+typedef uint16_t (*ci_get_ctx_desc_fn)(uint64_t ol_flags, const struct rte_mbuf *mbuf,
+		const union ci_tx_offload *tx_offload, const struct ci_tx_queue *txq,
+		uint32_t *td_offset, uint64_t *qw0, uint64_t *qw1);
+
+/* gets current timestamp tail index */
+typedef uint16_t (*get_ts_tail_t)(struct ci_tx_queue *txq);
+/* writes a timestamp descriptor and returns new tail index */
+typedef uint16_t (*write_ts_desc_t)(struct ci_tx_queue *txq, struct rte_mbuf *mbuf,
+		uint16_t tx_id, uint16_t ts_id);
+/* writes a timestamp tail index - doorbell */
+typedef void (*write_ts_tail_t)(struct ci_tx_queue *txq, uint16_t ts_id);
+
+struct ci_timesstamp_queue_fns {
+	get_ts_tail_t get_ts_tail;
+	write_ts_desc_t write_ts_desc;
+	write_ts_tail_t write_ts_tail;
+};
+
+static inline uint16_t
+ci_xmit_pkts(struct ci_tx_queue *txq,
+	     struct rte_mbuf **tx_pkts,
+	     uint16_t nb_pkts,
+	     ci_get_ctx_desc_fn get_ctx_desc,
+	     const struct ci_timesstamp_queue_fns *ts_fns)
+{
+	volatile struct ci_tx_desc *ci_tx_ring;
+	volatile struct ci_tx_desc *txd;
+	struct ci_tx_entry *sw_ring;
+	struct ci_tx_entry *txe, *txn;
+	struct rte_mbuf *tx_pkt;
+	struct rte_mbuf *m_seg;
+	uint16_t tx_id;
+	uint16_t ts_id = -1;
+	uint16_t nb_tx;
+	uint16_t nb_used;
+	uint16_t nb_ctx;
+	uint32_t td_cmd = 0;
+	uint32_t td_offset = 0;
+	uint32_t td_tag = 0;
+	uint16_t tx_last;
+	uint16_t slen;
+	uint64_t buf_dma_addr;
+	uint64_t ol_flags;
+	union ci_tx_offload tx_offload = {0};
+
+	sw_ring = txq->sw_ring;
+	ci_tx_ring = txq->ci_tx_ring;
+	tx_id = txq->tx_tail;
+	txe = &sw_ring[tx_id];
+
+	if (ts_fns != NULL)
+		ts_id = ts_fns->get_ts_tail(txq);
+
+	/* Check if the descriptor ring needs to be cleaned. */
+	if (txq->nb_tx_free < txq->tx_free_thresh)
+		(void)ci_tx_xmit_cleanup(txq);
+
+	for (nb_tx = 0; nb_tx < nb_pkts; nb_tx++) {
+		uint64_t cd_qw0, cd_qw1;
+		tx_pkt = *tx_pkts++;
+
+		td_cmd = CI_TX_DESC_CMD_ICRC;
+		td_tag = 0;
+		td_offset = 0;
+		ol_flags = tx_pkt->ol_flags;
+
+		tx_offload.l2_len = tx_pkt->l2_len;
+		tx_offload.l3_len = tx_pkt->l3_len;
+		tx_offload.outer_l2_len = tx_pkt->outer_l2_len;
+		tx_offload.outer_l3_len = tx_pkt->outer_l3_len;
+		tx_offload.l4_len = tx_pkt->l4_len;
+		tx_offload.tso_segsz = tx_pkt->tso_segsz;
+
+		/* Calculate the number of context descriptors needed. */
+		nb_ctx = get_ctx_desc(ol_flags, tx_pkt, &tx_offload,
+			txq, &td_offset, &cd_qw0, &cd_qw1);
+
+		/* The number of descriptors that must be allocated for
+		 * a packet equals to the number of the segments of that
+		 * packet plus the number of context descriptor if needed.
+		 * Recalculate the needed tx descs when TSO enabled in case
+		 * the mbuf data size exceeds max data size that hw allows
+		 * per tx desc.
+		 */
+		if (ol_flags & (RTE_MBUF_F_TX_TCP_SEG | RTE_MBUF_F_TX_UDP_SEG))
+			nb_used = (uint16_t)(ci_calc_pkt_desc(tx_pkt) + nb_ctx);
+		else
+			nb_used = (uint16_t)(tx_pkt->nb_segs + nb_ctx);
+		tx_last = (uint16_t)(tx_id + nb_used - 1);
+
+		/* Circular ring */
+		if (tx_last >= txq->nb_tx_desc)
+			tx_last = (uint16_t)(tx_last - txq->nb_tx_desc);
+
+		if (nb_used > txq->nb_tx_free) {
+			if (ci_tx_xmit_cleanup(txq) != 0) {
+				if (nb_tx == 0)
+					return 0;
+				goto end_of_tx;
+			}
+			if (unlikely(nb_used > txq->tx_rs_thresh)) {
+				while (nb_used > txq->nb_tx_free) {
+					if (ci_tx_xmit_cleanup(txq) != 0) {
+						if (nb_tx == 0)
+							return 0;
+						goto end_of_tx;
+					}
+				}
+			}
+		}
+
+		/* Descriptor based VLAN insertion */
+		if (ol_flags & (RTE_MBUF_F_TX_VLAN | RTE_MBUF_F_TX_QINQ)) {
+			td_cmd |= CI_TX_DESC_CMD_IL2TAG1;
+			td_tag = tx_pkt->vlan_tci;
+		}
+
+		/* Enable checksum offloading */
+		if (ol_flags & CI_TX_CKSUM_OFFLOAD_MASK)
+			ci_txd_enable_checksum(ol_flags, &td_cmd,
+						&td_offset, tx_offload);
+
+		if (nb_ctx) {
+			/* Setup TX context descriptor if required */
+			uint64_t *ctx_txd = RTE_CAST_PTR(uint64_t *, &ci_tx_ring[tx_id]);
+
+			txn = &sw_ring[txe->next_id];
+			RTE_MBUF_PREFETCH_TO_FREE(txn->mbuf);
+			if (txe->mbuf) {
+				rte_pktmbuf_free_seg(txe->mbuf);
+				txe->mbuf = NULL;
+			}
+
+			ctx_txd[0] = cd_qw0;
+			ctx_txd[1] = cd_qw1;
+
+			txe->last_id = tx_last;
+			tx_id = txe->next_id;
+			txe = txn;
+		}
+		m_seg = tx_pkt;
+
+		do {
+			txd = &ci_tx_ring[tx_id];
+			txn = &sw_ring[txe->next_id];
+
+			if (txe->mbuf)
+				rte_pktmbuf_free_seg(txe->mbuf);
+			txe->mbuf = m_seg;
+
+			/* Setup TX Descriptor */
+			slen = m_seg->data_len;
+			buf_dma_addr = rte_mbuf_data_iova(m_seg);
+
+			while ((ol_flags & (RTE_MBUF_F_TX_TCP_SEG | RTE_MBUF_F_TX_UDP_SEG)) &&
+					unlikely(slen > CI_MAX_DATA_PER_TXD)) {
+				txd->buffer_addr = rte_cpu_to_le_64(buf_dma_addr);
+				txd->cmd_type_offset_bsz = rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DATA |
+					((uint64_t)td_cmd << CI_TXD_QW1_CMD_S) |
+					((uint64_t)td_offset << CI_TXD_QW1_OFFSET_S) |
+					((uint64_t)CI_MAX_DATA_PER_TXD << CI_TXD_QW1_TX_BUF_SZ_S) |
+					((uint64_t)td_tag << CI_TXD_QW1_L2TAG1_S));
+
+				buf_dma_addr += CI_MAX_DATA_PER_TXD;
+				slen -= CI_MAX_DATA_PER_TXD;
+
+				txe->last_id = tx_last;
+				tx_id = txe->next_id;
+				txe = txn;
+				txd = &ci_tx_ring[tx_id];
+				txn = &sw_ring[txe->next_id];
+			}
+
+			txd->buffer_addr = rte_cpu_to_le_64(buf_dma_addr);
+			txd->cmd_type_offset_bsz = rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DATA |
+				((uint64_t)td_cmd << CI_TXD_QW1_CMD_S) |
+				((uint64_t)td_offset << CI_TXD_QW1_OFFSET_S) |
+				((uint64_t)slen << CI_TXD_QW1_TX_BUF_SZ_S) |
+				((uint64_t)td_tag << CI_TXD_QW1_L2TAG1_S));
+
+			txe->last_id = tx_last;
+			tx_id = txe->next_id;
+			txe = txn;
+			m_seg = m_seg->next;
+		} while (m_seg);
+
+		/* fill the last descriptor with End of Packet (EOP) bit */
+		td_cmd |= CI_TX_DESC_CMD_EOP;
+		txq->nb_tx_used = (uint16_t)(txq->nb_tx_used + nb_used);
+		txq->nb_tx_free = (uint16_t)(txq->nb_tx_free - nb_used);
+
+		/* set RS bit on the last descriptor of one packet */
+		if (txq->nb_tx_used >= txq->tx_rs_thresh) {
+			td_cmd |= CI_TX_DESC_CMD_RS;
+
+			/* Update txq RS bit counters */
+			txq->nb_tx_used = 0;
+		}
+		txd->cmd_type_offset_bsz |=
+				rte_cpu_to_le_64(((uint64_t)td_cmd) << CI_TXD_QW1_CMD_S);
+
+		if (ts_fns != NULL)
+			ts_id = ts_fns->write_ts_desc(txq, tx_pkt, tx_id, ts_id);
+	}
+end_of_tx:
+	/* update Tail register */
+	if (ts_fns != NULL)
+		ts_fns->write_ts_tail(txq, ts_id);
+	else
+		rte_write32_wc(tx_id, txq->qtx_tail);
+	txq->tx_tail = tx_id;
+
+	return nb_tx;
+}
 
 #endif /* _COMMON_INTEL_TX_SCALAR_FNS_H_ */
diff --git a/drivers/net/intel/ice/ice_rxtx.c b/drivers/net/intel/ice/ice_rxtx.c
index 0b0179e1fa..384676cfc2 100644
--- a/drivers/net/intel/ice/ice_rxtx.c
+++ b/drivers/net/intel/ice/ice_rxtx.c
@@ -3045,228 +3045,64 @@ get_context_desc(uint64_t ol_flags, const struct rte_mbuf *tx_pkt,
 	return 1;
 }
 
-uint16_t
-ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
+static uint16_t
+ice_get_ts_tail(struct ci_tx_queue *txq)
 {
-	struct ci_tx_queue *txq;
-	volatile struct ci_tx_desc *ci_tx_ring;
-	volatile struct ci_tx_desc *txd;
-	struct ci_tx_entry *sw_ring;
-	struct ci_tx_entry *txe, *txn;
-	struct rte_mbuf *tx_pkt;
-	struct rte_mbuf *m_seg;
-	uint16_t tx_id;
-	uint16_t ts_id = -1;
-	uint16_t nb_tx;
-	uint16_t nb_used;
-	uint16_t nb_ctx;
-	uint32_t td_cmd = 0;
-	uint32_t td_offset = 0;
-	uint32_t td_tag = 0;
-	uint16_t tx_last;
-	uint16_t slen;
-	uint64_t buf_dma_addr;
-	uint64_t ol_flags;
-	union ci_tx_offload tx_offload = {0};
-
-	txq = tx_queue;
-	sw_ring = txq->sw_ring;
-	ci_tx_ring = txq->ci_tx_ring;
-	tx_id = txq->tx_tail;
-	txe = &sw_ring[tx_id];
-
-	if (txq->tsq != NULL && txq->tsq->ts_flag > 0)
-		ts_id = txq->tsq->ts_tail;
-
-	/* Check if the descriptor ring needs to be cleaned. */
-	if (txq->nb_tx_free < txq->tx_free_thresh)
-		(void)ci_tx_xmit_cleanup(txq);
-
-	for (nb_tx = 0; nb_tx < nb_pkts; nb_tx++) {
-		uint64_t cd_qw0, cd_qw1;
-		tx_pkt = *tx_pkts++;
-
-		td_cmd = 0;
-		td_tag = 0;
-		td_offset = 0;
-		ol_flags = tx_pkt->ol_flags;
-
-		tx_offload.l2_len = tx_pkt->l2_len;
-		tx_offload.l3_len = tx_pkt->l3_len;
-		tx_offload.outer_l2_len = tx_pkt->outer_l2_len;
-		tx_offload.outer_l3_len = tx_pkt->outer_l3_len;
-		tx_offload.l4_len = tx_pkt->l4_len;
-		tx_offload.tso_segsz = tx_pkt->tso_segsz;
-
-		/* Calculate the number of context descriptors needed. */
-		nb_ctx = get_context_desc(ol_flags, tx_pkt, &tx_offload,
-			txq, &td_offset, &cd_qw0, &cd_qw1);
-
-		/* The number of descriptors that must be allocated for
-		 * a packet equals to the number of the segments of that
-		 * packet plus the number of context descriptor if needed.
-		 * Recalculate the needed tx descs when TSO enabled in case
-		 * the mbuf data size exceeds max data size that hw allows
-		 * per tx desc.
-		 */
-		if (ol_flags & (RTE_MBUF_F_TX_TCP_SEG | RTE_MBUF_F_TX_UDP_SEG))
-			nb_used = (uint16_t)(ci_calc_pkt_desc(tx_pkt) + nb_ctx);
-		else
-			nb_used = (uint16_t)(tx_pkt->nb_segs + nb_ctx);
-		tx_last = (uint16_t)(tx_id + nb_used - 1);
-
-		/* Circular ring */
-		if (tx_last >= txq->nb_tx_desc)
-			tx_last = (uint16_t)(tx_last - txq->nb_tx_desc);
-
-		if (nb_used > txq->nb_tx_free) {
-			if (ci_tx_xmit_cleanup(txq) != 0) {
-				if (nb_tx == 0)
-					return 0;
-				goto end_of_tx;
-			}
-			if (unlikely(nb_used > txq->tx_rs_thresh)) {
-				while (nb_used > txq->nb_tx_free) {
-					if (ci_tx_xmit_cleanup(txq) != 0) {
-						if (nb_tx == 0)
-							return 0;
-						goto end_of_tx;
-					}
-				}
-			}
-		}
-
-		/* Descriptor based VLAN insertion */
-		if (ol_flags & (RTE_MBUF_F_TX_VLAN | RTE_MBUF_F_TX_QINQ)) {
-			td_cmd |= CI_TX_DESC_CMD_IL2TAG1;
-			td_tag = tx_pkt->vlan_tci;
-		}
-
-		/* Enable checksum offloading */
-		if (ol_flags & CI_TX_CKSUM_OFFLOAD_MASK)
-			ci_txd_enable_checksum(ol_flags, &td_cmd,
-						&td_offset, tx_offload);
-
-		if (nb_ctx) {
-			/* Setup TX context descriptor if required */
-			uint64_t *ctx_txd = RTE_CAST_PTR(uint64_t *, &ci_tx_ring[tx_id]);
-
-			txn = &sw_ring[txe->next_id];
-			RTE_MBUF_PREFETCH_TO_FREE(txn->mbuf);
-			if (txe->mbuf) {
-				rte_pktmbuf_free_seg(txe->mbuf);
-				txe->mbuf = NULL;
-			}
-
-			ctx_txd[0] = cd_qw0;
-			ctx_txd[1] = cd_qw1;
-
-			txe->last_id = tx_last;
-			tx_id = txe->next_id;
-			txe = txn;
-		}
-		m_seg = tx_pkt;
-
-		do {
-			txd = &ci_tx_ring[tx_id];
-			txn = &sw_ring[txe->next_id];
-
-			if (txe->mbuf)
-				rte_pktmbuf_free_seg(txe->mbuf);
-			txe->mbuf = m_seg;
-
-			/* Setup TX Descriptor */
-			slen = m_seg->data_len;
-			buf_dma_addr = rte_mbuf_data_iova(m_seg);
-
-			while ((ol_flags & (RTE_MBUF_F_TX_TCP_SEG | RTE_MBUF_F_TX_UDP_SEG)) &&
-					unlikely(slen > CI_MAX_DATA_PER_TXD)) {
-				txd->buffer_addr = rte_cpu_to_le_64(buf_dma_addr);
-				txd->cmd_type_offset_bsz = rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DATA |
-					((uint64_t)td_cmd << CI_TXD_QW1_CMD_S) |
-					((uint64_t)td_offset << CI_TXD_QW1_OFFSET_S) |
-					((uint64_t)CI_MAX_DATA_PER_TXD << CI_TXD_QW1_TX_BUF_SZ_S) |
-					((uint64_t)td_tag << CI_TXD_QW1_L2TAG1_S));
-
-				buf_dma_addr += CI_MAX_DATA_PER_TXD;
-				slen -= CI_MAX_DATA_PER_TXD;
-
-				txe->last_id = tx_last;
-				tx_id = txe->next_id;
-				txe = txn;
-				txd = &ci_tx_ring[tx_id];
-				txn = &sw_ring[txe->next_id];
-			}
-
-			txd->buffer_addr = rte_cpu_to_le_64(buf_dma_addr);
-			txd->cmd_type_offset_bsz = rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DATA |
-				((uint64_t)td_cmd << CI_TXD_QW1_CMD_S) |
-				((uint64_t)td_offset << CI_TXD_QW1_OFFSET_S) |
-				((uint64_t)slen << CI_TXD_QW1_TX_BUF_SZ_S) |
-				((uint64_t)td_tag << CI_TXD_QW1_L2TAG1_S));
-
-			txe->last_id = tx_last;
-			tx_id = txe->next_id;
-			txe = txn;
-			m_seg = m_seg->next;
-		} while (m_seg);
+	return txq->tsq->ts_tail;
+}
 
-		/* fill the last descriptor with End of Packet (EOP) bit */
-		td_cmd |= CI_TX_DESC_CMD_EOP;
-		txq->nb_tx_used = (uint16_t)(txq->nb_tx_used + nb_used);
-		txq->nb_tx_free = (uint16_t)(txq->nb_tx_free - nb_used);
+static uint16_t
+ice_write_ts_desc(struct ci_tx_queue *txq,
+		  struct rte_mbuf *tx_pkt,
+		  uint16_t tx_id,
+		  uint16_t ts_id)
+{
+	uint64_t txtime = *RTE_MBUF_DYNFIELD(tx_pkt, txq->tsq->ts_offset, uint64_t *);
+	uint32_t tstamp = (uint32_t)(txtime % NS_PER_S) >> ICE_TXTIME_CTX_RESOLUTION_128NS;
+	const uint32_t desc_tx_id = (tx_id == 0) ? txq->nb_tx_desc : tx_id;
+	__le32 ts_desc = rte_cpu_to_le_32(FIELD_PREP(ICE_TXTIME_TX_DESC_IDX_M, desc_tx_id) |
+			FIELD_PREP(ICE_TXTIME_STAMP_M, tstamp));
+
+	txq->tsq->ice_ts_ring[ts_id].tx_desc_idx_tstamp = ts_desc;
+	ts_id++;
+
+	/* To prevent an MDD event when wrapping the
+	 * timestamp ring, create additional TS
+	 * descriptors equal to the number of fetch
+	 * TS descriptors. HW merges TS descriptors
+	 * that carry the same timestamp value into
+	 * a single descriptor.
+	 */
+	if (ts_id == txq->tsq->nb_ts_desc) {
+		uint16_t fetch = txq->tsq->nb_ts_desc - txq->nb_tx_desc;
+		ts_id = 0;
+		for (; ts_id < fetch; ts_id++)
+			txq->tsq->ice_ts_ring[ts_id].tx_desc_idx_tstamp = ts_desc;
+	}
+	return ts_id;
+}
 
-		/* set RS bit on the last descriptor of one packet */
-		if (txq->nb_tx_used >= txq->tx_rs_thresh) {
-			PMD_TX_LOG(DEBUG,
-				   "Setting RS bit on TXD id="
-				   "%4u (port=%d queue=%d)",
-				   tx_last, txq->port_id, txq->queue_id);
+static void
+ice_write_ts_tail(struct ci_tx_queue *txq, uint16_t ts_tail)
+{
+	ICE_PCI_REG_WRITE(txq->qtx_tail, ts_tail);
+	txq->tsq->ts_tail = ts_tail;
+}
 
-			td_cmd |= CI_TX_DESC_CMD_RS;
+uint16_t
+ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
+{
+	const struct ci_timesstamp_queue_fns ts_fns = {
+		.get_ts_tail = ice_get_ts_tail,
+		.write_ts_desc = ice_write_ts_desc,
+		.write_ts_tail = ice_write_ts_tail,
+	};
+	struct ci_tx_queue *txq = (struct ci_tx_queue *)tx_queue;
 
-			/* Update txq RS bit counters */
-			txq->nb_tx_used = 0;
-		}
-		txd->cmd_type_offset_bsz |=
-			rte_cpu_to_le_64(((uint64_t)td_cmd) << CI_TXD_QW1_CMD_S);
-
-		if (txq->tsq != NULL && txq->tsq->ts_flag > 0) {
-			uint64_t txtime = *RTE_MBUF_DYNFIELD(tx_pkt,
-					txq->tsq->ts_offset, uint64_t *);
-			uint32_t tstamp = (uint32_t)(txtime % NS_PER_S) >>
-						ICE_TXTIME_CTX_RESOLUTION_128NS;
-			const uint32_t desc_tx_id = (tx_id == 0) ? txq->nb_tx_desc : tx_id;
-			__le32 ts_desc = rte_cpu_to_le_32(FIELD_PREP(ICE_TXTIME_TX_DESC_IDX_M,
-					desc_tx_id) | FIELD_PREP(ICE_TXTIME_STAMP_M, tstamp));
-			txq->tsq->ice_ts_ring[ts_id].tx_desc_idx_tstamp = ts_desc;
-			ts_id++;
-			/* To prevent an MDD, when wrapping the tstamp
-			 * ring create additional TS descriptors equal
-			 * to the number of the fetch TS descriptors
-			 * value. HW will merge the TS descriptors with
-			 * the same timestamp value into a single
-			 * descriptor.
-			 */
-			if (ts_id == txq->tsq->nb_ts_desc) {
-				uint16_t fetch = txq->tsq->nb_ts_desc - txq->nb_tx_desc;
-				ts_id = 0;
-				for (; ts_id < fetch; ts_id++)
-					txq->tsq->ice_ts_ring[ts_id].tx_desc_idx_tstamp = ts_desc;
-			}
-		}
-	}
-end_of_tx:
-	/* update Tail register */
-	if (txq->tsq != NULL && txq->tsq->ts_flag > 0) {
-		ICE_PCI_REG_WRITE(txq->qtx_tail, ts_id);
-		txq->tsq->ts_tail = ts_id;
-	} else {
-		ICE_PCI_REG_WRITE(txq->qtx_tail, tx_id);
-	}
-	txq->tx_tail = tx_id;
+	if (txq->tsq != NULL && txq->tsq->ts_flag > 0)
+		return ci_xmit_pkts(txq, tx_pkts, nb_pkts, get_context_desc, &ts_fns);
 
-	return nb_tx;
+	return ci_xmit_pkts(txq, tx_pkts, nb_pkts, get_context_desc, NULL);
 }
 
 static __rte_always_inline int
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [RFC PATCH 13/27] net/i40e: use common scalar Tx function
  2025-12-19 17:25 [RFC PATCH 00/27] combine multiple Intel scalar Tx paths Bruce Richardson
                   ` (11 preceding siblings ...)
  2025-12-19 17:25 ` [RFC PATCH 12/27] net/intel: create a common scalar Tx function Bruce Richardson
@ 2025-12-19 17:25 ` Bruce Richardson
  2025-12-19 17:25 ` [RFC PATCH 14/27] net/intel: add IPSec hooks to common " Bruce Richardson
                   ` (17 subsequent siblings)
  30 siblings, 0 replies; 274+ messages in thread
From: Bruce Richardson @ 2025-12-19 17:25 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson

Following the earlier rework, the scalar transmit function for i40e can
use the common function previously moved over from the ice driver. This
removes hundreds of duplicated lines of code.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/net/intel/i40e/i40e_rxtx.c | 206 +----------------------------
 1 file changed, 2 insertions(+), 204 deletions(-)

diff --git a/drivers/net/intel/i40e/i40e_rxtx.c b/drivers/net/intel/i40e/i40e_rxtx.c
index 5d1b2e4217..ecec70e0ac 100644
--- a/drivers/net/intel/i40e/i40e_rxtx.c
+++ b/drivers/net/intel/i40e/i40e_rxtx.c
@@ -1015,210 +1015,8 @@ get_context_desc(uint64_t ol_flags, const struct rte_mbuf *tx_pkt,
 uint16_t
 i40e_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 {
-	struct ci_tx_queue *txq;
-	struct ci_tx_entry *sw_ring;
-	struct ci_tx_entry *txe, *txn;
-	volatile struct ci_tx_desc *txd;
-	volatile struct ci_tx_desc *txr;
-	struct rte_mbuf *tx_pkt;
-	struct rte_mbuf *m_seg;
-	uint16_t tx_id;
-	uint16_t nb_tx;
-	uint32_t td_cmd;
-	uint32_t td_offset;
-	uint32_t td_tag;
-	uint64_t ol_flags;
-	uint16_t nb_used;
-	uint16_t nb_ctx;
-	uint16_t tx_last;
-	uint16_t slen;
-	uint64_t buf_dma_addr;
-	union ci_tx_offload tx_offload = {0};
-
-	txq = tx_queue;
-	sw_ring = txq->sw_ring;
-	txr = txq->ci_tx_ring;
-	tx_id = txq->tx_tail;
-	txe = &sw_ring[tx_id];
-
-	/* Check if the descriptor ring needs to be cleaned. */
-	if (txq->nb_tx_free < txq->tx_free_thresh)
-		(void)ci_tx_xmit_cleanup(txq);
-
-	for (nb_tx = 0; nb_tx < nb_pkts; nb_tx++) {
-		td_cmd = 0;
-		td_tag = 0;
-		td_offset = 0;
-
-		tx_pkt = *tx_pkts++;
-		RTE_MBUF_PREFETCH_TO_FREE(txe->mbuf);
-
-		ol_flags = tx_pkt->ol_flags;
-		tx_offload.l2_len = tx_pkt->l2_len;
-		tx_offload.l3_len = tx_pkt->l3_len;
-		tx_offload.outer_l2_len = tx_pkt->outer_l2_len;
-		tx_offload.outer_l3_len = tx_pkt->outer_l3_len;
-		tx_offload.l4_len = tx_pkt->l4_len;
-		tx_offload.tso_segsz = tx_pkt->tso_segsz;
-
-		/* Calculate the number of context descriptors needed. */
-		uint64_t cd_qw0 = 0, cd_qw1 = 0;
-		nb_ctx = get_context_desc(ol_flags, tx_pkt, &tx_offload, txq, &td_offset,
-				&cd_qw0, &cd_qw1);
-
-		/**
-		 * The number of descriptors that must be allocated for
-		 * a packet equals to the number of the segments of that
-		 * packet plus 1 context descriptor if needed.
-		 * Recalculate the needed tx descs when TSO enabled in case
-		 * the mbuf data size exceeds max data size that hw allows
-		 * per tx desc.
-		 */
-		if (ol_flags & RTE_MBUF_F_TX_TCP_SEG)
-			nb_used = (uint16_t)(ci_calc_pkt_desc(tx_pkt) + nb_ctx);
-		else
-			nb_used = (uint16_t)(tx_pkt->nb_segs + nb_ctx);
-		tx_last = (uint16_t)(tx_id + nb_used - 1);
-
-		/* Circular ring */
-		if (tx_last >= txq->nb_tx_desc)
-			tx_last = (uint16_t)(tx_last - txq->nb_tx_desc);
-
-		if (nb_used > txq->nb_tx_free) {
-			if (ci_tx_xmit_cleanup(txq) != 0) {
-				if (nb_tx == 0)
-					return 0;
-				goto end_of_tx;
-			}
-			if (unlikely(nb_used > txq->tx_rs_thresh)) {
-				while (nb_used > txq->nb_tx_free) {
-					if (ci_tx_xmit_cleanup(txq) != 0) {
-						if (nb_tx == 0)
-							return 0;
-						goto end_of_tx;
-					}
-				}
-			}
-		}
-
-		/* Descriptor based VLAN insertion */
-		if (ol_flags & (RTE_MBUF_F_TX_VLAN | RTE_MBUF_F_TX_QINQ)) {
-			td_cmd |= CI_TX_DESC_CMD_IL2TAG1;
-			td_tag = tx_pkt->vlan_tci;
-		}
-
-		/* Always enable CRC offload insertion */
-		td_cmd |= CI_TX_DESC_CMD_ICRC;
-
-		/* Enable checksum offloading */
-		if (ol_flags & CI_TX_CKSUM_OFFLOAD_MASK)
-			ci_txd_enable_checksum(ol_flags, &td_cmd,
-						 &td_offset, tx_offload);
-
-		if (nb_ctx) {
-			/* Setup TX context descriptor if required */
-			uint64_t *desc = RTE_CAST_PTR(uint64_t *, &txr[tx_id]);
-
-			txn = &sw_ring[txe->next_id];
-			RTE_MBUF_PREFETCH_TO_FREE(txn->mbuf);
-			if (txe->mbuf != NULL) {
-				rte_pktmbuf_free_seg(txe->mbuf);
-				txe->mbuf = NULL;
-			}
-
-			desc[0] = cd_qw0;
-			desc[1] = cd_qw1;
-
-			PMD_TX_LOG(DEBUG, "mbuf: %p, TCD[%u]: "
-				"qw0: %#"PRIx64"; "
-				"qw1: %#"PRIx64";",
-				tx_pkt, tx_id, cd_qw0, cd_qw1);
-
-			txe->last_id = tx_last;
-			tx_id = txe->next_id;
-			txe = txn;
-		}
-
-		m_seg = tx_pkt;
-		do {
-			txd = &txr[tx_id];
-			txn = &sw_ring[txe->next_id];
-
-			if (txe->mbuf)
-				rte_pktmbuf_free_seg(txe->mbuf);
-			txe->mbuf = m_seg;
-
-			/* Setup TX Descriptor */
-			slen = m_seg->data_len;
-			buf_dma_addr = rte_mbuf_data_iova(m_seg);
-
-			while ((ol_flags & RTE_MBUF_F_TX_TCP_SEG) &&
-				unlikely(slen > CI_MAX_DATA_PER_TXD)) {
-				txd->buffer_addr =
-					rte_cpu_to_le_64(buf_dma_addr);
-				txd->cmd_type_offset_bsz =
-					i40e_build_ctob(td_cmd,
-					td_offset, CI_MAX_DATA_PER_TXD,
-					td_tag);
-
-				buf_dma_addr += CI_MAX_DATA_PER_TXD;
-				slen -= CI_MAX_DATA_PER_TXD;
-
-				txe->last_id = tx_last;
-				tx_id = txe->next_id;
-				txe = txn;
-				txd = &txr[tx_id];
-				txn = &sw_ring[txe->next_id];
-			}
-			PMD_TX_LOG(DEBUG, "mbuf: %p, TDD[%u]: "
-				"buf_dma_addr: %#"PRIx64"; "
-				"td_cmd: %#x; "
-				"td_offset: %#x; "
-				"td_len: %u; "
-				"td_tag: %#x;",
-				tx_pkt, tx_id, buf_dma_addr,
-				td_cmd, td_offset, slen, td_tag);
-
-			txd->buffer_addr = rte_cpu_to_le_64(buf_dma_addr);
-			txd->cmd_type_offset_bsz = i40e_build_ctob(td_cmd,
-						td_offset, slen, td_tag);
-			txe->last_id = tx_last;
-			tx_id = txe->next_id;
-			txe = txn;
-			m_seg = m_seg->next;
-		} while (m_seg != NULL);
-
-		/* The last packet data descriptor needs End Of Packet (EOP) */
-		td_cmd |= CI_TX_DESC_CMD_EOP;
-		txq->nb_tx_used = (uint16_t)(txq->nb_tx_used + nb_used);
-		txq->nb_tx_free = (uint16_t)(txq->nb_tx_free - nb_used);
-
-		if (txq->nb_tx_used >= txq->tx_rs_thresh) {
-			PMD_TX_LOG(DEBUG,
-				   "Setting RS bit on TXD id="
-				   "%4u (port=%d queue=%d)",
-				   tx_last, txq->port_id, txq->queue_id);
-
-			td_cmd |= CI_TX_DESC_CMD_RS;
-
-			/* Update txq RS bit counters */
-			txq->nb_tx_used = 0;
-		}
-
-		txd->cmd_type_offset_bsz |=
-			rte_cpu_to_le_64(((uint64_t)td_cmd) << CI_TXD_QW1_CMD_S);
-	}
-
-end_of_tx:
-	PMD_TX_LOG(DEBUG, "port_id=%u queue_id=%u tx_tail=%u nb_tx=%u",
-		   (unsigned) txq->port_id, (unsigned) txq->queue_id,
-		   (unsigned) tx_id, (unsigned) nb_tx);
-
-	rte_io_wmb();
-	I40E_PCI_REG_WC_WRITE_RELAXED(txq->qtx_tail, tx_id);
-	txq->tx_tail = tx_id;
-
-	return nb_tx;
+	/* i40e does not support timestamp queues, so pass NULL for ts_fns */
+	return ci_xmit_pkts(tx_queue, tx_pkts, nb_pkts, get_context_desc, NULL);
 }
 
 static __rte_always_inline int
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [RFC PATCH 14/27] net/intel: add IPSec hooks to common Tx function
  2025-12-19 17:25 [RFC PATCH 00/27] combine multiple Intel scalar Tx paths Bruce Richardson
                   ` (12 preceding siblings ...)
  2025-12-19 17:25 ` [RFC PATCH 13/27] net/i40e: use " Bruce Richardson
@ 2025-12-19 17:25 ` Bruce Richardson
  2025-12-19 17:25 ` [RFC PATCH 15/27] net/intel: support configurable VLAN tag insertion on Tx Bruce Richardson
                   ` (16 subsequent siblings)
  30 siblings, 0 replies; 274+ messages in thread
From: Bruce Richardson @ 2025-12-19 17:25 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson, Anatoly Burakov

The iavf driver supports IPSec offload on Tx, so add hooks for it to
the common Tx function. Do so in a way that has zero performance impact
on drivers without IPSec support: by passing compile-time NULL
constants for the function pointers, the hook calls can be optimized
away entirely by the compiler.
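
A minimal standalone sketch of this zero-cost pattern follows; the names
(`xmit_burst`, `extra_desc_fn`, `one_extra`) are illustrative, not the
actual driver API. When the always-inline burst function receives a
literal NULL for the ops pointer, constant propagation removes the
feature branch entirely:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical hook type standing in for ts_fns/ipsec_ops:
 * an optional per-feature callback passed to the common path. */
typedef uint16_t (*extra_desc_fn)(uint16_t nb_pkts);

/* Always-inline burst function parameterized by an optional hook.
 * When a call site passes a literal NULL, the compiler folds the
 * "extra != NULL" test away after inlining, so drivers without
 * the feature pay nothing at run time. */
static inline uint16_t
xmit_burst(uint16_t nb_pkts, extra_desc_fn extra)
{
	uint16_t nb_desc = nb_pkts;

	if (extra != NULL)	/* folded away for a literal NULL */
		nb_desc += extra(nb_pkts);

	return nb_desc;
}

/* Example hook: e.g. one extra IPsec descriptor per burst. */
static uint16_t
one_extra(uint16_t nb_pkts)
{
	(void)nb_pkts;
	return 1;
}
```

The same function body thus serves both the plain and the
feature-enabled drivers without a run-time branch in the former.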

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/net/intel/common/tx_scalar_fns.h | 60 ++++++++++++++++++++++--
 drivers/net/intel/i40e/i40e_rxtx.c       |  4 +-
 drivers/net/intel/ice/ice_rxtx.c         |  4 +-
 3 files changed, 60 insertions(+), 8 deletions(-)

diff --git a/drivers/net/intel/common/tx_scalar_fns.h b/drivers/net/intel/common/tx_scalar_fns.h
index 70b22f1da0..8c0de26537 100644
--- a/drivers/net/intel/common/tx_scalar_fns.h
+++ b/drivers/net/intel/common/tx_scalar_fns.h
@@ -152,6 +152,24 @@ typedef uint16_t (*ci_get_ctx_desc_fn)(uint64_t ol_flags, const struct rte_mbuf
 		const union ci_tx_offload *tx_offload, const struct ci_tx_queue *txq,
 		uint32_t *td_offset, uint64_t *qw0, uint64_t *qw1);
 
+/* gets IPsec descriptor information and returns number of descriptors needed (0 or 1) */
+typedef uint16_t (*get_ipsec_desc_t)(const struct rte_mbuf *mbuf,
+		const struct ci_tx_queue *txq,
+		void **ipsec_metadata,
+		uint64_t *qw0,
+		uint64_t *qw1);
+/* calculates segment length for IPsec + TSO combinations */
+typedef uint16_t (*calc_ipsec_segment_len_t)(const struct rte_mbuf *mb_seg,
+		uint64_t ol_flags,
+		const void *ipsec_metadata,
+		uint16_t tlen);
+
+/** IPsec descriptor operations for drivers that support inline IPsec crypto. */
+struct ci_ipsec_ops {
+	get_ipsec_desc_t get_ipsec_desc;
+	calc_ipsec_segment_len_t calc_segment_len;
+};
+
 /* gets current timestamp tail index */
 typedef uint16_t (*get_ts_tail_t)(struct ci_tx_queue *txq);
 /* writes a timestamp descriptor and returns new tail index */
@@ -171,6 +189,7 @@ ci_xmit_pkts(struct ci_tx_queue *txq,
 	     struct rte_mbuf **tx_pkts,
 	     uint16_t nb_pkts,
 	     ci_get_ctx_desc_fn get_ctx_desc,
+	     const struct ci_ipsec_ops *ipsec_ops,
 	     const struct ci_timesstamp_queue_fns *ts_fns)
 {
 	volatile struct ci_tx_desc *ci_tx_ring;
@@ -206,6 +225,9 @@ ci_xmit_pkts(struct ci_tx_queue *txq,
 		(void)ci_tx_xmit_cleanup(txq);
 
 	for (nb_tx = 0; nb_tx < nb_pkts; nb_tx++) {
+		void *ipsec_md = NULL;
+		uint16_t nb_ipsec = 0;
+		uint64_t ipsec_qw0 = 0, ipsec_qw1 = 0;
 		uint64_t cd_qw0, cd_qw1;
 		tx_pkt = *tx_pkts++;
 
@@ -225,17 +247,22 @@ ci_xmit_pkts(struct ci_tx_queue *txq,
 		nb_ctx = get_ctx_desc(ol_flags, tx_pkt, &tx_offload,
 			txq, &td_offset, &cd_qw0, &cd_qw1);
 
+		/* Get IPsec descriptor information if IPsec ops provided */
+		if (ipsec_ops != NULL)
+			nb_ipsec = ipsec_ops->get_ipsec_desc(tx_pkt, txq, &ipsec_md,
+					&ipsec_qw0, &ipsec_qw1);
+
 		/* The number of descriptors that must be allocated for
 		 * a packet equals to the number of the segments of that
-		 * packet plus the number of context descriptor if needed.
+		 * packet plus the number of context and IPsec descriptors if needed.
 		 * Recalculate the needed tx descs when TSO enabled in case
 		 * the mbuf data size exceeds max data size that hw allows
 		 * per tx desc.
 		 */
 		if (ol_flags & (RTE_MBUF_F_TX_TCP_SEG | RTE_MBUF_F_TX_UDP_SEG))
-			nb_used = (uint16_t)(ci_calc_pkt_desc(tx_pkt) + nb_ctx);
+			nb_used = (uint16_t)(ci_calc_pkt_desc(tx_pkt) + nb_ctx + nb_ipsec);
 		else
-			nb_used = (uint16_t)(tx_pkt->nb_segs + nb_ctx);
+			nb_used = (uint16_t)(tx_pkt->nb_segs + nb_ctx + nb_ipsec);
 		tx_last = (uint16_t)(tx_id + nb_used - 1);
 
 		/* Circular ring */
@@ -288,6 +315,26 @@ ci_xmit_pkts(struct ci_tx_queue *txq,
 			tx_id = txe->next_id;
 			txe = txn;
 		}
+
+		if (ipsec_ops != NULL && nb_ipsec > 0) {
+			/* Setup TX IPsec descriptor if required */
+			uint64_t *ipsec_txd = RTE_CAST_PTR(uint64_t *, &ci_tx_ring[tx_id]);
+
+			txn = &sw_ring[txe->next_id];
+			RTE_MBUF_PREFETCH_TO_FREE(txn->mbuf);
+			if (txe->mbuf) {
+				rte_pktmbuf_free_seg(txe->mbuf);
+				txe->mbuf = NULL;
+			}
+
+			ipsec_txd[0] = ipsec_qw0;
+			ipsec_txd[1] = ipsec_qw1;
+
+			txe->last_id = tx_last;
+			tx_id = txe->next_id;
+			txe = txn;
+		}
+
 		m_seg = tx_pkt;
 
 		do {
@@ -299,7 +346,12 @@ ci_xmit_pkts(struct ci_tx_queue *txq,
 			txe->mbuf = m_seg;
 
 			/* Setup TX Descriptor */
-			slen = m_seg->data_len;
+			/* Calculate segment length, using IPsec callback if provided */
+			if (ipsec_ops != NULL)
+				slen = ipsec_ops->calc_segment_len(m_seg, ol_flags, ipsec_md, 0);
+			else
+				slen = m_seg->data_len;
+
 			buf_dma_addr = rte_mbuf_data_iova(m_seg);
 
 			while ((ol_flags & (RTE_MBUF_F_TX_TCP_SEG | RTE_MBUF_F_TX_UDP_SEG)) &&
diff --git a/drivers/net/intel/i40e/i40e_rxtx.c b/drivers/net/intel/i40e/i40e_rxtx.c
index ecec70e0ac..e22fcfff60 100644
--- a/drivers/net/intel/i40e/i40e_rxtx.c
+++ b/drivers/net/intel/i40e/i40e_rxtx.c
@@ -1015,8 +1015,8 @@ get_context_desc(uint64_t ol_flags, const struct rte_mbuf *tx_pkt,
 uint16_t
 i40e_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 {
-	/* i40e does not support timestamp queues, so pass NULL for ts_fns */
-	return ci_xmit_pkts(tx_queue, tx_pkts, nb_pkts, get_context_desc, NULL);
+	/* i40e does not support IPsec or timestamp queues, so pass NULL for both */
+	return ci_xmit_pkts(tx_queue, tx_pkts, nb_pkts, get_context_desc, NULL, NULL);
 }
 
 static __rte_always_inline int
diff --git a/drivers/net/intel/ice/ice_rxtx.c b/drivers/net/intel/ice/ice_rxtx.c
index 384676cfc2..49ed6b8399 100644
--- a/drivers/net/intel/ice/ice_rxtx.c
+++ b/drivers/net/intel/ice/ice_rxtx.c
@@ -3100,9 +3100,9 @@ ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 	struct ci_tx_queue *txq = (struct ci_tx_queue *)tx_queue;
 
 	if (txq->tsq != NULL && txq->tsq->ts_flag > 0)
-		return ci_xmit_pkts(txq, tx_pkts, nb_pkts, get_context_desc, &ts_fns);
+		return ci_xmit_pkts(txq, tx_pkts, nb_pkts, get_context_desc, NULL, &ts_fns);
 
-	return ci_xmit_pkts(txq, tx_pkts, nb_pkts, get_context_desc, NULL);
+	return ci_xmit_pkts(txq, tx_pkts, nb_pkts, get_context_desc, NULL, NULL);
 }
 
 static __rte_always_inline int
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [RFC PATCH 15/27] net/intel: support configurable VLAN tag insertion on Tx
  2025-12-19 17:25 [RFC PATCH 00/27] combine multiple Intel scalar Tx paths Bruce Richardson
                   ` (13 preceding siblings ...)
  2025-12-19 17:25 ` [RFC PATCH 14/27] net/intel: add IPSec hooks to common " Bruce Richardson
@ 2025-12-19 17:25 ` Bruce Richardson
  2025-12-19 17:25 ` [RFC PATCH 16/27] net/iavf: use common scalar Tx function Bruce Richardson
                   ` (15 subsequent siblings)
  30 siblings, 0 replies; 274+ messages in thread
From: Bruce Richardson @ 2025-12-19 17:25 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson, Anatoly Burakov

Make the VLAN tag insertion logic in the common code configurable, so
that each driver can control where the inner and outer tags are placed.
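
The data-descriptor side of the selection can be sketched as below; this
is a standalone model of the branch added in this patch, with
illustrative names (`tag_in_l2tag1`, `TX_VLAN`, `TX_QINQ`) rather than
the driver's own:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative offload flags, mirroring RTE_MBUF_F_TX_VLAN/QINQ */
#define TX_VLAN (1ULL << 0)
#define TX_QINQ (1ULL << 1)

/* Mirrors enum ci_tx_l2tag1_field from the patch */
enum l2tag1_field { VLAN_IN_L2TAG1, VLAN_IN_L2TAG2 };

/* Returns true when the VLAN tag belongs in the data descriptor's
 * L2TAG1 field: always for QinQ (the inner tag), and for single
 * VLAN only when the driver selected L2TAG1 placement. */
static inline bool
tag_in_l2tag1(uint64_t ol_flags, enum l2tag1_field field)
{
	if (ol_flags & TX_QINQ)
		return true;
	return (ol_flags & TX_VLAN) && field == VLAN_IN_L2TAG1;
}
```

The L2TAG2 (context descriptor) placement stays with the per-driver
context-descriptor callbacks, as noted in the enum documentation below.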

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/net/intel/common/tx.h            | 10 ++++++++++
 drivers/net/intel/common/tx_scalar_fns.h |  9 +++++++--
 drivers/net/intel/i40e/i40e_rxtx.c       |  4 ++--
 drivers/net/intel/ice/ice_rxtx.c         |  4 ++--
 4 files changed, 21 insertions(+), 6 deletions(-)

diff --git a/drivers/net/intel/common/tx.h b/drivers/net/intel/common/tx.h
index a71b98f119..0d11daaab3 100644
--- a/drivers/net/intel/common/tx.h
+++ b/drivers/net/intel/common/tx.h
@@ -45,6 +45,16 @@
 #define CI_TX_CTX_DESC_TSYN             0x02
 #define CI_TX_CTX_DESC_IL2TAG2          0x04
 
+/**
+ * L2TAG1 Field Source Selection
+ * Specifies which mbuf VLAN field to use for the L2TAG1 field in data descriptors.
+ * Context descriptor VLAN handling (L2TAG2) is managed by driver-specific callbacks.
+ */
+enum ci_tx_l2tag1_field {
+	CI_VLAN_IN_L2TAG1,       /**< For VLAN (not QinQ), use L2Tag1 field in data desc */
+	CI_VLAN_IN_L2TAG2,       /**< For VLAN (not QinQ), use L2Tag2 field in ctx desc */
+};
+
 /* Common TX Descriptor Length Field Shifts */
 #define CI_TX_DESC_LEN_MACLEN_S         0  /* 7 BITS */
 #define CI_TX_DESC_LEN_IPLEN_S          7  /* 7 BITS */
diff --git a/drivers/net/intel/common/tx_scalar_fns.h b/drivers/net/intel/common/tx_scalar_fns.h
index 8c0de26537..6079a558e4 100644
--- a/drivers/net/intel/common/tx_scalar_fns.h
+++ b/drivers/net/intel/common/tx_scalar_fns.h
@@ -188,6 +188,7 @@ static inline uint16_t
 ci_xmit_pkts(struct ci_tx_queue *txq,
 	     struct rte_mbuf **tx_pkts,
 	     uint16_t nb_pkts,
+	     enum ci_tx_l2tag1_field l2tag1_field,
 	     ci_get_ctx_desc_fn get_ctx_desc,
 	     const struct ci_ipsec_ops *ipsec_ops,
 	     const struct ci_timesstamp_queue_fns *ts_fns)
@@ -286,8 +287,12 @@ ci_xmit_pkts(struct ci_tx_queue *txq,
 			}
 		}
 
-		/* Descriptor based VLAN insertion */
-		if (ol_flags & (RTE_MBUF_F_TX_VLAN | RTE_MBUF_F_TX_QINQ)) {
+		/* Descriptor based VLAN/QinQ insertion: for single VLAN offload,
+		 * insert in the data descriptor only when CI_VLAN_IN_L2TAG1 is
+		 * selected; for QinQ, the inner tag always goes in L2TAG1.
+		 */
+		if (((ol_flags & RTE_MBUF_F_TX_VLAN) && (l2tag1_field == CI_VLAN_IN_L2TAG1)) ||
+				(ol_flags & RTE_MBUF_F_TX_QINQ)) {
 			td_cmd |= CI_TX_DESC_CMD_IL2TAG1;
 			td_tag = tx_pkt->vlan_tci;
 		}
diff --git a/drivers/net/intel/i40e/i40e_rxtx.c b/drivers/net/intel/i40e/i40e_rxtx.c
index e22fcfff60..2d12e6dd1a 100644
--- a/drivers/net/intel/i40e/i40e_rxtx.c
+++ b/drivers/net/intel/i40e/i40e_rxtx.c
@@ -1002,7 +1002,7 @@ get_context_desc(uint64_t ol_flags, const struct rte_mbuf *tx_pkt,
 	/* TX context descriptor based double VLAN insert */
 	if (ol_flags & RTE_MBUF_F_TX_QINQ) {
 		cd_l2tag2 = tx_pkt->vlan_tci_outer;
-		cd_type_cmd_tso_mss |= ((uint64_t)I40E_TX_CTX_DESC_IL2TAG2 << I40E_TXD_CTX_QW1_CMD_SHIFT);
+		cd_type_cmd_tso_mss |= (I40E_TX_CTX_DESC_IL2TAG2 << I40E_TXD_CTX_QW1_CMD_SHIFT);
 	}
 
 	*qw0 = rte_cpu_to_le_32(cd_tunneling_params) |
@@ -1016,7 +1016,7 @@ uint16_t
 i40e_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 {
 	/* i40e does not support IPsec or timestamp queues, so pass NULL for both */
-	return ci_xmit_pkts(tx_queue, tx_pkts, nb_pkts, get_context_desc, NULL, NULL);
+	return ci_xmit_pkts(tx_queue, tx_pkts, nb_pkts, CI_VLAN_IN_L2TAG1, get_context_desc, NULL, NULL);
 }
 
 static __rte_always_inline int
diff --git a/drivers/net/intel/ice/ice_rxtx.c b/drivers/net/intel/ice/ice_rxtx.c
index 49ed6b8399..2c73011181 100644
--- a/drivers/net/intel/ice/ice_rxtx.c
+++ b/drivers/net/intel/ice/ice_rxtx.c
@@ -3100,9 +3100,9 @@ ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 	struct ci_tx_queue *txq = (struct ci_tx_queue *)tx_queue;
 
 	if (txq->tsq != NULL && txq->tsq->ts_flag > 0)
-		return ci_xmit_pkts(txq, tx_pkts, nb_pkts, get_context_desc, NULL, &ts_fns);
+		return ci_xmit_pkts(txq, tx_pkts, nb_pkts, CI_VLAN_IN_L2TAG1, get_context_desc, NULL, &ts_fns);
 
-	return ci_xmit_pkts(txq, tx_pkts, nb_pkts, get_context_desc, NULL, NULL);
+	return ci_xmit_pkts(txq, tx_pkts, nb_pkts, CI_VLAN_IN_L2TAG1, get_context_desc, NULL, NULL);
 }
 
 static __rte_always_inline int
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [RFC PATCH 16/27] net/iavf: use common scalar Tx function
  2025-12-19 17:25 [RFC PATCH 00/27] combine multiple Intel scalar Tx paths Bruce Richardson
                   ` (14 preceding siblings ...)
  2025-12-19 17:25 ` [RFC PATCH 15/27] net/intel: support configurable VLAN tag insertion on Tx Bruce Richardson
@ 2025-12-19 17:25 ` Bruce Richardson
  2025-12-19 17:25 ` [RFC PATCH 17/27] net/i40e: document requirement for QinQ support Bruce Richardson
                   ` (14 subsequent siblings)
  30 siblings, 0 replies; 274+ messages in thread
From: Bruce Richardson @ 2025-12-19 17:25 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson, Vladimir Medvedkin

Now that the common scalar Tx function has all necessary hooks for the
features supported by the iavf driver, use the common function to avoid
duplicated code.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/net/intel/iavf/iavf_rxtx.c | 534 ++++++-----------------------
 1 file changed, 109 insertions(+), 425 deletions(-)

diff --git a/drivers/net/intel/iavf/iavf_rxtx.c b/drivers/net/intel/iavf/iavf_rxtx.c
index 9ce978e69c..f96876ca46 100644
--- a/drivers/net/intel/iavf/iavf_rxtx.c
+++ b/drivers/net/intel/iavf/iavf_rxtx.c
@@ -2327,7 +2327,7 @@ iavf_recv_pkts_bulk_alloc(void *rx_queue,
 
 /* Check if the context descriptor is needed for TX offloading */
 static inline uint16_t
-iavf_calc_context_desc(struct rte_mbuf *mb, uint8_t vlan_flag)
+iavf_calc_context_desc(const struct rte_mbuf *mb, uint8_t vlan_flag)
 {
 	uint64_t flags = mb->ol_flags;
 	if (flags & (RTE_MBUF_F_TX_TCP_SEG | RTE_MBUF_F_TX_UDP_SEG |
@@ -2345,44 +2345,7 @@ iavf_calc_context_desc(struct rte_mbuf *mb, uint8_t vlan_flag)
 }
 
 static inline void
-iavf_fill_ctx_desc_cmd_field(volatile uint64_t *field, struct rte_mbuf *m,
-		uint8_t vlan_flag)
-{
-	uint64_t cmd = 0;
-
-	/* TSO enabled */
-	if (m->ol_flags & (RTE_MBUF_F_TX_TCP_SEG | RTE_MBUF_F_TX_UDP_SEG))
-		cmd = CI_TX_CTX_DESC_TSO << IAVF_TXD_CTX_QW1_CMD_SHIFT;
-
-	if ((m->ol_flags & RTE_MBUF_F_TX_VLAN &&
-			vlan_flag & IAVF_TX_FLAGS_VLAN_TAG_LOC_L2TAG2) ||
-			m->ol_flags & RTE_MBUF_F_TX_QINQ) {
-		cmd |= CI_TX_CTX_DESC_IL2TAG2
-			<< IAVF_TXD_CTX_QW1_CMD_SHIFT;
-	}
-
-	if (IAVF_CHECK_TX_LLDP(m))
-		cmd |= IAVF_TX_CTX_DESC_SWTCH_UPLINK
-			<< IAVF_TXD_CTX_QW1_CMD_SHIFT;
-
-	*field |= cmd;
-}
-
-static inline void
-iavf_fill_ctx_desc_ipsec_field(volatile uint64_t *field,
-	struct iavf_ipsec_crypto_pkt_metadata *ipsec_md)
-{
-	uint64_t ipsec_field =
-		(uint64_t)ipsec_md->ctx_desc_ipsec_params <<
-			IAVF_TXD_CTX_QW1_IPSEC_PARAMS_CIPHERBLK_SHIFT;
-
-	*field |= ipsec_field;
-}
-
-
-static inline void
-iavf_fill_ctx_desc_tunnelling_field(volatile uint64_t *qw0,
-		const struct rte_mbuf *m)
+iavf_fill_ctx_desc_tunnelling_field(uint64_t *qw0, const struct rte_mbuf *m)
 {
 	uint64_t eip_typ = IAVF_TX_CTX_DESC_EIPT_NONE;
 	uint64_t eip_len = 0;
@@ -2457,7 +2420,7 @@ iavf_fill_ctx_desc_tunnelling_field(volatile uint64_t *qw0,
 
 static inline uint16_t
 iavf_fill_ctx_desc_segmentation_field(volatile uint64_t *field,
-	struct rte_mbuf *m, struct iavf_ipsec_crypto_pkt_metadata *ipsec_md)
+	const struct rte_mbuf *m, struct iavf_ipsec_crypto_pkt_metadata *ipsec_md)
 {
 	uint64_t segmentation_field = 0;
 	uint64_t total_length = 0;
@@ -2496,59 +2459,31 @@ struct iavf_tx_context_desc_qws {
 	__le64 qw1;
 };
 
-static inline void
-iavf_fill_context_desc(volatile struct iavf_tx_context_desc *desc,
-	struct rte_mbuf *m, struct iavf_ipsec_crypto_pkt_metadata *ipsec_md,
-	uint16_t *tlen, uint8_t vlan_flag)
+/* IPsec callback for ci_xmit_pkts - gets IPsec descriptor information */
+static uint16_t
+iavf_get_ipsec_desc(const struct rte_mbuf *mbuf, const struct ci_tx_queue *txq,
+		    void **ipsec_metadata, uint64_t *qw0, uint64_t *qw1)
 {
-	volatile struct iavf_tx_context_desc_qws *desc_qws =
-			(volatile struct iavf_tx_context_desc_qws *)desc;
-	/* fill descriptor type field */
-	desc_qws->qw1 = IAVF_TX_DESC_DTYPE_CONTEXT;
-
-	/* fill command field */
-	iavf_fill_ctx_desc_cmd_field(&desc_qws->qw1, m, vlan_flag);
-
-	/* fill segmentation field */
-	if (m->ol_flags & (RTE_MBUF_F_TX_TCP_SEG | RTE_MBUF_F_TX_UDP_SEG)) {
-		/* fill IPsec field */
-		if (m->ol_flags & RTE_MBUF_F_TX_SEC_OFFLOAD)
-			iavf_fill_ctx_desc_ipsec_field(&desc_qws->qw1,
-				ipsec_md);
-
-		*tlen = iavf_fill_ctx_desc_segmentation_field(&desc_qws->qw1,
-				m, ipsec_md);
-	}
-
-	/* fill tunnelling field */
-	if (m->ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK)
-		iavf_fill_ctx_desc_tunnelling_field(&desc_qws->qw0, m);
-	else
-		desc_qws->qw0 = 0;
+	struct iavf_ipsec_crypto_pkt_metadata *md;
 
-	desc_qws->qw0 = rte_cpu_to_le_64(desc_qws->qw0);
-	desc_qws->qw1 = rte_cpu_to_le_64(desc_qws->qw1);
+	if (!(mbuf->ol_flags & RTE_MBUF_F_TX_SEC_OFFLOAD))
+		return 0;
 
-	/* vlan_flag specifies VLAN tag location for VLAN, and outer tag location for QinQ. */
-	if (m->ol_flags & RTE_MBUF_F_TX_QINQ)
-		desc->l2tag2 = vlan_flag & IAVF_TX_FLAGS_VLAN_TAG_LOC_L2TAG2 ? m->vlan_tci_outer :
-						m->vlan_tci;
-	else if (m->ol_flags & RTE_MBUF_F_TX_VLAN && vlan_flag & IAVF_TX_FLAGS_VLAN_TAG_LOC_L2TAG2)
-		desc->l2tag2 = m->vlan_tci;
-}
+	md = RTE_MBUF_DYNFIELD(mbuf, txq->ipsec_crypto_pkt_md_offset,
+				     struct iavf_ipsec_crypto_pkt_metadata *);
+	if (!md)
+		return 0;
 
+	*ipsec_metadata = md;
 
-static inline void
-iavf_fill_ipsec_desc(volatile struct iavf_tx_ipsec_desc *desc,
-	const struct iavf_ipsec_crypto_pkt_metadata *md, uint16_t *ipsec_len)
-{
-	desc->qw0 = rte_cpu_to_le_64(((uint64_t)md->l4_payload_len <<
+	/* Fill IPsec descriptor using existing logic */
+	*qw0 = rte_cpu_to_le_64(((uint64_t)md->l4_payload_len <<
 		IAVF_IPSEC_TX_DESC_QW0_L4PAYLEN_SHIFT) |
 		((uint64_t)md->esn << IAVF_IPSEC_TX_DESC_QW0_IPSECESN_SHIFT) |
 		((uint64_t)md->esp_trailer_len <<
 				IAVF_IPSEC_TX_DESC_QW0_TRAILERLEN_SHIFT));
 
-	desc->qw1 = rte_cpu_to_le_64(((uint64_t)md->sa_idx <<
+	*qw1 = rte_cpu_to_le_64(((uint64_t)md->sa_idx <<
 		IAVF_IPSEC_TX_DESC_QW1_IPSECSA_SHIFT) |
 		((uint64_t)md->next_proto <<
 				IAVF_IPSEC_TX_DESC_QW1_IPSECNH_SHIFT) |
@@ -2557,143 +2492,106 @@ iavf_fill_ipsec_desc(volatile struct iavf_tx_ipsec_desc *desc,
 		((uint64_t)(md->ol_flags & IAVF_IPSEC_CRYPTO_OL_FLAGS_NATT ?
 				1ULL : 0ULL) <<
 				IAVF_IPSEC_TX_DESC_QW1_UDP_SHIFT) |
-		(uint64_t)IAVF_TX_DESC_DTYPE_IPSEC);
+		((uint64_t)IAVF_TX_DESC_DTYPE_IPSEC <<
+				CI_TXD_QW1_DTYPE_S));
 
-	/**
-	 * TODO: Pre-calculate this in the Session initialization
-	 *
-	 * Calculate IPsec length required in data descriptor func when TSO
-	 * offload is enabled
-	 */
-	*ipsec_len = sizeof(struct rte_esp_hdr) + (md->len_iv >> 2) +
-			(md->ol_flags & IAVF_IPSEC_CRYPTO_OL_FLAGS_NATT ?
-			sizeof(struct rte_udp_hdr) : 0);
+	return 1; /* One IPsec descriptor needed */
 }
 
-static inline void
-iavf_build_data_desc_cmd_offset_fields(volatile uint64_t *qw1,
-		struct rte_mbuf *m, uint8_t vlan_flag)
+/* IPsec callback for ci_xmit_pkts - calculates segment length for IPsec+TSO */
+static uint16_t
+iavf_calc_ipsec_segment_len(const struct rte_mbuf *mb_seg, uint64_t ol_flags,
+			    const void *ipsec_metadata, uint16_t tlen)
 {
-	uint64_t command = 0;
-	uint64_t offset = 0;
-	uint64_t l2tag1 = 0;
-
-	*qw1 = CI_TX_DESC_DTYPE_DATA;
-
-	command = (uint64_t)CI_TX_DESC_CMD_ICRC;
-
-	/* Descriptor based VLAN insertion */
-	if ((vlan_flag & IAVF_TX_FLAGS_VLAN_TAG_LOC_L2TAG1) &&
-			m->ol_flags & RTE_MBUF_F_TX_VLAN) {
-		command |= (uint64_t)CI_TX_DESC_CMD_IL2TAG1;
-		l2tag1 |= m->vlan_tci;
-	}
-
-	/* Descriptor based QinQ insertion. vlan_flag specifies outer tag location. */
-	if (m->ol_flags & RTE_MBUF_F_TX_QINQ) {
-		command |= (uint64_t)CI_TX_DESC_CMD_IL2TAG1;
-		l2tag1 = vlan_flag & IAVF_TX_FLAGS_VLAN_TAG_LOC_L2TAG1 ? m->vlan_tci_outer :
-									m->vlan_tci;
+	const struct iavf_ipsec_crypto_pkt_metadata *ipsec_md = ipsec_metadata;
+
+	if ((ol_flags & RTE_MBUF_F_TX_SEC_OFFLOAD) &&
+	    (ol_flags & (RTE_MBUF_F_TX_TCP_SEG | RTE_MBUF_F_TX_UDP_SEG))) {
+		uint16_t ipseclen = ipsec_md ? (ipsec_md->esp_trailer_len +
+						ipsec_md->len_iv) : 0;
+		uint16_t slen = tlen + mb_seg->l2_len + mb_seg->l3_len +
+				mb_seg->outer_l3_len + ipseclen;
+		if (ol_flags & RTE_MBUF_F_TX_L4_MASK)
+			slen += mb_seg->l4_len;
+		return slen;
 	}
 
-	if ((m->ol_flags &
-	    (CI_TX_CKSUM_OFFLOAD_MASK | RTE_MBUF_F_TX_SEC_OFFLOAD)) == 0)
-		goto skip_cksum;
+	return mb_seg->data_len;
+}
 
-	/* Set MACLEN */
-	if (m->ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK &&
-			!(m->ol_flags & RTE_MBUF_F_TX_SEC_OFFLOAD))
-		offset |= (m->outer_l2_len >> 1)
-			<< CI_TX_DESC_LEN_MACLEN_S;
-	else
-		offset |= (m->l2_len >> 1)
-			<< CI_TX_DESC_LEN_MACLEN_S;
+/* Context descriptor callback for ci_xmit_pkts */
+static uint16_t
+iavf_get_context_desc(uint64_t ol_flags, const struct rte_mbuf *mbuf,
+		      const union ci_tx_offload *tx_offload __rte_unused,
+		      const struct ci_tx_queue *txq,
+		      uint32_t *td_offset __rte_unused, uint64_t *qw0, uint64_t *qw1)
+{
+	uint8_t iavf_vlan_flag;
+	uint16_t cd_l2tag2 = 0;
+	uint64_t cd_type_cmd = IAVF_TX_DESC_DTYPE_CONTEXT;
+	uint64_t cd_tunneling_params = 0;
+	uint16_t tlen = 0;
+	struct iavf_ipsec_crypto_pkt_metadata *ipsec_md = NULL;
+
+	/* Use IAVF-specific vlan_flag from txq */
+	iavf_vlan_flag = txq->vlan_flag;
+
+	/* Check if context descriptor is needed using existing IAVF logic */
+	if (!iavf_calc_context_desc(mbuf, iavf_vlan_flag))
+		return 0;
 
-	/* Enable L3 checksum offloading inner */
-	if (m->ol_flags & RTE_MBUF_F_TX_IP_CKSUM) {
-		if (m->ol_flags & RTE_MBUF_F_TX_IPV4) {
-			command |= CI_TX_DESC_CMD_IIPT_IPV4_CSUM;
-			offset |= (m->l3_len >> 2) << CI_TX_DESC_LEN_IPLEN_S;
-		}
-	} else if (m->ol_flags & RTE_MBUF_F_TX_IPV4) {
-		command |= CI_TX_DESC_CMD_IIPT_IPV4;
-		offset |= (m->l3_len >> 2) << CI_TX_DESC_LEN_IPLEN_S;
-	} else if (m->ol_flags & RTE_MBUF_F_TX_IPV6) {
-		command |= CI_TX_DESC_CMD_IIPT_IPV6;
-		offset |= (m->l3_len >> 2) << CI_TX_DESC_LEN_IPLEN_S;
+	/* Get IPsec metadata if needed */
+	if (ol_flags & RTE_MBUF_F_TX_SEC_OFFLOAD) {
+		ipsec_md = RTE_MBUF_DYNFIELD(mbuf, txq->ipsec_crypto_pkt_md_offset,
+					     struct iavf_ipsec_crypto_pkt_metadata *);
 	}
 
-	if (m->ol_flags & (RTE_MBUF_F_TX_TCP_SEG | RTE_MBUF_F_TX_UDP_SEG)) {
-		if (m->ol_flags & RTE_MBUF_F_TX_TCP_SEG)
-			command |= CI_TX_DESC_CMD_L4T_EOFT_TCP;
-		else
-			command |= CI_TX_DESC_CMD_L4T_EOFT_UDP;
-		offset |= (m->l4_len >> 2) <<
-			      CI_TX_DESC_LEN_L4_LEN_S;
+	/* TSO command field */
+	if (ol_flags & (RTE_MBUF_F_TX_TCP_SEG | RTE_MBUF_F_TX_UDP_SEG)) {
+		cd_type_cmd |= (uint64_t)CI_TX_CTX_DESC_TSO << IAVF_TXD_CTX_QW1_CMD_SHIFT;
 
-		*qw1 = rte_cpu_to_le_64((((uint64_t)command <<
-			IAVF_TXD_DATA_QW1_CMD_SHIFT) & IAVF_TXD_DATA_QW1_CMD_MASK) |
-			(((uint64_t)offset << IAVF_TXD_DATA_QW1_OFFSET_SHIFT) &
-			IAVF_TXD_DATA_QW1_OFFSET_MASK) |
-			((uint64_t)l2tag1 << IAVF_TXD_DATA_QW1_L2TAG1_SHIFT));
+		/* IPsec field for TSO */
+		if (ol_flags & RTE_MBUF_F_TX_SEC_OFFLOAD && ipsec_md) {
+			uint64_t ipsec_field = (uint64_t)ipsec_md->ctx_desc_ipsec_params <<
+				IAVF_TXD_CTX_QW1_IPSEC_PARAMS_CIPHERBLK_SHIFT;
+			cd_type_cmd |= ipsec_field;
+		}
 
-		return;
+		/* TSO segmentation field */
+		tlen = iavf_fill_ctx_desc_segmentation_field(&cd_type_cmd,
+							     mbuf, ipsec_md);
+		(void)tlen; /* Suppress unused variable warning */
 	}
 
-	/* Enable L4 checksum offloads */
-	switch (m->ol_flags & RTE_MBUF_F_TX_L4_MASK) {
-	case RTE_MBUF_F_TX_TCP_CKSUM:
-		command |= CI_TX_DESC_CMD_L4T_EOFT_TCP;
-		offset |= (sizeof(struct rte_tcp_hdr) >> 2) <<
-				CI_TX_DESC_LEN_L4_LEN_S;
-		break;
-	case RTE_MBUF_F_TX_SCTP_CKSUM:
-		command |= CI_TX_DESC_CMD_L4T_EOFT_SCTP;
-		offset |= (sizeof(struct rte_sctp_hdr) >> 2) <<
-				CI_TX_DESC_LEN_L4_LEN_S;
-		break;
-	case RTE_MBUF_F_TX_UDP_CKSUM:
-		command |= CI_TX_DESC_CMD_L4T_EOFT_UDP;
-		offset |= (sizeof(struct rte_udp_hdr) >> 2) <<
-				CI_TX_DESC_LEN_L4_LEN_S;
-		break;
+	/* VLAN field for L2TAG2 */
+	if ((ol_flags & RTE_MBUF_F_TX_VLAN &&
+	     iavf_vlan_flag & IAVF_TX_FLAGS_VLAN_TAG_LOC_L2TAG2) ||
+	    ol_flags & RTE_MBUF_F_TX_QINQ) {
+		cd_type_cmd |= (uint64_t)CI_TX_CTX_DESC_IL2TAG2 << IAVF_TXD_CTX_QW1_CMD_SHIFT;
 	}
 
-skip_cksum:
-	*qw1 = rte_cpu_to_le_64((((uint64_t)command <<
-		IAVF_TXD_DATA_QW1_CMD_SHIFT) & IAVF_TXD_DATA_QW1_CMD_MASK) |
-		(((uint64_t)offset << IAVF_TXD_DATA_QW1_OFFSET_SHIFT) &
-		IAVF_TXD_DATA_QW1_OFFSET_MASK) |
-		((uint64_t)l2tag1 << IAVF_TXD_DATA_QW1_L2TAG1_SHIFT));
-}
-
-static inline void
-iavf_fill_data_desc(volatile struct ci_tx_desc *desc,
-	uint64_t desc_template,	uint16_t buffsz,
-	uint64_t buffer_addr)
-{
-	/* fill data descriptor qw1 from template */
-	desc->cmd_type_offset_bsz = desc_template;
-
-	/* set data buffer size */
-	desc->cmd_type_offset_bsz |=
-		(((uint64_t)buffsz << IAVF_TXD_DATA_QW1_TX_BUF_SZ_SHIFT) &
-		IAVF_TXD_DATA_QW1_TX_BUF_SZ_MASK);
-
-	desc->buffer_addr = rte_cpu_to_le_64(buffer_addr);
-	desc->cmd_type_offset_bsz = rte_cpu_to_le_64(desc->cmd_type_offset_bsz);
-}
-
+	/* LLDP switching field */
+	if (IAVF_CHECK_TX_LLDP(mbuf))
+		cd_type_cmd |= IAVF_TX_CTX_DESC_SWTCH_UPLINK << IAVF_TXD_CTX_QW1_CMD_SHIFT;
+
+	/* Tunneling field */
+	if (ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK)
+		iavf_fill_ctx_desc_tunnelling_field((uint64_t *)&cd_tunneling_params, mbuf);
+
+	/* L2TAG2 field (VLAN) */
+	if (ol_flags & RTE_MBUF_F_TX_QINQ) {
+		cd_l2tag2 = iavf_vlan_flag & IAVF_TX_FLAGS_VLAN_TAG_LOC_L2TAG2 ?
+			    mbuf->vlan_tci_outer : mbuf->vlan_tci;
+	} else if (ol_flags & RTE_MBUF_F_TX_VLAN &&
+		   iavf_vlan_flag & IAVF_TX_FLAGS_VLAN_TAG_LOC_L2TAG2) {
+		cd_l2tag2 = mbuf->vlan_tci;
+	}
 
-static struct iavf_ipsec_crypto_pkt_metadata *
-iavf_ipsec_crypto_get_pkt_metadata(const struct ci_tx_queue *txq,
-		struct rte_mbuf *m)
-{
-	if (m->ol_flags & RTE_MBUF_F_TX_SEC_OFFLOAD)
-		return RTE_MBUF_DYNFIELD(m, txq->ipsec_crypto_pkt_md_offset,
-				struct iavf_ipsec_crypto_pkt_metadata *);
+	/* Set outputs */
+	*qw0 = rte_cpu_to_le_64(cd_tunneling_params | ((uint64_t)cd_l2tag2 << 32));
+	*qw1 = rte_cpu_to_le_64(cd_type_cmd);
 
-	return NULL;
+	return 1; /* One context descriptor needed */
 }
 
 /* TX function */
@@ -2701,231 +2599,17 @@ uint16_t
 iavf_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 {
 	struct ci_tx_queue *txq = tx_queue;
-	volatile struct ci_tx_desc *txr = txq->ci_tx_ring;
-	struct ci_tx_entry *txe_ring = txq->sw_ring;
-	struct ci_tx_entry *txe, *txn;
-	struct rte_mbuf *mb, *mb_seg;
-	uint64_t buf_dma_addr;
-	uint16_t desc_idx, desc_idx_last;
-	uint16_t idx;
-	uint16_t slen;
-
-
-	/* Check if the descriptor ring needs to be cleaned. */
-	if (txq->nb_tx_free < txq->tx_free_thresh)
-		ci_tx_xmit_cleanup(txq);
-
-	desc_idx = txq->tx_tail;
-	txe = &txe_ring[desc_idx];
-
-	for (idx = 0; idx < nb_pkts; idx++) {
-		volatile struct ci_tx_desc *ddesc;
-		struct iavf_ipsec_crypto_pkt_metadata *ipsec_md;
-
-		uint16_t nb_desc_ctx, nb_desc_ipsec;
-		uint16_t nb_desc_data, nb_desc_required;
-		uint16_t tlen = 0, ipseclen = 0;
-		uint64_t ddesc_template = 0;
-		uint64_t ddesc_cmd = 0;
-
-		mb = tx_pkts[idx];
 
-		RTE_MBUF_PREFETCH_TO_FREE(txe->mbuf);
-
-		/**
-		 * Get metadata for ipsec crypto from mbuf dynamic fields if
-		 * security offload is specified.
-		 */
-		ipsec_md = iavf_ipsec_crypto_get_pkt_metadata(txq, mb);
-
-		nb_desc_data = mb->nb_segs;
-		nb_desc_ctx =
-			iavf_calc_context_desc(mb, txq->vlan_flag);
-		nb_desc_ipsec = !!(mb->ol_flags & RTE_MBUF_F_TX_SEC_OFFLOAD);
-
-		/**
-		 * The number of descriptors that must be allocated for
-		 * a packet equals to the number of the segments of that
-		 * packet plus the context and ipsec descriptors if needed.
-		 * Recalculate the needed tx descs when TSO enabled in case
-		 * the mbuf data size exceeds max data size that hw allows
-		 * per tx desc.
-		 */
-		if (mb->ol_flags & RTE_MBUF_F_TX_TCP_SEG)
-			nb_desc_required = ci_calc_pkt_desc(mb) + nb_desc_ctx + nb_desc_ipsec;
-		else
-			nb_desc_required = nb_desc_data + nb_desc_ctx + nb_desc_ipsec;
-
-		desc_idx_last = (uint16_t)(desc_idx + nb_desc_required - 1);
-
-		/* wrap descriptor ring */
-		if (desc_idx_last >= txq->nb_tx_desc)
-			desc_idx_last =
-				(uint16_t)(desc_idx_last - txq->nb_tx_desc);
-
-		PMD_TX_LOG(DEBUG,
-			"port_id=%u queue_id=%u tx_first=%u tx_last=%u",
-			txq->port_id, txq->queue_id, desc_idx, desc_idx_last);
-
-		if (nb_desc_required > txq->nb_tx_free) {
-			if (ci_tx_xmit_cleanup(txq)) {
-				if (idx == 0)
-					return 0;
-				goto end_of_tx;
-			}
-			if (unlikely(nb_desc_required > txq->tx_rs_thresh)) {
-				while (nb_desc_required > txq->nb_tx_free) {
-					if (ci_tx_xmit_cleanup(txq)) {
-						if (idx == 0)
-							return 0;
-						goto end_of_tx;
-					}
-				}
-			}
-		}
-
-		iavf_build_data_desc_cmd_offset_fields(&ddesc_template, mb,
-			txq->vlan_flag);
-
-			/* Setup TX context descriptor if required */
-		if (nb_desc_ctx) {
-			volatile struct iavf_tx_context_desc *ctx_desc =
-				(volatile struct iavf_tx_context_desc *)
-					&txr[desc_idx];
-
-			/* clear QW0 or the previous writeback value
-			 * may impact next write
-			 */
-			*(volatile uint64_t *)ctx_desc = 0;
-
-			txn = &txe_ring[txe->next_id];
-			RTE_MBUF_PREFETCH_TO_FREE(txn->mbuf);
-
-			if (txe->mbuf) {
-				rte_pktmbuf_free_seg(txe->mbuf);
-				txe->mbuf = NULL;
-			}
-
-			iavf_fill_context_desc(ctx_desc, mb, ipsec_md, &tlen,
-				txq->vlan_flag);
-			IAVF_DUMP_TX_DESC(txq, ctx_desc, desc_idx);
-
-			txe->last_id = desc_idx_last;
-			desc_idx = txe->next_id;
-			txe = txn;
-		}
-
-		if (nb_desc_ipsec) {
-			volatile struct iavf_tx_ipsec_desc *ipsec_desc =
-				(volatile struct iavf_tx_ipsec_desc *)
-					&txr[desc_idx];
-
-			txn = &txe_ring[txe->next_id];
-			RTE_MBUF_PREFETCH_TO_FREE(txn->mbuf);
-
-			if (txe->mbuf) {
-				rte_pktmbuf_free_seg(txe->mbuf);
-				txe->mbuf = NULL;
-			}
-
-			iavf_fill_ipsec_desc(ipsec_desc, ipsec_md, &ipseclen);
-
-			IAVF_DUMP_TX_DESC(txq, ipsec_desc, desc_idx);
-
-			txe->last_id = desc_idx_last;
-			desc_idx = txe->next_id;
-			txe = txn;
-		}
-
-		mb_seg = mb;
-
-		do {
-			ddesc = (volatile struct ci_tx_desc *)
-					&txr[desc_idx];
-
-			txn = &txe_ring[txe->next_id];
-			RTE_MBUF_PREFETCH_TO_FREE(txn->mbuf);
-
-			if (txe->mbuf)
-				rte_pktmbuf_free_seg(txe->mbuf);
-
-			txe->mbuf = mb_seg;
-
-			if ((mb_seg->ol_flags & RTE_MBUF_F_TX_SEC_OFFLOAD) &&
-					(mb_seg->ol_flags &
-						(RTE_MBUF_F_TX_TCP_SEG |
-						RTE_MBUF_F_TX_UDP_SEG))) {
-				slen = tlen + mb_seg->l2_len + mb_seg->l3_len +
-						mb_seg->outer_l3_len + ipseclen;
-				if (mb_seg->ol_flags & RTE_MBUF_F_TX_L4_MASK)
-					slen += mb_seg->l4_len;
-			} else {
-				slen = mb_seg->data_len;
-			}
-
-			buf_dma_addr = rte_mbuf_data_iova(mb_seg);
-			while ((mb_seg->ol_flags & (RTE_MBUF_F_TX_TCP_SEG |
-					RTE_MBUF_F_TX_UDP_SEG)) &&
-					unlikely(slen > CI_MAX_DATA_PER_TXD)) {
-				iavf_fill_data_desc(ddesc, ddesc_template,
-					CI_MAX_DATA_PER_TXD, buf_dma_addr);
-
-				IAVF_DUMP_TX_DESC(txq, ddesc, desc_idx);
-
-				buf_dma_addr += CI_MAX_DATA_PER_TXD;
-				slen -= CI_MAX_DATA_PER_TXD;
-
-				txe->last_id = desc_idx_last;
-				desc_idx = txe->next_id;
-				txe = txn;
-				ddesc = &txr[desc_idx];
-				txn = &txe_ring[txe->next_id];
-			}
-
-			iavf_fill_data_desc(ddesc, ddesc_template,
-					slen, buf_dma_addr);
-
-			IAVF_DUMP_TX_DESC(txq, ddesc, desc_idx);
-
-			txe->last_id = desc_idx_last;
-			desc_idx = txe->next_id;
-			txe = txn;
-			mb_seg = mb_seg->next;
-		} while (mb_seg);
-
-		/* The last packet data descriptor needs End Of Packet (EOP) */
-		ddesc_cmd = CI_TX_DESC_CMD_EOP;
-
-		txq->nb_tx_used = (uint16_t)(txq->nb_tx_used + nb_desc_required);
-		txq->nb_tx_free = (uint16_t)(txq->nb_tx_free - nb_desc_required);
-
-		if (txq->nb_tx_used >= txq->tx_rs_thresh) {
-			PMD_TX_LOG(DEBUG, "Setting RS bit on TXD id="
-				   "%4u (port=%d queue=%d)",
-				   desc_idx_last, txq->port_id, txq->queue_id);
-
-			ddesc_cmd |= CI_TX_DESC_CMD_RS;
-
-			/* Update txq RS bit counters */
-			txq->nb_tx_used = 0;
-		}
-
-		ddesc->cmd_type_offset_bsz |= rte_cpu_to_le_64(ddesc_cmd <<
-				IAVF_TXD_DATA_QW1_CMD_SHIFT);
-
-		IAVF_DUMP_TX_DESC(txq, ddesc, desc_idx - 1);
-	}
-
-end_of_tx:
-	rte_wmb();
-
-	PMD_TX_LOG(DEBUG, "port_id=%u queue_id=%u tx_tail=%u nb_tx=%u",
-		   txq->port_id, txq->queue_id, desc_idx, idx);
-
-	IAVF_PCI_REG_WRITE_RELAXED(txq->qtx_tail, desc_idx);
-	txq->tx_tail = desc_idx;
+	const struct ci_ipsec_ops ipsec_ops = {
+		.get_ipsec_desc = iavf_get_ipsec_desc,
+		.calc_segment_len = iavf_calc_ipsec_segment_len,
+	};
 
-	return idx;
+	/* IAVF does not support timestamp queues, so pass NULL for ts_fns */
+	return ci_xmit_pkts(txq, tx_pkts, nb_pkts,
+			    (txq->vlan_flag & IAVF_TX_FLAGS_VLAN_TAG_LOC_L2TAG1) ?
+				CI_VLAN_IN_L2TAG1 : CI_VLAN_IN_L2TAG2,
+			    iavf_get_context_desc, &ipsec_ops, NULL);
 }
 
 /* Check if the packet with vlan user priority is transmitted in the
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [RFC PATCH 17/27] net/i40e: document requirement for QinQ support
  2025-12-19 17:25 [RFC PATCH 00/27] combine multiple Intel scalar Tx paths Bruce Richardson
                   ` (15 preceding siblings ...)
  2025-12-19 17:25 ` [RFC PATCH 16/27] net/iavf: use common scalar Tx function Bruce Richardson
@ 2025-12-19 17:25 ` Bruce Richardson
  2025-12-19 17:25 ` [RFC PATCH 18/27] net/idpf: use common scalar Tx function Bruce Richardson
                   ` (13 subsequent siblings)
  30 siblings, 0 replies; 274+ messages in thread
From: Bruce Richardson @ 2025-12-19 17:25 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson

In order to get multiple VLAN tags inserted into an outgoing packet with
the QinQ offload, the i40e driver needs to be set to double VLAN mode.
This is done using the VLAN_EXTEND Rx config flag. Add a code check for
this dependency and update the docs about it.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 doc/guides/nics/i40e.rst           | 18 ++++++++++++++++++
 drivers/net/intel/i40e/i40e_rxtx.c |  9 +++++++++
 2 files changed, 27 insertions(+)

diff --git a/doc/guides/nics/i40e.rst b/doc/guides/nics/i40e.rst
index 45dc083c94..cbfaddbdd8 100644
--- a/doc/guides/nics/i40e.rst
+++ b/doc/guides/nics/i40e.rst
@@ -245,6 +245,24 @@ Runtime Configuration
   * ``segment``: Check number of mbuf segments not exceed hw limitation.
   * ``offload``: Check any unsupported offload flag.
 
+QinQ Configuration
+~~~~~~~~~~~~~~~~~~
+
+When using QinQ TX offload (``RTE_ETH_TX_OFFLOAD_QINQ_INSERT``), you must also
+enable ``RTE_ETH_RX_OFFLOAD_VLAN_EXTEND`` to configure the hardware for double
+VLAN mode. Without this, only the inner VLAN tag will be inserted.
+
+Example::
+
+  struct rte_eth_conf port_conf = {
+      .rxmode = {
+          .offloads = RTE_ETH_RX_OFFLOAD_VLAN_EXTEND,
+      },
+      .txmode = {
+          .offloads = RTE_ETH_TX_OFFLOAD_QINQ_INSERT,
+      },
+  };
+
 Vector RX Pre-conditions
 ~~~~~~~~~~~~~~~~~~~~~~~~
 For Vector RX it is assumed that the number of descriptor rings will be a power
diff --git a/drivers/net/intel/i40e/i40e_rxtx.c b/drivers/net/intel/i40e/i40e_rxtx.c
index 2d12e6dd1a..aef78c5358 100644
--- a/drivers/net/intel/i40e/i40e_rxtx.c
+++ b/drivers/net/intel/i40e/i40e_rxtx.c
@@ -2171,6 +2171,15 @@ i40e_dev_tx_queue_setup(struct rte_eth_dev *dev,
 	vsi = i40e_pf_get_vsi_by_qindex(pf, queue_idx);
 	if (!vsi)
 		return -EINVAL;
+
+	/* Check if QinQ TX offload requires VLAN extend mode */
+	if ((offloads & RTE_ETH_TX_OFFLOAD_QINQ_INSERT) &&
+			!(dev->data->dev_conf.rxmode.offloads & RTE_ETH_RX_OFFLOAD_VLAN_EXTEND)) {
+		PMD_DRV_LOG(WARNING, "Port %u: QinQ TX offload is enabled but VLAN extend mode is not set. ",
+				dev->data->port_id);
+		PMD_DRV_LOG(WARNING, "Double VLAN insertion may not work correctly without RTE_ETH_RX_OFFLOAD_VLAN_EXTEND set in Rx configuration.");
+	}
+
 	q_offset = i40e_get_queue_offset_by_qindex(pf, queue_idx);
 	if (q_offset < 0)
 		return -EINVAL;
-- 
2.51.0



* [RFC PATCH 18/27] net/idpf: use common scalar Tx function
  2025-12-19 17:25 [RFC PATCH 00/27] combine multiple Intel scalar Tx paths Bruce Richardson
                   ` (16 preceding siblings ...)
  2025-12-19 17:25 ` [RFC PATCH 17/27] net/i40e: document requirement for QinQ support Bruce Richardson
@ 2025-12-19 17:25 ` Bruce Richardson
  2025-12-19 17:25 ` [RFC PATCH 19/27] net/intel: avoid writing the final pkt descriptor twice Bruce Richardson
                   ` (12 subsequent siblings)
  30 siblings, 0 replies; 274+ messages in thread
From: Bruce Richardson @ 2025-12-19 17:25 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson, Jingjing Wu, Praveen Shetty

Update the idpf driver to use the common scalar Tx function in its
single-queue configuration.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/net/intel/idpf/idpf_common_rxtx.c | 179 ++--------------------
 1 file changed, 11 insertions(+), 168 deletions(-)

diff --git a/drivers/net/intel/idpf/idpf_common_rxtx.c b/drivers/net/intel/idpf/idpf_common_rxtx.c
index b34d545a0a..81bc45f6ef 100644
--- a/drivers/net/intel/idpf/idpf_common_rxtx.c
+++ b/drivers/net/intel/idpf/idpf_common_rxtx.c
@@ -8,7 +8,6 @@
 
 #include "idpf_common_rxtx.h"
 #include "idpf_common_device.h"
-#include "../common/rx.h"
 
 int idpf_timestamp_dynfield_offset = -1;
 uint64_t idpf_timestamp_dynflag;
@@ -848,9 +847,11 @@ idpf_calc_context_desc(uint64_t flags)
 /* set TSO context descriptor, returns 0 if no context needed, 1 if context set
  */
 static inline uint16_t
-idpf_set_tso_ctx(uint64_t ol_flags, struct rte_mbuf *mbuf,
-			union ci_tx_offload tx_offload,
-			uint64_t *qw0, uint64_t *qw1)
+idpf_set_tso_ctx(uint64_t ol_flags, const struct rte_mbuf *mbuf,
+		 const union ci_tx_offload *tx_offload,
+		 const struct ci_tx_queue *txq __rte_unused,
+		 uint32_t *td_offset __rte_unused,
+		 uint64_t *qw0, uint64_t *qw1)
 {
 	uint16_t cmd_dtype = IDPF_TX_DESC_DTYPE_FLEX_TSO_CTX | IDPF_TX_FLEX_CTX_DESC_CMD_TSO;
 	uint16_t tso_segsz = mbuf->tso_segsz;
@@ -861,12 +862,12 @@ idpf_set_tso_ctx(uint64_t ol_flags, struct rte_mbuf *mbuf,
 		return 0;
 
 	/* TSO context descriptor setup */
-	if (tx_offload.l4_len == 0) {
+	if (tx_offload->l4_len == 0) {
 		TX_LOG(DEBUG, "L4 length set to 0");
 		return 0;
 	}
 
-	hdr_len = tx_offload.l2_len + tx_offload.l3_len + tx_offload.l4_len;
+	hdr_len = tx_offload->l2_len + tx_offload->l3_len + tx_offload->l4_len;
 	tso_len = mbuf->pkt_len - hdr_len;
 
 	*qw0 = rte_cpu_to_le_32(tso_len & IDPF_TXD_FLEX_CTX_MSS_RT_M) |
@@ -933,7 +934,8 @@ idpf_dp_splitq_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 		tx_offload.tso_segsz = tx_pkt->tso_segsz;
 		/* Calculate the number of context descriptors needed. */
 		uint64_t cd_qw0, cd_qw1;
-		nb_ctx = idpf_set_tso_ctx(ol_flags, tx_pkt, tx_offload, &cd_qw0, &cd_qw1);
+		nb_ctx = idpf_set_tso_ctx(ol_flags, tx_pkt, &tx_offload, txq,
+					  NULL /* unused */, &cd_qw0, &cd_qw1);
 
 		/* Calculate the number of TX descriptors needed for
 		 * each packet. For TSO packets, use ci_calc_pkt_desc as
@@ -1339,167 +1341,8 @@ uint16_t
 idpf_dp_singleq_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 			  uint16_t nb_pkts)
 {
-	volatile struct ci_tx_desc *txd;
-	volatile struct ci_tx_desc *txr;
-	union ci_tx_offload tx_offload = {0};
-	struct ci_tx_entry *txe, *txn;
-	struct ci_tx_entry *sw_ring;
-	struct ci_tx_queue *txq;
-	struct rte_mbuf *tx_pkt;
-	struct rte_mbuf *m_seg;
-	uint64_t buf_dma_addr;
-	uint32_t td_offset;
-	uint64_t ol_flags;
-	uint16_t tx_last;
-	uint16_t nb_used;
-	uint16_t nb_ctx;
-	uint16_t td_cmd;
-	uint16_t tx_id;
-	uint16_t nb_tx;
-	uint16_t slen;
-
-	nb_tx = 0;
-	txq = tx_queue;
-
-	if (unlikely(txq == NULL))
-		return nb_tx;
-
-	sw_ring = txq->sw_ring;
-	txr = txq->ci_tx_ring;
-	tx_id = txq->tx_tail;
-	txe = &sw_ring[tx_id];
-
-	/* Check if the descriptor ring needs to be cleaned. */
-	if (txq->nb_tx_free < txq->tx_free_thresh)
-		(void)ci_tx_xmit_cleanup(txq);
-
-	for (nb_tx = 0; nb_tx < nb_pkts; nb_tx++) {
-		td_cmd = 0;
-		td_offset = 0;
-
-		tx_pkt = *tx_pkts++;
-		RTE_MBUF_PREFETCH_TO_FREE(txe->mbuf);
-
-		ol_flags = tx_pkt->ol_flags;
-		tx_offload.l2_len = tx_pkt->l2_len;
-		tx_offload.l3_len = tx_pkt->l3_len;
-		tx_offload.l4_len = tx_pkt->l4_len;
-		tx_offload.tso_segsz = tx_pkt->tso_segsz;
-		/* Calculate the number of context descriptors needed. */
-		uint64_t cd_qw0, cd_qw1;
-		nb_ctx = idpf_set_tso_ctx(ol_flags, tx_pkt, tx_offload, &cd_qw0, &cd_qw1);
-
-		/* The number of descriptors that must be allocated for
-		 * a packet. For TSO packets, use ci_calc_pkt_desc as
-		 * the mbuf data size might exceed max data size that hw allows
-		 * per tx desc.
-		 */
-		if (ol_flags & RTE_MBUF_F_TX_TCP_SEG)
-			nb_used = (uint16_t)(ci_calc_pkt_desc(tx_pkt) + nb_ctx);
-		else
-			nb_used = (uint16_t)(tx_pkt->nb_segs + nb_ctx);
-		tx_last = (uint16_t)(tx_id + nb_used - 1);
-
-		/* Circular ring */
-		if (tx_last >= txq->nb_tx_desc)
-			tx_last = (uint16_t)(tx_last - txq->nb_tx_desc);
-
-		TX_LOG(DEBUG, "port_id=%u queue_id=%u"
-		       " tx_first=%u tx_last=%u",
-		       txq->port_id, txq->queue_id, tx_id, tx_last);
-
-		if (nb_used > txq->nb_tx_free) {
-			if (ci_tx_xmit_cleanup(txq) != 0) {
-				if (nb_tx == 0)
-					return 0;
-				goto end_of_tx;
-			}
-			if (unlikely(nb_used > txq->tx_rs_thresh)) {
-				while (nb_used > txq->nb_tx_free) {
-					if (ci_tx_xmit_cleanup(txq) != 0) {
-						if (nb_tx == 0)
-							return 0;
-						goto end_of_tx;
-					}
-				}
-			}
-		}
-
-		if (ol_flags & CI_TX_CKSUM_OFFLOAD_MASK)
-			td_cmd |= IDPF_TX_FLEX_DESC_CMD_CS_EN;
-
-		if (nb_ctx != 0) {
-			/* Setup TX context descriptor if required */
-			uint64_t *ctx_txd = RTE_CAST_PTR(uint64_t *, &txr[tx_id]);
-
-			txn = &sw_ring[txe->next_id];
-			RTE_MBUF_PREFETCH_TO_FREE(txn->mbuf);
-			if (txe->mbuf != NULL) {
-				rte_pktmbuf_free_seg(txe->mbuf);
-				txe->mbuf = NULL;
-			}
-
-			ctx_txd[0] = cd_qw0;
-			ctx_txd[1] = cd_qw1;
-
-			txe->last_id = tx_last;
-			tx_id = txe->next_id;
-			txe = txn;
-		}
-
-		m_seg = tx_pkt;
-		do {
-			txd = &txr[tx_id];
-			txn = &sw_ring[txe->next_id];
-
-			if (txe->mbuf != NULL)
-				rte_pktmbuf_free_seg(txe->mbuf);
-			txe->mbuf = m_seg;
-
-			/* Setup TX Descriptor */
-			slen = m_seg->data_len;
-			buf_dma_addr = rte_mbuf_data_iova(m_seg);
-			txd->buffer_addr = rte_cpu_to_le_64(buf_dma_addr);
-			txd->cmd_type_offset_bsz = rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DATA |
-				((uint64_t)td_cmd  << CI_TXD_QW1_CMD_S) |
-				((uint64_t)td_offset << CI_TXD_QW1_OFFSET_S) |
-				((uint64_t)slen << CI_TXD_QW1_TX_BUF_SZ_S));
-
-			txe->last_id = tx_last;
-			tx_id = txe->next_id;
-			txe = txn;
-			m_seg = m_seg->next;
-		} while (m_seg);
-
-		/* The last packet data descriptor needs End Of Packet (EOP) */
-		td_cmd |= CI_TX_DESC_CMD_EOP;
-		txq->nb_tx_used = (uint16_t)(txq->nb_tx_used + nb_used);
-		txq->nb_tx_free = (uint16_t)(txq->nb_tx_free - nb_used);
-
-		if (txq->nb_tx_used >= txq->tx_rs_thresh) {
-			TX_LOG(DEBUG, "Setting RS bit on TXD id="
-			       "%4u (port=%d queue=%d)",
-			       tx_last, txq->port_id, txq->queue_id);
-
-			td_cmd |= CI_TX_DESC_CMD_RS;
-
-			/* Update txq RS bit counters */
-			txq->nb_tx_used = 0;
-		}
-
-		txd->cmd_type_offset_bsz |= rte_cpu_to_le_16(td_cmd << CI_TXD_QW1_CMD_S);
-	}
-
-end_of_tx:
-	rte_wmb();
-
-	TX_LOG(DEBUG, "port_id=%u queue_id=%u tx_tail=%u nb_tx=%u",
-	       txq->port_id, txq->queue_id, tx_id, nb_tx);
-
-	IDPF_PCI_REG_WRITE(txq->qtx_tail, tx_id);
-	txq->tx_tail = tx_id;
-
-	return nb_tx;
+	return ci_xmit_pkts(tx_queue, tx_pkts, nb_pkts, CI_VLAN_IN_L2TAG1,
+			idpf_set_tso_ctx, NULL, NULL);
 }
 
 /* TX prep functions */
-- 
2.51.0



* [RFC PATCH 19/27] net/intel: avoid writing the final pkt descriptor twice
  2025-12-19 17:25 [RFC PATCH 00/27] combine multiple Intel scalar Tx paths Bruce Richardson
                   ` (17 preceding siblings ...)
  2025-12-19 17:25 ` [RFC PATCH 18/27] net/idpf: use common scalar Tx function Bruce Richardson
@ 2025-12-19 17:25 ` Bruce Richardson
  2025-12-19 17:25 ` [RFC PATCH 20/27] net/intel: write descriptors using non-volatile pointers Bruce Richardson
                   ` (11 subsequent siblings)
  30 siblings, 0 replies; 274+ messages in thread
From: Bruce Richardson @ 2025-12-19 17:25 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson

In the scalar datapath, there is a loop to handle multi-segment and
multi-descriptor packets on Tx. After that loop, the end-of-packet bit
was written to the descriptor separately, meaning that for each
single-descriptor packet there were two writes to the second quad-word -
basically 3 x 64-bit writes rather than just 2. Adjusting the code to
compute the EOP bit inside the loop saves that extra write per packet
and so improves performance.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/net/intel/common/tx_scalar_fns.h | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/net/intel/common/tx_scalar_fns.h b/drivers/net/intel/common/tx_scalar_fns.h
index 6079a558e4..7b643fcf44 100644
--- a/drivers/net/intel/common/tx_scalar_fns.h
+++ b/drivers/net/intel/common/tx_scalar_fns.h
@@ -378,6 +378,10 @@ ci_xmit_pkts(struct ci_tx_queue *txq,
 				txn = &sw_ring[txe->next_id];
 			}
 
+			/* fill the last descriptor with End of Packet (EOP) bit */
+			if (m_seg->next == NULL)
+				td_cmd |= CI_TX_DESC_CMD_EOP;
+
 			txd->buffer_addr = rte_cpu_to_le_64(buf_dma_addr);
 			txd->cmd_type_offset_bsz = rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DATA |
 				((uint64_t)td_cmd << CI_TXD_QW1_CMD_S) |
@@ -390,21 +394,17 @@ ci_xmit_pkts(struct ci_tx_queue *txq,
 			txe = txn;
 			m_seg = m_seg->next;
 		} while (m_seg);
-
-		/* fill the last descriptor with End of Packet (EOP) bit */
-		td_cmd |= CI_TX_DESC_CMD_EOP;
 		txq->nb_tx_used = (uint16_t)(txq->nb_tx_used + nb_used);
 		txq->nb_tx_free = (uint16_t)(txq->nb_tx_free - nb_used);
 
 		/* set RS bit on the last descriptor of one packet */
 		if (txq->nb_tx_used >= txq->tx_rs_thresh) {
-			td_cmd |= CI_TX_DESC_CMD_RS;
+			txd->cmd_type_offset_bsz |=
+					rte_cpu_to_le_64(CI_TX_DESC_CMD_RS << CI_TXD_QW1_CMD_S);
 
 			/* Update txq RS bit counters */
 			txq->nb_tx_used = 0;
 		}
-		txd->cmd_type_offset_bsz |=
-				rte_cpu_to_le_64(((uint64_t)td_cmd) << CI_TXD_QW1_CMD_S);
 
 		if (ts_fns != NULL)
 			ts_id = ts_fns->write_ts_desc(txq, tx_pkt, tx_id, ts_id);
-- 
2.51.0



* [RFC PATCH 20/27] net/intel: write descriptors using non-volatile pointers
  2025-12-19 17:25 [RFC PATCH 00/27] combine multiple Intel scalar Tx paths Bruce Richardson
                   ` (18 preceding siblings ...)
  2025-12-19 17:25 ` [RFC PATCH 19/27] net/intel: avoid writing the final pkt descriptor twice Bruce Richardson
@ 2025-12-19 17:25 ` Bruce Richardson
  2025-12-20  8:43   ` Morten Brørup
  2025-12-19 17:25 ` [RFC PATCH 21/27] net/intel: remove unnecessary flag clearing Bruce Richardson
                   ` (10 subsequent siblings)
  30 siblings, 1 reply; 274+ messages in thread
From: Bruce Richardson @ 2025-12-19 17:25 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson

Use a non-volatile uint64_t pointer to store to the descriptor ring.
This allows the compiler to merge the stores as it sees fit.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/net/intel/common/tx_scalar_fns.h | 24 ++++++++++++++++--------
 1 file changed, 16 insertions(+), 8 deletions(-)

diff --git a/drivers/net/intel/common/tx_scalar_fns.h b/drivers/net/intel/common/tx_scalar_fns.h
index 7b643fcf44..95e9acbe60 100644
--- a/drivers/net/intel/common/tx_scalar_fns.h
+++ b/drivers/net/intel/common/tx_scalar_fns.h
@@ -184,6 +184,15 @@ struct ci_timesstamp_queue_fns {
 	write_ts_tail_t write_ts_tail;
 };
 
+static inline void
+write_txd(volatile void *txd, uint64_t qw0, uint64_t qw1)
+{
+	uint64_t *txd_qw = RTE_CAST_PTR(void *, txd);
+
+	txd_qw[0] = rte_cpu_to_le_64(qw0);
+	txd_qw[1] = rte_cpu_to_le_64(qw1);
+}
+
 static inline uint16_t
 ci_xmit_pkts(struct ci_tx_queue *txq,
 	     struct rte_mbuf **tx_pkts,
@@ -313,8 +322,7 @@ ci_xmit_pkts(struct ci_tx_queue *txq,
 				txe->mbuf = NULL;
 			}
 
-			ctx_txd[0] = cd_qw0;
-			ctx_txd[1] = cd_qw1;
+			write_txd(ctx_txd, cd_qw0, cd_qw1);
 
 			txe->last_id = tx_last;
 			tx_id = txe->next_id;
@@ -361,12 +369,12 @@ ci_xmit_pkts(struct ci_tx_queue *txq,
 
 			while ((ol_flags & (RTE_MBUF_F_TX_TCP_SEG | RTE_MBUF_F_TX_UDP_SEG)) &&
 					unlikely(slen > CI_MAX_DATA_PER_TXD)) {
-				txd->buffer_addr = rte_cpu_to_le_64(buf_dma_addr);
-				txd->cmd_type_offset_bsz = rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DATA |
+				const uint64_t cmd_type_offset_bsz = CI_TX_DESC_DTYPE_DATA |
 					((uint64_t)td_cmd << CI_TXD_QW1_CMD_S) |
 					((uint64_t)td_offset << CI_TXD_QW1_OFFSET_S) |
 					((uint64_t)CI_MAX_DATA_PER_TXD << CI_TXD_QW1_TX_BUF_SZ_S) |
-					((uint64_t)td_tag << CI_TXD_QW1_L2TAG1_S));
+					((uint64_t)td_tag << CI_TXD_QW1_L2TAG1_S);
+				write_txd(txd, buf_dma_addr, cmd_type_offset_bsz);
 
 				buf_dma_addr += CI_MAX_DATA_PER_TXD;
 				slen -= CI_MAX_DATA_PER_TXD;
@@ -382,12 +390,12 @@ ci_xmit_pkts(struct ci_tx_queue *txq,
 			if (m_seg->next == NULL)
 				td_cmd |= CI_TX_DESC_CMD_EOP;
 
-			txd->buffer_addr = rte_cpu_to_le_64(buf_dma_addr);
-			txd->cmd_type_offset_bsz = rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DATA |
+			const uint64_t cmd_type_offset_bsz = CI_TX_DESC_DTYPE_DATA |
 				((uint64_t)td_cmd << CI_TXD_QW1_CMD_S) |
 				((uint64_t)td_offset << CI_TXD_QW1_OFFSET_S) |
 				((uint64_t)slen << CI_TXD_QW1_TX_BUF_SZ_S) |
-				((uint64_t)td_tag << CI_TXD_QW1_L2TAG1_S));
+				((uint64_t)td_tag << CI_TXD_QW1_L2TAG1_S);
+			write_txd(txd, buf_dma_addr, cmd_type_offset_bsz);
 
 			txe->last_id = tx_last;
 			tx_id = txe->next_id;
-- 
2.51.0



* [RFC PATCH 21/27] net/intel: remove unnecessary flag clearing
  2025-12-19 17:25 [RFC PATCH 00/27] combine multiple Intel scalar Tx paths Bruce Richardson
                   ` (19 preceding siblings ...)
  2025-12-19 17:25 ` [RFC PATCH 20/27] net/intel: write descriptors using non-volatile pointers Bruce Richardson
@ 2025-12-19 17:25 ` Bruce Richardson
  2025-12-19 17:25 ` [RFC PATCH 22/27] net/intel: mark mid-burst ring cleanup as unlikely Bruce Richardson
                   ` (9 subsequent siblings)
  30 siblings, 0 replies; 274+ messages in thread
From: Bruce Richardson @ 2025-12-19 17:25 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson

When cleaning the Tx ring, there is no need to zero out the done flag
from the completed entry. That flag will be automatically cleared when
the descriptor is next written. This gives a small performance benefit.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/net/intel/common/tx_scalar_fns.h | 7 -------
 1 file changed, 7 deletions(-)

diff --git a/drivers/net/intel/common/tx_scalar_fns.h b/drivers/net/intel/common/tx_scalar_fns.h
index 95e9acbe60..cb45029bd7 100644
--- a/drivers/net/intel/common/tx_scalar_fns.h
+++ b/drivers/net/intel/common/tx_scalar_fns.h
@@ -51,13 +51,6 @@ ci_tx_xmit_cleanup(struct ci_tx_queue *txq)
 	else
 		nb_tx_to_clean = (uint16_t)(desc_to_clean_to - last_desc_cleaned);
 
-	/* The last descriptor to clean is done, so that means all the
-	 * descriptors from the last descriptor that was cleaned
-	 * up to the last descriptor with the RS bit set
-	 * are done. Only reset the threshold descriptor.
-	 */
-	txd[desc_to_clean_to].cmd_type_offset_bsz = 0;
-
 	/* Update the txq to reflect the last descriptor that was cleaned */
 	txq->last_desc_cleaned = desc_to_clean_to;
 	txq->nb_tx_free = (uint16_t)(txq->nb_tx_free + nb_tx_to_clean);
-- 
2.51.0



* [RFC PATCH 22/27] net/intel: mark mid-burst ring cleanup as unlikely
  2025-12-19 17:25 [RFC PATCH 00/27] combine multiple Intel scalar Tx paths Bruce Richardson
                   ` (20 preceding siblings ...)
  2025-12-19 17:25 ` [RFC PATCH 21/27] net/intel: remove unnecessary flag clearing Bruce Richardson
@ 2025-12-19 17:25 ` Bruce Richardson
  2025-12-19 17:25 ` [RFC PATCH 23/27] net/intel: add special handling for single desc packets Bruce Richardson
                   ` (8 subsequent siblings)
  30 siblings, 0 replies; 274+ messages in thread
From: Bruce Richardson @ 2025-12-19 17:25 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson

It should rarely be the case that we need to clean up the descriptor ring
mid-burst, so mark the check as unlikely to help performance.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/net/intel/common/tx_scalar_fns.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/intel/common/tx_scalar_fns.h b/drivers/net/intel/common/tx_scalar_fns.h
index cb45029bd7..27791cf138 100644
--- a/drivers/net/intel/common/tx_scalar_fns.h
+++ b/drivers/net/intel/common/tx_scalar_fns.h
@@ -272,7 +272,7 @@ ci_xmit_pkts(struct ci_tx_queue *txq,
 		if (tx_last >= txq->nb_tx_desc)
 			tx_last = (uint16_t)(tx_last - txq->nb_tx_desc);
 
-		if (nb_used > txq->nb_tx_free) {
+		if (unlikely(nb_used > txq->nb_tx_free)) {
 			if (ci_tx_xmit_cleanup(txq) != 0) {
 				if (nb_tx == 0)
 					return 0;
-- 
2.51.0



* [RFC PATCH 23/27] net/intel: add special handling for single desc packets
  2025-12-19 17:25 [RFC PATCH 00/27] combine multiple Intel scalar Tx paths Bruce Richardson
                   ` (21 preceding siblings ...)
  2025-12-19 17:25 ` [RFC PATCH 22/27] net/intel: mark mid-burst ring cleanup as unlikely Bruce Richardson
@ 2025-12-19 17:25 ` Bruce Richardson
  2025-12-19 17:25 ` [RFC PATCH 24/27] net/intel: use separate array for desc status tracking Bruce Richardson
                   ` (7 subsequent siblings)
  30 siblings, 0 replies; 274+ messages in thread
From: Bruce Richardson @ 2025-12-19 17:25 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson

Within the scalar Tx path, add a shortcut for packets that don't use TSO
and need only a single data descriptor.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/net/intel/common/tx_scalar_fns.h | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/drivers/net/intel/common/tx_scalar_fns.h b/drivers/net/intel/common/tx_scalar_fns.h
index 27791cf138..55502b46ed 100644
--- a/drivers/net/intel/common/tx_scalar_fns.h
+++ b/drivers/net/intel/common/tx_scalar_fns.h
@@ -304,6 +304,28 @@ ci_xmit_pkts(struct ci_tx_queue *txq,
 			ci_txd_enable_checksum(ol_flags, &td_cmd,
 						&td_offset, tx_offload);
 
+		/* special case for single descriptor packet, without TSO offload */
+		if (nb_used == 1 && (ol_flags & (RTE_MBUF_F_TX_TCP_SEG | RTE_MBUF_F_TX_UDP_SEG)) == 0) {
+			txd = &ci_tx_ring[tx_id];
+			tx_id = txe->next_id;
+
+			if (txe->mbuf)
+				rte_pktmbuf_free_seg(txe->mbuf);
+			*txe = (struct ci_tx_entry){ .mbuf = tx_pkt, .last_id = tx_last, .next_id = tx_id };
+
+			/* Setup TX Descriptor */
+			td_cmd |= CI_TX_DESC_CMD_EOP;
+			const uint64_t cmd_type_offset_bsz = CI_TX_DESC_DTYPE_DATA |
+				((uint64_t)td_cmd << CI_TXD_QW1_CMD_S) |
+				((uint64_t)td_offset << CI_TXD_QW1_OFFSET_S) |
+				((uint64_t)tx_pkt->data_len << CI_TXD_QW1_TX_BUF_SZ_S) |
+				((uint64_t)td_tag << CI_TXD_QW1_L2TAG1_S);
+			write_txd(txd, rte_mbuf_data_iova(tx_pkt), cmd_type_offset_bsz);
+
+			txe = &sw_ring[tx_id];
+			goto end_pkt;
+		}
+
 		if (nb_ctx) {
 			/* Setup TX context descriptor if required */
 			uint64_t *ctx_txd = RTE_CAST_PTR(uint64_t *, &ci_tx_ring[tx_id]);
@@ -395,6 +417,7 @@ ci_xmit_pkts(struct ci_tx_queue *txq,
 			txe = txn;
 			m_seg = m_seg->next;
 		} while (m_seg);
+end_pkt:
 		txq->nb_tx_used = (uint16_t)(txq->nb_tx_used + nb_used);
 		txq->nb_tx_free = (uint16_t)(txq->nb_tx_free - nb_used);
 
-- 
2.51.0



* [RFC PATCH 24/27] net/intel: use separate array for desc status tracking
  2025-12-19 17:25 [RFC PATCH 00/27] combine multiple Intel scalar Tx paths Bruce Richardson
                   ` (22 preceding siblings ...)
  2025-12-19 17:25 ` [RFC PATCH 23/27] net/intel: add special handling for single desc packets Bruce Richardson
@ 2025-12-19 17:25 ` Bruce Richardson
  2025-12-19 17:25 ` [RFC PATCH 25/27] net/ixgbe: " Bruce Richardson
                   ` (6 subsequent siblings)
  30 siblings, 0 replies; 274+ messages in thread
From: Bruce Richardson @ 2025-12-19 17:25 UTC (permalink / raw)
  To: dev
  Cc: Bruce Richardson, Vladimir Medvedkin, Anatoly Burakov,
	Jingjing Wu, Praveen Shetty

Rather than writing a last_id for each individual descriptor, we can
write one only where the "report status" (RS) bit is set, i.e. for the
descriptors which will be written back when done. The method used for
marking which descriptors are free is also changed in the process: even
if the last descriptor with the "done" bits set is past the expected
point, we only track up to the expected point, and leave the rest to be
counted as freed next time. This means that we always have the RS/DD
bits set at fixed intervals, and we always track free slots in units of
the same tx_rs_thresh intervals.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/net/intel/common/tx.h             |  4 ++
 drivers/net/intel/common/tx_scalar_fns.h  | 59 +++++++++++------------
 drivers/net/intel/i40e/i40e_rxtx.c        | 20 ++++++++
 drivers/net/intel/iavf/iavf_rxtx.c        | 19 ++++++++
 drivers/net/intel/ice/ice_rxtx.c          | 20 ++++++++
 drivers/net/intel/idpf/idpf_common_rxtx.c |  7 +++
 drivers/net/intel/idpf/idpf_rxtx.c        | 13 +++++
 7 files changed, 110 insertions(+), 32 deletions(-)

diff --git a/drivers/net/intel/common/tx.h b/drivers/net/intel/common/tx.h
index 0d11daaab3..9b3f8385e6 100644
--- a/drivers/net/intel/common/tx.h
+++ b/drivers/net/intel/common/tx.h
@@ -126,6 +126,8 @@ struct ci_tx_queue {
 		struct ci_tx_entry *sw_ring; /* virtual address of SW ring */
 		struct ci_tx_entry_vec *sw_ring_vec;
 	};
+	/* Scalar TX path: Array tracking last_id at each RS threshold boundary */
+	uint16_t *rs_last_id;
 	uint16_t nb_tx_desc;           /* number of TX descriptors */
 	uint16_t tx_tail; /* current value of tail register */
 	uint16_t nb_tx_used; /* number of TX desc used since RS bit set */
@@ -139,6 +141,8 @@ struct ci_tx_queue {
 	uint16_t tx_free_thresh;
 	/* Number of TX descriptors to use before RS bit is set. */
 	uint16_t tx_rs_thresh;
+	/* Scalar TX path: log2 of tx_rs_thresh for efficient bit operations */
+	uint8_t log2_rs_thresh;
 	uint16_t port_id;  /* Device port identifier. */
 	uint16_t queue_id; /* TX queue index. */
 	uint16_t reg_idx;
diff --git a/drivers/net/intel/common/tx_scalar_fns.h b/drivers/net/intel/common/tx_scalar_fns.h
index 55502b46ed..3d0a23eda3 100644
--- a/drivers/net/intel/common/tx_scalar_fns.h
+++ b/drivers/net/intel/common/tx_scalar_fns.h
@@ -23,37 +23,24 @@
 static __rte_always_inline int
 ci_tx_xmit_cleanup(struct ci_tx_queue *txq)
 {
-	struct ci_tx_entry *sw_ring = txq->sw_ring;
 	volatile struct ci_tx_desc *txd = txq->ci_tx_ring;
-	uint16_t last_desc_cleaned = txq->last_desc_cleaned;
-	uint16_t nb_tx_desc = txq->nb_tx_desc;
-	uint16_t desc_to_clean_to;
-	uint16_t nb_tx_to_clean;
+	const uint16_t last_desc_cleaned = txq->last_desc_cleaned;
+	const uint16_t nb_tx_desc = txq->nb_tx_desc;
 
-	/* Determine the last descriptor needing to be cleaned */
-	desc_to_clean_to = (uint16_t)(last_desc_cleaned + txq->tx_rs_thresh);
-	if (desc_to_clean_to >= nb_tx_desc)
-		desc_to_clean_to = (uint16_t)(desc_to_clean_to - nb_tx_desc);
-
-	/* Check to make sure the last descriptor to clean is done */
-	desc_to_clean_to = sw_ring[desc_to_clean_to].last_id;
+	const uint16_t rs_idx = (last_desc_cleaned == nb_tx_desc - 1) ?
+			0 :
+			(last_desc_cleaned + 1) >> txq->log2_rs_thresh;
+	uint16_t desc_to_clean_to = (rs_idx << txq->log2_rs_thresh) + (txq->tx_rs_thresh - 1);
 
 	/* Check if descriptor is done - all drivers use 0xF as done value in bits 3:0 */
-	if ((txd[desc_to_clean_to].cmd_type_offset_bsz & rte_cpu_to_le_64(CI_TXD_QW1_DTYPE_M)) !=
-			rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE)) {
+	if ((txd[txq->rs_last_id[rs_idx]].cmd_type_offset_bsz & rte_cpu_to_le_64(CI_TXD_QW1_DTYPE_M)) !=
+			rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE))
 		/* Descriptor not yet processed by hardware */
 		return -1;
-	}
-
-	/* Figure out how many descriptors will be cleaned */
-	if (last_desc_cleaned > desc_to_clean_to)
-		nb_tx_to_clean = (uint16_t)((nb_tx_desc - last_desc_cleaned) + desc_to_clean_to);
-	else
-		nb_tx_to_clean = (uint16_t)(desc_to_clean_to - last_desc_cleaned);
 
 	/* Update the txq to reflect the last descriptor that was cleaned */
 	txq->last_desc_cleaned = desc_to_clean_to;
-	txq->nb_tx_free = (uint16_t)(txq->nb_tx_free + nb_tx_to_clean);
+	txq->nb_tx_free += txq->tx_rs_thresh;
 
 	return 0;
 }
@@ -232,6 +219,7 @@ ci_xmit_pkts(struct ci_tx_queue *txq,
 		uint16_t nb_ipsec = 0;
 		uint64_t ipsec_qw0 = 0, ipsec_qw1 = 0;
 		uint64_t cd_qw0, cd_qw1;
+		uint16_t pkt_rs_idx;
 		tx_pkt = *tx_pkts++;
 
 		td_cmd = CI_TX_DESC_CMD_ICRC;
@@ -272,6 +260,9 @@ ci_xmit_pkts(struct ci_tx_queue *txq,
 		if (tx_last >= txq->nb_tx_desc)
 			tx_last = (uint16_t)(tx_last - txq->nb_tx_desc);
 
+		/* Track the RS threshold bucket at packet start */
+		pkt_rs_idx = (uint16_t)(tx_id >> txq->log2_rs_thresh);
+
 		if (unlikely(nb_used > txq->nb_tx_free)) {
 			if (ci_tx_xmit_cleanup(txq) != 0) {
 				if (nb_tx == 0)
@@ -311,8 +302,7 @@ ci_xmit_pkts(struct ci_tx_queue *txq,
 
 			if (txe->mbuf)
 				rte_pktmbuf_free_seg(txe->mbuf);
-			*txe = (struct ci_tx_entry){ .mbuf = tx_pkt, .last_id = tx_last, .next_id = tx_id };
-
+			txe->mbuf = tx_pkt;
 			/* Setup TX Descriptor */
 			td_cmd |= CI_TX_DESC_CMD_EOP;
 			const uint64_t cmd_type_offset_bsz = CI_TX_DESC_DTYPE_DATA |
@@ -339,7 +329,6 @@ ci_xmit_pkts(struct ci_tx_queue *txq,
 
 			write_txd(ctx_txd, cd_qw0, cd_qw1);
 
-			txe->last_id = tx_last;
 			tx_id = txe->next_id;
 			txe = txn;
 		}
@@ -358,7 +347,6 @@ ci_xmit_pkts(struct ci_tx_queue *txq,
 			ipsec_txd[0] = ipsec_qw0;
 			ipsec_txd[1] = ipsec_qw1;
 
-			txe->last_id = tx_last;
 			tx_id = txe->next_id;
 			txe = txn;
 		}
@@ -394,7 +382,6 @@ ci_xmit_pkts(struct ci_tx_queue *txq,
 				buf_dma_addr += CI_MAX_DATA_PER_TXD;
 				slen -= CI_MAX_DATA_PER_TXD;
 
-				txe->last_id = tx_last;
 				tx_id = txe->next_id;
 				txe = txn;
 				txd = &ci_tx_ring[tx_id];
@@ -412,7 +399,6 @@ ci_xmit_pkts(struct ci_tx_queue *txq,
 				((uint64_t)td_tag << CI_TXD_QW1_L2TAG1_S);
 			write_txd(txd, buf_dma_addr, cmd_type_offset_bsz);
 
-			txe->last_id = tx_last;
 			tx_id = txe->next_id;
 			txe = txn;
 			m_seg = m_seg->next;
@@ -421,13 +407,22 @@ ci_xmit_pkts(struct ci_tx_queue *txq,
 		txq->nb_tx_used = (uint16_t)(txq->nb_tx_used + nb_used);
 		txq->nb_tx_free = (uint16_t)(txq->nb_tx_free - nb_used);
 
-		/* set RS bit on the last descriptor of one packet */
-		if (txq->nb_tx_used >= txq->tx_rs_thresh) {
+		/* Check if packet crosses into a new RS threshold bucket.
+		 * The RS bit is set on the last descriptor when we move from one bucket to another.
+		 * For example, with tx_rs_thresh=32 and a 5-descriptor packet using slots 30-34:
+		 *   - pkt_rs_idx = 30 >> 5 = 0 (started in bucket 0)
+		 *   - tx_last = 34, so 35 >> 5 = 1 (next packet is in bucket 1)
+		 *   - Since 0 != 1, set RS bit on descriptor 34, and record rs_last_id[0] = 34
+		 */
+		uint16_t next_rs_idx = ((tx_last + 1) >> txq->log2_rs_thresh);
+
+		if (next_rs_idx != pkt_rs_idx) {
+			/* Packet crossed into a new bucket - set RS bit on last descriptor */
 			txd->cmd_type_offset_bsz |=
 					rte_cpu_to_le_64(CI_TX_DESC_CMD_RS << CI_TXD_QW1_CMD_S);
 
-			/* Update txq RS bit counters */
-			txq->nb_tx_used = 0;
+			/* Record the last descriptor ID for the bucket we're leaving */
+			txq->rs_last_id[pkt_rs_idx] = tx_last;
 		}
 
 		if (ts_fns != NULL)
diff --git a/drivers/net/intel/i40e/i40e_rxtx.c b/drivers/net/intel/i40e/i40e_rxtx.c
index aef78c5358..1fadd0407a 100644
--- a/drivers/net/intel/i40e/i40e_rxtx.c
+++ b/drivers/net/intel/i40e/i40e_rxtx.c
@@ -24,6 +24,7 @@
 #include <rte_ip.h>
 #include <rte_net.h>
 #include <rte_vect.h>
+#include <rte_bitops.h>
 
 #include "i40e_logs.h"
 #include "base/i40e_prototype.h"
@@ -2269,6 +2270,13 @@ i40e_dev_tx_queue_setup(struct rte_eth_dev *dev,
 			     (int)queue_idx);
 		return I40E_ERR_PARAM;
 	}
+	if (!rte_is_power_of_2(tx_rs_thresh)) {
+		PMD_INIT_LOG(ERR, "tx_rs_thresh must be a power of 2. (tx_rs_thresh=%u port=%d queue=%d)",
+			     (unsigned int)tx_rs_thresh,
+			     (int)dev->data->port_id,
+			     (int)queue_idx);
+		return I40E_ERR_PARAM;
+	}
 	if ((tx_rs_thresh > 1) && (tx_conf->tx_thresh.wthresh != 0)) {
 		PMD_INIT_LOG(ERR, "TX WTHRESH must be set to 0 if "
 			     "tx_rs_thresh is greater than 1. "
@@ -2310,6 +2318,7 @@ i40e_dev_tx_queue_setup(struct rte_eth_dev *dev,
 	txq->mz = tz;
 	txq->nb_tx_desc = nb_desc;
 	txq->tx_rs_thresh = tx_rs_thresh;
+	txq->log2_rs_thresh = rte_log2_u32(tx_rs_thresh);
 	txq->tx_free_thresh = tx_free_thresh;
 	txq->queue_id = queue_idx;
 	txq->reg_idx = reg_idx;
@@ -2333,6 +2342,16 @@ i40e_dev_tx_queue_setup(struct rte_eth_dev *dev,
 		return -ENOMEM;
 	}
 
+	/* Allocate RS last_id tracking array */
+	uint16_t num_rs_buckets = nb_desc / tx_rs_thresh;
+	txq->rs_last_id = rte_zmalloc_socket(NULL, sizeof(txq->rs_last_id[0]) * num_rs_buckets,
+			RTE_CACHE_LINE_SIZE, socket_id);
+	if (txq->rs_last_id == NULL) {
+		i40e_tx_queue_release(txq);
+		PMD_DRV_LOG(ERR, "Failed to allocate memory for RS last_id array");
+		return -ENOMEM;
+	}
+
 	i40e_reset_tx_queue(txq);
 	txq->q_set = TRUE;
 
@@ -2378,6 +2397,7 @@ i40e_tx_queue_release(void *txq)
 
 	ci_txq_release_all_mbufs(q, false);
 	rte_free(q->sw_ring);
+	rte_free(q->rs_last_id);
 	rte_memzone_free(q->mz);
 	rte_free(q);
 }
diff --git a/drivers/net/intel/iavf/iavf_rxtx.c b/drivers/net/intel/iavf/iavf_rxtx.c
index f96876ca46..4517d55011 100644
--- a/drivers/net/intel/iavf/iavf_rxtx.c
+++ b/drivers/net/intel/iavf/iavf_rxtx.c
@@ -25,6 +25,7 @@
 #include <rte_ip.h>
 #include <rte_net.h>
 #include <rte_vect.h>
+#include <rte_bitops.h>
 #include <rte_vxlan.h>
 #include <rte_gtp.h>
 #include <rte_geneve.h>
@@ -204,6 +205,11 @@ check_tx_thresh(uint16_t nb_desc, uint16_t tx_rs_thresh,
 			     tx_rs_thresh, nb_desc);
 		return -EINVAL;
 	}
+	if (!rte_is_power_of_2(tx_rs_thresh)) {
+		PMD_INIT_LOG(ERR, "tx_rs_thresh must be a power of 2. (tx_rs_thresh=%u)",
+		             tx_rs_thresh);
+		return -EINVAL;
+	}
 
 	return 0;
 }
@@ -804,6 +810,7 @@ iavf_dev_tx_queue_setup(struct rte_eth_dev *dev,
 
 	txq->nb_tx_desc = nb_desc;
 	txq->tx_rs_thresh = tx_rs_thresh;
+	txq->log2_rs_thresh = rte_log2_u32(tx_rs_thresh);
 	txq->tx_free_thresh = tx_free_thresh;
 	txq->queue_id = queue_idx;
 	txq->port_id = dev->data->port_id;
@@ -827,6 +834,17 @@ iavf_dev_tx_queue_setup(struct rte_eth_dev *dev,
 		return -ENOMEM;
 	}
 
+	/* Allocate RS last_id tracking array */
+	uint16_t num_rs_buckets = nb_desc / tx_rs_thresh;
+	txq->rs_last_id = rte_zmalloc_socket(NULL, sizeof(txq->rs_last_id[0]) * num_rs_buckets,
+			RTE_CACHE_LINE_SIZE, socket_id);
+	if (txq->rs_last_id == NULL) {
+		PMD_INIT_LOG(ERR, "Failed to allocate memory for RS last_id array");
+		rte_free(txq->sw_ring);
+		rte_free(txq);
+		return -ENOMEM;
+	}
+
 	/* Allocate TX hardware ring descriptors. */
 	ring_size = sizeof(struct ci_tx_desc) * IAVF_MAX_RING_DESC;
 	ring_size = RTE_ALIGN(ring_size, IAVF_DMA_MEM_ALIGN);
@@ -1051,6 +1069,7 @@ iavf_dev_tx_queue_release(struct rte_eth_dev *dev, uint16_t qid)
 
 	ci_txq_release_all_mbufs(q, q->use_ctx);
 	rte_free(q->sw_ring);
+	rte_free(q->rs_last_id);
 	rte_memzone_free(q->mz);
 	rte_free(q);
 }
diff --git a/drivers/net/intel/ice/ice_rxtx.c b/drivers/net/intel/ice/ice_rxtx.c
index 2c73011181..a6a454ddf5 100644
--- a/drivers/net/intel/ice/ice_rxtx.c
+++ b/drivers/net/intel/ice/ice_rxtx.c
@@ -5,6 +5,7 @@
 #include <ethdev_driver.h>
 #include <rte_net.h>
 #include <rte_vect.h>
+#include <rte_bitops.h>
 
 #include "ice_rxtx.h"
 #include "ice_rxtx_vec_common.h"
@@ -1576,6 +1577,13 @@ ice_tx_queue_setup(struct rte_eth_dev *dev,
 			     (int)queue_idx);
 		return -EINVAL;
 	}
+	if (!rte_is_power_of_2(tx_rs_thresh)) {
+		PMD_INIT_LOG(ERR, "tx_rs_thresh must be a power of 2. (tx_rs_thresh=%u port=%d queue=%d)",
+			     (unsigned int)tx_rs_thresh,
+			     (int)dev->data->port_id,
+			     (int)queue_idx);
+		return -EINVAL;
+	}
 	if (tx_rs_thresh > 1 && tx_conf->tx_thresh.wthresh != 0) {
 		PMD_INIT_LOG(ERR, "TX WTHRESH must be set to 0 if "
 			     "tx_rs_thresh is greater than 1. "
@@ -1618,6 +1626,7 @@ ice_tx_queue_setup(struct rte_eth_dev *dev,
 	txq->mz = tz;
 	txq->nb_tx_desc = nb_desc;
 	txq->tx_rs_thresh = tx_rs_thresh;
+	txq->log2_rs_thresh = rte_log2_u32(tx_rs_thresh);
 	txq->tx_free_thresh = tx_free_thresh;
 	txq->queue_id = queue_idx;
 
@@ -1642,6 +1651,16 @@ ice_tx_queue_setup(struct rte_eth_dev *dev,
 		return -ENOMEM;
 	}
 
+	/* Allocate RS last_id tracking array */
+	uint16_t num_rs_buckets = nb_desc / tx_rs_thresh;
+	txq->rs_last_id = rte_zmalloc_socket(NULL, sizeof(txq->rs_last_id[0]) * num_rs_buckets,
+			RTE_CACHE_LINE_SIZE, socket_id);
+	if (txq->rs_last_id == NULL) {
+		ice_tx_queue_release(txq);
+		PMD_INIT_LOG(ERR, "Failed to allocate memory for RS last_id array");
+		return -ENOMEM;
+	}
+
 	if (vsi->type == ICE_VSI_PF && (offloads & RTE_ETH_TX_OFFLOAD_SEND_ON_TIMESTAMP)) {
 		if (hw->phy_model != ICE_PHY_E830) {
 			ice_tx_queue_release(txq);
@@ -1714,6 +1733,7 @@ ice_tx_queue_release(void *txq)
 
 	ci_txq_release_all_mbufs(q, false);
 	rte_free(q->sw_ring);
+	rte_free(q->rs_last_id);
 	if (q->tsq) {
 		rte_memzone_free(q->tsq->ts_mz);
 		rte_free(q->tsq);
diff --git a/drivers/net/intel/idpf/idpf_common_rxtx.c b/drivers/net/intel/idpf/idpf_common_rxtx.c
index 81bc45f6ef..1d123f6350 100644
--- a/drivers/net/intel/idpf/idpf_common_rxtx.c
+++ b/drivers/net/intel/idpf/idpf_common_rxtx.c
@@ -5,6 +5,7 @@
 #include <eal_export.h>
 #include <rte_mbuf_dyn.h>
 #include <rte_errno.h>
+#include <rte_bitops.h>
 
 #include "idpf_common_rxtx.h"
 #include "idpf_common_device.h"
@@ -73,6 +74,11 @@ idpf_qc_tx_thresh_check(uint16_t nb_desc, uint16_t tx_rs_thresh,
 			tx_rs_thresh, nb_desc);
 		return -EINVAL;
 	}
+	if (!rte_is_power_of_2(tx_rs_thresh)) {
+		DRV_LOG(ERR, "tx_rs_thresh must be a power of 2. (tx_rs_thresh=%u)",
+			tx_rs_thresh);
+		return -EINVAL;
+	}
 
 	return 0;
 }
@@ -333,6 +339,7 @@ idpf_qc_tx_queue_release(void *txq)
 	}
 
 	ci_txq_release_all_mbufs(q, false);
+	rte_free(q->rs_last_id);
 	rte_free(q->sw_ring);
 	rte_memzone_free(q->mz);
 	rte_free(q);
diff --git a/drivers/net/intel/idpf/idpf_rxtx.c b/drivers/net/intel/idpf/idpf_rxtx.c
index e974eb44b0..5c2516f556 100644
--- a/drivers/net/intel/idpf/idpf_rxtx.c
+++ b/drivers/net/intel/idpf/idpf_rxtx.c
@@ -437,6 +437,7 @@ idpf_tx_queue_setup(struct rte_eth_dev *dev, uint16_t queue_idx,
 
 	txq->nb_tx_desc = nb_desc;
 	txq->tx_rs_thresh = tx_rs_thresh;
+	txq->log2_rs_thresh = rte_log2_u32(tx_rs_thresh);
 	txq->tx_free_thresh = tx_free_thresh;
 	txq->queue_id = vport->chunks_info.tx_start_qid + queue_idx;
 	txq->port_id = dev->data->port_id;
@@ -468,6 +469,15 @@ idpf_tx_queue_setup(struct rte_eth_dev *dev, uint16_t queue_idx,
 		goto err_sw_ring_alloc;
 	}
 
+	txq->rs_last_id = rte_zmalloc_socket("idpf tx rs_last_id",
+			sizeof(txq->rs_last_id[0]) * (nb_desc >> txq->log2_rs_thresh),
+			RTE_CACHE_LINE_SIZE, socket_id);
+	if (txq->rs_last_id == NULL) {
+		PMD_INIT_LOG(ERR, "Failed to allocate memory for TX RS tracking");
+		ret = -ENOMEM;
+		goto err_rs_last_id_alloc;
+	}
+
 	if (!is_splitq) {
 		txq->ci_tx_ring = mz->addr;
 		idpf_qc_single_tx_queue_reset(txq);
@@ -490,6 +500,9 @@ idpf_tx_queue_setup(struct rte_eth_dev *dev, uint16_t queue_idx,
 	return 0;
 
 err_complq_setup:
+	rte_free(txq->rs_last_id);
+err_rs_last_id_alloc:
+	rte_free(txq->sw_ring);
 err_sw_ring_alloc:
 	idpf_dma_zone_release(mz);
 err_mz_reserve:
-- 
2.51.0



* [RFC PATCH 25/27] net/ixgbe: use separate array for desc status tracking
  2025-12-19 17:25 [RFC PATCH 00/27] combine multiple Intel scalar Tx paths Bruce Richardson
                   ` (23 preceding siblings ...)
  2025-12-19 17:25 ` [RFC PATCH 24/27] net/intel: use separate array for desc status tracking Bruce Richardson
@ 2025-12-19 17:25 ` Bruce Richardson
  2025-12-19 17:25 ` [RFC PATCH 26/27] net/intel: drop unused Tx queue used count Bruce Richardson
                   ` (5 subsequent siblings)
  30 siblings, 0 replies; 274+ messages in thread
From: Bruce Richardson @ 2025-12-19 17:25 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson, Anatoly Burakov, Vladimir Medvedkin

Due to significant differences in the ixgbe transmit descriptors, the
ixgbe driver does not use the common scalar Tx functionality. Update the
driver directly so its use of the rs_last_id array matches that of the
common Tx code.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/net/intel/ixgbe/ixgbe_rxtx.c | 86 +++++++++++++++-------------
 1 file changed, 47 insertions(+), 39 deletions(-)

diff --git a/drivers/net/intel/ixgbe/ixgbe_rxtx.c b/drivers/net/intel/ixgbe/ixgbe_rxtx.c
index a7583c178a..3eeec220fd 100644
--- a/drivers/net/intel/ixgbe/ixgbe_rxtx.c
+++ b/drivers/net/intel/ixgbe/ixgbe_rxtx.c
@@ -43,6 +43,7 @@
 #include <rte_ip.h>
 #include <rte_net.h>
 #include <rte_vect.h>
+#include <rte_bitops.h>
 
 #include "ixgbe_logs.h"
 #include "base/ixgbe_api.h"
@@ -571,57 +572,35 @@ tx_desc_ol_flags_to_cmdtype(uint64_t ol_flags)
 static inline int
 ixgbe_xmit_cleanup(struct ci_tx_queue *txq)
 {
-	struct ci_tx_entry *sw_ring = txq->sw_ring;
 	volatile union ixgbe_adv_tx_desc *txr = txq->ixgbe_tx_ring;
-	uint16_t last_desc_cleaned = txq->last_desc_cleaned;
-	uint16_t nb_tx_desc = txq->nb_tx_desc;
-	uint16_t desc_to_clean_to;
-	uint16_t nb_tx_to_clean;
-	uint32_t status;
+	const uint16_t last_desc_cleaned = txq->last_desc_cleaned;
+	const uint16_t nb_tx_desc = txq->nb_tx_desc;
 
-	/* Determine the last descriptor needing to be cleaned */
-	desc_to_clean_to = (uint16_t)(last_desc_cleaned + txq->tx_rs_thresh);
-	if (desc_to_clean_to >= nb_tx_desc)
-		desc_to_clean_to = (uint16_t)(desc_to_clean_to - nb_tx_desc);
+	const uint16_t rs_idx = (last_desc_cleaned == nb_tx_desc - 1) ?
+			0 :
+			(last_desc_cleaned + 1) >> txq->log2_rs_thresh;
+	uint16_t desc_to_clean_to = (rs_idx << txq->log2_rs_thresh) + (txq->tx_rs_thresh - 1);
 
-	/* Check to make sure the last descriptor to clean is done */
-	desc_to_clean_to = sw_ring[desc_to_clean_to].last_id;
-	status = txr[desc_to_clean_to].wb.status;
+	uint32_t status = txr[txq->rs_last_id[rs_idx]].wb.status;
 	if (!(status & rte_cpu_to_le_32(IXGBE_TXD_STAT_DD))) {
 		PMD_TX_LOG(DEBUG,
 			   "TX descriptor %4u is not done"
 			   "(port=%d queue=%d)",
-			   desc_to_clean_to,
+			   txq->rs_last_id[rs_idx],
 			   txq->port_id, txq->queue_id);
 		/* Failed to clean any descriptors, better luck next time */
 		return -(1);
 	}
 
-	/* Figure out how many descriptors will be cleaned */
-	if (last_desc_cleaned > desc_to_clean_to)
-		nb_tx_to_clean = (uint16_t)((nb_tx_desc - last_desc_cleaned) +
-							desc_to_clean_to);
-	else
-		nb_tx_to_clean = (uint16_t)(desc_to_clean_to -
-						last_desc_cleaned);
-
 	PMD_TX_LOG(DEBUG,
 		   "Cleaning %4u TX descriptors: %4u to %4u "
 		   "(port=%d queue=%d)",
-		   nb_tx_to_clean, last_desc_cleaned, desc_to_clean_to,
+		   txq->tx_rs_thresh, last_desc_cleaned, desc_to_clean_to,
 		   txq->port_id, txq->queue_id);
 
-	/*
-	 * The last descriptor to clean is done, so that means all the
-	 * descriptors from the last descriptor that was cleaned
-	 * up to the last descriptor with the RS bit set
-	 * are done. Only reset the threshold descriptor.
-	 */
-	txr[desc_to_clean_to].wb.status = 0;
-
 	/* Update the txq to reflect the last descriptor that was cleaned */
 	txq->last_desc_cleaned = desc_to_clean_to;
-	txq->nb_tx_free = (uint16_t)(txq->nb_tx_free + nb_tx_to_clean);
+	txq->nb_tx_free = (uint16_t)(txq->nb_tx_free + txq->tx_rs_thresh);
 
 	/* No Error */
 	return 0;
@@ -749,6 +728,9 @@ ixgbe_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 		if (tx_last >= txq->nb_tx_desc)
 			tx_last = (uint16_t) (tx_last - txq->nb_tx_desc);
 
+		/* Track the RS threshold bucket at packet start */
+		uint16_t pkt_rs_idx = (uint16_t)(tx_id >> txq->log2_rs_thresh);
+
 		PMD_TX_LOG(DEBUG, "port_id=%u queue_id=%u pktlen=%u"
 			   " tx_first=%u tx_last=%u",
 			   (unsigned) txq->port_id,
@@ -876,7 +858,6 @@ ixgbe_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 					tx_offload,
 					rte_security_dynfield(tx_pkt));
 
-				txe->last_id = tx_last;
 				tx_id = txe->next_id;
 				txe = txn;
 			}
@@ -922,7 +903,6 @@ ixgbe_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 				rte_cpu_to_le_32(cmd_type_len | slen);
 			txd->read.olinfo_status =
 				rte_cpu_to_le_32(olinfo_status);
-			txe->last_id = tx_last;
 			tx_id = txe->next_id;
 			txe = txn;
 			m_seg = m_seg->next;
@@ -935,8 +915,18 @@ ixgbe_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 		txq->nb_tx_used = (uint16_t)(txq->nb_tx_used + nb_used);
 		txq->nb_tx_free = (uint16_t)(txq->nb_tx_free - nb_used);
 
-		/* Set RS bit only on threshold packets' last descriptor */
-		if (txq->nb_tx_used >= txq->tx_rs_thresh) {
+		/*
+		 * Check if packet crosses into a new RS threshold bucket.
+		 * The RS bit is set on the last descriptor when we move from one bucket to another.
+		 * For example, with tx_rs_thresh=32 and a 5-descriptor packet using slots 30-34:
+		 *   - pkt_rs_idx = 30 >> 5 = 0 (started in bucket 0)
+		 *   - tx_last = 34, so 35 >> 5 = 1 (next packet is in bucket 1)
+		 *   - Since 0 != 1, set RS bit on descriptor 34, and record rs_last_id[0] = 34
+		 */
+		uint16_t next_rs_idx = ((tx_last + 1) >> txq->log2_rs_thresh);
+
+		if (next_rs_idx != pkt_rs_idx) {
+			/* Packet crossed into a new bucket - set RS bit on last descriptor */
 			PMD_TX_LOG(DEBUG,
 				   "Setting RS bit on TXD id="
 				   "%4u (port=%d queue=%d)",
@@ -944,9 +934,8 @@ ixgbe_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 
 			cmd_type_len |= IXGBE_TXD_CMD_RS;
 
-			/* Update txq RS bit counters */
-			txq->nb_tx_used = 0;
-			txp = NULL;
+			/* Record the last descriptor ID for the bucket we're leaving */
+			txq->rs_last_id[pkt_rs_idx] = tx_last;
 		} else
 			txp = txd;
 
@@ -2521,6 +2510,7 @@ ixgbe_tx_queue_release(struct ci_tx_queue *txq)
 	if (txq != NULL && txq->ops != NULL) {
 		ci_txq_release_all_mbufs(txq, false);
 		txq->ops->free_swring(txq);
+		rte_free(txq->rs_last_id);
 		rte_memzone_free(txq->mz);
 		rte_free(txq);
 	}
@@ -2825,6 +2815,13 @@ ixgbe_dev_tx_queue_setup(struct rte_eth_dev *dev,
 			     (int)dev->data->port_id, (int)queue_idx);
 		return -(EINVAL);
 	}
+	if (!rte_is_power_of_2(tx_rs_thresh)) {
+		PMD_INIT_LOG(ERR, "tx_rs_thresh must be a power of 2. (tx_rs_thresh=%u port=%d queue=%d)",
+			     (unsigned int)tx_rs_thresh,
+			     (int)dev->data->port_id,
+			     (int)queue_idx);
+		return -(EINVAL);
+	}
 
 	/*
 	 * If rs_bit_thresh is greater than 1, then TX WTHRESH should be
@@ -2870,6 +2867,7 @@ ixgbe_dev_tx_queue_setup(struct rte_eth_dev *dev,
 	txq->mz = tz;
 	txq->nb_tx_desc = nb_desc;
 	txq->tx_rs_thresh = tx_rs_thresh;
+	txq->log2_rs_thresh = rte_log2_u32(tx_rs_thresh);
 	txq->tx_free_thresh = tx_free_thresh;
 	txq->pthresh = tx_conf->tx_thresh.pthresh;
 	txq->hthresh = tx_conf->tx_thresh.hthresh;
@@ -2911,6 +2909,16 @@ ixgbe_dev_tx_queue_setup(struct rte_eth_dev *dev,
 	PMD_INIT_LOG(DEBUG, "sw_ring=%p hw_ring=%p dma_addr=0x%"PRIx64,
 		     txq->sw_ring, txq->ixgbe_tx_ring, txq->tx_ring_dma);
 
+	/* Allocate RS last_id tracking array */
+	uint16_t num_rs_buckets = nb_desc / tx_rs_thresh;
+	txq->rs_last_id = rte_zmalloc_socket(NULL, sizeof(txq->rs_last_id[0]) * num_rs_buckets,
+			RTE_CACHE_LINE_SIZE, socket_id);
+	if (txq->rs_last_id == NULL) {
+		ixgbe_tx_queue_release(txq);
+		PMD_DRV_LOG(ERR, "Failed to allocate memory for RS last_id array");
+		return -ENOMEM;
+	}
+
 	/* set up vector or scalar TX function as appropriate */
 	ixgbe_set_tx_function(dev, txq);
 
-- 
2.51.0



* [RFC PATCH 26/27] net/intel: drop unused Tx queue used count
  2025-12-19 17:25 [RFC PATCH 00/27] combine multiple Intel scalar Tx paths Bruce Richardson
                   ` (24 preceding siblings ...)
  2025-12-19 17:25 ` [RFC PATCH 25/27] net/ixgbe: " Bruce Richardson
@ 2025-12-19 17:25 ` Bruce Richardson
  2025-12-19 17:25 ` [RFC PATCH 27/27] net/intel: remove index for tracking end of packet Bruce Richardson
                   ` (4 subsequent siblings)
  30 siblings, 0 replies; 274+ messages in thread
From: Bruce Richardson @ 2025-12-19 17:25 UTC (permalink / raw)
  To: dev
  Cc: Bruce Richardson, Vladimir Medvedkin, Anatoly Burakov,
	Jingjing Wu, Praveen Shetty

Since drivers now track the setting of the RS bit based on fixed
thresholds rather than after a fixed number of descriptors, we no longer
need to track the number of descriptors used from one call to the next.
Therefore we can remove the nb_tx_used value from the Tx queue structure.

This value was still being used inside the idpf splitq scalar code;
however, the idpf driver-specific section of the Tx queue structure also
has an rs_compl_count value that was only used by the vector code paths,
so we can use it to replace the old nb_tx_used value in the scalar path.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/net/intel/common/tx.h                   | 1 -
 drivers/net/intel/common/tx_scalar_fns.h        | 1 -
 drivers/net/intel/i40e/i40e_rxtx.c              | 1 -
 drivers/net/intel/iavf/iavf_rxtx.c              | 1 -
 drivers/net/intel/ice/ice_dcf_ethdev.c          | 1 -
 drivers/net/intel/ice/ice_rxtx.c                | 1 -
 drivers/net/intel/idpf/idpf_common_rxtx.c       | 8 +++-----
 drivers/net/intel/ixgbe/ixgbe_rxtx.c            | 8 --------
 drivers/net/intel/ixgbe/ixgbe_rxtx_vec_common.c | 1 -
 9 files changed, 3 insertions(+), 20 deletions(-)

diff --git a/drivers/net/intel/common/tx.h b/drivers/net/intel/common/tx.h
index 9b3f8385e6..3976766f06 100644
--- a/drivers/net/intel/common/tx.h
+++ b/drivers/net/intel/common/tx.h
@@ -130,7 +130,6 @@ struct ci_tx_queue {
 	uint16_t *rs_last_id;
 	uint16_t nb_tx_desc;           /* number of TX descriptors */
 	uint16_t tx_tail; /* current value of tail register */
-	uint16_t nb_tx_used; /* number of TX desc used since RS bit set */
 	/* index to last TX descriptor to have been cleaned */
 	uint16_t last_desc_cleaned;
 	/* Total number of TX descriptors ready to be allocated. */
diff --git a/drivers/net/intel/common/tx_scalar_fns.h b/drivers/net/intel/common/tx_scalar_fns.h
index 3d0a23eda3..27a5dafefc 100644
--- a/drivers/net/intel/common/tx_scalar_fns.h
+++ b/drivers/net/intel/common/tx_scalar_fns.h
@@ -404,7 +404,6 @@ ci_xmit_pkts(struct ci_tx_queue *txq,
 			m_seg = m_seg->next;
 		} while (m_seg);
 end_pkt:
-		txq->nb_tx_used = (uint16_t)(txq->nb_tx_used + nb_used);
 		txq->nb_tx_free = (uint16_t)(txq->nb_tx_free - nb_used);
 
 		/* Check if packet crosses into a new RS threshold bucket.
diff --git a/drivers/net/intel/i40e/i40e_rxtx.c b/drivers/net/intel/i40e/i40e_rxtx.c
index 1fadd0407a..e1226d649b 100644
--- a/drivers/net/intel/i40e/i40e_rxtx.c
+++ b/drivers/net/intel/i40e/i40e_rxtx.c
@@ -2632,7 +2632,6 @@ i40e_reset_tx_queue(struct ci_tx_queue *txq)
 	txq->tx_next_rs = (uint16_t)(txq->tx_rs_thresh - 1);
 
 	txq->tx_tail = 0;
-	txq->nb_tx_used = 0;
 
 	txq->last_desc_cleaned = (uint16_t)(txq->nb_tx_desc - 1);
 	txq->nb_tx_free = (uint16_t)(txq->nb_tx_desc - 1);
diff --git a/drivers/net/intel/iavf/iavf_rxtx.c b/drivers/net/intel/iavf/iavf_rxtx.c
index 4517d55011..9cac6e8841 100644
--- a/drivers/net/intel/iavf/iavf_rxtx.c
+++ b/drivers/net/intel/iavf/iavf_rxtx.c
@@ -298,7 +298,6 @@ reset_tx_queue(struct ci_tx_queue *txq)
 	}
 
 	txq->tx_tail = 0;
-	txq->nb_tx_used = 0;
 
 	txq->last_desc_cleaned = txq->nb_tx_desc - 1;
 	txq->nb_tx_free = txq->nb_tx_desc - 1;
diff --git a/drivers/net/intel/ice/ice_dcf_ethdev.c b/drivers/net/intel/ice/ice_dcf_ethdev.c
index 4ceecc15c6..02a23629d6 100644
--- a/drivers/net/intel/ice/ice_dcf_ethdev.c
+++ b/drivers/net/intel/ice/ice_dcf_ethdev.c
@@ -414,7 +414,6 @@ reset_tx_queue(struct ci_tx_queue *txq)
 	}
 
 	txq->tx_tail = 0;
-	txq->nb_tx_used = 0;
 
 	txq->last_desc_cleaned = txq->nb_tx_desc - 1;
 	txq->nb_tx_free = txq->nb_tx_desc - 1;
diff --git a/drivers/net/intel/ice/ice_rxtx.c b/drivers/net/intel/ice/ice_rxtx.c
index a6a454ddf5..092981f452 100644
--- a/drivers/net/intel/ice/ice_rxtx.c
+++ b/drivers/net/intel/ice/ice_rxtx.c
@@ -1127,7 +1127,6 @@ ice_reset_tx_queue(struct ci_tx_queue *txq)
 	txq->tx_next_rs = (uint16_t)(txq->tx_rs_thresh - 1);
 
 	txq->tx_tail = 0;
-	txq->nb_tx_used = 0;
 
 	txq->last_desc_cleaned = (uint16_t)(txq->nb_tx_desc - 1);
 	txq->nb_tx_free = (uint16_t)(txq->nb_tx_desc - 1);
diff --git a/drivers/net/intel/idpf/idpf_common_rxtx.c b/drivers/net/intel/idpf/idpf_common_rxtx.c
index 1d123f6350..b36e29c8d2 100644
--- a/drivers/net/intel/idpf/idpf_common_rxtx.c
+++ b/drivers/net/intel/idpf/idpf_common_rxtx.c
@@ -224,7 +224,6 @@ idpf_qc_split_tx_descq_reset(struct ci_tx_queue *txq)
 	}
 
 	txq->tx_tail = 0;
-	txq->nb_tx_used = 0;
 
 	/* Use this as next to clean for split desc queue */
 	txq->last_desc_cleaned = 0;
@@ -284,7 +283,6 @@ idpf_qc_single_tx_queue_reset(struct ci_tx_queue *txq)
 	}
 
 	txq->tx_tail = 0;
-	txq->nb_tx_used = 0;
 
 	txq->last_desc_cleaned = txq->nb_tx_desc - 1;
 	txq->nb_tx_free = txq->nb_tx_desc - 1;
@@ -993,12 +991,12 @@ idpf_dp_splitq_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 		txd->qw1.cmd_dtype |= IDPF_TXD_FLEX_FLOW_CMD_EOP;
 
 		txq->nb_tx_free = (uint16_t)(txq->nb_tx_free - nb_used);
-		txq->nb_tx_used = (uint16_t)(txq->nb_tx_used + nb_used);
+		txq->rs_compl_count += nb_used;
 
-		if (txq->nb_tx_used >= 32) {
+		if (txq->rs_compl_count >= 32) {
 			txd->qw1.cmd_dtype |= IDPF_TXD_FLEX_FLOW_CMD_RE;
 			/* Update txq RE bit counters */
-			txq->nb_tx_used = 0;
+			txq->rs_compl_count = 0;
 		}
 	}
 
diff --git a/drivers/net/intel/ixgbe/ixgbe_rxtx.c b/drivers/net/intel/ixgbe/ixgbe_rxtx.c
index 3eeec220fd..6b8ff20f61 100644
--- a/drivers/net/intel/ixgbe/ixgbe_rxtx.c
+++ b/drivers/net/intel/ixgbe/ixgbe_rxtx.c
@@ -708,12 +708,6 @@ ixgbe_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 		 */
 		nb_used = (uint16_t)(tx_pkt->nb_segs + new_ctx);
 
-		if (txp != NULL &&
-				nb_used + txq->nb_tx_used >= txq->tx_rs_thresh)
-			/* set RS on the previous packet in the burst */
-			txp->read.cmd_type_len |=
-				rte_cpu_to_le_32(IXGBE_TXD_CMD_RS);
-
 		/*
 		 * The number of descriptors that must be allocated for a
 		 * packet is the number of segments of that packet, plus 1
@@ -912,7 +906,6 @@ ixgbe_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 		 * The last packet data descriptor needs End Of Packet (EOP)
 		 */
 		cmd_type_len |= IXGBE_TXD_CMD_EOP;
-		txq->nb_tx_used = (uint16_t)(txq->nb_tx_used + nb_used);
 		txq->nb_tx_free = (uint16_t)(txq->nb_tx_free - nb_used);
 
 		/*
@@ -2551,7 +2544,6 @@ ixgbe_reset_tx_queue(struct ci_tx_queue *txq)
 	txq->tx_next_rs = (uint16_t)(txq->tx_rs_thresh - 1);
 
 	txq->tx_tail = 0;
-	txq->nb_tx_used = 0;
 	/*
 	 * Always allow 1 descriptor to be un-allocated to avoid
 	 * a H/W race condition
diff --git a/drivers/net/intel/ixgbe/ixgbe_rxtx_vec_common.c b/drivers/net/intel/ixgbe/ixgbe_rxtx_vec_common.c
index eb7c79eaf9..63c7cb50d3 100644
--- a/drivers/net/intel/ixgbe/ixgbe_rxtx_vec_common.c
+++ b/drivers/net/intel/ixgbe/ixgbe_rxtx_vec_common.c
@@ -47,7 +47,6 @@ ixgbe_reset_tx_queue_vec(struct ci_tx_queue *txq)
 	txq->tx_next_rs = (uint16_t)(txq->tx_rs_thresh - 1);
 
 	txq->tx_tail = 0;
-	txq->nb_tx_used = 0;
 	/*
 	 * Always allow 1 descriptor to be un-allocated to avoid
 	 * a H/W race condition
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [RFC PATCH 27/27] net/intel: remove index for tracking end of packet
  2025-12-19 17:25 [RFC PATCH 00/27] combine multiple Intel scalar Tx paths Bruce Richardson
                   ` (25 preceding siblings ...)
  2025-12-19 17:25 ` [RFC PATCH 26/27] net/intel: drop unused Tx queue used count Bruce Richardson
@ 2025-12-19 17:25 ` Bruce Richardson
  2025-12-20  9:05   ` Morten Brørup
  2026-01-13 15:14 ` [PATCH v2 00/36] combine multiple Intel scalar Tx paths Bruce Richardson
                   ` (3 subsequent siblings)
  30 siblings, 1 reply; 274+ messages in thread
From: Bruce Richardson @ 2025-12-19 17:25 UTC (permalink / raw)
  To: dev
  Cc: Bruce Richardson, Vladimir Medvedkin, Anatoly Burakov,
	Jingjing Wu, Praveen Shetty

The last_id value in each tx_sw_queue entry is no longer used in the
datapath, so remove it and its initialization. In the functions releasing
packets back, rather than relying on "last_id" to identify the end of a
packet, check for the mbuf's next pointer being NULL.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/net/intel/common/tx.h             | 1 -
 drivers/net/intel/i40e/i40e_rxtx.c        | 8 +++-----
 drivers/net/intel/iavf/iavf_rxtx.c        | 9 ++++-----
 drivers/net/intel/ice/ice_dcf_ethdev.c    | 1 -
 drivers/net/intel/ice/ice_rxtx.c          | 9 ++++-----
 drivers/net/intel/idpf/idpf_common_rxtx.c | 2 --
 drivers/net/intel/ixgbe/ixgbe_rxtx.c      | 9 ++++-----
 7 files changed, 15 insertions(+), 24 deletions(-)

diff --git a/drivers/net/intel/common/tx.h b/drivers/net/intel/common/tx.h
index 3976766f06..2d3626cbda 100644
--- a/drivers/net/intel/common/tx.h
+++ b/drivers/net/intel/common/tx.h
@@ -104,7 +104,6 @@ struct ci_tx_queue;
 struct ci_tx_entry {
 	struct rte_mbuf *mbuf; /* mbuf associated with TX desc, if any. */
 	uint16_t next_id; /* Index of next descriptor in ring. */
-	uint16_t last_id; /* Index of last scattered descriptor. */
 };
 
 /**
diff --git a/drivers/net/intel/i40e/i40e_rxtx.c b/drivers/net/intel/i40e/i40e_rxtx.c
index e1226d649b..9aa31d6168 100644
--- a/drivers/net/intel/i40e/i40e_rxtx.c
+++ b/drivers/net/intel/i40e/i40e_rxtx.c
@@ -2523,14 +2523,13 @@ i40e_tx_done_cleanup_full(struct ci_tx_queue *txq,
 			pkt_cnt < free_cnt &&
 			tx_id != tx_last; i++) {
 			if (swr_ring[tx_id].mbuf != NULL) {
-				rte_pktmbuf_free_seg(swr_ring[tx_id].mbuf);
-				swr_ring[tx_id].mbuf = NULL;
-
 				/*
 				 * last segment in the packet,
 				 * increment packet count
 				 */
-				pkt_cnt += (swr_ring[tx_id].last_id == tx_id);
+				pkt_cnt += (swr_ring[tx_id].mbuf->next == NULL) ? 1 : 0;
+				rte_pktmbuf_free_seg(swr_ring[tx_id].mbuf);
+				swr_ring[tx_id].mbuf = NULL;
 			}
 
 			tx_id = swr_ring[tx_id].next_id;
@@ -2623,7 +2622,6 @@ i40e_reset_tx_queue(struct ci_tx_queue *txq)
 		txd->cmd_type_offset_bsz =
 			rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE);
 		txe[i].mbuf =  NULL;
-		txe[i].last_id = i;
 		txe[prev].next_id = i;
 		prev = i;
 	}
diff --git a/drivers/net/intel/iavf/iavf_rxtx.c b/drivers/net/intel/iavf/iavf_rxtx.c
index 9cac6e8841..558ae2598f 100644
--- a/drivers/net/intel/iavf/iavf_rxtx.c
+++ b/drivers/net/intel/iavf/iavf_rxtx.c
@@ -292,7 +292,6 @@ reset_tx_queue(struct ci_tx_queue *txq)
 		txq->ci_tx_ring[i].cmd_type_offset_bsz =
 			rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE);
 		txe[i].mbuf =  NULL;
-		txe[i].last_id = i;
 		txe[prev].next_id = i;
 		prev = i;
 	}
@@ -4002,14 +4001,14 @@ iavf_tx_done_cleanup_full(struct ci_tx_queue *txq,
 	while (pkt_cnt < free_cnt) {
 		do {
 			if (swr_ring[tx_id].mbuf != NULL) {
-				rte_pktmbuf_free_seg(swr_ring[tx_id].mbuf);
-				swr_ring[tx_id].mbuf = NULL;
-
 				/*
 				 * last segment in the packet,
 				 * increment packet count
 				 */
-				pkt_cnt += (swr_ring[tx_id].last_id == tx_id);
+				pkt_cnt += (swr_ring[tx_id].mbuf->next == NULL) ? 1 : 0;
+				rte_pktmbuf_free_seg(swr_ring[tx_id].mbuf);
+				swr_ring[tx_id].mbuf = NULL;
+
 			}
 
 			tx_id = swr_ring[tx_id].next_id;
diff --git a/drivers/net/intel/ice/ice_dcf_ethdev.c b/drivers/net/intel/ice/ice_dcf_ethdev.c
index 02a23629d6..abd7875e7b 100644
--- a/drivers/net/intel/ice/ice_dcf_ethdev.c
+++ b/drivers/net/intel/ice/ice_dcf_ethdev.c
@@ -408,7 +408,6 @@ reset_tx_queue(struct ci_tx_queue *txq)
 		txq->ci_tx_ring[i].cmd_type_offset_bsz =
 			rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE);
 		txe[i].mbuf =  NULL;
-		txe[i].last_id = i;
 		txe[prev].next_id = i;
 		prev = i;
 	}
diff --git a/drivers/net/intel/ice/ice_rxtx.c b/drivers/net/intel/ice/ice_rxtx.c
index 092981f452..d11d9054f2 100644
--- a/drivers/net/intel/ice/ice_rxtx.c
+++ b/drivers/net/intel/ice/ice_rxtx.c
@@ -1118,7 +1118,6 @@ ice_reset_tx_queue(struct ci_tx_queue *txq)
 		txd->cmd_type_offset_bsz =
 			rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE);
 		txe[i].mbuf =  NULL;
-		txe[i].last_id = i;
 		txe[prev].next_id = i;
 		prev = i;
 	}
@@ -3190,14 +3189,14 @@ ice_tx_done_cleanup_full(struct ci_tx_queue *txq,
 			pkt_cnt < free_cnt &&
 			tx_id != tx_last; i++) {
 			if (swr_ring[tx_id].mbuf != NULL) {
-				rte_pktmbuf_free_seg(swr_ring[tx_id].mbuf);
-				swr_ring[tx_id].mbuf = NULL;
-
 				/*
 				 * last segment in the packet,
 				 * increment packet count
 				 */
-				pkt_cnt += (swr_ring[tx_id].last_id == tx_id);
+				pkt_cnt += swr_ring[tx_id].mbuf->next == NULL ? 1 : 0;
+				rte_pktmbuf_free_seg(swr_ring[tx_id].mbuf);
+				swr_ring[tx_id].mbuf = NULL;
+
 			}
 
 			tx_id = swr_ring[tx_id].next_id;
diff --git a/drivers/net/intel/idpf/idpf_common_rxtx.c b/drivers/net/intel/idpf/idpf_common_rxtx.c
index b36e29c8d2..781310e564 100644
--- a/drivers/net/intel/idpf/idpf_common_rxtx.c
+++ b/drivers/net/intel/idpf/idpf_common_rxtx.c
@@ -218,7 +218,6 @@ idpf_qc_split_tx_descq_reset(struct ci_tx_queue *txq)
 	prev = (uint16_t)(txq->sw_nb_desc - 1);
 	for (i = 0; i < txq->sw_nb_desc; i++) {
 		txe[i].mbuf = NULL;
-		txe[i].last_id = i;
 		txe[prev].next_id = i;
 		prev = i;
 	}
@@ -277,7 +276,6 @@ idpf_qc_single_tx_queue_reset(struct ci_tx_queue *txq)
 		txq->ci_tx_ring[i].cmd_type_offset_bsz =
 			rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE);
 		txe[i].mbuf =  NULL;
-		txe[i].last_id = i;
 		txe[prev].next_id = i;
 		prev = i;
 	}
diff --git a/drivers/net/intel/ixgbe/ixgbe_rxtx.c b/drivers/net/intel/ixgbe/ixgbe_rxtx.c
index 6b8ff20f61..5f4bee4f2f 100644
--- a/drivers/net/intel/ixgbe/ixgbe_rxtx.c
+++ b/drivers/net/intel/ixgbe/ixgbe_rxtx.c
@@ -2407,14 +2407,14 @@ ixgbe_tx_done_cleanup_full(struct ci_tx_queue *txq, uint32_t free_cnt)
 			pkt_cnt < free_cnt &&
 			tx_id != tx_last; i++) {
 			if (swr_ring[tx_id].mbuf != NULL) {
-				rte_pktmbuf_free_seg(swr_ring[tx_id].mbuf);
-				swr_ring[tx_id].mbuf = NULL;
-
 				/*
 				 * last segment in the packet,
 				 * increment packet count
 				 */
-				pkt_cnt += (swr_ring[tx_id].last_id == tx_id);
+				pkt_cnt += swr_ring[tx_id].mbuf->next == NULL ? 1 : 0;
+				rte_pktmbuf_free_seg(swr_ring[tx_id].mbuf);
+				swr_ring[tx_id].mbuf = NULL;
+
 			}
 
 			tx_id = swr_ring[tx_id].next_id;
@@ -2535,7 +2535,6 @@ ixgbe_reset_tx_queue(struct ci_tx_queue *txq)
 
 		txd->wb.status = rte_cpu_to_le_32(IXGBE_TXD_STAT_DD);
 		txe[i].mbuf = NULL;
-		txe[i].last_id = i;
 		txe[prev].next_id = i;
 		prev = i;
 	}
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* RE: [RFC PATCH 20/27] net/intel: write descriptors using non-volatile pointers
  2025-12-19 17:25 ` [RFC PATCH 20/27] net/intel: write descriptors using non-volatile pointers Bruce Richardson
@ 2025-12-20  8:43   ` Morten Brørup
  2025-12-22  9:50     ` Bruce Richardson
  0 siblings, 1 reply; 274+ messages in thread
From: Morten Brørup @ 2025-12-20  8:43 UTC (permalink / raw)
  To: Bruce Richardson, dev

> From: Bruce Richardson [mailto:bruce.richardson@intel.com]
> Sent: Friday, 19 December 2025 18.26
> 
> Use a non-volatile uint64_t pointer to store to the descriptor ring.
> This will allow the compiler to optionally merge the stores as it sees
> best.

I suppose there was a reason for the volatile.
Is removing it really safe?
E.g. this will also allow the compiler to reorder stores: not just the pair of 64-bit writes within one descriptor, but also stores to multiple descriptors.

One more comment inline below.

> 
> Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> ---
>  drivers/net/intel/common/tx_scalar_fns.h | 24 ++++++++++++++++--------
>  1 file changed, 16 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/net/intel/common/tx_scalar_fns.h b/drivers/net/intel/common/tx_scalar_fns.h
> index 7b643fcf44..95e9acbe60 100644
> --- a/drivers/net/intel/common/tx_scalar_fns.h
> +++ b/drivers/net/intel/common/tx_scalar_fns.h
> @@ -184,6 +184,15 @@ struct ci_timesstamp_queue_fns {
>  	write_ts_tail_t write_ts_tail;
>  };
> 
> +static inline void
> +write_txd(volatile void *txd, uint64_t qw0, uint64_t qw1)
> +{
> +	uint64_t *txd_qw = RTE_CAST_PTR(void *, txd);

If the descriptors are 16-byte aligned, you could mark them as such, so the compiler can use 128-bit stores on architectures where alignment matters.

> +
> +	txd_qw[0] = rte_cpu_to_le_64(qw0);
> +	txd_qw[1] = rte_cpu_to_le_64(qw1);
> +}
> +
>  static inline uint16_t
>  ci_xmit_pkts(struct ci_tx_queue *txq,
>  	     struct rte_mbuf **tx_pkts,
> @@ -313,8 +322,7 @@ ci_xmit_pkts(struct ci_tx_queue *txq,
>  				txe->mbuf = NULL;
>  			}
> 
> -			ctx_txd[0] = cd_qw0;
> -			ctx_txd[1] = cd_qw1;
> +			write_txd(ctx_txd, cd_qw0, cd_qw1);
> 
>  			txe->last_id = tx_last;
>  			tx_id = txe->next_id;
> @@ -361,12 +369,12 @@ ci_xmit_pkts(struct ci_tx_queue *txq,
> 
>  			while ((ol_flags & (RTE_MBUF_F_TX_TCP_SEG | RTE_MBUF_F_TX_UDP_SEG)) &&
>  					unlikely(slen > CI_MAX_DATA_PER_TXD)) {
> -				txd->buffer_addr = rte_cpu_to_le_64(buf_dma_addr);
> -				txd->cmd_type_offset_bsz = rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DATA |
> +				const uint64_t cmd_type_offset_bsz = CI_TX_DESC_DTYPE_DATA |
>  					((uint64_t)td_cmd << CI_TXD_QW1_CMD_S) |
>  					((uint64_t)td_offset << CI_TXD_QW1_OFFSET_S) |
>  					((uint64_t)CI_MAX_DATA_PER_TXD << CI_TXD_QW1_TX_BUF_SZ_S) |
> -					((uint64_t)td_tag << CI_TXD_QW1_L2TAG1_S));
> +					((uint64_t)td_tag << CI_TXD_QW1_L2TAG1_S);
> +				write_txd(txd, buf_dma_addr, cmd_type_offset_bsz);
> 
>  				buf_dma_addr += CI_MAX_DATA_PER_TXD;
>  				slen -= CI_MAX_DATA_PER_TXD;
> @@ -382,12 +390,12 @@ ci_xmit_pkts(struct ci_tx_queue *txq,
>  			if (m_seg->next == NULL)
>  				td_cmd |= CI_TX_DESC_CMD_EOP;
> 
> -			txd->buffer_addr = rte_cpu_to_le_64(buf_dma_addr);
> -			txd->cmd_type_offset_bsz = rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DATA |
> +			const uint64_t cmd_type_offset_bsz = CI_TX_DESC_DTYPE_DATA |
>  				((uint64_t)td_cmd << CI_TXD_QW1_CMD_S) |
>  				((uint64_t)td_offset << CI_TXD_QW1_OFFSET_S) |
>  				((uint64_t)slen << CI_TXD_QW1_TX_BUF_SZ_S) |
> -				((uint64_t)td_tag << CI_TXD_QW1_L2TAG1_S));
> +				((uint64_t)td_tag << CI_TXD_QW1_L2TAG1_S);
> +			write_txd(txd, buf_dma_addr, cmd_type_offset_bsz);
> 
>  			txe->last_id = tx_last;
>  			tx_id = txe->next_id;
> --
> 2.51.0


^ permalink raw reply	[flat|nested] 274+ messages in thread

* RE: [RFC PATCH 27/27] net/intel: remove index for tracking end of packet
  2025-12-19 17:25 ` [RFC PATCH 27/27] net/intel: remove index for tracking end of packet Bruce Richardson
@ 2025-12-20  9:05   ` Morten Brørup
  0 siblings, 0 replies; 274+ messages in thread
From: Morten Brørup @ 2025-12-20  9:05 UTC (permalink / raw)
  To: Bruce Richardson, dev
  Cc: Vladimir Medvedkin, Anatoly Burakov, Jingjing Wu, Praveen Shetty

> From: Bruce Richardson [mailto:bruce.richardson@intel.com]
> Sent: Friday, 19 December 2025 18.26
> 
> The last_id value in each tx_sw_queue entry was no longer used in the
> datapath, remove it and its initialization. For the function releasing
> packets back, rather than relying on "last_id" to identify end of
> packet, instead check for the next pointer being NULL.
> 
> Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> ---

[...]

> @@ -2523,14 +2523,13 @@ i40e_tx_done_cleanup_full(struct ci_tx_queue *txq,
>  			pkt_cnt < free_cnt &&
>  			tx_id != tx_last; i++) {
>  			if (swr_ring[tx_id].mbuf != NULL) {
> -				rte_pktmbuf_free_seg(swr_ring[tx_id].mbuf);
> -				swr_ring[tx_id].mbuf = NULL;
> -
>  				/*
>  				 * last segment in the packet,
>  				 * increment packet count
>  				 */
> -				pkt_cnt += (swr_ring[tx_id].last_id == tx_id);
> +				pkt_cnt += (swr_ring[tx_id].mbuf->next == NULL) ? 1 : 0;

Note to reviewers:
Dereferencing the mbuf (instead of checking last_id) does not add a potential cache miss, because rte_pktmbuf_free_seg() dereferences it anyway.

> +				rte_pktmbuf_free_seg(swr_ring[tx_id].mbuf);
> +				swr_ring[tx_id].mbuf = NULL;
>  			}
> 
>  			tx_id = swr_ring[tx_id].next_id;


^ permalink raw reply	[flat|nested] 274+ messages in thread

* Re: [RFC PATCH 20/27] net/intel: write descriptors using non-volatile pointers
  2025-12-20  8:43   ` Morten Brørup
@ 2025-12-22  9:50     ` Bruce Richardson
  0 siblings, 0 replies; 274+ messages in thread
From: Bruce Richardson @ 2025-12-22  9:50 UTC (permalink / raw)
  To: Morten Brørup; +Cc: dev

On Sat, Dec 20, 2025 at 09:43:46AM +0100, Morten Brørup wrote:
> > From: Bruce Richardson [mailto:bruce.richardson@intel.com]
> > Sent: Friday, 19 December 2025 18.26
> > 
> > Use a non-volatile uint64_t pointer to store to the descriptor ring.
> > This will allow the compiler to optionally merge the stores as it sees
> > best.
> 
> I suppose there was a reason for the volatile.
> Is removing it really safe?
> E.g. this will also allow the compiler to reorder stores; not just the pair of 64-bits, but also stores to multiple descriptors.
> 

It would be more risky to remove for reads than for writes, I believe,
since when reading we have the possibility of the NIC doing stores to the
descriptors at the same time. In the case of writing new descriptors we
know that the NIC will never read the descriptors until such time as we hit
the doorball/write the tail value. Therefore, so long as we have a fence
before the tail write, (or as part of the tail write), it doesn't matter
what actual order the descriptor stores hit the ring.

> One more comment inline below.
> 
> > 
> > Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> > ---
> >  drivers/net/intel/common/tx_scalar_fns.h | 24 ++++++++++++++++--------
> >  1 file changed, 16 insertions(+), 8 deletions(-)
> > 
> > diff --git a/drivers/net/intel/common/tx_scalar_fns.h b/drivers/net/intel/common/tx_scalar_fns.h
> > index 7b643fcf44..95e9acbe60 100644
> > --- a/drivers/net/intel/common/tx_scalar_fns.h
> > +++ b/drivers/net/intel/common/tx_scalar_fns.h
> > @@ -184,6 +184,15 @@ struct ci_timesstamp_queue_fns {
> >  	write_ts_tail_t write_ts_tail;
> >  };
> > 
> > +static inline void
> > +write_txd(volatile void *txd, uint64_t qw0, uint64_t qw1)
> > +{
> > +	uint64_t *txd_qw = RTE_CAST_PTR(void *, txd);
> 
> If the descriptors are 16-byte aligned, you could mark them as such, so the compiler can use 128-bit stores on architectures where alignment matters.
> 
Sure, I can try adding an aligned tag to this.

/Bruce

^ permalink raw reply	[flat|nested] 274+ messages in thread

* [PATCH v2 00/36] combine multiple Intel scalar Tx paths
  2025-12-19 17:25 [RFC PATCH 00/27] combine multiple Intel scalar Tx paths Bruce Richardson
                   ` (26 preceding siblings ...)
  2025-12-19 17:25 ` [RFC PATCH 27/27] net/intel: remove index for tracking end of packet Bruce Richardson
@ 2026-01-13 15:14 ` Bruce Richardson
  2026-01-13 15:14   ` [PATCH v2 01/36] net/intel: create common Tx descriptor structure Bruce Richardson
                     ` (37 more replies)
  2026-01-30 11:41 ` [PATCH v3 " Bruce Richardson
                   ` (2 subsequent siblings)
  30 siblings, 38 replies; 274+ messages in thread
From: Bruce Richardson @ 2026-01-13 15:14 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson

The scalar Tx paths, with support for offloads and multiple mbufs
per packet, are almost identical across drivers ice, i40e, iavf and
the single-queue mode of idpf. Therefore, we can do some rework to
combine these code paths into a single function, parameterized by
compile-time constants. This saves code and gives us a single path to
optimize and maintain - apart from edge cases such as IPsec support in
iavf.

The ixgbe driver has a number of similarities too, which we take
advantage of where we can, but the overall descriptor format is
sufficiently different that its main scalar code path is kept
separate.

Once merged, we can then optimize the drivers to improve performance,
and also easily extend some drivers with additional paths, e.g. adding
the "simple scalar" path to the idpf driver for better performance on
platforms without AVX.

V2:
 - reworked the simple-scalar path as well as full scalar one
 - added simple scalar path support to idpf driver
 - small cleanups, e.g. issues flagged by checkpatch

Bruce Richardson (36):
  net/intel: create common Tx descriptor structure
  net/intel: use common Tx ring structure
  net/intel: create common post-Tx cleanup function
  net/intel: consolidate definitions for Tx desc fields
  net/intel: create separate header for Tx scalar fns
  net/intel: add common fn to calculate needed descriptors
  net/ice: refactor context descriptor handling
  net/i40e: refactor context descriptor handling
  net/idpf: refactor context descriptor handling
  net/intel: consolidate checksum mask definition
  net/intel: create common checksum Tx offload function
  net/intel: create a common scalar Tx function
  net/i40e: use common scalar Tx function
  net/intel: add IPsec hooks to common Tx function
  net/intel: support configurable VLAN tag insertion on Tx
  net/iavf: use common scalar Tx function
  net/i40e: document requirement for QinQ support
  net/idpf: use common scalar Tx function
  net/intel: avoid writing the final pkt descriptor twice
  eal: add macro for marking assumed alignment
  net/intel: write descriptors using non-volatile pointers
  net/intel: remove unnecessary flag clearing
  net/intel: mark mid-burst ring cleanup as unlikely
  net/intel: add special handling for single desc packets
  net/intel: use separate array for desc status tracking
  net/ixgbe: use separate array for desc status tracking
  net/intel: drop unused Tx queue used count
  net/intel: remove index for tracking end of packet
  net/intel: merge ring writes in simple Tx for ice and i40e
  net/intel: consolidate ice and i40e buffer free function
  net/intel: complete merging simple Tx paths
  net/intel: use non-volatile stores in simple Tx function
  net/intel: align scalar simple Tx path with vector logic
  net/intel: use vector SW ring entry for simple path
  net/intel: use vector mbuf cleanup from simple scalar path
  net/idpf: enable simple Tx function

 doc/guides/nics/i40e.rst                      |  18 +
 drivers/net/intel/common/tx.h                 | 116 ++-
 drivers/net/intel/common/tx_scalar_fns.h      | 595 ++++++++++++++
 drivers/net/intel/cpfl/cpfl_rxtx.c            |   8 +-
 drivers/net/intel/i40e/i40e_fdir.c            |  34 +-
 drivers/net/intel/i40e/i40e_rxtx.c            | 670 +++-------------
 drivers/net/intel/i40e/i40e_rxtx.h            |  16 -
 .../net/intel/i40e/i40e_rxtx_vec_altivec.c    |  25 +-
 drivers/net/intel/i40e/i40e_rxtx_vec_avx2.c   |  36 +-
 drivers/net/intel/i40e/i40e_rxtx_vec_avx512.c |  52 +-
 drivers/net/intel/i40e/i40e_rxtx_vec_common.h |   6 +-
 drivers/net/intel/i40e/i40e_rxtx_vec_neon.c   |  25 +-
 drivers/net/intel/iavf/iavf_rxtx.c            | 642 ++++-----------
 drivers/net/intel/iavf/iavf_rxtx.h            |  30 +-
 drivers/net/intel/iavf/iavf_rxtx_vec_avx2.c   |  55 +-
 drivers/net/intel/iavf/iavf_rxtx_vec_avx512.c | 104 +--
 drivers/net/intel/iavf/iavf_rxtx_vec_common.h |  36 +-
 drivers/net/intel/ice/ice_dcf_ethdev.c        |  10 +-
 drivers/net/intel/ice/ice_rxtx.c              | 737 ++++--------------
 drivers/net/intel/ice/ice_rxtx.h              |  15 -
 drivers/net/intel/ice/ice_rxtx_vec_avx2.c     |  55 +-
 drivers/net/intel/ice/ice_rxtx_vec_avx512.c   |  53 +-
 drivers/net/intel/ice/ice_rxtx_vec_common.h   |  43 +-
 drivers/net/intel/idpf/idpf_common_device.h   |   2 +
 drivers/net/intel/idpf/idpf_common_rxtx.c     | 315 ++------
 drivers/net/intel/idpf/idpf_common_rxtx.h     |  24 +-
 .../net/intel/idpf/idpf_common_rxtx_avx2.c    |  53 +-
 .../net/intel/idpf/idpf_common_rxtx_avx512.c  |  55 +-
 drivers/net/intel/idpf/idpf_rxtx.c            |  43 +-
 drivers/net/intel/idpf/idpf_rxtx_vec_common.h |   6 +-
 drivers/net/intel/ixgbe/ixgbe_rxtx.c          | 103 ++-
 .../net/intel/ixgbe/ixgbe_rxtx_vec_common.c   |   3 +-
 lib/eal/include/rte_common.h                  |   6 +
 33 files changed, 1565 insertions(+), 2426 deletions(-)
 create mode 100644 drivers/net/intel/common/tx_scalar_fns.h

--
2.51.0


^ permalink raw reply	[flat|nested] 274+ messages in thread

* [PATCH v2 01/36] net/intel: create common Tx descriptor structure
  2026-01-13 15:14 ` [PATCH v2 00/36] combine multiple Intel scalar Tx paths Bruce Richardson
@ 2026-01-13 15:14   ` Bruce Richardson
  2026-01-13 15:14   ` [PATCH v2 02/36] net/intel: use common Tx ring structure Bruce Richardson
                     ` (36 subsequent siblings)
  37 siblings, 0 replies; 274+ messages in thread
From: Bruce Richardson @ 2026-01-13 15:14 UTC (permalink / raw)
  To: dev
  Cc: Bruce Richardson, Praveen Shetty, Vladimir Medvedkin,
	Anatoly Burakov, Jingjing Wu

The Tx descriptors used by the i40e, iavf, ice and idpf drivers are all
identical 16-byte descriptors, so define a common struct for them. Since
original struct definitions are in base code, leave them in place, but
only use the new structs in DPDK code.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/net/intel/common/tx.h                 | 16 ++++++---
 drivers/net/intel/cpfl/cpfl_rxtx.c            |  2 +-
 drivers/net/intel/i40e/i40e_fdir.c            |  4 +--
 drivers/net/intel/i40e/i40e_rxtx.c            | 26 +++++++-------
 .../net/intel/i40e/i40e_rxtx_vec_altivec.c    |  6 ++--
 drivers/net/intel/i40e/i40e_rxtx_vec_avx2.c   |  6 ++--
 drivers/net/intel/i40e/i40e_rxtx_vec_avx512.c |  6 ++--
 drivers/net/intel/i40e/i40e_rxtx_vec_neon.c   |  6 ++--
 drivers/net/intel/iavf/iavf_rxtx.c            | 16 ++++-----
 drivers/net/intel/iavf/iavf_rxtx.h            |  2 +-
 drivers/net/intel/iavf/iavf_rxtx_vec_avx2.c   |  6 ++--
 drivers/net/intel/iavf/iavf_rxtx_vec_avx512.c | 12 +++----
 drivers/net/intel/ice/ice_dcf_ethdev.c        |  2 +-
 drivers/net/intel/ice/ice_rxtx.c              | 36 +++++++++----------
 drivers/net/intel/ice/ice_rxtx_vec_avx2.c     |  6 ++--
 drivers/net/intel/ice/ice_rxtx_vec_avx512.c   |  6 ++--
 drivers/net/intel/idpf/idpf_common_rxtx.c     | 20 +++++------
 drivers/net/intel/idpf/idpf_common_rxtx.h     |  2 +-
 .../net/intel/idpf/idpf_common_rxtx_avx2.c    |  8 ++---
 .../net/intel/idpf/idpf_common_rxtx_avx512.c  |  8 ++---
 drivers/net/intel/idpf/idpf_rxtx.c            |  2 +-
 drivers/net/intel/idpf/idpf_rxtx_vec_common.h |  2 +-
 22 files changed, 104 insertions(+), 96 deletions(-)

diff --git a/drivers/net/intel/common/tx.h b/drivers/net/intel/common/tx.h
index b259d98904..722f87a70c 100644
--- a/drivers/net/intel/common/tx.h
+++ b/drivers/net/intel/common/tx.h
@@ -10,6 +10,14 @@
 #include <rte_ethdev.h>
 #include <rte_vect.h>
 
+/*
+ * Structure of a 16-byte Tx descriptor common across i40e, ice, iavf and idpf drivers
+ */
+struct ci_tx_desc {
+	uint64_t buffer_addr; /* Address of descriptor's data buf */
+	uint64_t cmd_type_offset_bsz;
+};
+
 /* forward declaration of the common intel (ci) queue structure */
 struct ci_tx_queue;
 
@@ -33,10 +41,10 @@ typedef void (*ice_tx_release_mbufs_t)(struct ci_tx_queue *txq);
 
 struct ci_tx_queue {
 	union { /* TX ring virtual address */
-		volatile struct i40e_tx_desc *i40e_tx_ring;
-		volatile struct iavf_tx_desc *iavf_tx_ring;
-		volatile struct ice_tx_desc *ice_tx_ring;
-		volatile struct idpf_base_tx_desc *idpf_tx_ring;
+		volatile struct ci_tx_desc *i40e_tx_ring;
+		volatile struct ci_tx_desc *iavf_tx_ring;
+		volatile struct ci_tx_desc *ice_tx_ring;
+		volatile struct ci_tx_desc *idpf_tx_ring;
 		volatile union ixgbe_adv_tx_desc *ixgbe_tx_ring;
 	};
 	volatile uint8_t *qtx_tail;               /* register address of tail */
diff --git a/drivers/net/intel/cpfl/cpfl_rxtx.c b/drivers/net/intel/cpfl/cpfl_rxtx.c
index 2e4cf3b875..57c6f6e736 100644
--- a/drivers/net/intel/cpfl/cpfl_rxtx.c
+++ b/drivers/net/intel/cpfl/cpfl_rxtx.c
@@ -131,7 +131,7 @@ cpfl_dma_zone_reserve(struct rte_eth_dev *dev, uint16_t queue_idx,
 			ring_size = RTE_ALIGN(len * sizeof(struct idpf_flex_tx_sched_desc),
 					      CPFL_DMA_MEM_ALIGN);
 		else
-			ring_size = RTE_ALIGN(len * sizeof(struct idpf_base_tx_desc),
+			ring_size = RTE_ALIGN(len * sizeof(struct ci_tx_desc),
 					      CPFL_DMA_MEM_ALIGN);
 		memcpy(ring_name, "cpfl Tx ring", sizeof("cpfl Tx ring"));
 		break;
diff --git a/drivers/net/intel/i40e/i40e_fdir.c b/drivers/net/intel/i40e/i40e_fdir.c
index 55d18c5d4a..605df73c9e 100644
--- a/drivers/net/intel/i40e/i40e_fdir.c
+++ b/drivers/net/intel/i40e/i40e_fdir.c
@@ -1377,7 +1377,7 @@ i40e_find_available_buffer(struct rte_eth_dev *dev)
 	 */
 	if (fdir_info->txq_available_buf_count <= 0) {
 		uint16_t tmp_tail;
-		volatile struct i40e_tx_desc *tmp_txdp;
+		volatile struct ci_tx_desc *tmp_txdp;
 
 		tmp_tail = txq->tx_tail;
 		tmp_txdp = &txq->i40e_tx_ring[tmp_tail + 1];
@@ -1628,7 +1628,7 @@ i40e_flow_fdir_filter_programming(struct i40e_pf *pf,
 	struct ci_tx_queue *txq = pf->fdir.txq;
 	struct ci_rx_queue *rxq = pf->fdir.rxq;
 	const struct i40e_fdir_action *fdir_action = &filter->action;
-	volatile struct i40e_tx_desc *txdp;
+	volatile struct ci_tx_desc *txdp;
 	volatile struct i40e_filter_program_desc *fdirdp;
 	uint32_t td_cmd;
 	uint16_t vsi_id;
diff --git a/drivers/net/intel/i40e/i40e_rxtx.c b/drivers/net/intel/i40e/i40e_rxtx.c
index daf86c975e..8600ef3e3a 100644
--- a/drivers/net/intel/i40e/i40e_rxtx.c
+++ b/drivers/net/intel/i40e/i40e_rxtx.c
@@ -388,7 +388,7 @@ static inline int
 i40e_xmit_cleanup(struct ci_tx_queue *txq)
 {
 	struct ci_tx_entry *sw_ring = txq->sw_ring;
-	volatile struct i40e_tx_desc *txd = txq->i40e_tx_ring;
+	volatile struct ci_tx_desc *txd = txq->i40e_tx_ring;
 	uint16_t last_desc_cleaned = txq->last_desc_cleaned;
 	uint16_t nb_tx_desc = txq->nb_tx_desc;
 	uint16_t desc_to_clean_to;
@@ -1092,8 +1092,8 @@ i40e_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 	struct ci_tx_queue *txq;
 	struct ci_tx_entry *sw_ring;
 	struct ci_tx_entry *txe, *txn;
-	volatile struct i40e_tx_desc *txd;
-	volatile struct i40e_tx_desc *txr;
+	volatile struct ci_tx_desc *txd;
+	volatile struct ci_tx_desc *txr;
 	struct rte_mbuf *tx_pkt;
 	struct rte_mbuf *m_seg;
 	uint32_t cd_tunneling_params;
@@ -1393,7 +1393,7 @@ i40e_tx_free_bufs(struct ci_tx_queue *txq)
 
 /* Populate 4 descriptors with data from 4 mbufs */
 static inline void
-tx4(volatile struct i40e_tx_desc *txdp, struct rte_mbuf **pkts)
+tx4(volatile struct ci_tx_desc *txdp, struct rte_mbuf **pkts)
 {
 	uint64_t dma_addr;
 	uint32_t i;
@@ -1409,7 +1409,7 @@ tx4(volatile struct i40e_tx_desc *txdp, struct rte_mbuf **pkts)
 
 /* Populate 1 descriptor with data from 1 mbuf */
 static inline void
-tx1(volatile struct i40e_tx_desc *txdp, struct rte_mbuf **pkts)
+tx1(volatile struct ci_tx_desc *txdp, struct rte_mbuf **pkts)
 {
 	uint64_t dma_addr;
 
@@ -1426,7 +1426,7 @@ i40e_tx_fill_hw_ring(struct ci_tx_queue *txq,
 		     struct rte_mbuf **pkts,
 		     uint16_t nb_pkts)
 {
-	volatile struct i40e_tx_desc *txdp = &txq->i40e_tx_ring[txq->tx_tail];
+	volatile struct ci_tx_desc *txdp = &txq->i40e_tx_ring[txq->tx_tail];
 	struct ci_tx_entry *txep = &txq->sw_ring[txq->tx_tail];
 	const int N_PER_LOOP = 4;
 	const int N_PER_LOOP_MASK = N_PER_LOOP - 1;
@@ -1454,7 +1454,7 @@ tx_xmit_pkts(struct ci_tx_queue *txq,
 	     struct rte_mbuf **tx_pkts,
 	     uint16_t nb_pkts)
 {
-	volatile struct i40e_tx_desc *txr = txq->i40e_tx_ring;
+	volatile struct ci_tx_desc *txr = txq->i40e_tx_ring;
 	uint16_t n = 0;
 
 	/**
@@ -2601,7 +2601,7 @@ i40e_dev_tx_queue_setup(struct rte_eth_dev *dev,
 	}
 
 	/* Allocate TX hardware ring descriptors. */
-	ring_size = sizeof(struct i40e_tx_desc) * I40E_MAX_RING_DESC;
+	ring_size = sizeof(struct ci_tx_desc) * I40E_MAX_RING_DESC;
 	ring_size = RTE_ALIGN(ring_size, I40E_DMA_MEM_ALIGN);
 	tz = rte_eth_dma_zone_reserve(dev, "i40e_tx_ring", queue_idx,
 			      ring_size, I40E_RING_BASE_ALIGN, socket_id);
@@ -2623,7 +2623,7 @@ i40e_dev_tx_queue_setup(struct rte_eth_dev *dev,
 	txq->tx_deferred_start = tx_conf->tx_deferred_start;
 
 	txq->tx_ring_dma = tz->iova;
-	txq->i40e_tx_ring = (struct i40e_tx_desc *)tz->addr;
+	txq->i40e_tx_ring = (struct ci_tx_desc *)tz->addr;
 
 	/* Allocate software ring */
 	txq->sw_ring =
@@ -2896,13 +2896,13 @@ i40e_reset_tx_queue(struct ci_tx_queue *txq)
 	}
 
 	txe = txq->sw_ring;
-	size = sizeof(struct i40e_tx_desc) * txq->nb_tx_desc;
+	size = sizeof(struct ci_tx_desc) * txq->nb_tx_desc;
 	for (i = 0; i < size; i++)
 		((volatile char *)txq->i40e_tx_ring)[i] = 0;
 
 	prev = (uint16_t)(txq->nb_tx_desc - 1);
 	for (i = 0; i < txq->nb_tx_desc; i++) {
-		volatile struct i40e_tx_desc *txd = &txq->i40e_tx_ring[i];
+		volatile struct ci_tx_desc *txd = &txq->i40e_tx_ring[i];
 
 		txd->cmd_type_offset_bsz =
 			rte_cpu_to_le_64(I40E_TX_DESC_DTYPE_DESC_DONE);
@@ -3204,7 +3204,7 @@ i40e_fdir_setup_tx_resources(struct i40e_pf *pf)
 	}
 
 	/* Allocate TX hardware ring descriptors. */
-	ring_size = sizeof(struct i40e_tx_desc) * I40E_FDIR_NUM_TX_DESC;
+	ring_size = sizeof(struct ci_tx_desc) * I40E_FDIR_NUM_TX_DESC;
 	ring_size = RTE_ALIGN(ring_size, I40E_DMA_MEM_ALIGN);
 
 	tz = rte_eth_dma_zone_reserve(dev, "fdir_tx_ring",
@@ -3223,7 +3223,7 @@ i40e_fdir_setup_tx_resources(struct i40e_pf *pf)
 	txq->i40e_vsi = pf->fdir.fdir_vsi;
 
 	txq->tx_ring_dma = tz->iova;
-	txq->i40e_tx_ring = (struct i40e_tx_desc *)tz->addr;
+	txq->i40e_tx_ring = (struct ci_tx_desc *)tz->addr;
 
 	/*
 	 * don't need to allocate software ring and reset for the fdir
diff --git a/drivers/net/intel/i40e/i40e_rxtx_vec_altivec.c b/drivers/net/intel/i40e/i40e_rxtx_vec_altivec.c
index bbb6d907cf..ef5b252898 100644
--- a/drivers/net/intel/i40e/i40e_rxtx_vec_altivec.c
+++ b/drivers/net/intel/i40e/i40e_rxtx_vec_altivec.c
@@ -446,7 +446,7 @@ i40e_recv_scattered_pkts_vec(void *rx_queue, struct rte_mbuf **rx_pkts,
 }
 
 static inline void
-vtx1(volatile struct i40e_tx_desc *txdp,
+vtx1(volatile struct ci_tx_desc *txdp,
 	struct rte_mbuf *pkt, uint64_t flags)
 {
 	uint64_t high_qw = (I40E_TX_DESC_DTYPE_DATA |
@@ -459,7 +459,7 @@ vtx1(volatile struct i40e_tx_desc *txdp,
 }
 
 static inline void
-vtx(volatile struct i40e_tx_desc *txdp,
+vtx(volatile struct ci_tx_desc *txdp,
 	struct rte_mbuf **pkt, uint16_t nb_pkts,  uint64_t flags)
 {
 	int i;
@@ -473,7 +473,7 @@ i40e_xmit_fixed_burst_vec(void *tx_queue, struct rte_mbuf **tx_pkts,
 			  uint16_t nb_pkts)
 {
 	struct ci_tx_queue *txq = (struct ci_tx_queue *)tx_queue;
-	volatile struct i40e_tx_desc *txdp;
+	volatile struct ci_tx_desc *txdp;
 	struct ci_tx_entry_vec *txep;
 	uint16_t n, nb_commit, tx_id;
 	uint64_t flags = I40E_TD_CMD;
diff --git a/drivers/net/intel/i40e/i40e_rxtx_vec_avx2.c b/drivers/net/intel/i40e/i40e_rxtx_vec_avx2.c
index 4e398b3140..137c1f9765 100644
--- a/drivers/net/intel/i40e/i40e_rxtx_vec_avx2.c
+++ b/drivers/net/intel/i40e/i40e_rxtx_vec_avx2.c
@@ -681,7 +681,7 @@ i40e_recv_scattered_pkts_vec_avx2(void *rx_queue, struct rte_mbuf **rx_pkts,
 
 
 static inline void
-vtx1(volatile struct i40e_tx_desc *txdp,
+vtx1(volatile struct ci_tx_desc *txdp,
 		struct rte_mbuf *pkt, uint64_t flags)
 {
 	uint64_t high_qw = (I40E_TX_DESC_DTYPE_DATA |
@@ -694,7 +694,7 @@ vtx1(volatile struct i40e_tx_desc *txdp,
 }
 
 static inline void
-vtx(volatile struct i40e_tx_desc *txdp,
+vtx(volatile struct ci_tx_desc *txdp,
 		struct rte_mbuf **pkt, uint16_t nb_pkts,  uint64_t flags)
 {
 	const uint64_t hi_qw_tmpl = (I40E_TX_DESC_DTYPE_DATA |
@@ -739,7 +739,7 @@ i40e_xmit_fixed_burst_vec_avx2(void *tx_queue, struct rte_mbuf **tx_pkts,
 			  uint16_t nb_pkts)
 {
 	struct ci_tx_queue *txq = (struct ci_tx_queue *)tx_queue;
-	volatile struct i40e_tx_desc *txdp;
+	volatile struct ci_tx_desc *txdp;
 	struct ci_tx_entry_vec *txep;
 	uint16_t n, nb_commit, tx_id;
 	uint64_t flags = I40E_TD_CMD;
diff --git a/drivers/net/intel/i40e/i40e_rxtx_vec_avx512.c b/drivers/net/intel/i40e/i40e_rxtx_vec_avx512.c
index 571987d27a..6971488750 100644
--- a/drivers/net/intel/i40e/i40e_rxtx_vec_avx512.c
+++ b/drivers/net/intel/i40e/i40e_rxtx_vec_avx512.c
@@ -750,7 +750,7 @@ i40e_recv_scattered_pkts_vec_avx512(void *rx_queue,
 }
 
 static inline void
-vtx1(volatile struct i40e_tx_desc *txdp, struct rte_mbuf *pkt, uint64_t flags)
+vtx1(volatile struct ci_tx_desc *txdp, struct rte_mbuf *pkt, uint64_t flags)
 {
 	uint64_t high_qw = (I40E_TX_DESC_DTYPE_DATA |
 		((uint64_t)flags  << I40E_TXD_QW1_CMD_SHIFT) |
@@ -762,7 +762,7 @@ vtx1(volatile struct i40e_tx_desc *txdp, struct rte_mbuf *pkt, uint64_t flags)
 }
 
 static inline void
-vtx(volatile struct i40e_tx_desc *txdp,
+vtx(volatile struct ci_tx_desc *txdp,
 	struct rte_mbuf **pkt, uint16_t nb_pkts,  uint64_t flags)
 {
 	const uint64_t hi_qw_tmpl = (I40E_TX_DESC_DTYPE_DATA |
@@ -807,7 +807,7 @@ i40e_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pkts,
 				 uint16_t nb_pkts)
 {
 	struct ci_tx_queue *txq = (struct ci_tx_queue *)tx_queue;
-	volatile struct i40e_tx_desc *txdp;
+	volatile struct ci_tx_desc *txdp;
 	struct ci_tx_entry_vec *txep;
 	uint16_t n, nb_commit, tx_id;
 	uint64_t flags = I40E_TD_CMD;
diff --git a/drivers/net/intel/i40e/i40e_rxtx_vec_neon.c b/drivers/net/intel/i40e/i40e_rxtx_vec_neon.c
index b5be0c1b59..6404b70c56 100644
--- a/drivers/net/intel/i40e/i40e_rxtx_vec_neon.c
+++ b/drivers/net/intel/i40e/i40e_rxtx_vec_neon.c
@@ -597,7 +597,7 @@ i40e_recv_scattered_pkts_vec(void *rx_queue, struct rte_mbuf **rx_pkts,
 }
 
 static inline void
-vtx1(volatile struct i40e_tx_desc *txdp,
+vtx1(volatile struct ci_tx_desc *txdp,
 		struct rte_mbuf *pkt, uint64_t flags)
 {
 	uint64_t high_qw = (I40E_TX_DESC_DTYPE_DATA |
@@ -609,7 +609,7 @@ vtx1(volatile struct i40e_tx_desc *txdp,
 }
 
 static inline void
-vtx(volatile struct i40e_tx_desc *txdp, struct rte_mbuf **pkt,
+vtx(volatile struct ci_tx_desc *txdp, struct rte_mbuf **pkt,
 		uint16_t nb_pkts,  uint64_t flags)
 {
 	int i;
@@ -623,7 +623,7 @@ i40e_xmit_fixed_burst_vec(void *__rte_restrict tx_queue,
 	struct rte_mbuf **__rte_restrict tx_pkts, uint16_t nb_pkts)
 {
 	struct ci_tx_queue *txq = (struct ci_tx_queue *)tx_queue;
-	volatile struct i40e_tx_desc *txdp;
+	volatile struct ci_tx_desc *txdp;
 	struct ci_tx_entry_vec *txep;
 	uint16_t n, nb_commit, tx_id;
 	uint64_t flags = I40E_TD_CMD;
diff --git a/drivers/net/intel/iavf/iavf_rxtx.c b/drivers/net/intel/iavf/iavf_rxtx.c
index 7f3e7fa2e8..2057bf6d8d 100644
--- a/drivers/net/intel/iavf/iavf_rxtx.c
+++ b/drivers/net/intel/iavf/iavf_rxtx.c
@@ -281,7 +281,7 @@ reset_tx_queue(struct ci_tx_queue *txq)
 	}
 
 	txe = txq->sw_ring;
-	size = sizeof(struct iavf_tx_desc) * txq->nb_tx_desc;
+	size = sizeof(struct ci_tx_desc) * txq->nb_tx_desc;
 	for (i = 0; i < size; i++)
 		((volatile char *)txq->iavf_tx_ring)[i] = 0;
 
@@ -832,7 +832,7 @@ iavf_dev_tx_queue_setup(struct rte_eth_dev *dev,
 	}
 
 	/* Allocate TX hardware ring descriptors. */
-	ring_size = sizeof(struct iavf_tx_desc) * IAVF_MAX_RING_DESC;
+	ring_size = sizeof(struct ci_tx_desc) * IAVF_MAX_RING_DESC;
 	ring_size = RTE_ALIGN(ring_size, IAVF_DMA_MEM_ALIGN);
 	mz = rte_eth_dma_zone_reserve(dev, "iavf_tx_ring", queue_idx,
 				      ring_size, IAVF_RING_BASE_ALIGN,
@@ -844,7 +844,7 @@ iavf_dev_tx_queue_setup(struct rte_eth_dev *dev,
 		return -ENOMEM;
 	}
 	txq->tx_ring_dma = mz->iova;
-	txq->iavf_tx_ring = (struct iavf_tx_desc *)mz->addr;
+	txq->iavf_tx_ring = (struct ci_tx_desc *)mz->addr;
 
 	txq->mz = mz;
 	reset_tx_queue(txq);
@@ -2338,7 +2338,7 @@ iavf_xmit_cleanup(struct ci_tx_queue *txq)
 	uint16_t desc_to_clean_to;
 	uint16_t nb_tx_to_clean;
 
-	volatile struct iavf_tx_desc *txd = txq->iavf_tx_ring;
+	volatile struct ci_tx_desc *txd = txq->iavf_tx_ring;
 
 	desc_to_clean_to = (uint16_t)(last_desc_cleaned + txq->tx_rs_thresh);
 	if (desc_to_clean_to >= nb_tx_desc)
@@ -2728,7 +2728,7 @@ iavf_calc_pkt_desc(struct rte_mbuf *tx_pkt)
 }
 
 static inline void
-iavf_fill_data_desc(volatile struct iavf_tx_desc *desc,
+iavf_fill_data_desc(volatile struct ci_tx_desc *desc,
 	uint64_t desc_template,	uint16_t buffsz,
 	uint64_t buffer_addr)
 {
@@ -2761,7 +2761,7 @@ uint16_t
 iavf_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 {
 	struct ci_tx_queue *txq = tx_queue;
-	volatile struct iavf_tx_desc *txr = txq->iavf_tx_ring;
+	volatile struct ci_tx_desc *txr = txq->iavf_tx_ring;
 	struct ci_tx_entry *txe_ring = txq->sw_ring;
 	struct ci_tx_entry *txe, *txn;
 	struct rte_mbuf *mb, *mb_seg;
@@ -2779,7 +2779,7 @@ iavf_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 	txe = &txe_ring[desc_idx];
 
 	for (idx = 0; idx < nb_pkts; idx++) {
-		volatile struct iavf_tx_desc *ddesc;
+		volatile struct ci_tx_desc *ddesc;
 		struct iavf_ipsec_crypto_pkt_metadata *ipsec_md;
 
 		uint16_t nb_desc_ctx, nb_desc_ipsec;
@@ -2900,7 +2900,7 @@ iavf_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 		mb_seg = mb;
 
 		do {
-			ddesc = (volatile struct iavf_tx_desc *)
+			ddesc = (volatile struct ci_tx_desc *)
 					&txr[desc_idx];
 
 			txn = &txe_ring[txe->next_id];
diff --git a/drivers/net/intel/iavf/iavf_rxtx.h b/drivers/net/intel/iavf/iavf_rxtx.h
index e1f78dcde0..dd6d884fc1 100644
--- a/drivers/net/intel/iavf/iavf_rxtx.h
+++ b/drivers/net/intel/iavf/iavf_rxtx.h
@@ -678,7 +678,7 @@ void iavf_dump_tx_descriptor(const struct ci_tx_queue *txq,
 			    const volatile void *desc, uint16_t tx_id)
 {
 	const char *name;
-	const volatile struct iavf_tx_desc *tx_desc = desc;
+	const volatile struct ci_tx_desc *tx_desc = desc;
 	enum iavf_tx_desc_dtype_value type;
 
 
diff --git a/drivers/net/intel/iavf/iavf_rxtx_vec_avx2.c b/drivers/net/intel/iavf/iavf_rxtx_vec_avx2.c
index e29958e0bc..5b62d51cf7 100644
--- a/drivers/net/intel/iavf/iavf_rxtx_vec_avx2.c
+++ b/drivers/net/intel/iavf/iavf_rxtx_vec_avx2.c
@@ -1630,7 +1630,7 @@ iavf_recv_scattered_pkts_vec_avx2_flex_rxd_offload(void *rx_queue,
 
 
 static __rte_always_inline void
-iavf_vtx1(volatile struct iavf_tx_desc *txdp,
+iavf_vtx1(volatile struct ci_tx_desc *txdp,
 	  struct rte_mbuf *pkt, uint64_t flags, bool offload, uint8_t vlan_flag)
 {
 	uint64_t high_qw =
@@ -1646,7 +1646,7 @@ iavf_vtx1(volatile struct iavf_tx_desc *txdp,
 }
 
 static __rte_always_inline void
-iavf_vtx(volatile struct iavf_tx_desc *txdp,
+iavf_vtx(volatile struct ci_tx_desc *txdp,
 	 struct rte_mbuf **pkt, uint16_t nb_pkts,  uint64_t flags, bool offload, uint8_t vlan_flag)
 {
 	const uint64_t hi_qw_tmpl = (IAVF_TX_DESC_DTYPE_DATA |
@@ -1713,7 +1713,7 @@ iavf_xmit_fixed_burst_vec_avx2(void *tx_queue, struct rte_mbuf **tx_pkts,
 			       uint16_t nb_pkts, bool offload)
 {
 	struct ci_tx_queue *txq = (struct ci_tx_queue *)tx_queue;
-	volatile struct iavf_tx_desc *txdp;
+	volatile struct ci_tx_desc *txdp;
 	struct ci_tx_entry_vec *txep;
 	uint16_t n, nb_commit, tx_id;
 	/* bit2 is reserved and must be set to 1 according to Spec */
diff --git a/drivers/net/intel/iavf/iavf_rxtx_vec_avx512.c b/drivers/net/intel/iavf/iavf_rxtx_vec_avx512.c
index 7c0907b7cf..d79d96c7b7 100644
--- a/drivers/net/intel/iavf/iavf_rxtx_vec_avx512.c
+++ b/drivers/net/intel/iavf/iavf_rxtx_vec_avx512.c
@@ -1840,7 +1840,7 @@ tx_backlog_entry_avx512(struct ci_tx_entry_vec *txep,
 }
 
 static __rte_always_inline void
-iavf_vtx1(volatile struct iavf_tx_desc *txdp,
+iavf_vtx1(volatile struct ci_tx_desc *txdp,
 	  struct rte_mbuf *pkt, uint64_t flags,
 	  bool offload, uint8_t vlan_flag)
 {
@@ -1859,7 +1859,7 @@ iavf_vtx1(volatile struct iavf_tx_desc *txdp,
 #define IAVF_TX_LEN_MASK 0xAA
 #define IAVF_TX_OFF_MASK 0x55
 static __rte_always_inline void
-iavf_vtx(volatile struct iavf_tx_desc *txdp,
+iavf_vtx(volatile struct ci_tx_desc *txdp,
 		struct rte_mbuf **pkt, uint16_t nb_pkts,  uint64_t flags,
 		bool offload, uint8_t vlan_flag)
 {
@@ -2068,7 +2068,7 @@ iavf_fill_ctx_desc_tunnelling_field(volatile uint64_t *qw0,
 }
 
 static __rte_always_inline void
-ctx_vtx1(volatile struct iavf_tx_desc *txdp, struct rte_mbuf *pkt,
+ctx_vtx1(volatile struct ci_tx_desc *txdp, struct rte_mbuf *pkt,
 		uint64_t flags, bool offload, uint8_t vlan_flag)
 {
 	uint64_t high_ctx_qw = IAVF_TX_DESC_DTYPE_CONTEXT;
@@ -2106,7 +2106,7 @@ ctx_vtx1(volatile struct iavf_tx_desc *txdp, struct rte_mbuf *pkt,
 }
 
 static __rte_always_inline void
-ctx_vtx(volatile struct iavf_tx_desc *txdp,
+ctx_vtx(volatile struct ci_tx_desc *txdp,
 		struct rte_mbuf **pkt, uint16_t nb_pkts,  uint64_t flags,
 		bool offload, uint8_t vlan_flag)
 {
@@ -2203,7 +2203,7 @@ iavf_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pkts,
 				 uint16_t nb_pkts, bool offload)
 {
 	struct ci_tx_queue *txq = (struct ci_tx_queue *)tx_queue;
-	volatile struct iavf_tx_desc *txdp;
+	volatile struct ci_tx_desc *txdp;
 	struct ci_tx_entry_vec *txep;
 	uint16_t n, nb_commit, tx_id;
 	/* bit2 is reserved and must be set to 1 according to Spec */
@@ -2271,7 +2271,7 @@ iavf_xmit_fixed_burst_vec_avx512_ctx(void *tx_queue, struct rte_mbuf **tx_pkts,
 				 uint16_t nb_pkts, bool offload)
 {
 	struct ci_tx_queue *txq = (struct ci_tx_queue *)tx_queue;
-	volatile struct iavf_tx_desc *txdp;
+	volatile struct ci_tx_desc *txdp;
 	struct ci_tx_entry_vec *txep;
 	uint16_t n, nb_commit, nb_mbuf, tx_id;
 	/* bit2 is reserved and must be set to 1 according to Spec */
diff --git a/drivers/net/intel/ice/ice_dcf_ethdev.c b/drivers/net/intel/ice/ice_dcf_ethdev.c
index 81da5a4656..ab1d499cef 100644
--- a/drivers/net/intel/ice/ice_dcf_ethdev.c
+++ b/drivers/net/intel/ice/ice_dcf_ethdev.c
@@ -399,7 +399,7 @@ reset_tx_queue(struct ci_tx_queue *txq)
 	}
 
 	txe = txq->sw_ring;
-	size = sizeof(struct ice_tx_desc) * txq->nb_tx_desc;
+	size = sizeof(struct ci_tx_desc) * txq->nb_tx_desc;
 	for (i = 0; i < size; i++)
 		((volatile char *)txq->ice_tx_ring)[i] = 0;
 
diff --git a/drivers/net/intel/ice/ice_rxtx.c b/drivers/net/intel/ice/ice_rxtx.c
index d9163e7c2e..e431346b84 100644
--- a/drivers/net/intel/ice/ice_rxtx.c
+++ b/drivers/net/intel/ice/ice_rxtx.c
@@ -1115,13 +1115,13 @@ ice_reset_tx_queue(struct ci_tx_queue *txq)
 	}
 
 	txe = txq->sw_ring;
-	size = sizeof(struct ice_tx_desc) * txq->nb_tx_desc;
+	size = sizeof(struct ci_tx_desc) * txq->nb_tx_desc;
 	for (i = 0; i < size; i++)
 		((volatile char *)txq->ice_tx_ring)[i] = 0;
 
 	prev = (uint16_t)(txq->nb_tx_desc - 1);
 	for (i = 0; i < txq->nb_tx_desc; i++) {
-		volatile struct ice_tx_desc *txd = &txq->ice_tx_ring[i];
+		volatile struct ci_tx_desc *txd = &txq->ice_tx_ring[i];
 
 		txd->cmd_type_offset_bsz =
 			rte_cpu_to_le_64(ICE_TX_DESC_DTYPE_DESC_DONE);
@@ -1613,7 +1613,7 @@ ice_tx_queue_setup(struct rte_eth_dev *dev,
 	}
 
 	/* Allocate TX hardware ring descriptors. */
-	ring_size = sizeof(struct ice_tx_desc) * ICE_MAX_NUM_DESC_BY_MAC(hw);
+	ring_size = sizeof(struct ci_tx_desc) * ICE_MAX_NUM_DESC_BY_MAC(hw);
 	ring_size = RTE_ALIGN(ring_size, ICE_DMA_MEM_ALIGN);
 	tz = rte_eth_dma_zone_reserve(dev, "ice_tx_ring", queue_idx,
 				      ring_size, ICE_RING_BASE_ALIGN,
@@ -2607,7 +2607,7 @@ ice_fdir_setup_tx_resources(struct ice_pf *pf)
 	}
 
 	/* Allocate TX hardware ring descriptors. */
-	ring_size = sizeof(struct ice_tx_desc) * ICE_FDIR_NUM_TX_DESC;
+	ring_size = sizeof(struct ci_tx_desc) * ICE_FDIR_NUM_TX_DESC;
 	ring_size = RTE_ALIGN(ring_size, ICE_DMA_MEM_ALIGN);
 
 	tz = rte_eth_dma_zone_reserve(dev, "fdir_tx_ring",
@@ -2626,7 +2626,7 @@ ice_fdir_setup_tx_resources(struct ice_pf *pf)
 	txq->ice_vsi = pf->fdir.fdir_vsi;
 
 	txq->tx_ring_dma = tz->iova;
-	txq->ice_tx_ring = (struct ice_tx_desc *)tz->addr;
+	txq->ice_tx_ring = (struct ci_tx_desc *)tz->addr;
 	/*
 	 * don't need to allocate software ring and reset for the fdir
 	 * program queue just set the queue has been configured.
@@ -3015,7 +3015,7 @@ static inline int
 ice_xmit_cleanup(struct ci_tx_queue *txq)
 {
 	struct ci_tx_entry *sw_ring = txq->sw_ring;
-	volatile struct ice_tx_desc *txd = txq->ice_tx_ring;
+	volatile struct ci_tx_desc *txd = txq->ice_tx_ring;
 	uint16_t last_desc_cleaned = txq->last_desc_cleaned;
 	uint16_t nb_tx_desc = txq->nb_tx_desc;
 	uint16_t desc_to_clean_to;
@@ -3136,8 +3136,8 @@ uint16_t
 ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 {
 	struct ci_tx_queue *txq;
-	volatile struct ice_tx_desc *ice_tx_ring;
-	volatile struct ice_tx_desc *txd;
+	volatile struct ci_tx_desc *ice_tx_ring;
+	volatile struct ci_tx_desc *txd;
 	struct ci_tx_entry *sw_ring;
 	struct ci_tx_entry *txe, *txn;
 	struct rte_mbuf *tx_pkt;
@@ -3300,7 +3300,7 @@ ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 
 			while ((ol_flags & (RTE_MBUF_F_TX_TCP_SEG | RTE_MBUF_F_TX_UDP_SEG)) &&
 				unlikely(slen > ICE_MAX_DATA_PER_TXD)) {
-				txd->buf_addr = rte_cpu_to_le_64(buf_dma_addr);
+				txd->buffer_addr = rte_cpu_to_le_64(buf_dma_addr);
 				txd->cmd_type_offset_bsz =
 				rte_cpu_to_le_64(ICE_TX_DESC_DTYPE_DATA |
 				((uint64_t)td_cmd << ICE_TXD_QW1_CMD_S) |
@@ -3319,7 +3319,7 @@ ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 				txn = &sw_ring[txe->next_id];
 			}
 
-			txd->buf_addr = rte_cpu_to_le_64(buf_dma_addr);
+			txd->buffer_addr = rte_cpu_to_le_64(buf_dma_addr);
 			txd->cmd_type_offset_bsz =
 				rte_cpu_to_le_64(ICE_TX_DESC_DTYPE_DATA |
 				((uint64_t)td_cmd << ICE_TXD_QW1_CMD_S) |
@@ -3546,14 +3546,14 @@ ice_tx_done_cleanup(void *txq, uint32_t free_cnt)
 
 /* Populate 4 descriptors with data from 4 mbufs */
 static inline void
-tx4(volatile struct ice_tx_desc *txdp, struct rte_mbuf **pkts)
+tx4(volatile struct ci_tx_desc *txdp, struct rte_mbuf **pkts)
 {
 	uint64_t dma_addr;
 	uint32_t i;
 
 	for (i = 0; i < 4; i++, txdp++, pkts++) {
 		dma_addr = rte_mbuf_data_iova(*pkts);
-		txdp->buf_addr = rte_cpu_to_le_64(dma_addr);
+		txdp->buffer_addr = rte_cpu_to_le_64(dma_addr);
 		txdp->cmd_type_offset_bsz =
 			ice_build_ctob((uint32_t)ICE_TD_CMD, 0,
 				       (*pkts)->data_len, 0);
@@ -3562,12 +3562,12 @@ tx4(volatile struct ice_tx_desc *txdp, struct rte_mbuf **pkts)
 
 /* Populate 1 descriptor with data from 1 mbuf */
 static inline void
-tx1(volatile struct ice_tx_desc *txdp, struct rte_mbuf **pkts)
+tx1(volatile struct ci_tx_desc *txdp, struct rte_mbuf **pkts)
 {
 	uint64_t dma_addr;
 
 	dma_addr = rte_mbuf_data_iova(*pkts);
-	txdp->buf_addr = rte_cpu_to_le_64(dma_addr);
+	txdp->buffer_addr = rte_cpu_to_le_64(dma_addr);
 	txdp->cmd_type_offset_bsz =
 		ice_build_ctob((uint32_t)ICE_TD_CMD, 0,
 			       (*pkts)->data_len, 0);
@@ -3577,7 +3577,7 @@ static inline void
 ice_tx_fill_hw_ring(struct ci_tx_queue *txq, struct rte_mbuf **pkts,
 		    uint16_t nb_pkts)
 {
-	volatile struct ice_tx_desc *txdp = &txq->ice_tx_ring[txq->tx_tail];
+	volatile struct ci_tx_desc *txdp = &txq->ice_tx_ring[txq->tx_tail];
 	struct ci_tx_entry *txep = &txq->sw_ring[txq->tx_tail];
 	const int N_PER_LOOP = 4;
 	const int N_PER_LOOP_MASK = N_PER_LOOP - 1;
@@ -3610,7 +3610,7 @@ tx_xmit_pkts(struct ci_tx_queue *txq,
 	     struct rte_mbuf **tx_pkts,
 	     uint16_t nb_pkts)
 {
-	volatile struct ice_tx_desc *txr = txq->ice_tx_ring;
+	volatile struct ci_tx_desc *txr = txq->ice_tx_ring;
 	uint16_t n = 0;
 
 	/**
@@ -4865,7 +4865,7 @@ ice_fdir_programming(struct ice_pf *pf, struct ice_fltr_desc *fdir_desc)
 	struct ci_tx_queue *txq = pf->fdir.txq;
 	struct ci_rx_queue *rxq = pf->fdir.rxq;
 	volatile struct ice_fltr_desc *fdirdp;
-	volatile struct ice_tx_desc *txdp;
+	volatile struct ci_tx_desc *txdp;
 	uint32_t td_cmd;
 	uint16_t i;
 
@@ -4875,7 +4875,7 @@ ice_fdir_programming(struct ice_pf *pf, struct ice_fltr_desc *fdir_desc)
 	fdirdp->dtype_cmd_vsi_fdid = fdir_desc->dtype_cmd_vsi_fdid;
 
 	txdp = &txq->ice_tx_ring[txq->tx_tail + 1];
-	txdp->buf_addr = rte_cpu_to_le_64(pf->fdir.dma_addr);
+	txdp->buffer_addr = rte_cpu_to_le_64(pf->fdir.dma_addr);
 	td_cmd = ICE_TX_DESC_CMD_EOP |
 		ICE_TX_DESC_CMD_RS  |
 		ICE_TX_DESC_CMD_DUMMY;
diff --git a/drivers/net/intel/ice/ice_rxtx_vec_avx2.c b/drivers/net/intel/ice/ice_rxtx_vec_avx2.c
index 0ba1d557ca..bef7bb00ba 100644
--- a/drivers/net/intel/ice/ice_rxtx_vec_avx2.c
+++ b/drivers/net/intel/ice/ice_rxtx_vec_avx2.c
@@ -774,7 +774,7 @@ ice_recv_scattered_pkts_vec_avx2_offload(void *rx_queue,
 }
 
 static __rte_always_inline void
-ice_vtx1(volatile struct ice_tx_desc *txdp,
+ice_vtx1(volatile struct ci_tx_desc *txdp,
 	 struct rte_mbuf *pkt, uint64_t flags, bool offload)
 {
 	uint64_t high_qw =
@@ -789,7 +789,7 @@ ice_vtx1(volatile struct ice_tx_desc *txdp,
 }
 
 static __rte_always_inline void
-ice_vtx(volatile struct ice_tx_desc *txdp,
+ice_vtx(volatile struct ci_tx_desc *txdp,
 	struct rte_mbuf **pkt, uint16_t nb_pkts,  uint64_t flags, bool offload)
 {
 	const uint64_t hi_qw_tmpl = (ICE_TX_DESC_DTYPE_DATA |
@@ -852,7 +852,7 @@ ice_xmit_fixed_burst_vec_avx2(void *tx_queue, struct rte_mbuf **tx_pkts,
 			      uint16_t nb_pkts, bool offload)
 {
 	struct ci_tx_queue *txq = (struct ci_tx_queue *)tx_queue;
-	volatile struct ice_tx_desc *txdp;
+	volatile struct ci_tx_desc *txdp;
 	struct ci_tx_entry_vec *txep;
 	uint16_t n, nb_commit, tx_id;
 	uint64_t flags = ICE_TD_CMD;
diff --git a/drivers/net/intel/ice/ice_rxtx_vec_avx512.c b/drivers/net/intel/ice/ice_rxtx_vec_avx512.c
index 7c6fe82072..1f6bf5fc8e 100644
--- a/drivers/net/intel/ice/ice_rxtx_vec_avx512.c
+++ b/drivers/net/intel/ice/ice_rxtx_vec_avx512.c
@@ -847,7 +847,7 @@ ice_recv_scattered_pkts_vec_avx512_offload(void *rx_queue,
 }
 
 static __rte_always_inline void
-ice_vtx1(volatile struct ice_tx_desc *txdp,
+ice_vtx1(volatile struct ci_tx_desc *txdp,
 	 struct rte_mbuf *pkt, uint64_t flags, bool do_offload)
 {
 	uint64_t high_qw =
@@ -863,7 +863,7 @@ ice_vtx1(volatile struct ice_tx_desc *txdp,
 }
 
 static __rte_always_inline void
-ice_vtx(volatile struct ice_tx_desc *txdp, struct rte_mbuf **pkt,
+ice_vtx(volatile struct ci_tx_desc *txdp, struct rte_mbuf **pkt,
 	uint16_t nb_pkts,  uint64_t flags, bool do_offload)
 {
 	const uint64_t hi_qw_tmpl = (ICE_TX_DESC_DTYPE_DATA |
@@ -916,7 +916,7 @@ ice_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pkts,
 				uint16_t nb_pkts, bool do_offload)
 {
 	struct ci_tx_queue *txq = (struct ci_tx_queue *)tx_queue;
-	volatile struct ice_tx_desc *txdp;
+	volatile struct ci_tx_desc *txdp;
 	struct ci_tx_entry_vec *txep;
 	uint16_t n, nb_commit, tx_id;
 	uint64_t flags = ICE_TD_CMD;
diff --git a/drivers/net/intel/idpf/idpf_common_rxtx.c b/drivers/net/intel/idpf/idpf_common_rxtx.c
index 797ee515dd..be3c1ef216 100644
--- a/drivers/net/intel/idpf/idpf_common_rxtx.c
+++ b/drivers/net/intel/idpf/idpf_common_rxtx.c
@@ -264,13 +264,13 @@ idpf_qc_single_tx_queue_reset(struct ci_tx_queue *txq)
 	}
 
 	txe = txq->sw_ring;
-	size = sizeof(struct idpf_base_tx_desc) * txq->nb_tx_desc;
+	size = sizeof(struct ci_tx_desc) * txq->nb_tx_desc;
 	for (i = 0; i < size; i++)
 		((volatile char *)txq->idpf_tx_ring)[i] = 0;
 
 	prev = (uint16_t)(txq->nb_tx_desc - 1);
 	for (i = 0; i < txq->nb_tx_desc; i++) {
-		txq->idpf_tx_ring[i].qw1 =
+		txq->idpf_tx_ring[i].cmd_type_offset_bsz =
 			rte_cpu_to_le_64(IDPF_TX_DESC_DTYPE_DESC_DONE);
 		txe[i].mbuf =  NULL;
 		txe[i].last_id = i;
@@ -1335,14 +1335,14 @@ idpf_xmit_cleanup(struct ci_tx_queue *txq)
 	uint16_t desc_to_clean_to;
 	uint16_t nb_tx_to_clean;
 
-	volatile struct idpf_base_tx_desc *txd = txq->idpf_tx_ring;
+	volatile struct ci_tx_desc *txd = txq->idpf_tx_ring;
 
 	desc_to_clean_to = (uint16_t)(last_desc_cleaned + txq->tx_rs_thresh);
 	if (desc_to_clean_to >= nb_tx_desc)
 		desc_to_clean_to = (uint16_t)(desc_to_clean_to - nb_tx_desc);
 
 	desc_to_clean_to = sw_ring[desc_to_clean_to].last_id;
-	if ((txd[desc_to_clean_to].qw1 &
+	if ((txd[desc_to_clean_to].cmd_type_offset_bsz &
 	     rte_cpu_to_le_64(IDPF_TXD_QW1_DTYPE_M)) !=
 	    rte_cpu_to_le_64(IDPF_TX_DESC_DTYPE_DESC_DONE)) {
 		TX_LOG(DEBUG, "TX descriptor %4u is not done "
@@ -1358,7 +1358,7 @@ idpf_xmit_cleanup(struct ci_tx_queue *txq)
 		nb_tx_to_clean = (uint16_t)(desc_to_clean_to -
 					    last_desc_cleaned);
 
-	txd[desc_to_clean_to].qw1 = 0;
+	txd[desc_to_clean_to].cmd_type_offset_bsz = 0;
 
 	txq->last_desc_cleaned = desc_to_clean_to;
 	txq->nb_tx_free = (uint16_t)(txq->nb_tx_free + nb_tx_to_clean);
@@ -1372,8 +1372,8 @@ uint16_t
 idpf_dp_singleq_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 			  uint16_t nb_pkts)
 {
-	volatile struct idpf_base_tx_desc *txd;
-	volatile struct idpf_base_tx_desc *txr;
+	volatile struct ci_tx_desc *txd;
+	volatile struct ci_tx_desc *txr;
 	union idpf_tx_offload tx_offload = {0};
 	struct ci_tx_entry *txe, *txn;
 	struct ci_tx_entry *sw_ring;
@@ -1491,8 +1491,8 @@ idpf_dp_singleq_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 			/* Setup TX Descriptor */
 			slen = m_seg->data_len;
 			buf_dma_addr = rte_mbuf_data_iova(m_seg);
-			txd->buf_addr = rte_cpu_to_le_64(buf_dma_addr);
-			txd->qw1 = rte_cpu_to_le_64(IDPF_TX_DESC_DTYPE_DATA |
+			txd->buffer_addr = rte_cpu_to_le_64(buf_dma_addr);
+			txd->cmd_type_offset_bsz = rte_cpu_to_le_64(IDPF_TX_DESC_DTYPE_DATA |
 				((uint64_t)td_cmd  << IDPF_TXD_QW1_CMD_S) |
 				((uint64_t)td_offset << IDPF_TXD_QW1_OFFSET_S) |
 				((uint64_t)slen << IDPF_TXD_QW1_TX_BUF_SZ_S));
@@ -1519,7 +1519,7 @@ idpf_dp_singleq_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 			txq->nb_tx_used = 0;
 		}
 
-		txd->qw1 |= rte_cpu_to_le_16(td_cmd << IDPF_TXD_QW1_CMD_S);
+		txd->cmd_type_offset_bsz |= rte_cpu_to_le_16(td_cmd << IDPF_TXD_QW1_CMD_S);
 	}
 
 end_of_tx:
diff --git a/drivers/net/intel/idpf/idpf_common_rxtx.h b/drivers/net/intel/idpf/idpf_common_rxtx.h
index 7c6ff5d047..2f2fa153b2 100644
--- a/drivers/net/intel/idpf/idpf_common_rxtx.h
+++ b/drivers/net/intel/idpf/idpf_common_rxtx.h
@@ -182,7 +182,7 @@ union idpf_tx_offload {
 };
 
 union idpf_tx_desc {
-	struct idpf_base_tx_desc *tx_ring;
+	struct ci_tx_desc *tx_ring;
 	struct idpf_flex_tx_sched_desc *desc_ring;
 	struct idpf_splitq_tx_compl_desc *compl_ring;
 };
diff --git a/drivers/net/intel/idpf/idpf_common_rxtx_avx2.c b/drivers/net/intel/idpf/idpf_common_rxtx_avx2.c
index 21c8f79254..5f5d538dcb 100644
--- a/drivers/net/intel/idpf/idpf_common_rxtx_avx2.c
+++ b/drivers/net/intel/idpf/idpf_common_rxtx_avx2.c
@@ -483,7 +483,7 @@ idpf_dp_singleq_recv_pkts_avx2(void *rx_queue, struct rte_mbuf **rx_pkts, uint16
 }
 
 static inline void
-idpf_singleq_vtx1(volatile struct idpf_base_tx_desc *txdp,
+idpf_singleq_vtx1(volatile struct ci_tx_desc *txdp,
 		  struct rte_mbuf *pkt, uint64_t flags)
 {
 	uint64_t high_qw =
@@ -497,7 +497,7 @@ idpf_singleq_vtx1(volatile struct idpf_base_tx_desc *txdp,
 }
 
 static inline void
-idpf_singleq_vtx(volatile struct idpf_base_tx_desc *txdp,
+idpf_singleq_vtx(volatile struct ci_tx_desc *txdp,
 		 struct rte_mbuf **pkt, uint16_t nb_pkts,  uint64_t flags)
 {
 	const uint64_t hi_qw_tmpl = (IDPF_TX_DESC_DTYPE_DATA |
@@ -556,7 +556,7 @@ idpf_singleq_xmit_fixed_burst_vec_avx2(void *tx_queue, struct rte_mbuf **tx_pkts
 				       uint16_t nb_pkts)
 {
 	struct ci_tx_queue *txq = (struct ci_tx_queue *)tx_queue;
-	volatile struct idpf_base_tx_desc *txdp;
+	volatile struct ci_tx_desc *txdp;
 	struct ci_tx_entry_vec *txep;
 	uint16_t n, nb_commit, tx_id;
 	uint64_t flags = IDPF_TX_DESC_CMD_EOP;
@@ -604,7 +604,7 @@ idpf_singleq_xmit_fixed_burst_vec_avx2(void *tx_queue, struct rte_mbuf **tx_pkts
 
 	tx_id = (uint16_t)(tx_id + nb_commit);
 	if (tx_id > txq->tx_next_rs) {
-		txq->idpf_tx_ring[txq->tx_next_rs].qw1 |=
+		txq->idpf_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
 			rte_cpu_to_le_64(((uint64_t)IDPF_TX_DESC_CMD_RS) <<
 					 IDPF_TXD_QW1_CMD_S);
 		txq->tx_next_rs =
diff --git a/drivers/net/intel/idpf/idpf_common_rxtx_avx512.c b/drivers/net/intel/idpf/idpf_common_rxtx_avx512.c
index bc2cadd738..c1ec3d1222 100644
--- a/drivers/net/intel/idpf/idpf_common_rxtx_avx512.c
+++ b/drivers/net/intel/idpf/idpf_common_rxtx_avx512.c
@@ -1000,7 +1000,7 @@ idpf_dp_splitq_recv_pkts_avx512(void *rx_queue, struct rte_mbuf **rx_pkts,
 }
 
 static __rte_always_inline void
-idpf_singleq_vtx1(volatile struct idpf_base_tx_desc *txdp,
+idpf_singleq_vtx1(volatile struct ci_tx_desc *txdp,
 	  struct rte_mbuf *pkt, uint64_t flags)
 {
 	uint64_t high_qw =
@@ -1016,7 +1016,7 @@ idpf_singleq_vtx1(volatile struct idpf_base_tx_desc *txdp,
 #define IDPF_TX_LEN_MASK 0xAA
 #define IDPF_TX_OFF_MASK 0x55
 static __rte_always_inline void
-idpf_singleq_vtx(volatile struct idpf_base_tx_desc *txdp,
+idpf_singleq_vtx(volatile struct ci_tx_desc *txdp,
 	 struct rte_mbuf **pkt, uint16_t nb_pkts,  uint64_t flags)
 {
 	const uint64_t hi_qw_tmpl = (IDPF_TX_DESC_DTYPE_DATA  |
@@ -1072,7 +1072,7 @@ idpf_singleq_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pk
 					 uint16_t nb_pkts)
 {
 	struct ci_tx_queue *txq = tx_queue;
-	volatile struct idpf_base_tx_desc *txdp;
+	volatile struct ci_tx_desc *txdp;
 	struct ci_tx_entry_vec *txep;
 	uint16_t n, nb_commit, tx_id;
 	uint64_t flags = IDPF_TX_DESC_CMD_EOP;
@@ -1123,7 +1123,7 @@ idpf_singleq_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pk
 
 	tx_id = (uint16_t)(tx_id + nb_commit);
 	if (tx_id > txq->tx_next_rs) {
-		txq->idpf_tx_ring[txq->tx_next_rs].qw1 |=
+		txq->idpf_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
 			rte_cpu_to_le_64(((uint64_t)IDPF_TX_DESC_CMD_RS) <<
 					 IDPF_TXD_QW1_CMD_S);
 		txq->tx_next_rs =
diff --git a/drivers/net/intel/idpf/idpf_rxtx.c b/drivers/net/intel/idpf/idpf_rxtx.c
index 47f8347b41..9b63e44341 100644
--- a/drivers/net/intel/idpf/idpf_rxtx.c
+++ b/drivers/net/intel/idpf/idpf_rxtx.c
@@ -72,7 +72,7 @@ idpf_dma_zone_reserve(struct rte_eth_dev *dev, uint16_t queue_idx,
 			ring_size = RTE_ALIGN(len * sizeof(struct idpf_flex_tx_sched_desc),
 					      IDPF_DMA_MEM_ALIGN);
 		else
-			ring_size = RTE_ALIGN(len * sizeof(struct idpf_base_tx_desc),
+			ring_size = RTE_ALIGN(len * sizeof(struct ci_tx_desc),
 					      IDPF_DMA_MEM_ALIGN);
 		rte_memcpy(ring_name, "idpf Tx ring", sizeof("idpf Tx ring"));
 		break;
diff --git a/drivers/net/intel/idpf/idpf_rxtx_vec_common.h b/drivers/net/intel/idpf/idpf_rxtx_vec_common.h
index 425f0792a1..4702061484 100644
--- a/drivers/net/intel/idpf/idpf_rxtx_vec_common.h
+++ b/drivers/net/intel/idpf/idpf_rxtx_vec_common.h
@@ -31,7 +31,7 @@ idpf_tx_desc_done(struct ci_tx_queue *txq, uint16_t idx)
 	if (txq->complq != NULL)
 		return 1;
 
-	return (txq->idpf_tx_ring[idx].qw1 &
+	return (txq->idpf_tx_ring[idx].cmd_type_offset_bsz &
 			rte_cpu_to_le_64(IDPF_TXD_QW1_DTYPE_M)) ==
 				rte_cpu_to_le_64(IDPF_TX_DESC_DTYPE_DESC_DONE);
 }
-- 
2.51.0



* [PATCH v2 02/36] net/intel: use common Tx ring structure
  2026-01-13 15:14 ` [PATCH v2 00/36] combine multiple Intel scalar Tx paths Bruce Richardson
  2026-01-13 15:14   ` [PATCH v2 01/36] net/intel: create common Tx descriptor structure Bruce Richardson
@ 2026-01-13 15:14   ` Bruce Richardson
  2026-01-13 15:14   ` [PATCH v2 03/36] net/intel: create common post-Tx cleanup function Bruce Richardson
                     ` (35 subsequent siblings)
  37 siblings, 0 replies; 274+ messages in thread
From: Bruce Richardson @ 2026-01-13 15:14 UTC (permalink / raw)
  To: dev
  Cc: Bruce Richardson, Praveen Shetty, Vladimir Medvedkin,
	Anatoly Burakov, Jingjing Wu

Now that we have a common descriptor type, the separate per-driver
ring pointers in the union are redundant: merge all of them into a
single pointer, keeping only the ixgbe pointer separate since that
driver retains its own descriptor format.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/net/intel/common/tx.h                 |  5 +--
 drivers/net/intel/cpfl/cpfl_rxtx.c            |  2 +-
 drivers/net/intel/i40e/i40e_fdir.c            |  6 ++--
 drivers/net/intel/i40e/i40e_rxtx.c            | 22 ++++++------
 .../net/intel/i40e/i40e_rxtx_vec_altivec.c    |  6 ++--
 drivers/net/intel/i40e/i40e_rxtx_vec_avx2.c   |  6 ++--
 drivers/net/intel/i40e/i40e_rxtx_vec_avx512.c |  6 ++--
 drivers/net/intel/i40e/i40e_rxtx_vec_common.h |  2 +-
 drivers/net/intel/i40e/i40e_rxtx_vec_neon.c   |  6 ++--
 drivers/net/intel/iavf/iavf_rxtx.c            | 14 ++++----
 drivers/net/intel/iavf/iavf_rxtx_vec_avx2.c   |  6 ++--
 drivers/net/intel/iavf/iavf_rxtx_vec_avx512.c | 12 +++----
 drivers/net/intel/iavf/iavf_rxtx_vec_common.h |  2 +-
 drivers/net/intel/ice/ice_dcf_ethdev.c        |  4 +--
 drivers/net/intel/ice/ice_rxtx.c              | 34 +++++++++----------
 drivers/net/intel/ice/ice_rxtx_vec_avx2.c     |  6 ++--
 drivers/net/intel/ice/ice_rxtx_vec_avx512.c   |  6 ++--
 drivers/net/intel/ice/ice_rxtx_vec_common.h   |  2 +-
 drivers/net/intel/idpf/idpf_common_rxtx.c     |  8 ++---
 .../net/intel/idpf/idpf_common_rxtx_avx2.c    |  6 ++--
 .../net/intel/idpf/idpf_common_rxtx_avx512.c  |  6 ++--
 drivers/net/intel/idpf/idpf_rxtx.c            |  2 +-
 drivers/net/intel/idpf/idpf_rxtx_vec_common.h |  2 +-
 23 files changed, 84 insertions(+), 87 deletions(-)

diff --git a/drivers/net/intel/common/tx.h b/drivers/net/intel/common/tx.h
index 722f87a70c..a9ff3bebd5 100644
--- a/drivers/net/intel/common/tx.h
+++ b/drivers/net/intel/common/tx.h
@@ -41,10 +41,7 @@ typedef void (*ice_tx_release_mbufs_t)(struct ci_tx_queue *txq);
 
 struct ci_tx_queue {
 	union { /* TX ring virtual address */
-		volatile struct ci_tx_desc *i40e_tx_ring;
-		volatile struct ci_tx_desc *iavf_tx_ring;
-		volatile struct ci_tx_desc *ice_tx_ring;
-		volatile struct ci_tx_desc *idpf_tx_ring;
+		volatile struct ci_tx_desc *ci_tx_ring;
 		volatile union ixgbe_adv_tx_desc *ixgbe_tx_ring;
 	};
 	volatile uint8_t *qtx_tail;               /* register address of tail */
diff --git a/drivers/net/intel/cpfl/cpfl_rxtx.c b/drivers/net/intel/cpfl/cpfl_rxtx.c
index 57c6f6e736..a3127e7c97 100644
--- a/drivers/net/intel/cpfl/cpfl_rxtx.c
+++ b/drivers/net/intel/cpfl/cpfl_rxtx.c
@@ -594,7 +594,7 @@ cpfl_tx_queue_setup(struct rte_eth_dev *dev, uint16_t queue_idx,
 	}
 
 	if (!is_splitq) {
-		txq->idpf_tx_ring = mz->addr;
+		txq->ci_tx_ring = mz->addr;
 		idpf_qc_single_tx_queue_reset(txq);
 	} else {
 		txq->desc_ring = mz->addr;
diff --git a/drivers/net/intel/i40e/i40e_fdir.c b/drivers/net/intel/i40e/i40e_fdir.c
index 605df73c9e..8a01aec0e2 100644
--- a/drivers/net/intel/i40e/i40e_fdir.c
+++ b/drivers/net/intel/i40e/i40e_fdir.c
@@ -1380,7 +1380,7 @@ i40e_find_available_buffer(struct rte_eth_dev *dev)
 		volatile struct ci_tx_desc *tmp_txdp;
 
 		tmp_tail = txq->tx_tail;
-		tmp_txdp = &txq->i40e_tx_ring[tmp_tail + 1];
+		tmp_txdp = &txq->ci_tx_ring[tmp_tail + 1];
 
 		do {
 			if ((tmp_txdp->cmd_type_offset_bsz &
@@ -1637,7 +1637,7 @@ i40e_flow_fdir_filter_programming(struct i40e_pf *pf,
 
 	PMD_DRV_LOG(INFO, "filling filter programming descriptor.");
 	fdirdp = (volatile struct i40e_filter_program_desc *)
-				(&txq->i40e_tx_ring[txq->tx_tail]);
+				(&txq->ci_tx_ring[txq->tx_tail]);
 
 	fdirdp->qindex_flex_ptype_vsi =
 			rte_cpu_to_le_32((fdir_action->rx_queue <<
@@ -1707,7 +1707,7 @@ i40e_flow_fdir_filter_programming(struct i40e_pf *pf,
 	fdirdp->fd_id = rte_cpu_to_le_32(filter->soft_id);
 
 	PMD_DRV_LOG(INFO, "filling transmit descriptor.");
-	txdp = &txq->i40e_tx_ring[txq->tx_tail + 1];
+	txdp = &txq->ci_tx_ring[txq->tx_tail + 1];
 	txdp->buffer_addr = rte_cpu_to_le_64(pf->fdir.dma_addr[txq->tx_tail >> 1]);
 
 	td_cmd = I40E_TX_DESC_CMD_EOP |
diff --git a/drivers/net/intel/i40e/i40e_rxtx.c b/drivers/net/intel/i40e/i40e_rxtx.c
index 8600ef3e3a..51e9c1a1f0 100644
--- a/drivers/net/intel/i40e/i40e_rxtx.c
+++ b/drivers/net/intel/i40e/i40e_rxtx.c
@@ -388,7 +388,7 @@ static inline int
 i40e_xmit_cleanup(struct ci_tx_queue *txq)
 {
 	struct ci_tx_entry *sw_ring = txq->sw_ring;
-	volatile struct ci_tx_desc *txd = txq->i40e_tx_ring;
+	volatile struct ci_tx_desc *txd = txq->ci_tx_ring;
 	uint16_t last_desc_cleaned = txq->last_desc_cleaned;
 	uint16_t nb_tx_desc = txq->nb_tx_desc;
 	uint16_t desc_to_clean_to;
@@ -1112,7 +1112,7 @@ i40e_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 
 	txq = tx_queue;
 	sw_ring = txq->sw_ring;
-	txr = txq->i40e_tx_ring;
+	txr = txq->ci_tx_ring;
 	tx_id = txq->tx_tail;
 	txe = &sw_ring[tx_id];
 
@@ -1347,7 +1347,7 @@ i40e_tx_free_bufs(struct ci_tx_queue *txq)
 	const uint16_t k = RTE_ALIGN_FLOOR(tx_rs_thresh, I40E_TX_MAX_FREE_BUF_SZ);
 	const uint16_t m = tx_rs_thresh % I40E_TX_MAX_FREE_BUF_SZ;
 
-	if ((txq->i40e_tx_ring[txq->tx_next_dd].cmd_type_offset_bsz &
+	if ((txq->ci_tx_ring[txq->tx_next_dd].cmd_type_offset_bsz &
 			rte_cpu_to_le_64(I40E_TXD_QW1_DTYPE_MASK)) !=
 			rte_cpu_to_le_64(I40E_TX_DESC_DTYPE_DESC_DONE))
 		return 0;
@@ -1426,7 +1426,7 @@ i40e_tx_fill_hw_ring(struct ci_tx_queue *txq,
 		     struct rte_mbuf **pkts,
 		     uint16_t nb_pkts)
 {
-	volatile struct ci_tx_desc *txdp = &txq->i40e_tx_ring[txq->tx_tail];
+	volatile struct ci_tx_desc *txdp = &txq->ci_tx_ring[txq->tx_tail];
 	struct ci_tx_entry *txep = &txq->sw_ring[txq->tx_tail];
 	const int N_PER_LOOP = 4;
 	const int N_PER_LOOP_MASK = N_PER_LOOP - 1;
@@ -1454,7 +1454,7 @@ tx_xmit_pkts(struct ci_tx_queue *txq,
 	     struct rte_mbuf **tx_pkts,
 	     uint16_t nb_pkts)
 {
-	volatile struct ci_tx_desc *txr = txq->i40e_tx_ring;
+	volatile struct ci_tx_desc *txr = txq->ci_tx_ring;
 	uint16_t n = 0;
 
 	/**
@@ -2406,7 +2406,7 @@ i40e_dev_tx_descriptor_status(void *tx_queue, uint16_t offset)
 			desc -= txq->nb_tx_desc;
 	}
 
-	status = &txq->i40e_tx_ring[desc].cmd_type_offset_bsz;
+	status = &txq->ci_tx_ring[desc].cmd_type_offset_bsz;
 	mask = rte_le_to_cpu_64(I40E_TXD_QW1_DTYPE_MASK);
 	expect = rte_cpu_to_le_64(
 		I40E_TX_DESC_DTYPE_DESC_DONE << I40E_TXD_QW1_DTYPE_SHIFT);
@@ -2603,7 +2603,7 @@ i40e_dev_tx_queue_setup(struct rte_eth_dev *dev,
 	/* Allocate TX hardware ring descriptors. */
 	ring_size = sizeof(struct ci_tx_desc) * I40E_MAX_RING_DESC;
 	ring_size = RTE_ALIGN(ring_size, I40E_DMA_MEM_ALIGN);
-	tz = rte_eth_dma_zone_reserve(dev, "i40e_tx_ring", queue_idx,
+	tz = rte_eth_dma_zone_reserve(dev, "ci_tx_ring", queue_idx,
 			      ring_size, I40E_RING_BASE_ALIGN, socket_id);
 	if (!tz) {
 		i40e_tx_queue_release(txq);
@@ -2623,7 +2623,7 @@ i40e_dev_tx_queue_setup(struct rte_eth_dev *dev,
 	txq->tx_deferred_start = tx_conf->tx_deferred_start;
 
 	txq->tx_ring_dma = tz->iova;
-	txq->i40e_tx_ring = (struct ci_tx_desc *)tz->addr;
+	txq->ci_tx_ring = (struct ci_tx_desc *)tz->addr;
 
 	/* Allocate software ring */
 	txq->sw_ring =
@@ -2898,11 +2898,11 @@ i40e_reset_tx_queue(struct ci_tx_queue *txq)
 	txe = txq->sw_ring;
 	size = sizeof(struct ci_tx_desc) * txq->nb_tx_desc;
 	for (i = 0; i < size; i++)
-		((volatile char *)txq->i40e_tx_ring)[i] = 0;
+		((volatile char *)txq->ci_tx_ring)[i] = 0;
 
 	prev = (uint16_t)(txq->nb_tx_desc - 1);
 	for (i = 0; i < txq->nb_tx_desc; i++) {
-		volatile struct ci_tx_desc *txd = &txq->i40e_tx_ring[i];
+		volatile struct ci_tx_desc *txd = &txq->ci_tx_ring[i];
 
 		txd->cmd_type_offset_bsz =
 			rte_cpu_to_le_64(I40E_TX_DESC_DTYPE_DESC_DONE);
@@ -3223,7 +3223,7 @@ i40e_fdir_setup_tx_resources(struct i40e_pf *pf)
 	txq->i40e_vsi = pf->fdir.fdir_vsi;
 
 	txq->tx_ring_dma = tz->iova;
-	txq->i40e_tx_ring = (struct ci_tx_desc *)tz->addr;
+	txq->ci_tx_ring = (struct ci_tx_desc *)tz->addr;
 
 	/*
 	 * don't need to allocate software ring and reset for the fdir
diff --git a/drivers/net/intel/i40e/i40e_rxtx_vec_altivec.c b/drivers/net/intel/i40e/i40e_rxtx_vec_altivec.c
index ef5b252898..81e9e2bc0b 100644
--- a/drivers/net/intel/i40e/i40e_rxtx_vec_altivec.c
+++ b/drivers/net/intel/i40e/i40e_rxtx_vec_altivec.c
@@ -489,7 +489,7 @@ i40e_xmit_fixed_burst_vec(void *tx_queue, struct rte_mbuf **tx_pkts,
 		return 0;
 
 	tx_id = txq->tx_tail;
-	txdp = &txq->i40e_tx_ring[tx_id];
+	txdp = &txq->ci_tx_ring[tx_id];
 	txep = &txq->sw_ring_vec[tx_id];
 
 	txq->nb_tx_free = (uint16_t)(txq->nb_tx_free - nb_pkts);
@@ -509,7 +509,7 @@ i40e_xmit_fixed_burst_vec(void *tx_queue, struct rte_mbuf **tx_pkts,
 		txq->tx_next_rs = (uint16_t)(txq->tx_rs_thresh - 1);
 
 		/* avoid reach the end of ring */
-		txdp = &txq->i40e_tx_ring[tx_id];
+		txdp = &txq->ci_tx_ring[tx_id];
 		txep = &txq->sw_ring_vec[tx_id];
 	}
 
@@ -519,7 +519,7 @@ i40e_xmit_fixed_burst_vec(void *tx_queue, struct rte_mbuf **tx_pkts,
 
 	tx_id = (uint16_t)(tx_id + nb_commit);
 	if (tx_id > txq->tx_next_rs) {
-		txq->i40e_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
+		txq->ci_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
 			rte_cpu_to_le_64(((uint64_t)I40E_TX_DESC_CMD_RS) <<
 						I40E_TXD_QW1_CMD_SHIFT);
 		txq->tx_next_rs =
diff --git a/drivers/net/intel/i40e/i40e_rxtx_vec_avx2.c b/drivers/net/intel/i40e/i40e_rxtx_vec_avx2.c
index 137c1f9765..f054bd41bf 100644
--- a/drivers/net/intel/i40e/i40e_rxtx_vec_avx2.c
+++ b/drivers/net/intel/i40e/i40e_rxtx_vec_avx2.c
@@ -753,7 +753,7 @@ i40e_xmit_fixed_burst_vec_avx2(void *tx_queue, struct rte_mbuf **tx_pkts,
 		return 0;
 
 	tx_id = txq->tx_tail;
-	txdp = &txq->i40e_tx_ring[tx_id];
+	txdp = &txq->ci_tx_ring[tx_id];
 	txep = &txq->sw_ring_vec[tx_id];
 
 	txq->nb_tx_free = (uint16_t)(txq->nb_tx_free - nb_pkts);
@@ -774,7 +774,7 @@ i40e_xmit_fixed_burst_vec_avx2(void *tx_queue, struct rte_mbuf **tx_pkts,
 		txq->tx_next_rs = (uint16_t)(txq->tx_rs_thresh - 1);
 
 		/* avoid reach the end of ring */
-		txdp = &txq->i40e_tx_ring[tx_id];
+		txdp = &txq->ci_tx_ring[tx_id];
 		txep = &txq->sw_ring_vec[tx_id];
 	}
 
@@ -784,7 +784,7 @@ i40e_xmit_fixed_burst_vec_avx2(void *tx_queue, struct rte_mbuf **tx_pkts,
 
 	tx_id = (uint16_t)(tx_id + nb_commit);
 	if (tx_id > txq->tx_next_rs) {
-		txq->i40e_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
+		txq->ci_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
 			rte_cpu_to_le_64(((uint64_t)I40E_TX_DESC_CMD_RS) <<
 						I40E_TXD_QW1_CMD_SHIFT);
 		txq->tx_next_rs =
diff --git a/drivers/net/intel/i40e/i40e_rxtx_vec_avx512.c b/drivers/net/intel/i40e/i40e_rxtx_vec_avx512.c
index 6971488750..9a967faeee 100644
--- a/drivers/net/intel/i40e/i40e_rxtx_vec_avx512.c
+++ b/drivers/net/intel/i40e/i40e_rxtx_vec_avx512.c
@@ -821,7 +821,7 @@ i40e_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pkts,
 		return 0;
 
 	tx_id = txq->tx_tail;
-	txdp = &txq->i40e_tx_ring[tx_id];
+	txdp = &txq->ci_tx_ring[tx_id];
 	txep = (void *)txq->sw_ring;
 	txep += tx_id;
 
@@ -843,7 +843,7 @@ i40e_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pkts,
 		txq->tx_next_rs = (uint16_t)(txq->tx_rs_thresh - 1);
 
 		/* avoid reach the end of ring */
-		txdp = txq->i40e_tx_ring;
+		txdp = txq->ci_tx_ring;
 		txep = (void *)txq->sw_ring;
 	}
 
@@ -853,7 +853,7 @@ i40e_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pkts,
 
 	tx_id = (uint16_t)(tx_id + nb_commit);
 	if (tx_id > txq->tx_next_rs) {
-		txq->i40e_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
+		txq->ci_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
 			rte_cpu_to_le_64(((uint64_t)I40E_TX_DESC_CMD_RS) <<
 						I40E_TXD_QW1_CMD_SHIFT);
 		txq->tx_next_rs =
diff --git a/drivers/net/intel/i40e/i40e_rxtx_vec_common.h b/drivers/net/intel/i40e/i40e_rxtx_vec_common.h
index 14651f2f06..1fd7fc75bf 100644
--- a/drivers/net/intel/i40e/i40e_rxtx_vec_common.h
+++ b/drivers/net/intel/i40e/i40e_rxtx_vec_common.h
@@ -15,7 +15,7 @@
 static inline int
 i40e_tx_desc_done(struct ci_tx_queue *txq, uint16_t idx)
 {
-	return (txq->i40e_tx_ring[idx].cmd_type_offset_bsz &
+	return (txq->ci_tx_ring[idx].cmd_type_offset_bsz &
 			rte_cpu_to_le_64(I40E_TXD_QW1_DTYPE_MASK)) ==
 				rte_cpu_to_le_64(I40E_TX_DESC_DTYPE_DESC_DONE);
 }
diff --git a/drivers/net/intel/i40e/i40e_rxtx_vec_neon.c b/drivers/net/intel/i40e/i40e_rxtx_vec_neon.c
index 6404b70c56..0b95152232 100644
--- a/drivers/net/intel/i40e/i40e_rxtx_vec_neon.c
+++ b/drivers/net/intel/i40e/i40e_rxtx_vec_neon.c
@@ -638,7 +638,7 @@ i40e_xmit_fixed_burst_vec(void *__rte_restrict tx_queue,
 		return 0;
 
 	tx_id = txq->tx_tail;
-	txdp = &txq->i40e_tx_ring[tx_id];
+	txdp = &txq->ci_tx_ring[tx_id];
 	txep = &txq->sw_ring_vec[tx_id];
 
 	txq->nb_tx_free = (uint16_t)(txq->nb_tx_free - nb_pkts);
@@ -658,7 +658,7 @@ i40e_xmit_fixed_burst_vec(void *__rte_restrict tx_queue,
 		txq->tx_next_rs = (uint16_t)(txq->tx_rs_thresh - 1);
 
 		/* avoid reach the end of ring */
-		txdp = &txq->i40e_tx_ring[tx_id];
+		txdp = &txq->ci_tx_ring[tx_id];
 		txep = &txq->sw_ring_vec[tx_id];
 	}
 
@@ -668,7 +668,7 @@ i40e_xmit_fixed_burst_vec(void *__rte_restrict tx_queue,
 
 	tx_id = (uint16_t)(tx_id + nb_commit);
 	if (tx_id > txq->tx_next_rs) {
-		txq->i40e_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
+		txq->ci_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
 			rte_cpu_to_le_64(((uint64_t)I40E_TX_DESC_CMD_RS) <<
 						I40E_TXD_QW1_CMD_SHIFT);
 		txq->tx_next_rs =
diff --git a/drivers/net/intel/iavf/iavf_rxtx.c b/drivers/net/intel/iavf/iavf_rxtx.c
index 2057bf6d8d..ea4aaf08a3 100644
--- a/drivers/net/intel/iavf/iavf_rxtx.c
+++ b/drivers/net/intel/iavf/iavf_rxtx.c
@@ -283,11 +283,11 @@ reset_tx_queue(struct ci_tx_queue *txq)
 	txe = txq->sw_ring;
 	size = sizeof(struct ci_tx_desc) * txq->nb_tx_desc;
 	for (i = 0; i < size; i++)
-		((volatile char *)txq->iavf_tx_ring)[i] = 0;
+		((volatile char *)txq->ci_tx_ring)[i] = 0;
 
 	prev = (uint16_t)(txq->nb_tx_desc - 1);
 	for (i = 0; i < txq->nb_tx_desc; i++) {
-		txq->iavf_tx_ring[i].cmd_type_offset_bsz =
+		txq->ci_tx_ring[i].cmd_type_offset_bsz =
 			rte_cpu_to_le_64(IAVF_TX_DESC_DTYPE_DESC_DONE);
 		txe[i].mbuf =  NULL;
 		txe[i].last_id = i;
@@ -834,7 +834,7 @@ iavf_dev_tx_queue_setup(struct rte_eth_dev *dev,
 	/* Allocate TX hardware ring descriptors. */
 	ring_size = sizeof(struct ci_tx_desc) * IAVF_MAX_RING_DESC;
 	ring_size = RTE_ALIGN(ring_size, IAVF_DMA_MEM_ALIGN);
-	mz = rte_eth_dma_zone_reserve(dev, "iavf_tx_ring", queue_idx,
+	mz = rte_eth_dma_zone_reserve(dev, "ci_tx_ring", queue_idx,
 				      ring_size, IAVF_RING_BASE_ALIGN,
 				      socket_id);
 	if (!mz) {
@@ -844,7 +844,7 @@ iavf_dev_tx_queue_setup(struct rte_eth_dev *dev,
 		return -ENOMEM;
 	}
 	txq->tx_ring_dma = mz->iova;
-	txq->iavf_tx_ring = (struct ci_tx_desc *)mz->addr;
+	txq->ci_tx_ring = (struct ci_tx_desc *)mz->addr;
 
 	txq->mz = mz;
 	reset_tx_queue(txq);
@@ -2338,7 +2338,7 @@ iavf_xmit_cleanup(struct ci_tx_queue *txq)
 	uint16_t desc_to_clean_to;
 	uint16_t nb_tx_to_clean;
 
-	volatile struct ci_tx_desc *txd = txq->iavf_tx_ring;
+	volatile struct ci_tx_desc *txd = txq->ci_tx_ring;
 
 	desc_to_clean_to = (uint16_t)(last_desc_cleaned + txq->tx_rs_thresh);
 	if (desc_to_clean_to >= nb_tx_desc)
@@ -2761,7 +2761,7 @@ uint16_t
 iavf_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 {
 	struct ci_tx_queue *txq = tx_queue;
-	volatile struct ci_tx_desc *txr = txq->iavf_tx_ring;
+	volatile struct ci_tx_desc *txr = txq->ci_tx_ring;
 	struct ci_tx_entry *txe_ring = txq->sw_ring;
 	struct ci_tx_entry *txe, *txn;
 	struct rte_mbuf *mb, *mb_seg;
@@ -4467,7 +4467,7 @@ iavf_dev_tx_desc_status(void *tx_queue, uint16_t offset)
 			desc -= txq->nb_tx_desc;
 	}
 
-	status = &txq->iavf_tx_ring[desc].cmd_type_offset_bsz;
+	status = &txq->ci_tx_ring[desc].cmd_type_offset_bsz;
 	mask = rte_le_to_cpu_64(IAVF_TXD_QW1_DTYPE_MASK);
 	expect = rte_cpu_to_le_64(
 		 IAVF_TX_DESC_DTYPE_DESC_DONE << IAVF_TXD_QW1_DTYPE_SHIFT);
diff --git a/drivers/net/intel/iavf/iavf_rxtx_vec_avx2.c b/drivers/net/intel/iavf/iavf_rxtx_vec_avx2.c
index 5b62d51cf7..89ce841b9e 100644
--- a/drivers/net/intel/iavf/iavf_rxtx_vec_avx2.c
+++ b/drivers/net/intel/iavf/iavf_rxtx_vec_avx2.c
@@ -1729,7 +1729,7 @@ iavf_xmit_fixed_burst_vec_avx2(void *tx_queue, struct rte_mbuf **tx_pkts,
 	nb_commit = nb_pkts;
 
 	tx_id = txq->tx_tail;
-	txdp = &txq->iavf_tx_ring[tx_id];
+	txdp = &txq->ci_tx_ring[tx_id];
 	txep = &txq->sw_ring_vec[tx_id];
 
 	txq->nb_tx_free = (uint16_t)(txq->nb_tx_free - nb_pkts);
@@ -1750,7 +1750,7 @@ iavf_xmit_fixed_burst_vec_avx2(void *tx_queue, struct rte_mbuf **tx_pkts,
 		txq->tx_next_rs = (uint16_t)(txq->tx_rs_thresh - 1);
 
 		/* avoid reach the end of ring */
-		txdp = &txq->iavf_tx_ring[tx_id];
+		txdp = &txq->ci_tx_ring[tx_id];
 		txep = &txq->sw_ring_vec[tx_id];
 	}
 
@@ -1760,7 +1760,7 @@ iavf_xmit_fixed_burst_vec_avx2(void *tx_queue, struct rte_mbuf **tx_pkts,
 
 	tx_id = (uint16_t)(tx_id + nb_commit);
 	if (tx_id > txq->tx_next_rs) {
-		txq->iavf_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
+		txq->ci_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
 			rte_cpu_to_le_64(((uint64_t)IAVF_TX_DESC_CMD_RS) <<
 					 IAVF_TXD_QW1_CMD_SHIFT);
 		txq->tx_next_rs =
diff --git a/drivers/net/intel/iavf/iavf_rxtx_vec_avx512.c b/drivers/net/intel/iavf/iavf_rxtx_vec_avx512.c
index d79d96c7b7..ad1b0b90cd 100644
--- a/drivers/net/intel/iavf/iavf_rxtx_vec_avx512.c
+++ b/drivers/net/intel/iavf/iavf_rxtx_vec_avx512.c
@@ -2219,7 +2219,7 @@ iavf_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pkts,
 	nb_commit = nb_pkts;
 
 	tx_id = txq->tx_tail;
-	txdp = &txq->iavf_tx_ring[tx_id];
+	txdp = &txq->ci_tx_ring[tx_id];
 	txep = (void *)txq->sw_ring;
 	txep += tx_id;
 
@@ -2241,7 +2241,7 @@ iavf_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pkts,
 		txq->tx_next_rs = (uint16_t)(txq->tx_rs_thresh - 1);
 
 		/* avoid reach the end of ring */
-		txdp = &txq->iavf_tx_ring[tx_id];
+		txdp = &txq->ci_tx_ring[tx_id];
 		txep = (void *)txq->sw_ring;
 		txep += tx_id;
 	}
@@ -2252,7 +2252,7 @@ iavf_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pkts,
 
 	tx_id = (uint16_t)(tx_id + nb_commit);
 	if (tx_id > txq->tx_next_rs) {
-		txq->iavf_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
+		txq->ci_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
 			rte_cpu_to_le_64(((uint64_t)IAVF_TX_DESC_CMD_RS) <<
 					 IAVF_TXD_QW1_CMD_SHIFT);
 		txq->tx_next_rs =
@@ -2288,7 +2288,7 @@ iavf_xmit_fixed_burst_vec_avx512_ctx(void *tx_queue, struct rte_mbuf **tx_pkts,
 
 	nb_pkts = nb_commit >> 1;
 	tx_id = txq->tx_tail;
-	txdp = &txq->iavf_tx_ring[tx_id];
+	txdp = &txq->ci_tx_ring[tx_id];
 	txep = (void *)txq->sw_ring;
 	txep += (tx_id >> 1);
 
@@ -2309,7 +2309,7 @@ iavf_xmit_fixed_burst_vec_avx512_ctx(void *tx_queue, struct rte_mbuf **tx_pkts,
 		txq->tx_next_rs = (uint16_t)(txq->tx_rs_thresh - 1);
 		tx_id = 0;
 		/* avoid reach the end of ring */
-		txdp = txq->iavf_tx_ring;
+		txdp = txq->ci_tx_ring;
 		txep = (void *)txq->sw_ring;
 	}
 
@@ -2320,7 +2320,7 @@ iavf_xmit_fixed_burst_vec_avx512_ctx(void *tx_queue, struct rte_mbuf **tx_pkts,
 	tx_id = (uint16_t)(tx_id + nb_commit);
 
 	if (tx_id > txq->tx_next_rs) {
-		txq->iavf_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
+		txq->ci_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
 			rte_cpu_to_le_64(((uint64_t)IAVF_TX_DESC_CMD_RS) <<
 					 IAVF_TXD_QW1_CMD_SHIFT);
 		txq->tx_next_rs =
diff --git a/drivers/net/intel/iavf/iavf_rxtx_vec_common.h b/drivers/net/intel/iavf/iavf_rxtx_vec_common.h
index f1ea57034f..1832b76f89 100644
--- a/drivers/net/intel/iavf/iavf_rxtx_vec_common.h
+++ b/drivers/net/intel/iavf/iavf_rxtx_vec_common.h
@@ -14,7 +14,7 @@
 static inline int
 iavf_tx_desc_done(struct ci_tx_queue *txq, uint16_t idx)
 {
-	return (txq->iavf_tx_ring[idx].cmd_type_offset_bsz &
+	return (txq->ci_tx_ring[idx].cmd_type_offset_bsz &
 			rte_cpu_to_le_64(IAVF_TXD_QW1_DTYPE_MASK)) ==
 				rte_cpu_to_le_64(IAVF_TX_DESC_DTYPE_DESC_DONE);
 }
diff --git a/drivers/net/intel/ice/ice_dcf_ethdev.c b/drivers/net/intel/ice/ice_dcf_ethdev.c
index ab1d499cef..5f537b4c12 100644
--- a/drivers/net/intel/ice/ice_dcf_ethdev.c
+++ b/drivers/net/intel/ice/ice_dcf_ethdev.c
@@ -401,11 +401,11 @@ reset_tx_queue(struct ci_tx_queue *txq)
 	txe = txq->sw_ring;
 	size = sizeof(struct ci_tx_desc) * txq->nb_tx_desc;
 	for (i = 0; i < size; i++)
-		((volatile char *)txq->ice_tx_ring)[i] = 0;
+		((volatile char *)txq->ci_tx_ring)[i] = 0;
 
 	prev = (uint16_t)(txq->nb_tx_desc - 1);
 	for (i = 0; i < txq->nb_tx_desc; i++) {
-		txq->ice_tx_ring[i].cmd_type_offset_bsz =
+		txq->ci_tx_ring[i].cmd_type_offset_bsz =
 			rte_cpu_to_le_64(IAVF_TX_DESC_DTYPE_DESC_DONE);
 		txe[i].mbuf =  NULL;
 		txe[i].last_id = i;
diff --git a/drivers/net/intel/ice/ice_rxtx.c b/drivers/net/intel/ice/ice_rxtx.c
index e431346b84..b1e67c2c67 100644
--- a/drivers/net/intel/ice/ice_rxtx.c
+++ b/drivers/net/intel/ice/ice_rxtx.c
@@ -1117,11 +1117,11 @@ ice_reset_tx_queue(struct ci_tx_queue *txq)
 	txe = txq->sw_ring;
 	size = sizeof(struct ci_tx_desc) * txq->nb_tx_desc;
 	for (i = 0; i < size; i++)
-		((volatile char *)txq->ice_tx_ring)[i] = 0;
+		((volatile char *)txq->ci_tx_ring)[i] = 0;
 
 	prev = (uint16_t)(txq->nb_tx_desc - 1);
 	for (i = 0; i < txq->nb_tx_desc; i++) {
-		volatile struct ci_tx_desc *txd = &txq->ice_tx_ring[i];
+		volatile struct ci_tx_desc *txd = &txq->ci_tx_ring[i];
 
 		txd->cmd_type_offset_bsz =
 			rte_cpu_to_le_64(ICE_TX_DESC_DTYPE_DESC_DONE);
@@ -1615,7 +1615,7 @@ ice_tx_queue_setup(struct rte_eth_dev *dev,
 	/* Allocate TX hardware ring descriptors. */
 	ring_size = sizeof(struct ci_tx_desc) * ICE_MAX_NUM_DESC_BY_MAC(hw);
 	ring_size = RTE_ALIGN(ring_size, ICE_DMA_MEM_ALIGN);
-	tz = rte_eth_dma_zone_reserve(dev, "ice_tx_ring", queue_idx,
+	tz = rte_eth_dma_zone_reserve(dev, "ci_tx_ring", queue_idx,
 				      ring_size, ICE_RING_BASE_ALIGN,
 				      socket_id);
 	if (!tz) {
@@ -1637,7 +1637,7 @@ ice_tx_queue_setup(struct rte_eth_dev *dev,
 	txq->tx_deferred_start = tx_conf->tx_deferred_start;
 
 	txq->tx_ring_dma = tz->iova;
-	txq->ice_tx_ring = tz->addr;
+	txq->ci_tx_ring = tz->addr;
 
 	/* Allocate software ring */
 	txq->sw_ring =
@@ -2543,7 +2543,7 @@ ice_tx_descriptor_status(void *tx_queue, uint16_t offset)
 			desc -= txq->nb_tx_desc;
 	}
 
-	status = &txq->ice_tx_ring[desc].cmd_type_offset_bsz;
+	status = &txq->ci_tx_ring[desc].cmd_type_offset_bsz;
 	mask = rte_cpu_to_le_64(ICE_TXD_QW1_DTYPE_M);
 	expect = rte_cpu_to_le_64(ICE_TX_DESC_DTYPE_DESC_DONE <<
 				  ICE_TXD_QW1_DTYPE_S);
@@ -2626,7 +2626,7 @@ ice_fdir_setup_tx_resources(struct ice_pf *pf)
 	txq->ice_vsi = pf->fdir.fdir_vsi;
 
 	txq->tx_ring_dma = tz->iova;
-	txq->ice_tx_ring = (struct ci_tx_desc *)tz->addr;
+	txq->ci_tx_ring = (struct ci_tx_desc *)tz->addr;
 	/*
 	 * don't need to allocate software ring and reset for the fdir
 	 * program queue just set the queue has been configured.
@@ -3015,7 +3015,7 @@ static inline int
 ice_xmit_cleanup(struct ci_tx_queue *txq)
 {
 	struct ci_tx_entry *sw_ring = txq->sw_ring;
-	volatile struct ci_tx_desc *txd = txq->ice_tx_ring;
+	volatile struct ci_tx_desc *txd = txq->ci_tx_ring;
 	uint16_t last_desc_cleaned = txq->last_desc_cleaned;
 	uint16_t nb_tx_desc = txq->nb_tx_desc;
 	uint16_t desc_to_clean_to;
@@ -3136,7 +3136,7 @@ uint16_t
 ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 {
 	struct ci_tx_queue *txq;
-	volatile struct ci_tx_desc *ice_tx_ring;
+	volatile struct ci_tx_desc *ci_tx_ring;
 	volatile struct ci_tx_desc *txd;
 	struct ci_tx_entry *sw_ring;
 	struct ci_tx_entry *txe, *txn;
@@ -3159,7 +3159,7 @@ ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 
 	txq = tx_queue;
 	sw_ring = txq->sw_ring;
-	ice_tx_ring = txq->ice_tx_ring;
+	ci_tx_ring = txq->ci_tx_ring;
 	tx_id = txq->tx_tail;
 	txe = &sw_ring[tx_id];
 
@@ -3245,7 +3245,7 @@ ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 			/* Setup TX context descriptor if required */
 			volatile struct ice_tx_ctx_desc *ctx_txd =
 				(volatile struct ice_tx_ctx_desc *)
-					&ice_tx_ring[tx_id];
+					&ci_tx_ring[tx_id];
 			uint16_t cd_l2tag2 = 0;
 			uint64_t cd_type_cmd_tso_mss = ICE_TX_DESC_DTYPE_CTX;
 
@@ -3287,7 +3287,7 @@ ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 		m_seg = tx_pkt;
 
 		do {
-			txd = &ice_tx_ring[tx_id];
+			txd = &ci_tx_ring[tx_id];
 			txn = &sw_ring[txe->next_id];
 
 			if (txe->mbuf)
@@ -3315,7 +3315,7 @@ ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 				txe->last_id = tx_last;
 				tx_id = txe->next_id;
 				txe = txn;
-				txd = &ice_tx_ring[tx_id];
+				txd = &ci_tx_ring[tx_id];
 				txn = &sw_ring[txe->next_id];
 			}
 
@@ -3398,7 +3398,7 @@ ice_tx_free_bufs(struct ci_tx_queue *txq)
 	struct ci_tx_entry *txep;
 	uint16_t i;
 
-	if ((txq->ice_tx_ring[txq->tx_next_dd].cmd_type_offset_bsz &
+	if ((txq->ci_tx_ring[txq->tx_next_dd].cmd_type_offset_bsz &
 	     rte_cpu_to_le_64(ICE_TXD_QW1_DTYPE_M)) !=
 	    rte_cpu_to_le_64(ICE_TX_DESC_DTYPE_DESC_DONE))
 		return 0;
@@ -3577,7 +3577,7 @@ static inline void
 ice_tx_fill_hw_ring(struct ci_tx_queue *txq, struct rte_mbuf **pkts,
 		    uint16_t nb_pkts)
 {
-	volatile struct ci_tx_desc *txdp = &txq->ice_tx_ring[txq->tx_tail];
+	volatile struct ci_tx_desc *txdp = &txq->ci_tx_ring[txq->tx_tail];
 	struct ci_tx_entry *txep = &txq->sw_ring[txq->tx_tail];
 	const int N_PER_LOOP = 4;
 	const int N_PER_LOOP_MASK = N_PER_LOOP - 1;
@@ -3610,7 +3610,7 @@ tx_xmit_pkts(struct ci_tx_queue *txq,
 	     struct rte_mbuf **tx_pkts,
 	     uint16_t nb_pkts)
 {
-	volatile struct ci_tx_desc *txr = txq->ice_tx_ring;
+	volatile struct ci_tx_desc *txr = txq->ci_tx_ring;
 	uint16_t n = 0;
 
 	/**
@@ -4870,11 +4870,11 @@ ice_fdir_programming(struct ice_pf *pf, struct ice_fltr_desc *fdir_desc)
 	uint16_t i;
 
 	fdirdp = (volatile struct ice_fltr_desc *)
-		(&txq->ice_tx_ring[txq->tx_tail]);
+		(&txq->ci_tx_ring[txq->tx_tail]);
 	fdirdp->qidx_compq_space_stat = fdir_desc->qidx_compq_space_stat;
 	fdirdp->dtype_cmd_vsi_fdid = fdir_desc->dtype_cmd_vsi_fdid;
 
-	txdp = &txq->ice_tx_ring[txq->tx_tail + 1];
+	txdp = &txq->ci_tx_ring[txq->tx_tail + 1];
 	txdp->buffer_addr = rte_cpu_to_le_64(pf->fdir.dma_addr);
 	td_cmd = ICE_TX_DESC_CMD_EOP |
 		ICE_TX_DESC_CMD_RS  |
diff --git a/drivers/net/intel/ice/ice_rxtx_vec_avx2.c b/drivers/net/intel/ice/ice_rxtx_vec_avx2.c
index bef7bb00ba..0a1df0b2f6 100644
--- a/drivers/net/intel/ice/ice_rxtx_vec_avx2.c
+++ b/drivers/net/intel/ice/ice_rxtx_vec_avx2.c
@@ -869,7 +869,7 @@ ice_xmit_fixed_burst_vec_avx2(void *tx_queue, struct rte_mbuf **tx_pkts,
 		return 0;
 
 	tx_id = txq->tx_tail;
-	txdp = &txq->ice_tx_ring[tx_id];
+	txdp = &txq->ci_tx_ring[tx_id];
 	txep = &txq->sw_ring_vec[tx_id];
 
 	txq->nb_tx_free = (uint16_t)(txq->nb_tx_free - nb_pkts);
@@ -890,7 +890,7 @@ ice_xmit_fixed_burst_vec_avx2(void *tx_queue, struct rte_mbuf **tx_pkts,
 		txq->tx_next_rs = (uint16_t)(txq->tx_rs_thresh - 1);
 
 		/* avoid reach the end of ring */
-		txdp = &txq->ice_tx_ring[tx_id];
+		txdp = &txq->ci_tx_ring[tx_id];
 		txep = &txq->sw_ring_vec[tx_id];
 	}
 
@@ -900,7 +900,7 @@ ice_xmit_fixed_burst_vec_avx2(void *tx_queue, struct rte_mbuf **tx_pkts,
 
 	tx_id = (uint16_t)(tx_id + nb_commit);
 	if (tx_id > txq->tx_next_rs) {
-		txq->ice_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
+		txq->ci_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
 			rte_cpu_to_le_64(((uint64_t)ICE_TX_DESC_CMD_RS) <<
 					 ICE_TXD_QW1_CMD_S);
 		txq->tx_next_rs =
diff --git a/drivers/net/intel/ice/ice_rxtx_vec_avx512.c b/drivers/net/intel/ice/ice_rxtx_vec_avx512.c
index 1f6bf5fc8e..d42f41461f 100644
--- a/drivers/net/intel/ice/ice_rxtx_vec_avx512.c
+++ b/drivers/net/intel/ice/ice_rxtx_vec_avx512.c
@@ -933,7 +933,7 @@ ice_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pkts,
 		return 0;
 
 	tx_id = txq->tx_tail;
-	txdp = &txq->ice_tx_ring[tx_id];
+	txdp = &txq->ci_tx_ring[tx_id];
 	txep = (void *)txq->sw_ring;
 	txep += tx_id;
 
@@ -955,7 +955,7 @@ ice_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pkts,
 		txq->tx_next_rs = (uint16_t)(txq->tx_rs_thresh - 1);
 
 		/* avoid reach the end of ring */
-		txdp = txq->ice_tx_ring;
+		txdp = txq->ci_tx_ring;
 		txep = (void *)txq->sw_ring;
 	}
 
@@ -965,7 +965,7 @@ ice_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pkts,
 
 	tx_id = (uint16_t)(tx_id + nb_commit);
 	if (tx_id > txq->tx_next_rs) {
-		txq->ice_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
+		txq->ci_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
 			rte_cpu_to_le_64(((uint64_t)ICE_TX_DESC_CMD_RS) <<
 					 ICE_TXD_QW1_CMD_S);
 		txq->tx_next_rs =
diff --git a/drivers/net/intel/ice/ice_rxtx_vec_common.h b/drivers/net/intel/ice/ice_rxtx_vec_common.h
index ff46a8fb49..8ba591e403 100644
--- a/drivers/net/intel/ice/ice_rxtx_vec_common.h
+++ b/drivers/net/intel/ice/ice_rxtx_vec_common.h
@@ -11,7 +11,7 @@
 static inline int
 ice_tx_desc_done(struct ci_tx_queue *txq, uint16_t idx)
 {
-	return (txq->ice_tx_ring[idx].cmd_type_offset_bsz &
+	return (txq->ci_tx_ring[idx].cmd_type_offset_bsz &
 			rte_cpu_to_le_64(ICE_TXD_QW1_DTYPE_M)) ==
 				rte_cpu_to_le_64(ICE_TX_DESC_DTYPE_DESC_DONE);
 }
diff --git a/drivers/net/intel/idpf/idpf_common_rxtx.c b/drivers/net/intel/idpf/idpf_common_rxtx.c
index be3c1ef216..51074bda3a 100644
--- a/drivers/net/intel/idpf/idpf_common_rxtx.c
+++ b/drivers/net/intel/idpf/idpf_common_rxtx.c
@@ -266,11 +266,11 @@ idpf_qc_single_tx_queue_reset(struct ci_tx_queue *txq)
 	txe = txq->sw_ring;
 	size = sizeof(struct ci_tx_desc) * txq->nb_tx_desc;
 	for (i = 0; i < size; i++)
-		((volatile char *)txq->idpf_tx_ring)[i] = 0;
+		((volatile char *)txq->ci_tx_ring)[i] = 0;
 
 	prev = (uint16_t)(txq->nb_tx_desc - 1);
 	for (i = 0; i < txq->nb_tx_desc; i++) {
-		txq->idpf_tx_ring[i].cmd_type_offset_bsz =
+		txq->ci_tx_ring[i].cmd_type_offset_bsz =
 			rte_cpu_to_le_64(IDPF_TX_DESC_DTYPE_DESC_DONE);
 		txe[i].mbuf =  NULL;
 		txe[i].last_id = i;
@@ -1335,7 +1335,7 @@ idpf_xmit_cleanup(struct ci_tx_queue *txq)
 	uint16_t desc_to_clean_to;
 	uint16_t nb_tx_to_clean;
 
-	volatile struct ci_tx_desc *txd = txq->idpf_tx_ring;
+	volatile struct ci_tx_desc *txd = txq->ci_tx_ring;
 
 	desc_to_clean_to = (uint16_t)(last_desc_cleaned + txq->tx_rs_thresh);
 	if (desc_to_clean_to >= nb_tx_desc)
@@ -1398,7 +1398,7 @@ idpf_dp_singleq_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 		return nb_tx;
 
 	sw_ring = txq->sw_ring;
-	txr = txq->idpf_tx_ring;
+	txr = txq->ci_tx_ring;
 	tx_id = txq->tx_tail;
 	txe = &sw_ring[tx_id];
 
diff --git a/drivers/net/intel/idpf/idpf_common_rxtx_avx2.c b/drivers/net/intel/idpf/idpf_common_rxtx_avx2.c
index 5f5d538dcb..04efee3722 100644
--- a/drivers/net/intel/idpf/idpf_common_rxtx_avx2.c
+++ b/drivers/net/intel/idpf/idpf_common_rxtx_avx2.c
@@ -573,7 +573,7 @@ idpf_singleq_xmit_fixed_burst_vec_avx2(void *tx_queue, struct rte_mbuf **tx_pkts
 		return 0;
 
 	tx_id = txq->tx_tail;
-	txdp = &txq->idpf_tx_ring[tx_id];
+	txdp = &txq->ci_tx_ring[tx_id];
 	txep = &txq->sw_ring_vec[tx_id];
 
 	txq->nb_tx_free = (uint16_t)(txq->nb_tx_free - nb_pkts);
@@ -594,7 +594,7 @@ idpf_singleq_xmit_fixed_burst_vec_avx2(void *tx_queue, struct rte_mbuf **tx_pkts
 		txq->tx_next_rs = (uint16_t)(txq->tx_rs_thresh - 1);
 
 		/* avoid reach the end of ring */
-		txdp = &txq->idpf_tx_ring[tx_id];
+		txdp = &txq->ci_tx_ring[tx_id];
 		txep = &txq->sw_ring_vec[tx_id];
 	}
 
@@ -604,7 +604,7 @@ idpf_singleq_xmit_fixed_burst_vec_avx2(void *tx_queue, struct rte_mbuf **tx_pkts
 
 	tx_id = (uint16_t)(tx_id + nb_commit);
 	if (tx_id > txq->tx_next_rs) {
-		txq->idpf_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
+		txq->ci_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
 			rte_cpu_to_le_64(((uint64_t)IDPF_TX_DESC_CMD_RS) <<
 					 IDPF_TXD_QW1_CMD_S);
 		txq->tx_next_rs =
diff --git a/drivers/net/intel/idpf/idpf_common_rxtx_avx512.c b/drivers/net/intel/idpf/idpf_common_rxtx_avx512.c
index c1ec3d1222..d5e5a2ca5f 100644
--- a/drivers/net/intel/idpf/idpf_common_rxtx_avx512.c
+++ b/drivers/net/intel/idpf/idpf_common_rxtx_avx512.c
@@ -1090,7 +1090,7 @@ idpf_singleq_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pk
 		return 0;
 
 	tx_id = txq->tx_tail;
-	txdp = &txq->idpf_tx_ring[tx_id];
+	txdp = &txq->ci_tx_ring[tx_id];
 	txep = (void *)txq->sw_ring;
 	txep += tx_id;
 
@@ -1112,7 +1112,7 @@ idpf_singleq_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pk
 		txq->tx_next_rs = (uint16_t)(txq->tx_rs_thresh - 1);
 
 		/* avoid reach the end of ring */
-		txdp = &txq->idpf_tx_ring[tx_id];
+		txdp = &txq->ci_tx_ring[tx_id];
 		txep = (void *)txq->sw_ring;
 		txep += tx_id;
 	}
@@ -1123,7 +1123,7 @@ idpf_singleq_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pk
 
 	tx_id = (uint16_t)(tx_id + nb_commit);
 	if (tx_id > txq->tx_next_rs) {
-		txq->idpf_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
+		txq->ci_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
 			rte_cpu_to_le_64(((uint64_t)IDPF_TX_DESC_CMD_RS) <<
 					 IDPF_TXD_QW1_CMD_S);
 		txq->tx_next_rs =
diff --git a/drivers/net/intel/idpf/idpf_rxtx.c b/drivers/net/intel/idpf/idpf_rxtx.c
index 9b63e44341..e974eb44b0 100644
--- a/drivers/net/intel/idpf/idpf_rxtx.c
+++ b/drivers/net/intel/idpf/idpf_rxtx.c
@@ -469,7 +469,7 @@ idpf_tx_queue_setup(struct rte_eth_dev *dev, uint16_t queue_idx,
 	}
 
 	if (!is_splitq) {
-		txq->idpf_tx_ring = mz->addr;
+		txq->ci_tx_ring = mz->addr;
 		idpf_qc_single_tx_queue_reset(txq);
 	} else {
 		txq->desc_ring = mz->addr;
diff --git a/drivers/net/intel/idpf/idpf_rxtx_vec_common.h b/drivers/net/intel/idpf/idpf_rxtx_vec_common.h
index 4702061484..b5e8574667 100644
--- a/drivers/net/intel/idpf/idpf_rxtx_vec_common.h
+++ b/drivers/net/intel/idpf/idpf_rxtx_vec_common.h
@@ -31,7 +31,7 @@ idpf_tx_desc_done(struct ci_tx_queue *txq, uint16_t idx)
 	if (txq->complq != NULL)
 		return 1;
 
-	return (txq->idpf_tx_ring[idx].cmd_type_offset_bsz &
+	return (txq->ci_tx_ring[idx].cmd_type_offset_bsz &
 			rte_cpu_to_le_64(IDPF_TXD_QW1_DTYPE_M)) ==
 				rte_cpu_to_le_64(IDPF_TX_DESC_DTYPE_DESC_DONE);
 }
-- 
2.51.0



* [PATCH v2 03/36] net/intel: create common post-Tx cleanup function
  2026-01-13 15:14 ` [PATCH v2 00/36] combine multiple Intel scalar Tx paths Bruce Richardson
  2026-01-13 15:14   ` [PATCH v2 01/36] net/intel: create common Tx descriptor structure Bruce Richardson
  2026-01-13 15:14   ` [PATCH v2 02/36] net/intel: use common Tx ring structure Bruce Richardson
@ 2026-01-13 15:14   ` Bruce Richardson
  2026-01-13 15:14   ` [PATCH v2 04/36] net/intel: consolidate definitions for Tx desc fields Bruce Richardson
                     ` (34 subsequent siblings)
  37 siblings, 0 replies; 274+ messages in thread
From: Bruce Richardson @ 2026-01-13 15:14 UTC (permalink / raw)
  To: dev
  Cc: Bruce Richardson, Vladimir Medvedkin, Anatoly Burakov,
	Jingjing Wu, Praveen Shetty

The code used in ice, iavf, idpf and i40e for cleaning up mbufs after
they have been transmitted was identical. Therefore, deduplicate it by
moving it into common code and removing the driver-specific versions.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/net/intel/common/tx.h             | 53 ++++++++++++++++++++
 drivers/net/intel/i40e/i40e_rxtx.c        | 49 ++----------------
 drivers/net/intel/iavf/iavf_rxtx.c        | 50 ++-----------------
 drivers/net/intel/ice/ice_rxtx.c          | 60 ++---------------------
 drivers/net/intel/idpf/idpf_common_rxtx.c | 46 ++---------------
 5 files changed, 71 insertions(+), 187 deletions(-)

diff --git a/drivers/net/intel/common/tx.h b/drivers/net/intel/common/tx.h
index a9ff3bebd5..5b87c15da0 100644
--- a/drivers/net/intel/common/tx.h
+++ b/drivers/net/intel/common/tx.h
@@ -249,6 +249,59 @@ ci_tx_free_bufs_vec(struct ci_tx_queue *txq, ci_desc_done_fn desc_done, bool ctx
 	return txq->tx_rs_thresh;
 }
 
+/*
+ * Common transmit descriptor cleanup function for Intel drivers.
+ * Used by ice, i40e, iavf, and idpf drivers.
+ *
+ * Returns:
+ *   0 on success
+ *  -1 if cleanup cannot proceed (descriptors not yet processed by HW)
+ */
+static __rte_always_inline int
+ci_tx_xmit_cleanup(struct ci_tx_queue *txq)
+{
+	struct ci_tx_entry *sw_ring = txq->sw_ring;
+	volatile struct ci_tx_desc *txd = txq->ci_tx_ring;
+	uint16_t last_desc_cleaned = txq->last_desc_cleaned;
+	uint16_t nb_tx_desc = txq->nb_tx_desc;
+	uint16_t desc_to_clean_to;
+	uint16_t nb_tx_to_clean;
+
+	/* Determine the last descriptor needing to be cleaned */
+	desc_to_clean_to = (uint16_t)(last_desc_cleaned + txq->tx_rs_thresh);
+	if (desc_to_clean_to >= nb_tx_desc)
+		desc_to_clean_to = (uint16_t)(desc_to_clean_to - nb_tx_desc);
+
+	/* Check to make sure the last descriptor to clean is done */
+	desc_to_clean_to = sw_ring[desc_to_clean_to].last_id;
+
+	/* Check if descriptor is done - all drivers use 0xF as done value in bits 3:0 */
+	if ((txd[desc_to_clean_to].cmd_type_offset_bsz & rte_cpu_to_le_64(0xFUL)) !=
+			rte_cpu_to_le_64(0xFUL)) {
+		/* Descriptor not yet processed by hardware */
+		return -1;
+	}
+
+	/* Figure out how many descriptors will be cleaned */
+	if (last_desc_cleaned > desc_to_clean_to)
+		nb_tx_to_clean = (uint16_t)((nb_tx_desc - last_desc_cleaned) + desc_to_clean_to);
+	else
+		nb_tx_to_clean = (uint16_t)(desc_to_clean_to - last_desc_cleaned);
+
+	/* The last descriptor to clean is done, so that means all the
+	 * descriptors from the last descriptor that was cleaned
+	 * up to the last descriptor with the RS bit set
+	 * are done. Only reset the threshold descriptor.
+	 */
+	txd[desc_to_clean_to].cmd_type_offset_bsz = 0;
+
+	/* Update the txq to reflect the last descriptor that was cleaned */
+	txq->last_desc_cleaned = desc_to_clean_to;
+	txq->nb_tx_free = (uint16_t)(txq->nb_tx_free + nb_tx_to_clean);
+
+	return 0;
+}
+
 static inline void
 ci_txq_release_all_mbufs(struct ci_tx_queue *txq, bool use_ctx)
 {
diff --git a/drivers/net/intel/i40e/i40e_rxtx.c b/drivers/net/intel/i40e/i40e_rxtx.c
index 51e9c1a1f0..d4efd0ab64 100644
--- a/drivers/net/intel/i40e/i40e_rxtx.c
+++ b/drivers/net/intel/i40e/i40e_rxtx.c
@@ -384,45 +384,6 @@ i40e_build_ctob(uint32_t td_cmd,
 			((uint64_t)td_tag  << I40E_TXD_QW1_L2TAG1_SHIFT));
 }
 
-static inline int
-i40e_xmit_cleanup(struct ci_tx_queue *txq)
-{
-	struct ci_tx_entry *sw_ring = txq->sw_ring;
-	volatile struct ci_tx_desc *txd = txq->ci_tx_ring;
-	uint16_t last_desc_cleaned = txq->last_desc_cleaned;
-	uint16_t nb_tx_desc = txq->nb_tx_desc;
-	uint16_t desc_to_clean_to;
-	uint16_t nb_tx_to_clean;
-
-	desc_to_clean_to = (uint16_t)(last_desc_cleaned + txq->tx_rs_thresh);
-	if (desc_to_clean_to >= nb_tx_desc)
-		desc_to_clean_to = (uint16_t)(desc_to_clean_to - nb_tx_desc);
-
-	desc_to_clean_to = sw_ring[desc_to_clean_to].last_id;
-	if ((txd[desc_to_clean_to].cmd_type_offset_bsz &
-			rte_cpu_to_le_64(I40E_TXD_QW1_DTYPE_MASK)) !=
-			rte_cpu_to_le_64(I40E_TX_DESC_DTYPE_DESC_DONE)) {
-		PMD_TX_LOG(DEBUG, "TX descriptor %4u is not done "
-			   "(port=%d queue=%d)", desc_to_clean_to,
-			   txq->port_id, txq->queue_id);
-		return -1;
-	}
-
-	if (last_desc_cleaned > desc_to_clean_to)
-		nb_tx_to_clean = (uint16_t)((nb_tx_desc - last_desc_cleaned) +
-							desc_to_clean_to);
-	else
-		nb_tx_to_clean = (uint16_t)(desc_to_clean_to -
-					last_desc_cleaned);
-
-	txd[desc_to_clean_to].cmd_type_offset_bsz = 0;
-
-	txq->last_desc_cleaned = desc_to_clean_to;
-	txq->nb_tx_free = (uint16_t)(txq->nb_tx_free + nb_tx_to_clean);
-
-	return 0;
-}
-
 static inline int
 #ifdef RTE_LIBRTE_I40E_RX_ALLOW_BULK_ALLOC
 check_rx_burst_bulk_alloc_preconditions(struct ci_rx_queue *rxq)
@@ -1118,7 +1079,7 @@ i40e_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 
 	/* Check if the descriptor ring needs to be cleaned. */
 	if (txq->nb_tx_free < txq->tx_free_thresh)
-		(void)i40e_xmit_cleanup(txq);
+		(void)ci_tx_xmit_cleanup(txq);
 
 	for (nb_tx = 0; nb_tx < nb_pkts; nb_tx++) {
 		td_cmd = 0;
@@ -1159,14 +1120,14 @@ i40e_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 			tx_last = (uint16_t)(tx_last - txq->nb_tx_desc);
 
 		if (nb_used > txq->nb_tx_free) {
-			if (i40e_xmit_cleanup(txq) != 0) {
+			if (ci_tx_xmit_cleanup(txq) != 0) {
 				if (nb_tx == 0)
 					return 0;
 				goto end_of_tx;
 			}
 			if (unlikely(nb_used > txq->tx_rs_thresh)) {
 				while (nb_used > txq->nb_tx_free) {
-					if (i40e_xmit_cleanup(txq) != 0) {
+					if (ci_tx_xmit_cleanup(txq) != 0) {
 						if (nb_tx == 0)
 							return 0;
 						goto end_of_tx;
@@ -2791,7 +2752,7 @@ i40e_tx_done_cleanup_full(struct ci_tx_queue *txq,
 	tx_last = txq->tx_tail;
 	tx_id  = swr_ring[tx_last].next_id;
 
-	if (txq->nb_tx_free == 0 && i40e_xmit_cleanup(txq))
+	if (txq->nb_tx_free == 0 && ci_tx_xmit_cleanup(txq))
 		return 0;
 
 	nb_tx_to_clean = txq->nb_tx_free;
@@ -2825,7 +2786,7 @@ i40e_tx_done_cleanup_full(struct ci_tx_queue *txq,
 			break;
 
 		if (pkt_cnt < free_cnt) {
-			if (i40e_xmit_cleanup(txq))
+			if (ci_tx_xmit_cleanup(txq))
 				break;
 
 			nb_tx_to_clean = txq->nb_tx_free - nb_tx_free_last;
diff --git a/drivers/net/intel/iavf/iavf_rxtx.c b/drivers/net/intel/iavf/iavf_rxtx.c
index ea4aaf08a3..3b7773f483 100644
--- a/drivers/net/intel/iavf/iavf_rxtx.c
+++ b/drivers/net/intel/iavf/iavf_rxtx.c
@@ -2329,46 +2329,6 @@ iavf_recv_pkts_bulk_alloc(void *rx_queue,
 	return nb_rx;
 }
 
-static inline int
-iavf_xmit_cleanup(struct ci_tx_queue *txq)
-{
-	struct ci_tx_entry *sw_ring = txq->sw_ring;
-	uint16_t last_desc_cleaned = txq->last_desc_cleaned;
-	uint16_t nb_tx_desc = txq->nb_tx_desc;
-	uint16_t desc_to_clean_to;
-	uint16_t nb_tx_to_clean;
-
-	volatile struct ci_tx_desc *txd = txq->ci_tx_ring;
-
-	desc_to_clean_to = (uint16_t)(last_desc_cleaned + txq->tx_rs_thresh);
-	if (desc_to_clean_to >= nb_tx_desc)
-		desc_to_clean_to = (uint16_t)(desc_to_clean_to - nb_tx_desc);
-
-	desc_to_clean_to = sw_ring[desc_to_clean_to].last_id;
-	if ((txd[desc_to_clean_to].cmd_type_offset_bsz &
-			rte_cpu_to_le_64(IAVF_TXD_QW1_DTYPE_MASK)) !=
-			rte_cpu_to_le_64(IAVF_TX_DESC_DTYPE_DESC_DONE)) {
-		PMD_TX_LOG(DEBUG, "TX descriptor %4u is not done "
-			   "(port=%d queue=%d)", desc_to_clean_to,
-			   txq->port_id, txq->queue_id);
-		return -1;
-	}
-
-	if (last_desc_cleaned > desc_to_clean_to)
-		nb_tx_to_clean = (uint16_t)((nb_tx_desc - last_desc_cleaned) +
-							desc_to_clean_to);
-	else
-		nb_tx_to_clean = (uint16_t)(desc_to_clean_to -
-					last_desc_cleaned);
-
-	txd[desc_to_clean_to].cmd_type_offset_bsz = 0;
-
-	txq->last_desc_cleaned = desc_to_clean_to;
-	txq->nb_tx_free = (uint16_t)(txq->nb_tx_free + nb_tx_to_clean);
-
-	return 0;
-}
-
 /* Check if the context descriptor is needed for TX offloading */
 static inline uint16_t
 iavf_calc_context_desc(struct rte_mbuf *mb, uint8_t vlan_flag)
@@ -2773,7 +2733,7 @@ iavf_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 
 	/* Check if the descriptor ring needs to be cleaned. */
 	if (txq->nb_tx_free < txq->tx_free_thresh)
-		iavf_xmit_cleanup(txq);
+		ci_tx_xmit_cleanup(txq);
 
 	desc_idx = txq->tx_tail;
 	txe = &txe_ring[desc_idx];
@@ -2828,14 +2788,14 @@ iavf_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 			txq->port_id, txq->queue_id, desc_idx, desc_idx_last);
 
 		if (nb_desc_required > txq->nb_tx_free) {
-			if (iavf_xmit_cleanup(txq)) {
+			if (ci_tx_xmit_cleanup(txq)) {
 				if (idx == 0)
 					return 0;
 				goto end_of_tx;
 			}
 			if (unlikely(nb_desc_required > txq->tx_rs_thresh)) {
 				while (nb_desc_required > txq->nb_tx_free) {
-					if (iavf_xmit_cleanup(txq)) {
+					if (ci_tx_xmit_cleanup(txq)) {
 						if (idx == 0)
 							return 0;
 						goto end_of_tx;
@@ -4305,7 +4265,7 @@ iavf_tx_done_cleanup_full(struct ci_tx_queue *txq,
 	tx_id = txq->tx_tail;
 	tx_last = tx_id;
 
-	if (txq->nb_tx_free == 0 && iavf_xmit_cleanup(txq))
+	if (txq->nb_tx_free == 0 && ci_tx_xmit_cleanup(txq))
 		return 0;
 
 	nb_tx_to_clean = txq->nb_tx_free;
@@ -4337,7 +4297,7 @@ iavf_tx_done_cleanup_full(struct ci_tx_queue *txq,
 			break;
 
 		if (pkt_cnt < free_cnt) {
-			if (iavf_xmit_cleanup(txq))
+			if (ci_tx_xmit_cleanup(txq))
 				break;
 
 			nb_tx_to_clean = txq->nb_tx_free - nb_tx_free_last;
diff --git a/drivers/net/intel/ice/ice_rxtx.c b/drivers/net/intel/ice/ice_rxtx.c
index b1e67c2c67..12f05ae446 100644
--- a/drivers/net/intel/ice/ice_rxtx.c
+++ b/drivers/net/intel/ice/ice_rxtx.c
@@ -3011,56 +3011,6 @@ ice_txd_enable_checksum(uint64_t ol_flags,
 	}
 }
 
-static inline int
-ice_xmit_cleanup(struct ci_tx_queue *txq)
-{
-	struct ci_tx_entry *sw_ring = txq->sw_ring;
-	volatile struct ci_tx_desc *txd = txq->ci_tx_ring;
-	uint16_t last_desc_cleaned = txq->last_desc_cleaned;
-	uint16_t nb_tx_desc = txq->nb_tx_desc;
-	uint16_t desc_to_clean_to;
-	uint16_t nb_tx_to_clean;
-
-	/* Determine the last descriptor needing to be cleaned */
-	desc_to_clean_to = (uint16_t)(last_desc_cleaned + txq->tx_rs_thresh);
-	if (desc_to_clean_to >= nb_tx_desc)
-		desc_to_clean_to = (uint16_t)(desc_to_clean_to - nb_tx_desc);
-
-	/* Check to make sure the last descriptor to clean is done */
-	desc_to_clean_to = sw_ring[desc_to_clean_to].last_id;
-	if (!(txd[desc_to_clean_to].cmd_type_offset_bsz &
-	    rte_cpu_to_le_64(ICE_TX_DESC_DTYPE_DESC_DONE))) {
-		PMD_TX_LOG(DEBUG, "TX descriptor %4u is not done "
-			   "(port=%d queue=%d) value=0x%"PRIx64,
-			   desc_to_clean_to,
-			   txq->port_id, txq->queue_id,
-			   txd[desc_to_clean_to].cmd_type_offset_bsz);
-		/* Failed to clean any descriptors */
-		return -1;
-	}
-
-	/* Figure out how many descriptors will be cleaned */
-	if (last_desc_cleaned > desc_to_clean_to)
-		nb_tx_to_clean = (uint16_t)((nb_tx_desc - last_desc_cleaned) +
-					    desc_to_clean_to);
-	else
-		nb_tx_to_clean = (uint16_t)(desc_to_clean_to -
-					    last_desc_cleaned);
-
-	/* The last descriptor to clean is done, so that means all the
-	 * descriptors from the last descriptor that was cleaned
-	 * up to the last descriptor with the RS bit set
-	 * are done. Only reset the threshold descriptor.
-	 */
-	txd[desc_to_clean_to].cmd_type_offset_bsz = 0;
-
-	/* Update the txq to reflect the last descriptor that was cleaned */
-	txq->last_desc_cleaned = desc_to_clean_to;
-	txq->nb_tx_free = (uint16_t)(txq->nb_tx_free + nb_tx_to_clean);
-
-	return 0;
-}
-
 /* Construct the tx flags */
 static inline uint64_t
 ice_build_ctob(uint32_t td_cmd,
@@ -3168,7 +3118,7 @@ ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 
 	/* Check if the descriptor ring needs to be cleaned. */
 	if (txq->nb_tx_free < txq->tx_free_thresh)
-		(void)ice_xmit_cleanup(txq);
+		(void)ci_tx_xmit_cleanup(txq);
 
 	for (nb_tx = 0; nb_tx < nb_pkts; nb_tx++) {
 		tx_pkt = *tx_pkts++;
@@ -3205,14 +3155,14 @@ ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 			tx_last = (uint16_t)(tx_last - txq->nb_tx_desc);
 
 		if (nb_used > txq->nb_tx_free) {
-			if (ice_xmit_cleanup(txq) != 0) {
+			if (ci_tx_xmit_cleanup(txq) != 0) {
 				if (nb_tx == 0)
 					return 0;
 				goto end_of_tx;
 			}
 			if (unlikely(nb_used > txq->tx_rs_thresh)) {
 				while (nb_used > txq->nb_tx_free) {
-					if (ice_xmit_cleanup(txq) != 0) {
+					if (ci_tx_xmit_cleanup(txq) != 0) {
 						if (nb_tx == 0)
 							return 0;
 						goto end_of_tx;
@@ -3442,7 +3392,7 @@ ice_tx_done_cleanup_full(struct ci_tx_queue *txq,
 	tx_last = txq->tx_tail;
 	tx_id  = swr_ring[tx_last].next_id;
 
-	if (txq->nb_tx_free == 0 && ice_xmit_cleanup(txq))
+	if (txq->nb_tx_free == 0 && ci_tx_xmit_cleanup(txq))
 		return 0;
 
 	nb_tx_to_clean = txq->nb_tx_free;
@@ -3476,7 +3426,7 @@ ice_tx_done_cleanup_full(struct ci_tx_queue *txq,
 			break;
 
 		if (pkt_cnt < free_cnt) {
-			if (ice_xmit_cleanup(txq))
+			if (ci_tx_xmit_cleanup(txq))
 				break;
 
 			nb_tx_to_clean = txq->nb_tx_free - nb_tx_free_last;
diff --git a/drivers/net/intel/idpf/idpf_common_rxtx.c b/drivers/net/intel/idpf/idpf_common_rxtx.c
index 51074bda3a..23666539ab 100644
--- a/drivers/net/intel/idpf/idpf_common_rxtx.c
+++ b/drivers/net/intel/idpf/idpf_common_rxtx.c
@@ -1326,46 +1326,6 @@ idpf_dp_singleq_recv_scatter_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
 	return nb_rx;
 }
 
-static inline int
-idpf_xmit_cleanup(struct ci_tx_queue *txq)
-{
-	uint16_t last_desc_cleaned = txq->last_desc_cleaned;
-	struct ci_tx_entry *sw_ring = txq->sw_ring;
-	uint16_t nb_tx_desc = txq->nb_tx_desc;
-	uint16_t desc_to_clean_to;
-	uint16_t nb_tx_to_clean;
-
-	volatile struct ci_tx_desc *txd = txq->ci_tx_ring;
-
-	desc_to_clean_to = (uint16_t)(last_desc_cleaned + txq->tx_rs_thresh);
-	if (desc_to_clean_to >= nb_tx_desc)
-		desc_to_clean_to = (uint16_t)(desc_to_clean_to - nb_tx_desc);
-
-	desc_to_clean_to = sw_ring[desc_to_clean_to].last_id;
-	if ((txd[desc_to_clean_to].cmd_type_offset_bsz &
-	     rte_cpu_to_le_64(IDPF_TXD_QW1_DTYPE_M)) !=
-	    rte_cpu_to_le_64(IDPF_TX_DESC_DTYPE_DESC_DONE)) {
-		TX_LOG(DEBUG, "TX descriptor %4u is not done "
-		       "(port=%d queue=%d)", desc_to_clean_to,
-		       txq->port_id, txq->queue_id);
-		return -1;
-	}
-
-	if (last_desc_cleaned > desc_to_clean_to)
-		nb_tx_to_clean = (uint16_t)((nb_tx_desc - last_desc_cleaned) +
-					    desc_to_clean_to);
-	else
-		nb_tx_to_clean = (uint16_t)(desc_to_clean_to -
-					    last_desc_cleaned);
-
-	txd[desc_to_clean_to].cmd_type_offset_bsz = 0;
-
-	txq->last_desc_cleaned = desc_to_clean_to;
-	txq->nb_tx_free = (uint16_t)(txq->nb_tx_free + nb_tx_to_clean);
-
-	return 0;
-}
-
 /* TX function */
 RTE_EXPORT_INTERNAL_SYMBOL(idpf_dp_singleq_xmit_pkts)
 uint16_t
@@ -1404,7 +1364,7 @@ idpf_dp_singleq_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 
 	/* Check if the descriptor ring needs to be cleaned. */
 	if (txq->nb_tx_free < txq->tx_free_thresh)
-		(void)idpf_xmit_cleanup(txq);
+		(void)ci_tx_xmit_cleanup(txq);
 
 	for (nb_tx = 0; nb_tx < nb_pkts; nb_tx++) {
 		td_cmd = 0;
@@ -1437,14 +1397,14 @@ idpf_dp_singleq_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 		       txq->port_id, txq->queue_id, tx_id, tx_last);
 
 		if (nb_used > txq->nb_tx_free) {
-			if (idpf_xmit_cleanup(txq) != 0) {
+			if (ci_tx_xmit_cleanup(txq) != 0) {
 				if (nb_tx == 0)
 					return 0;
 				goto end_of_tx;
 			}
 			if (unlikely(nb_used > txq->tx_rs_thresh)) {
 				while (nb_used > txq->nb_tx_free) {
-					if (idpf_xmit_cleanup(txq) != 0) {
+					if (ci_tx_xmit_cleanup(txq) != 0) {
 						if (nb_tx == 0)
 							return 0;
 						goto end_of_tx;
-- 
2.51.0



* [PATCH v2 04/36] net/intel: consolidate definitions for Tx desc fields
  2026-01-13 15:14 ` [PATCH v2 00/36] combine multiple Intel scalar Tx paths Bruce Richardson
                     ` (2 preceding siblings ...)
  2026-01-13 15:14   ` [PATCH v2 03/36] net/intel: create common post-Tx cleanup function Bruce Richardson
@ 2026-01-13 15:14   ` Bruce Richardson
  2026-01-13 15:14   ` [PATCH v2 05/36] net/intel: create separate header for Tx scalar fns Bruce Richardson
                     ` (33 subsequent siblings)
  37 siblings, 0 replies; 274+ messages in thread
From: Bruce Richardson @ 2026-01-13 15:14 UTC (permalink / raw)
  To: dev
  Cc: Bruce Richardson, Vladimir Medvedkin, Anatoly Burakov,
	Jingjing Wu, Praveen Shetty

The offsets of the various fields within the Tx descriptors are common
across i40e, iavf, ice and idpf, so put a single set of defines in tx.h
and use those throughout all drivers. (NOTE: there was a small
difference in the mask of the CMD field between drivers, depending on
whether or not reserved fields were included. This can be ignored, since
those bits are unused in the drivers for which they are reserved.)
Similarly, the various flag fields, such as End-of-packet (EOP) and
Report-status (RS), are the same, as are the offload definitions, so
consolidate them too.

The original definitions are part of the base code and are therefore
left in place, but they are now unused.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/net/intel/common/tx.h                 |  64 +++++++-
 drivers/net/intel/i40e/i40e_fdir.c            |  24 +--
 drivers/net/intel/i40e/i40e_rxtx.c            |  92 ++++++------
 drivers/net/intel/i40e/i40e_rxtx.h            |  17 +--
 .../net/intel/i40e/i40e_rxtx_vec_altivec.c    |  11 +-
 drivers/net/intel/i40e/i40e_rxtx_vec_avx2.c   |  22 ++-
 drivers/net/intel/i40e/i40e_rxtx_vec_avx512.c |  38 ++---
 drivers/net/intel/i40e/i40e_rxtx_vec_common.h |   4 +-
 drivers/net/intel/i40e/i40e_rxtx_vec_neon.c   |  11 +-
 drivers/net/intel/iavf/iavf_rxtx.c            |  68 +++++----
 drivers/net/intel/iavf/iavf_rxtx.h            |  20 +--
 drivers/net/intel/iavf/iavf_rxtx_vec_avx2.c   |  41 ++----
 drivers/net/intel/iavf/iavf_rxtx_vec_avx512.c |  80 ++++------
 drivers/net/intel/iavf/iavf_rxtx_vec_common.h |  34 ++---
 drivers/net/intel/ice/ice_dcf_ethdev.c        |   2 +-
 drivers/net/intel/ice/ice_rxtx.c              | 137 ++++++++----------
 drivers/net/intel/ice/ice_rxtx.h              |  15 +-
 drivers/net/intel/ice/ice_rxtx_vec_avx2.c     |  41 ++----
 drivers/net/intel/ice/ice_rxtx_vec_avx512.c   |  39 ++---
 drivers/net/intel/ice/ice_rxtx_vec_common.h   |  41 +++---
 drivers/net/intel/idpf/idpf_common_rxtx.c     |  22 +--
 drivers/net/intel/idpf/idpf_common_rxtx.h     |  12 --
 .../net/intel/idpf/idpf_common_rxtx_avx2.c    |  41 ++----
 .../net/intel/idpf/idpf_common_rxtx_avx512.c  |  41 ++----
 drivers/net/intel/idpf/idpf_rxtx_vec_common.h |   4 +-
 25 files changed, 408 insertions(+), 513 deletions(-)

diff --git a/drivers/net/intel/common/tx.h b/drivers/net/intel/common/tx.h
index 5b87c15da0..3d3d9ad8e3 100644
--- a/drivers/net/intel/common/tx.h
+++ b/drivers/net/intel/common/tx.h
@@ -10,6 +10,66 @@
 #include <rte_ethdev.h>
 #include <rte_vect.h>
 
+/* Common TX Descriptor QW1 Field Definitions */
+#define CI_TXD_QW1_DTYPE_S      0
+#define CI_TXD_QW1_DTYPE_M      (0xFUL << CI_TXD_QW1_DTYPE_S)
+#define CI_TXD_QW1_CMD_S        4
+#define CI_TXD_QW1_CMD_M        (0xFFFUL << CI_TXD_QW1_CMD_S)
+#define CI_TXD_QW1_OFFSET_S     16
+#define CI_TXD_QW1_OFFSET_M     (0x3FFFFULL << CI_TXD_QW1_OFFSET_S)
+#define CI_TXD_QW1_TX_BUF_SZ_S  34
+#define CI_TXD_QW1_TX_BUF_SZ_M  (0x3FFFULL << CI_TXD_QW1_TX_BUF_SZ_S)
+#define CI_TXD_QW1_L2TAG1_S     48
+#define CI_TXD_QW1_L2TAG1_M     (0xFFFFULL << CI_TXD_QW1_L2TAG1_S)
+
+/* Common Descriptor Types */
+#define CI_TX_DESC_DTYPE_DATA           0x0
+#define CI_TX_DESC_DTYPE_CTX            0x1
+#define CI_TX_DESC_DTYPE_DESC_DONE      0xF
+
+/* Common TX Descriptor Command Flags */
+#define CI_TX_DESC_CMD_EOP              0x0001
+#define CI_TX_DESC_CMD_RS               0x0002
+#define CI_TX_DESC_CMD_ICRC             0x0004
+#define CI_TX_DESC_CMD_IL2TAG1          0x0008
+#define CI_TX_DESC_CMD_DUMMY            0x0010
+#define CI_TX_DESC_CMD_IIPT_IPV6        0x0020
+#define CI_TX_DESC_CMD_IIPT_IPV4        0x0040
+#define CI_TX_DESC_CMD_IIPT_IPV4_CSUM   0x0060
+#define CI_TX_DESC_CMD_L4T_EOFT_TCP     0x0100
+#define CI_TX_DESC_CMD_L4T_EOFT_SCTP    0x0200
+#define CI_TX_DESC_CMD_L4T_EOFT_UDP     0x0300
+
+/* Common TX Context Descriptor Commands */
+#define CI_TX_CTX_DESC_TSO              0x01
+#define CI_TX_CTX_DESC_TSYN             0x02
+#define CI_TX_CTX_DESC_IL2TAG2          0x04
+
+/* Common TX Descriptor Length Field Shifts */
+#define CI_TX_DESC_LEN_MACLEN_S         0  /* 7 BITS */
+#define CI_TX_DESC_LEN_IPLEN_S          7  /* 7 BITS */
+#define CI_TX_DESC_LEN_L4_LEN_S         14 /* 4 BITS */
+
+/* Common maximum data per TX descriptor */
+#define CI_MAX_DATA_PER_TXD     (CI_TXD_QW1_TX_BUF_SZ_M >> CI_TXD_QW1_TX_BUF_SZ_S)
+
+/**
+ * Common TX offload union for Intel drivers.
+ * Supports both basic offloads (l2_len, l3_len, l4_len, tso_segsz) and
+ * extended offloads (outer_l2_len, outer_l3_len) for tunneling support.
+ */
+union ci_tx_offload {
+	uint64_t data;
+	struct {
+		uint64_t l2_len:7;        /**< L2 (MAC) Header Length. */
+		uint64_t l3_len:9;        /**< L3 (IP) Header Length. */
+		uint64_t l4_len:8;        /**< L4 Header Length. */
+		uint64_t tso_segsz:16;    /**< TCP TSO segment size */
+		uint64_t outer_l2_len:8;  /**< outer L2 Header Length */
+		uint64_t outer_l3_len:16; /**< outer L3 Header Length */
+	};
+};
+
 /*
  * Structure of a 16-byte Tx descriptor common across i40e, ice, iavf and idpf drivers
  */
@@ -276,8 +336,8 @@ ci_tx_xmit_cleanup(struct ci_tx_queue *txq)
 	desc_to_clean_to = sw_ring[desc_to_clean_to].last_id;
 
 	/* Check if descriptor is done - all drivers use 0xF as done value in bits 3:0 */
-	if ((txd[desc_to_clean_to].cmd_type_offset_bsz & rte_cpu_to_le_64(0xFUL)) !=
-			rte_cpu_to_le_64(0xFUL)) {
+	if ((txd[desc_to_clean_to].cmd_type_offset_bsz & rte_cpu_to_le_64(CI_TXD_QW1_DTYPE_M)) !=
+			rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE)) {
 		/* Descriptor not yet processed by hardware */
 		return -1;
 	}
diff --git a/drivers/net/intel/i40e/i40e_fdir.c b/drivers/net/intel/i40e/i40e_fdir.c
index 8a01aec0e2..3b099d5a9e 100644
--- a/drivers/net/intel/i40e/i40e_fdir.c
+++ b/drivers/net/intel/i40e/i40e_fdir.c
@@ -916,11 +916,11 @@ i40e_build_ctob(uint32_t td_cmd,
 		unsigned int size,
 		uint32_t td_tag)
 {
-	return rte_cpu_to_le_64(I40E_TX_DESC_DTYPE_DATA |
-			((uint64_t)td_cmd  << I40E_TXD_QW1_CMD_SHIFT) |
-			((uint64_t)td_offset << I40E_TXD_QW1_OFFSET_SHIFT) |
-			((uint64_t)size  << I40E_TXD_QW1_TX_BUF_SZ_SHIFT) |
-			((uint64_t)td_tag  << I40E_TXD_QW1_L2TAG1_SHIFT));
+	return rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DATA |
+			((uint64_t)td_cmd  << CI_TXD_QW1_CMD_S) |
+			((uint64_t)td_offset << CI_TXD_QW1_OFFSET_S) |
+			((uint64_t)size  << CI_TXD_QW1_TX_BUF_SZ_S) |
+			((uint64_t)td_tag  << CI_TXD_QW1_L2TAG1_S));
 }
 
 /*
@@ -1384,8 +1384,8 @@ i40e_find_available_buffer(struct rte_eth_dev *dev)
 
 		do {
 			if ((tmp_txdp->cmd_type_offset_bsz &
-					rte_cpu_to_le_64(I40E_TXD_QW1_DTYPE_MASK)) ==
-					rte_cpu_to_le_64(I40E_TX_DESC_DTYPE_DESC_DONE))
+					rte_cpu_to_le_64(CI_TXD_QW1_DTYPE_M)) ==
+					rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE))
 				fdir_info->txq_available_buf_count++;
 			else
 				break;
@@ -1710,9 +1710,9 @@ i40e_flow_fdir_filter_programming(struct i40e_pf *pf,
 	txdp = &txq->ci_tx_ring[txq->tx_tail + 1];
 	txdp->buffer_addr = rte_cpu_to_le_64(pf->fdir.dma_addr[txq->tx_tail >> 1]);
 
-	td_cmd = I40E_TX_DESC_CMD_EOP |
-		 I40E_TX_DESC_CMD_RS  |
-		 I40E_TX_DESC_CMD_DUMMY;
+	td_cmd = CI_TX_DESC_CMD_EOP |
+		 CI_TX_DESC_CMD_RS  |
+		 CI_TX_DESC_CMD_DUMMY;
 
 	txdp->cmd_type_offset_bsz =
 		i40e_build_ctob(td_cmd, 0, I40E_FDIR_PKT_LEN, 0);
@@ -1731,8 +1731,8 @@ i40e_flow_fdir_filter_programming(struct i40e_pf *pf,
 	if (wait_status) {
 		for (i = 0; i < I40E_FDIR_MAX_WAIT_US; i++) {
 			if ((txdp->cmd_type_offset_bsz &
-					rte_cpu_to_le_64(I40E_TXD_QW1_DTYPE_MASK)) ==
-					rte_cpu_to_le_64(I40E_TX_DESC_DTYPE_DESC_DONE))
+					rte_cpu_to_le_64(CI_TXD_QW1_DTYPE_M)) ==
+					rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE))
 				break;
 			rte_delay_us(1);
 		}
diff --git a/drivers/net/intel/i40e/i40e_rxtx.c b/drivers/net/intel/i40e/i40e_rxtx.c
index d4efd0ab64..701129eba3 100644
--- a/drivers/net/intel/i40e/i40e_rxtx.c
+++ b/drivers/net/intel/i40e/i40e_rxtx.c
@@ -45,7 +45,7 @@
 /* Base address of the HW descriptor ring should be 128B aligned. */
 #define I40E_RING_BASE_ALIGN	128
 
-#define I40E_TXD_CMD (I40E_TX_DESC_CMD_EOP | I40E_TX_DESC_CMD_RS)
+#define I40E_TXD_CMD (CI_TX_DESC_CMD_EOP | CI_TX_DESC_CMD_RS)
 
 #ifdef RTE_LIBRTE_IEEE1588
 #define I40E_TX_IEEE1588_TMST RTE_MBUF_F_TX_IEEE1588_TMST
@@ -260,7 +260,7 @@ i40e_rxd_build_fdir(volatile union ci_rx_desc *rxdp, struct rte_mbuf *mb)
 
 static inline void
 i40e_parse_tunneling_params(uint64_t ol_flags,
-			    union i40e_tx_offload tx_offload,
+			    union ci_tx_offload tx_offload,
 			    uint32_t *cd_tunneling)
 {
 	/* EIPT: External (outer) IP header type */
@@ -319,51 +319,51 @@ static inline void
 i40e_txd_enable_checksum(uint64_t ol_flags,
 			uint32_t *td_cmd,
 			uint32_t *td_offset,
-			union i40e_tx_offload tx_offload)
+			union ci_tx_offload tx_offload)
 {
 	/* Set MACLEN */
 	if (!(ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK))
 		*td_offset |= (tx_offload.l2_len >> 1)
-			<< I40E_TX_DESC_LENGTH_MACLEN_SHIFT;
+			<< CI_TX_DESC_LEN_MACLEN_S;
 
 	/* Enable L3 checksum offloads */
 	if (ol_flags & RTE_MBUF_F_TX_IP_CKSUM) {
-		*td_cmd |= I40E_TX_DESC_CMD_IIPT_IPV4_CSUM;
+		*td_cmd |= CI_TX_DESC_CMD_IIPT_IPV4_CSUM;
 		*td_offset |= (tx_offload.l3_len >> 2)
-				<< I40E_TX_DESC_LENGTH_IPLEN_SHIFT;
+				<< CI_TX_DESC_LEN_IPLEN_S;
 	} else if (ol_flags & RTE_MBUF_F_TX_IPV4) {
-		*td_cmd |= I40E_TX_DESC_CMD_IIPT_IPV4;
+		*td_cmd |= CI_TX_DESC_CMD_IIPT_IPV4;
 		*td_offset |= (tx_offload.l3_len >> 2)
-				<< I40E_TX_DESC_LENGTH_IPLEN_SHIFT;
+				<< CI_TX_DESC_LEN_IPLEN_S;
 	} else if (ol_flags & RTE_MBUF_F_TX_IPV6) {
-		*td_cmd |= I40E_TX_DESC_CMD_IIPT_IPV6;
+		*td_cmd |= CI_TX_DESC_CMD_IIPT_IPV6;
 		*td_offset |= (tx_offload.l3_len >> 2)
-				<< I40E_TX_DESC_LENGTH_IPLEN_SHIFT;
+				<< CI_TX_DESC_LEN_IPLEN_S;
 	}
 
 	if (ol_flags & RTE_MBUF_F_TX_TCP_SEG) {
-		*td_cmd |= I40E_TX_DESC_CMD_L4T_EOFT_TCP;
+		*td_cmd |= CI_TX_DESC_CMD_L4T_EOFT_TCP;
 		*td_offset |= (tx_offload.l4_len >> 2)
-			<< I40E_TX_DESC_LENGTH_L4_FC_LEN_SHIFT;
+			<< CI_TX_DESC_LEN_L4_LEN_S;
 		return;
 	}
 
 	/* Enable L4 checksum offloads */
 	switch (ol_flags & RTE_MBUF_F_TX_L4_MASK) {
 	case RTE_MBUF_F_TX_TCP_CKSUM:
-		*td_cmd |= I40E_TX_DESC_CMD_L4T_EOFT_TCP;
+		*td_cmd |= CI_TX_DESC_CMD_L4T_EOFT_TCP;
 		*td_offset |= (sizeof(struct rte_tcp_hdr) >> 2) <<
-				I40E_TX_DESC_LENGTH_L4_FC_LEN_SHIFT;
+				CI_TX_DESC_LEN_L4_LEN_S;
 		break;
 	case RTE_MBUF_F_TX_SCTP_CKSUM:
-		*td_cmd |= I40E_TX_DESC_CMD_L4T_EOFT_SCTP;
+		*td_cmd |= CI_TX_DESC_CMD_L4T_EOFT_SCTP;
 		*td_offset |= (sizeof(struct rte_sctp_hdr) >> 2) <<
-				I40E_TX_DESC_LENGTH_L4_FC_LEN_SHIFT;
+				CI_TX_DESC_LEN_L4_LEN_S;
 		break;
 	case RTE_MBUF_F_TX_UDP_CKSUM:
-		*td_cmd |= I40E_TX_DESC_CMD_L4T_EOFT_UDP;
+		*td_cmd |= CI_TX_DESC_CMD_L4T_EOFT_UDP;
 		*td_offset |= (sizeof(struct rte_udp_hdr) >> 2) <<
-				I40E_TX_DESC_LENGTH_L4_FC_LEN_SHIFT;
+				CI_TX_DESC_LEN_L4_LEN_S;
 		break;
 	default:
 		break;
@@ -377,11 +377,11 @@ i40e_build_ctob(uint32_t td_cmd,
 		unsigned int size,
 		uint32_t td_tag)
 {
-	return rte_cpu_to_le_64(I40E_TX_DESC_DTYPE_DATA |
-			((uint64_t)td_cmd  << I40E_TXD_QW1_CMD_SHIFT) |
-			((uint64_t)td_offset << I40E_TXD_QW1_OFFSET_SHIFT) |
-			((uint64_t)size  << I40E_TXD_QW1_TX_BUF_SZ_SHIFT) |
-			((uint64_t)td_tag  << I40E_TXD_QW1_L2TAG1_SHIFT));
+	return rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DATA |
+			((uint64_t)td_cmd  << CI_TXD_QW1_CMD_S) |
+			((uint64_t)td_offset << CI_TXD_QW1_OFFSET_S) |
+			((uint64_t)size  << CI_TXD_QW1_TX_BUF_SZ_S) |
+			((uint64_t)td_tag  << CI_TXD_QW1_L2TAG1_S));
 }
 
 static inline int
@@ -1004,7 +1004,7 @@ i40e_calc_context_desc(uint64_t flags)
 
 /* set i40e TSO context descriptor */
 static inline uint64_t
-i40e_set_tso_ctx(struct rte_mbuf *mbuf, union i40e_tx_offload tx_offload)
+i40e_set_tso_ctx(struct rte_mbuf *mbuf, union ci_tx_offload tx_offload)
 {
 	uint64_t ctx_desc = 0;
 	uint32_t cd_cmd, hdr_len, cd_tso_len;
@@ -1029,9 +1029,6 @@ i40e_set_tso_ctx(struct rte_mbuf *mbuf, union i40e_tx_offload tx_offload)
 	return ctx_desc;
 }
 
-/* HW requires that Tx buffer size ranges from 1B up to (16K-1)B. */
-#define I40E_MAX_DATA_PER_TXD \
-	(I40E_TXD_QW1_TX_BUF_SZ_MASK >> I40E_TXD_QW1_TX_BUF_SZ_SHIFT)
 /* Calculate the number of TX descriptors needed for each pkt */
 static inline uint16_t
 i40e_calc_pkt_desc(struct rte_mbuf *tx_pkt)
@@ -1040,7 +1037,7 @@ i40e_calc_pkt_desc(struct rte_mbuf *tx_pkt)
 	uint16_t count = 0;
 
 	while (txd != NULL) {
-		count += DIV_ROUND_UP(txd->data_len, I40E_MAX_DATA_PER_TXD);
+		count += DIV_ROUND_UP(txd->data_len, CI_MAX_DATA_PER_TXD);
 		txd = txd->next;
 	}
 
@@ -1069,7 +1066,7 @@ i40e_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 	uint16_t tx_last;
 	uint16_t slen;
 	uint64_t buf_dma_addr;
-	union i40e_tx_offload tx_offload = {0};
+	union ci_tx_offload tx_offload = {0};
 
 	txq = tx_queue;
 	sw_ring = txq->sw_ring;
@@ -1138,18 +1135,18 @@ i40e_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 
 		/* Descriptor based VLAN insertion */
 		if (ol_flags & (RTE_MBUF_F_TX_VLAN | RTE_MBUF_F_TX_QINQ)) {
-			td_cmd |= I40E_TX_DESC_CMD_IL2TAG1;
+			td_cmd |= CI_TX_DESC_CMD_IL2TAG1;
 			td_tag = tx_pkt->vlan_tci;
 		}
 
 		/* Always enable CRC offload insertion */
-		td_cmd |= I40E_TX_DESC_CMD_ICRC;
+		td_cmd |= CI_TX_DESC_CMD_ICRC;
 
 		/* Fill in tunneling parameters if necessary */
 		cd_tunneling_params = 0;
 		if (ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK) {
 			td_offset |= (tx_offload.outer_l2_len >> 1)
-					<< I40E_TX_DESC_LENGTH_MACLEN_SHIFT;
+					<< CI_TX_DESC_LEN_MACLEN_S;
 			i40e_parse_tunneling_params(ol_flags, tx_offload,
 						    &cd_tunneling_params);
 		}
@@ -1229,16 +1226,16 @@ i40e_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 			buf_dma_addr = rte_mbuf_data_iova(m_seg);
 
 			while ((ol_flags & RTE_MBUF_F_TX_TCP_SEG) &&
-				unlikely(slen > I40E_MAX_DATA_PER_TXD)) {
+				unlikely(slen > CI_MAX_DATA_PER_TXD)) {
 				txd->buffer_addr =
 					rte_cpu_to_le_64(buf_dma_addr);
 				txd->cmd_type_offset_bsz =
 					i40e_build_ctob(td_cmd,
-					td_offset, I40E_MAX_DATA_PER_TXD,
+					td_offset, CI_MAX_DATA_PER_TXD,
 					td_tag);
 
-				buf_dma_addr += I40E_MAX_DATA_PER_TXD;
-				slen -= I40E_MAX_DATA_PER_TXD;
+				buf_dma_addr += CI_MAX_DATA_PER_TXD;
+				slen -= CI_MAX_DATA_PER_TXD;
 
 				txe->last_id = tx_last;
 				tx_id = txe->next_id;
@@ -1265,7 +1262,7 @@ i40e_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 		} while (m_seg != NULL);
 
 		/* The last packet data descriptor needs End Of Packet (EOP) */
-		td_cmd |= I40E_TX_DESC_CMD_EOP;
+		td_cmd |= CI_TX_DESC_CMD_EOP;
 		txq->nb_tx_used = (uint16_t)(txq->nb_tx_used + nb_used);
 		txq->nb_tx_free = (uint16_t)(txq->nb_tx_free - nb_used);
 
@@ -1275,15 +1272,14 @@ i40e_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 				   "%4u (port=%d queue=%d)",
 				   tx_last, txq->port_id, txq->queue_id);
 
-			td_cmd |= I40E_TX_DESC_CMD_RS;
+			td_cmd |= CI_TX_DESC_CMD_RS;
 
 			/* Update txq RS bit counters */
 			txq->nb_tx_used = 0;
 		}
 
 		txd->cmd_type_offset_bsz |=
-			rte_cpu_to_le_64(((uint64_t)td_cmd) <<
-					I40E_TXD_QW1_CMD_SHIFT);
+			rte_cpu_to_le_64(((uint64_t)td_cmd) << CI_TXD_QW1_CMD_S);
 	}
 
 end_of_tx:
@@ -1309,8 +1305,8 @@ i40e_tx_free_bufs(struct ci_tx_queue *txq)
 	const uint16_t m = tx_rs_thresh % I40E_TX_MAX_FREE_BUF_SZ;
 
 	if ((txq->ci_tx_ring[txq->tx_next_dd].cmd_type_offset_bsz &
-			rte_cpu_to_le_64(I40E_TXD_QW1_DTYPE_MASK)) !=
-			rte_cpu_to_le_64(I40E_TX_DESC_DTYPE_DESC_DONE))
+			rte_cpu_to_le_64(CI_TXD_QW1_DTYPE_M)) !=
+			rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE))
 		return 0;
 
 	txep = &txq->sw_ring[txq->tx_next_dd - (tx_rs_thresh - 1)];
@@ -1436,8 +1432,7 @@ tx_xmit_pkts(struct ci_tx_queue *txq,
 		n = (uint16_t)(txq->nb_tx_desc - txq->tx_tail);
 		i40e_tx_fill_hw_ring(txq, tx_pkts, n);
 		txr[txq->tx_next_rs].cmd_type_offset_bsz |=
-			rte_cpu_to_le_64(((uint64_t)I40E_TX_DESC_CMD_RS) <<
-						I40E_TXD_QW1_CMD_SHIFT);
+			rte_cpu_to_le_64(((uint64_t)CI_TX_DESC_CMD_RS) << CI_TXD_QW1_CMD_S);
 		txq->tx_next_rs = (uint16_t)(txq->tx_rs_thresh - 1);
 		txq->tx_tail = 0;
 	}
@@ -1449,8 +1444,7 @@ tx_xmit_pkts(struct ci_tx_queue *txq,
 	/* Determine if RS bit needs to be set */
 	if (txq->tx_tail > txq->tx_next_rs) {
 		txr[txq->tx_next_rs].cmd_type_offset_bsz |=
-			rte_cpu_to_le_64(((uint64_t)I40E_TX_DESC_CMD_RS) <<
-						I40E_TXD_QW1_CMD_SHIFT);
+			rte_cpu_to_le_64(((uint64_t)CI_TX_DESC_CMD_RS) << CI_TXD_QW1_CMD_S);
 		txq->tx_next_rs =
 			(uint16_t)(txq->tx_next_rs + txq->tx_rs_thresh);
 		if (txq->tx_next_rs >= txq->nb_tx_desc)
@@ -2368,9 +2362,9 @@ i40e_dev_tx_descriptor_status(void *tx_queue, uint16_t offset)
 	}
 
 	status = &txq->ci_tx_ring[desc].cmd_type_offset_bsz;
-	mask = rte_le_to_cpu_64(I40E_TXD_QW1_DTYPE_MASK);
+	mask = rte_le_to_cpu_64(CI_TXD_QW1_DTYPE_M);
 	expect = rte_cpu_to_le_64(
-		I40E_TX_DESC_DTYPE_DESC_DONE << I40E_TXD_QW1_DTYPE_SHIFT);
+		CI_TX_DESC_DTYPE_DESC_DONE << CI_TXD_QW1_DTYPE_S);
 	if ((*status & mask) == expect)
 		return RTE_ETH_TX_DESC_DONE;
 
@@ -2866,7 +2860,7 @@ i40e_reset_tx_queue(struct ci_tx_queue *txq)
 		volatile struct ci_tx_desc *txd = &txq->ci_tx_ring[i];
 
 		txd->cmd_type_offset_bsz =
-			rte_cpu_to_le_64(I40E_TX_DESC_DTYPE_DESC_DONE);
+			rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE);
 		txe[i].mbuf =  NULL;
 		txe[i].last_id = i;
 		txe[prev].next_id = i;
diff --git a/drivers/net/intel/i40e/i40e_rxtx.h b/drivers/net/intel/i40e/i40e_rxtx.h
index ed173d8f17..307ffa3049 100644
--- a/drivers/net/intel/i40e/i40e_rxtx.h
+++ b/drivers/net/intel/i40e/i40e_rxtx.h
@@ -47,8 +47,8 @@
 #define I40E_RX_DESC_EXT_STATUS_FLEXBL_MASK   0x03
 #define I40E_RX_DESC_EXT_STATUS_FLEXBL_FLEX   0x01
 
-#define I40E_TD_CMD (I40E_TX_DESC_CMD_ICRC |\
-		     I40E_TX_DESC_CMD_EOP)
+#define I40E_TD_CMD (CI_TX_DESC_CMD_ICRC |\
+		     CI_TX_DESC_CMD_EOP)
 
 enum i40e_header_split_mode {
 	i40e_header_split_none = 0,
@@ -110,19 +110,6 @@ enum i40e_header_split_mode {
 
 #define I40E_TX_VECTOR_OFFLOADS RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE
 
-/** Offload features */
-union i40e_tx_offload {
-	uint64_t data;
-	struct {
-		uint64_t l2_len:7; /**< L2 (MAC) Header Length. */
-		uint64_t l3_len:9; /**< L3 (IP) Header Length. */
-		uint64_t l4_len:8; /**< L4 Header Length. */
-		uint64_t tso_segsz:16; /**< TCP TSO segment size */
-		uint64_t outer_l2_len:8; /**< outer L2 Header Length */
-		uint64_t outer_l3_len:16; /**< outer L3 Header Length */
-	};
-};
-
 int i40e_dev_rx_queue_start(struct rte_eth_dev *dev, uint16_t rx_queue_id);
 int i40e_dev_rx_queue_stop(struct rte_eth_dev *dev, uint16_t rx_queue_id);
 int i40e_dev_tx_queue_start(struct rte_eth_dev *dev, uint16_t tx_queue_id);
diff --git a/drivers/net/intel/i40e/i40e_rxtx_vec_altivec.c b/drivers/net/intel/i40e/i40e_rxtx_vec_altivec.c
index 81e9e2bc0b..4c36748d94 100644
--- a/drivers/net/intel/i40e/i40e_rxtx_vec_altivec.c
+++ b/drivers/net/intel/i40e/i40e_rxtx_vec_altivec.c
@@ -449,9 +449,9 @@ static inline void
 vtx1(volatile struct ci_tx_desc *txdp,
 	struct rte_mbuf *pkt, uint64_t flags)
 {
-	uint64_t high_qw = (I40E_TX_DESC_DTYPE_DATA |
-		((uint64_t)flags  << I40E_TXD_QW1_CMD_SHIFT) |
-		((uint64_t)pkt->data_len << I40E_TXD_QW1_TX_BUF_SZ_SHIFT));
+	uint64_t high_qw = (CI_TX_DESC_DTYPE_DATA |
+		((uint64_t)flags << CI_TXD_QW1_CMD_S) |
+		((uint64_t)pkt->data_len << CI_TXD_QW1_TX_BUF_SZ_S));
 
 	__vector unsigned long descriptor = (__vector unsigned long){
 		pkt->buf_iova + pkt->data_off, high_qw};
@@ -477,7 +477,7 @@ i40e_xmit_fixed_burst_vec(void *tx_queue, struct rte_mbuf **tx_pkts,
 	struct ci_tx_entry_vec *txep;
 	uint16_t n, nb_commit, tx_id;
 	uint64_t flags = I40E_TD_CMD;
-	uint64_t rs = I40E_TX_DESC_CMD_RS | I40E_TD_CMD;
+	uint64_t rs = CI_TX_DESC_CMD_RS | I40E_TD_CMD;
 	int i;
 
 	if (txq->nb_tx_free < txq->tx_free_thresh)
@@ -520,8 +520,7 @@ i40e_xmit_fixed_burst_vec(void *tx_queue, struct rte_mbuf **tx_pkts,
 	tx_id = (uint16_t)(tx_id + nb_commit);
 	if (tx_id > txq->tx_next_rs) {
 		txq->ci_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
-			rte_cpu_to_le_64(((uint64_t)I40E_TX_DESC_CMD_RS) <<
-						I40E_TXD_QW1_CMD_SHIFT);
+			rte_cpu_to_le_64(((uint64_t)CI_TX_DESC_CMD_RS) << CI_TXD_QW1_CMD_S);
 		txq->tx_next_rs =
 			(uint16_t)(txq->tx_next_rs + txq->tx_rs_thresh);
 	}
diff --git a/drivers/net/intel/i40e/i40e_rxtx_vec_avx2.c b/drivers/net/intel/i40e/i40e_rxtx_vec_avx2.c
index f054bd41bf..502a1842c6 100644
--- a/drivers/net/intel/i40e/i40e_rxtx_vec_avx2.c
+++ b/drivers/net/intel/i40e/i40e_rxtx_vec_avx2.c
@@ -684,9 +684,9 @@ static inline void
 vtx1(volatile struct ci_tx_desc *txdp,
 		struct rte_mbuf *pkt, uint64_t flags)
 {
-	uint64_t high_qw = (I40E_TX_DESC_DTYPE_DATA |
-			((uint64_t)flags  << I40E_TXD_QW1_CMD_SHIFT) |
-			((uint64_t)pkt->data_len << I40E_TXD_QW1_TX_BUF_SZ_SHIFT));
+	uint64_t high_qw = (CI_TX_DESC_DTYPE_DATA |
+			((uint64_t)flags << CI_TXD_QW1_CMD_S) |
+			((uint64_t)pkt->data_len << CI_TXD_QW1_TX_BUF_SZ_S));
 
 	__m128i descriptor = _mm_set_epi64x(high_qw,
 				pkt->buf_iova + pkt->data_off);
@@ -697,8 +697,7 @@ static inline void
 vtx(volatile struct ci_tx_desc *txdp,
 		struct rte_mbuf **pkt, uint16_t nb_pkts,  uint64_t flags)
 {
-	const uint64_t hi_qw_tmpl = (I40E_TX_DESC_DTYPE_DATA |
-			((uint64_t)flags  << I40E_TXD_QW1_CMD_SHIFT));
+	const uint64_t hi_qw_tmpl = (CI_TX_DESC_DTYPE_DATA | (flags << CI_TXD_QW1_CMD_S));
 
 	/* if unaligned on 32-bit boundary, do one to align */
 	if (((uintptr_t)txdp & 0x1F) != 0 && nb_pkts != 0) {
@@ -709,13 +708,13 @@ vtx(volatile struct ci_tx_desc *txdp,
 	/* do two at a time while possible, in bursts */
 	for (; nb_pkts > 3; txdp += 4, pkt += 4, nb_pkts -= 4) {
 		uint64_t hi_qw3 = hi_qw_tmpl |
-				((uint64_t)pkt[3]->data_len << I40E_TXD_QW1_TX_BUF_SZ_SHIFT);
+				((uint64_t)pkt[3]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
 		uint64_t hi_qw2 = hi_qw_tmpl |
-				((uint64_t)pkt[2]->data_len << I40E_TXD_QW1_TX_BUF_SZ_SHIFT);
+				((uint64_t)pkt[2]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
 		uint64_t hi_qw1 = hi_qw_tmpl |
-				((uint64_t)pkt[1]->data_len << I40E_TXD_QW1_TX_BUF_SZ_SHIFT);
+				((uint64_t)pkt[1]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
 		uint64_t hi_qw0 = hi_qw_tmpl |
-				((uint64_t)pkt[0]->data_len << I40E_TXD_QW1_TX_BUF_SZ_SHIFT);
+				((uint64_t)pkt[0]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
 
 		__m256i desc2_3 = _mm256_set_epi64x(
 				hi_qw3, pkt[3]->buf_iova + pkt[3]->data_off,
@@ -743,7 +742,7 @@ i40e_xmit_fixed_burst_vec_avx2(void *tx_queue, struct rte_mbuf **tx_pkts,
 	struct ci_tx_entry_vec *txep;
 	uint16_t n, nb_commit, tx_id;
 	uint64_t flags = I40E_TD_CMD;
-	uint64_t rs = I40E_TX_DESC_CMD_RS | I40E_TD_CMD;
+	uint64_t rs = CI_TX_DESC_CMD_RS | I40E_TD_CMD;
 
 	if (txq->nb_tx_free < txq->tx_free_thresh)
 		ci_tx_free_bufs_vec(txq, i40e_tx_desc_done, false);
@@ -785,8 +784,7 @@ i40e_xmit_fixed_burst_vec_avx2(void *tx_queue, struct rte_mbuf **tx_pkts,
 	tx_id = (uint16_t)(tx_id + nb_commit);
 	if (tx_id > txq->tx_next_rs) {
 		txq->ci_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
-			rte_cpu_to_le_64(((uint64_t)I40E_TX_DESC_CMD_RS) <<
-						I40E_TXD_QW1_CMD_SHIFT);
+			rte_cpu_to_le_64(((uint64_t)CI_TX_DESC_CMD_RS) << CI_TXD_QW1_CMD_S);
 		txq->tx_next_rs =
 			(uint16_t)(txq->tx_next_rs + txq->tx_rs_thresh);
 	}
diff --git a/drivers/net/intel/i40e/i40e_rxtx_vec_avx512.c b/drivers/net/intel/i40e/i40e_rxtx_vec_avx512.c
index 9a967faeee..d48ff9f51e 100644
--- a/drivers/net/intel/i40e/i40e_rxtx_vec_avx512.c
+++ b/drivers/net/intel/i40e/i40e_rxtx_vec_avx512.c
@@ -752,9 +752,9 @@ i40e_recv_scattered_pkts_vec_avx512(void *rx_queue,
 static inline void
 vtx1(volatile struct ci_tx_desc *txdp, struct rte_mbuf *pkt, uint64_t flags)
 {
-	uint64_t high_qw = (I40E_TX_DESC_DTYPE_DATA |
-		((uint64_t)flags  << I40E_TXD_QW1_CMD_SHIFT) |
-		((uint64_t)pkt->data_len << I40E_TXD_QW1_TX_BUF_SZ_SHIFT));
+	uint64_t high_qw = (CI_TX_DESC_DTYPE_DATA |
+		((uint64_t)flags << CI_TXD_QW1_CMD_S) |
+		((uint64_t)pkt->data_len << CI_TXD_QW1_TX_BUF_SZ_S));
 
 	__m128i descriptor = _mm_set_epi64x(high_qw,
 				pkt->buf_iova + pkt->data_off);
@@ -765,26 +765,17 @@ static inline void
 vtx(volatile struct ci_tx_desc *txdp,
 	struct rte_mbuf **pkt, uint16_t nb_pkts,  uint64_t flags)
 {
-	const uint64_t hi_qw_tmpl = (I40E_TX_DESC_DTYPE_DATA |
-			((uint64_t)flags  << I40E_TXD_QW1_CMD_SHIFT));
+	const uint64_t hi_qw_tmpl = (CI_TX_DESC_DTYPE_DATA | (flags << CI_TXD_QW1_CMD_S));
 
 	for (; nb_pkts > 3; txdp += 4, pkt += 4, nb_pkts -= 4) {
-		uint64_t hi_qw3 =
-			hi_qw_tmpl |
-			((uint64_t)pkt[3]->data_len <<
-			 I40E_TXD_QW1_TX_BUF_SZ_SHIFT);
-		uint64_t hi_qw2 =
-			hi_qw_tmpl |
-			((uint64_t)pkt[2]->data_len <<
-			 I40E_TXD_QW1_TX_BUF_SZ_SHIFT);
-		uint64_t hi_qw1 =
-			hi_qw_tmpl |
-			((uint64_t)pkt[1]->data_len <<
-			 I40E_TXD_QW1_TX_BUF_SZ_SHIFT);
-		uint64_t hi_qw0 =
-			hi_qw_tmpl |
-			((uint64_t)pkt[0]->data_len <<
-			 I40E_TXD_QW1_TX_BUF_SZ_SHIFT);
+		uint64_t hi_qw3 = hi_qw_tmpl |
+			((uint64_t)pkt[3]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
+		uint64_t hi_qw2 = hi_qw_tmpl |
+			((uint64_t)pkt[2]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
+		uint64_t hi_qw1 = hi_qw_tmpl |
+			((uint64_t)pkt[1]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
+		uint64_t hi_qw0 = hi_qw_tmpl |
+			((uint64_t)pkt[0]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
 
 		__m512i desc0_3 =
 			_mm512_set_epi64
@@ -811,7 +802,7 @@ i40e_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pkts,
 	struct ci_tx_entry_vec *txep;
 	uint16_t n, nb_commit, tx_id;
 	uint64_t flags = I40E_TD_CMD;
-	uint64_t rs = I40E_TX_DESC_CMD_RS | I40E_TD_CMD;
+	uint64_t rs = CI_TX_DESC_CMD_RS | I40E_TD_CMD;
 
 	if (txq->nb_tx_free < txq->tx_free_thresh)
 		ci_tx_free_bufs_vec(txq, i40e_tx_desc_done, false);
@@ -854,8 +845,7 @@ i40e_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pkts,
 	tx_id = (uint16_t)(tx_id + nb_commit);
 	if (tx_id > txq->tx_next_rs) {
 		txq->ci_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
-			rte_cpu_to_le_64(((uint64_t)I40E_TX_DESC_CMD_RS) <<
-						I40E_TXD_QW1_CMD_SHIFT);
+			rte_cpu_to_le_64(((uint64_t)CI_TX_DESC_CMD_RS) << CI_TXD_QW1_CMD_S);
 		txq->tx_next_rs =
 			(uint16_t)(txq->tx_next_rs + txq->tx_rs_thresh);
 	}
diff --git a/drivers/net/intel/i40e/i40e_rxtx_vec_common.h b/drivers/net/intel/i40e/i40e_rxtx_vec_common.h
index 1fd7fc75bf..292a39501e 100644
--- a/drivers/net/intel/i40e/i40e_rxtx_vec_common.h
+++ b/drivers/net/intel/i40e/i40e_rxtx_vec_common.h
@@ -16,8 +16,8 @@ static inline int
 i40e_tx_desc_done(struct ci_tx_queue *txq, uint16_t idx)
 {
 	return (txq->ci_tx_ring[idx].cmd_type_offset_bsz &
-			rte_cpu_to_le_64(I40E_TXD_QW1_DTYPE_MASK)) ==
-				rte_cpu_to_le_64(I40E_TX_DESC_DTYPE_DESC_DONE);
+			rte_cpu_to_le_64(CI_TXD_QW1_DTYPE_M)) ==
+				rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE);
 }
 
 static inline void
diff --git a/drivers/net/intel/i40e/i40e_rxtx_vec_neon.c b/drivers/net/intel/i40e/i40e_rxtx_vec_neon.c
index 0b95152232..be4c64942e 100644
--- a/drivers/net/intel/i40e/i40e_rxtx_vec_neon.c
+++ b/drivers/net/intel/i40e/i40e_rxtx_vec_neon.c
@@ -600,9 +600,9 @@ static inline void
 vtx1(volatile struct ci_tx_desc *txdp,
 		struct rte_mbuf *pkt, uint64_t flags)
 {
-	uint64_t high_qw = (I40E_TX_DESC_DTYPE_DATA |
-			((uint64_t)flags  << I40E_TXD_QW1_CMD_SHIFT) |
-			((uint64_t)pkt->data_len << I40E_TXD_QW1_TX_BUF_SZ_SHIFT));
+	uint64_t high_qw = (CI_TX_DESC_DTYPE_DATA |
+			((uint64_t)flags << CI_TXD_QW1_CMD_S) |
+			((uint64_t)pkt->data_len << CI_TXD_QW1_TX_BUF_SZ_S));
 
 	uint64x2_t descriptor = {pkt->buf_iova + pkt->data_off, high_qw};
 	vst1q_u64(RTE_CAST_PTR(uint64_t *, txdp), descriptor);
@@ -627,7 +627,7 @@ i40e_xmit_fixed_burst_vec(void *__rte_restrict tx_queue,
 	struct ci_tx_entry_vec *txep;
 	uint16_t n, nb_commit, tx_id;
 	uint64_t flags = I40E_TD_CMD;
-	uint64_t rs = I40E_TX_DESC_CMD_RS | I40E_TD_CMD;
+	uint64_t rs = CI_TX_DESC_CMD_RS | I40E_TD_CMD;
 	int i;
 
 	if (txq->nb_tx_free < txq->tx_free_thresh)
@@ -669,8 +669,7 @@ i40e_xmit_fixed_burst_vec(void *__rte_restrict tx_queue,
 	tx_id = (uint16_t)(tx_id + nb_commit);
 	if (tx_id > txq->tx_next_rs) {
 		txq->ci_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
-			rte_cpu_to_le_64(((uint64_t)I40E_TX_DESC_CMD_RS) <<
-						I40E_TXD_QW1_CMD_SHIFT);
+			rte_cpu_to_le_64(((uint64_t)CI_TX_DESC_CMD_RS) << CI_TXD_QW1_CMD_S);
 		txq->tx_next_rs =
 			(uint16_t)(txq->tx_next_rs + txq->tx_rs_thresh);
 	}
diff --git a/drivers/net/intel/iavf/iavf_rxtx.c b/drivers/net/intel/iavf/iavf_rxtx.c
index 3b7773f483..0bf15aae5e 100644
--- a/drivers/net/intel/iavf/iavf_rxtx.c
+++ b/drivers/net/intel/iavf/iavf_rxtx.c
@@ -288,7 +288,7 @@ reset_tx_queue(struct ci_tx_queue *txq)
 	prev = (uint16_t)(txq->nb_tx_desc - 1);
 	for (i = 0; i < txq->nb_tx_desc; i++) {
 		txq->ci_tx_ring[i].cmd_type_offset_bsz =
-			rte_cpu_to_le_64(IAVF_TX_DESC_DTYPE_DESC_DONE);
+			rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE);
 		txe[i].mbuf =  NULL;
 		txe[i].last_id = i;
 		txe[prev].next_id = i;
@@ -2356,12 +2356,12 @@ iavf_fill_ctx_desc_cmd_field(volatile uint64_t *field, struct rte_mbuf *m,
 
 	/* TSO enabled */
 	if (m->ol_flags & (RTE_MBUF_F_TX_TCP_SEG | RTE_MBUF_F_TX_UDP_SEG))
-		cmd = IAVF_TX_CTX_DESC_TSO << IAVF_TXD_CTX_QW1_CMD_SHIFT;
+		cmd = CI_TX_CTX_DESC_TSO << IAVF_TXD_CTX_QW1_CMD_SHIFT;
 
 	if ((m->ol_flags & RTE_MBUF_F_TX_VLAN &&
 			vlan_flag & IAVF_TX_FLAGS_VLAN_TAG_LOC_L2TAG2) ||
 			m->ol_flags & RTE_MBUF_F_TX_QINQ) {
-		cmd |= IAVF_TX_CTX_DESC_IL2TAG2
+		cmd |= CI_TX_CTX_DESC_IL2TAG2
 			<< IAVF_TXD_CTX_QW1_CMD_SHIFT;
 	}
 
@@ -2582,20 +2582,20 @@ iavf_build_data_desc_cmd_offset_fields(volatile uint64_t *qw1,
 	uint64_t offset = 0;
 	uint64_t l2tag1 = 0;
 
-	*qw1 = IAVF_TX_DESC_DTYPE_DATA;
+	*qw1 = CI_TX_DESC_DTYPE_DATA;
 
-	command = (uint64_t)IAVF_TX_DESC_CMD_ICRC;
+	command = (uint64_t)CI_TX_DESC_CMD_ICRC;
 
 	/* Descriptor based VLAN insertion */
 	if ((vlan_flag & IAVF_TX_FLAGS_VLAN_TAG_LOC_L2TAG1) &&
 			m->ol_flags & RTE_MBUF_F_TX_VLAN) {
-		command |= (uint64_t)IAVF_TX_DESC_CMD_IL2TAG1;
+		command |= (uint64_t)CI_TX_DESC_CMD_IL2TAG1;
 		l2tag1 |= m->vlan_tci;
 	}
 
 	/* Descriptor based QinQ insertion. vlan_flag specifies outer tag location. */
 	if (m->ol_flags & RTE_MBUF_F_TX_QINQ) {
-		command |= (uint64_t)IAVF_TX_DESC_CMD_IL2TAG1;
+		command |= (uint64_t)CI_TX_DESC_CMD_IL2TAG1;
 		l2tag1 = vlan_flag & IAVF_TX_FLAGS_VLAN_TAG_LOC_L2TAG1 ? m->vlan_tci_outer :
 									m->vlan_tci;
 	}
@@ -2608,32 +2608,32 @@ iavf_build_data_desc_cmd_offset_fields(volatile uint64_t *qw1,
 	if (m->ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK &&
 			!(m->ol_flags & RTE_MBUF_F_TX_SEC_OFFLOAD))
 		offset |= (m->outer_l2_len >> 1)
-			<< IAVF_TX_DESC_LENGTH_MACLEN_SHIFT;
+			<< CI_TX_DESC_LEN_MACLEN_S;
 	else
 		offset |= (m->l2_len >> 1)
-			<< IAVF_TX_DESC_LENGTH_MACLEN_SHIFT;
+			<< CI_TX_DESC_LEN_MACLEN_S;
 
 	/* Enable L3 checksum offloading inner */
 	if (m->ol_flags & RTE_MBUF_F_TX_IP_CKSUM) {
 		if (m->ol_flags & RTE_MBUF_F_TX_IPV4) {
-			command |= IAVF_TX_DESC_CMD_IIPT_IPV4_CSUM;
-			offset |= (m->l3_len >> 2) << IAVF_TX_DESC_LENGTH_IPLEN_SHIFT;
+			command |= CI_TX_DESC_CMD_IIPT_IPV4_CSUM;
+			offset |= (m->l3_len >> 2) << CI_TX_DESC_LEN_IPLEN_S;
 		}
 	} else if (m->ol_flags & RTE_MBUF_F_TX_IPV4) {
-		command |= IAVF_TX_DESC_CMD_IIPT_IPV4;
-		offset |= (m->l3_len >> 2) << IAVF_TX_DESC_LENGTH_IPLEN_SHIFT;
+		command |= CI_TX_DESC_CMD_IIPT_IPV4;
+		offset |= (m->l3_len >> 2) << CI_TX_DESC_LEN_IPLEN_S;
 	} else if (m->ol_flags & RTE_MBUF_F_TX_IPV6) {
-		command |= IAVF_TX_DESC_CMD_IIPT_IPV6;
-		offset |= (m->l3_len >> 2) << IAVF_TX_DESC_LENGTH_IPLEN_SHIFT;
+		command |= CI_TX_DESC_CMD_IIPT_IPV6;
+		offset |= (m->l3_len >> 2) << CI_TX_DESC_LEN_IPLEN_S;
 	}
 
 	if (m->ol_flags & (RTE_MBUF_F_TX_TCP_SEG | RTE_MBUF_F_TX_UDP_SEG)) {
 		if (m->ol_flags & RTE_MBUF_F_TX_TCP_SEG)
-			command |= IAVF_TX_DESC_CMD_L4T_EOFT_TCP;
+			command |= CI_TX_DESC_CMD_L4T_EOFT_TCP;
 		else
-			command |= IAVF_TX_DESC_CMD_L4T_EOFT_UDP;
+			command |= CI_TX_DESC_CMD_L4T_EOFT_UDP;
 		offset |= (m->l4_len >> 2) <<
-			      IAVF_TX_DESC_LENGTH_L4_FC_LEN_SHIFT;
+			      CI_TX_DESC_LEN_L4_LEN_S;
 
 		*qw1 = rte_cpu_to_le_64((((uint64_t)command <<
 			IAVF_TXD_DATA_QW1_CMD_SHIFT) & IAVF_TXD_DATA_QW1_CMD_MASK) |
@@ -2647,19 +2647,19 @@ iavf_build_data_desc_cmd_offset_fields(volatile uint64_t *qw1,
 	/* Enable L4 checksum offloads */
 	switch (m->ol_flags & RTE_MBUF_F_TX_L4_MASK) {
 	case RTE_MBUF_F_TX_TCP_CKSUM:
-		command |= IAVF_TX_DESC_CMD_L4T_EOFT_TCP;
+		command |= CI_TX_DESC_CMD_L4T_EOFT_TCP;
 		offset |= (sizeof(struct rte_tcp_hdr) >> 2) <<
-				IAVF_TX_DESC_LENGTH_L4_FC_LEN_SHIFT;
+				CI_TX_DESC_LEN_L4_LEN_S;
 		break;
 	case RTE_MBUF_F_TX_SCTP_CKSUM:
-		command |= IAVF_TX_DESC_CMD_L4T_EOFT_SCTP;
+		command |= CI_TX_DESC_CMD_L4T_EOFT_SCTP;
 		offset |= (sizeof(struct rte_sctp_hdr) >> 2) <<
-				IAVF_TX_DESC_LENGTH_L4_FC_LEN_SHIFT;
+				CI_TX_DESC_LEN_L4_LEN_S;
 		break;
 	case RTE_MBUF_F_TX_UDP_CKSUM:
-		command |= IAVF_TX_DESC_CMD_L4T_EOFT_UDP;
+		command |= CI_TX_DESC_CMD_L4T_EOFT_UDP;
 		offset |= (sizeof(struct rte_udp_hdr) >> 2) <<
-				IAVF_TX_DESC_LENGTH_L4_FC_LEN_SHIFT;
+				CI_TX_DESC_LEN_L4_LEN_S;
 		break;
 	}
 
@@ -2679,8 +2679,7 @@ iavf_calc_pkt_desc(struct rte_mbuf *tx_pkt)
 	uint16_t count = 0;
 
 	while (txd != NULL) {
-		count += (txd->data_len + IAVF_MAX_DATA_PER_TXD - 1) /
-			IAVF_MAX_DATA_PER_TXD;
+		count += (txd->data_len + CI_MAX_DATA_PER_TXD - 1) / CI_MAX_DATA_PER_TXD;
 		txd = txd->next;
 	}
 
@@ -2886,14 +2885,14 @@ iavf_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 			buf_dma_addr = rte_mbuf_data_iova(mb_seg);
 			while ((mb_seg->ol_flags & (RTE_MBUF_F_TX_TCP_SEG |
 					RTE_MBUF_F_TX_UDP_SEG)) &&
-					unlikely(slen > IAVF_MAX_DATA_PER_TXD)) {
+					unlikely(slen > CI_MAX_DATA_PER_TXD)) {
 				iavf_fill_data_desc(ddesc, ddesc_template,
-					IAVF_MAX_DATA_PER_TXD, buf_dma_addr);
+					CI_MAX_DATA_PER_TXD, buf_dma_addr);
 
 				IAVF_DUMP_TX_DESC(txq, ddesc, desc_idx);
 
-				buf_dma_addr += IAVF_MAX_DATA_PER_TXD;
-				slen -= IAVF_MAX_DATA_PER_TXD;
+				buf_dma_addr += CI_MAX_DATA_PER_TXD;
+				slen -= CI_MAX_DATA_PER_TXD;
 
 				txe->last_id = desc_idx_last;
 				desc_idx = txe->next_id;
@@ -2914,7 +2913,7 @@ iavf_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 		} while (mb_seg);
 
 		/* The last packet data descriptor needs End Of Packet (EOP) */
-		ddesc_cmd = IAVF_TX_DESC_CMD_EOP;
+		ddesc_cmd = CI_TX_DESC_CMD_EOP;
 
 		txq->nb_tx_used = (uint16_t)(txq->nb_tx_used + nb_desc_required);
 		txq->nb_tx_free = (uint16_t)(txq->nb_tx_free - nb_desc_required);
@@ -2924,7 +2923,7 @@ iavf_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 				   "%4u (port=%d queue=%d)",
 				   desc_idx_last, txq->port_id, txq->queue_id);
 
-			ddesc_cmd |= IAVF_TX_DESC_CMD_RS;
+			ddesc_cmd |= CI_TX_DESC_CMD_RS;
 
 			/* Update txq RS bit counters */
 			txq->nb_tx_used = 0;
@@ -4428,9 +4427,8 @@ iavf_dev_tx_desc_status(void *tx_queue, uint16_t offset)
 	}
 
 	status = &txq->ci_tx_ring[desc].cmd_type_offset_bsz;
-	mask = rte_le_to_cpu_64(IAVF_TXD_QW1_DTYPE_MASK);
-	expect = rte_cpu_to_le_64(
-		 IAVF_TX_DESC_DTYPE_DESC_DONE << IAVF_TXD_QW1_DTYPE_SHIFT);
+	mask = rte_le_to_cpu_64(CI_TXD_QW1_DTYPE_M);
+	expect = rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE << CI_TXD_QW1_DTYPE_S);
 	if ((*status & mask) == expect)
 		return RTE_ETH_TX_DESC_DONE;
 
diff --git a/drivers/net/intel/iavf/iavf_rxtx.h b/drivers/net/intel/iavf/iavf_rxtx.h
index dd6d884fc1..395d97b4ee 100644
--- a/drivers/net/intel/iavf/iavf_rxtx.h
+++ b/drivers/net/intel/iavf/iavf_rxtx.h
@@ -162,10 +162,6 @@
 #define IAVF_TX_OFFLOAD_NOTSUP_MASK \
 		(RTE_MBUF_F_TX_OFFLOAD_MASK ^ IAVF_TX_OFFLOAD_MASK)
 
-/* HW requires that TX buffer size ranges from 1B up to (16K-1)B. */
-#define IAVF_MAX_DATA_PER_TXD \
-	(IAVF_TXD_QW1_TX_BUF_SZ_MASK >> IAVF_TXD_QW1_TX_BUF_SZ_SHIFT)
-
 #define IAVF_TX_LLDP_DYNFIELD "intel_pmd_dynfield_tx_lldp"
 #define IAVF_CHECK_TX_LLDP(m) \
 	((rte_pmd_iavf_tx_lldp_dynfield_offset > 0) && \
@@ -195,18 +191,6 @@ struct iavf_rx_queue_stats {
 	struct iavf_ipsec_crypto_stats ipsec_crypto;
 };
 
-/* Offload features */
-union iavf_tx_offload {
-	uint64_t data;
-	struct {
-		uint64_t l2_len:7; /* L2 (MAC) Header Length. */
-		uint64_t l3_len:9; /* L3 (IP) Header Length. */
-		uint64_t l4_len:8; /* L4 Header Length. */
-		uint64_t tso_segsz:16; /* TCP TSO segment size */
-		/* uint64_t unused : 24; */
-	};
-};
-
 /* Rx Flex Descriptor
  * RxDID Profile ID 16-21
  * Flex-field 0: RSS hash lower 16-bits
@@ -409,7 +393,7 @@ enum iavf_rx_flex_desc_ipsec_crypto_status {
 
 
 #define IAVF_TXD_DATA_QW1_DTYPE_SHIFT	(0)
-#define IAVF_TXD_DATA_QW1_DTYPE_MASK	(0xFUL << IAVF_TXD_QW1_DTYPE_SHIFT)
+#define IAVF_TXD_DATA_QW1_DTYPE_MASK	(0xFUL << CI_TXD_QW1_DTYPE_S)
 
 #define IAVF_TXD_DATA_QW1_CMD_SHIFT	(4)
 #define IAVF_TXD_DATA_QW1_CMD_MASK	(0x3FFUL << IAVF_TXD_DATA_QW1_CMD_SHIFT)
@@ -686,7 +670,7 @@ void iavf_dump_tx_descriptor(const struct ci_tx_queue *txq,
 		rte_le_to_cpu_64(tx_desc->cmd_type_offset_bsz &
 			rte_cpu_to_le_64(IAVF_TXD_DATA_QW1_DTYPE_MASK));
 	switch (type) {
-	case IAVF_TX_DESC_DTYPE_DATA:
+	case CI_TX_DESC_DTYPE_DATA:
 		name = "Tx_data_desc";
 		break;
 	case IAVF_TX_DESC_DTYPE_CONTEXT:
diff --git a/drivers/net/intel/iavf/iavf_rxtx_vec_avx2.c b/drivers/net/intel/iavf/iavf_rxtx_vec_avx2.c
index 89ce841b9e..cea4ee9863 100644
--- a/drivers/net/intel/iavf/iavf_rxtx_vec_avx2.c
+++ b/drivers/net/intel/iavf/iavf_rxtx_vec_avx2.c
@@ -1633,10 +1633,9 @@ static __rte_always_inline void
 iavf_vtx1(volatile struct ci_tx_desc *txdp,
 	  struct rte_mbuf *pkt, uint64_t flags, bool offload, uint8_t vlan_flag)
 {
-	uint64_t high_qw =
-		(IAVF_TX_DESC_DTYPE_DATA |
-		 ((uint64_t)flags  << IAVF_TXD_QW1_CMD_SHIFT) |
-		 ((uint64_t)pkt->data_len << IAVF_TXD_QW1_TX_BUF_SZ_SHIFT));
+	uint64_t high_qw = (CI_TX_DESC_DTYPE_DATA |
+		 ((uint64_t)flags << CI_TXD_QW1_CMD_S) |
+		 ((uint64_t)pkt->data_len << CI_TXD_QW1_TX_BUF_SZ_S));
 	if (offload)
 		iavf_txd_enable_offload(pkt, &high_qw, vlan_flag);
 
@@ -1649,8 +1648,7 @@ static __rte_always_inline void
 iavf_vtx(volatile struct ci_tx_desc *txdp,
 	 struct rte_mbuf **pkt, uint16_t nb_pkts,  uint64_t flags, bool offload, uint8_t vlan_flag)
 {
-	const uint64_t hi_qw_tmpl = (IAVF_TX_DESC_DTYPE_DATA |
-			((uint64_t)flags  << IAVF_TXD_QW1_CMD_SHIFT));
+	const uint64_t hi_qw_tmpl = (CI_TX_DESC_DTYPE_DATA | (flags << CI_TXD_QW1_CMD_S));
 
 	/* if unaligned on 32-bit boundary, do one to align */
 	if (((uintptr_t)txdp & 0x1F) != 0 && nb_pkts != 0) {
@@ -1660,28 +1658,20 @@ iavf_vtx(volatile struct ci_tx_desc *txdp,
 
 	/* do two at a time while possible, in bursts */
 	for (; nb_pkts > 3; txdp += 4, pkt += 4, nb_pkts -= 4) {
-		uint64_t hi_qw3 =
-			hi_qw_tmpl |
-			((uint64_t)pkt[3]->data_len <<
-			 IAVF_TXD_QW1_TX_BUF_SZ_SHIFT);
+		uint64_t hi_qw3 = hi_qw_tmpl |
+			((uint64_t)pkt[3]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
 		if (offload)
 			iavf_txd_enable_offload(pkt[3], &hi_qw3, vlan_flag);
-		uint64_t hi_qw2 =
-			hi_qw_tmpl |
-			((uint64_t)pkt[2]->data_len <<
-			 IAVF_TXD_QW1_TX_BUF_SZ_SHIFT);
+		uint64_t hi_qw2 = hi_qw_tmpl |
+			((uint64_t)pkt[2]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
 		if (offload)
 			iavf_txd_enable_offload(pkt[2], &hi_qw2, vlan_flag);
-		uint64_t hi_qw1 =
-			hi_qw_tmpl |
-			((uint64_t)pkt[1]->data_len <<
-			 IAVF_TXD_QW1_TX_BUF_SZ_SHIFT);
+		uint64_t hi_qw1 = hi_qw_tmpl |
+			((uint64_t)pkt[1]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
 		if (offload)
 			iavf_txd_enable_offload(pkt[1], &hi_qw1, vlan_flag);
-		uint64_t hi_qw0 =
-			hi_qw_tmpl |
-			((uint64_t)pkt[0]->data_len <<
-			 IAVF_TXD_QW1_TX_BUF_SZ_SHIFT);
+		uint64_t hi_qw0 = hi_qw_tmpl |
+			((uint64_t)pkt[0]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
 		if (offload)
 			iavf_txd_enable_offload(pkt[0], &hi_qw0, vlan_flag);
 
@@ -1717,8 +1707,8 @@ iavf_xmit_fixed_burst_vec_avx2(void *tx_queue, struct rte_mbuf **tx_pkts,
 	struct ci_tx_entry_vec *txep;
 	uint16_t n, nb_commit, tx_id;
 	/* bit2 is reserved and must be set to 1 according to Spec */
-	uint64_t flags = IAVF_TX_DESC_CMD_EOP | IAVF_TX_DESC_CMD_ICRC;
-	uint64_t rs = IAVF_TX_DESC_CMD_RS | flags;
+	uint64_t flags = CI_TX_DESC_CMD_EOP | CI_TX_DESC_CMD_ICRC;
+	uint64_t rs = CI_TX_DESC_CMD_RS | flags;
 
 	if (txq->nb_tx_free < txq->tx_free_thresh)
 		ci_tx_free_bufs_vec(txq, iavf_tx_desc_done, false);
@@ -1761,8 +1751,7 @@ iavf_xmit_fixed_burst_vec_avx2(void *tx_queue, struct rte_mbuf **tx_pkts,
 	tx_id = (uint16_t)(tx_id + nb_commit);
 	if (tx_id > txq->tx_next_rs) {
 		txq->ci_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
-			rte_cpu_to_le_64(((uint64_t)IAVF_TX_DESC_CMD_RS) <<
-					 IAVF_TXD_QW1_CMD_SHIFT);
+			rte_cpu_to_le_64(((uint64_t)CI_TX_DESC_CMD_RS) << CI_TXD_QW1_CMD_S);
 		txq->tx_next_rs =
 			(uint16_t)(txq->tx_next_rs + txq->tx_rs_thresh);
 	}
diff --git a/drivers/net/intel/iavf/iavf_rxtx_vec_avx512.c b/drivers/net/intel/iavf/iavf_rxtx_vec_avx512.c
index ad1b0b90cd..01477fd501 100644
--- a/drivers/net/intel/iavf/iavf_rxtx_vec_avx512.c
+++ b/drivers/net/intel/iavf/iavf_rxtx_vec_avx512.c
@@ -1844,10 +1844,9 @@ iavf_vtx1(volatile struct ci_tx_desc *txdp,
 	  struct rte_mbuf *pkt, uint64_t flags,
 	  bool offload, uint8_t vlan_flag)
 {
-	uint64_t high_qw =
-		(IAVF_TX_DESC_DTYPE_DATA |
-		 ((uint64_t)flags  << IAVF_TXD_QW1_CMD_SHIFT) |
-		 ((uint64_t)pkt->data_len << IAVF_TXD_QW1_TX_BUF_SZ_SHIFT));
+	uint64_t high_qw = (CI_TX_DESC_DTYPE_DATA |
+		 ((uint64_t)flags << CI_TXD_QW1_CMD_S) |
+		 ((uint64_t)pkt->data_len << CI_TXD_QW1_TX_BUF_SZ_S));
 	if (offload)
 		iavf_txd_enable_offload(pkt, &high_qw, vlan_flag);
 
@@ -1863,8 +1862,7 @@ iavf_vtx(volatile struct ci_tx_desc *txdp,
 		struct rte_mbuf **pkt, uint16_t nb_pkts,  uint64_t flags,
 		bool offload, uint8_t vlan_flag)
 {
-	const uint64_t hi_qw_tmpl = (IAVF_TX_DESC_DTYPE_DATA |
-			((uint64_t)flags  << IAVF_TXD_QW1_CMD_SHIFT));
+	const uint64_t hi_qw_tmpl = (CI_TX_DESC_DTYPE_DATA | (flags << CI_TXD_QW1_CMD_S));
 
 	/* if unaligned on 32-bit boundary, do one to align */
 	if (((uintptr_t)txdp & 0x1F) != 0 && nb_pkts != 0) {
@@ -1874,22 +1872,14 @@ iavf_vtx(volatile struct ci_tx_desc *txdp,
 
 	/* do 4 at a time while possible, in bursts */
 	for (; nb_pkts > 3; txdp += 4, pkt += 4, nb_pkts -= 4) {
-		uint64_t hi_qw3 =
-			hi_qw_tmpl |
-			((uint64_t)pkt[3]->data_len <<
-			 IAVF_TXD_QW1_TX_BUF_SZ_SHIFT);
-		uint64_t hi_qw2 =
-			hi_qw_tmpl |
-			((uint64_t)pkt[2]->data_len <<
-			 IAVF_TXD_QW1_TX_BUF_SZ_SHIFT);
-		uint64_t hi_qw1 =
-			hi_qw_tmpl |
-			((uint64_t)pkt[1]->data_len <<
-			 IAVF_TXD_QW1_TX_BUF_SZ_SHIFT);
-		uint64_t hi_qw0 =
-			hi_qw_tmpl |
-			((uint64_t)pkt[0]->data_len <<
-			 IAVF_TXD_QW1_TX_BUF_SZ_SHIFT);
+		uint64_t hi_qw3 = hi_qw_tmpl |
+			((uint64_t)pkt[3]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
+		uint64_t hi_qw2 = hi_qw_tmpl |
+			((uint64_t)pkt[2]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
+		uint64_t hi_qw1 = hi_qw_tmpl |
+			((uint64_t)pkt[1]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
+		uint64_t hi_qw0 = hi_qw_tmpl |
+			((uint64_t)pkt[0]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
 		if (offload) {
 			iavf_txd_enable_offload(pkt[3], &hi_qw3, vlan_flag);
 			iavf_txd_enable_offload(pkt[2], &hi_qw2, vlan_flag);
@@ -2093,9 +2083,9 @@ ctx_vtx1(volatile struct ci_tx_desc *txdp, struct rte_mbuf *pkt,
 	if (IAVF_CHECK_TX_LLDP(pkt))
 		high_ctx_qw |= IAVF_TX_CTX_DESC_SWTCH_UPLINK
 			<< IAVF_TXD_CTX_QW1_CMD_SHIFT;
-	uint64_t high_data_qw = (IAVF_TX_DESC_DTYPE_DATA |
-				((uint64_t)flags  << IAVF_TXD_QW1_CMD_SHIFT) |
-				((uint64_t)pkt->data_len << IAVF_TXD_QW1_TX_BUF_SZ_SHIFT));
+	uint64_t high_data_qw = (CI_TX_DESC_DTYPE_DATA |
+				((uint64_t)flags << CI_TXD_QW1_CMD_S) |
+				((uint64_t)pkt->data_len << CI_TXD_QW1_TX_BUF_SZ_S));
 	if (offload)
 		iavf_txd_enable_offload(pkt, &high_data_qw, vlan_flag);
 
@@ -2110,8 +2100,7 @@ ctx_vtx(volatile struct ci_tx_desc *txdp,
 		struct rte_mbuf **pkt, uint16_t nb_pkts,  uint64_t flags,
 		bool offload, uint8_t vlan_flag)
 {
-	uint64_t hi_data_qw_tmpl = (IAVF_TX_DESC_DTYPE_DATA |
-					((uint64_t)flags  << IAVF_TXD_QW1_CMD_SHIFT));
+	uint64_t hi_data_qw_tmpl = (CI_TX_DESC_DTYPE_DATA | (flags << CI_TXD_QW1_CMD_S));
 
 	/* if unaligned on 32-bit boundary, do one to align */
 	if (((uintptr_t)txdp & 0x1F) != 0 && nb_pkts != 0) {
@@ -2128,11 +2117,9 @@ ctx_vtx(volatile struct ci_tx_desc *txdp,
 		uint64_t hi_data_qw0 = 0;
 
 		hi_data_qw1 = hi_data_qw_tmpl |
-				((uint64_t)pkt[1]->data_len <<
-					IAVF_TXD_QW1_TX_BUF_SZ_SHIFT);
+				((uint64_t)pkt[1]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
 		hi_data_qw0 = hi_data_qw_tmpl |
-				((uint64_t)pkt[0]->data_len <<
-					IAVF_TXD_QW1_TX_BUF_SZ_SHIFT);
+				((uint64_t)pkt[0]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
 
 #ifdef IAVF_TX_VLAN_QINQ_OFFLOAD
 		if (offload) {
@@ -2140,13 +2127,11 @@ ctx_vtx(volatile struct ci_tx_desc *txdp,
 				uint64_t qinq_tag = vlan_flag & IAVF_TX_FLAGS_VLAN_TAG_LOC_L2TAG2 ?
 					(uint64_t)pkt[1]->vlan_tci :
 					(uint64_t)pkt[1]->vlan_tci_outer;
-				hi_ctx_qw1 |= IAVF_TX_CTX_DESC_IL2TAG2 <<
-						IAVF_TXD_CTX_QW1_CMD_SHIFT;
+				hi_ctx_qw1 |= CI_TX_CTX_DESC_IL2TAG2 << CI_TXD_QW1_CMD_S;
 				low_ctx_qw1 |= qinq_tag << IAVF_TXD_CTX_QW0_L2TAG2_PARAM;
 			} else if (pkt[1]->ol_flags & RTE_MBUF_F_TX_VLAN &&
 					vlan_flag & IAVF_TX_FLAGS_VLAN_TAG_LOC_L2TAG2) {
-				hi_ctx_qw1 |=
-					IAVF_TX_CTX_DESC_IL2TAG2 << IAVF_TXD_CTX_QW1_CMD_SHIFT;
+				hi_ctx_qw1 |= IAVF_TX_CTX_DESC_IL2TAG2 << CI_TXD_QW1_CMD_S;
 				low_ctx_qw1 |=
 					(uint64_t)pkt[1]->vlan_tci << IAVF_TXD_CTX_QW0_L2TAG2_PARAM;
 			}
@@ -2154,7 +2139,7 @@ ctx_vtx(volatile struct ci_tx_desc *txdp,
 #endif
 		if (IAVF_CHECK_TX_LLDP(pkt[1]))
 			hi_ctx_qw1 |= IAVF_TX_CTX_DESC_SWTCH_UPLINK
-				<< IAVF_TXD_CTX_QW1_CMD_SHIFT;
+				<< CI_TXD_QW1_CMD_S;
 
 #ifdef IAVF_TX_VLAN_QINQ_OFFLOAD
 		if (offload) {
@@ -2162,21 +2147,18 @@ ctx_vtx(volatile struct ci_tx_desc *txdp,
 				uint64_t qinq_tag = vlan_flag & IAVF_TX_FLAGS_VLAN_TAG_LOC_L2TAG2 ?
 					(uint64_t)pkt[0]->vlan_tci :
 					(uint64_t)pkt[0]->vlan_tci_outer;
-				hi_ctx_qw0 |= IAVF_TX_CTX_DESC_IL2TAG2 <<
-						IAVF_TXD_CTX_QW1_CMD_SHIFT;
+				hi_ctx_qw0 |= IAVF_TX_CTX_DESC_IL2TAG2 << CI_TXD_QW1_CMD_S;
 				low_ctx_qw0 |= qinq_tag << IAVF_TXD_CTX_QW0_L2TAG2_PARAM;
 			} else if (pkt[0]->ol_flags & RTE_MBUF_F_TX_VLAN &&
 					vlan_flag & IAVF_TX_FLAGS_VLAN_TAG_LOC_L2TAG2) {
-				hi_ctx_qw0 |=
-					IAVF_TX_CTX_DESC_IL2TAG2 << IAVF_TXD_CTX_QW1_CMD_SHIFT;
+				hi_ctx_qw0 |= IAVF_TX_CTX_DESC_IL2TAG2 << CI_TXD_QW1_CMD_S;
 				low_ctx_qw0 |=
 					(uint64_t)pkt[0]->vlan_tci << IAVF_TXD_CTX_QW0_L2TAG2_PARAM;
 			}
 		}
 #endif
 		if (IAVF_CHECK_TX_LLDP(pkt[0]))
-			hi_ctx_qw0 |= IAVF_TX_CTX_DESC_SWTCH_UPLINK
-				<< IAVF_TXD_CTX_QW1_CMD_SHIFT;
+			hi_ctx_qw0 |= IAVF_TX_CTX_DESC_SWTCH_UPLINK << CI_TXD_QW1_CMD_S;
 
 		if (offload) {
 			iavf_txd_enable_offload(pkt[1], &hi_data_qw1, vlan_flag);
@@ -2207,8 +2189,8 @@ iavf_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pkts,
 	struct ci_tx_entry_vec *txep;
 	uint16_t n, nb_commit, tx_id;
 	/* bit2 is reserved and must be set to 1 according to Spec */
-	uint64_t flags = IAVF_TX_DESC_CMD_EOP | IAVF_TX_DESC_CMD_ICRC;
-	uint64_t rs = IAVF_TX_DESC_CMD_RS | flags;
+	uint64_t flags = CI_TX_DESC_CMD_EOP | CI_TX_DESC_CMD_ICRC;
+	uint64_t rs = CI_TX_DESC_CMD_RS | flags;
 
 	if (txq->nb_tx_free < txq->tx_free_thresh)
 		ci_tx_free_bufs_vec(txq, iavf_tx_desc_done, false);
@@ -2253,8 +2235,7 @@ iavf_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pkts,
 	tx_id = (uint16_t)(tx_id + nb_commit);
 	if (tx_id > txq->tx_next_rs) {
 		txq->ci_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
-			rte_cpu_to_le_64(((uint64_t)IAVF_TX_DESC_CMD_RS) <<
-					 IAVF_TXD_QW1_CMD_SHIFT);
+			rte_cpu_to_le_64(((uint64_t)CI_TX_DESC_CMD_RS) << CI_TXD_QW1_CMD_S);
 		txq->tx_next_rs =
 			(uint16_t)(txq->tx_next_rs + txq->tx_rs_thresh);
 	}
@@ -2275,8 +2256,8 @@ iavf_xmit_fixed_burst_vec_avx512_ctx(void *tx_queue, struct rte_mbuf **tx_pkts,
 	struct ci_tx_entry_vec *txep;
 	uint16_t n, nb_commit, nb_mbuf, tx_id;
 	/* bit2 is reserved and must be set to 1 according to Spec */
-	uint64_t flags = IAVF_TX_DESC_CMD_EOP | IAVF_TX_DESC_CMD_ICRC;
-	uint64_t rs = IAVF_TX_DESC_CMD_RS | flags;
+	uint64_t flags = CI_TX_DESC_CMD_EOP | CI_TX_DESC_CMD_ICRC;
+	uint64_t rs = CI_TX_DESC_CMD_RS | flags;
 
 	if (txq->nb_tx_free < txq->tx_free_thresh)
 		ci_tx_free_bufs_vec(txq, iavf_tx_desc_done, true);
@@ -2321,8 +2302,7 @@ iavf_xmit_fixed_burst_vec_avx512_ctx(void *tx_queue, struct rte_mbuf **tx_pkts,
 
 	if (tx_id > txq->tx_next_rs) {
 		txq->ci_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
-			rte_cpu_to_le_64(((uint64_t)IAVF_TX_DESC_CMD_RS) <<
-					 IAVF_TXD_QW1_CMD_SHIFT);
+			rte_cpu_to_le_64(((uint64_t)CI_TX_DESC_CMD_RS) << CI_TXD_QW1_CMD_S);
 		txq->tx_next_rs =
 			(uint16_t)(txq->tx_next_rs + txq->tx_rs_thresh);
 	}
diff --git a/drivers/net/intel/iavf/iavf_rxtx_vec_common.h b/drivers/net/intel/iavf/iavf_rxtx_vec_common.h
index 1832b76f89..1538a44892 100644
--- a/drivers/net/intel/iavf/iavf_rxtx_vec_common.h
+++ b/drivers/net/intel/iavf/iavf_rxtx_vec_common.h
@@ -15,8 +15,8 @@ static inline int
 iavf_tx_desc_done(struct ci_tx_queue *txq, uint16_t idx)
 {
 	return (txq->ci_tx_ring[idx].cmd_type_offset_bsz &
-			rte_cpu_to_le_64(IAVF_TXD_QW1_DTYPE_MASK)) ==
-				rte_cpu_to_le_64(IAVF_TX_DESC_DTYPE_DESC_DONE);
+			rte_cpu_to_le_64(CI_TXD_QW1_DTYPE_M)) ==
+				rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE);
 }
 
 static inline void
@@ -147,26 +147,26 @@ iavf_txd_enable_offload(__rte_unused struct rte_mbuf *tx_pkt,
 	/* Set MACLEN */
 	if (ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK)
 		td_offset |= (tx_pkt->outer_l2_len >> 1)
-			<< IAVF_TX_DESC_LENGTH_MACLEN_SHIFT;
+			<< CI_TX_DESC_LEN_MACLEN_S;
 	else
 		td_offset |= (tx_pkt->l2_len >> 1)
-			<< IAVF_TX_DESC_LENGTH_MACLEN_SHIFT;
+			<< CI_TX_DESC_LEN_MACLEN_S;
 
 	/* Enable L3 checksum offloads */
 	if (ol_flags & RTE_MBUF_F_TX_IP_CKSUM) {
 		if (ol_flags & RTE_MBUF_F_TX_IPV4) {
-			td_cmd |= IAVF_TX_DESC_CMD_IIPT_IPV4_CSUM;
+			td_cmd |= CI_TX_DESC_CMD_IIPT_IPV4_CSUM;
 			td_offset |= (tx_pkt->l3_len >> 2) <<
-				     IAVF_TX_DESC_LENGTH_IPLEN_SHIFT;
+				     CI_TX_DESC_LEN_IPLEN_S;
 		}
 	} else if (ol_flags & RTE_MBUF_F_TX_IPV4) {
-		td_cmd |= IAVF_TX_DESC_CMD_IIPT_IPV4;
+		td_cmd |= CI_TX_DESC_CMD_IIPT_IPV4;
 		td_offset |= (tx_pkt->l3_len >> 2) <<
-			     IAVF_TX_DESC_LENGTH_IPLEN_SHIFT;
+			     CI_TX_DESC_LEN_IPLEN_S;
 	} else if (ol_flags & RTE_MBUF_F_TX_IPV6) {
-		td_cmd |= IAVF_TX_DESC_CMD_IIPT_IPV6;
+		td_cmd |= CI_TX_DESC_CMD_IIPT_IPV6;
 		td_offset |= (tx_pkt->l3_len >> 2) <<
-			     IAVF_TX_DESC_LENGTH_IPLEN_SHIFT;
+			     CI_TX_DESC_LEN_IPLEN_S;
 	}
 
 	/* Enable L4 checksum offloads */
@@ -190,7 +190,7 @@ iavf_txd_enable_offload(__rte_unused struct rte_mbuf *tx_pkt,
 		break;
 	}
 
-	*txd_hi |= ((uint64_t)td_offset) << IAVF_TXD_QW1_OFFSET_SHIFT;
+	*txd_hi |= ((uint64_t)td_offset) << CI_TXD_QW1_OFFSET_S;
 #endif
 
 #ifdef IAVF_TX_VLAN_QINQ_OFFLOAD
@@ -198,17 +198,15 @@ iavf_txd_enable_offload(__rte_unused struct rte_mbuf *tx_pkt,
 		td_cmd |= IAVF_TX_DESC_CMD_IL2TAG1;
 		/* vlan_flag specifies outer tag location for QinQ. */
 		if (vlan_flag & IAVF_TX_FLAGS_VLAN_TAG_LOC_L2TAG1)
-			*txd_hi |= ((uint64_t)tx_pkt->vlan_tci_outer <<
-					IAVF_TXD_QW1_L2TAG1_SHIFT);
+			*txd_hi |= ((uint64_t)tx_pkt->vlan_tci_outer << CI_TXD_QW1_L2TAG1_S);
 		else
-			*txd_hi |= ((uint64_t)tx_pkt->vlan_tci <<
-					IAVF_TXD_QW1_L2TAG1_SHIFT);
+			*txd_hi |= ((uint64_t)tx_pkt->vlan_tci << CI_TXD_QW1_L2TAG1_S);
 	} else if (ol_flags & RTE_MBUF_F_TX_VLAN && vlan_flag & IAVF_TX_FLAGS_VLAN_TAG_LOC_L2TAG1) {
-		td_cmd |= IAVF_TX_DESC_CMD_IL2TAG1;
-		*txd_hi |= ((uint64_t)tx_pkt->vlan_tci << IAVF_TXD_QW1_L2TAG1_SHIFT);
+		td_cmd |= CI_TX_DESC_CMD_IL2TAG1;
+		*txd_hi |= ((uint64_t)tx_pkt->vlan_tci << CI_TXD_QW1_L2TAG1_S);
 	}
 #endif
 
-	*txd_hi |= ((uint64_t)td_cmd) << IAVF_TXD_QW1_CMD_SHIFT;
+	*txd_hi |= ((uint64_t)td_cmd) << CI_TXD_QW1_CMD_S;
 }
 #endif
diff --git a/drivers/net/intel/ice/ice_dcf_ethdev.c b/drivers/net/intel/ice/ice_dcf_ethdev.c
index 5f537b4c12..4ceecc15c6 100644
--- a/drivers/net/intel/ice/ice_dcf_ethdev.c
+++ b/drivers/net/intel/ice/ice_dcf_ethdev.c
@@ -406,7 +406,7 @@ reset_tx_queue(struct ci_tx_queue *txq)
 	prev = (uint16_t)(txq->nb_tx_desc - 1);
 	for (i = 0; i < txq->nb_tx_desc; i++) {
 		txq->ci_tx_ring[i].cmd_type_offset_bsz =
-			rte_cpu_to_le_64(IAVF_TX_DESC_DTYPE_DESC_DONE);
+			rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE);
 		txe[i].mbuf =  NULL;
 		txe[i].last_id = i;
 		txe[prev].next_id = i;
diff --git a/drivers/net/intel/ice/ice_rxtx.c b/drivers/net/intel/ice/ice_rxtx.c
index 12f05ae446..b5395a803f 100644
--- a/drivers/net/intel/ice/ice_rxtx.c
+++ b/drivers/net/intel/ice/ice_rxtx.c
@@ -1124,7 +1124,7 @@ ice_reset_tx_queue(struct ci_tx_queue *txq)
 		volatile struct ci_tx_desc *txd = &txq->ci_tx_ring[i];
 
 		txd->cmd_type_offset_bsz =
-			rte_cpu_to_le_64(ICE_TX_DESC_DTYPE_DESC_DONE);
+			rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE);
 		txe[i].mbuf =  NULL;
 		txe[i].last_id = i;
 		txe[prev].next_id = i;
@@ -2544,9 +2544,8 @@ ice_tx_descriptor_status(void *tx_queue, uint16_t offset)
 	}
 
 	status = &txq->ci_tx_ring[desc].cmd_type_offset_bsz;
-	mask = rte_cpu_to_le_64(ICE_TXD_QW1_DTYPE_M);
-	expect = rte_cpu_to_le_64(ICE_TX_DESC_DTYPE_DESC_DONE <<
-				  ICE_TXD_QW1_DTYPE_S);
+	mask = rte_cpu_to_le_64(CI_TXD_QW1_DTYPE_M);
+	expect = rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE << CI_TXD_QW1_DTYPE_S);
 	if ((*status & mask) == expect)
 		return RTE_ETH_TX_DESC_DONE;
 
@@ -2892,7 +2891,7 @@ ice_recv_pkts(void *rx_queue,
 
 static inline void
 ice_parse_tunneling_params(uint64_t ol_flags,
-			    union ice_tx_offload tx_offload,
+			    union ci_tx_offload tx_offload,
 			    uint32_t *cd_tunneling)
 {
 	/* EIPT: External (outer) IP header type */
@@ -2953,58 +2952,58 @@ static inline void
 ice_txd_enable_checksum(uint64_t ol_flags,
 			uint32_t *td_cmd,
 			uint32_t *td_offset,
-			union ice_tx_offload tx_offload)
+			union ci_tx_offload tx_offload)
 {
 	/* Set MACLEN */
 	if (!(ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK))
 		*td_offset |= (tx_offload.l2_len >> 1)
-			<< ICE_TX_DESC_LEN_MACLEN_S;
+			<< CI_TX_DESC_LEN_MACLEN_S;
 
 	/* Enable L3 checksum offloads */
 	if (ol_flags & RTE_MBUF_F_TX_IP_CKSUM) {
-		*td_cmd |= ICE_TX_DESC_CMD_IIPT_IPV4_CSUM;
+		*td_cmd |= CI_TX_DESC_CMD_IIPT_IPV4_CSUM;
 		*td_offset |= (tx_offload.l3_len >> 2) <<
-			ICE_TX_DESC_LEN_IPLEN_S;
+			CI_TX_DESC_LEN_IPLEN_S;
 	} else if (ol_flags & RTE_MBUF_F_TX_IPV4) {
-		*td_cmd |= ICE_TX_DESC_CMD_IIPT_IPV4;
+		*td_cmd |= CI_TX_DESC_CMD_IIPT_IPV4;
 		*td_offset |= (tx_offload.l3_len >> 2) <<
-			ICE_TX_DESC_LEN_IPLEN_S;
+			CI_TX_DESC_LEN_IPLEN_S;
 	} else if (ol_flags & RTE_MBUF_F_TX_IPV6) {
-		*td_cmd |= ICE_TX_DESC_CMD_IIPT_IPV6;
+		*td_cmd |= CI_TX_DESC_CMD_IIPT_IPV6;
 		*td_offset |= (tx_offload.l3_len >> 2) <<
-			ICE_TX_DESC_LEN_IPLEN_S;
+			CI_TX_DESC_LEN_IPLEN_S;
 	}
 
 	if (ol_flags & RTE_MBUF_F_TX_TCP_SEG) {
-		*td_cmd |= ICE_TX_DESC_CMD_L4T_EOFT_TCP;
+		*td_cmd |= CI_TX_DESC_CMD_L4T_EOFT_TCP;
 		*td_offset |= (tx_offload.l4_len >> 2) <<
-			      ICE_TX_DESC_LEN_L4_LEN_S;
+			      CI_TX_DESC_LEN_L4_LEN_S;
 		return;
 	}
 
 	if (ol_flags & RTE_MBUF_F_TX_UDP_SEG) {
-		*td_cmd |= ICE_TX_DESC_CMD_L4T_EOFT_UDP;
+		*td_cmd |= CI_TX_DESC_CMD_L4T_EOFT_UDP;
 		*td_offset |= (tx_offload.l4_len >> 2) <<
-			      ICE_TX_DESC_LEN_L4_LEN_S;
+			      CI_TX_DESC_LEN_L4_LEN_S;
 		return;
 	}
 
 	/* Enable L4 checksum offloads */
 	switch (ol_flags & RTE_MBUF_F_TX_L4_MASK) {
 	case RTE_MBUF_F_TX_TCP_CKSUM:
-		*td_cmd |= ICE_TX_DESC_CMD_L4T_EOFT_TCP;
+		*td_cmd |= CI_TX_DESC_CMD_L4T_EOFT_TCP;
 		*td_offset |= (sizeof(struct rte_tcp_hdr) >> 2) <<
-			      ICE_TX_DESC_LEN_L4_LEN_S;
+			      CI_TX_DESC_LEN_L4_LEN_S;
 		break;
 	case RTE_MBUF_F_TX_SCTP_CKSUM:
-		*td_cmd |= ICE_TX_DESC_CMD_L4T_EOFT_SCTP;
+		*td_cmd |= CI_TX_DESC_CMD_L4T_EOFT_SCTP;
 		*td_offset |= (sizeof(struct rte_sctp_hdr) >> 2) <<
-			      ICE_TX_DESC_LEN_L4_LEN_S;
+			      CI_TX_DESC_LEN_L4_LEN_S;
 		break;
 	case RTE_MBUF_F_TX_UDP_CKSUM:
-		*td_cmd |= ICE_TX_DESC_CMD_L4T_EOFT_UDP;
+		*td_cmd |= CI_TX_DESC_CMD_L4T_EOFT_UDP;
 		*td_offset |= (sizeof(struct rte_udp_hdr) >> 2) <<
-			      ICE_TX_DESC_LEN_L4_LEN_S;
+			      CI_TX_DESC_LEN_L4_LEN_S;
 		break;
 	default:
 		break;
@@ -3018,11 +3017,11 @@ ice_build_ctob(uint32_t td_cmd,
 	       uint16_t size,
 	       uint32_t td_tag)
 {
-	return rte_cpu_to_le_64(ICE_TX_DESC_DTYPE_DATA |
-				((uint64_t)td_cmd << ICE_TXD_QW1_CMD_S) |
-				((uint64_t)td_offset << ICE_TXD_QW1_OFFSET_S) |
-				((uint64_t)size << ICE_TXD_QW1_TX_BUF_SZ_S) |
-				((uint64_t)td_tag << ICE_TXD_QW1_L2TAG1_S));
+	return rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DATA |
+				((uint64_t)td_cmd << CI_TXD_QW1_CMD_S) |
+				((uint64_t)td_offset << CI_TXD_QW1_OFFSET_S) |
+				((uint64_t)size << CI_TXD_QW1_TX_BUF_SZ_S) |
+				((uint64_t)td_tag << CI_TXD_QW1_L2TAG1_S));
 }
 
 /* Check if the context descriptor is needed for TX offloading */
@@ -3041,7 +3040,7 @@ ice_calc_context_desc(uint64_t flags)
 
 /* set ice TSO context descriptor */
 static inline uint64_t
-ice_set_tso_ctx(struct rte_mbuf *mbuf, union ice_tx_offload tx_offload)
+ice_set_tso_ctx(struct rte_mbuf *mbuf, union ci_tx_offload tx_offload)
 {
 	uint64_t ctx_desc = 0;
 	uint32_t cd_cmd, hdr_len, cd_tso_len;
@@ -3055,18 +3054,15 @@ ice_set_tso_ctx(struct rte_mbuf *mbuf, union ice_tx_offload tx_offload)
 	hdr_len += (mbuf->ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK) ?
 		   tx_offload.outer_l2_len + tx_offload.outer_l3_len : 0;
 
-	cd_cmd = ICE_TX_CTX_DESC_TSO;
+	cd_cmd = CI_TX_CTX_DESC_TSO;
 	cd_tso_len = mbuf->pkt_len - hdr_len;
-	ctx_desc |= ((uint64_t)cd_cmd << ICE_TXD_CTX_QW1_CMD_S) |
+	ctx_desc |= ((uint64_t)cd_cmd << CI_TXD_QW1_CMD_S) |
 		    ((uint64_t)cd_tso_len << ICE_TXD_CTX_QW1_TSO_LEN_S) |
 		    ((uint64_t)mbuf->tso_segsz << ICE_TXD_CTX_QW1_MSS_S);
 
 	return ctx_desc;
 }
 
-/* HW requires that TX buffer size ranges from 1B up to (16K-1)B. */
-#define ICE_MAX_DATA_PER_TXD \
-	(ICE_TXD_QW1_TX_BUF_SZ_M >> ICE_TXD_QW1_TX_BUF_SZ_S)
 /* Calculate the number of TX descriptors needed for each pkt */
 static inline uint16_t
 ice_calc_pkt_desc(struct rte_mbuf *tx_pkt)
@@ -3075,7 +3071,7 @@ ice_calc_pkt_desc(struct rte_mbuf *tx_pkt)
 	uint16_t count = 0;
 
 	while (txd != NULL) {
-		count += DIV_ROUND_UP(txd->data_len, ICE_MAX_DATA_PER_TXD);
+		count += DIV_ROUND_UP(txd->data_len, CI_MAX_DATA_PER_TXD);
 		txd = txd->next;
 	}
 
@@ -3105,7 +3101,7 @@ ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 	uint16_t slen;
 	uint64_t buf_dma_addr;
 	uint64_t ol_flags;
-	union ice_tx_offload tx_offload = {0};
+	union ci_tx_offload tx_offload = {0};
 
 	txq = tx_queue;
 	sw_ring = txq->sw_ring;
@@ -3173,7 +3169,7 @@ ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 
 		/* Descriptor based VLAN insertion */
 		if (ol_flags & (RTE_MBUF_F_TX_VLAN | RTE_MBUF_F_TX_QINQ)) {
-			td_cmd |= ICE_TX_DESC_CMD_IL2TAG1;
+			td_cmd |= CI_TX_DESC_CMD_IL2TAG1;
 			td_tag = tx_pkt->vlan_tci;
 		}
 
@@ -3181,7 +3177,7 @@ ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 		cd_tunneling_params = 0;
 		if (ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK) {
 			td_offset |= (tx_offload.outer_l2_len >> 1)
-				<< ICE_TX_DESC_LEN_MACLEN_S;
+				<< CI_TX_DESC_LEN_MACLEN_S;
 			ice_parse_tunneling_params(ol_flags, tx_offload,
 						   &cd_tunneling_params);
 		}
@@ -3211,8 +3207,8 @@ ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 					ice_set_tso_ctx(tx_pkt, tx_offload);
 			else if (ol_flags & RTE_MBUF_F_TX_IEEE1588_TMST)
 				cd_type_cmd_tso_mss |=
-					((uint64_t)ICE_TX_CTX_DESC_TSYN <<
-					ICE_TXD_CTX_QW1_CMD_S) |
+					((uint64_t)CI_TX_CTX_DESC_TSYN <<
+					CI_TXD_QW1_CMD_S) |
 					 (((uint64_t)txq->ice_vsi->adapter->ptp_tx_index <<
 					 ICE_TXD_CTX_QW1_TSYN_S) & ICE_TXD_CTX_QW1_TSYN_M);
 
@@ -3223,8 +3219,8 @@ ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 			if (ol_flags & RTE_MBUF_F_TX_QINQ) {
 				cd_l2tag2 = tx_pkt->vlan_tci_outer;
 				cd_type_cmd_tso_mss |=
-					((uint64_t)ICE_TX_CTX_DESC_IL2TAG2 <<
-					 ICE_TXD_CTX_QW1_CMD_S);
+					((uint64_t)CI_TX_CTX_DESC_IL2TAG2 <<
+					 CI_TXD_QW1_CMD_S);
 			}
 			ctx_txd->l2tag2 = rte_cpu_to_le_16(cd_l2tag2);
 			ctx_txd->qw1 =
@@ -3249,18 +3245,16 @@ ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 			buf_dma_addr = rte_mbuf_data_iova(m_seg);
 
 			while ((ol_flags & (RTE_MBUF_F_TX_TCP_SEG | RTE_MBUF_F_TX_UDP_SEG)) &&
-				unlikely(slen > ICE_MAX_DATA_PER_TXD)) {
+					unlikely(slen > CI_MAX_DATA_PER_TXD)) {
 				txd->buffer_addr = rte_cpu_to_le_64(buf_dma_addr);
-				txd->cmd_type_offset_bsz =
-				rte_cpu_to_le_64(ICE_TX_DESC_DTYPE_DATA |
-				((uint64_t)td_cmd << ICE_TXD_QW1_CMD_S) |
-				((uint64_t)td_offset << ICE_TXD_QW1_OFFSET_S) |
-				((uint64_t)ICE_MAX_DATA_PER_TXD <<
-				 ICE_TXD_QW1_TX_BUF_SZ_S) |
-				((uint64_t)td_tag << ICE_TXD_QW1_L2TAG1_S));
+				txd->cmd_type_offset_bsz = rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DATA |
+					((uint64_t)td_cmd << CI_TXD_QW1_CMD_S) |
+					((uint64_t)td_offset << CI_TXD_QW1_OFFSET_S) |
+					((uint64_t)CI_MAX_DATA_PER_TXD << CI_TXD_QW1_TX_BUF_SZ_S) |
+					((uint64_t)td_tag << CI_TXD_QW1_L2TAG1_S));
 
-				buf_dma_addr += ICE_MAX_DATA_PER_TXD;
-				slen -= ICE_MAX_DATA_PER_TXD;
+				buf_dma_addr += CI_MAX_DATA_PER_TXD;
+				slen -= CI_MAX_DATA_PER_TXD;
 
 				txe->last_id = tx_last;
 				tx_id = txe->next_id;
@@ -3270,12 +3264,11 @@ ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 			}
 
 			txd->buffer_addr = rte_cpu_to_le_64(buf_dma_addr);
-			txd->cmd_type_offset_bsz =
-				rte_cpu_to_le_64(ICE_TX_DESC_DTYPE_DATA |
-				((uint64_t)td_cmd << ICE_TXD_QW1_CMD_S) |
-				((uint64_t)td_offset << ICE_TXD_QW1_OFFSET_S) |
-				((uint64_t)slen << ICE_TXD_QW1_TX_BUF_SZ_S) |
-				((uint64_t)td_tag << ICE_TXD_QW1_L2TAG1_S));
+			txd->cmd_type_offset_bsz = rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DATA |
+				((uint64_t)td_cmd << CI_TXD_QW1_CMD_S) |
+				((uint64_t)td_offset << CI_TXD_QW1_OFFSET_S) |
+				((uint64_t)slen << CI_TXD_QW1_TX_BUF_SZ_S) |
+				((uint64_t)td_tag << CI_TXD_QW1_L2TAG1_S));
 
 			txe->last_id = tx_last;
 			tx_id = txe->next_id;
@@ -3284,7 +3277,7 @@ ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 		} while (m_seg);
 
 		/* fill the last descriptor with End of Packet (EOP) bit */
-		td_cmd |= ICE_TX_DESC_CMD_EOP;
+		td_cmd |= CI_TX_DESC_CMD_EOP;
 		txq->nb_tx_used = (uint16_t)(txq->nb_tx_used + nb_used);
 		txq->nb_tx_free = (uint16_t)(txq->nb_tx_free - nb_used);
 
@@ -3295,14 +3288,13 @@ ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 				   "%4u (port=%d queue=%d)",
 				   tx_last, txq->port_id, txq->queue_id);
 
-			td_cmd |= ICE_TX_DESC_CMD_RS;
+			td_cmd |= CI_TX_DESC_CMD_RS;
 
 			/* Update txq RS bit counters */
 			txq->nb_tx_used = 0;
 		}
 		txd->cmd_type_offset_bsz |=
-			rte_cpu_to_le_64(((uint64_t)td_cmd) <<
-					 ICE_TXD_QW1_CMD_S);
+			rte_cpu_to_le_64(((uint64_t)td_cmd) << CI_TXD_QW1_CMD_S);
 
 		if (txq->tsq != NULL && txq->tsq->ts_flag > 0) {
 			uint64_t txtime = *RTE_MBUF_DYNFIELD(tx_pkt,
@@ -3349,8 +3341,8 @@ ice_tx_free_bufs(struct ci_tx_queue *txq)
 	uint16_t i;
 
 	if ((txq->ci_tx_ring[txq->tx_next_dd].cmd_type_offset_bsz &
-	     rte_cpu_to_le_64(ICE_TXD_QW1_DTYPE_M)) !=
-	    rte_cpu_to_le_64(ICE_TX_DESC_DTYPE_DESC_DONE))
+	     rte_cpu_to_le_64(CI_TXD_QW1_DTYPE_M)) !=
+	    rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE))
 		return 0;
 
 	txep = &txq->sw_ring[txq->tx_next_dd - (txq->tx_rs_thresh - 1)];
@@ -3581,8 +3573,7 @@ tx_xmit_pkts(struct ci_tx_queue *txq,
 		n = (uint16_t)(txq->nb_tx_desc - txq->tx_tail);
 		ice_tx_fill_hw_ring(txq, tx_pkts, n);
 		txr[txq->tx_next_rs].cmd_type_offset_bsz |=
-			rte_cpu_to_le_64(((uint64_t)ICE_TX_DESC_CMD_RS) <<
-					 ICE_TXD_QW1_CMD_S);
+			rte_cpu_to_le_64(((uint64_t)CI_TX_DESC_CMD_RS) << CI_TXD_QW1_CMD_S);
 		txq->tx_next_rs = (uint16_t)(txq->tx_rs_thresh - 1);
 		txq->tx_tail = 0;
 	}
@@ -3594,8 +3585,7 @@ tx_xmit_pkts(struct ci_tx_queue *txq,
 	/* Determine if RS bit needs to be set */
 	if (txq->tx_tail > txq->tx_next_rs) {
 		txr[txq->tx_next_rs].cmd_type_offset_bsz |=
-			rte_cpu_to_le_64(((uint64_t)ICE_TX_DESC_CMD_RS) <<
-					 ICE_TXD_QW1_CMD_S);
+			rte_cpu_to_le_64(((uint64_t)CI_TX_DESC_CMD_RS) << CI_TXD_QW1_CMD_S);
 		txq->tx_next_rs =
 			(uint16_t)(txq->tx_next_rs + txq->tx_rs_thresh);
 		if (txq->tx_next_rs >= txq->nb_tx_desc)
@@ -4826,9 +4816,9 @@ ice_fdir_programming(struct ice_pf *pf, struct ice_fltr_desc *fdir_desc)
 
 	txdp = &txq->ci_tx_ring[txq->tx_tail + 1];
 	txdp->buffer_addr = rte_cpu_to_le_64(pf->fdir.dma_addr);
-	td_cmd = ICE_TX_DESC_CMD_EOP |
-		ICE_TX_DESC_CMD_RS  |
-		ICE_TX_DESC_CMD_DUMMY;
+	td_cmd = CI_TX_DESC_CMD_EOP |
+		CI_TX_DESC_CMD_RS  |
+		CI_TX_DESC_CMD_DUMMY;
 
 	txdp->cmd_type_offset_bsz =
 		ice_build_ctob(td_cmd, 0, ICE_FDIR_PKT_LEN, 0);
@@ -4839,9 +4829,8 @@ ice_fdir_programming(struct ice_pf *pf, struct ice_fltr_desc *fdir_desc)
 	/* Update the tx tail register */
 	ICE_PCI_REG_WRITE(txq->qtx_tail, txq->tx_tail);
 	for (i = 0; i < ICE_FDIR_MAX_WAIT_US; i++) {
-		if ((txdp->cmd_type_offset_bsz &
-		     rte_cpu_to_le_64(ICE_TXD_QW1_DTYPE_M)) ==
-		    rte_cpu_to_le_64(ICE_TX_DESC_DTYPE_DESC_DONE))
+		if ((txdp->cmd_type_offset_bsz & rte_cpu_to_le_64(CI_TXD_QW1_DTYPE_M)) ==
+		    rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE))
 			break;
 		rte_delay_us(1);
 	}
diff --git a/drivers/net/intel/ice/ice_rxtx.h b/drivers/net/intel/ice/ice_rxtx.h
index c524e9f756..cd5fa93d1c 100644
--- a/drivers/net/intel/ice/ice_rxtx.h
+++ b/drivers/net/intel/ice/ice_rxtx.h
@@ -46,7 +46,7 @@
 
 #define ICE_SUPPORT_CHAIN_NUM 5
 
-#define ICE_TD_CMD                      ICE_TX_DESC_CMD_EOP
+#define ICE_TD_CMD                      CI_TX_DESC_CMD_EOP
 
 #define ICE_VPMD_RX_BURST            CI_VPMD_RX_BURST
 #define ICE_VPMD_TX_BURST            32
@@ -169,19 +169,6 @@ struct ice_txtime {
 	const struct rte_memzone *ts_mz;
 };
 
-/* Offload features */
-union ice_tx_offload {
-	uint64_t data;
-	struct {
-		uint64_t l2_len:7; /* L2 (MAC) Header Length. */
-		uint64_t l3_len:9; /* L3 (IP) Header Length. */
-		uint64_t l4_len:8; /* L4 Header Length. */
-		uint64_t tso_segsz:16; /* TCP TSO segment size */
-		uint64_t outer_l2_len:8; /* outer L2 Header Length */
-		uint64_t outer_l3_len:16; /* outer L3 Header Length */
-	};
-};
-
 /* Rx Flex Descriptor for Comms Package Profile
  * RxDID Profile ID 22 (swap Hash and FlowID)
  * Flex-field 0: Flow ID lower 16-bits
diff --git a/drivers/net/intel/ice/ice_rxtx_vec_avx2.c b/drivers/net/intel/ice/ice_rxtx_vec_avx2.c
index 0a1df0b2f6..2922671158 100644
--- a/drivers/net/intel/ice/ice_rxtx_vec_avx2.c
+++ b/drivers/net/intel/ice/ice_rxtx_vec_avx2.c
@@ -777,10 +777,9 @@ static __rte_always_inline void
 ice_vtx1(volatile struct ci_tx_desc *txdp,
 	 struct rte_mbuf *pkt, uint64_t flags, bool offload)
 {
-	uint64_t high_qw =
-		(ICE_TX_DESC_DTYPE_DATA |
-		 ((uint64_t)flags  << ICE_TXD_QW1_CMD_S) |
-		 ((uint64_t)pkt->data_len << ICE_TXD_QW1_TX_BUF_SZ_S));
+	uint64_t high_qw = (CI_TX_DESC_DTYPE_DATA |
+		 ((uint64_t)flags << CI_TXD_QW1_CMD_S) |
+		 ((uint64_t)pkt->data_len << CI_TXD_QW1_TX_BUF_SZ_S));
 	if (offload)
 		ice_txd_enable_offload(pkt, &high_qw);
 
@@ -792,8 +791,7 @@ static __rte_always_inline void
 ice_vtx(volatile struct ci_tx_desc *txdp,
 	struct rte_mbuf **pkt, uint16_t nb_pkts,  uint64_t flags, bool offload)
 {
-	const uint64_t hi_qw_tmpl = (ICE_TX_DESC_DTYPE_DATA |
-			((uint64_t)flags  << ICE_TXD_QW1_CMD_S));
+	const uint64_t hi_qw_tmpl = (CI_TX_DESC_DTYPE_DATA | (flags << CI_TXD_QW1_CMD_S));
 
 	/* if unaligned on 32-bit boundary, do one to align */
 	if (((uintptr_t)txdp & 0x1F) != 0 && nb_pkts != 0) {
@@ -801,30 +799,22 @@ ice_vtx(volatile struct ci_tx_desc *txdp,
 		nb_pkts--, txdp++, pkt++;
 	}
 
-	/* do two at a time while possible, in bursts */
+	/* do four at a time while possible, in bursts */
 	for (; nb_pkts > 3; txdp += 4, pkt += 4, nb_pkts -= 4) {
-		uint64_t hi_qw3 =
-			hi_qw_tmpl |
-			((uint64_t)pkt[3]->data_len <<
-			 ICE_TXD_QW1_TX_BUF_SZ_S);
+		uint64_t hi_qw3 = hi_qw_tmpl |
+			((uint64_t)pkt[3]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
 		if (offload)
 			ice_txd_enable_offload(pkt[3], &hi_qw3);
-		uint64_t hi_qw2 =
-			hi_qw_tmpl |
-			((uint64_t)pkt[2]->data_len <<
-			 ICE_TXD_QW1_TX_BUF_SZ_S);
+		uint64_t hi_qw2 = hi_qw_tmpl |
+			((uint64_t)pkt[2]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
 		if (offload)
 			ice_txd_enable_offload(pkt[2], &hi_qw2);
-		uint64_t hi_qw1 =
-			hi_qw_tmpl |
-			((uint64_t)pkt[1]->data_len <<
-			 ICE_TXD_QW1_TX_BUF_SZ_S);
+		uint64_t hi_qw1 = hi_qw_tmpl |
+			((uint64_t)pkt[1]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
 		if (offload)
 			ice_txd_enable_offload(pkt[1], &hi_qw1);
-		uint64_t hi_qw0 =
-			hi_qw_tmpl |
-			((uint64_t)pkt[0]->data_len <<
-			 ICE_TXD_QW1_TX_BUF_SZ_S);
+		uint64_t hi_qw0 = hi_qw_tmpl |
+			((uint64_t)pkt[0]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
 		if (offload)
 			ice_txd_enable_offload(pkt[0], &hi_qw0);
 
@@ -856,7 +846,7 @@ ice_xmit_fixed_burst_vec_avx2(void *tx_queue, struct rte_mbuf **tx_pkts,
 	struct ci_tx_entry_vec *txep;
 	uint16_t n, nb_commit, tx_id;
 	uint64_t flags = ICE_TD_CMD;
-	uint64_t rs = ICE_TX_DESC_CMD_RS | ICE_TD_CMD;
+	uint64_t rs = CI_TX_DESC_CMD_RS | ICE_TD_CMD;
 
 	/* cross rx_thresh boundary is not allowed */
 	nb_pkts = RTE_MIN(nb_pkts, txq->tx_rs_thresh);
@@ -901,8 +891,7 @@ ice_xmit_fixed_burst_vec_avx2(void *tx_queue, struct rte_mbuf **tx_pkts,
 	tx_id = (uint16_t)(tx_id + nb_commit);
 	if (tx_id > txq->tx_next_rs) {
 		txq->ci_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
-			rte_cpu_to_le_64(((uint64_t)ICE_TX_DESC_CMD_RS) <<
-					 ICE_TXD_QW1_CMD_S);
+			rte_cpu_to_le_64(((uint64_t)CI_TX_DESC_CMD_RS) << CI_TXD_QW1_CMD_S);
 		txq->tx_next_rs =
 			(uint16_t)(txq->tx_next_rs + txq->tx_rs_thresh);
 	}
diff --git a/drivers/net/intel/ice/ice_rxtx_vec_avx512.c b/drivers/net/intel/ice/ice_rxtx_vec_avx512.c
index d42f41461f..e64b6e227b 100644
--- a/drivers/net/intel/ice/ice_rxtx_vec_avx512.c
+++ b/drivers/net/intel/ice/ice_rxtx_vec_avx512.c
@@ -850,10 +850,9 @@ static __rte_always_inline void
 ice_vtx1(volatile struct ci_tx_desc *txdp,
 	 struct rte_mbuf *pkt, uint64_t flags, bool do_offload)
 {
-	uint64_t high_qw =
-		(ICE_TX_DESC_DTYPE_DATA |
-		 ((uint64_t)flags  << ICE_TXD_QW1_CMD_S) |
-		 ((uint64_t)pkt->data_len << ICE_TXD_QW1_TX_BUF_SZ_S));
+	uint64_t high_qw = (CI_TX_DESC_DTYPE_DATA |
+		 ((uint64_t)flags << CI_TXD_QW1_CMD_S) |
+		 ((uint64_t)pkt->data_len << CI_TXD_QW1_TX_BUF_SZ_S));
 
 	if (do_offload)
 		ice_txd_enable_offload(pkt, &high_qw);
@@ -866,32 +865,23 @@ static __rte_always_inline void
 ice_vtx(volatile struct ci_tx_desc *txdp, struct rte_mbuf **pkt,
 	uint16_t nb_pkts,  uint64_t flags, bool do_offload)
 {
-	const uint64_t hi_qw_tmpl = (ICE_TX_DESC_DTYPE_DATA |
-			((uint64_t)flags  << ICE_TXD_QW1_CMD_S));
+	const uint64_t hi_qw_tmpl = (CI_TX_DESC_DTYPE_DATA | (flags << CI_TXD_QW1_CMD_S));
 
 	for (; nb_pkts > 3; txdp += 4, pkt += 4, nb_pkts -= 4) {
-		uint64_t hi_qw3 =
-			hi_qw_tmpl |
-			((uint64_t)pkt[3]->data_len <<
-			 ICE_TXD_QW1_TX_BUF_SZ_S);
+		uint64_t hi_qw3 = hi_qw_tmpl |
+			((uint64_t)pkt[3]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
 		if (do_offload)
 			ice_txd_enable_offload(pkt[3], &hi_qw3);
-		uint64_t hi_qw2 =
-			hi_qw_tmpl |
-			((uint64_t)pkt[2]->data_len <<
-			 ICE_TXD_QW1_TX_BUF_SZ_S);
+		uint64_t hi_qw2 = hi_qw_tmpl |
+			((uint64_t)pkt[2]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
 		if (do_offload)
 			ice_txd_enable_offload(pkt[2], &hi_qw2);
-		uint64_t hi_qw1 =
-			hi_qw_tmpl |
-			((uint64_t)pkt[1]->data_len <<
-			 ICE_TXD_QW1_TX_BUF_SZ_S);
+		uint64_t hi_qw1 = hi_qw_tmpl |
+			((uint64_t)pkt[1]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
 		if (do_offload)
 			ice_txd_enable_offload(pkt[1], &hi_qw1);
-		uint64_t hi_qw0 =
-			hi_qw_tmpl |
-			((uint64_t)pkt[0]->data_len <<
-			 ICE_TXD_QW1_TX_BUF_SZ_S);
+		uint64_t hi_qw0 = hi_qw_tmpl |
+			((uint64_t)pkt[0]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
 		if (do_offload)
 			ice_txd_enable_offload(pkt[0], &hi_qw0);
 
@@ -920,7 +910,7 @@ ice_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pkts,
 	struct ci_tx_entry_vec *txep;
 	uint16_t n, nb_commit, tx_id;
 	uint64_t flags = ICE_TD_CMD;
-	uint64_t rs = ICE_TX_DESC_CMD_RS | ICE_TD_CMD;
+	uint64_t rs = CI_TX_DESC_CMD_RS | ICE_TD_CMD;
 
 	/* cross rx_thresh boundary is not allowed */
 	nb_pkts = RTE_MIN(nb_pkts, txq->tx_rs_thresh);
@@ -966,8 +956,7 @@ ice_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pkts,
 	tx_id = (uint16_t)(tx_id + nb_commit);
 	if (tx_id > txq->tx_next_rs) {
 		txq->ci_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
-			rte_cpu_to_le_64(((uint64_t)ICE_TX_DESC_CMD_RS) <<
-					 ICE_TXD_QW1_CMD_S);
+			rte_cpu_to_le_64(((uint64_t)CI_TX_DESC_CMD_RS) << CI_TXD_QW1_CMD_S);
 		txq->tx_next_rs =
 			(uint16_t)(txq->tx_next_rs + txq->tx_rs_thresh);
 	}
diff --git a/drivers/net/intel/ice/ice_rxtx_vec_common.h b/drivers/net/intel/ice/ice_rxtx_vec_common.h
index 8ba591e403..1d83a087cc 100644
--- a/drivers/net/intel/ice/ice_rxtx_vec_common.h
+++ b/drivers/net/intel/ice/ice_rxtx_vec_common.h
@@ -12,8 +12,8 @@ static inline int
 ice_tx_desc_done(struct ci_tx_queue *txq, uint16_t idx)
 {
 	return (txq->ci_tx_ring[idx].cmd_type_offset_bsz &
-			rte_cpu_to_le_64(ICE_TXD_QW1_DTYPE_M)) ==
-				rte_cpu_to_le_64(ICE_TX_DESC_DTYPE_DESC_DONE);
+			rte_cpu_to_le_64(CI_TXD_QW1_DTYPE_M)) ==
+				rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE);
 }
 
 static inline void
@@ -124,53 +124,52 @@ ice_txd_enable_offload(struct rte_mbuf *tx_pkt,
 	/* Tx Checksum Offload */
 	/* SET MACLEN */
 	td_offset |= (tx_pkt->l2_len >> 1) <<
-		ICE_TX_DESC_LEN_MACLEN_S;
+		CI_TX_DESC_LEN_MACLEN_S;
 
 	/* Enable L3 checksum offload */
 	if (ol_flags & RTE_MBUF_F_TX_IP_CKSUM) {
-		td_cmd |= ICE_TX_DESC_CMD_IIPT_IPV4_CSUM;
+		td_cmd |= CI_TX_DESC_CMD_IIPT_IPV4_CSUM;
 		td_offset |= (tx_pkt->l3_len >> 2) <<
-			ICE_TX_DESC_LEN_IPLEN_S;
+			CI_TX_DESC_LEN_IPLEN_S;
 	} else if (ol_flags & RTE_MBUF_F_TX_IPV4) {
-		td_cmd |= ICE_TX_DESC_CMD_IIPT_IPV4;
+		td_cmd |= CI_TX_DESC_CMD_IIPT_IPV4;
 		td_offset |= (tx_pkt->l3_len >> 2) <<
-			ICE_TX_DESC_LEN_IPLEN_S;
+			CI_TX_DESC_LEN_IPLEN_S;
 	} else if (ol_flags & RTE_MBUF_F_TX_IPV6) {
-		td_cmd |= ICE_TX_DESC_CMD_IIPT_IPV6;
+		td_cmd |= CI_TX_DESC_CMD_IIPT_IPV6;
 		td_offset |= (tx_pkt->l3_len >> 2) <<
-			ICE_TX_DESC_LEN_IPLEN_S;
+			CI_TX_DESC_LEN_IPLEN_S;
 	}
 
 	/* Enable L4 checksum offloads */
 	switch (ol_flags & RTE_MBUF_F_TX_L4_MASK) {
 	case RTE_MBUF_F_TX_TCP_CKSUM:
-		td_cmd |= ICE_TX_DESC_CMD_L4T_EOFT_TCP;
+		td_cmd |= CI_TX_DESC_CMD_L4T_EOFT_TCP;
 		td_offset |= (sizeof(struct rte_tcp_hdr) >> 2) <<
-			ICE_TX_DESC_LEN_L4_LEN_S;
+			CI_TX_DESC_LEN_L4_LEN_S;
 		break;
 	case RTE_MBUF_F_TX_SCTP_CKSUM:
-		td_cmd |= ICE_TX_DESC_CMD_L4T_EOFT_SCTP;
+		td_cmd |= CI_TX_DESC_CMD_L4T_EOFT_SCTP;
 		td_offset |= (sizeof(struct rte_sctp_hdr) >> 2) <<
-			ICE_TX_DESC_LEN_L4_LEN_S;
+			CI_TX_DESC_LEN_L4_LEN_S;
 		break;
 	case RTE_MBUF_F_TX_UDP_CKSUM:
-		td_cmd |= ICE_TX_DESC_CMD_L4T_EOFT_UDP;
+		td_cmd |= CI_TX_DESC_CMD_L4T_EOFT_UDP;
 		td_offset |= (sizeof(struct rte_udp_hdr) >> 2) <<
-			ICE_TX_DESC_LEN_L4_LEN_S;
+			CI_TX_DESC_LEN_L4_LEN_S;
 		break;
 	default:
 		break;
 	}
 
-	*txd_hi |= ((uint64_t)td_offset) << ICE_TXD_QW1_OFFSET_S;
+	*txd_hi |= ((uint64_t)td_offset) << CI_TXD_QW1_OFFSET_S;
 
-	/* Tx VLAN insertion Offload */
+	/* Tx VLAN/QINQ insertion Offload */
 	if (ol_flags & RTE_MBUF_F_TX_VLAN) {
-		td_cmd |= ICE_TX_DESC_CMD_IL2TAG1;
-		*txd_hi |= ((uint64_t)tx_pkt->vlan_tci <<
-				ICE_TXD_QW1_L2TAG1_S);
+		td_cmd |= CI_TX_DESC_CMD_IL2TAG1;
+		*txd_hi |= ((uint64_t)tx_pkt->vlan_tci << CI_TXD_QW1_L2TAG1_S);
 	}
 
-	*txd_hi |= ((uint64_t)td_cmd) << ICE_TXD_QW1_CMD_S;
+	*txd_hi |= ((uint64_t)td_cmd) << CI_TXD_QW1_CMD_S;
 }
 #endif
diff --git a/drivers/net/intel/idpf/idpf_common_rxtx.c b/drivers/net/intel/idpf/idpf_common_rxtx.c
index 23666539ab..587871b54a 100644
--- a/drivers/net/intel/idpf/idpf_common_rxtx.c
+++ b/drivers/net/intel/idpf/idpf_common_rxtx.c
@@ -271,7 +271,7 @@ idpf_qc_single_tx_queue_reset(struct ci_tx_queue *txq)
 	prev = (uint16_t)(txq->nb_tx_desc - 1);
 	for (i = 0; i < txq->nb_tx_desc; i++) {
 		txq->ci_tx_ring[i].cmd_type_offset_bsz =
-			rte_cpu_to_le_64(IDPF_TX_DESC_DTYPE_DESC_DONE);
+			rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE);
 		txe[i].mbuf =  NULL;
 		txe[i].last_id = i;
 		txe[prev].next_id = i;
@@ -849,7 +849,7 @@ idpf_calc_context_desc(uint64_t flags)
  */
 static inline void
 idpf_set_splitq_tso_ctx(struct rte_mbuf *mbuf,
-			union idpf_tx_offload tx_offload,
+			union ci_tx_offload tx_offload,
 			volatile union idpf_flex_tx_ctx_desc *ctx_desc)
 {
 	uint16_t cmd_dtype;
@@ -887,7 +887,7 @@ idpf_dp_splitq_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 	volatile struct idpf_flex_tx_sched_desc *txr;
 	volatile struct idpf_flex_tx_sched_desc *txd;
 	struct ci_tx_entry *sw_ring;
-	union idpf_tx_offload tx_offload = {0};
+	union ci_tx_offload tx_offload = {0};
 	struct ci_tx_entry *txe, *txn;
 	uint16_t nb_used, tx_id, sw_id;
 	struct rte_mbuf *tx_pkt;
@@ -1334,7 +1334,7 @@ idpf_dp_singleq_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 {
 	volatile struct ci_tx_desc *txd;
 	volatile struct ci_tx_desc *txr;
-	union idpf_tx_offload tx_offload = {0};
+	union ci_tx_offload tx_offload = {0};
 	struct ci_tx_entry *txe, *txn;
 	struct ci_tx_entry *sw_ring;
 	struct ci_tx_queue *txq;
@@ -1452,10 +1452,10 @@ idpf_dp_singleq_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 			slen = m_seg->data_len;
 			buf_dma_addr = rte_mbuf_data_iova(m_seg);
 			txd->buffer_addr = rte_cpu_to_le_64(buf_dma_addr);
-			txd->cmd_type_offset_bsz = rte_cpu_to_le_64(IDPF_TX_DESC_DTYPE_DATA |
-				((uint64_t)td_cmd  << IDPF_TXD_QW1_CMD_S) |
-				((uint64_t)td_offset << IDPF_TXD_QW1_OFFSET_S) |
-				((uint64_t)slen << IDPF_TXD_QW1_TX_BUF_SZ_S));
+			txd->cmd_type_offset_bsz = rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DATA |
+				((uint64_t)td_cmd  << CI_TXD_QW1_CMD_S) |
+				((uint64_t)td_offset << CI_TXD_QW1_OFFSET_S) |
+				((uint64_t)slen << CI_TXD_QW1_TX_BUF_SZ_S));
 
 			txe->last_id = tx_last;
 			tx_id = txe->next_id;
@@ -1464,7 +1464,7 @@ idpf_dp_singleq_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 		} while (m_seg);
 
 		/* The last packet data descriptor needs End Of Packet (EOP) */
-		td_cmd |= IDPF_TX_DESC_CMD_EOP;
+		td_cmd |= CI_TX_DESC_CMD_EOP;
 		txq->nb_tx_used = (uint16_t)(txq->nb_tx_used + nb_used);
 		txq->nb_tx_free = (uint16_t)(txq->nb_tx_free - nb_used);
 
@@ -1473,13 +1473,13 @@ idpf_dp_singleq_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 			       "%4u (port=%d queue=%d)",
 			       tx_last, txq->port_id, txq->queue_id);
 
-			td_cmd |= IDPF_TX_DESC_CMD_RS;
+			td_cmd |= CI_TX_DESC_CMD_RS;
 
 			/* Update txq RS bit counters */
 			txq->nb_tx_used = 0;
 		}
 
-		txd->cmd_type_offset_bsz |= rte_cpu_to_le_16(td_cmd << IDPF_TXD_QW1_CMD_S);
+		txd->cmd_type_offset_bsz |= rte_cpu_to_le_16(td_cmd << CI_TXD_QW1_CMD_S);
 	}
 
 end_of_tx:
diff --git a/drivers/net/intel/idpf/idpf_common_rxtx.h b/drivers/net/intel/idpf/idpf_common_rxtx.h
index 2f2fa153b2..b88a87402d 100644
--- a/drivers/net/intel/idpf/idpf_common_rxtx.h
+++ b/drivers/net/intel/idpf/idpf_common_rxtx.h
@@ -169,18 +169,6 @@ struct idpf_rx_queue {
 	uint32_t hw_register_set;
 };
 
-/* Offload features */
-union idpf_tx_offload {
-	uint64_t data;
-	struct {
-		uint64_t l2_len:7; /* L2 (MAC) Header Length. */
-		uint64_t l3_len:9; /* L3 (IP) Header Length. */
-		uint64_t l4_len:8; /* L4 Header Length. */
-		uint64_t tso_segsz:16; /* TCP TSO segment size */
-		/* uint64_t unused : 24; */
-	};
-};
-
 union idpf_tx_desc {
 	struct ci_tx_desc *tx_ring;
 	struct idpf_flex_tx_sched_desc *desc_ring;
diff --git a/drivers/net/intel/idpf/idpf_common_rxtx_avx2.c b/drivers/net/intel/idpf/idpf_common_rxtx_avx2.c
index 04efee3722..411b171b97 100644
--- a/drivers/net/intel/idpf/idpf_common_rxtx_avx2.c
+++ b/drivers/net/intel/idpf/idpf_common_rxtx_avx2.c
@@ -486,10 +486,9 @@ static inline void
 idpf_singleq_vtx1(volatile struct ci_tx_desc *txdp,
 		  struct rte_mbuf *pkt, uint64_t flags)
 {
-	uint64_t high_qw =
-		(IDPF_TX_DESC_DTYPE_DATA |
-		 ((uint64_t)flags  << IDPF_TXD_QW1_CMD_S) |
-		 ((uint64_t)pkt->data_len << IDPF_TXD_QW1_TX_BUF_SZ_S));
+	uint64_t high_qw = (CI_TX_DESC_DTYPE_DATA |
+		 ((uint64_t)flags << CI_TXD_QW1_CMD_S) |
+		 ((uint64_t)pkt->data_len << CI_TXD_QW1_TX_BUF_SZ_S));
 
 	__m128i descriptor = _mm_set_epi64x(high_qw,
 				pkt->buf_iova + pkt->data_off);
@@ -500,8 +499,7 @@ static inline void
 idpf_singleq_vtx(volatile struct ci_tx_desc *txdp,
 		 struct rte_mbuf **pkt, uint16_t nb_pkts,  uint64_t flags)
 {
-	const uint64_t hi_qw_tmpl = (IDPF_TX_DESC_DTYPE_DATA |
-			((uint64_t)flags  << IDPF_TXD_QW1_CMD_S));
+	const uint64_t hi_qw_tmpl = (CI_TX_DESC_DTYPE_DATA | (flags << CI_TXD_QW1_CMD_S));
 
 	/* if unaligned on 32-bit boundary, do one to align */
 	if (((uintptr_t)txdp & 0x1F) != 0 && nb_pkts != 0) {
@@ -511,22 +509,14 @@ idpf_singleq_vtx(volatile struct ci_tx_desc *txdp,
 
 	/* do two at a time while possible, in bursts */
 	for (; nb_pkts > 3; txdp += 4, pkt += 4, nb_pkts -= 4) {
-		uint64_t hi_qw3 =
-			hi_qw_tmpl |
-			((uint64_t)pkt[3]->data_len <<
-			 IDPF_TXD_QW1_TX_BUF_SZ_S);
-		uint64_t hi_qw2 =
-			hi_qw_tmpl |
-			((uint64_t)pkt[2]->data_len <<
-			 IDPF_TXD_QW1_TX_BUF_SZ_S);
-		uint64_t hi_qw1 =
-			hi_qw_tmpl |
-			((uint64_t)pkt[1]->data_len <<
-			 IDPF_TXD_QW1_TX_BUF_SZ_S);
-		uint64_t hi_qw0 =
-			hi_qw_tmpl |
-			((uint64_t)pkt[0]->data_len <<
-			 IDPF_TXD_QW1_TX_BUF_SZ_S);
+		uint64_t hi_qw3 = hi_qw_tmpl |
+			((uint64_t)pkt[3]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
+		uint64_t hi_qw2 = hi_qw_tmpl |
+			((uint64_t)pkt[2]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
+		uint64_t hi_qw1 = hi_qw_tmpl |
+			((uint64_t)pkt[1]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
+		uint64_t hi_qw0 = hi_qw_tmpl |
+			((uint64_t)pkt[0]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
 
 		__m256i desc2_3 =
 			_mm256_set_epi64x
@@ -559,8 +549,8 @@ idpf_singleq_xmit_fixed_burst_vec_avx2(void *tx_queue, struct rte_mbuf **tx_pkts
 	volatile struct ci_tx_desc *txdp;
 	struct ci_tx_entry_vec *txep;
 	uint16_t n, nb_commit, tx_id;
-	uint64_t flags = IDPF_TX_DESC_CMD_EOP;
-	uint64_t rs = IDPF_TX_DESC_CMD_RS | flags;
+	uint64_t flags = CI_TX_DESC_CMD_EOP;
+	uint64_t rs = CI_TX_DESC_CMD_RS | flags;
 
 	/* cross rx_thresh boundary is not allowed */
 	nb_pkts = RTE_MIN(nb_pkts, txq->tx_rs_thresh);
@@ -605,8 +595,7 @@ idpf_singleq_xmit_fixed_burst_vec_avx2(void *tx_queue, struct rte_mbuf **tx_pkts
 	tx_id = (uint16_t)(tx_id + nb_commit);
 	if (tx_id > txq->tx_next_rs) {
 		txq->ci_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
-			rte_cpu_to_le_64(((uint64_t)IDPF_TX_DESC_CMD_RS) <<
-					 IDPF_TXD_QW1_CMD_S);
+			rte_cpu_to_le_64(((uint64_t)CI_TX_DESC_CMD_RS) << CI_TXD_QW1_CMD_S);
 		txq->tx_next_rs =
 			(uint16_t)(txq->tx_next_rs + txq->tx_rs_thresh);
 	}
diff --git a/drivers/net/intel/idpf/idpf_common_rxtx_avx512.c b/drivers/net/intel/idpf/idpf_common_rxtx_avx512.c
index d5e5a2ca5f..49ace35615 100644
--- a/drivers/net/intel/idpf/idpf_common_rxtx_avx512.c
+++ b/drivers/net/intel/idpf/idpf_common_rxtx_avx512.c
@@ -1003,10 +1003,9 @@ static __rte_always_inline void
 idpf_singleq_vtx1(volatile struct ci_tx_desc *txdp,
 	  struct rte_mbuf *pkt, uint64_t flags)
 {
-	uint64_t high_qw =
-		(IDPF_TX_DESC_DTYPE_DATA |
-		 ((uint64_t)flags  << IDPF_TXD_QW1_CMD_S) |
-		 ((uint64_t)pkt->data_len << IDPF_TXD_QW1_TX_BUF_SZ_S));
+	uint64_t high_qw = (CI_TX_DESC_DTYPE_DATA |
+		 ((uint64_t)flags << CI_TXD_QW1_CMD_S) |
+		 ((uint64_t)pkt->data_len << CI_TXD_QW1_TX_BUF_SZ_S));
 
 	__m128i descriptor = _mm_set_epi64x(high_qw,
 					    pkt->buf_iova + pkt->data_off);
@@ -1019,8 +1018,7 @@ static __rte_always_inline void
 idpf_singleq_vtx(volatile struct ci_tx_desc *txdp,
 	 struct rte_mbuf **pkt, uint16_t nb_pkts,  uint64_t flags)
 {
-	const uint64_t hi_qw_tmpl = (IDPF_TX_DESC_DTYPE_DATA  |
-			((uint64_t)flags  << IDPF_TXD_QW1_CMD_S));
+	const uint64_t hi_qw_tmpl = (CI_TX_DESC_DTYPE_DATA  | (flags << CI_TXD_QW1_CMD_S));
 
 	/* if unaligned on 32-bit boundary, do one to align */
 	if (((uintptr_t)txdp & 0x1F) != 0 && nb_pkts != 0) {
@@ -1030,22 +1028,14 @@ idpf_singleq_vtx(volatile struct ci_tx_desc *txdp,
 
 	/* do 4 at a time while possible, in bursts */
 	for (; nb_pkts > 3; txdp += 4, pkt += 4, nb_pkts -= 4) {
-		uint64_t hi_qw3 =
-			hi_qw_tmpl |
-			((uint64_t)pkt[3]->data_len <<
-			 IDPF_TXD_QW1_TX_BUF_SZ_S);
-		uint64_t hi_qw2 =
-			hi_qw_tmpl |
-			((uint64_t)pkt[2]->data_len <<
-			 IDPF_TXD_QW1_TX_BUF_SZ_S);
-		uint64_t hi_qw1 =
-			hi_qw_tmpl |
-			((uint64_t)pkt[1]->data_len <<
-			 IDPF_TXD_QW1_TX_BUF_SZ_S);
-		uint64_t hi_qw0 =
-			hi_qw_tmpl |
-			((uint64_t)pkt[0]->data_len <<
-			 IDPF_TXD_QW1_TX_BUF_SZ_S);
+		uint64_t hi_qw3 = hi_qw_tmpl |
+			((uint64_t)pkt[3]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
+		uint64_t hi_qw2 = hi_qw_tmpl |
+			((uint64_t)pkt[2]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
+		uint64_t hi_qw1 = hi_qw_tmpl |
+			((uint64_t)pkt[1]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
+		uint64_t hi_qw0 = hi_qw_tmpl |
+			((uint64_t)pkt[0]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
 
 		__m512i desc0_3 =
 			_mm512_set_epi64
@@ -1075,8 +1065,8 @@ idpf_singleq_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pk
 	volatile struct ci_tx_desc *txdp;
 	struct ci_tx_entry_vec *txep;
 	uint16_t n, nb_commit, tx_id;
-	uint64_t flags = IDPF_TX_DESC_CMD_EOP;
-	uint64_t rs = IDPF_TX_DESC_CMD_RS | flags;
+	uint64_t flags = CI_TX_DESC_CMD_EOP;
+	uint64_t rs = CI_TX_DESC_CMD_RS | flags;
 
 	/* cross rx_thresh boundary is not allowed */
 	nb_pkts = RTE_MIN(nb_pkts, txq->tx_rs_thresh);
@@ -1124,8 +1114,7 @@ idpf_singleq_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pk
 	tx_id = (uint16_t)(tx_id + nb_commit);
 	if (tx_id > txq->tx_next_rs) {
 		txq->ci_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
-			rte_cpu_to_le_64(((uint64_t)IDPF_TX_DESC_CMD_RS) <<
-					 IDPF_TXD_QW1_CMD_S);
+			rte_cpu_to_le_64(((uint64_t)CI_TX_DESC_CMD_RS) << CI_TXD_QW1_CMD_S);
 		txq->tx_next_rs =
 			(uint16_t)(txq->tx_next_rs + txq->tx_rs_thresh);
 	}
diff --git a/drivers/net/intel/idpf/idpf_rxtx_vec_common.h b/drivers/net/intel/idpf/idpf_rxtx_vec_common.h
index b5e8574667..a43d8f78e2 100644
--- a/drivers/net/intel/idpf/idpf_rxtx_vec_common.h
+++ b/drivers/net/intel/idpf/idpf_rxtx_vec_common.h
@@ -32,8 +32,8 @@ idpf_tx_desc_done(struct ci_tx_queue *txq, uint16_t idx)
 		return 1;
 
 	return (txq->ci_tx_ring[idx].cmd_type_offset_bsz &
-			rte_cpu_to_le_64(IDPF_TXD_QW1_DTYPE_M)) ==
-				rte_cpu_to_le_64(IDPF_TX_DESC_DTYPE_DESC_DONE);
+			rte_cpu_to_le_64(CI_TXD_QW1_DTYPE_M)) ==
+				rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE);
 }
 
 static inline int
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [PATCH v2 05/36] net/intel: create separate header for Tx scalar fns
  2026-01-13 15:14 ` [PATCH v2 00/36] combine multiple Intel scalar Tx paths Bruce Richardson
                     ` (3 preceding siblings ...)
  2026-01-13 15:14   ` [PATCH v2 04/36] net/intel: consolidate definitions for Tx desc fields Bruce Richardson
@ 2026-01-13 15:14   ` Bruce Richardson
  2026-01-13 15:14   ` [PATCH v2 06/36] net/intel: add common fn to calculate needed descriptors Bruce Richardson
                     ` (32 subsequent siblings)
  37 siblings, 0 replies; 274+ messages in thread
From: Bruce Richardson @ 2026-01-13 15:14 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson

Rather than keeping all the Tx code in one file, which could grow rather
long, move the scalar datapath functions to a new header file.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/net/intel/common/tx.h            | 58 ++------------------
 drivers/net/intel/common/tx_scalar_fns.h | 67 ++++++++++++++++++++++++
 2 files changed, 72 insertions(+), 53 deletions(-)
 create mode 100644 drivers/net/intel/common/tx_scalar_fns.h

diff --git a/drivers/net/intel/common/tx.h b/drivers/net/intel/common/tx.h
index 3d3d9ad8e3..320ab0b8e0 100644
--- a/drivers/net/intel/common/tx.h
+++ b/drivers/net/intel/common/tx.h
@@ -309,59 +309,6 @@ ci_tx_free_bufs_vec(struct ci_tx_queue *txq, ci_desc_done_fn desc_done, bool ctx
 	return txq->tx_rs_thresh;
 }
 
-/*
- * Common transmit descriptor cleanup function for Intel drivers.
- * Used by ice, i40e, iavf, and idpf drivers.
- *
- * Returns:
- *   0 on success
- *  -1 if cleanup cannot proceed (descriptors not yet processed by HW)
- */
-static __rte_always_inline int
-ci_tx_xmit_cleanup(struct ci_tx_queue *txq)
-{
-	struct ci_tx_entry *sw_ring = txq->sw_ring;
-	volatile struct ci_tx_desc *txd = txq->ci_tx_ring;
-	uint16_t last_desc_cleaned = txq->last_desc_cleaned;
-	uint16_t nb_tx_desc = txq->nb_tx_desc;
-	uint16_t desc_to_clean_to;
-	uint16_t nb_tx_to_clean;
-
-	/* Determine the last descriptor needing to be cleaned */
-	desc_to_clean_to = (uint16_t)(last_desc_cleaned + txq->tx_rs_thresh);
-	if (desc_to_clean_to >= nb_tx_desc)
-		desc_to_clean_to = (uint16_t)(desc_to_clean_to - nb_tx_desc);
-
-	/* Check to make sure the last descriptor to clean is done */
-	desc_to_clean_to = sw_ring[desc_to_clean_to].last_id;
-
-	/* Check if descriptor is done - all drivers use 0xF as done value in bits 3:0 */
-	if ((txd[desc_to_clean_to].cmd_type_offset_bsz & rte_cpu_to_le_64(CI_TXD_QW1_DTYPE_M)) !=
-			rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE)) {
-		/* Descriptor not yet processed by hardware */
-		return -1;
-	}
-
-	/* Figure out how many descriptors will be cleaned */
-	if (last_desc_cleaned > desc_to_clean_to)
-		nb_tx_to_clean = (uint16_t)((nb_tx_desc - last_desc_cleaned) + desc_to_clean_to);
-	else
-		nb_tx_to_clean = (uint16_t)(desc_to_clean_to - last_desc_cleaned);
-
-	/* The last descriptor to clean is done, so that means all the
-	 * descriptors from the last descriptor that was cleaned
-	 * up to the last descriptor with the RS bit set
-	 * are done. Only reset the threshold descriptor.
-	 */
-	txd[desc_to_clean_to].cmd_type_offset_bsz = 0;
-
-	/* Update the txq to reflect the last descriptor that was cleaned */
-	txq->last_desc_cleaned = desc_to_clean_to;
-	txq->nb_tx_free = (uint16_t)(txq->nb_tx_free + nb_tx_to_clean);
-
-	return 0;
-}
-
 static inline void
 ci_txq_release_all_mbufs(struct ci_tx_queue *txq, bool use_ctx)
 {
@@ -480,4 +427,9 @@ ci_tx_path_select(const struct ci_tx_path_features *req_features,
 	return idx;
 }
 
+/* include the scalar functions at the end, so they can use the common definitions.
+ * This is done so drivers can use all functions just by including tx.h
+ */
+#include "tx_scalar_fns.h"
+
 #endif /* _COMMON_INTEL_TX_H_ */
diff --git a/drivers/net/intel/common/tx_scalar_fns.h b/drivers/net/intel/common/tx_scalar_fns.h
new file mode 100644
index 0000000000..c79210d084
--- /dev/null
+++ b/drivers/net/intel/common/tx_scalar_fns.h
@@ -0,0 +1,67 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2025 Intel Corporation
+ */
+
+#ifndef _COMMON_INTEL_TX_SCALAR_FNS_H_
+#define _COMMON_INTEL_TX_SCALAR_FNS_H_
+
+#include <stdint.h>
+#include <rte_byteorder.h>
+
+/* depends on common Tx definitions. */
+#include "tx.h"
+
+/*
+ * Common transmit descriptor cleanup function for Intel drivers.
+ * Used by ice, i40e, iavf, and idpf drivers.
+ *
+ * Returns:
+ *   0 on success
+ *  -1 if cleanup cannot proceed (descriptors not yet processed by HW)
+ */
+static __rte_always_inline int
+ci_tx_xmit_cleanup(struct ci_tx_queue *txq)
+{
+	struct ci_tx_entry *sw_ring = txq->sw_ring;
+	volatile struct ci_tx_desc *txd = txq->ci_tx_ring;
+	uint16_t last_desc_cleaned = txq->last_desc_cleaned;
+	uint16_t nb_tx_desc = txq->nb_tx_desc;
+	uint16_t desc_to_clean_to;
+	uint16_t nb_tx_to_clean;
+
+	/* Determine the last descriptor needing to be cleaned */
+	desc_to_clean_to = (uint16_t)(last_desc_cleaned + txq->tx_rs_thresh);
+	if (desc_to_clean_to >= nb_tx_desc)
+		desc_to_clean_to = (uint16_t)(desc_to_clean_to - nb_tx_desc);
+
+	/* Check to make sure the last descriptor to clean is done */
+	desc_to_clean_to = sw_ring[desc_to_clean_to].last_id;
+
+	/* Check if descriptor is done - all drivers use 0xF as done value in bits 3:0 */
+	if ((txd[desc_to_clean_to].cmd_type_offset_bsz & rte_cpu_to_le_64(CI_TXD_QW1_DTYPE_M)) !=
+			rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE)) {
+		/* Descriptor not yet processed by hardware */
+		return -1;
+	}
+
+	/* Figure out how many descriptors will be cleaned */
+	if (last_desc_cleaned > desc_to_clean_to)
+		nb_tx_to_clean = (uint16_t)((nb_tx_desc - last_desc_cleaned) + desc_to_clean_to);
+	else
+		nb_tx_to_clean = (uint16_t)(desc_to_clean_to - last_desc_cleaned);
+
+	/* The last descriptor to clean is done, so that means all the
+	 * descriptors from the last descriptor that was cleaned
+	 * up to the last descriptor with the RS bit set
+	 * are done. Only reset the threshold descriptor.
+	 */
+	txd[desc_to_clean_to].cmd_type_offset_bsz = 0;
+
+	/* Update the txq to reflect the last descriptor that was cleaned */
+	txq->last_desc_cleaned = desc_to_clean_to;
+	txq->nb_tx_free = (uint16_t)(txq->nb_tx_free + nb_tx_to_clean);
+
+	return 0;
+}
+
+#endif /* _COMMON_INTEL_TX_SCALAR_FNS_H_ */
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [PATCH v2 06/36] net/intel: add common fn to calculate needed descriptors
  2026-01-13 15:14 ` [PATCH v2 00/36] combine multiple Intel scalar Tx paths Bruce Richardson
                     ` (4 preceding siblings ...)
  2026-01-13 15:14   ` [PATCH v2 05/36] net/intel: create separate header for Tx scalar fns Bruce Richardson
@ 2026-01-13 15:14   ` Bruce Richardson
  2026-01-13 15:14   ` [PATCH v2 07/36] net/ice: refactor context descriptor handling Bruce Richardson
                     ` (31 subsequent siblings)
  37 siblings, 0 replies; 274+ messages in thread
From: Bruce Richardson @ 2026-01-13 15:14 UTC (permalink / raw)
  To: dev
  Cc: Bruce Richardson, Vladimir Medvedkin, Anatoly Burakov,
	Jingjing Wu, Praveen Shetty

Multiple drivers used the same logic to calculate how many Tx data
descriptors a packet needs. Move that calculation to common code. While
updating the drivers, also fix the idpf driver's calculation for the TSO
case.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/net/intel/common/tx_scalar_fns.h  | 21 +++++++++++++++++++++
 drivers/net/intel/i40e/i40e_rxtx.c        | 18 +-----------------
 drivers/net/intel/iavf/iavf_rxtx.c        | 17 +----------------
 drivers/net/intel/ice/ice_rxtx.c          | 18 +-----------------
 drivers/net/intel/idpf/idpf_common_rxtx.c | 21 +++++++++++++++++----
 5 files changed, 41 insertions(+), 54 deletions(-)

diff --git a/drivers/net/intel/common/tx_scalar_fns.h b/drivers/net/intel/common/tx_scalar_fns.h
index c79210d084..f894cea616 100644
--- a/drivers/net/intel/common/tx_scalar_fns.h
+++ b/drivers/net/intel/common/tx_scalar_fns.h
@@ -64,4 +64,25 @@ ci_tx_xmit_cleanup(struct ci_tx_queue *txq)
 	return 0;
 }
 
+static inline uint16_t
+ci_div_roundup16(uint16_t x, uint16_t y)
+{
+	return (uint16_t)((x + y - 1) / y);
+}
+
+/* Calculate the number of TX descriptors needed for each pkt */
+static inline uint16_t
+ci_calc_pkt_desc(const struct rte_mbuf *tx_pkt)
+{
+	uint16_t count = 0;
+
+	while (tx_pkt != NULL) {
+		count += ci_div_roundup16(tx_pkt->data_len, CI_MAX_DATA_PER_TXD);
+		tx_pkt = tx_pkt->next;
+	}
+
+	return count;
+}
+
+
 #endif /* _COMMON_INTEL_TX_SCALAR_FNS_H_ */
diff --git a/drivers/net/intel/i40e/i40e_rxtx.c b/drivers/net/intel/i40e/i40e_rxtx.c
index 701129eba3..6598450d8a 100644
--- a/drivers/net/intel/i40e/i40e_rxtx.c
+++ b/drivers/net/intel/i40e/i40e_rxtx.c
@@ -1029,21 +1029,6 @@ i40e_set_tso_ctx(struct rte_mbuf *mbuf, union ci_tx_offload tx_offload)
 	return ctx_desc;
 }
 
-/* Calculate the number of TX descriptors needed for each pkt */
-static inline uint16_t
-i40e_calc_pkt_desc(struct rte_mbuf *tx_pkt)
-{
-	struct rte_mbuf *txd = tx_pkt;
-	uint16_t count = 0;
-
-	while (txd != NULL) {
-		count += DIV_ROUND_UP(txd->data_len, CI_MAX_DATA_PER_TXD);
-		txd = txd->next;
-	}
-
-	return count;
-}
-
 uint16_t
 i40e_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 {
@@ -1106,8 +1091,7 @@ i40e_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 		 * per tx desc.
 		 */
 		if (ol_flags & RTE_MBUF_F_TX_TCP_SEG)
-			nb_used = (uint16_t)(i40e_calc_pkt_desc(tx_pkt) +
-					     nb_ctx);
+			nb_used = (uint16_t)(ci_calc_pkt_desc(tx_pkt) + nb_ctx);
 		else
 			nb_used = (uint16_t)(tx_pkt->nb_segs + nb_ctx);
 		tx_last = (uint16_t)(tx_id + nb_used - 1);
diff --git a/drivers/net/intel/iavf/iavf_rxtx.c b/drivers/net/intel/iavf/iavf_rxtx.c
index 0bf15aae5e..dc4c64d169 100644
--- a/drivers/net/intel/iavf/iavf_rxtx.c
+++ b/drivers/net/intel/iavf/iavf_rxtx.c
@@ -2671,21 +2671,6 @@ iavf_build_data_desc_cmd_offset_fields(volatile uint64_t *qw1,
 		((uint64_t)l2tag1 << IAVF_TXD_DATA_QW1_L2TAG1_SHIFT));
 }
 
-/* Calculate the number of TX descriptors needed for each pkt */
-static inline uint16_t
-iavf_calc_pkt_desc(struct rte_mbuf *tx_pkt)
-{
-	struct rte_mbuf *txd = tx_pkt;
-	uint16_t count = 0;
-
-	while (txd != NULL) {
-		count += (txd->data_len + CI_MAX_DATA_PER_TXD - 1) / CI_MAX_DATA_PER_TXD;
-		txd = txd->next;
-	}
-
-	return count;
-}
-
 static inline void
 iavf_fill_data_desc(volatile struct ci_tx_desc *desc,
 	uint64_t desc_template,	uint16_t buffsz,
@@ -2771,7 +2756,7 @@ iavf_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 		 * per tx desc.
 		 */
 		if (mb->ol_flags & RTE_MBUF_F_TX_TCP_SEG)
-			nb_desc_required = iavf_calc_pkt_desc(mb) + nb_desc_ctx + nb_desc_ipsec;
+			nb_desc_required = ci_calc_pkt_desc(mb) + nb_desc_ctx + nb_desc_ipsec;
 		else
 			nb_desc_required = nb_desc_data + nb_desc_ctx + nb_desc_ipsec;
 
diff --git a/drivers/net/intel/ice/ice_rxtx.c b/drivers/net/intel/ice/ice_rxtx.c
index b5395a803f..6599e4d57f 100644
--- a/drivers/net/intel/ice/ice_rxtx.c
+++ b/drivers/net/intel/ice/ice_rxtx.c
@@ -3063,21 +3063,6 @@ ice_set_tso_ctx(struct rte_mbuf *mbuf, union ci_tx_offload tx_offload)
 	return ctx_desc;
 }
 
-/* Calculate the number of TX descriptors needed for each pkt */
-static inline uint16_t
-ice_calc_pkt_desc(struct rte_mbuf *tx_pkt)
-{
-	struct rte_mbuf *txd = tx_pkt;
-	uint16_t count = 0;
-
-	while (txd != NULL) {
-		count += DIV_ROUND_UP(txd->data_len, CI_MAX_DATA_PER_TXD);
-		txd = txd->next;
-	}
-
-	return count;
-}
-
 uint16_t
 ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 {
@@ -3140,8 +3125,7 @@ ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 		 * per tx desc.
 		 */
 		if (ol_flags & (RTE_MBUF_F_TX_TCP_SEG | RTE_MBUF_F_TX_UDP_SEG))
-			nb_used = (uint16_t)(ice_calc_pkt_desc(tx_pkt) +
-					     nb_ctx);
+			nb_used = (uint16_t)(ci_calc_pkt_desc(tx_pkt) + nb_ctx);
 		else
 			nb_used = (uint16_t)(tx_pkt->nb_segs + nb_ctx);
 		tx_last = (uint16_t)(tx_id + nb_used - 1);
diff --git a/drivers/net/intel/idpf/idpf_common_rxtx.c b/drivers/net/intel/idpf/idpf_common_rxtx.c
index 587871b54a..11d6848430 100644
--- a/drivers/net/intel/idpf/idpf_common_rxtx.c
+++ b/drivers/net/intel/idpf/idpf_common_rxtx.c
@@ -934,7 +934,16 @@ idpf_dp_splitq_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 		tx_offload.tso_segsz = tx_pkt->tso_segsz;
 		/* Calculate the number of context descriptors needed. */
 		nb_ctx = idpf_calc_context_desc(ol_flags);
-		nb_used = tx_pkt->nb_segs + nb_ctx;
+
+		/* Calculate the number of TX descriptors needed for
+		 * each packet. For TSO packets, use ci_calc_pkt_desc as
+		 * the mbuf data size might exceed max data size that hw allows
+		 * per tx desc.
+		 */
+		if (ol_flags & RTE_MBUF_F_TX_TCP_SEG)
+			nb_used = ci_calc_pkt_desc(tx_pkt) + nb_ctx;
+		else
+			nb_used = tx_pkt->nb_segs + nb_ctx;
 
 		if (ol_flags & IDPF_TX_CKSUM_OFFLOAD_MASK)
 			cmd_dtype = IDPF_TXD_FLEX_FLOW_CMD_CS_EN;
@@ -1382,10 +1391,14 @@ idpf_dp_singleq_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 		nb_ctx = idpf_calc_context_desc(ol_flags);
 
 		/* The number of descriptors that must be allocated for
-		 * a packet equals to the number of the segments of that
-		 * packet plus 1 context descriptor if needed.
+		 * a packet. For TSO packets, use ci_calc_pkt_desc as
+		 * the mbuf data size might exceed max data size that hw allows
+		 * per tx desc.
 		 */
-		nb_used = (uint16_t)(tx_pkt->nb_segs + nb_ctx);
+		if (ol_flags & RTE_MBUF_F_TX_TCP_SEG)
+			nb_used = (uint16_t)(ci_calc_pkt_desc(tx_pkt) + nb_ctx);
+		else
+			nb_used = (uint16_t)(tx_pkt->nb_segs + nb_ctx);
 		tx_last = (uint16_t)(tx_id + nb_used - 1);
 
 		/* Circular ring */
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [PATCH v2 07/36] net/ice: refactor context descriptor handling
  2026-01-13 15:14 ` [PATCH v2 00/36] combine multiple Intel scalar Tx paths Bruce Richardson
                     ` (5 preceding siblings ...)
  2026-01-13 15:14   ` [PATCH v2 06/36] net/intel: add common fn to calculate needed descriptors Bruce Richardson
@ 2026-01-13 15:14   ` Bruce Richardson
  2026-01-13 15:14   ` [PATCH v2 08/36] net/i40e: " Bruce Richardson
                     ` (30 subsequent siblings)
  37 siblings, 0 replies; 274+ messages in thread
From: Bruce Richardson @ 2026-01-13 15:14 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson, Anatoly Burakov

Create a single function to manage all context descriptor handling. It
returns 0 or 1 depending on whether a context descriptor is needed, and,
when one is needed, also returns the descriptor contents directly.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/net/intel/ice/ice_rxtx.c | 96 ++++++++++++++++++--------------
 1 file changed, 55 insertions(+), 41 deletions(-)

diff --git a/drivers/net/intel/ice/ice_rxtx.c b/drivers/net/intel/ice/ice_rxtx.c
index 6599e4d57f..f9dcc30208 100644
--- a/drivers/net/intel/ice/ice_rxtx.c
+++ b/drivers/net/intel/ice/ice_rxtx.c
@@ -3040,7 +3040,7 @@ ice_calc_context_desc(uint64_t flags)
 
 /* set ice TSO context descriptor */
 static inline uint64_t
-ice_set_tso_ctx(struct rte_mbuf *mbuf, union ci_tx_offload tx_offload)
+ice_set_tso_ctx(uint64_t ol_flags, const struct rte_mbuf *mbuf, union ci_tx_offload tx_offload)
 {
 	uint64_t ctx_desc = 0;
 	uint32_t cd_cmd, hdr_len, cd_tso_len;
@@ -3051,7 +3051,7 @@ ice_set_tso_ctx(struct rte_mbuf *mbuf, union ci_tx_offload tx_offload)
 	}
 
 	hdr_len = tx_offload.l2_len + tx_offload.l3_len + tx_offload.l4_len;
-	hdr_len += (mbuf->ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK) ?
+	hdr_len += (ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK) ?
 		   tx_offload.outer_l2_len + tx_offload.outer_l3_len : 0;
 
 	cd_cmd = CI_TX_CTX_DESC_TSO;
@@ -3063,6 +3063,51 @@ ice_set_tso_ctx(struct rte_mbuf *mbuf, union ci_tx_offload tx_offload)
 	return ctx_desc;
 }
 
+/* compute a context descriptor if one is necessary based on the ol_flags
+ *
+ * Returns 0 if no descriptor is necessary.
+ * Returns 1 if one is necessary and the contents of the descriptor are returned
+ *   in the values pointed to by qw0 and qw1. td_offset may also be modified.
+ */
+static __rte_always_inline uint16_t
+get_context_desc(uint64_t ol_flags, const struct rte_mbuf *tx_pkt,
+	const union ci_tx_offload *tx_offload, const struct ci_tx_queue *txq,
+	uint32_t *td_offset, uint64_t *qw0, uint64_t *qw1)
+{
+	uint16_t cd_l2tag2 = 0;
+	uint64_t cd_type_cmd_tso_mss = ICE_TX_DESC_DTYPE_CTX;
+	uint32_t cd_tunneling_params = 0;
+	uint64_t ptp_tx_index = txq->ice_vsi->adapter->ptp_tx_index;
+
+	if (ice_calc_context_desc(ol_flags) == 0)
+		return 0;
+
+	if (ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK) {
+		*td_offset |= (tx_offload->outer_l2_len >> 1) << CI_TX_DESC_LEN_MACLEN_S;
+		ice_parse_tunneling_params(ol_flags, *tx_offload, &cd_tunneling_params);
+	}
+
+	if (ol_flags & (RTE_MBUF_F_TX_TCP_SEG | RTE_MBUF_F_TX_UDP_SEG))
+		cd_type_cmd_tso_mss |= ice_set_tso_ctx(ol_flags, tx_pkt, *tx_offload);
+	else if (ol_flags & RTE_MBUF_F_TX_IEEE1588_TMST)
+		cd_type_cmd_tso_mss |=
+			((uint64_t)CI_TX_CTX_DESC_TSYN << CI_TXD_QW1_CMD_S) |
+			((ptp_tx_index << ICE_TXD_CTX_QW1_TSYN_S) & ICE_TXD_CTX_QW1_TSYN_M);
+
+
+	/* TX context descriptor based double VLAN insert */
+	if (ol_flags & RTE_MBUF_F_TX_QINQ) {
+		cd_l2tag2 = tx_pkt->vlan_tci_outer;
+		cd_type_cmd_tso_mss |= ((uint64_t)CI_TX_CTX_DESC_IL2TAG2 << CI_TXD_QW1_CMD_S);
+	}
+
+	*qw0 = rte_cpu_to_le_32(cd_tunneling_params) |
+		((uint64_t)rte_cpu_to_le_16(cd_l2tag2) << 32);
+	*qw1 = rte_cpu_to_le_64(cd_type_cmd_tso_mss);
+
+	return 1;
+}
+
 uint16_t
 ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 {
@@ -3073,7 +3118,6 @@ ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 	struct ci_tx_entry *txe, *txn;
 	struct rte_mbuf *tx_pkt;
 	struct rte_mbuf *m_seg;
-	uint32_t cd_tunneling_params;
 	uint16_t tx_id;
 	uint16_t ts_id = -1;
 	uint16_t nb_tx;
@@ -3102,20 +3146,24 @@ ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 		(void)ci_tx_xmit_cleanup(txq);
 
 	for (nb_tx = 0; nb_tx < nb_pkts; nb_tx++) {
+		uint64_t cd_qw0, cd_qw1;
 		tx_pkt = *tx_pkts++;
 
 		td_cmd = 0;
 		td_tag = 0;
 		td_offset = 0;
 		ol_flags = tx_pkt->ol_flags;
+
 		tx_offload.l2_len = tx_pkt->l2_len;
 		tx_offload.l3_len = tx_pkt->l3_len;
 		tx_offload.outer_l2_len = tx_pkt->outer_l2_len;
 		tx_offload.outer_l3_len = tx_pkt->outer_l3_len;
 		tx_offload.l4_len = tx_pkt->l4_len;
 		tx_offload.tso_segsz = tx_pkt->tso_segsz;
+
 		/* Calculate the number of context descriptors needed. */
-		nb_ctx = ice_calc_context_desc(ol_flags);
+		nb_ctx = get_context_desc(ol_flags, tx_pkt, &tx_offload,
+			txq, &td_offset, &cd_qw0, &cd_qw1);
 
 		/* The number of descriptors that must be allocated for
 		 * a packet equals to the number of the segments of that
@@ -3157,15 +3205,6 @@ ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 			td_tag = tx_pkt->vlan_tci;
 		}
 
-		/* Fill in tunneling parameters if necessary */
-		cd_tunneling_params = 0;
-		if (ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK) {
-			td_offset |= (tx_offload.outer_l2_len >> 1)
-				<< CI_TX_DESC_LEN_MACLEN_S;
-			ice_parse_tunneling_params(ol_flags, tx_offload,
-						   &cd_tunneling_params);
-		}
-
 		/* Enable checksum offloading */
 		if (ol_flags & ICE_TX_CKSUM_OFFLOAD_MASK)
 			ice_txd_enable_checksum(ol_flags, &td_cmd,
@@ -3173,11 +3212,7 @@ ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 
 		if (nb_ctx) {
 			/* Setup TX context descriptor if required */
-			volatile struct ice_tx_ctx_desc *ctx_txd =
-				(volatile struct ice_tx_ctx_desc *)
-					&ci_tx_ring[tx_id];
-			uint16_t cd_l2tag2 = 0;
-			uint64_t cd_type_cmd_tso_mss = ICE_TX_DESC_DTYPE_CTX;
+			uint64_t *ctx_txd = RTE_CAST_PTR(uint64_t *, &ci_tx_ring[tx_id]);
 
 			txn = &sw_ring[txe->next_id];
 			RTE_MBUF_PREFETCH_TO_FREE(txn->mbuf);
@@ -3186,29 +3221,8 @@ ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 				txe->mbuf = NULL;
 			}
 
-			if (ol_flags & (RTE_MBUF_F_TX_TCP_SEG | RTE_MBUF_F_TX_UDP_SEG))
-				cd_type_cmd_tso_mss |=
-					ice_set_tso_ctx(tx_pkt, tx_offload);
-			else if (ol_flags & RTE_MBUF_F_TX_IEEE1588_TMST)
-				cd_type_cmd_tso_mss |=
-					((uint64_t)CI_TX_CTX_DESC_TSYN <<
-					CI_TXD_QW1_CMD_S) |
-					 (((uint64_t)txq->ice_vsi->adapter->ptp_tx_index <<
-					 ICE_TXD_CTX_QW1_TSYN_S) & ICE_TXD_CTX_QW1_TSYN_M);
-
-			ctx_txd->tunneling_params =
-				rte_cpu_to_le_32(cd_tunneling_params);
-
-			/* TX context descriptor based double VLAN insert */
-			if (ol_flags & RTE_MBUF_F_TX_QINQ) {
-				cd_l2tag2 = tx_pkt->vlan_tci_outer;
-				cd_type_cmd_tso_mss |=
-					((uint64_t)CI_TX_CTX_DESC_IL2TAG2 <<
-					 CI_TXD_QW1_CMD_S);
-			}
-			ctx_txd->l2tag2 = rte_cpu_to_le_16(cd_l2tag2);
-			ctx_txd->qw1 =
-				rte_cpu_to_le_64(cd_type_cmd_tso_mss);
+			ctx_txd[0] = cd_qw0;
+			ctx_txd[1] = cd_qw1;
 
 			txe->last_id = tx_last;
 			tx_id = txe->next_id;
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [PATCH v2 08/36] net/i40e: refactor context descriptor handling
  2026-01-13 15:14 ` [PATCH v2 00/36] combine multiple Intel scalar Tx paths Bruce Richardson
                     ` (6 preceding siblings ...)
  2026-01-13 15:14   ` [PATCH v2 07/36] net/ice: refactor context descriptor handling Bruce Richardson
@ 2026-01-13 15:14   ` Bruce Richardson
  2026-01-13 15:14   ` [PATCH v2 09/36] net/idpf: " Bruce Richardson
                     ` (29 subsequent siblings)
  37 siblings, 0 replies; 274+ messages in thread
From: Bruce Richardson @ 2026-01-13 15:14 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson

Move all context descriptor handling into a single function, as was done
for the ice driver, and use the same function signature as that driver.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/net/intel/i40e/i40e_rxtx.c | 110 ++++++++++++++++-------------
 1 file changed, 59 insertions(+), 51 deletions(-)

diff --git a/drivers/net/intel/i40e/i40e_rxtx.c b/drivers/net/intel/i40e/i40e_rxtx.c
index 6598450d8a..7646ea07aa 100644
--- a/drivers/net/intel/i40e/i40e_rxtx.c
+++ b/drivers/net/intel/i40e/i40e_rxtx.c
@@ -1004,7 +1004,7 @@ i40e_calc_context_desc(uint64_t flags)
 
 /* set i40e TSO context descriptor */
 static inline uint64_t
-i40e_set_tso_ctx(struct rte_mbuf *mbuf, union ci_tx_offload tx_offload)
+i40e_set_tso_ctx(uint64_t ol_flags, const struct rte_mbuf *mbuf, union ci_tx_offload tx_offload)
 {
 	uint64_t ctx_desc = 0;
 	uint32_t cd_cmd, hdr_len, cd_tso_len;
@@ -1015,7 +1015,7 @@ i40e_set_tso_ctx(struct rte_mbuf *mbuf, union ci_tx_offload tx_offload)
 	}
 
 	hdr_len = tx_offload.l2_len + tx_offload.l3_len + tx_offload.l4_len;
-	hdr_len += (mbuf->ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK) ?
+	hdr_len += (ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK) ?
 		   tx_offload.outer_l2_len + tx_offload.outer_l3_len : 0;
 
 	cd_cmd = I40E_TX_CTX_DESC_TSO;
@@ -1029,6 +1029,54 @@ i40e_set_tso_ctx(struct rte_mbuf *mbuf, union ci_tx_offload tx_offload)
 	return ctx_desc;
 }
 
+/* compute a context descriptor if one is necessary based on the ol_flags
+ *
+ * Returns 0 if no descriptor is necessary.
+ * Returns 1 if one is necessary and the contents of the descriptor are returned
+ *   in the values pointed to by qw0 and qw1. td_offset may also be modified.
+ */
+static __rte_always_inline uint16_t
+get_context_desc(uint64_t ol_flags, const struct rte_mbuf *tx_pkt,
+		 const union ci_tx_offload *tx_offload,
+		 const struct ci_tx_queue *txq __rte_unused,
+		 uint32_t *td_offset, uint64_t *qw0, uint64_t *qw1)
+{
+	uint16_t cd_l2tag2 = 0;
+	uint64_t cd_type_cmd_tso_mss = I40E_TX_DESC_DTYPE_CONTEXT;
+	uint32_t cd_tunneling_params = 0;
+
+	if (i40e_calc_context_desc(ol_flags) == 0)
+		return 0;
+
+	if (ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK) {
+		*td_offset |= (tx_offload->outer_l2_len >> 1) << CI_TX_DESC_LEN_MACLEN_S;
+		i40e_parse_tunneling_params(ol_flags, *tx_offload, &cd_tunneling_params);
+	}
+
+	if (ol_flags & RTE_MBUF_F_TX_TCP_SEG) {
+		cd_type_cmd_tso_mss |= i40e_set_tso_ctx(ol_flags, tx_pkt, *tx_offload);
+	} else {
+#ifdef RTE_LIBRTE_IEEE1588
+		if (ol_flags & RTE_MBUF_F_TX_IEEE1588_TMST)
+			cd_type_cmd_tso_mss |=
+				((uint64_t)I40E_TX_CTX_DESC_TSYN << I40E_TXD_CTX_QW1_CMD_SHIFT);
+#endif
+	}
+
+	/* TX context descriptor based double VLAN insert */
+	if (ol_flags & RTE_MBUF_F_TX_QINQ) {
+		cd_l2tag2 = tx_pkt->vlan_tci_outer;
+		cd_type_cmd_tso_mss |=
+				((uint64_t)I40E_TX_CTX_DESC_IL2TAG2 << I40E_TXD_CTX_QW1_CMD_SHIFT);
+	}
+
+	*qw0 = rte_cpu_to_le_32(cd_tunneling_params) |
+		((uint64_t)rte_cpu_to_le_16(cd_l2tag2) << 32);
+	*qw1 = rte_cpu_to_le_64(cd_type_cmd_tso_mss);
+
+	return 1;
+}
+
 uint16_t
 i40e_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 {
@@ -1039,7 +1087,6 @@ i40e_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 	volatile struct ci_tx_desc *txr;
 	struct rte_mbuf *tx_pkt;
 	struct rte_mbuf *m_seg;
-	uint32_t cd_tunneling_params;
 	uint16_t tx_id;
 	uint16_t nb_tx;
 	uint32_t td_cmd;
@@ -1080,7 +1127,9 @@ i40e_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 		tx_offload.tso_segsz = tx_pkt->tso_segsz;
 
 		/* Calculate the number of context descriptors needed. */
-		nb_ctx = i40e_calc_context_desc(ol_flags);
+		uint64_t cd_qw0 = 0, cd_qw1 = 0;
+		nb_ctx = get_context_desc(ol_flags, tx_pkt, &tx_offload, txq, &td_offset,
+				&cd_qw0, &cd_qw1);
 
 		/**
 		 * The number of descriptors that must be allocated for
@@ -1126,14 +1175,6 @@ i40e_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 		/* Always enable CRC offload insertion */
 		td_cmd |= CI_TX_DESC_CMD_ICRC;
 
-		/* Fill in tunneling parameters if necessary */
-		cd_tunneling_params = 0;
-		if (ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK) {
-			td_offset |= (tx_offload.outer_l2_len >> 1)
-					<< CI_TX_DESC_LEN_MACLEN_S;
-			i40e_parse_tunneling_params(ol_flags, tx_offload,
-						    &cd_tunneling_params);
-		}
 		/* Enable checksum offloading */
 		if (ol_flags & I40E_TX_CKSUM_OFFLOAD_MASK)
 			i40e_txd_enable_checksum(ol_flags, &td_cmd,
@@ -1141,12 +1182,7 @@ i40e_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 
 		if (nb_ctx) {
 			/* Setup TX context descriptor if required */
-			volatile struct i40e_tx_context_desc *ctx_txd =
-				(volatile struct i40e_tx_context_desc *)\
-							&txr[tx_id];
-			uint16_t cd_l2tag2 = 0;
-			uint64_t cd_type_cmd_tso_mss =
-				I40E_TX_DESC_DTYPE_CONTEXT;
+			uint64_t *desc = RTE_CAST_PTR(uint64_t *, &txr[tx_id]);
 
 			txn = &sw_ring[txe->next_id];
 			RTE_MBUF_PREFETCH_TO_FREE(txn->mbuf);
@@ -1155,41 +1191,13 @@ i40e_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 				txe->mbuf = NULL;
 			}
 
-			/* TSO enabled means no timestamp */
-			if (ol_flags & RTE_MBUF_F_TX_TCP_SEG)
-				cd_type_cmd_tso_mss |=
-					i40e_set_tso_ctx(tx_pkt, tx_offload);
-			else {
-#ifdef RTE_LIBRTE_IEEE1588
-				if (ol_flags & RTE_MBUF_F_TX_IEEE1588_TMST)
-					cd_type_cmd_tso_mss |=
-						((uint64_t)I40E_TX_CTX_DESC_TSYN <<
-						 I40E_TXD_CTX_QW1_CMD_SHIFT);
-#endif
-			}
-
-			ctx_txd->tunneling_params =
-				rte_cpu_to_le_32(cd_tunneling_params);
-			if (ol_flags & RTE_MBUF_F_TX_QINQ) {
-				cd_l2tag2 = tx_pkt->vlan_tci_outer;
-				cd_type_cmd_tso_mss |=
-					((uint64_t)I40E_TX_CTX_DESC_IL2TAG2 <<
-						I40E_TXD_CTX_QW1_CMD_SHIFT);
-			}
-			ctx_txd->l2tag2 = rte_cpu_to_le_16(cd_l2tag2);
-			ctx_txd->type_cmd_tso_mss =
-				rte_cpu_to_le_64(cd_type_cmd_tso_mss);
+			desc[0] = cd_qw0;
+			desc[1] = cd_qw1;
 
 			PMD_TX_LOG(DEBUG, "mbuf: %p, TCD[%u]: "
-				"tunneling_params: %#x; "
-				"l2tag2: %#hx; "
-				"rsvd: %#hx; "
-				"type_cmd_tso_mss: %#"PRIx64";",
-				tx_pkt, tx_id,
-				ctx_txd->tunneling_params,
-				ctx_txd->l2tag2,
-				ctx_txd->rsvd,
-				ctx_txd->type_cmd_tso_mss);
+				"qw0: %#"PRIx64"; "
+				"qw1: %#"PRIx64";",
+				tx_pkt, tx_id, cd_qw0, cd_qw1);
 
 			txe->last_id = tx_last;
 			tx_id = txe->next_id;
-- 
2.51.0



* [PATCH v2 09/36] net/idpf: refactor context descriptor handling
  2026-01-13 15:14 ` [PATCH v2 00/36] combine multiple Intel scalar Tx paths Bruce Richardson
                     ` (7 preceding siblings ...)
  2026-01-13 15:14   ` [PATCH v2 08/36] net/i40e: " Bruce Richardson
@ 2026-01-13 15:14   ` Bruce Richardson
  2026-01-13 15:14   ` [PATCH v2 10/36] net/intel: consolidate checksum mask definition Bruce Richardson
                     ` (28 subsequent siblings)
  37 siblings, 0 replies; 274+ messages in thread
From: Bruce Richardson @ 2026-01-13 15:14 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson, Jingjing Wu, Praveen Shetty

Move all context descriptor handling into a single function, as was done
for the ice driver.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/net/intel/idpf/idpf_common_rxtx.c | 61 +++++++++++------------
 1 file changed, 28 insertions(+), 33 deletions(-)

diff --git a/drivers/net/intel/idpf/idpf_common_rxtx.c b/drivers/net/intel/idpf/idpf_common_rxtx.c
index 11d6848430..9219ad9047 100644
--- a/drivers/net/intel/idpf/idpf_common_rxtx.c
+++ b/drivers/net/intel/idpf/idpf_common_rxtx.c
@@ -845,37 +845,36 @@ idpf_calc_context_desc(uint64_t flags)
 	return 0;
 }
 
-/* set TSO context descriptor
+/* set TSO context descriptor, returns 0 if no context needed, 1 if context set
  */
-static inline void
-idpf_set_splitq_tso_ctx(struct rte_mbuf *mbuf,
+static inline uint16_t
+idpf_set_tso_ctx(uint64_t ol_flags, struct rte_mbuf *mbuf,
 			union ci_tx_offload tx_offload,
-			volatile union idpf_flex_tx_ctx_desc *ctx_desc)
+			uint64_t *qw0, uint64_t *qw1)
 {
-	uint16_t cmd_dtype;
+	uint16_t cmd_dtype = IDPF_TX_DESC_DTYPE_FLEX_TSO_CTX | IDPF_TX_FLEX_CTX_DESC_CMD_TSO;
+	uint16_t tso_segsz = mbuf->tso_segsz;
 	uint32_t tso_len;
 	uint8_t hdr_len;
 
+	if (idpf_calc_context_desc(ol_flags) == 0)
+		return 0;
+
+	/* TSO context descriptor setup */
 	if (tx_offload.l4_len == 0) {
 		TX_LOG(DEBUG, "L4 length set to 0");
-		return;
+		return 0;
 	}
 
-	hdr_len = tx_offload.l2_len +
-		tx_offload.l3_len +
-		tx_offload.l4_len;
-	cmd_dtype = IDPF_TX_DESC_DTYPE_FLEX_TSO_CTX |
-		IDPF_TX_FLEX_CTX_DESC_CMD_TSO;
+	hdr_len = tx_offload.l2_len + tx_offload.l3_len + tx_offload.l4_len;
 	tso_len = mbuf->pkt_len - hdr_len;
 
-	ctx_desc->tso.qw1.cmd_dtype = rte_cpu_to_le_16(cmd_dtype);
-	ctx_desc->tso.qw0.hdr_len = hdr_len;
-	ctx_desc->tso.qw0.mss_rt =
-		rte_cpu_to_le_16((uint16_t)mbuf->tso_segsz &
-				 IDPF_TXD_FLEX_CTX_MSS_RT_M);
-	ctx_desc->tso.qw0.flex_tlen =
-		rte_cpu_to_le_32(tso_len &
-				 IDPF_TXD_FLEX_CTX_MSS_RT_M);
+	*qw0 = rte_cpu_to_le_32(tso_len & IDPF_TXD_FLEX_CTX_MSS_RT_M) |
+	       ((uint64_t)rte_cpu_to_le_16(tso_segsz & IDPF_TXD_FLEX_CTX_MSS_RT_M) << 32) |
+	       ((uint64_t)hdr_len << 48);
+	*qw1 = rte_cpu_to_le_16(cmd_dtype);
+
+	return 1;
 }
 
 RTE_EXPORT_INTERNAL_SYMBOL(idpf_dp_splitq_xmit_pkts)
@@ -933,7 +932,8 @@ idpf_dp_splitq_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 		tx_offload.l4_len = tx_pkt->l4_len;
 		tx_offload.tso_segsz = tx_pkt->tso_segsz;
 		/* Calculate the number of context descriptors needed. */
-		nb_ctx = idpf_calc_context_desc(ol_flags);
+		uint64_t cd_qw0, cd_qw1;
+		nb_ctx = idpf_set_tso_ctx(ol_flags, tx_pkt, tx_offload, &cd_qw0, &cd_qw1);
 
 		/* Calculate the number of TX descriptors needed for
 		 * each packet. For TSO packets, use ci_calc_pkt_desc as
@@ -950,12 +950,10 @@ idpf_dp_splitq_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 
 		/* context descriptor */
 		if (nb_ctx != 0) {
-			volatile union idpf_flex_tx_ctx_desc *ctx_desc =
-				(volatile union idpf_flex_tx_ctx_desc *)&txr[tx_id];
+			uint64_t *ctx_desc = RTE_CAST_PTR(uint64_t *, &txr[tx_id]);
 
-			if ((ol_flags & RTE_MBUF_F_TX_TCP_SEG) != 0)
-				idpf_set_splitq_tso_ctx(tx_pkt, tx_offload,
-							ctx_desc);
+			ctx_desc[0] = cd_qw0;
+			ctx_desc[1] = cd_qw1;
 
 			tx_id++;
 			if (tx_id == txq->nb_tx_desc)
@@ -1388,7 +1386,8 @@ idpf_dp_singleq_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 		tx_offload.l4_len = tx_pkt->l4_len;
 		tx_offload.tso_segsz = tx_pkt->tso_segsz;
 		/* Calculate the number of context descriptors needed. */
-		nb_ctx = idpf_calc_context_desc(ol_flags);
+		uint64_t cd_qw0, cd_qw1;
+		nb_ctx = idpf_set_tso_ctx(ol_flags, tx_pkt, tx_offload, &cd_qw0, &cd_qw1);
 
 		/* The number of descriptors that must be allocated for
 		 * a packet. For TSO packets, use ci_calc_pkt_desc as
@@ -1431,9 +1430,7 @@ idpf_dp_singleq_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 
 		if (nb_ctx != 0) {
 			/* Setup TX context descriptor if required */
-			volatile union idpf_flex_tx_ctx_desc *ctx_txd =
-				(volatile union idpf_flex_tx_ctx_desc *)
-				&txr[tx_id];
+			uint64_t *ctx_txd = RTE_CAST_PTR(uint64_t *, &txr[tx_id]);
 
 			txn = &sw_ring[txe->next_id];
 			RTE_MBUF_PREFETCH_TO_FREE(txn->mbuf);
@@ -1442,10 +1439,8 @@ idpf_dp_singleq_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 				txe->mbuf = NULL;
 			}
 
-			/* TSO enabled */
-			if ((ol_flags & RTE_MBUF_F_TX_TCP_SEG) != 0)
-				idpf_set_splitq_tso_ctx(tx_pkt, tx_offload,
-							ctx_txd);
+			ctx_txd[0] = cd_qw0;
+			ctx_txd[1] = cd_qw1;
 
 			txe->last_id = tx_last;
 			tx_id = txe->next_id;
-- 
2.51.0



* [PATCH v2 10/36] net/intel: consolidate checksum mask definition
  2026-01-13 15:14 ` [PATCH v2 00/36] combine multiple Intel scalar Tx paths Bruce Richardson
                     ` (8 preceding siblings ...)
  2026-01-13 15:14   ` [PATCH v2 09/36] net/idpf: " Bruce Richardson
@ 2026-01-13 15:14   ` Bruce Richardson
  2026-01-13 15:14   ` [PATCH v2 11/36] net/intel: create common checksum Tx offload function Bruce Richardson
                     ` (27 subsequent siblings)
  37 siblings, 0 replies; 274+ messages in thread
From: Bruce Richardson @ 2026-01-13 15:14 UTC (permalink / raw)
  To: dev
  Cc: Bruce Richardson, Vladimir Medvedkin, Anatoly Burakov,
	Jingjing Wu, Praveen Shetty

Create a common definition for checksum masks across iavf, idpf, i40e
and ice drivers.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/net/intel/common/tx.h             | 7 +++++++
 drivers/net/intel/i40e/i40e_rxtx.c        | 7 +------
 drivers/net/intel/iavf/iavf_rxtx.c        | 2 +-
 drivers/net/intel/iavf/iavf_rxtx.h        | 8 --------
 drivers/net/intel/ice/ice_rxtx.c          | 8 +-------
 drivers/net/intel/idpf/idpf_common_rxtx.c | 4 ++--
 drivers/net/intel/idpf/idpf_common_rxtx.h | 7 +------
 7 files changed, 13 insertions(+), 30 deletions(-)

diff --git a/drivers/net/intel/common/tx.h b/drivers/net/intel/common/tx.h
index 320ab0b8e0..a71b98f119 100644
--- a/drivers/net/intel/common/tx.h
+++ b/drivers/net/intel/common/tx.h
@@ -53,6 +53,13 @@
 /* Common maximum data per TX descriptor */
 #define CI_MAX_DATA_PER_TXD     (CI_TXD_QW1_TX_BUF_SZ_M >> CI_TXD_QW1_TX_BUF_SZ_S)
 
+/* Checksum offload mask to identify packets requesting offload */
+#define CI_TX_CKSUM_OFFLOAD_MASK (RTE_MBUF_F_TX_IP_CKSUM |		 \
+				   RTE_MBUF_F_TX_L4_MASK |		 \
+				   RTE_MBUF_F_TX_TCP_SEG |		 \
+				   RTE_MBUF_F_TX_OUTER_IP_CKSUM |	 \
+				   RTE_MBUF_F_TX_OUTER_UDP_CKSUM)
+
 /**
  * Common TX offload union for Intel drivers.
  * Supports both basic offloads (l2_len, l3_len, l4_len, tso_segsz) and
diff --git a/drivers/net/intel/i40e/i40e_rxtx.c b/drivers/net/intel/i40e/i40e_rxtx.c
index 7646ea07aa..db36ec86f7 100644
--- a/drivers/net/intel/i40e/i40e_rxtx.c
+++ b/drivers/net/intel/i40e/i40e_rxtx.c
@@ -53,11 +53,6 @@
 #define I40E_TX_IEEE1588_TMST 0
 #endif
 
-#define I40E_TX_CKSUM_OFFLOAD_MASK (RTE_MBUF_F_TX_IP_CKSUM |		 \
-		RTE_MBUF_F_TX_L4_MASK |		 \
-		RTE_MBUF_F_TX_TCP_SEG |		 \
-		RTE_MBUF_F_TX_OUTER_IP_CKSUM)
-
 #define I40E_TX_OFFLOAD_MASK (RTE_MBUF_F_TX_OUTER_IPV4 |	\
 		RTE_MBUF_F_TX_OUTER_IPV6 |	\
 		RTE_MBUF_F_TX_IPV4 |		\
@@ -1176,7 +1171,7 @@ i40e_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 		td_cmd |= CI_TX_DESC_CMD_ICRC;
 
 		/* Enable checksum offloading */
-		if (ol_flags & I40E_TX_CKSUM_OFFLOAD_MASK)
+		if (ol_flags & CI_TX_CKSUM_OFFLOAD_MASK)
 			i40e_txd_enable_checksum(ol_flags, &td_cmd,
 						 &td_offset, tx_offload);
 
diff --git a/drivers/net/intel/iavf/iavf_rxtx.c b/drivers/net/intel/iavf/iavf_rxtx.c
index dc4c64d169..75f2e143f9 100644
--- a/drivers/net/intel/iavf/iavf_rxtx.c
+++ b/drivers/net/intel/iavf/iavf_rxtx.c
@@ -2601,7 +2601,7 @@ iavf_build_data_desc_cmd_offset_fields(volatile uint64_t *qw1,
 	}
 
 	if ((m->ol_flags &
-	    (IAVF_TX_CKSUM_OFFLOAD_MASK | RTE_MBUF_F_TX_SEC_OFFLOAD)) == 0)
+	    (CI_TX_CKSUM_OFFLOAD_MASK | RTE_MBUF_F_TX_SEC_OFFLOAD)) == 0)
 		goto skip_cksum;
 
 	/* Set MACLEN */
diff --git a/drivers/net/intel/iavf/iavf_rxtx.h b/drivers/net/intel/iavf/iavf_rxtx.h
index 395d97b4ee..cca5c25119 100644
--- a/drivers/net/intel/iavf/iavf_rxtx.h
+++ b/drivers/net/intel/iavf/iavf_rxtx.h
@@ -136,14 +136,6 @@
 
 #define IAVF_TX_MIN_PKT_LEN 17
 
-#define IAVF_TX_CKSUM_OFFLOAD_MASK (		 \
-		RTE_MBUF_F_TX_IP_CKSUM |		 \
-		RTE_MBUF_F_TX_L4_MASK |		 \
-		RTE_MBUF_F_TX_TCP_SEG |          \
-		RTE_MBUF_F_TX_UDP_SEG |          \
-		RTE_MBUF_F_TX_OUTER_IP_CKSUM |   \
-		RTE_MBUF_F_TX_OUTER_UDP_CKSUM)
-
 #define IAVF_TX_OFFLOAD_MASK (  \
 		RTE_MBUF_F_TX_OUTER_IPV6 |		 \
 		RTE_MBUF_F_TX_OUTER_IPV4 |		 \
diff --git a/drivers/net/intel/ice/ice_rxtx.c b/drivers/net/intel/ice/ice_rxtx.c
index f9dcc30208..dc21a89ce3 100644
--- a/drivers/net/intel/ice/ice_rxtx.c
+++ b/drivers/net/intel/ice/ice_rxtx.c
@@ -13,12 +13,6 @@
 #include "../common/rx_vec_x86.h"
 #endif
 
-#define ICE_TX_CKSUM_OFFLOAD_MASK (RTE_MBUF_F_TX_IP_CKSUM |		 \
-		RTE_MBUF_F_TX_L4_MASK |		 \
-		RTE_MBUF_F_TX_TCP_SEG |		 \
-		RTE_MBUF_F_TX_UDP_SEG |		 \
-		RTE_MBUF_F_TX_OUTER_IP_CKSUM)
-
 /**
  * The mbuf dynamic field pointer for protocol extraction metadata.
  */
@@ -3206,7 +3200,7 @@ ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 		}
 
 		/* Enable checksum offloading */
-		if (ol_flags & ICE_TX_CKSUM_OFFLOAD_MASK)
+		if (ol_flags & CI_TX_CKSUM_OFFLOAD_MASK)
 			ice_txd_enable_checksum(ol_flags, &td_cmd,
 						&td_offset, tx_offload);
 
diff --git a/drivers/net/intel/idpf/idpf_common_rxtx.c b/drivers/net/intel/idpf/idpf_common_rxtx.c
index 9219ad9047..b34d545a0a 100644
--- a/drivers/net/intel/idpf/idpf_common_rxtx.c
+++ b/drivers/net/intel/idpf/idpf_common_rxtx.c
@@ -945,7 +945,7 @@ idpf_dp_splitq_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 		else
 			nb_used = tx_pkt->nb_segs + nb_ctx;
 
-		if (ol_flags & IDPF_TX_CKSUM_OFFLOAD_MASK)
+		if (ol_flags & CI_TX_CKSUM_OFFLOAD_MASK)
 			cmd_dtype = IDPF_TXD_FLEX_FLOW_CMD_CS_EN;
 
 		/* context descriptor */
@@ -1425,7 +1425,7 @@ idpf_dp_singleq_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 			}
 		}
 
-		if (ol_flags & IDPF_TX_CKSUM_OFFLOAD_MASK)
+		if (ol_flags & CI_TX_CKSUM_OFFLOAD_MASK)
 			td_cmd |= IDPF_TX_FLEX_DESC_CMD_CS_EN;
 
 		if (nb_ctx != 0) {
diff --git a/drivers/net/intel/idpf/idpf_common_rxtx.h b/drivers/net/intel/idpf/idpf_common_rxtx.h
index b88a87402d..fe7094d434 100644
--- a/drivers/net/intel/idpf/idpf_common_rxtx.h
+++ b/drivers/net/intel/idpf/idpf_common_rxtx.h
@@ -39,13 +39,8 @@
 #define IDPF_RLAN_CTX_DBUF_S	7
 #define IDPF_RX_MAX_DATA_BUF_SIZE	(16 * 1024 - 128)
 
-#define IDPF_TX_CKSUM_OFFLOAD_MASK (		\
-		RTE_MBUF_F_TX_IP_CKSUM |	\
-		RTE_MBUF_F_TX_L4_MASK |		\
-		RTE_MBUF_F_TX_TCP_SEG)
-
 #define IDPF_TX_OFFLOAD_MASK (			\
-		IDPF_TX_CKSUM_OFFLOAD_MASK |	\
+		CI_TX_CKSUM_OFFLOAD_MASK |	\
 		RTE_MBUF_F_TX_IPV4 |		\
 		RTE_MBUF_F_TX_IPV6)
 
-- 
2.51.0



* [PATCH v2 11/36] net/intel: create common checksum Tx offload function
  2026-01-13 15:14 ` [PATCH v2 00/36] combine multiple Intel scalar Tx paths Bruce Richardson
                     ` (9 preceding siblings ...)
  2026-01-13 15:14   ` [PATCH v2 10/36] net/intel: consolidate checksum mask definition Bruce Richardson
@ 2026-01-13 15:14   ` Bruce Richardson
  2026-01-13 15:14   ` [PATCH v2 12/36] net/intel: create a common scalar Tx function Bruce Richardson
                     ` (26 subsequent siblings)
  37 siblings, 0 replies; 274+ messages in thread
From: Bruce Richardson @ 2026-01-13 15:14 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson, Anatoly Burakov

Since i40e and ice have the same checksum offload logic, merge their
functions into one. Future rework should allow more drivers to use this
function as well.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/net/intel/common/tx_scalar_fns.h | 63 +++++++++++++++++++++++
 drivers/net/intel/i40e/i40e_rxtx.c       | 57 +--------------------
 drivers/net/intel/ice/ice_rxtx.c         | 64 +-----------------------
 3 files changed, 65 insertions(+), 119 deletions(-)

diff --git a/drivers/net/intel/common/tx_scalar_fns.h b/drivers/net/intel/common/tx_scalar_fns.h
index f894cea616..95ee7dc35f 100644
--- a/drivers/net/intel/common/tx_scalar_fns.h
+++ b/drivers/net/intel/common/tx_scalar_fns.h
@@ -64,6 +64,69 @@ ci_tx_xmit_cleanup(struct ci_tx_queue *txq)
 	return 0;
 }
 
+/* Common checksum enable function for Intel drivers (ice, i40e, etc.) */
+static inline void
+ci_txd_enable_checksum(uint64_t ol_flags,
+		       uint32_t *td_cmd,
+		       uint32_t *td_offset,
+		       union ci_tx_offload tx_offload)
+{
+	/* Set MACLEN */
+	if (!(ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK))
+		*td_offset |= (tx_offload.l2_len >> 1)
+			<< CI_TX_DESC_LEN_MACLEN_S;
+
+	/* Enable L3 checksum offloads */
+	if (ol_flags & RTE_MBUF_F_TX_IP_CKSUM) {
+		*td_cmd |= CI_TX_DESC_CMD_IIPT_IPV4_CSUM;
+		*td_offset |= (tx_offload.l3_len >> 2) <<
+			CI_TX_DESC_LEN_IPLEN_S;
+	} else if (ol_flags & RTE_MBUF_F_TX_IPV4) {
+		*td_cmd |= CI_TX_DESC_CMD_IIPT_IPV4;
+		*td_offset |= (tx_offload.l3_len >> 2) <<
+			CI_TX_DESC_LEN_IPLEN_S;
+	} else if (ol_flags & RTE_MBUF_F_TX_IPV6) {
+		*td_cmd |= CI_TX_DESC_CMD_IIPT_IPV6;
+		*td_offset |= (tx_offload.l3_len >> 2) <<
+			CI_TX_DESC_LEN_IPLEN_S;
+	}
+
+	if (ol_flags & RTE_MBUF_F_TX_TCP_SEG) {
+		*td_cmd |= CI_TX_DESC_CMD_L4T_EOFT_TCP;
+		*td_offset |= (tx_offload.l4_len >> 2) <<
+			      CI_TX_DESC_LEN_L4_LEN_S;
+		return;
+	}
+
+	if (ol_flags & RTE_MBUF_F_TX_UDP_SEG) {
+		*td_cmd |= CI_TX_DESC_CMD_L4T_EOFT_UDP;
+		*td_offset |= (tx_offload.l4_len >> 2) <<
+			      CI_TX_DESC_LEN_L4_LEN_S;
+		return;
+	}
+
+	/* Enable L4 checksum offloads */
+	switch (ol_flags & RTE_MBUF_F_TX_L4_MASK) {
+	case RTE_MBUF_F_TX_TCP_CKSUM:
+		*td_cmd |= CI_TX_DESC_CMD_L4T_EOFT_TCP;
+		*td_offset |= (sizeof(struct rte_tcp_hdr) >> 2) <<
+			      CI_TX_DESC_LEN_L4_LEN_S;
+		break;
+	case RTE_MBUF_F_TX_SCTP_CKSUM:
+		*td_cmd |= CI_TX_DESC_CMD_L4T_EOFT_SCTP;
+		*td_offset |= (sizeof(struct rte_sctp_hdr) >> 2) <<
+			      CI_TX_DESC_LEN_L4_LEN_S;
+		break;
+	case RTE_MBUF_F_TX_UDP_CKSUM:
+		*td_cmd |= CI_TX_DESC_CMD_L4T_EOFT_UDP;
+		*td_offset |= (sizeof(struct rte_udp_hdr) >> 2) <<
+			      CI_TX_DESC_LEN_L4_LEN_S;
+		break;
+	default:
+		break;
+	}
+}
+
 static inline uint16_t
 ci_div_roundup16(uint16_t x, uint16_t y)
 {
diff --git a/drivers/net/intel/i40e/i40e_rxtx.c b/drivers/net/intel/i40e/i40e_rxtx.c
index db36ec86f7..617b93c92b 100644
--- a/drivers/net/intel/i40e/i40e_rxtx.c
+++ b/drivers/net/intel/i40e/i40e_rxtx.c
@@ -310,61 +310,6 @@ i40e_parse_tunneling_params(uint64_t ol_flags,
 		*cd_tunneling |= I40E_TXD_CTX_QW0_L4T_CS_MASK;
 }
 
-static inline void
-i40e_txd_enable_checksum(uint64_t ol_flags,
-			uint32_t *td_cmd,
-			uint32_t *td_offset,
-			union ci_tx_offload tx_offload)
-{
-	/* Set MACLEN */
-	if (!(ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK))
-		*td_offset |= (tx_offload.l2_len >> 1)
-			<< CI_TX_DESC_LEN_MACLEN_S;
-
-	/* Enable L3 checksum offloads */
-	if (ol_flags & RTE_MBUF_F_TX_IP_CKSUM) {
-		*td_cmd |= CI_TX_DESC_CMD_IIPT_IPV4_CSUM;
-		*td_offset |= (tx_offload.l3_len >> 2)
-				<< CI_TX_DESC_LEN_IPLEN_S;
-	} else if (ol_flags & RTE_MBUF_F_TX_IPV4) {
-		*td_cmd |= CI_TX_DESC_CMD_IIPT_IPV4;
-		*td_offset |= (tx_offload.l3_len >> 2)
-				<< CI_TX_DESC_LEN_IPLEN_S;
-	} else if (ol_flags & RTE_MBUF_F_TX_IPV6) {
-		*td_cmd |= CI_TX_DESC_CMD_IIPT_IPV6;
-		*td_offset |= (tx_offload.l3_len >> 2)
-				<< CI_TX_DESC_LEN_IPLEN_S;
-	}
-
-	if (ol_flags & RTE_MBUF_F_TX_TCP_SEG) {
-		*td_cmd |= CI_TX_DESC_CMD_L4T_EOFT_TCP;
-		*td_offset |= (tx_offload.l4_len >> 2)
-			<< CI_TX_DESC_LEN_L4_LEN_S;
-		return;
-	}
-
-	/* Enable L4 checksum offloads */
-	switch (ol_flags & RTE_MBUF_F_TX_L4_MASK) {
-	case RTE_MBUF_F_TX_TCP_CKSUM:
-		*td_cmd |= CI_TX_DESC_CMD_L4T_EOFT_TCP;
-		*td_offset |= (sizeof(struct rte_tcp_hdr) >> 2) <<
-				CI_TX_DESC_LEN_L4_LEN_S;
-		break;
-	case RTE_MBUF_F_TX_SCTP_CKSUM:
-		*td_cmd |= CI_TX_DESC_CMD_L4T_EOFT_SCTP;
-		*td_offset |= (sizeof(struct rte_sctp_hdr) >> 2) <<
-				CI_TX_DESC_LEN_L4_LEN_S;
-		break;
-	case RTE_MBUF_F_TX_UDP_CKSUM:
-		*td_cmd |= CI_TX_DESC_CMD_L4T_EOFT_UDP;
-		*td_offset |= (sizeof(struct rte_udp_hdr) >> 2) <<
-				CI_TX_DESC_LEN_L4_LEN_S;
-		break;
-	default:
-		break;
-	}
-}
-
 /* Construct the tx flags */
 static inline uint64_t
 i40e_build_ctob(uint32_t td_cmd,
@@ -1172,7 +1117,7 @@ i40e_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 
 		/* Enable checksum offloading */
 		if (ol_flags & CI_TX_CKSUM_OFFLOAD_MASK)
-			i40e_txd_enable_checksum(ol_flags, &td_cmd,
+			ci_txd_enable_checksum(ol_flags, &td_cmd,
 						 &td_offset, tx_offload);
 
 		if (nb_ctx) {
diff --git a/drivers/net/intel/ice/ice_rxtx.c b/drivers/net/intel/ice/ice_rxtx.c
index dc21a89ce3..b9c38995f0 100644
--- a/drivers/net/intel/ice/ice_rxtx.c
+++ b/drivers/net/intel/ice/ice_rxtx.c
@@ -2942,68 +2942,6 @@ ice_parse_tunneling_params(uint64_t ol_flags,
 		*cd_tunneling |= ICE_TXD_CTX_QW0_L4T_CS_M;
 }
 
-static inline void
-ice_txd_enable_checksum(uint64_t ol_flags,
-			uint32_t *td_cmd,
-			uint32_t *td_offset,
-			union ci_tx_offload tx_offload)
-{
-	/* Set MACLEN */
-	if (!(ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK))
-		*td_offset |= (tx_offload.l2_len >> 1)
-			<< CI_TX_DESC_LEN_MACLEN_S;
-
-	/* Enable L3 checksum offloads */
-	if (ol_flags & RTE_MBUF_F_TX_IP_CKSUM) {
-		*td_cmd |= CI_TX_DESC_CMD_IIPT_IPV4_CSUM;
-		*td_offset |= (tx_offload.l3_len >> 2) <<
-			CI_TX_DESC_LEN_IPLEN_S;
-	} else if (ol_flags & RTE_MBUF_F_TX_IPV4) {
-		*td_cmd |= CI_TX_DESC_CMD_IIPT_IPV4;
-		*td_offset |= (tx_offload.l3_len >> 2) <<
-			CI_TX_DESC_LEN_IPLEN_S;
-	} else if (ol_flags & RTE_MBUF_F_TX_IPV6) {
-		*td_cmd |= CI_TX_DESC_CMD_IIPT_IPV6;
-		*td_offset |= (tx_offload.l3_len >> 2) <<
-			CI_TX_DESC_LEN_IPLEN_S;
-	}
-
-	if (ol_flags & RTE_MBUF_F_TX_TCP_SEG) {
-		*td_cmd |= CI_TX_DESC_CMD_L4T_EOFT_TCP;
-		*td_offset |= (tx_offload.l4_len >> 2) <<
-			      CI_TX_DESC_LEN_L4_LEN_S;
-		return;
-	}
-
-	if (ol_flags & RTE_MBUF_F_TX_UDP_SEG) {
-		*td_cmd |= CI_TX_DESC_CMD_L4T_EOFT_UDP;
-		*td_offset |= (tx_offload.l4_len >> 2) <<
-			      CI_TX_DESC_LEN_L4_LEN_S;
-		return;
-	}
-
-	/* Enable L4 checksum offloads */
-	switch (ol_flags & RTE_MBUF_F_TX_L4_MASK) {
-	case RTE_MBUF_F_TX_TCP_CKSUM:
-		*td_cmd |= CI_TX_DESC_CMD_L4T_EOFT_TCP;
-		*td_offset |= (sizeof(struct rte_tcp_hdr) >> 2) <<
-			      CI_TX_DESC_LEN_L4_LEN_S;
-		break;
-	case RTE_MBUF_F_TX_SCTP_CKSUM:
-		*td_cmd |= CI_TX_DESC_CMD_L4T_EOFT_SCTP;
-		*td_offset |= (sizeof(struct rte_sctp_hdr) >> 2) <<
-			      CI_TX_DESC_LEN_L4_LEN_S;
-		break;
-	case RTE_MBUF_F_TX_UDP_CKSUM:
-		*td_cmd |= CI_TX_DESC_CMD_L4T_EOFT_UDP;
-		*td_offset |= (sizeof(struct rte_udp_hdr) >> 2) <<
-			      CI_TX_DESC_LEN_L4_LEN_S;
-		break;
-	default:
-		break;
-	}
-}
-
 /* Construct the tx flags */
 static inline uint64_t
 ice_build_ctob(uint32_t td_cmd,
@@ -3201,7 +3139,7 @@ ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 
 		/* Enable checksum offloading */
 		if (ol_flags & CI_TX_CKSUM_OFFLOAD_MASK)
-			ice_txd_enable_checksum(ol_flags, &td_cmd,
+			ci_txd_enable_checksum(ol_flags, &td_cmd,
 						&td_offset, tx_offload);
 
 		if (nb_ctx) {
-- 
2.51.0



* [PATCH v2 12/36] net/intel: create a common scalar Tx function
  2026-01-13 15:14 ` [PATCH v2 00/36] combine multiple Intel scalar Tx paths Bruce Richardson
                     ` (10 preceding siblings ...)
  2026-01-13 15:14   ` [PATCH v2 11/36] net/intel: create common checksum Tx offload function Bruce Richardson
@ 2026-01-13 15:14   ` Bruce Richardson
  2026-01-13 15:14   ` [PATCH v2 13/36] net/i40e: use " Bruce Richardson
                     ` (25 subsequent siblings)
  37 siblings, 0 replies; 274+ messages in thread
From: Bruce Richardson @ 2026-01-13 15:14 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson, Anatoly Burakov

Given the similarities between the transmit functions across various
Intel drivers, make a start on consolidating them by moving the ice Tx
function into common, for reuse by other drivers.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/net/intel/common/tx_scalar_fns.h | 215 ++++++++++++++++++
 drivers/net/intel/ice/ice_rxtx.c         | 268 +++++------------------
 2 files changed, 267 insertions(+), 216 deletions(-)

diff --git a/drivers/net/intel/common/tx_scalar_fns.h b/drivers/net/intel/common/tx_scalar_fns.h
index 95ee7dc35f..70b22f1da0 100644
--- a/drivers/net/intel/common/tx_scalar_fns.h
+++ b/drivers/net/intel/common/tx_scalar_fns.h
@@ -6,6 +6,7 @@
 #define _COMMON_INTEL_TX_SCALAR_FNS_H_
 
 #include <stdint.h>
+#include <rte_io.h>
 #include <rte_byteorder.h>
 
 /* depends on common Tx definitions. */
@@ -147,5 +148,219 @@ ci_calc_pkt_desc(const struct rte_mbuf *tx_pkt)
 	return count;
 }
 
+typedef uint16_t (*ci_get_ctx_desc_fn)(uint64_t ol_flags, const struct rte_mbuf *mbuf,
+		const union ci_tx_offload *tx_offload, const struct ci_tx_queue *txq,
+		uint32_t *td_offset, uint64_t *qw0, uint64_t *qw1);
+
+/* gets current timestamp tail index */
+typedef uint16_t (*get_ts_tail_t)(struct ci_tx_queue *txq);
+/* writes a timestamp descriptor and returns new tail index */
+typedef uint16_t (*write_ts_desc_t)(struct ci_tx_queue *txq, struct rte_mbuf *mbuf,
+		uint16_t tx_id, uint16_t ts_id);
+/* writes a timestamp tail index - doorbell */
+typedef void (*write_ts_tail_t)(struct ci_tx_queue *txq, uint16_t ts_id);
+
+struct ci_timesstamp_queue_fns {
+	get_ts_tail_t get_ts_tail;
+	write_ts_desc_t write_ts_desc;
+	write_ts_tail_t write_ts_tail;
+};
+
+static inline uint16_t
+ci_xmit_pkts(struct ci_tx_queue *txq,
+	     struct rte_mbuf **tx_pkts,
+	     uint16_t nb_pkts,
+	     ci_get_ctx_desc_fn get_ctx_desc,
+	     const struct ci_timesstamp_queue_fns *ts_fns)
+{
+	volatile struct ci_tx_desc *ci_tx_ring;
+	volatile struct ci_tx_desc *txd;
+	struct ci_tx_entry *sw_ring;
+	struct ci_tx_entry *txe, *txn;
+	struct rte_mbuf *tx_pkt;
+	struct rte_mbuf *m_seg;
+	uint16_t tx_id;
+	uint16_t ts_id = -1;
+	uint16_t nb_tx;
+	uint16_t nb_used;
+	uint16_t nb_ctx;
+	uint32_t td_cmd = 0;
+	uint32_t td_offset = 0;
+	uint32_t td_tag = 0;
+	uint16_t tx_last;
+	uint16_t slen;
+	uint64_t buf_dma_addr;
+	uint64_t ol_flags;
+	union ci_tx_offload tx_offload = {0};
+
+	sw_ring = txq->sw_ring;
+	ci_tx_ring = txq->ci_tx_ring;
+	tx_id = txq->tx_tail;
+	txe = &sw_ring[tx_id];
+
+	if (ts_fns != NULL)
+		ts_id = ts_fns->get_ts_tail(txq);
+
+	/* Check if the descriptor ring needs to be cleaned. */
+	if (txq->nb_tx_free < txq->tx_free_thresh)
+		(void)ci_tx_xmit_cleanup(txq);
+
+	for (nb_tx = 0; nb_tx < nb_pkts; nb_tx++) {
+		uint64_t cd_qw0, cd_qw1;
+		tx_pkt = *tx_pkts++;
+
+		td_cmd = CI_TX_DESC_CMD_ICRC;
+		td_tag = 0;
+		td_offset = 0;
+		ol_flags = tx_pkt->ol_flags;
+
+		tx_offload.l2_len = tx_pkt->l2_len;
+		tx_offload.l3_len = tx_pkt->l3_len;
+		tx_offload.outer_l2_len = tx_pkt->outer_l2_len;
+		tx_offload.outer_l3_len = tx_pkt->outer_l3_len;
+		tx_offload.l4_len = tx_pkt->l4_len;
+		tx_offload.tso_segsz = tx_pkt->tso_segsz;
+
+		/* Calculate the number of context descriptors needed. */
+		nb_ctx = get_ctx_desc(ol_flags, tx_pkt, &tx_offload,
+			txq, &td_offset, &cd_qw0, &cd_qw1);
+
+		/* The number of descriptors that must be allocated for
+		 * a packet equals the number of segments of that packet,
+		 * plus the number of context descriptors if needed.
+		 * Recalculate the needed Tx descriptors when TSO is enabled,
+		 * in case the mbuf data size exceeds the maximum data size
+		 * that the hardware allows per Tx descriptor.
+		 */
+		if (ol_flags & (RTE_MBUF_F_TX_TCP_SEG | RTE_MBUF_F_TX_UDP_SEG))
+			nb_used = (uint16_t)(ci_calc_pkt_desc(tx_pkt) + nb_ctx);
+		else
+			nb_used = (uint16_t)(tx_pkt->nb_segs + nb_ctx);
+		tx_last = (uint16_t)(tx_id + nb_used - 1);
+
+		/* Circular ring */
+		if (tx_last >= txq->nb_tx_desc)
+			tx_last = (uint16_t)(tx_last - txq->nb_tx_desc);
+
+		if (nb_used > txq->nb_tx_free) {
+			if (ci_tx_xmit_cleanup(txq) != 0) {
+				if (nb_tx == 0)
+					return 0;
+				goto end_of_tx;
+			}
+			if (unlikely(nb_used > txq->tx_rs_thresh)) {
+				while (nb_used > txq->nb_tx_free) {
+					if (ci_tx_xmit_cleanup(txq) != 0) {
+						if (nb_tx == 0)
+							return 0;
+						goto end_of_tx;
+					}
+				}
+			}
+		}
+
+		/* Descriptor based VLAN insertion */
+		if (ol_flags & (RTE_MBUF_F_TX_VLAN | RTE_MBUF_F_TX_QINQ)) {
+			td_cmd |= CI_TX_DESC_CMD_IL2TAG1;
+			td_tag = tx_pkt->vlan_tci;
+		}
+
+		/* Enable checksum offloading */
+		if (ol_flags & CI_TX_CKSUM_OFFLOAD_MASK)
+			ci_txd_enable_checksum(ol_flags, &td_cmd,
+						&td_offset, tx_offload);
+
+		if (nb_ctx) {
+			/* Setup TX context descriptor if required */
+			uint64_t *ctx_txd = RTE_CAST_PTR(uint64_t *, &ci_tx_ring[tx_id]);
+
+			txn = &sw_ring[txe->next_id];
+			RTE_MBUF_PREFETCH_TO_FREE(txn->mbuf);
+			if (txe->mbuf) {
+				rte_pktmbuf_free_seg(txe->mbuf);
+				txe->mbuf = NULL;
+			}
+
+			ctx_txd[0] = cd_qw0;
+			ctx_txd[1] = cd_qw1;
+
+			txe->last_id = tx_last;
+			tx_id = txe->next_id;
+			txe = txn;
+		}
+		m_seg = tx_pkt;
+
+		do {
+			txd = &ci_tx_ring[tx_id];
+			txn = &sw_ring[txe->next_id];
+
+			if (txe->mbuf)
+				rte_pktmbuf_free_seg(txe->mbuf);
+			txe->mbuf = m_seg;
+
+			/* Setup TX Descriptor */
+			slen = m_seg->data_len;
+			buf_dma_addr = rte_mbuf_data_iova(m_seg);
+
+			while ((ol_flags & (RTE_MBUF_F_TX_TCP_SEG | RTE_MBUF_F_TX_UDP_SEG)) &&
+					unlikely(slen > CI_MAX_DATA_PER_TXD)) {
+				txd->buffer_addr = rte_cpu_to_le_64(buf_dma_addr);
+				txd->cmd_type_offset_bsz = rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DATA |
+					((uint64_t)td_cmd << CI_TXD_QW1_CMD_S) |
+					((uint64_t)td_offset << CI_TXD_QW1_OFFSET_S) |
+					((uint64_t)CI_MAX_DATA_PER_TXD << CI_TXD_QW1_TX_BUF_SZ_S) |
+					((uint64_t)td_tag << CI_TXD_QW1_L2TAG1_S));
+
+				buf_dma_addr += CI_MAX_DATA_PER_TXD;
+				slen -= CI_MAX_DATA_PER_TXD;
+
+				txe->last_id = tx_last;
+				tx_id = txe->next_id;
+				txe = txn;
+				txd = &ci_tx_ring[tx_id];
+				txn = &sw_ring[txe->next_id];
+			}
+
+			txd->buffer_addr = rte_cpu_to_le_64(buf_dma_addr);
+			txd->cmd_type_offset_bsz = rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DATA |
+				((uint64_t)td_cmd << CI_TXD_QW1_CMD_S) |
+				((uint64_t)td_offset << CI_TXD_QW1_OFFSET_S) |
+				((uint64_t)slen << CI_TXD_QW1_TX_BUF_SZ_S) |
+				((uint64_t)td_tag << CI_TXD_QW1_L2TAG1_S));
+
+			txe->last_id = tx_last;
+			tx_id = txe->next_id;
+			txe = txn;
+			m_seg = m_seg->next;
+		} while (m_seg);
+
+		/* fill the last descriptor with End of Packet (EOP) bit */
+		td_cmd |= CI_TX_DESC_CMD_EOP;
+		txq->nb_tx_used = (uint16_t)(txq->nb_tx_used + nb_used);
+		txq->nb_tx_free = (uint16_t)(txq->nb_tx_free - nb_used);
+
+		/* set RS bit on the last descriptor of one packet */
+		if (txq->nb_tx_used >= txq->tx_rs_thresh) {
+			td_cmd |= CI_TX_DESC_CMD_RS;
+
+			/* Update txq RS bit counters */
+			txq->nb_tx_used = 0;
+		}
+		txd->cmd_type_offset_bsz |=
+				rte_cpu_to_le_64(((uint64_t)td_cmd) << CI_TXD_QW1_CMD_S);
+
+		if (ts_fns != NULL)
+			ts_id = ts_fns->write_ts_desc(txq, tx_pkt, tx_id, ts_id);
+	}
+end_of_tx:
+	/* update Tail register */
+	if (ts_fns != NULL)
+		ts_fns->write_ts_tail(txq, ts_id);
+	else
+		rte_write32_wc(tx_id, txq->qtx_tail);
+	txq->tx_tail = tx_id;
+
+	return nb_tx;
+}
 
 #endif /* _COMMON_INTEL_TX_SCALAR_FNS_H_ */
diff --git a/drivers/net/intel/ice/ice_rxtx.c b/drivers/net/intel/ice/ice_rxtx.c
index b9c38995f0..65e2d401e8 100644
--- a/drivers/net/intel/ice/ice_rxtx.c
+++ b/drivers/net/intel/ice/ice_rxtx.c
@@ -3040,228 +3040,64 @@ get_context_desc(uint64_t ol_flags, const struct rte_mbuf *tx_pkt,
 	return 1;
 }
 
-uint16_t
-ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
+static uint16_t
+ice_get_ts_tail(struct ci_tx_queue *txq)
 {
-	struct ci_tx_queue *txq;
-	volatile struct ci_tx_desc *ci_tx_ring;
-	volatile struct ci_tx_desc *txd;
-	struct ci_tx_entry *sw_ring;
-	struct ci_tx_entry *txe, *txn;
-	struct rte_mbuf *tx_pkt;
-	struct rte_mbuf *m_seg;
-	uint16_t tx_id;
-	uint16_t ts_id = -1;
-	uint16_t nb_tx;
-	uint16_t nb_used;
-	uint16_t nb_ctx;
-	uint32_t td_cmd = 0;
-	uint32_t td_offset = 0;
-	uint32_t td_tag = 0;
-	uint16_t tx_last;
-	uint16_t slen;
-	uint64_t buf_dma_addr;
-	uint64_t ol_flags;
-	union ci_tx_offload tx_offload = {0};
-
-	txq = tx_queue;
-	sw_ring = txq->sw_ring;
-	ci_tx_ring = txq->ci_tx_ring;
-	tx_id = txq->tx_tail;
-	txe = &sw_ring[tx_id];
-
-	if (txq->tsq != NULL && txq->tsq->ts_flag > 0)
-		ts_id = txq->tsq->ts_tail;
-
-	/* Check if the descriptor ring needs to be cleaned. */
-	if (txq->nb_tx_free < txq->tx_free_thresh)
-		(void)ci_tx_xmit_cleanup(txq);
-
-	for (nb_tx = 0; nb_tx < nb_pkts; nb_tx++) {
-		uint64_t cd_qw0, cd_qw1;
-		tx_pkt = *tx_pkts++;
-
-		td_cmd = 0;
-		td_tag = 0;
-		td_offset = 0;
-		ol_flags = tx_pkt->ol_flags;
-
-		tx_offload.l2_len = tx_pkt->l2_len;
-		tx_offload.l3_len = tx_pkt->l3_len;
-		tx_offload.outer_l2_len = tx_pkt->outer_l2_len;
-		tx_offload.outer_l3_len = tx_pkt->outer_l3_len;
-		tx_offload.l4_len = tx_pkt->l4_len;
-		tx_offload.tso_segsz = tx_pkt->tso_segsz;
-
-		/* Calculate the number of context descriptors needed. */
-		nb_ctx = get_context_desc(ol_flags, tx_pkt, &tx_offload,
-			txq, &td_offset, &cd_qw0, &cd_qw1);
-
-		/* The number of descriptors that must be allocated for
-		 * a packet equals to the number of the segments of that
-		 * packet plus the number of context descriptor if needed.
-		 * Recalculate the needed tx descs when TSO enabled in case
-		 * the mbuf data size exceeds max data size that hw allows
-		 * per tx desc.
-		 */
-		if (ol_flags & (RTE_MBUF_F_TX_TCP_SEG | RTE_MBUF_F_TX_UDP_SEG))
-			nb_used = (uint16_t)(ci_calc_pkt_desc(tx_pkt) + nb_ctx);
-		else
-			nb_used = (uint16_t)(tx_pkt->nb_segs + nb_ctx);
-		tx_last = (uint16_t)(tx_id + nb_used - 1);
-
-		/* Circular ring */
-		if (tx_last >= txq->nb_tx_desc)
-			tx_last = (uint16_t)(tx_last - txq->nb_tx_desc);
-
-		if (nb_used > txq->nb_tx_free) {
-			if (ci_tx_xmit_cleanup(txq) != 0) {
-				if (nb_tx == 0)
-					return 0;
-				goto end_of_tx;
-			}
-			if (unlikely(nb_used > txq->tx_rs_thresh)) {
-				while (nb_used > txq->nb_tx_free) {
-					if (ci_tx_xmit_cleanup(txq) != 0) {
-						if (nb_tx == 0)
-							return 0;
-						goto end_of_tx;
-					}
-				}
-			}
-		}
-
-		/* Descriptor based VLAN insertion */
-		if (ol_flags & (RTE_MBUF_F_TX_VLAN | RTE_MBUF_F_TX_QINQ)) {
-			td_cmd |= CI_TX_DESC_CMD_IL2TAG1;
-			td_tag = tx_pkt->vlan_tci;
-		}
-
-		/* Enable checksum offloading */
-		if (ol_flags & CI_TX_CKSUM_OFFLOAD_MASK)
-			ci_txd_enable_checksum(ol_flags, &td_cmd,
-						&td_offset, tx_offload);
-
-		if (nb_ctx) {
-			/* Setup TX context descriptor if required */
-			uint64_t *ctx_txd = RTE_CAST_PTR(uint64_t *, &ci_tx_ring[tx_id]);
-
-			txn = &sw_ring[txe->next_id];
-			RTE_MBUF_PREFETCH_TO_FREE(txn->mbuf);
-			if (txe->mbuf) {
-				rte_pktmbuf_free_seg(txe->mbuf);
-				txe->mbuf = NULL;
-			}
-
-			ctx_txd[0] = cd_qw0;
-			ctx_txd[1] = cd_qw1;
-
-			txe->last_id = tx_last;
-			tx_id = txe->next_id;
-			txe = txn;
-		}
-		m_seg = tx_pkt;
-
-		do {
-			txd = &ci_tx_ring[tx_id];
-			txn = &sw_ring[txe->next_id];
-
-			if (txe->mbuf)
-				rte_pktmbuf_free_seg(txe->mbuf);
-			txe->mbuf = m_seg;
-
-			/* Setup TX Descriptor */
-			slen = m_seg->data_len;
-			buf_dma_addr = rte_mbuf_data_iova(m_seg);
-
-			while ((ol_flags & (RTE_MBUF_F_TX_TCP_SEG | RTE_MBUF_F_TX_UDP_SEG)) &&
-					unlikely(slen > CI_MAX_DATA_PER_TXD)) {
-				txd->buffer_addr = rte_cpu_to_le_64(buf_dma_addr);
-				txd->cmd_type_offset_bsz = rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DATA |
-					((uint64_t)td_cmd << CI_TXD_QW1_CMD_S) |
-					((uint64_t)td_offset << CI_TXD_QW1_OFFSET_S) |
-					((uint64_t)CI_MAX_DATA_PER_TXD << CI_TXD_QW1_TX_BUF_SZ_S) |
-					((uint64_t)td_tag << CI_TXD_QW1_L2TAG1_S));
-
-				buf_dma_addr += CI_MAX_DATA_PER_TXD;
-				slen -= CI_MAX_DATA_PER_TXD;
-
-				txe->last_id = tx_last;
-				tx_id = txe->next_id;
-				txe = txn;
-				txd = &ci_tx_ring[tx_id];
-				txn = &sw_ring[txe->next_id];
-			}
-
-			txd->buffer_addr = rte_cpu_to_le_64(buf_dma_addr);
-			txd->cmd_type_offset_bsz = rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DATA |
-				((uint64_t)td_cmd << CI_TXD_QW1_CMD_S) |
-				((uint64_t)td_offset << CI_TXD_QW1_OFFSET_S) |
-				((uint64_t)slen << CI_TXD_QW1_TX_BUF_SZ_S) |
-				((uint64_t)td_tag << CI_TXD_QW1_L2TAG1_S));
-
-			txe->last_id = tx_last;
-			tx_id = txe->next_id;
-			txe = txn;
-			m_seg = m_seg->next;
-		} while (m_seg);
+	return txq->tsq->ts_tail;
+}
 
-		/* fill the last descriptor with End of Packet (EOP) bit */
-		td_cmd |= CI_TX_DESC_CMD_EOP;
-		txq->nb_tx_used = (uint16_t)(txq->nb_tx_used + nb_used);
-		txq->nb_tx_free = (uint16_t)(txq->nb_tx_free - nb_used);
+static uint16_t
+ice_write_ts_desc(struct ci_tx_queue *txq,
+		  struct rte_mbuf *tx_pkt,
+		  uint16_t tx_id,
+		  uint16_t ts_id)
+{
+	uint64_t txtime = *RTE_MBUF_DYNFIELD(tx_pkt, txq->tsq->ts_offset, uint64_t *);
+	uint32_t tstamp = (uint32_t)(txtime % NS_PER_S) >> ICE_TXTIME_CTX_RESOLUTION_128NS;
+	const uint32_t desc_tx_id = (tx_id == 0) ? txq->nb_tx_desc : tx_id;
+	__le32 ts_desc = rte_cpu_to_le_32(FIELD_PREP(ICE_TXTIME_TX_DESC_IDX_M, desc_tx_id) |
+			FIELD_PREP(ICE_TXTIME_STAMP_M, tstamp));
+
+	txq->tsq->ice_ts_ring[ts_id].tx_desc_idx_tstamp = ts_desc;
+	ts_id++;
+
+	/* To prevent a malicious driver detection (MDD)
+	 * event when wrapping the tstamp ring, create
+	 * additional TS descriptors equal to the fetch
+	 * TS descriptor count (nb_ts_desc - nb_tx_desc).
+	 * HW merges TS descriptors with the same
+	 * timestamp value into a single descriptor.
+	 */
+	if (ts_id == txq->tsq->nb_ts_desc) {
+		uint16_t fetch = txq->tsq->nb_ts_desc - txq->nb_tx_desc;
+		ts_id = 0;
+		for (; ts_id < fetch; ts_id++)
+			txq->tsq->ice_ts_ring[ts_id].tx_desc_idx_tstamp = ts_desc;
+	}
+	return ts_id;
+}
 
-		/* set RS bit on the last descriptor of one packet */
-		if (txq->nb_tx_used >= txq->tx_rs_thresh) {
-			PMD_TX_LOG(DEBUG,
-				   "Setting RS bit on TXD id="
-				   "%4u (port=%d queue=%d)",
-				   tx_last, txq->port_id, txq->queue_id);
+static void
+ice_write_ts_tail(struct ci_tx_queue *txq, uint16_t ts_tail)
+{
+	ICE_PCI_REG_WRITE(txq->qtx_tail, ts_tail);
+	txq->tsq->ts_tail = ts_tail;
+}
 
-			td_cmd |= CI_TX_DESC_CMD_RS;
+uint16_t
+ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
+{
+	const struct ci_timesstamp_queue_fns ts_fns = {
+		.get_ts_tail = ice_get_ts_tail,
+		.write_ts_desc = ice_write_ts_desc,
+		.write_ts_tail = ice_write_ts_tail,
+	};
+	struct ci_tx_queue *txq = (struct ci_tx_queue *)tx_queue;
 
-			/* Update txq RS bit counters */
-			txq->nb_tx_used = 0;
-		}
-		txd->cmd_type_offset_bsz |=
-			rte_cpu_to_le_64(((uint64_t)td_cmd) << CI_TXD_QW1_CMD_S);
-
-		if (txq->tsq != NULL && txq->tsq->ts_flag > 0) {
-			uint64_t txtime = *RTE_MBUF_DYNFIELD(tx_pkt,
-					txq->tsq->ts_offset, uint64_t *);
-			uint32_t tstamp = (uint32_t)(txtime % NS_PER_S) >>
-						ICE_TXTIME_CTX_RESOLUTION_128NS;
-			const uint32_t desc_tx_id = (tx_id == 0) ? txq->nb_tx_desc : tx_id;
-			__le32 ts_desc = rte_cpu_to_le_32(FIELD_PREP(ICE_TXTIME_TX_DESC_IDX_M,
-					desc_tx_id) | FIELD_PREP(ICE_TXTIME_STAMP_M, tstamp));
-			txq->tsq->ice_ts_ring[ts_id].tx_desc_idx_tstamp = ts_desc;
-			ts_id++;
-			/* To prevent an MDD, when wrapping the tstamp
-			 * ring create additional TS descriptors equal
-			 * to the number of the fetch TS descriptors
-			 * value. HW will merge the TS descriptors with
-			 * the same timestamp value into a single
-			 * descriptor.
-			 */
-			if (ts_id == txq->tsq->nb_ts_desc) {
-				uint16_t fetch = txq->tsq->nb_ts_desc - txq->nb_tx_desc;
-				ts_id = 0;
-				for (; ts_id < fetch; ts_id++)
-					txq->tsq->ice_ts_ring[ts_id].tx_desc_idx_tstamp = ts_desc;
-			}
-		}
-	}
-end_of_tx:
-	/* update Tail register */
-	if (txq->tsq != NULL && txq->tsq->ts_flag > 0) {
-		ICE_PCI_REG_WRITE(txq->qtx_tail, ts_id);
-		txq->tsq->ts_tail = ts_id;
-	} else {
-		ICE_PCI_REG_WRITE(txq->qtx_tail, tx_id);
-	}
-	txq->tx_tail = tx_id;
+	if (txq->tsq != NULL && txq->tsq->ts_flag > 0)
+		return ci_xmit_pkts(txq, tx_pkts, nb_pkts, get_context_desc, &ts_fns);
 
-	return nb_tx;
+	return ci_xmit_pkts(txq, tx_pkts, nb_pkts, get_context_desc, NULL);
 }
 
 static __rte_always_inline int
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 274+ messages in thread
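
[Editor's note] The timestamp-ring wrap rule in ice_write_ts_desc() above can be modeled standalone. This sketch uses illustrative names and ring sizes, not the driver API: on wrap, the first `fetch` slots (nb_ts_desc - nb_tx_desc) are refilled with the same descriptor so the hardware can merge the duplicates into one.

```c
#include <stdint.h>

/* Illustrative ring sizes; the real values are configured per queue. */
#define NB_TX_DESC 8
#define NB_TS_DESC 12 /* ts ring is sized larger by the fetch depth */

static uint32_t ts_ring[NB_TS_DESC];

/* Write one TS descriptor and return the new tail index, replicating
 * the descriptor into the first `fetch` slots on wrap-around. */
static uint16_t
write_ts(uint16_t ts_id, uint32_t ts_desc)
{
	ts_ring[ts_id++] = ts_desc;
	if (ts_id == NB_TS_DESC) {
		uint16_t fetch = NB_TS_DESC - NB_TX_DESC;

		for (ts_id = 0; ts_id < fetch; ts_id++)
			ts_ring[ts_id] = ts_desc;
	}
	return ts_id;
}
```

After writing to the last slot, the tail lands at `fetch` and slots 0..fetch-1 hold copies of the final descriptor.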

* [PATCH v2 13/36] net/i40e: use common scalar Tx function
  2026-01-13 15:14 ` [PATCH v2 00/36] combine multiple Intel scalar Tx paths Bruce Richardson
                     ` (11 preceding siblings ...)
  2026-01-13 15:14   ` [PATCH v2 12/36] net/intel: create a common scalar Tx function Bruce Richardson
@ 2026-01-13 15:14   ` Bruce Richardson
  2026-01-13 15:14   ` [PATCH v2 14/36] net/intel: add IPsec hooks to common " Bruce Richardson
                     ` (24 subsequent siblings)
  37 siblings, 0 replies; 274+ messages in thread
From: Bruce Richardson @ 2026-01-13 15:14 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson

Following the earlier rework, the scalar transmit function for i40e can
use the common function previously moved over from the ice driver. This
removes hundreds of lines of duplicated code.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/net/intel/i40e/i40e_rxtx.c | 206 +----------------------------
 1 file changed, 2 insertions(+), 204 deletions(-)

diff --git a/drivers/net/intel/i40e/i40e_rxtx.c b/drivers/net/intel/i40e/i40e_rxtx.c
index 617b93c92b..08a74fb1d4 100644
--- a/drivers/net/intel/i40e/i40e_rxtx.c
+++ b/drivers/net/intel/i40e/i40e_rxtx.c
@@ -1020,210 +1020,8 @@ get_context_desc(uint64_t ol_flags, const struct rte_mbuf *tx_pkt,
 uint16_t
 i40e_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 {
-	struct ci_tx_queue *txq;
-	struct ci_tx_entry *sw_ring;
-	struct ci_tx_entry *txe, *txn;
-	volatile struct ci_tx_desc *txd;
-	volatile struct ci_tx_desc *txr;
-	struct rte_mbuf *tx_pkt;
-	struct rte_mbuf *m_seg;
-	uint16_t tx_id;
-	uint16_t nb_tx;
-	uint32_t td_cmd;
-	uint32_t td_offset;
-	uint32_t td_tag;
-	uint64_t ol_flags;
-	uint16_t nb_used;
-	uint16_t nb_ctx;
-	uint16_t tx_last;
-	uint16_t slen;
-	uint64_t buf_dma_addr;
-	union ci_tx_offload tx_offload = {0};
-
-	txq = tx_queue;
-	sw_ring = txq->sw_ring;
-	txr = txq->ci_tx_ring;
-	tx_id = txq->tx_tail;
-	txe = &sw_ring[tx_id];
-
-	/* Check if the descriptor ring needs to be cleaned. */
-	if (txq->nb_tx_free < txq->tx_free_thresh)
-		(void)ci_tx_xmit_cleanup(txq);
-
-	for (nb_tx = 0; nb_tx < nb_pkts; nb_tx++) {
-		td_cmd = 0;
-		td_tag = 0;
-		td_offset = 0;
-
-		tx_pkt = *tx_pkts++;
-		RTE_MBUF_PREFETCH_TO_FREE(txe->mbuf);
-
-		ol_flags = tx_pkt->ol_flags;
-		tx_offload.l2_len = tx_pkt->l2_len;
-		tx_offload.l3_len = tx_pkt->l3_len;
-		tx_offload.outer_l2_len = tx_pkt->outer_l2_len;
-		tx_offload.outer_l3_len = tx_pkt->outer_l3_len;
-		tx_offload.l4_len = tx_pkt->l4_len;
-		tx_offload.tso_segsz = tx_pkt->tso_segsz;
-
-		/* Calculate the number of context descriptors needed. */
-		uint64_t cd_qw0 = 0, cd_qw1 = 0;
-		nb_ctx = get_context_desc(ol_flags, tx_pkt, &tx_offload, txq, &td_offset,
-				&cd_qw0, &cd_qw1);
-
-		/**
-		 * The number of descriptors that must be allocated for
-		 * a packet equals to the number of the segments of that
-		 * packet plus 1 context descriptor if needed.
-		 * Recalculate the needed tx descs when TSO enabled in case
-		 * the mbuf data size exceeds max data size that hw allows
-		 * per tx desc.
-		 */
-		if (ol_flags & RTE_MBUF_F_TX_TCP_SEG)
-			nb_used = (uint16_t)(ci_calc_pkt_desc(tx_pkt) + nb_ctx);
-		else
-			nb_used = (uint16_t)(tx_pkt->nb_segs + nb_ctx);
-		tx_last = (uint16_t)(tx_id + nb_used - 1);
-
-		/* Circular ring */
-		if (tx_last >= txq->nb_tx_desc)
-			tx_last = (uint16_t)(tx_last - txq->nb_tx_desc);
-
-		if (nb_used > txq->nb_tx_free) {
-			if (ci_tx_xmit_cleanup(txq) != 0) {
-				if (nb_tx == 0)
-					return 0;
-				goto end_of_tx;
-			}
-			if (unlikely(nb_used > txq->tx_rs_thresh)) {
-				while (nb_used > txq->nb_tx_free) {
-					if (ci_tx_xmit_cleanup(txq) != 0) {
-						if (nb_tx == 0)
-							return 0;
-						goto end_of_tx;
-					}
-				}
-			}
-		}
-
-		/* Descriptor based VLAN insertion */
-		if (ol_flags & (RTE_MBUF_F_TX_VLAN | RTE_MBUF_F_TX_QINQ)) {
-			td_cmd |= CI_TX_DESC_CMD_IL2TAG1;
-			td_tag = tx_pkt->vlan_tci;
-		}
-
-		/* Always enable CRC offload insertion */
-		td_cmd |= CI_TX_DESC_CMD_ICRC;
-
-		/* Enable checksum offloading */
-		if (ol_flags & CI_TX_CKSUM_OFFLOAD_MASK)
-			ci_txd_enable_checksum(ol_flags, &td_cmd,
-						 &td_offset, tx_offload);
-
-		if (nb_ctx) {
-			/* Setup TX context descriptor if required */
-			uint64_t *desc = RTE_CAST_PTR(uint64_t *, &txr[tx_id]);
-
-			txn = &sw_ring[txe->next_id];
-			RTE_MBUF_PREFETCH_TO_FREE(txn->mbuf);
-			if (txe->mbuf != NULL) {
-				rte_pktmbuf_free_seg(txe->mbuf);
-				txe->mbuf = NULL;
-			}
-
-			desc[0] = cd_qw0;
-			desc[1] = cd_qw1;
-
-			PMD_TX_LOG(DEBUG, "mbuf: %p, TCD[%u]: "
-				"qw0: %#"PRIx64"; "
-				"qw1: %#"PRIx64";",
-				tx_pkt, tx_id, cd_qw0, cd_qw1);
-
-			txe->last_id = tx_last;
-			tx_id = txe->next_id;
-			txe = txn;
-		}
-
-		m_seg = tx_pkt;
-		do {
-			txd = &txr[tx_id];
-			txn = &sw_ring[txe->next_id];
-
-			if (txe->mbuf)
-				rte_pktmbuf_free_seg(txe->mbuf);
-			txe->mbuf = m_seg;
-
-			/* Setup TX Descriptor */
-			slen = m_seg->data_len;
-			buf_dma_addr = rte_mbuf_data_iova(m_seg);
-
-			while ((ol_flags & RTE_MBUF_F_TX_TCP_SEG) &&
-				unlikely(slen > CI_MAX_DATA_PER_TXD)) {
-				txd->buffer_addr =
-					rte_cpu_to_le_64(buf_dma_addr);
-				txd->cmd_type_offset_bsz =
-					i40e_build_ctob(td_cmd,
-					td_offset, CI_MAX_DATA_PER_TXD,
-					td_tag);
-
-				buf_dma_addr += CI_MAX_DATA_PER_TXD;
-				slen -= CI_MAX_DATA_PER_TXD;
-
-				txe->last_id = tx_last;
-				tx_id = txe->next_id;
-				txe = txn;
-				txd = &txr[tx_id];
-				txn = &sw_ring[txe->next_id];
-			}
-			PMD_TX_LOG(DEBUG, "mbuf: %p, TDD[%u]: "
-				"buf_dma_addr: %#"PRIx64"; "
-				"td_cmd: %#x; "
-				"td_offset: %#x; "
-				"td_len: %u; "
-				"td_tag: %#x;",
-				tx_pkt, tx_id, buf_dma_addr,
-				td_cmd, td_offset, slen, td_tag);
-
-			txd->buffer_addr = rte_cpu_to_le_64(buf_dma_addr);
-			txd->cmd_type_offset_bsz = i40e_build_ctob(td_cmd,
-						td_offset, slen, td_tag);
-			txe->last_id = tx_last;
-			tx_id = txe->next_id;
-			txe = txn;
-			m_seg = m_seg->next;
-		} while (m_seg != NULL);
-
-		/* The last packet data descriptor needs End Of Packet (EOP) */
-		td_cmd |= CI_TX_DESC_CMD_EOP;
-		txq->nb_tx_used = (uint16_t)(txq->nb_tx_used + nb_used);
-		txq->nb_tx_free = (uint16_t)(txq->nb_tx_free - nb_used);
-
-		if (txq->nb_tx_used >= txq->tx_rs_thresh) {
-			PMD_TX_LOG(DEBUG,
-				   "Setting RS bit on TXD id="
-				   "%4u (port=%d queue=%d)",
-				   tx_last, txq->port_id, txq->queue_id);
-
-			td_cmd |= CI_TX_DESC_CMD_RS;
-
-			/* Update txq RS bit counters */
-			txq->nb_tx_used = 0;
-		}
-
-		txd->cmd_type_offset_bsz |=
-			rte_cpu_to_le_64(((uint64_t)td_cmd) << CI_TXD_QW1_CMD_S);
-	}
-
-end_of_tx:
-	PMD_TX_LOG(DEBUG, "port_id=%u queue_id=%u tx_tail=%u nb_tx=%u",
-		   (unsigned) txq->port_id, (unsigned) txq->queue_id,
-		   (unsigned) tx_id, (unsigned) nb_tx);
-
-	rte_io_wmb();
-	I40E_PCI_REG_WC_WRITE_RELAXED(txq->qtx_tail, tx_id);
-	txq->tx_tail = tx_id;
-
-	return nb_tx;
+	/* i40e does not support timestamp queues, so pass NULL for ts_fns */
+	return ci_xmit_pkts(tx_queue, tx_pkts, nb_pkts, get_context_desc, NULL);
 }
 
 static __rte_always_inline int
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 274+ messages in thread
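
[Editor's note] The TSO descriptor-count rule removed from i40e above (and kept in the common ci_calc_pkt_desc()) can be sketched standalone: each mbuf segment may need more than one data descriptor when its length exceeds the hardware's per-descriptor limit. The limit value below is a stand-in, not the real HW constant.

```c
#include <stdint.h>

#define MAX_DATA_PER_TXD 12288u /* illustrative per-descriptor limit */

/* Count data descriptors for a packet: ceil(len / limit) per segment. */
static uint16_t
calc_pkt_desc(const uint32_t *seg_lens, uint16_t nb_segs)
{
	uint16_t count = 0;

	for (uint16_t i = 0; i < nb_segs; i++)
		count += (seg_lens[i] + MAX_DATA_PER_TXD - 1) / MAX_DATA_PER_TXD;
	return count;
}
```

A 30000-byte segment needs three descriptors at this limit, so a two-segment TSO packet of 30000 + 100 bytes consumes four data descriptors, plus any context descriptors.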

* [PATCH v2 14/36] net/intel: add IPsec hooks to common Tx function
  2026-01-13 15:14 ` [PATCH v2 00/36] combine multiple Intel scalar Tx paths Bruce Richardson
                     ` (12 preceding siblings ...)
  2026-01-13 15:14   ` [PATCH v2 13/36] net/i40e: use " Bruce Richardson
@ 2026-01-13 15:14   ` Bruce Richardson
  2026-01-13 15:14   ` [PATCH v2 15/36] net/intel: support configurable VLAN tag insertion on Tx Bruce Richardson
                     ` (23 subsequent siblings)
  37 siblings, 0 replies; 274+ messages in thread
From: Bruce Richardson @ 2026-01-13 15:14 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson, Anatoly Burakov

The iavf driver supports IPsec offload on Tx, so add hooks to the
common Tx function to support it. Do so in a way that has zero
performance impact on drivers without IPsec support, by passing
compile-time NULL constants for the function pointers, which the
compiler can then optimize away.
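
[Editor's note] A minimal sketch of the zero-cost hook pattern described above (all names invented, not the driver API): when an inline burst function is called with a literal NULL ops pointer, constant propagation deletes the IPsec branch, so drivers without IPsec support pay nothing at runtime.

```c
#include <stddef.h>
#include <stdint.h>

struct ipsec_ops {
	uint16_t (*get_desc)(const void *pkt);
};

static inline uint16_t
xmit_one(const void *pkt, const struct ipsec_ops *ops)
{
	uint16_t extra = 0;

	if (ops != NULL) /* folded away when ops is a compile-time NULL */
		extra = ops->get_desc(pkt);
	return 1 + extra; /* one data descriptor plus any IPsec descriptor */
}

/* driver entry point without IPsec: compiles down to `return 1;` */
static uint16_t
xmit_no_ipsec(const void *pkt)
{
	return xmit_one(pkt, NULL);
}
```

The same inline function serves both driver variants; only the call site with a non-NULL ops struct keeps the extra branch.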

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/net/intel/common/tx_scalar_fns.h | 60 ++++++++++++++++++++++--
 drivers/net/intel/i40e/i40e_rxtx.c       |  4 +-
 drivers/net/intel/ice/ice_rxtx.c         |  4 +-
 3 files changed, 60 insertions(+), 8 deletions(-)

diff --git a/drivers/net/intel/common/tx_scalar_fns.h b/drivers/net/intel/common/tx_scalar_fns.h
index 70b22f1da0..8c0de26537 100644
--- a/drivers/net/intel/common/tx_scalar_fns.h
+++ b/drivers/net/intel/common/tx_scalar_fns.h
@@ -152,6 +152,24 @@ typedef uint16_t (*ci_get_ctx_desc_fn)(uint64_t ol_flags, const struct rte_mbuf
 		const union ci_tx_offload *tx_offload, const struct ci_tx_queue *txq,
 		uint32_t *td_offset, uint64_t *qw0, uint64_t *qw1);
 
+/* gets IPsec descriptor information and returns number of descriptors needed (0 or 1) */
+typedef uint16_t (*get_ipsec_desc_t)(const struct rte_mbuf *mbuf,
+		const struct ci_tx_queue *txq,
+		void **ipsec_metadata,
+		uint64_t *qw0,
+		uint64_t *qw1);
+/* calculates segment length for IPsec + TSO combinations */
+typedef uint16_t (*calc_ipsec_segment_len_t)(const struct rte_mbuf *mb_seg,
+		uint64_t ol_flags,
+		const void *ipsec_metadata,
+		uint16_t tlen);
+
+/** IPsec descriptor operations for drivers that support inline IPsec crypto. */
+struct ci_ipsec_ops {
+	get_ipsec_desc_t get_ipsec_desc;
+	calc_ipsec_segment_len_t calc_segment_len;
+};
+
 /* gets current timestamp tail index */
 typedef uint16_t (*get_ts_tail_t)(struct ci_tx_queue *txq);
 /* writes a timestamp descriptor and returns new tail index */
@@ -171,6 +189,7 @@ ci_xmit_pkts(struct ci_tx_queue *txq,
 	     struct rte_mbuf **tx_pkts,
 	     uint16_t nb_pkts,
 	     ci_get_ctx_desc_fn get_ctx_desc,
+	     const struct ci_ipsec_ops *ipsec_ops,
 	     const struct ci_timesstamp_queue_fns *ts_fns)
 {
 	volatile struct ci_tx_desc *ci_tx_ring;
@@ -206,6 +225,9 @@ ci_xmit_pkts(struct ci_tx_queue *txq,
 		(void)ci_tx_xmit_cleanup(txq);
 
 	for (nb_tx = 0; nb_tx < nb_pkts; nb_tx++) {
+		void *ipsec_md = NULL;
+		uint16_t nb_ipsec = 0;
+		uint64_t ipsec_qw0 = 0, ipsec_qw1 = 0;
 		uint64_t cd_qw0, cd_qw1;
 		tx_pkt = *tx_pkts++;
 
@@ -225,17 +247,22 @@ ci_xmit_pkts(struct ci_tx_queue *txq,
 		nb_ctx = get_ctx_desc(ol_flags, tx_pkt, &tx_offload,
 			txq, &td_offset, &cd_qw0, &cd_qw1);
 
+		/* Get IPsec descriptor information if IPsec ops provided */
+		if (ipsec_ops != NULL)
+			nb_ipsec = ipsec_ops->get_ipsec_desc(tx_pkt, txq, &ipsec_md,
+					&ipsec_qw0, &ipsec_qw1);
+
 		/* The number of descriptors that must be allocated for
 		 * a packet equals to the number of the segments of that
-		 * packet plus the number of context descriptor if needed.
+		 * packet plus the number of context and IPsec descriptors if needed.
 		 * Recalculate the needed tx descs when TSO enabled in case
 		 * the mbuf data size exceeds max data size that hw allows
 		 * per tx desc.
 		 */
 		if (ol_flags & (RTE_MBUF_F_TX_TCP_SEG | RTE_MBUF_F_TX_UDP_SEG))
-			nb_used = (uint16_t)(ci_calc_pkt_desc(tx_pkt) + nb_ctx);
+			nb_used = (uint16_t)(ci_calc_pkt_desc(tx_pkt) + nb_ctx + nb_ipsec);
 		else
-			nb_used = (uint16_t)(tx_pkt->nb_segs + nb_ctx);
+			nb_used = (uint16_t)(tx_pkt->nb_segs + nb_ctx + nb_ipsec);
 		tx_last = (uint16_t)(tx_id + nb_used - 1);
 
 		/* Circular ring */
@@ -288,6 +315,26 @@ ci_xmit_pkts(struct ci_tx_queue *txq,
 			tx_id = txe->next_id;
 			txe = txn;
 		}
+
+		if (ipsec_ops != NULL && nb_ipsec > 0) {
+			/* Setup TX IPsec descriptor if required */
+			uint64_t *ipsec_txd = RTE_CAST_PTR(uint64_t *, &ci_tx_ring[tx_id]);
+
+			txn = &sw_ring[txe->next_id];
+			RTE_MBUF_PREFETCH_TO_FREE(txn->mbuf);
+			if (txe->mbuf) {
+				rte_pktmbuf_free_seg(txe->mbuf);
+				txe->mbuf = NULL;
+			}
+
+			ipsec_txd[0] = ipsec_qw0;
+			ipsec_txd[1] = ipsec_qw1;
+
+			txe->last_id = tx_last;
+			tx_id = txe->next_id;
+			txe = txn;
+		}
+
 		m_seg = tx_pkt;
 
 		do {
@@ -299,7 +346,12 @@ ci_xmit_pkts(struct ci_tx_queue *txq,
 			txe->mbuf = m_seg;
 
 			/* Setup TX Descriptor */
-			slen = m_seg->data_len;
+			/* Calculate segment length, using IPsec callback if provided */
+			if (ipsec_ops != NULL)
+				slen = ipsec_ops->calc_segment_len(m_seg, ol_flags, ipsec_md, 0);
+			else
+				slen = m_seg->data_len;
+
 			buf_dma_addr = rte_mbuf_data_iova(m_seg);
 
 			while ((ol_flags & (RTE_MBUF_F_TX_TCP_SEG | RTE_MBUF_F_TX_UDP_SEG)) &&
diff --git a/drivers/net/intel/i40e/i40e_rxtx.c b/drivers/net/intel/i40e/i40e_rxtx.c
index 08a74fb1d4..6f3b22db02 100644
--- a/drivers/net/intel/i40e/i40e_rxtx.c
+++ b/drivers/net/intel/i40e/i40e_rxtx.c
@@ -1020,8 +1020,8 @@ get_context_desc(uint64_t ol_flags, const struct rte_mbuf *tx_pkt,
 uint16_t
 i40e_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 {
-	/* i40e does not support timestamp queues, so pass NULL for ts_fns */
-	return ci_xmit_pkts(tx_queue, tx_pkts, nb_pkts, get_context_desc, NULL);
+	/* i40e does not support IPsec or timestamp queues, so pass NULL for both */
+	return ci_xmit_pkts(tx_queue, tx_pkts, nb_pkts, get_context_desc, NULL, NULL);
 }
 
 static __rte_always_inline int
diff --git a/drivers/net/intel/ice/ice_rxtx.c b/drivers/net/intel/ice/ice_rxtx.c
index 65e2d401e8..760a994165 100644
--- a/drivers/net/intel/ice/ice_rxtx.c
+++ b/drivers/net/intel/ice/ice_rxtx.c
@@ -3095,9 +3095,9 @@ ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 	struct ci_tx_queue *txq = (struct ci_tx_queue *)tx_queue;
 
 	if (txq->tsq != NULL && txq->tsq->ts_flag > 0)
-		return ci_xmit_pkts(txq, tx_pkts, nb_pkts, get_context_desc, &ts_fns);
+		return ci_xmit_pkts(txq, tx_pkts, nb_pkts, get_context_desc, NULL, &ts_fns);
 
-	return ci_xmit_pkts(txq, tx_pkts, nb_pkts, get_context_desc, NULL);
+	return ci_xmit_pkts(txq, tx_pkts, nb_pkts, get_context_desc, NULL, NULL);
 }
 
 static __rte_always_inline int
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [PATCH v2 15/36] net/intel: support configurable VLAN tag insertion on Tx
  2026-01-13 15:14 ` [PATCH v2 00/36] combine multiple Intel scalar Tx paths Bruce Richardson
                     ` (13 preceding siblings ...)
  2026-01-13 15:14   ` [PATCH v2 14/36] net/intel: add IPsec hooks to common " Bruce Richardson
@ 2026-01-13 15:14   ` Bruce Richardson
  2026-01-13 15:14   ` [PATCH v2 16/36] net/iavf: use common scalar Tx function Bruce Richardson
                     ` (22 subsequent siblings)
  37 siblings, 0 replies; 274+ messages in thread
From: Bruce Richardson @ 2026-01-13 15:14 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson, Anatoly Burakov

Make the VLAN tag insertion logic in the common code configurable, so
each driver can specify where the inner and outer tags are placed.
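
[Editor's note] The tag-placement decision this patch adds can be approximated standalone (flag values below are stand-ins for the RTE_MBUF_F_TX_* bits): a single VLAN tag goes in the data descriptor's L2TAG1 field only when the driver selects that location, while QinQ always puts the inner tag there.

```c
#include <stdint.h>

enum l2tag1_field { VLAN_IN_L2TAG1, VLAN_IN_L2TAG2 };

#define F_TX_VLAN (1ULL << 0) /* stand-in for RTE_MBUF_F_TX_VLAN */
#define F_TX_QINQ (1ULL << 1) /* stand-in for RTE_MBUF_F_TX_QINQ */

/* Return nonzero when the (inner) VLAN tag belongs in the data
 * descriptor's L2TAG1 field for this packet and driver selection. */
static int
tag_in_data_desc(uint64_t ol_flags, enum l2tag1_field sel)
{
	return ((ol_flags & F_TX_VLAN) && sel == VLAN_IN_L2TAG1) ||
			(ol_flags & F_TX_QINQ);
}
```

Drivers that carry the single VLAN tag in the context descriptor (L2TAG2) instead pass the other enum value, and only QinQ packets then use L2TAG1.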

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/net/intel/common/tx.h            | 10 ++++++++++
 drivers/net/intel/common/tx_scalar_fns.h |  9 +++++++--
 drivers/net/intel/i40e/i40e_rxtx.c       |  6 +++---
 drivers/net/intel/ice/ice_rxtx.c         |  5 +++--
 4 files changed, 23 insertions(+), 7 deletions(-)

diff --git a/drivers/net/intel/common/tx.h b/drivers/net/intel/common/tx.h
index a71b98f119..0d11daaab3 100644
--- a/drivers/net/intel/common/tx.h
+++ b/drivers/net/intel/common/tx.h
@@ -45,6 +45,16 @@
 #define CI_TX_CTX_DESC_TSYN             0x02
 #define CI_TX_CTX_DESC_IL2TAG2          0x04
 
+/**
+ * L2TAG1 Field Source Selection
+ * Specifies which mbuf VLAN field to use for the L2TAG1 field in data descriptors.
+ * Context descriptor VLAN handling (L2TAG2) is managed by driver-specific callbacks.
+ */
+enum ci_tx_l2tag1_field {
+	CI_VLAN_IN_L2TAG1,       /**< For VLAN (not QinQ), use L2Tag1 field in data desc */
+	CI_VLAN_IN_L2TAG2,       /**< For VLAN (not QinQ), use L2Tag2 field in ctx desc */
+};
+
 /* Common TX Descriptor Length Field Shifts */
 #define CI_TX_DESC_LEN_MACLEN_S         0  /* 7 BITS */
 #define CI_TX_DESC_LEN_IPLEN_S          7  /* 7 BITS */
diff --git a/drivers/net/intel/common/tx_scalar_fns.h b/drivers/net/intel/common/tx_scalar_fns.h
index 8c0de26537..0dfe2060c0 100644
--- a/drivers/net/intel/common/tx_scalar_fns.h
+++ b/drivers/net/intel/common/tx_scalar_fns.h
@@ -188,6 +188,7 @@ static inline uint16_t
 ci_xmit_pkts(struct ci_tx_queue *txq,
 	     struct rte_mbuf **tx_pkts,
 	     uint16_t nb_pkts,
+	     enum ci_tx_l2tag1_field l2tag1_field,
 	     ci_get_ctx_desc_fn get_ctx_desc,
 	     const struct ci_ipsec_ops *ipsec_ops,
 	     const struct ci_timesstamp_queue_fns *ts_fns)
@@ -286,8 +287,12 @@ ci_xmit_pkts(struct ci_tx_queue *txq,
 			}
 		}
 
-		/* Descriptor based VLAN insertion */
-		if (ol_flags & (RTE_MBUF_F_TX_VLAN | RTE_MBUF_F_TX_QINQ)) {
+		/* Descriptor based VLAN/QinQ insertion.
+		 * For single VLAN offload, insert the tag in L2TAG1 only when
+		 * CI_VLAN_IN_L2TAG1 is selected; for QinQ, the inner tag always goes in L2TAG1.
+		 */
+		if (((ol_flags & RTE_MBUF_F_TX_VLAN) && l2tag1_field == CI_VLAN_IN_L2TAG1) ||
+				(ol_flags & RTE_MBUF_F_TX_QINQ)) {
 			td_cmd |= CI_TX_DESC_CMD_IL2TAG1;
 			td_tag = tx_pkt->vlan_tci;
 		}
diff --git a/drivers/net/intel/i40e/i40e_rxtx.c b/drivers/net/intel/i40e/i40e_rxtx.c
index 6f3b22db02..f0047e4f59 100644
--- a/drivers/net/intel/i40e/i40e_rxtx.c
+++ b/drivers/net/intel/i40e/i40e_rxtx.c
@@ -1006,8 +1006,7 @@ get_context_desc(uint64_t ol_flags, const struct rte_mbuf *tx_pkt,
 	/* TX context descriptor based double VLAN insert */
 	if (ol_flags & RTE_MBUF_F_TX_QINQ) {
 		cd_l2tag2 = tx_pkt->vlan_tci_outer;
-		cd_type_cmd_tso_mss |=
-				((uint64_t)I40E_TX_CTX_DESC_IL2TAG2 << I40E_TXD_CTX_QW1_CMD_SHIFT);
+		cd_type_cmd_tso_mss |= (I40E_TX_CTX_DESC_IL2TAG2 << I40E_TXD_CTX_QW1_CMD_SHIFT);
 	}
 
 	*qw0 = rte_cpu_to_le_32(cd_tunneling_params) |
@@ -1021,7 +1020,8 @@ uint16_t
 i40e_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 {
 	/* i40e does not support IPsec or timestamp queues, so pass NULL for both */
-	return ci_xmit_pkts(tx_queue, tx_pkts, nb_pkts, get_context_desc, NULL, NULL);
+	return ci_xmit_pkts(tx_queue, tx_pkts, nb_pkts, CI_VLAN_IN_L2TAG1,
+			get_context_desc, NULL, NULL);
 }
 
 static __rte_always_inline int
diff --git a/drivers/net/intel/ice/ice_rxtx.c b/drivers/net/intel/ice/ice_rxtx.c
index 760a994165..c67d6223b3 100644
--- a/drivers/net/intel/ice/ice_rxtx.c
+++ b/drivers/net/intel/ice/ice_rxtx.c
@@ -3095,9 +3095,10 @@ ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 	struct ci_tx_queue *txq = (struct ci_tx_queue *)tx_queue;
 
 	if (txq->tsq != NULL && txq->tsq->ts_flag > 0)
-		return ci_xmit_pkts(txq, tx_pkts, nb_pkts, get_context_desc, NULL, &ts_fns);
+		return ci_xmit_pkts(txq, tx_pkts, nb_pkts, CI_VLAN_IN_L2TAG1,
+				get_context_desc, NULL, &ts_fns);
 
-	return ci_xmit_pkts(txq, tx_pkts, nb_pkts, get_context_desc, NULL, NULL);
+	return ci_xmit_pkts(txq, tx_pkts, nb_pkts, CI_VLAN_IN_L2TAG1, get_context_desc, NULL, NULL);
 }
 
 static __rte_always_inline int
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [PATCH v2 16/36] net/iavf: use common scalar Tx function
  2026-01-13 15:14 ` [PATCH v2 00/36] combine multiple Intel scalar Tx paths Bruce Richardson
                     ` (14 preceding siblings ...)
  2026-01-13 15:14   ` [PATCH v2 15/36] net/intel: support configurable VLAN tag insertion on Tx Bruce Richardson
@ 2026-01-13 15:14   ` Bruce Richardson
  2026-01-13 15:14   ` [PATCH v2 17/36] net/i40e: document requirement for QinQ support Bruce Richardson
                     ` (21 subsequent siblings)
  37 siblings, 0 replies; 274+ messages in thread
From: Bruce Richardson @ 2026-01-13 15:14 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson, Vladimir Medvedkin

Now that the common scalar Tx function has all necessary hooks for the
features supported by the iavf driver, use the common function to avoid
duplicated code.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/net/intel/iavf/iavf_rxtx.c | 534 ++++++-----------------------
 1 file changed, 109 insertions(+), 425 deletions(-)

diff --git a/drivers/net/intel/iavf/iavf_rxtx.c b/drivers/net/intel/iavf/iavf_rxtx.c
index 75f2e143f9..8810d5bb63 100644
--- a/drivers/net/intel/iavf/iavf_rxtx.c
+++ b/drivers/net/intel/iavf/iavf_rxtx.c
@@ -2331,7 +2331,7 @@ iavf_recv_pkts_bulk_alloc(void *rx_queue,
 
 /* Check if the context descriptor is needed for TX offloading */
 static inline uint16_t
-iavf_calc_context_desc(struct rte_mbuf *mb, uint8_t vlan_flag)
+iavf_calc_context_desc(const struct rte_mbuf *mb, uint8_t vlan_flag)
 {
 	uint64_t flags = mb->ol_flags;
 	if (flags & (RTE_MBUF_F_TX_TCP_SEG | RTE_MBUF_F_TX_UDP_SEG |
@@ -2349,44 +2349,7 @@ iavf_calc_context_desc(struct rte_mbuf *mb, uint8_t vlan_flag)
 }
 
 static inline void
-iavf_fill_ctx_desc_cmd_field(volatile uint64_t *field, struct rte_mbuf *m,
-		uint8_t vlan_flag)
-{
-	uint64_t cmd = 0;
-
-	/* TSO enabled */
-	if (m->ol_flags & (RTE_MBUF_F_TX_TCP_SEG | RTE_MBUF_F_TX_UDP_SEG))
-		cmd = CI_TX_CTX_DESC_TSO << IAVF_TXD_CTX_QW1_CMD_SHIFT;
-
-	if ((m->ol_flags & RTE_MBUF_F_TX_VLAN &&
-			vlan_flag & IAVF_TX_FLAGS_VLAN_TAG_LOC_L2TAG2) ||
-			m->ol_flags & RTE_MBUF_F_TX_QINQ) {
-		cmd |= CI_TX_CTX_DESC_IL2TAG2
-			<< IAVF_TXD_CTX_QW1_CMD_SHIFT;
-	}
-
-	if (IAVF_CHECK_TX_LLDP(m))
-		cmd |= IAVF_TX_CTX_DESC_SWTCH_UPLINK
-			<< IAVF_TXD_CTX_QW1_CMD_SHIFT;
-
-	*field |= cmd;
-}
-
-static inline void
-iavf_fill_ctx_desc_ipsec_field(volatile uint64_t *field,
-	struct iavf_ipsec_crypto_pkt_metadata *ipsec_md)
-{
-	uint64_t ipsec_field =
-		(uint64_t)ipsec_md->ctx_desc_ipsec_params <<
-			IAVF_TXD_CTX_QW1_IPSEC_PARAMS_CIPHERBLK_SHIFT;
-
-	*field |= ipsec_field;
-}
-
-
-static inline void
-iavf_fill_ctx_desc_tunnelling_field(volatile uint64_t *qw0,
-		const struct rte_mbuf *m)
+iavf_fill_ctx_desc_tunnelling_field(uint64_t *qw0, const struct rte_mbuf *m)
 {
 	uint64_t eip_typ = IAVF_TX_CTX_DESC_EIPT_NONE;
 	uint64_t eip_len = 0;
@@ -2461,7 +2424,7 @@ iavf_fill_ctx_desc_tunnelling_field(volatile uint64_t *qw0,
 
 static inline uint16_t
 iavf_fill_ctx_desc_segmentation_field(volatile uint64_t *field,
-	struct rte_mbuf *m, struct iavf_ipsec_crypto_pkt_metadata *ipsec_md)
+	const struct rte_mbuf *m, struct iavf_ipsec_crypto_pkt_metadata *ipsec_md)
 {
 	uint64_t segmentation_field = 0;
 	uint64_t total_length = 0;
@@ -2500,59 +2463,31 @@ struct iavf_tx_context_desc_qws {
 	__le64 qw1;
 };
 
-static inline void
-iavf_fill_context_desc(volatile struct iavf_tx_context_desc *desc,
-	struct rte_mbuf *m, struct iavf_ipsec_crypto_pkt_metadata *ipsec_md,
-	uint16_t *tlen, uint8_t vlan_flag)
+/* IPsec callback for ci_xmit_pkts - gets IPsec descriptor information */
+static uint16_t
+iavf_get_ipsec_desc(const struct rte_mbuf *mbuf, const struct ci_tx_queue *txq,
+		    void **ipsec_metadata, uint64_t *qw0, uint64_t *qw1)
 {
-	volatile struct iavf_tx_context_desc_qws *desc_qws =
-			(volatile struct iavf_tx_context_desc_qws *)desc;
-	/* fill descriptor type field */
-	desc_qws->qw1 = IAVF_TX_DESC_DTYPE_CONTEXT;
-
-	/* fill command field */
-	iavf_fill_ctx_desc_cmd_field(&desc_qws->qw1, m, vlan_flag);
-
-	/* fill segmentation field */
-	if (m->ol_flags & (RTE_MBUF_F_TX_TCP_SEG | RTE_MBUF_F_TX_UDP_SEG)) {
-		/* fill IPsec field */
-		if (m->ol_flags & RTE_MBUF_F_TX_SEC_OFFLOAD)
-			iavf_fill_ctx_desc_ipsec_field(&desc_qws->qw1,
-				ipsec_md);
-
-		*tlen = iavf_fill_ctx_desc_segmentation_field(&desc_qws->qw1,
-				m, ipsec_md);
-	}
-
-	/* fill tunnelling field */
-	if (m->ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK)
-		iavf_fill_ctx_desc_tunnelling_field(&desc_qws->qw0, m);
-	else
-		desc_qws->qw0 = 0;
+	struct iavf_ipsec_crypto_pkt_metadata *md;
 
-	desc_qws->qw0 = rte_cpu_to_le_64(desc_qws->qw0);
-	desc_qws->qw1 = rte_cpu_to_le_64(desc_qws->qw1);
+	if (!(mbuf->ol_flags & RTE_MBUF_F_TX_SEC_OFFLOAD))
+		return 0;
 
-	/* vlan_flag specifies VLAN tag location for VLAN, and outer tag location for QinQ. */
-	if (m->ol_flags & RTE_MBUF_F_TX_QINQ)
-		desc->l2tag2 = vlan_flag & IAVF_TX_FLAGS_VLAN_TAG_LOC_L2TAG2 ? m->vlan_tci_outer :
-						m->vlan_tci;
-	else if (m->ol_flags & RTE_MBUF_F_TX_VLAN && vlan_flag & IAVF_TX_FLAGS_VLAN_TAG_LOC_L2TAG2)
-		desc->l2tag2 = m->vlan_tci;
-}
+	md = RTE_MBUF_DYNFIELD(mbuf, txq->ipsec_crypto_pkt_md_offset,
+				     struct iavf_ipsec_crypto_pkt_metadata *);
+	if (!md)
+		return 0;
 
+	*ipsec_metadata = md;
 
-static inline void
-iavf_fill_ipsec_desc(volatile struct iavf_tx_ipsec_desc *desc,
-	const struct iavf_ipsec_crypto_pkt_metadata *md, uint16_t *ipsec_len)
-{
-	desc->qw0 = rte_cpu_to_le_64(((uint64_t)md->l4_payload_len <<
+	/* Fill IPsec descriptor using existing logic */
+	*qw0 = rte_cpu_to_le_64(((uint64_t)md->l4_payload_len <<
 		IAVF_IPSEC_TX_DESC_QW0_L4PAYLEN_SHIFT) |
 		((uint64_t)md->esn << IAVF_IPSEC_TX_DESC_QW0_IPSECESN_SHIFT) |
 		((uint64_t)md->esp_trailer_len <<
 				IAVF_IPSEC_TX_DESC_QW0_TRAILERLEN_SHIFT));
 
-	desc->qw1 = rte_cpu_to_le_64(((uint64_t)md->sa_idx <<
+	*qw1 = rte_cpu_to_le_64(((uint64_t)md->sa_idx <<
 		IAVF_IPSEC_TX_DESC_QW1_IPSECSA_SHIFT) |
 		((uint64_t)md->next_proto <<
 				IAVF_IPSEC_TX_DESC_QW1_IPSECNH_SHIFT) |
@@ -2561,143 +2496,106 @@ iavf_fill_ipsec_desc(volatile struct iavf_tx_ipsec_desc *desc,
 		((uint64_t)(md->ol_flags & IAVF_IPSEC_CRYPTO_OL_FLAGS_NATT ?
 				1ULL : 0ULL) <<
 				IAVF_IPSEC_TX_DESC_QW1_UDP_SHIFT) |
-		(uint64_t)IAVF_TX_DESC_DTYPE_IPSEC);
+		((uint64_t)IAVF_TX_DESC_DTYPE_IPSEC <<
+				CI_TXD_QW1_DTYPE_S));
 
-	/**
-	 * TODO: Pre-calculate this in the Session initialization
-	 *
-	 * Calculate IPsec length required in data descriptor func when TSO
-	 * offload is enabled
-	 */
-	*ipsec_len = sizeof(struct rte_esp_hdr) + (md->len_iv >> 2) +
-			(md->ol_flags & IAVF_IPSEC_CRYPTO_OL_FLAGS_NATT ?
-			sizeof(struct rte_udp_hdr) : 0);
+	return 1; /* One IPsec descriptor needed */
 }
 
-static inline void
-iavf_build_data_desc_cmd_offset_fields(volatile uint64_t *qw1,
-		struct rte_mbuf *m, uint8_t vlan_flag)
+/* IPsec callback for ci_xmit_pkts - calculates segment length for IPsec+TSO */
+static uint16_t
+iavf_calc_ipsec_segment_len(const struct rte_mbuf *mb_seg, uint64_t ol_flags,
+			    const void *ipsec_metadata, uint16_t tlen)
 {
-	uint64_t command = 0;
-	uint64_t offset = 0;
-	uint64_t l2tag1 = 0;
-
-	*qw1 = CI_TX_DESC_DTYPE_DATA;
-
-	command = (uint64_t)CI_TX_DESC_CMD_ICRC;
-
-	/* Descriptor based VLAN insertion */
-	if ((vlan_flag & IAVF_TX_FLAGS_VLAN_TAG_LOC_L2TAG1) &&
-			m->ol_flags & RTE_MBUF_F_TX_VLAN) {
-		command |= (uint64_t)CI_TX_DESC_CMD_IL2TAG1;
-		l2tag1 |= m->vlan_tci;
-	}
-
-	/* Descriptor based QinQ insertion. vlan_flag specifies outer tag location. */
-	if (m->ol_flags & RTE_MBUF_F_TX_QINQ) {
-		command |= (uint64_t)CI_TX_DESC_CMD_IL2TAG1;
-		l2tag1 = vlan_flag & IAVF_TX_FLAGS_VLAN_TAG_LOC_L2TAG1 ? m->vlan_tci_outer :
-									m->vlan_tci;
+	const struct iavf_ipsec_crypto_pkt_metadata *ipsec_md = ipsec_metadata;
+
+	if ((ol_flags & RTE_MBUF_F_TX_SEC_OFFLOAD) &&
+	    (ol_flags & (RTE_MBUF_F_TX_TCP_SEG | RTE_MBUF_F_TX_UDP_SEG))) {
+		uint16_t ipseclen = ipsec_md ? (ipsec_md->esp_trailer_len +
+						ipsec_md->len_iv) : 0;
+		uint16_t slen = tlen + mb_seg->l2_len + mb_seg->l3_len +
+				mb_seg->outer_l3_len + ipseclen;
+		if (ol_flags & RTE_MBUF_F_TX_L4_MASK)
+			slen += mb_seg->l4_len;
+		return slen;
 	}
 
-	if ((m->ol_flags &
-	    (CI_TX_CKSUM_OFFLOAD_MASK | RTE_MBUF_F_TX_SEC_OFFLOAD)) == 0)
-		goto skip_cksum;
+	return mb_seg->data_len;
+}
 
-	/* Set MACLEN */
-	if (m->ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK &&
-			!(m->ol_flags & RTE_MBUF_F_TX_SEC_OFFLOAD))
-		offset |= (m->outer_l2_len >> 1)
-			<< CI_TX_DESC_LEN_MACLEN_S;
-	else
-		offset |= (m->l2_len >> 1)
-			<< CI_TX_DESC_LEN_MACLEN_S;
+/* Context descriptor callback for ci_xmit_pkts */
+static uint16_t
+iavf_get_context_desc(uint64_t ol_flags, const struct rte_mbuf *mbuf,
+		      const union ci_tx_offload *tx_offload __rte_unused,
+		      const struct ci_tx_queue *txq,
+		      uint32_t *td_offset __rte_unused, uint64_t *qw0, uint64_t *qw1)
+{
+	uint8_t iavf_vlan_flag;
+	uint16_t cd_l2tag2 = 0;
+	uint64_t cd_type_cmd = IAVF_TX_DESC_DTYPE_CONTEXT;
+	uint64_t cd_tunneling_params = 0;
+	uint16_t tlen = 0;
+	struct iavf_ipsec_crypto_pkt_metadata *ipsec_md = NULL;
+
+	/* Use IAVF-specific vlan_flag from txq */
+	iavf_vlan_flag = txq->vlan_flag;
+
+	/* Check if context descriptor is needed using existing IAVF logic */
+	if (!iavf_calc_context_desc(mbuf, iavf_vlan_flag))
+		return 0;
 
-	/* Enable L3 checksum offloading inner */
-	if (m->ol_flags & RTE_MBUF_F_TX_IP_CKSUM) {
-		if (m->ol_flags & RTE_MBUF_F_TX_IPV4) {
-			command |= CI_TX_DESC_CMD_IIPT_IPV4_CSUM;
-			offset |= (m->l3_len >> 2) << CI_TX_DESC_LEN_IPLEN_S;
-		}
-	} else if (m->ol_flags & RTE_MBUF_F_TX_IPV4) {
-		command |= CI_TX_DESC_CMD_IIPT_IPV4;
-		offset |= (m->l3_len >> 2) << CI_TX_DESC_LEN_IPLEN_S;
-	} else if (m->ol_flags & RTE_MBUF_F_TX_IPV6) {
-		command |= CI_TX_DESC_CMD_IIPT_IPV6;
-		offset |= (m->l3_len >> 2) << CI_TX_DESC_LEN_IPLEN_S;
+	/* Get IPsec metadata if needed */
+	if (ol_flags & RTE_MBUF_F_TX_SEC_OFFLOAD) {
+		ipsec_md = RTE_MBUF_DYNFIELD(mbuf, txq->ipsec_crypto_pkt_md_offset,
+					     struct iavf_ipsec_crypto_pkt_metadata *);
 	}
 
-	if (m->ol_flags & (RTE_MBUF_F_TX_TCP_SEG | RTE_MBUF_F_TX_UDP_SEG)) {
-		if (m->ol_flags & RTE_MBUF_F_TX_TCP_SEG)
-			command |= CI_TX_DESC_CMD_L4T_EOFT_TCP;
-		else
-			command |= CI_TX_DESC_CMD_L4T_EOFT_UDP;
-		offset |= (m->l4_len >> 2) <<
-			      CI_TX_DESC_LEN_L4_LEN_S;
+	/* TSO command field */
+	if (ol_flags & (RTE_MBUF_F_TX_TCP_SEG | RTE_MBUF_F_TX_UDP_SEG)) {
+		cd_type_cmd |= (uint64_t)CI_TX_CTX_DESC_TSO << IAVF_TXD_CTX_QW1_CMD_SHIFT;
 
-		*qw1 = rte_cpu_to_le_64((((uint64_t)command <<
-			IAVF_TXD_DATA_QW1_CMD_SHIFT) & IAVF_TXD_DATA_QW1_CMD_MASK) |
-			(((uint64_t)offset << IAVF_TXD_DATA_QW1_OFFSET_SHIFT) &
-			IAVF_TXD_DATA_QW1_OFFSET_MASK) |
-			((uint64_t)l2tag1 << IAVF_TXD_DATA_QW1_L2TAG1_SHIFT));
+		/* IPsec field for TSO */
+		if (ol_flags & RTE_MBUF_F_TX_SEC_OFFLOAD && ipsec_md) {
+			uint64_t ipsec_field = (uint64_t)ipsec_md->ctx_desc_ipsec_params <<
+				IAVF_TXD_CTX_QW1_IPSEC_PARAMS_CIPHERBLK_SHIFT;
+			cd_type_cmd |= ipsec_field;
+		}
 
-		return;
+		/* TSO segmentation field */
+		tlen = iavf_fill_ctx_desc_segmentation_field(&cd_type_cmd,
+							     mbuf, ipsec_md);
+		(void)tlen; /* Suppress unused variable warning */
 	}
 
-	/* Enable L4 checksum offloads */
-	switch (m->ol_flags & RTE_MBUF_F_TX_L4_MASK) {
-	case RTE_MBUF_F_TX_TCP_CKSUM:
-		command |= CI_TX_DESC_CMD_L4T_EOFT_TCP;
-		offset |= (sizeof(struct rte_tcp_hdr) >> 2) <<
-				CI_TX_DESC_LEN_L4_LEN_S;
-		break;
-	case RTE_MBUF_F_TX_SCTP_CKSUM:
-		command |= CI_TX_DESC_CMD_L4T_EOFT_SCTP;
-		offset |= (sizeof(struct rte_sctp_hdr) >> 2) <<
-				CI_TX_DESC_LEN_L4_LEN_S;
-		break;
-	case RTE_MBUF_F_TX_UDP_CKSUM:
-		command |= CI_TX_DESC_CMD_L4T_EOFT_UDP;
-		offset |= (sizeof(struct rte_udp_hdr) >> 2) <<
-				CI_TX_DESC_LEN_L4_LEN_S;
-		break;
+	/* VLAN field for L2TAG2 */
+	if ((ol_flags & RTE_MBUF_F_TX_VLAN &&
+	     iavf_vlan_flag & IAVF_TX_FLAGS_VLAN_TAG_LOC_L2TAG2) ||
+	    ol_flags & RTE_MBUF_F_TX_QINQ) {
+		cd_type_cmd |= (uint64_t)CI_TX_CTX_DESC_IL2TAG2 << IAVF_TXD_CTX_QW1_CMD_SHIFT;
 	}
 
-skip_cksum:
-	*qw1 = rte_cpu_to_le_64((((uint64_t)command <<
-		IAVF_TXD_DATA_QW1_CMD_SHIFT) & IAVF_TXD_DATA_QW1_CMD_MASK) |
-		(((uint64_t)offset << IAVF_TXD_DATA_QW1_OFFSET_SHIFT) &
-		IAVF_TXD_DATA_QW1_OFFSET_MASK) |
-		((uint64_t)l2tag1 << IAVF_TXD_DATA_QW1_L2TAG1_SHIFT));
-}
-
-static inline void
-iavf_fill_data_desc(volatile struct ci_tx_desc *desc,
-	uint64_t desc_template,	uint16_t buffsz,
-	uint64_t buffer_addr)
-{
-	/* fill data descriptor qw1 from template */
-	desc->cmd_type_offset_bsz = desc_template;
-
-	/* set data buffer size */
-	desc->cmd_type_offset_bsz |=
-		(((uint64_t)buffsz << IAVF_TXD_DATA_QW1_TX_BUF_SZ_SHIFT) &
-		IAVF_TXD_DATA_QW1_TX_BUF_SZ_MASK);
-
-	desc->buffer_addr = rte_cpu_to_le_64(buffer_addr);
-	desc->cmd_type_offset_bsz = rte_cpu_to_le_64(desc->cmd_type_offset_bsz);
-}
-
+	/* LLDP switching field */
+	if (IAVF_CHECK_TX_LLDP(mbuf))
+		cd_type_cmd |= IAVF_TX_CTX_DESC_SWTCH_UPLINK << IAVF_TXD_CTX_QW1_CMD_SHIFT;
+
+	/* Tunneling field */
+	if (ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK)
+		iavf_fill_ctx_desc_tunnelling_field((uint64_t *)&cd_tunneling_params, mbuf);
+
+	/* L2TAG2 field (VLAN) */
+	if (ol_flags & RTE_MBUF_F_TX_QINQ) {
+		cd_l2tag2 = iavf_vlan_flag & IAVF_TX_FLAGS_VLAN_TAG_LOC_L2TAG2 ?
+			    mbuf->vlan_tci_outer : mbuf->vlan_tci;
+	} else if (ol_flags & RTE_MBUF_F_TX_VLAN &&
+		   iavf_vlan_flag & IAVF_TX_FLAGS_VLAN_TAG_LOC_L2TAG2) {
+		cd_l2tag2 = mbuf->vlan_tci;
+	}
 
-static struct iavf_ipsec_crypto_pkt_metadata *
-iavf_ipsec_crypto_get_pkt_metadata(const struct ci_tx_queue *txq,
-		struct rte_mbuf *m)
-{
-	if (m->ol_flags & RTE_MBUF_F_TX_SEC_OFFLOAD)
-		return RTE_MBUF_DYNFIELD(m, txq->ipsec_crypto_pkt_md_offset,
-				struct iavf_ipsec_crypto_pkt_metadata *);
+	/* Set outputs */
+	*qw0 = rte_cpu_to_le_64(cd_tunneling_params | ((uint64_t)cd_l2tag2 << 32));
+	*qw1 = rte_cpu_to_le_64(cd_type_cmd);
 
-	return NULL;
+	return 1; /* One context descriptor needed */
 }
 
 /* TX function */
@@ -2705,231 +2603,17 @@ uint16_t
 iavf_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 {
 	struct ci_tx_queue *txq = tx_queue;
-	volatile struct ci_tx_desc *txr = txq->ci_tx_ring;
-	struct ci_tx_entry *txe_ring = txq->sw_ring;
-	struct ci_tx_entry *txe, *txn;
-	struct rte_mbuf *mb, *mb_seg;
-	uint64_t buf_dma_addr;
-	uint16_t desc_idx, desc_idx_last;
-	uint16_t idx;
-	uint16_t slen;
-
-
-	/* Check if the descriptor ring needs to be cleaned. */
-	if (txq->nb_tx_free < txq->tx_free_thresh)
-		ci_tx_xmit_cleanup(txq);
-
-	desc_idx = txq->tx_tail;
-	txe = &txe_ring[desc_idx];
-
-	for (idx = 0; idx < nb_pkts; idx++) {
-		volatile struct ci_tx_desc *ddesc;
-		struct iavf_ipsec_crypto_pkt_metadata *ipsec_md;
-
-		uint16_t nb_desc_ctx, nb_desc_ipsec;
-		uint16_t nb_desc_data, nb_desc_required;
-		uint16_t tlen = 0, ipseclen = 0;
-		uint64_t ddesc_template = 0;
-		uint64_t ddesc_cmd = 0;
-
-		mb = tx_pkts[idx];
 
-		RTE_MBUF_PREFETCH_TO_FREE(txe->mbuf);
-
-		/**
-		 * Get metadata for ipsec crypto from mbuf dynamic fields if
-		 * security offload is specified.
-		 */
-		ipsec_md = iavf_ipsec_crypto_get_pkt_metadata(txq, mb);
-
-		nb_desc_data = mb->nb_segs;
-		nb_desc_ctx =
-			iavf_calc_context_desc(mb, txq->vlan_flag);
-		nb_desc_ipsec = !!(mb->ol_flags & RTE_MBUF_F_TX_SEC_OFFLOAD);
-
-		/**
-		 * The number of descriptors that must be allocated for
-		 * a packet equals to the number of the segments of that
-		 * packet plus the context and ipsec descriptors if needed.
-		 * Recalculate the needed tx descs when TSO enabled in case
-		 * the mbuf data size exceeds max data size that hw allows
-		 * per tx desc.
-		 */
-		if (mb->ol_flags & RTE_MBUF_F_TX_TCP_SEG)
-			nb_desc_required = ci_calc_pkt_desc(mb) + nb_desc_ctx + nb_desc_ipsec;
-		else
-			nb_desc_required = nb_desc_data + nb_desc_ctx + nb_desc_ipsec;
-
-		desc_idx_last = (uint16_t)(desc_idx + nb_desc_required - 1);
-
-		/* wrap descriptor ring */
-		if (desc_idx_last >= txq->nb_tx_desc)
-			desc_idx_last =
-				(uint16_t)(desc_idx_last - txq->nb_tx_desc);
-
-		PMD_TX_LOG(DEBUG,
-			"port_id=%u queue_id=%u tx_first=%u tx_last=%u",
-			txq->port_id, txq->queue_id, desc_idx, desc_idx_last);
-
-		if (nb_desc_required > txq->nb_tx_free) {
-			if (ci_tx_xmit_cleanup(txq)) {
-				if (idx == 0)
-					return 0;
-				goto end_of_tx;
-			}
-			if (unlikely(nb_desc_required > txq->tx_rs_thresh)) {
-				while (nb_desc_required > txq->nb_tx_free) {
-					if (ci_tx_xmit_cleanup(txq)) {
-						if (idx == 0)
-							return 0;
-						goto end_of_tx;
-					}
-				}
-			}
-		}
-
-		iavf_build_data_desc_cmd_offset_fields(&ddesc_template, mb,
-			txq->vlan_flag);
-
-			/* Setup TX context descriptor if required */
-		if (nb_desc_ctx) {
-			volatile struct iavf_tx_context_desc *ctx_desc =
-				(volatile struct iavf_tx_context_desc *)
-					&txr[desc_idx];
-
-			/* clear QW0 or the previous writeback value
-			 * may impact next write
-			 */
-			*(volatile uint64_t *)ctx_desc = 0;
-
-			txn = &txe_ring[txe->next_id];
-			RTE_MBUF_PREFETCH_TO_FREE(txn->mbuf);
-
-			if (txe->mbuf) {
-				rte_pktmbuf_free_seg(txe->mbuf);
-				txe->mbuf = NULL;
-			}
-
-			iavf_fill_context_desc(ctx_desc, mb, ipsec_md, &tlen,
-				txq->vlan_flag);
-			IAVF_DUMP_TX_DESC(txq, ctx_desc, desc_idx);
-
-			txe->last_id = desc_idx_last;
-			desc_idx = txe->next_id;
-			txe = txn;
-		}
-
-		if (nb_desc_ipsec) {
-			volatile struct iavf_tx_ipsec_desc *ipsec_desc =
-				(volatile struct iavf_tx_ipsec_desc *)
-					&txr[desc_idx];
-
-			txn = &txe_ring[txe->next_id];
-			RTE_MBUF_PREFETCH_TO_FREE(txn->mbuf);
-
-			if (txe->mbuf) {
-				rte_pktmbuf_free_seg(txe->mbuf);
-				txe->mbuf = NULL;
-			}
-
-			iavf_fill_ipsec_desc(ipsec_desc, ipsec_md, &ipseclen);
-
-			IAVF_DUMP_TX_DESC(txq, ipsec_desc, desc_idx);
-
-			txe->last_id = desc_idx_last;
-			desc_idx = txe->next_id;
-			txe = txn;
-		}
-
-		mb_seg = mb;
-
-		do {
-			ddesc = (volatile struct ci_tx_desc *)
-					&txr[desc_idx];
-
-			txn = &txe_ring[txe->next_id];
-			RTE_MBUF_PREFETCH_TO_FREE(txn->mbuf);
-
-			if (txe->mbuf)
-				rte_pktmbuf_free_seg(txe->mbuf);
-
-			txe->mbuf = mb_seg;
-
-			if ((mb_seg->ol_flags & RTE_MBUF_F_TX_SEC_OFFLOAD) &&
-					(mb_seg->ol_flags &
-						(RTE_MBUF_F_TX_TCP_SEG |
-						RTE_MBUF_F_TX_UDP_SEG))) {
-				slen = tlen + mb_seg->l2_len + mb_seg->l3_len +
-						mb_seg->outer_l3_len + ipseclen;
-				if (mb_seg->ol_flags & RTE_MBUF_F_TX_L4_MASK)
-					slen += mb_seg->l4_len;
-			} else {
-				slen = mb_seg->data_len;
-			}
-
-			buf_dma_addr = rte_mbuf_data_iova(mb_seg);
-			while ((mb_seg->ol_flags & (RTE_MBUF_F_TX_TCP_SEG |
-					RTE_MBUF_F_TX_UDP_SEG)) &&
-					unlikely(slen > CI_MAX_DATA_PER_TXD)) {
-				iavf_fill_data_desc(ddesc, ddesc_template,
-					CI_MAX_DATA_PER_TXD, buf_dma_addr);
-
-				IAVF_DUMP_TX_DESC(txq, ddesc, desc_idx);
-
-				buf_dma_addr += CI_MAX_DATA_PER_TXD;
-				slen -= CI_MAX_DATA_PER_TXD;
-
-				txe->last_id = desc_idx_last;
-				desc_idx = txe->next_id;
-				txe = txn;
-				ddesc = &txr[desc_idx];
-				txn = &txe_ring[txe->next_id];
-			}
-
-			iavf_fill_data_desc(ddesc, ddesc_template,
-					slen, buf_dma_addr);
-
-			IAVF_DUMP_TX_DESC(txq, ddesc, desc_idx);
-
-			txe->last_id = desc_idx_last;
-			desc_idx = txe->next_id;
-			txe = txn;
-			mb_seg = mb_seg->next;
-		} while (mb_seg);
-
-		/* The last packet data descriptor needs End Of Packet (EOP) */
-		ddesc_cmd = CI_TX_DESC_CMD_EOP;
-
-		txq->nb_tx_used = (uint16_t)(txq->nb_tx_used + nb_desc_required);
-		txq->nb_tx_free = (uint16_t)(txq->nb_tx_free - nb_desc_required);
-
-		if (txq->nb_tx_used >= txq->tx_rs_thresh) {
-			PMD_TX_LOG(DEBUG, "Setting RS bit on TXD id="
-				   "%4u (port=%d queue=%d)",
-				   desc_idx_last, txq->port_id, txq->queue_id);
-
-			ddesc_cmd |= CI_TX_DESC_CMD_RS;
-
-			/* Update txq RS bit counters */
-			txq->nb_tx_used = 0;
-		}
-
-		ddesc->cmd_type_offset_bsz |= rte_cpu_to_le_64(ddesc_cmd <<
-				IAVF_TXD_DATA_QW1_CMD_SHIFT);
-
-		IAVF_DUMP_TX_DESC(txq, ddesc, desc_idx - 1);
-	}
-
-end_of_tx:
-	rte_wmb();
-
-	PMD_TX_LOG(DEBUG, "port_id=%u queue_id=%u tx_tail=%u nb_tx=%u",
-		   txq->port_id, txq->queue_id, desc_idx, idx);
-
-	IAVF_PCI_REG_WRITE_RELAXED(txq->qtx_tail, desc_idx);
-	txq->tx_tail = desc_idx;
+	const struct ci_ipsec_ops ipsec_ops = {
+		.get_ipsec_desc = iavf_get_ipsec_desc,
+		.calc_segment_len = iavf_calc_ipsec_segment_len,
+	};
 
-	return idx;
+	/* IAVF does not support timestamp queues, so pass NULL for ts_fns */
+	return ci_xmit_pkts(txq, tx_pkts, nb_pkts,
+			    (txq->vlan_flag & IAVF_TX_FLAGS_VLAN_TAG_LOC_L2TAG1) ?
+				CI_VLAN_IN_L2TAG1 : CI_VLAN_IN_L2TAG2,
+			    iavf_get_context_desc, &ipsec_ops, NULL);
 }
 
 /* Check if the packet with vlan user priority is transmitted in the
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [PATCH v2 17/36] net/i40e: document requirement for QinQ support
  2026-01-13 15:14 ` [PATCH v2 00/36] combine multiple Intel scalar Tx paths Bruce Richardson
                     ` (15 preceding siblings ...)
  2026-01-13 15:14   ` [PATCH v2 16/36] net/iavf: use common scalar Tx function Bruce Richardson
@ 2026-01-13 15:14   ` Bruce Richardson
  2026-01-13 15:14   ` [PATCH v2 18/36] net/idpf: use common scalar Tx function Bruce Richardson
                     ` (20 subsequent siblings)
  37 siblings, 0 replies; 274+ messages in thread
From: Bruce Richardson @ 2026-01-13 15:14 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson

In order to have multiple VLAN tags inserted in an outgoing packet with the
QinQ offload, the i40e driver needs to be configured in double VLAN mode.
This is done by setting the VLAN_EXTEND Rx offload flag. Add a runtime check
for this dependency and update the documentation accordingly.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 doc/guides/nics/i40e.rst           | 18 ++++++++++++++++++
 drivers/net/intel/i40e/i40e_rxtx.c |  9 +++++++++
 2 files changed, 27 insertions(+)

diff --git a/doc/guides/nics/i40e.rst b/doc/guides/nics/i40e.rst
index 45dc083c94..cbfaddbdd8 100644
--- a/doc/guides/nics/i40e.rst
+++ b/doc/guides/nics/i40e.rst
@@ -245,6 +245,24 @@ Runtime Configuration
   * ``segment``: Check number of mbuf segments not exceed hw limitation.
   * ``offload``: Check any unsupported offload flag.
 
+QinQ Configuration
+~~~~~~~~~~~~~~~~~~
+
+When using QinQ TX offload (``RTE_ETH_TX_OFFLOAD_QINQ_INSERT``), you must also
+enable ``RTE_ETH_RX_OFFLOAD_VLAN_EXTEND`` to configure the hardware for double
+VLAN mode. Without this, only the inner VLAN tag will be inserted.
+
+Example::
+
+  struct rte_eth_conf port_conf = {
+      .rxmode = {
+          .offloads = RTE_ETH_RX_OFFLOAD_VLAN_EXTEND,
+      },
+      .txmode = {
+          .offloads = RTE_ETH_TX_OFFLOAD_QINQ_INSERT,
+      },
+  };
+
 Vector RX Pre-conditions
 ~~~~~~~~~~~~~~~~~~~~~~~~
 For Vector RX it is assumed that the number of descriptor rings will be a power
diff --git a/drivers/net/intel/i40e/i40e_rxtx.c b/drivers/net/intel/i40e/i40e_rxtx.c
index f0047e4f59..0ccd8e8b2a 100644
--- a/drivers/net/intel/i40e/i40e_rxtx.c
+++ b/drivers/net/intel/i40e/i40e_rxtx.c
@@ -2169,6 +2169,15 @@ i40e_dev_tx_queue_setup(struct rte_eth_dev *dev,
 	vsi = i40e_pf_get_vsi_by_qindex(pf, queue_idx);
 	if (!vsi)
 		return -EINVAL;
+
+	/* Check if QinQ TX offload requires VLAN extend mode */
+	if ((offloads & RTE_ETH_TX_OFFLOAD_QINQ_INSERT) &&
+			!(dev->data->dev_conf.rxmode.offloads & RTE_ETH_RX_OFFLOAD_VLAN_EXTEND)) {
+		PMD_DRV_LOG(WARNING, "Port %u: QinQ TX offload is enabled but VLAN extend mode is not set. ",
+				dev->data->port_id);
+		PMD_DRV_LOG(WARNING, "Double VLAN insertion may not work correctly without RTE_ETH_RX_OFFLOAD_VLAN_EXTEND set in Rx configuration.");
+	}
+
 	q_offset = i40e_get_queue_offset_by_qindex(pf, queue_idx);
 	if (q_offset < 0)
 		return -EINVAL;
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [PATCH v2 18/36] net/idpf: use common scalar Tx function
  2026-01-13 15:14 ` [PATCH v2 00/36] combine multiple Intel scalar Tx paths Bruce Richardson
                     ` (16 preceding siblings ...)
  2026-01-13 15:14   ` [PATCH v2 17/36] net/i40e: document requirement for QinQ support Bruce Richardson
@ 2026-01-13 15:14   ` Bruce Richardson
  2026-01-13 15:14   ` [PATCH v2 19/36] net/intel: avoid writing the final pkt descriptor twice Bruce Richardson
                     ` (19 subsequent siblings)
  37 siblings, 0 replies; 274+ messages in thread
From: Bruce Richardson @ 2026-01-13 15:14 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson, Jingjing Wu, Praveen Shetty

Update the idpf driver to use the common scalar Tx function for its
single-queue configuration.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/net/intel/idpf/idpf_common_rxtx.c | 179 ++--------------------
 1 file changed, 11 insertions(+), 168 deletions(-)

diff --git a/drivers/net/intel/idpf/idpf_common_rxtx.c b/drivers/net/intel/idpf/idpf_common_rxtx.c
index b34d545a0a..81bc45f6ef 100644
--- a/drivers/net/intel/idpf/idpf_common_rxtx.c
+++ b/drivers/net/intel/idpf/idpf_common_rxtx.c
@@ -8,7 +8,6 @@
 
 #include "idpf_common_rxtx.h"
 #include "idpf_common_device.h"
-#include "../common/rx.h"
 
 int idpf_timestamp_dynfield_offset = -1;
 uint64_t idpf_timestamp_dynflag;
@@ -848,9 +847,11 @@ idpf_calc_context_desc(uint64_t flags)
 /* set TSO context descriptor, returns 0 if no context needed, 1 if context set
  */
 static inline uint16_t
-idpf_set_tso_ctx(uint64_t ol_flags, struct rte_mbuf *mbuf,
-			union ci_tx_offload tx_offload,
-			uint64_t *qw0, uint64_t *qw1)
+idpf_set_tso_ctx(uint64_t ol_flags, const struct rte_mbuf *mbuf,
+		 const union ci_tx_offload *tx_offload,
+		 const struct ci_tx_queue *txq __rte_unused,
+		 uint32_t *td_offset __rte_unused,
+		 uint64_t *qw0, uint64_t *qw1)
 {
 	uint16_t cmd_dtype = IDPF_TX_DESC_DTYPE_FLEX_TSO_CTX | IDPF_TX_FLEX_CTX_DESC_CMD_TSO;
 	uint16_t tso_segsz = mbuf->tso_segsz;
@@ -861,12 +862,12 @@ idpf_set_tso_ctx(uint64_t ol_flags, struct rte_mbuf *mbuf,
 		return 0;
 
 	/* TSO context descriptor setup */
-	if (tx_offload.l4_len == 0) {
+	if (tx_offload->l4_len == 0) {
 		TX_LOG(DEBUG, "L4 length set to 0");
 		return 0;
 	}
 
-	hdr_len = tx_offload.l2_len + tx_offload.l3_len + tx_offload.l4_len;
+	hdr_len = tx_offload->l2_len + tx_offload->l3_len + tx_offload->l4_len;
 	tso_len = mbuf->pkt_len - hdr_len;
 
 	*qw0 = rte_cpu_to_le_32(tso_len & IDPF_TXD_FLEX_CTX_MSS_RT_M) |
@@ -933,7 +934,8 @@ idpf_dp_splitq_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 		tx_offload.tso_segsz = tx_pkt->tso_segsz;
 		/* Calculate the number of context descriptors needed. */
 		uint64_t cd_qw0, cd_qw1;
-		nb_ctx = idpf_set_tso_ctx(ol_flags, tx_pkt, tx_offload, &cd_qw0, &cd_qw1);
+		nb_ctx = idpf_set_tso_ctx(ol_flags, tx_pkt, &tx_offload, txq,
+					  NULL /* unused */, &cd_qw0, &cd_qw1);
 
 		/* Calculate the number of TX descriptors needed for
 		 * each packet. For TSO packets, use ci_calc_pkt_desc as
@@ -1339,167 +1341,8 @@ uint16_t
 idpf_dp_singleq_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 			  uint16_t nb_pkts)
 {
-	volatile struct ci_tx_desc *txd;
-	volatile struct ci_tx_desc *txr;
-	union ci_tx_offload tx_offload = {0};
-	struct ci_tx_entry *txe, *txn;
-	struct ci_tx_entry *sw_ring;
-	struct ci_tx_queue *txq;
-	struct rte_mbuf *tx_pkt;
-	struct rte_mbuf *m_seg;
-	uint64_t buf_dma_addr;
-	uint32_t td_offset;
-	uint64_t ol_flags;
-	uint16_t tx_last;
-	uint16_t nb_used;
-	uint16_t nb_ctx;
-	uint16_t td_cmd;
-	uint16_t tx_id;
-	uint16_t nb_tx;
-	uint16_t slen;
-
-	nb_tx = 0;
-	txq = tx_queue;
-
-	if (unlikely(txq == NULL))
-		return nb_tx;
-
-	sw_ring = txq->sw_ring;
-	txr = txq->ci_tx_ring;
-	tx_id = txq->tx_tail;
-	txe = &sw_ring[tx_id];
-
-	/* Check if the descriptor ring needs to be cleaned. */
-	if (txq->nb_tx_free < txq->tx_free_thresh)
-		(void)ci_tx_xmit_cleanup(txq);
-
-	for (nb_tx = 0; nb_tx < nb_pkts; nb_tx++) {
-		td_cmd = 0;
-		td_offset = 0;
-
-		tx_pkt = *tx_pkts++;
-		RTE_MBUF_PREFETCH_TO_FREE(txe->mbuf);
-
-		ol_flags = tx_pkt->ol_flags;
-		tx_offload.l2_len = tx_pkt->l2_len;
-		tx_offload.l3_len = tx_pkt->l3_len;
-		tx_offload.l4_len = tx_pkt->l4_len;
-		tx_offload.tso_segsz = tx_pkt->tso_segsz;
-		/* Calculate the number of context descriptors needed. */
-		uint64_t cd_qw0, cd_qw1;
-		nb_ctx = idpf_set_tso_ctx(ol_flags, tx_pkt, tx_offload, &cd_qw0, &cd_qw1);
-
-		/* The number of descriptors that must be allocated for
-		 * a packet. For TSO packets, use ci_calc_pkt_desc as
-		 * the mbuf data size might exceed max data size that hw allows
-		 * per tx desc.
-		 */
-		if (ol_flags & RTE_MBUF_F_TX_TCP_SEG)
-			nb_used = (uint16_t)(ci_calc_pkt_desc(tx_pkt) + nb_ctx);
-		else
-			nb_used = (uint16_t)(tx_pkt->nb_segs + nb_ctx);
-		tx_last = (uint16_t)(tx_id + nb_used - 1);
-
-		/* Circular ring */
-		if (tx_last >= txq->nb_tx_desc)
-			tx_last = (uint16_t)(tx_last - txq->nb_tx_desc);
-
-		TX_LOG(DEBUG, "port_id=%u queue_id=%u"
-		       " tx_first=%u tx_last=%u",
-		       txq->port_id, txq->queue_id, tx_id, tx_last);
-
-		if (nb_used > txq->nb_tx_free) {
-			if (ci_tx_xmit_cleanup(txq) != 0) {
-				if (nb_tx == 0)
-					return 0;
-				goto end_of_tx;
-			}
-			if (unlikely(nb_used > txq->tx_rs_thresh)) {
-				while (nb_used > txq->nb_tx_free) {
-					if (ci_tx_xmit_cleanup(txq) != 0) {
-						if (nb_tx == 0)
-							return 0;
-						goto end_of_tx;
-					}
-				}
-			}
-		}
-
-		if (ol_flags & CI_TX_CKSUM_OFFLOAD_MASK)
-			td_cmd |= IDPF_TX_FLEX_DESC_CMD_CS_EN;
-
-		if (nb_ctx != 0) {
-			/* Setup TX context descriptor if required */
-			uint64_t *ctx_txd = RTE_CAST_PTR(uint64_t *, &txr[tx_id]);
-
-			txn = &sw_ring[txe->next_id];
-			RTE_MBUF_PREFETCH_TO_FREE(txn->mbuf);
-			if (txe->mbuf != NULL) {
-				rte_pktmbuf_free_seg(txe->mbuf);
-				txe->mbuf = NULL;
-			}
-
-			ctx_txd[0] = cd_qw0;
-			ctx_txd[1] = cd_qw1;
-
-			txe->last_id = tx_last;
-			tx_id = txe->next_id;
-			txe = txn;
-		}
-
-		m_seg = tx_pkt;
-		do {
-			txd = &txr[tx_id];
-			txn = &sw_ring[txe->next_id];
-
-			if (txe->mbuf != NULL)
-				rte_pktmbuf_free_seg(txe->mbuf);
-			txe->mbuf = m_seg;
-
-			/* Setup TX Descriptor */
-			slen = m_seg->data_len;
-			buf_dma_addr = rte_mbuf_data_iova(m_seg);
-			txd->buffer_addr = rte_cpu_to_le_64(buf_dma_addr);
-			txd->cmd_type_offset_bsz = rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DATA |
-				((uint64_t)td_cmd  << CI_TXD_QW1_CMD_S) |
-				((uint64_t)td_offset << CI_TXD_QW1_OFFSET_S) |
-				((uint64_t)slen << CI_TXD_QW1_TX_BUF_SZ_S));
-
-			txe->last_id = tx_last;
-			tx_id = txe->next_id;
-			txe = txn;
-			m_seg = m_seg->next;
-		} while (m_seg);
-
-		/* The last packet data descriptor needs End Of Packet (EOP) */
-		td_cmd |= CI_TX_DESC_CMD_EOP;
-		txq->nb_tx_used = (uint16_t)(txq->nb_tx_used + nb_used);
-		txq->nb_tx_free = (uint16_t)(txq->nb_tx_free - nb_used);
-
-		if (txq->nb_tx_used >= txq->tx_rs_thresh) {
-			TX_LOG(DEBUG, "Setting RS bit on TXD id="
-			       "%4u (port=%d queue=%d)",
-			       tx_last, txq->port_id, txq->queue_id);
-
-			td_cmd |= CI_TX_DESC_CMD_RS;
-
-			/* Update txq RS bit counters */
-			txq->nb_tx_used = 0;
-		}
-
-		txd->cmd_type_offset_bsz |= rte_cpu_to_le_16(td_cmd << CI_TXD_QW1_CMD_S);
-	}
-
-end_of_tx:
-	rte_wmb();
-
-	TX_LOG(DEBUG, "port_id=%u queue_id=%u tx_tail=%u nb_tx=%u",
-	       txq->port_id, txq->queue_id, tx_id, nb_tx);
-
-	IDPF_PCI_REG_WRITE(txq->qtx_tail, tx_id);
-	txq->tx_tail = tx_id;
-
-	return nb_tx;
+	return ci_xmit_pkts(tx_queue, tx_pkts, nb_pkts, CI_VLAN_IN_L2TAG1,
+			idpf_set_tso_ctx, NULL, NULL);
 }
 
 /* TX prep functions */
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [PATCH v2 19/36] net/intel: avoid writing the final pkt descriptor twice
  2026-01-13 15:14 ` [PATCH v2 00/36] combine multiple Intel scalar Tx paths Bruce Richardson
                     ` (17 preceding siblings ...)
  2026-01-13 15:14   ` [PATCH v2 18/36] net/idpf: use common scalar Tx function Bruce Richardson
@ 2026-01-13 15:14   ` Bruce Richardson
  2026-01-13 15:14   ` [PATCH v2 20/36] eal: add macro for marking assumed alignment Bruce Richardson
                     ` (18 subsequent siblings)
  37 siblings, 0 replies; 274+ messages in thread
From: Bruce Richardson @ 2026-01-13 15:14 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson

In the scalar Tx datapath, a loop handles multi-segment and multi-descriptor
packets. After that loop, the end-of-packet (EOP) bit was written to the
descriptor separately, meaning that each single-descriptor packet incurred
two writes to the second quad-word - three 64-bit writes in total rather
than two. Adjusting the code to compute the EOP bit inside the loop saves
that extra write per packet and so improves performance.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/net/intel/common/tx_scalar_fns.h | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/net/intel/common/tx_scalar_fns.h b/drivers/net/intel/common/tx_scalar_fns.h
index 0dfe2060c0..bfe545826b 100644
--- a/drivers/net/intel/common/tx_scalar_fns.h
+++ b/drivers/net/intel/common/tx_scalar_fns.h
@@ -378,6 +378,10 @@ ci_xmit_pkts(struct ci_tx_queue *txq,
 				txn = &sw_ring[txe->next_id];
 			}
 
+			/* fill the last descriptor with End of Packet (EOP) bit */
+			if (m_seg->next == NULL)
+				td_cmd |= CI_TX_DESC_CMD_EOP;
+
 			txd->buffer_addr = rte_cpu_to_le_64(buf_dma_addr);
 			txd->cmd_type_offset_bsz = rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DATA |
 				((uint64_t)td_cmd << CI_TXD_QW1_CMD_S) |
@@ -390,21 +394,17 @@ ci_xmit_pkts(struct ci_tx_queue *txq,
 			txe = txn;
 			m_seg = m_seg->next;
 		} while (m_seg);
-
-		/* fill the last descriptor with End of Packet (EOP) bit */
-		td_cmd |= CI_TX_DESC_CMD_EOP;
 		txq->nb_tx_used = (uint16_t)(txq->nb_tx_used + nb_used);
 		txq->nb_tx_free = (uint16_t)(txq->nb_tx_free - nb_used);
 
 		/* set RS bit on the last descriptor of one packet */
 		if (txq->nb_tx_used >= txq->tx_rs_thresh) {
-			td_cmd |= CI_TX_DESC_CMD_RS;
+			txd->cmd_type_offset_bsz |=
+					rte_cpu_to_le_64(CI_TX_DESC_CMD_RS << CI_TXD_QW1_CMD_S);
 
 			/* Update txq RS bit counters */
 			txq->nb_tx_used = 0;
 		}
-		txd->cmd_type_offset_bsz |=
-				rte_cpu_to_le_64(((uint64_t)td_cmd) << CI_TXD_QW1_CMD_S);
 
 		if (ts_fns != NULL)
 			ts_id = ts_fns->write_ts_desc(txq, tx_pkt, tx_id, ts_id);
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [PATCH v2 20/36] eal: add macro for marking assumed alignment
  2026-01-13 15:14 ` [PATCH v2 00/36] combine multiple Intel scalar Tx paths Bruce Richardson
                     ` (18 preceding siblings ...)
  2026-01-13 15:14   ` [PATCH v2 19/36] net/intel: avoid writing the final pkt descriptor twice Bruce Richardson
@ 2026-01-13 15:14   ` Bruce Richardson
  2026-01-13 15:14   ` [PATCH v2 21/36] net/intel: write descriptors using non-volatile pointers Bruce Richardson
                     ` (17 subsequent siblings)
  37 siblings, 0 replies; 274+ messages in thread
From: Bruce Richardson @ 2026-01-13 15:14 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson

Provide a common DPDK macro, __rte_assume_aligned, wrapping the
gcc/clang builtin __builtin_assume_aligned, to mark pointers as pointing
to data with a known minimum alignment.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 lib/eal/include/rte_common.h | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/lib/eal/include/rte_common.h b/lib/eal/include/rte_common.h
index 9e7d84f929..11d6f2bee1 100644
--- a/lib/eal/include/rte_common.h
+++ b/lib/eal/include/rte_common.h
@@ -121,6 +121,12 @@ extern "C" {
 #define __rte_aligned(a) __attribute__((__aligned__(a)))
 #endif
 
+#ifdef RTE_TOOLCHAIN_MSVC
+#define __rte_assume_aligned(ptr, align) (ptr)
+#else
+#define __rte_assume_aligned __builtin_assume_aligned
+#endif
+
 #ifdef RTE_ARCH_STRICT_ALIGN
 typedef uint64_t unaligned_uint64_t __rte_aligned(1);
 typedef uint32_t unaligned_uint32_t __rte_aligned(1);
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [PATCH v2 21/36] net/intel: write descriptors using non-volatile pointers
  2026-01-13 15:14 ` [PATCH v2 00/36] combine multiple Intel scalar Tx paths Bruce Richardson
                     ` (19 preceding siblings ...)
  2026-01-13 15:14   ` [PATCH v2 20/36] eal: add macro for marking assumed alignment Bruce Richardson
@ 2026-01-13 15:14   ` Bruce Richardson
  2026-01-13 15:14   ` [PATCH v2 22/36] net/intel: remove unnecessary flag clearing Bruce Richardson
                     ` (16 subsequent siblings)
  37 siblings, 0 replies; 274+ messages in thread
From: Bruce Richardson @ 2026-01-13 15:14 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson

Use a non-volatile uint64_t pointer to store to the descriptor ring.
This allows the compiler to merge the stores as it sees fit.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/net/intel/common/tx_scalar_fns.h | 24 ++++++++++++++++--------
 1 file changed, 16 insertions(+), 8 deletions(-)

diff --git a/drivers/net/intel/common/tx_scalar_fns.h b/drivers/net/intel/common/tx_scalar_fns.h
index bfe545826b..7cb4d3efb9 100644
--- a/drivers/net/intel/common/tx_scalar_fns.h
+++ b/drivers/net/intel/common/tx_scalar_fns.h
@@ -184,6 +184,15 @@ struct ci_timesstamp_queue_fns {
 	write_ts_tail_t write_ts_tail;
 };
 
+static inline void
+write_txd(volatile void *txd, uint64_t qw0, uint64_t qw1)
+{
+	uint64_t *txd_qw = __rte_assume_aligned(RTE_CAST_PTR(void *, txd), 16);
+
+	txd_qw[0] = rte_cpu_to_le_64(qw0);
+	txd_qw[1] = rte_cpu_to_le_64(qw1);
+}
+
 static inline uint16_t
 ci_xmit_pkts(struct ci_tx_queue *txq,
 	     struct rte_mbuf **tx_pkts,
@@ -313,8 +322,7 @@ ci_xmit_pkts(struct ci_tx_queue *txq,
 				txe->mbuf = NULL;
 			}
 
-			ctx_txd[0] = cd_qw0;
-			ctx_txd[1] = cd_qw1;
+			write_txd(ctx_txd, cd_qw0, cd_qw1);
 
 			txe->last_id = tx_last;
 			tx_id = txe->next_id;
@@ -361,12 +369,12 @@ ci_xmit_pkts(struct ci_tx_queue *txq,
 
 			while ((ol_flags & (RTE_MBUF_F_TX_TCP_SEG | RTE_MBUF_F_TX_UDP_SEG)) &&
 					unlikely(slen > CI_MAX_DATA_PER_TXD)) {
-				txd->buffer_addr = rte_cpu_to_le_64(buf_dma_addr);
-				txd->cmd_type_offset_bsz = rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DATA |
+				const uint64_t cmd_type_offset_bsz = CI_TX_DESC_DTYPE_DATA |
 					((uint64_t)td_cmd << CI_TXD_QW1_CMD_S) |
 					((uint64_t)td_offset << CI_TXD_QW1_OFFSET_S) |
 					((uint64_t)CI_MAX_DATA_PER_TXD << CI_TXD_QW1_TX_BUF_SZ_S) |
-					((uint64_t)td_tag << CI_TXD_QW1_L2TAG1_S));
+					((uint64_t)td_tag << CI_TXD_QW1_L2TAG1_S);
+				write_txd(txd, buf_dma_addr, cmd_type_offset_bsz);
 
 				buf_dma_addr += CI_MAX_DATA_PER_TXD;
 				slen -= CI_MAX_DATA_PER_TXD;
@@ -382,12 +390,12 @@ ci_xmit_pkts(struct ci_tx_queue *txq,
 			if (m_seg->next == NULL)
 				td_cmd |= CI_TX_DESC_CMD_EOP;
 
-			txd->buffer_addr = rte_cpu_to_le_64(buf_dma_addr);
-			txd->cmd_type_offset_bsz = rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DATA |
+			const uint64_t cmd_type_offset_bsz = CI_TX_DESC_DTYPE_DATA |
 				((uint64_t)td_cmd << CI_TXD_QW1_CMD_S) |
 				((uint64_t)td_offset << CI_TXD_QW1_OFFSET_S) |
 				((uint64_t)slen << CI_TXD_QW1_TX_BUF_SZ_S) |
-				((uint64_t)td_tag << CI_TXD_QW1_L2TAG1_S));
+				((uint64_t)td_tag << CI_TXD_QW1_L2TAG1_S);
+			write_txd(txd, buf_dma_addr, cmd_type_offset_bsz);
 
 			txe->last_id = tx_last;
 			tx_id = txe->next_id;
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [PATCH v2 22/36] net/intel: remove unnecessary flag clearing
  2026-01-13 15:14 ` [PATCH v2 00/36] combine multiple Intel scalar Tx paths Bruce Richardson
                     ` (20 preceding siblings ...)
  2026-01-13 15:14   ` [PATCH v2 21/36] net/intel: write descriptors using non-volatile pointers Bruce Richardson
@ 2026-01-13 15:14   ` Bruce Richardson
  2026-01-13 15:14   ` [PATCH v2 23/36] net/intel: mark mid-burst ring cleanup as unlikely Bruce Richardson
                     ` (15 subsequent siblings)
  37 siblings, 0 replies; 274+ messages in thread
From: Bruce Richardson @ 2026-01-13 15:14 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson

When cleaning the Tx ring, there is no need to zero out the done flag
from the completed entry. That flag will be automatically cleared when
the descriptor is next written. This gives a small performance benefit.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/net/intel/common/tx_scalar_fns.h | 7 -------
 1 file changed, 7 deletions(-)

diff --git a/drivers/net/intel/common/tx_scalar_fns.h b/drivers/net/intel/common/tx_scalar_fns.h
index 7cb4d3efb9..68c05524b0 100644
--- a/drivers/net/intel/common/tx_scalar_fns.h
+++ b/drivers/net/intel/common/tx_scalar_fns.h
@@ -51,13 +51,6 @@ ci_tx_xmit_cleanup(struct ci_tx_queue *txq)
 	else
 		nb_tx_to_clean = (uint16_t)(desc_to_clean_to - last_desc_cleaned);
 
-	/* The last descriptor to clean is done, so that means all the
-	 * descriptors from the last descriptor that was cleaned
-	 * up to the last descriptor with the RS bit set
-	 * are done. Only reset the threshold descriptor.
-	 */
-	txd[desc_to_clean_to].cmd_type_offset_bsz = 0;
-
 	/* Update the txq to reflect the last descriptor that was cleaned */
 	txq->last_desc_cleaned = desc_to_clean_to;
 	txq->nb_tx_free = (uint16_t)(txq->nb_tx_free + nb_tx_to_clean);
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [PATCH v2 23/36] net/intel: mark mid-burst ring cleanup as unlikely
  2026-01-13 15:14 ` [PATCH v2 00/36] combine multiple Intel scalar Tx paths Bruce Richardson
                     ` (21 preceding siblings ...)
  2026-01-13 15:14   ` [PATCH v2 22/36] net/intel: remove unnecessary flag clearing Bruce Richardson
@ 2026-01-13 15:14   ` Bruce Richardson
  2026-01-13 15:14   ` [PATCH v2 24/36] net/intel: add special handling for single desc packets Bruce Richardson
                     ` (14 subsequent siblings)
  37 siblings, 0 replies; 274+ messages in thread
From: Bruce Richardson @ 2026-01-13 15:14 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson

It should rarely be the case that we need to clean up the descriptor
ring mid-burst, so mark that branch as unlikely to help performance.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/net/intel/common/tx_scalar_fns.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/intel/common/tx_scalar_fns.h b/drivers/net/intel/common/tx_scalar_fns.h
index 68c05524b0..58196dc02a 100644
--- a/drivers/net/intel/common/tx_scalar_fns.h
+++ b/drivers/net/intel/common/tx_scalar_fns.h
@@ -272,7 +272,7 @@ ci_xmit_pkts(struct ci_tx_queue *txq,
 		if (tx_last >= txq->nb_tx_desc)
 			tx_last = (uint16_t)(tx_last - txq->nb_tx_desc);
 
-		if (nb_used > txq->nb_tx_free) {
+		if (unlikely(nb_used > txq->nb_tx_free)) {
 			if (ci_tx_xmit_cleanup(txq) != 0) {
 				if (nb_tx == 0)
 					return 0;
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [PATCH v2 24/36] net/intel: add special handling for single desc packets
  2026-01-13 15:14 ` [PATCH v2 00/36] combine multiple Intel scalar Tx paths Bruce Richardson
                     ` (22 preceding siblings ...)
  2026-01-13 15:14   ` [PATCH v2 23/36] net/intel: mark mid-burst ring cleanup as unlikely Bruce Richardson
@ 2026-01-13 15:14   ` Bruce Richardson
  2026-01-13 15:14   ` [PATCH v2 25/36] net/intel: use separate array for desc status tracking Bruce Richardson
                     ` (13 subsequent siblings)
  37 siblings, 0 replies; 274+ messages in thread
From: Bruce Richardson @ 2026-01-13 15:14 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson

Within the scalar Tx path, add a shortcut for packets that don't
use TSO and need only a single data descriptor.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/net/intel/common/tx_scalar_fns.h | 26 ++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

diff --git a/drivers/net/intel/common/tx_scalar_fns.h b/drivers/net/intel/common/tx_scalar_fns.h
index 58196dc02a..3772cb4c3c 100644
--- a/drivers/net/intel/common/tx_scalar_fns.h
+++ b/drivers/net/intel/common/tx_scalar_fns.h
@@ -304,6 +304,31 @@ ci_xmit_pkts(struct ci_tx_queue *txq,
 			ci_txd_enable_checksum(ol_flags, &td_cmd,
 						&td_offset, tx_offload);
 
+		/* special case for single descriptor packet, without TSO offload */
+		if (nb_used == 1 &&
+				(ol_flags & (RTE_MBUF_F_TX_TCP_SEG | RTE_MBUF_F_TX_UDP_SEG)) == 0) {
+			txd = &ci_tx_ring[tx_id];
+			tx_id = txe->next_id;
+
+			if (txe->mbuf)
+				rte_pktmbuf_free_seg(txe->mbuf);
+			*txe = (struct ci_tx_entry){
+				.mbuf = tx_pkt, .last_id = tx_last, .next_id = tx_id
+			};
+
+			/* Setup TX Descriptor */
+			td_cmd |= CI_TX_DESC_CMD_EOP;
+			const uint64_t cmd_type_offset_bsz = CI_TX_DESC_DTYPE_DATA |
+				((uint64_t)td_cmd << CI_TXD_QW1_CMD_S) |
+				((uint64_t)td_offset << CI_TXD_QW1_OFFSET_S) |
+				((uint64_t)tx_pkt->data_len << CI_TXD_QW1_TX_BUF_SZ_S) |
+				((uint64_t)td_tag << CI_TXD_QW1_L2TAG1_S);
+			write_txd(txd, rte_mbuf_data_iova(tx_pkt), cmd_type_offset_bsz);
+
+			txe = &sw_ring[tx_id];
+			goto end_pkt;
+		}
+
 		if (nb_ctx) {
 			/* Setup TX context descriptor if required */
 			uint64_t *ctx_txd = RTE_CAST_PTR(uint64_t *, &ci_tx_ring[tx_id]);
@@ -395,6 +420,7 @@ ci_xmit_pkts(struct ci_tx_queue *txq,
 			txe = txn;
 			m_seg = m_seg->next;
 		} while (m_seg);
+end_pkt:
 		txq->nb_tx_used = (uint16_t)(txq->nb_tx_used + nb_used);
 		txq->nb_tx_free = (uint16_t)(txq->nb_tx_free - nb_used);
 
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [PATCH v2 25/36] net/intel: use separate array for desc status tracking
  2026-01-13 15:14 ` [PATCH v2 00/36] combine multiple Intel scalar Tx paths Bruce Richardson
                     ` (23 preceding siblings ...)
  2026-01-13 15:14   ` [PATCH v2 24/36] net/intel: add special handling for single desc packets Bruce Richardson
@ 2026-01-13 15:14   ` Bruce Richardson
  2026-01-13 15:14   ` [PATCH v2 26/36] net/ixgbe: " Bruce Richardson
                     ` (12 subsequent siblings)
  37 siblings, 0 replies; 274+ messages in thread
From: Bruce Richardson @ 2026-01-13 15:14 UTC (permalink / raw)
  To: dev
  Cc: Bruce Richardson, Vladimir Medvedkin, Anatoly Burakov,
	Jingjing Wu, Praveen Shetty

Rather than writing a last_id for each individual descriptor, we can
write one only for the places where the "report status" (RS) bit is set,
i.e. the descriptors which will be written back when done. The method
used for marking which descriptors are free is also changed in the
process: even if the last descriptor with the "done" bits set is past
the expected point, we only track up to the expected point and leave
the rest to be counted as freed next time. This means that we always
have the RS/DD bits set at fixed intervals, and we always track free
slots in units of the same tx_rs_thresh intervals.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/net/intel/common/tx.h             |  4 ++
 drivers/net/intel/common/tx_scalar_fns.h  | 62 ++++++++++-------------
 drivers/net/intel/i40e/i40e_rxtx.c        | 20 ++++++++
 drivers/net/intel/iavf/iavf_rxtx.c        | 19 +++++++
 drivers/net/intel/ice/ice_rxtx.c          | 20 ++++++++
 drivers/net/intel/idpf/idpf_common_rxtx.c |  7 +++
 drivers/net/intel/idpf/idpf_rxtx.c        | 13 +++++
 7 files changed, 111 insertions(+), 34 deletions(-)

diff --git a/drivers/net/intel/common/tx.h b/drivers/net/intel/common/tx.h
index 0d11daaab3..9b3f8385e6 100644
--- a/drivers/net/intel/common/tx.h
+++ b/drivers/net/intel/common/tx.h
@@ -126,6 +126,8 @@ struct ci_tx_queue {
 		struct ci_tx_entry *sw_ring; /* virtual address of SW ring */
 		struct ci_tx_entry_vec *sw_ring_vec;
 	};
+	/* Scalar TX path: Array tracking last_id at each RS threshold boundary */
+	uint16_t *rs_last_id;
 	uint16_t nb_tx_desc;           /* number of TX descriptors */
 	uint16_t tx_tail; /* current value of tail register */
 	uint16_t nb_tx_used; /* number of TX desc used since RS bit set */
@@ -139,6 +141,8 @@ struct ci_tx_queue {
 	uint16_t tx_free_thresh;
 	/* Number of TX descriptors to use before RS bit is set. */
 	uint16_t tx_rs_thresh;
+	/* Scalar TX path: log2 of tx_rs_thresh for efficient bit operations */
+	uint8_t log2_rs_thresh;
 	uint16_t port_id;  /* Device port identifier. */
 	uint16_t queue_id; /* TX queue index. */
 	uint16_t reg_idx;
diff --git a/drivers/net/intel/common/tx_scalar_fns.h b/drivers/net/intel/common/tx_scalar_fns.h
index 3772cb4c3c..257772a73a 100644
--- a/drivers/net/intel/common/tx_scalar_fns.h
+++ b/drivers/net/intel/common/tx_scalar_fns.h
@@ -23,37 +23,25 @@
 static __rte_always_inline int
 ci_tx_xmit_cleanup(struct ci_tx_queue *txq)
 {
-	struct ci_tx_entry *sw_ring = txq->sw_ring;
 	volatile struct ci_tx_desc *txd = txq->ci_tx_ring;
-	uint16_t last_desc_cleaned = txq->last_desc_cleaned;
-	uint16_t nb_tx_desc = txq->nb_tx_desc;
-	uint16_t desc_to_clean_to;
-	uint16_t nb_tx_to_clean;
+	const uint16_t last_desc_cleaned = txq->last_desc_cleaned;
+	const uint16_t nb_tx_desc = txq->nb_tx_desc;
 
-	/* Determine the last descriptor needing to be cleaned */
-	desc_to_clean_to = (uint16_t)(last_desc_cleaned + txq->tx_rs_thresh);
-	if (desc_to_clean_to >= nb_tx_desc)
-		desc_to_clean_to = (uint16_t)(desc_to_clean_to - nb_tx_desc);
-
-	/* Check to make sure the last descriptor to clean is done */
-	desc_to_clean_to = sw_ring[desc_to_clean_to].last_id;
+	const uint16_t rs_idx = (last_desc_cleaned == nb_tx_desc - 1) ?
+			0 :
+			(last_desc_cleaned + 1) >> txq->log2_rs_thresh;
+	uint16_t desc_to_clean_to = (rs_idx << txq->log2_rs_thresh) + (txq->tx_rs_thresh - 1);
 
 	/* Check if descriptor is done - all drivers use 0xF as done value in bits 3:0 */
-	if ((txd[desc_to_clean_to].cmd_type_offset_bsz & rte_cpu_to_le_64(CI_TXD_QW1_DTYPE_M)) !=
-			rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE)) {
+	if ((txd[txq->rs_last_id[rs_idx]].cmd_type_offset_bsz &
+			rte_cpu_to_le_64(CI_TXD_QW1_DTYPE_M)) !=
+				rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE))
 		/* Descriptor not yet processed by hardware */
 		return -1;
-	}
-
-	/* Figure out how many descriptors will be cleaned */
-	if (last_desc_cleaned > desc_to_clean_to)
-		nb_tx_to_clean = (uint16_t)((nb_tx_desc - last_desc_cleaned) + desc_to_clean_to);
-	else
-		nb_tx_to_clean = (uint16_t)(desc_to_clean_to - last_desc_cleaned);
 
 	/* Update the txq to reflect the last descriptor that was cleaned */
 	txq->last_desc_cleaned = desc_to_clean_to;
-	txq->nb_tx_free = (uint16_t)(txq->nb_tx_free + nb_tx_to_clean);
+	txq->nb_tx_free += txq->tx_rs_thresh;
 
 	return 0;
 }
@@ -232,6 +220,7 @@ ci_xmit_pkts(struct ci_tx_queue *txq,
 		uint16_t nb_ipsec = 0;
 		uint64_t ipsec_qw0 = 0, ipsec_qw1 = 0;
 		uint64_t cd_qw0, cd_qw1;
+		uint16_t pkt_rs_idx;
 		tx_pkt = *tx_pkts++;
 
 		td_cmd = CI_TX_DESC_CMD_ICRC;
@@ -272,6 +261,9 @@ ci_xmit_pkts(struct ci_tx_queue *txq,
 		if (tx_last >= txq->nb_tx_desc)
 			tx_last = (uint16_t)(tx_last - txq->nb_tx_desc);
 
+		/* Track the RS threshold bucket at packet start */
+		pkt_rs_idx = (uint16_t)(tx_id >> txq->log2_rs_thresh);
+
 		if (unlikely(nb_used > txq->nb_tx_free)) {
 			if (ci_tx_xmit_cleanup(txq) != 0) {
 				if (nb_tx == 0)
@@ -312,10 +304,7 @@ ci_xmit_pkts(struct ci_tx_queue *txq,
 
 			if (txe->mbuf)
 				rte_pktmbuf_free_seg(txe->mbuf);
-			*txe = (struct ci_tx_entry){
-				.mbuf = tx_pkt, .last_id = tx_last, .next_id = tx_id
-			};
-
+			txe->mbuf = tx_pkt;
 			/* Setup TX Descriptor */
 			td_cmd |= CI_TX_DESC_CMD_EOP;
 			const uint64_t cmd_type_offset_bsz = CI_TX_DESC_DTYPE_DATA |
@@ -342,7 +331,6 @@ ci_xmit_pkts(struct ci_tx_queue *txq,
 
 			write_txd(ctx_txd, cd_qw0, cd_qw1);
 
-			txe->last_id = tx_last;
 			tx_id = txe->next_id;
 			txe = txn;
 		}
@@ -361,7 +349,6 @@ ci_xmit_pkts(struct ci_tx_queue *txq,
 			ipsec_txd[0] = ipsec_qw0;
 			ipsec_txd[1] = ipsec_qw1;
 
-			txe->last_id = tx_last;
 			tx_id = txe->next_id;
 			txe = txn;
 		}
@@ -397,7 +384,6 @@ ci_xmit_pkts(struct ci_tx_queue *txq,
 				buf_dma_addr += CI_MAX_DATA_PER_TXD;
 				slen -= CI_MAX_DATA_PER_TXD;
 
-				txe->last_id = tx_last;
 				tx_id = txe->next_id;
 				txe = txn;
 				txd = &ci_tx_ring[tx_id];
@@ -415,7 +401,6 @@ ci_xmit_pkts(struct ci_tx_queue *txq,
 				((uint64_t)td_tag << CI_TXD_QW1_L2TAG1_S);
 			write_txd(txd, buf_dma_addr, cmd_type_offset_bsz);
 
-			txe->last_id = tx_last;
 			tx_id = txe->next_id;
 			txe = txn;
 			m_seg = m_seg->next;
@@ -424,13 +409,22 @@ ci_xmit_pkts(struct ci_tx_queue *txq,
 		txq->nb_tx_used = (uint16_t)(txq->nb_tx_used + nb_used);
 		txq->nb_tx_free = (uint16_t)(txq->nb_tx_free - nb_used);
 
-		/* set RS bit on the last descriptor of one packet */
-		if (txq->nb_tx_used >= txq->tx_rs_thresh) {
+		/* Check if packet crosses into a new RS threshold bucket.
+		 * The RS bit is set on the last descriptor when we move from one bucket to another.
+		 * For example, with tx_rs_thresh=32 and a 5-descriptor packet using slots 30-34:
+		 *   - pkt_rs_idx = 30 >> 5 = 0 (started in bucket 0)
+		 *   - tx_last = 34, so 35 >> 5 = 1 (next packet is in bucket 1)
+		 *   - Since 0 != 1, set RS bit on descriptor 34, and record rs_last_id[0] = 34
+		 */
+		uint16_t next_rs_idx = ((tx_last + 1) >> txq->log2_rs_thresh);
+
+		if (next_rs_idx != pkt_rs_idx) {
+			/* Packet crossed into a new bucket - set RS bit on last descriptor */
 			txd->cmd_type_offset_bsz |=
 					rte_cpu_to_le_64(CI_TX_DESC_CMD_RS << CI_TXD_QW1_CMD_S);
 
-			/* Update txq RS bit counters */
-			txq->nb_tx_used = 0;
+			/* Record the last descriptor ID for the bucket we're leaving */
+			txq->rs_last_id[pkt_rs_idx] = tx_last;
 		}
 
 		if (ts_fns != NULL)
diff --git a/drivers/net/intel/i40e/i40e_rxtx.c b/drivers/net/intel/i40e/i40e_rxtx.c
index 0ccd8e8b2a..68074333aa 100644
--- a/drivers/net/intel/i40e/i40e_rxtx.c
+++ b/drivers/net/intel/i40e/i40e_rxtx.c
@@ -24,6 +24,7 @@
 #include <rte_ip.h>
 #include <rte_net.h>
 #include <rte_vect.h>
+#include <rte_bitops.h>
 
 #include "i40e_logs.h"
 #include "base/i40e_prototype.h"
@@ -2267,6 +2268,13 @@ i40e_dev_tx_queue_setup(struct rte_eth_dev *dev,
 			     (int)queue_idx);
 		return I40E_ERR_PARAM;
 	}
+	if (!rte_is_power_of_2(tx_rs_thresh)) {
+		PMD_INIT_LOG(ERR, "tx_rs_thresh must be a power of 2. (tx_rs_thresh=%u port=%d queue=%d)",
+			     (unsigned int)tx_rs_thresh,
+			     (int)dev->data->port_id,
+			     (int)queue_idx);
+		return I40E_ERR_PARAM;
+	}
 	if ((tx_rs_thresh > 1) && (tx_conf->tx_thresh.wthresh != 0)) {
 		PMD_INIT_LOG(ERR, "TX WTHRESH must be set to 0 if "
 			     "tx_rs_thresh is greater than 1. "
@@ -2308,6 +2316,7 @@ i40e_dev_tx_queue_setup(struct rte_eth_dev *dev,
 	txq->mz = tz;
 	txq->nb_tx_desc = nb_desc;
 	txq->tx_rs_thresh = tx_rs_thresh;
+	txq->log2_rs_thresh = rte_log2_u32(tx_rs_thresh);
 	txq->tx_free_thresh = tx_free_thresh;
 	txq->queue_id = queue_idx;
 	txq->reg_idx = reg_idx;
@@ -2331,6 +2340,16 @@ i40e_dev_tx_queue_setup(struct rte_eth_dev *dev,
 		return -ENOMEM;
 	}
 
+	/* Allocate RS last_id tracking array */
+	uint16_t num_rs_buckets = nb_desc / tx_rs_thresh;
+	txq->rs_last_id = rte_zmalloc_socket(NULL, sizeof(txq->rs_last_id[0]) * num_rs_buckets,
+			RTE_CACHE_LINE_SIZE, socket_id);
+	if (txq->rs_last_id == NULL) {
+		i40e_tx_queue_release(txq);
+		PMD_DRV_LOG(ERR, "Failed to allocate memory for RS last_id array");
+		return -ENOMEM;
+	}
+
 	i40e_reset_tx_queue(txq);
 	txq->q_set = TRUE;
 
@@ -2376,6 +2395,7 @@ i40e_tx_queue_release(void *txq)
 
 	ci_txq_release_all_mbufs(q, false);
 	rte_free(q->sw_ring);
+	rte_free(q->rs_last_id);
 	rte_memzone_free(q->mz);
 	rte_free(q);
 }
diff --git a/drivers/net/intel/iavf/iavf_rxtx.c b/drivers/net/intel/iavf/iavf_rxtx.c
index 8810d5bb63..b78818c02b 100644
--- a/drivers/net/intel/iavf/iavf_rxtx.c
+++ b/drivers/net/intel/iavf/iavf_rxtx.c
@@ -25,6 +25,7 @@
 #include <rte_ip.h>
 #include <rte_net.h>
 #include <rte_vect.h>
+#include <rte_bitops.h>
 #include <rte_vxlan.h>
 #include <rte_gtp.h>
 #include <rte_geneve.h>
@@ -208,6 +209,11 @@ check_tx_thresh(uint16_t nb_desc, uint16_t tx_rs_thresh,
 			     tx_rs_thresh, nb_desc);
 		return -EINVAL;
 	}
+	if (!rte_is_power_of_2(tx_rs_thresh)) {
+		PMD_INIT_LOG(ERR, "tx_rs_thresh must be a power of 2. (tx_rs_thresh=%u)",
+			     tx_rs_thresh);
+		return -EINVAL;
+	}
 
 	return 0;
 }
@@ -808,6 +814,7 @@ iavf_dev_tx_queue_setup(struct rte_eth_dev *dev,
 
 	txq->nb_tx_desc = nb_desc;
 	txq->tx_rs_thresh = tx_rs_thresh;
+	txq->log2_rs_thresh = rte_log2_u32(tx_rs_thresh);
 	txq->tx_free_thresh = tx_free_thresh;
 	txq->queue_id = queue_idx;
 	txq->port_id = dev->data->port_id;
@@ -831,6 +838,17 @@ iavf_dev_tx_queue_setup(struct rte_eth_dev *dev,
 		return -ENOMEM;
 	}
 
+	/* Allocate RS last_id tracking array */
+	uint16_t num_rs_buckets = nb_desc / tx_rs_thresh;
+	txq->rs_last_id = rte_zmalloc_socket(NULL, sizeof(txq->rs_last_id[0]) * num_rs_buckets,
+			RTE_CACHE_LINE_SIZE, socket_id);
+	if (txq->rs_last_id == NULL) {
+		PMD_INIT_LOG(ERR, "Failed to allocate memory for RS last_id array");
+		rte_free(txq->sw_ring);
+		rte_free(txq);
+		return -ENOMEM;
+	}
+
 	/* Allocate TX hardware ring descriptors. */
 	ring_size = sizeof(struct ci_tx_desc) * IAVF_MAX_RING_DESC;
 	ring_size = RTE_ALIGN(ring_size, IAVF_DMA_MEM_ALIGN);
@@ -1055,6 +1073,7 @@ iavf_dev_tx_queue_release(struct rte_eth_dev *dev, uint16_t qid)
 
 	ci_txq_release_all_mbufs(q, q->use_ctx);
 	rte_free(q->sw_ring);
+	rte_free(q->rs_last_id);
 	rte_memzone_free(q->mz);
 	rte_free(q);
 }
diff --git a/drivers/net/intel/ice/ice_rxtx.c b/drivers/net/intel/ice/ice_rxtx.c
index c67d6223b3..0c85f9a592 100644
--- a/drivers/net/intel/ice/ice_rxtx.c
+++ b/drivers/net/intel/ice/ice_rxtx.c
@@ -5,6 +5,7 @@
 #include <ethdev_driver.h>
 #include <rte_net.h>
 #include <rte_vect.h>
+#include <rte_bitops.h>
 
 #include "ice_rxtx.h"
 #include "ice_rxtx_vec_common.h"
@@ -1579,6 +1580,13 @@ ice_tx_queue_setup(struct rte_eth_dev *dev,
 			     (int)queue_idx);
 		return -EINVAL;
 	}
+	if (!rte_is_power_of_2(tx_rs_thresh)) {
+		PMD_INIT_LOG(ERR, "tx_rs_thresh must be a power of 2. (tx_rs_thresh=%u port=%d queue=%d)",
+			     (unsigned int)tx_rs_thresh,
+			     (int)dev->data->port_id,
+			     (int)queue_idx);
+		return -EINVAL;
+	}
 	if (tx_rs_thresh > 1 && tx_conf->tx_thresh.wthresh != 0) {
 		PMD_INIT_LOG(ERR, "TX WTHRESH must be set to 0 if "
 			     "tx_rs_thresh is greater than 1. "
@@ -1621,6 +1629,7 @@ ice_tx_queue_setup(struct rte_eth_dev *dev,
 	txq->mz = tz;
 	txq->nb_tx_desc = nb_desc;
 	txq->tx_rs_thresh = tx_rs_thresh;
+	txq->log2_rs_thresh = rte_log2_u32(tx_rs_thresh);
 	txq->tx_free_thresh = tx_free_thresh;
 	txq->queue_id = queue_idx;
 
@@ -1645,6 +1654,16 @@ ice_tx_queue_setup(struct rte_eth_dev *dev,
 		return -ENOMEM;
 	}
 
+	/* Allocate RS last_id tracking array */
+	uint16_t num_rs_buckets = nb_desc / tx_rs_thresh;
+	txq->rs_last_id = rte_zmalloc_socket(NULL, sizeof(txq->rs_last_id[0]) * num_rs_buckets,
+			RTE_CACHE_LINE_SIZE, socket_id);
+	if (txq->rs_last_id == NULL) {
+		ice_tx_queue_release(txq);
+		PMD_INIT_LOG(ERR, "Failed to allocate memory for RS last_id array");
+		return -ENOMEM;
+	}
+
 	if (vsi->type == ICE_VSI_PF && (offloads & RTE_ETH_TX_OFFLOAD_SEND_ON_TIMESTAMP)) {
 		if (hw->phy_model != ICE_PHY_E830) {
 			ice_tx_queue_release(txq);
@@ -1717,6 +1736,7 @@ ice_tx_queue_release(void *txq)
 
 	ci_txq_release_all_mbufs(q, false);
 	rte_free(q->sw_ring);
+	rte_free(q->rs_last_id);
 	if (q->tsq) {
 		rte_memzone_free(q->tsq->ts_mz);
 		rte_free(q->tsq);
diff --git a/drivers/net/intel/idpf/idpf_common_rxtx.c b/drivers/net/intel/idpf/idpf_common_rxtx.c
index 81bc45f6ef..1d123f6350 100644
--- a/drivers/net/intel/idpf/idpf_common_rxtx.c
+++ b/drivers/net/intel/idpf/idpf_common_rxtx.c
@@ -5,6 +5,7 @@
 #include <eal_export.h>
 #include <rte_mbuf_dyn.h>
 #include <rte_errno.h>
+#include <rte_bitops.h>
 
 #include "idpf_common_rxtx.h"
 #include "idpf_common_device.h"
@@ -73,6 +74,11 @@ idpf_qc_tx_thresh_check(uint16_t nb_desc, uint16_t tx_rs_thresh,
 			tx_rs_thresh, nb_desc);
 		return -EINVAL;
 	}
+	if (!rte_is_power_of_2(tx_rs_thresh)) {
+		DRV_LOG(ERR, "tx_rs_thresh must be a power of 2. (tx_rs_thresh=%u)",
+			tx_rs_thresh);
+		return -EINVAL;
+	}
 
 	return 0;
 }
@@ -333,6 +339,7 @@ idpf_qc_tx_queue_release(void *txq)
 	}
 
 	ci_txq_release_all_mbufs(q, false);
+	rte_free(q->rs_last_id);
 	rte_free(q->sw_ring);
 	rte_memzone_free(q->mz);
 	rte_free(q);
diff --git a/drivers/net/intel/idpf/idpf_rxtx.c b/drivers/net/intel/idpf/idpf_rxtx.c
index e974eb44b0..5c2516f556 100644
--- a/drivers/net/intel/idpf/idpf_rxtx.c
+++ b/drivers/net/intel/idpf/idpf_rxtx.c
@@ -437,6 +437,7 @@ idpf_tx_queue_setup(struct rte_eth_dev *dev, uint16_t queue_idx,
 
 	txq->nb_tx_desc = nb_desc;
 	txq->tx_rs_thresh = tx_rs_thresh;
+	txq->log2_rs_thresh = rte_log2_u32(tx_rs_thresh);
 	txq->tx_free_thresh = tx_free_thresh;
 	txq->queue_id = vport->chunks_info.tx_start_qid + queue_idx;
 	txq->port_id = dev->data->port_id;
@@ -468,6 +469,15 @@ idpf_tx_queue_setup(struct rte_eth_dev *dev, uint16_t queue_idx,
 		goto err_sw_ring_alloc;
 	}
 
+	txq->rs_last_id = rte_zmalloc_socket("idpf tx rs_last_id",
+			sizeof(txq->rs_last_id[0]) * (nb_desc >> txq->log2_rs_thresh),
+			RTE_CACHE_LINE_SIZE, socket_id);
+	if (txq->rs_last_id == NULL) {
+		PMD_INIT_LOG(ERR, "Failed to allocate memory for TX RS tracking");
+		ret = -ENOMEM;
+		goto err_rs_last_id_alloc;
+	}
+
 	if (!is_splitq) {
 		txq->ci_tx_ring = mz->addr;
 		idpf_qc_single_tx_queue_reset(txq);
@@ -490,6 +500,9 @@ idpf_tx_queue_setup(struct rte_eth_dev *dev, uint16_t queue_idx,
 	return 0;
 
 err_complq_setup:
+	rte_free(txq->rs_last_id);
+err_rs_last_id_alloc:
+	rte_free(txq->sw_ring);
 err_sw_ring_alloc:
 	idpf_dma_zone_release(mz);
 err_mz_reserve:
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [PATCH v2 26/36] net/ixgbe: use separate array for desc status tracking
  2026-01-13 15:14 ` [PATCH v2 00/36] combine multiple Intel scalar Tx paths Bruce Richardson
                     ` (24 preceding siblings ...)
  2026-01-13 15:14   ` [PATCH v2 25/36] net/intel: use separate array for desc status tracking Bruce Richardson
@ 2026-01-13 15:14   ` Bruce Richardson
  2026-01-13 15:14   ` [PATCH v2 27/36] net/intel: drop unused Tx queue used count Bruce Richardson
                     ` (11 subsequent siblings)
  37 siblings, 0 replies; 274+ messages in thread
From: Bruce Richardson @ 2026-01-13 15:14 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson, Anatoly Burakov, Vladimir Medvedkin

Due to significant differences in the ixgbe transmit descriptors, the
ixgbe driver does not use the common scalar Tx functionality. Update the
driver directly so its use of the rs_last_id array matches that of the
common Tx code.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/net/intel/ixgbe/ixgbe_rxtx.c | 86 +++++++++++++++-------------
 1 file changed, 47 insertions(+), 39 deletions(-)

diff --git a/drivers/net/intel/ixgbe/ixgbe_rxtx.c b/drivers/net/intel/ixgbe/ixgbe_rxtx.c
index a7583c178a..3eeec220fd 100644
--- a/drivers/net/intel/ixgbe/ixgbe_rxtx.c
+++ b/drivers/net/intel/ixgbe/ixgbe_rxtx.c
@@ -43,6 +43,7 @@
 #include <rte_ip.h>
 #include <rte_net.h>
 #include <rte_vect.h>
+#include <rte_bitops.h>
 
 #include "ixgbe_logs.h"
 #include "base/ixgbe_api.h"
@@ -571,57 +572,35 @@ tx_desc_ol_flags_to_cmdtype(uint64_t ol_flags)
 static inline int
 ixgbe_xmit_cleanup(struct ci_tx_queue *txq)
 {
-	struct ci_tx_entry *sw_ring = txq->sw_ring;
 	volatile union ixgbe_adv_tx_desc *txr = txq->ixgbe_tx_ring;
-	uint16_t last_desc_cleaned = txq->last_desc_cleaned;
-	uint16_t nb_tx_desc = txq->nb_tx_desc;
-	uint16_t desc_to_clean_to;
-	uint16_t nb_tx_to_clean;
-	uint32_t status;
+	const uint16_t last_desc_cleaned = txq->last_desc_cleaned;
+	const uint16_t nb_tx_desc = txq->nb_tx_desc;
 
-	/* Determine the last descriptor needing to be cleaned */
-	desc_to_clean_to = (uint16_t)(last_desc_cleaned + txq->tx_rs_thresh);
-	if (desc_to_clean_to >= nb_tx_desc)
-		desc_to_clean_to = (uint16_t)(desc_to_clean_to - nb_tx_desc);
+	const uint16_t rs_idx = (last_desc_cleaned == nb_tx_desc - 1) ?
+			0 :
+			(last_desc_cleaned + 1) >> txq->log2_rs_thresh;
+	uint16_t desc_to_clean_to = (rs_idx << txq->log2_rs_thresh) + (txq->tx_rs_thresh - 1);
 
-	/* Check to make sure the last descriptor to clean is done */
-	desc_to_clean_to = sw_ring[desc_to_clean_to].last_id;
-	status = txr[desc_to_clean_to].wb.status;
+	uint32_t status = txr[txq->rs_last_id[rs_idx]].wb.status;
 	if (!(status & rte_cpu_to_le_32(IXGBE_TXD_STAT_DD))) {
 		PMD_TX_LOG(DEBUG,
 			   "TX descriptor %4u is not done"
 			   "(port=%d queue=%d)",
-			   desc_to_clean_to,
+			   txq->rs_last_id[rs_idx],
 			   txq->port_id, txq->queue_id);
 		/* Failed to clean any descriptors, better luck next time */
 		return -(1);
 	}
 
-	/* Figure out how many descriptors will be cleaned */
-	if (last_desc_cleaned > desc_to_clean_to)
-		nb_tx_to_clean = (uint16_t)((nb_tx_desc - last_desc_cleaned) +
-							desc_to_clean_to);
-	else
-		nb_tx_to_clean = (uint16_t)(desc_to_clean_to -
-						last_desc_cleaned);
-
 	PMD_TX_LOG(DEBUG,
 		   "Cleaning %4u TX descriptors: %4u to %4u "
 		   "(port=%d queue=%d)",
-		   nb_tx_to_clean, last_desc_cleaned, desc_to_clean_to,
+		   txq->tx_rs_thresh, last_desc_cleaned, desc_to_clean_to,
 		   txq->port_id, txq->queue_id);
 
-	/*
-	 * The last descriptor to clean is done, so that means all the
-	 * descriptors from the last descriptor that was cleaned
-	 * up to the last descriptor with the RS bit set
-	 * are done. Only reset the threshold descriptor.
-	 */
-	txr[desc_to_clean_to].wb.status = 0;
-
 	/* Update the txq to reflect the last descriptor that was cleaned */
 	txq->last_desc_cleaned = desc_to_clean_to;
-	txq->nb_tx_free = (uint16_t)(txq->nb_tx_free + nb_tx_to_clean);
+	txq->nb_tx_free = (uint16_t)(txq->nb_tx_free + txq->tx_rs_thresh);
 
 	/* No Error */
 	return 0;
@@ -749,6 +728,9 @@ ixgbe_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 		if (tx_last >= txq->nb_tx_desc)
 			tx_last = (uint16_t) (tx_last - txq->nb_tx_desc);
 
+		/* Track the RS threshold bucket at packet start */
+		uint16_t pkt_rs_idx = (uint16_t)(tx_id >> txq->log2_rs_thresh);
+
 		PMD_TX_LOG(DEBUG, "port_id=%u queue_id=%u pktlen=%u"
 			   " tx_first=%u tx_last=%u",
 			   (unsigned) txq->port_id,
@@ -876,7 +858,6 @@ ixgbe_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 					tx_offload,
 					rte_security_dynfield(tx_pkt));
 
-				txe->last_id = tx_last;
 				tx_id = txe->next_id;
 				txe = txn;
 			}
@@ -922,7 +903,6 @@ ixgbe_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 				rte_cpu_to_le_32(cmd_type_len | slen);
 			txd->read.olinfo_status =
 				rte_cpu_to_le_32(olinfo_status);
-			txe->last_id = tx_last;
 			tx_id = txe->next_id;
 			txe = txn;
 			m_seg = m_seg->next;
@@ -935,8 +915,18 @@ ixgbe_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 		txq->nb_tx_used = (uint16_t)(txq->nb_tx_used + nb_used);
 		txq->nb_tx_free = (uint16_t)(txq->nb_tx_free - nb_used);
 
-		/* Set RS bit only on threshold packets' last descriptor */
-		if (txq->nb_tx_used >= txq->tx_rs_thresh) {
+		/*
+		 * Check if packet crosses into a new RS threshold bucket.
+		 * The RS bit is set on the last descriptor when we move from one bucket to another.
+		 * For example, with tx_rs_thresh=32 and a 5-descriptor packet using slots 30-34:
+		 *   - pkt_rs_idx = 30 >> 5 = 0 (started in bucket 0)
+		 *   - tx_last = 34, so 35 >> 5 = 1 (next packet is in bucket 1)
+		 *   - Since 0 != 1, set RS bit on descriptor 34, and record rs_last_id[0] = 34
+		 */
+		uint16_t next_rs_idx = ((tx_last + 1) >> txq->log2_rs_thresh);
+
+		if (next_rs_idx != pkt_rs_idx) {
+			/* Packet crossed into a new bucket - set RS bit on last descriptor */
 			PMD_TX_LOG(DEBUG,
 				   "Setting RS bit on TXD id="
 				   "%4u (port=%d queue=%d)",
@@ -944,9 +934,8 @@ ixgbe_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 
 			cmd_type_len |= IXGBE_TXD_CMD_RS;
 
-			/* Update txq RS bit counters */
-			txq->nb_tx_used = 0;
-			txp = NULL;
+			/* Record the last descriptor ID for the bucket we're leaving */
+			txq->rs_last_id[pkt_rs_idx] = tx_last;
 		} else
 			txp = txd;
 
@@ -2521,6 +2510,7 @@ ixgbe_tx_queue_release(struct ci_tx_queue *txq)
 	if (txq != NULL && txq->ops != NULL) {
 		ci_txq_release_all_mbufs(txq, false);
 		txq->ops->free_swring(txq);
+		rte_free(txq->rs_last_id);
 		rte_memzone_free(txq->mz);
 		rte_free(txq);
 	}
@@ -2825,6 +2815,13 @@ ixgbe_dev_tx_queue_setup(struct rte_eth_dev *dev,
 			     (int)dev->data->port_id, (int)queue_idx);
 		return -(EINVAL);
 	}
+	if (!rte_is_power_of_2(tx_rs_thresh)) {
+		PMD_INIT_LOG(ERR, "tx_rs_thresh must be a power of 2. (tx_rs_thresh=%u port=%d queue=%d)",
+			     (unsigned int)tx_rs_thresh,
+			     (int)dev->data->port_id,
+			     (int)queue_idx);
+		return -(EINVAL);
+	}
 
 	/*
 	 * If rs_bit_thresh is greater than 1, then TX WTHRESH should be
@@ -2870,6 +2867,7 @@ ixgbe_dev_tx_queue_setup(struct rte_eth_dev *dev,
 	txq->mz = tz;
 	txq->nb_tx_desc = nb_desc;
 	txq->tx_rs_thresh = tx_rs_thresh;
+	txq->log2_rs_thresh = rte_log2_u32(tx_rs_thresh);
 	txq->tx_free_thresh = tx_free_thresh;
 	txq->pthresh = tx_conf->tx_thresh.pthresh;
 	txq->hthresh = tx_conf->tx_thresh.hthresh;
@@ -2911,6 +2909,16 @@ ixgbe_dev_tx_queue_setup(struct rte_eth_dev *dev,
 	PMD_INIT_LOG(DEBUG, "sw_ring=%p hw_ring=%p dma_addr=0x%"PRIx64,
 		     txq->sw_ring, txq->ixgbe_tx_ring, txq->tx_ring_dma);
 
+	/* Allocate RS last_id tracking array */
+	uint16_t num_rs_buckets = nb_desc / tx_rs_thresh;
+	txq->rs_last_id = rte_zmalloc_socket(NULL, sizeof(txq->rs_last_id[0]) * num_rs_buckets,
+			RTE_CACHE_LINE_SIZE, socket_id);
+	if (txq->rs_last_id == NULL) {
+		ixgbe_tx_queue_release(txq);
+		PMD_DRV_LOG(ERR, "Failed to allocate memory for RS last_id array");
+		return -ENOMEM;
+	}
+
 	/* set up vector or scalar TX function as appropriate */
 	ixgbe_set_tx_function(dev, txq);
 
-- 
2.51.0
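The RS-bucket check in the patch above comes with a worked example in the code comment (tx_rs_thresh=32, a 5-descriptor packet in slots 30-34). That arithmetic can be checked in isolation; the following is a minimal standalone sketch with a hypothetical helper name, not code from the patch:

```c
/* Standalone illustration (hypothetical helper, not part of the patch) of
 * the RS-bucket check: with tx_rs_thresh a power of two, a packet gets the
 * RS bit on its last descriptor only when it crosses into the next
 * threshold bucket. */
#include <assert.h>
#include <stdint.h>

/* Return nonzero when a packet occupying descriptors tx_first..tx_last
 * ends in a different bucket than the one it started in. */
static int
pkt_crosses_bucket(uint16_t tx_first, uint16_t tx_last,
		   uint16_t log2_rs_thresh)
{
	uint16_t pkt_rs_idx = tx_first >> log2_rs_thresh;
	uint16_t next_rs_idx = (uint16_t)(tx_last + 1) >> log2_rs_thresh;

	return next_rs_idx != pkt_rs_idx;
}
```

When the check fires, the patch records tx_last in rs_last_id[] for the bucket being left, so cleanup can later poll exactly that descriptor for DD.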


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [PATCH v2 27/36] net/intel: drop unused Tx queue used count
  2026-01-13 15:14 ` [PATCH v2 00/36] combine multiple Intel scalar Tx paths Bruce Richardson
                     ` (25 preceding siblings ...)
  2026-01-13 15:14   ` [PATCH v2 26/36] net/ixgbe: " Bruce Richardson
@ 2026-01-13 15:14   ` Bruce Richardson
  2026-01-13 15:14   ` [PATCH v2 28/36] net/intel: remove index for tracking end of packet Bruce Richardson
                     ` (10 subsequent siblings)
  37 siblings, 0 replies; 274+ messages in thread
From: Bruce Richardson @ 2026-01-13 15:14 UTC (permalink / raw)
  To: dev
  Cc: Bruce Richardson, Vladimir Medvedkin, Anatoly Burakov,
	Jingjing Wu, Praveen Shetty

Since drivers now track the setting of the RS bit based on fixed
thresholds rather than after a fixed number of descriptors, we no longer
need to track the number of descriptors used from one call to another.
Therefore we can remove the nb_tx_used value in the Tx queue structure.

This value was still being used inside the IDPF splitq scalar code.
However, the idpf driver-specific section of the Tx queue structure also
had an rs_compl_count value that was only used by the vector code paths,
so we can reuse it to replace the old nb_tx_used value in the scalar
path.
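The splitq-side change amounts to a simple running counter that requests a completion (RE) once enough descriptors have been queued. A minimal sketch of that logic, with assumed names (the patch itself uses the literal 32 rather than a named threshold):

```c
/* Hypothetical sketch of the descriptor-count threshold in the idpf
 * splitq path: accumulate queued descriptors into rs_compl_count and
 * request a completion (RE) once at least 32 have been queued since the
 * last one. IDPF_RE_THRESH is an assumed name for illustration. */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define IDPF_RE_THRESH 32

struct txq_state {
	uint16_t rs_compl_count;
};

/* Returns true when the RE bit should be set on this packet's last
 * descriptor; resets the counter when it fires. */
static bool
track_re_threshold(struct txq_state *txq, uint16_t nb_used)
{
	txq->rs_compl_count += nb_used;
	if (txq->rs_compl_count >= IDPF_RE_THRESH) {
		txq->rs_compl_count = 0;
		return true;
	}
	return false;
}
```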

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/net/intel/common/tx.h                   | 1 -
 drivers/net/intel/common/tx_scalar_fns.h        | 1 -
 drivers/net/intel/i40e/i40e_rxtx.c              | 1 -
 drivers/net/intel/iavf/iavf_rxtx.c              | 1 -
 drivers/net/intel/ice/ice_dcf_ethdev.c          | 1 -
 drivers/net/intel/ice/ice_rxtx.c                | 1 -
 drivers/net/intel/idpf/idpf_common_rxtx.c       | 8 +++-----
 drivers/net/intel/ixgbe/ixgbe_rxtx.c            | 8 --------
 drivers/net/intel/ixgbe/ixgbe_rxtx_vec_common.c | 1 -
 9 files changed, 3 insertions(+), 20 deletions(-)

diff --git a/drivers/net/intel/common/tx.h b/drivers/net/intel/common/tx.h
index 9b3f8385e6..3976766f06 100644
--- a/drivers/net/intel/common/tx.h
+++ b/drivers/net/intel/common/tx.h
@@ -130,7 +130,6 @@ struct ci_tx_queue {
 	uint16_t *rs_last_id;
 	uint16_t nb_tx_desc;           /* number of TX descriptors */
 	uint16_t tx_tail; /* current value of tail register */
-	uint16_t nb_tx_used; /* number of TX desc used since RS bit set */
 	/* index to last TX descriptor to have been cleaned */
 	uint16_t last_desc_cleaned;
 	/* Total number of TX descriptors ready to be allocated. */
diff --git a/drivers/net/intel/common/tx_scalar_fns.h b/drivers/net/intel/common/tx_scalar_fns.h
index 257772a73a..3a65797c5f 100644
--- a/drivers/net/intel/common/tx_scalar_fns.h
+++ b/drivers/net/intel/common/tx_scalar_fns.h
@@ -406,7 +406,6 @@ ci_xmit_pkts(struct ci_tx_queue *txq,
 			m_seg = m_seg->next;
 		} while (m_seg);
 end_pkt:
-		txq->nb_tx_used = (uint16_t)(txq->nb_tx_used + nb_used);
 		txq->nb_tx_free = (uint16_t)(txq->nb_tx_free - nb_used);
 
 		/* Check if packet crosses into a new RS threshold bucket.
diff --git a/drivers/net/intel/i40e/i40e_rxtx.c b/drivers/net/intel/i40e/i40e_rxtx.c
index 68074333aa..812a9a7a83 100644
--- a/drivers/net/intel/i40e/i40e_rxtx.c
+++ b/drivers/net/intel/i40e/i40e_rxtx.c
@@ -2630,7 +2630,6 @@ i40e_reset_tx_queue(struct ci_tx_queue *txq)
 	txq->tx_next_rs = (uint16_t)(txq->tx_rs_thresh - 1);
 
 	txq->tx_tail = 0;
-	txq->nb_tx_used = 0;
 
 	txq->last_desc_cleaned = (uint16_t)(txq->nb_tx_desc - 1);
 	txq->nb_tx_free = (uint16_t)(txq->nb_tx_desc - 1);
diff --git a/drivers/net/intel/iavf/iavf_rxtx.c b/drivers/net/intel/iavf/iavf_rxtx.c
index b78818c02b..5227227b5a 100644
--- a/drivers/net/intel/iavf/iavf_rxtx.c
+++ b/drivers/net/intel/iavf/iavf_rxtx.c
@@ -302,7 +302,6 @@ reset_tx_queue(struct ci_tx_queue *txq)
 	}
 
 	txq->tx_tail = 0;
-	txq->nb_tx_used = 0;
 
 	txq->last_desc_cleaned = txq->nb_tx_desc - 1;
 	txq->nb_tx_free = txq->nb_tx_desc - 1;
diff --git a/drivers/net/intel/ice/ice_dcf_ethdev.c b/drivers/net/intel/ice/ice_dcf_ethdev.c
index 4ceecc15c6..02a23629d6 100644
--- a/drivers/net/intel/ice/ice_dcf_ethdev.c
+++ b/drivers/net/intel/ice/ice_dcf_ethdev.c
@@ -414,7 +414,6 @@ reset_tx_queue(struct ci_tx_queue *txq)
 	}
 
 	txq->tx_tail = 0;
-	txq->nb_tx_used = 0;
 
 	txq->last_desc_cleaned = txq->nb_tx_desc - 1;
 	txq->nb_tx_free = txq->nb_tx_desc - 1;
diff --git a/drivers/net/intel/ice/ice_rxtx.c b/drivers/net/intel/ice/ice_rxtx.c
index 0c85f9a592..3c1e1d762e 100644
--- a/drivers/net/intel/ice/ice_rxtx.c
+++ b/drivers/net/intel/ice/ice_rxtx.c
@@ -1130,7 +1130,6 @@ ice_reset_tx_queue(struct ci_tx_queue *txq)
 	txq->tx_next_rs = (uint16_t)(txq->tx_rs_thresh - 1);
 
 	txq->tx_tail = 0;
-	txq->nb_tx_used = 0;
 
 	txq->last_desc_cleaned = (uint16_t)(txq->nb_tx_desc - 1);
 	txq->nb_tx_free = (uint16_t)(txq->nb_tx_desc - 1);
diff --git a/drivers/net/intel/idpf/idpf_common_rxtx.c b/drivers/net/intel/idpf/idpf_common_rxtx.c
index 1d123f6350..b36e29c8d2 100644
--- a/drivers/net/intel/idpf/idpf_common_rxtx.c
+++ b/drivers/net/intel/idpf/idpf_common_rxtx.c
@@ -224,7 +224,6 @@ idpf_qc_split_tx_descq_reset(struct ci_tx_queue *txq)
 	}
 
 	txq->tx_tail = 0;
-	txq->nb_tx_used = 0;
 
 	/* Use this as next to clean for split desc queue */
 	txq->last_desc_cleaned = 0;
@@ -284,7 +283,6 @@ idpf_qc_single_tx_queue_reset(struct ci_tx_queue *txq)
 	}
 
 	txq->tx_tail = 0;
-	txq->nb_tx_used = 0;
 
 	txq->last_desc_cleaned = txq->nb_tx_desc - 1;
 	txq->nb_tx_free = txq->nb_tx_desc - 1;
@@ -993,12 +991,12 @@ idpf_dp_splitq_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 		txd->qw1.cmd_dtype |= IDPF_TXD_FLEX_FLOW_CMD_EOP;
 
 		txq->nb_tx_free = (uint16_t)(txq->nb_tx_free - nb_used);
-		txq->nb_tx_used = (uint16_t)(txq->nb_tx_used + nb_used);
+		txq->rs_compl_count += nb_used;
 
-		if (txq->nb_tx_used >= 32) {
+		if (txq->rs_compl_count >= 32) {
 			txd->qw1.cmd_dtype |= IDPF_TXD_FLEX_FLOW_CMD_RE;
 			/* Update txq RE bit counters */
-			txq->nb_tx_used = 0;
+			txq->rs_compl_count = 0;
 		}
 	}
 
diff --git a/drivers/net/intel/ixgbe/ixgbe_rxtx.c b/drivers/net/intel/ixgbe/ixgbe_rxtx.c
index 3eeec220fd..6b8ff20f61 100644
--- a/drivers/net/intel/ixgbe/ixgbe_rxtx.c
+++ b/drivers/net/intel/ixgbe/ixgbe_rxtx.c
@@ -708,12 +708,6 @@ ixgbe_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 		 */
 		nb_used = (uint16_t)(tx_pkt->nb_segs + new_ctx);
 
-		if (txp != NULL &&
-				nb_used + txq->nb_tx_used >= txq->tx_rs_thresh)
-			/* set RS on the previous packet in the burst */
-			txp->read.cmd_type_len |=
-				rte_cpu_to_le_32(IXGBE_TXD_CMD_RS);
-
 		/*
 		 * The number of descriptors that must be allocated for a
 		 * packet is the number of segments of that packet, plus 1
@@ -912,7 +906,6 @@ ixgbe_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 		 * The last packet data descriptor needs End Of Packet (EOP)
 		 */
 		cmd_type_len |= IXGBE_TXD_CMD_EOP;
-		txq->nb_tx_used = (uint16_t)(txq->nb_tx_used + nb_used);
 		txq->nb_tx_free = (uint16_t)(txq->nb_tx_free - nb_used);
 
 		/*
@@ -2551,7 +2544,6 @@ ixgbe_reset_tx_queue(struct ci_tx_queue *txq)
 	txq->tx_next_rs = (uint16_t)(txq->tx_rs_thresh - 1);
 
 	txq->tx_tail = 0;
-	txq->nb_tx_used = 0;
 	/*
 	 * Always allow 1 descriptor to be un-allocated to avoid
 	 * a H/W race condition
diff --git a/drivers/net/intel/ixgbe/ixgbe_rxtx_vec_common.c b/drivers/net/intel/ixgbe/ixgbe_rxtx_vec_common.c
index eb7c79eaf9..63c7cb50d3 100644
--- a/drivers/net/intel/ixgbe/ixgbe_rxtx_vec_common.c
+++ b/drivers/net/intel/ixgbe/ixgbe_rxtx_vec_common.c
@@ -47,7 +47,6 @@ ixgbe_reset_tx_queue_vec(struct ci_tx_queue *txq)
 	txq->tx_next_rs = (uint16_t)(txq->tx_rs_thresh - 1);
 
 	txq->tx_tail = 0;
-	txq->nb_tx_used = 0;
 	/*
 	 * Always allow 1 descriptor to be un-allocated to avoid
 	 * a H/W race condition
-- 
2.51.0



* [PATCH v2 28/36] net/intel: remove index for tracking end of packet
  2026-01-13 15:14 ` [PATCH v2 00/36] combine multiple Intel scalar Tx paths Bruce Richardson
                     ` (26 preceding siblings ...)
  2026-01-13 15:14   ` [PATCH v2 27/36] net/intel: drop unused Tx queue used count Bruce Richardson
@ 2026-01-13 15:14   ` Bruce Richardson
  2026-01-13 15:14   ` [PATCH v2 29/36] net/intel: merge ring writes in simple Tx for ice and i40e Bruce Richardson
                     ` (9 subsequent siblings)
  37 siblings, 0 replies; 274+ messages in thread
From: Bruce Richardson @ 2026-01-13 15:14 UTC (permalink / raw)
  To: dev
  Cc: Bruce Richardson, Vladimir Medvedkin, Anatoly Burakov,
	Jingjing Wu, Praveen Shetty

The last_id value in each sw_ring entry was no longer used in the
datapath, so remove it and its initialization. In the functions
releasing packets back, rather than relying on last_id to identify the
end of a packet, check instead for the mbuf's next pointer being NULL.
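The cleanup loops in the diff below all follow the same shape; as a self-contained illustration (simplified hypothetical types, not the real driver structs), counting packets by the next-pointer check looks like this:

```c
/* Simplified illustration of the reworked done-cleanup loop: a packet is
 * counted when its stored segment has next == NULL, replacing the old
 * last_id == tx_id comparison. Types here are hypothetical stand-ins for
 * rte_mbuf / ci_tx_entry. */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct mbuf {
	struct mbuf *next; /* NULL on the last segment of a packet */
};

struct tx_entry {
	struct mbuf *mbuf;
	uint16_t next_id; /* index of next descriptor in the ring */
};

/* Walk the software ring from tx_id towards tx_last, releasing segments
 * and counting whole packets, up to free_cnt packets. */
static unsigned int
count_released_pkts(struct tx_entry *ring, uint16_t tx_id, uint16_t tx_last,
		    unsigned int free_cnt)
{
	unsigned int pkt_cnt = 0;

	while (pkt_cnt < free_cnt && tx_id != tx_last) {
		if (ring[tx_id].mbuf != NULL) {
			/* last segment in the packet: count it */
			pkt_cnt += (ring[tx_id].mbuf->next == NULL) ? 1 : 0;
			ring[tx_id].mbuf = NULL; /* real code frees the seg */
		}
		tx_id = ring[tx_id].next_id;
	}
	return pkt_cnt;
}
```

Note the ordering: the next pointer must be read before the segment is freed, which is why the patch moves the pkt_cnt update above rte_pktmbuf_free_seg().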

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/net/intel/common/tx.h             | 1 -
 drivers/net/intel/i40e/i40e_rxtx.c        | 8 +++-----
 drivers/net/intel/iavf/iavf_rxtx.c        | 9 ++++-----
 drivers/net/intel/ice/ice_dcf_ethdev.c    | 1 -
 drivers/net/intel/ice/ice_rxtx.c          | 9 ++++-----
 drivers/net/intel/idpf/idpf_common_rxtx.c | 2 --
 drivers/net/intel/ixgbe/ixgbe_rxtx.c      | 9 ++++-----
 7 files changed, 15 insertions(+), 24 deletions(-)

diff --git a/drivers/net/intel/common/tx.h b/drivers/net/intel/common/tx.h
index 3976766f06..2d3626cbda 100644
--- a/drivers/net/intel/common/tx.h
+++ b/drivers/net/intel/common/tx.h
@@ -104,7 +104,6 @@ struct ci_tx_queue;
 struct ci_tx_entry {
 	struct rte_mbuf *mbuf; /* mbuf associated with TX desc, if any. */
 	uint16_t next_id; /* Index of next descriptor in ring. */
-	uint16_t last_id; /* Index of last scattered descriptor. */
 };
 
 /**
diff --git a/drivers/net/intel/i40e/i40e_rxtx.c b/drivers/net/intel/i40e/i40e_rxtx.c
index 812a9a7a83..985d84c0f6 100644
--- a/drivers/net/intel/i40e/i40e_rxtx.c
+++ b/drivers/net/intel/i40e/i40e_rxtx.c
@@ -2521,14 +2521,13 @@ i40e_tx_done_cleanup_full(struct ci_tx_queue *txq,
 			pkt_cnt < free_cnt &&
 			tx_id != tx_last; i++) {
 			if (swr_ring[tx_id].mbuf != NULL) {
-				rte_pktmbuf_free_seg(swr_ring[tx_id].mbuf);
-				swr_ring[tx_id].mbuf = NULL;
-
 				/*
 				 * last segment in the packet,
 				 * increment packet count
 				 */
-				pkt_cnt += (swr_ring[tx_id].last_id == tx_id);
+				pkt_cnt += (swr_ring[tx_id].mbuf->next == NULL) ? 1 : 0;
+				rte_pktmbuf_free_seg(swr_ring[tx_id].mbuf);
+				swr_ring[tx_id].mbuf = NULL;
 			}
 
 			tx_id = swr_ring[tx_id].next_id;
@@ -2621,7 +2620,6 @@ i40e_reset_tx_queue(struct ci_tx_queue *txq)
 		txd->cmd_type_offset_bsz =
 			rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE);
 		txe[i].mbuf =  NULL;
-		txe[i].last_id = i;
 		txe[prev].next_id = i;
 		prev = i;
 	}
diff --git a/drivers/net/intel/iavf/iavf_rxtx.c b/drivers/net/intel/iavf/iavf_rxtx.c
index 5227227b5a..5a4fea03ac 100644
--- a/drivers/net/intel/iavf/iavf_rxtx.c
+++ b/drivers/net/intel/iavf/iavf_rxtx.c
@@ -296,7 +296,6 @@ reset_tx_queue(struct ci_tx_queue *txq)
 		txq->ci_tx_ring[i].cmd_type_offset_bsz =
 			rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE);
 		txe[i].mbuf =  NULL;
-		txe[i].last_id = i;
 		txe[prev].next_id = i;
 		prev = i;
 	}
@@ -3965,14 +3964,14 @@ iavf_tx_done_cleanup_full(struct ci_tx_queue *txq,
 	while (pkt_cnt < free_cnt) {
 		do {
 			if (swr_ring[tx_id].mbuf != NULL) {
-				rte_pktmbuf_free_seg(swr_ring[tx_id].mbuf);
-				swr_ring[tx_id].mbuf = NULL;
-
 				/*
 				 * last segment in the packet,
 				 * increment packet count
 				 */
-				pkt_cnt += (swr_ring[tx_id].last_id == tx_id);
+				pkt_cnt += (swr_ring[tx_id].mbuf->next == NULL) ? 1 : 0;
+				rte_pktmbuf_free_seg(swr_ring[tx_id].mbuf);
+				swr_ring[tx_id].mbuf = NULL;
+
 			}
 
 			tx_id = swr_ring[tx_id].next_id;
diff --git a/drivers/net/intel/ice/ice_dcf_ethdev.c b/drivers/net/intel/ice/ice_dcf_ethdev.c
index 02a23629d6..abd7875e7b 100644
--- a/drivers/net/intel/ice/ice_dcf_ethdev.c
+++ b/drivers/net/intel/ice/ice_dcf_ethdev.c
@@ -408,7 +408,6 @@ reset_tx_queue(struct ci_tx_queue *txq)
 		txq->ci_tx_ring[i].cmd_type_offset_bsz =
 			rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE);
 		txe[i].mbuf =  NULL;
-		txe[i].last_id = i;
 		txe[prev].next_id = i;
 		prev = i;
 	}
diff --git a/drivers/net/intel/ice/ice_rxtx.c b/drivers/net/intel/ice/ice_rxtx.c
index 3c1e1d762e..7261c07265 100644
--- a/drivers/net/intel/ice/ice_rxtx.c
+++ b/drivers/net/intel/ice/ice_rxtx.c
@@ -1121,7 +1121,6 @@ ice_reset_tx_queue(struct ci_tx_queue *txq)
 		txd->cmd_type_offset_bsz =
 			rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE);
 		txe[i].mbuf =  NULL;
-		txe[i].last_id = i;
 		txe[prev].next_id = i;
 		prev = i;
 	}
@@ -3186,14 +3185,14 @@ ice_tx_done_cleanup_full(struct ci_tx_queue *txq,
 			pkt_cnt < free_cnt &&
 			tx_id != tx_last; i++) {
 			if (swr_ring[tx_id].mbuf != NULL) {
-				rte_pktmbuf_free_seg(swr_ring[tx_id].mbuf);
-				swr_ring[tx_id].mbuf = NULL;
-
 				/*
 				 * last segment in the packet,
 				 * increment packet count
 				 */
-				pkt_cnt += (swr_ring[tx_id].last_id == tx_id);
+				pkt_cnt += swr_ring[tx_id].mbuf->next == NULL ? 1 : 0;
+				rte_pktmbuf_free_seg(swr_ring[tx_id].mbuf);
+				swr_ring[tx_id].mbuf = NULL;
+
 			}
 
 			tx_id = swr_ring[tx_id].next_id;
diff --git a/drivers/net/intel/idpf/idpf_common_rxtx.c b/drivers/net/intel/idpf/idpf_common_rxtx.c
index b36e29c8d2..781310e564 100644
--- a/drivers/net/intel/idpf/idpf_common_rxtx.c
+++ b/drivers/net/intel/idpf/idpf_common_rxtx.c
@@ -218,7 +218,6 @@ idpf_qc_split_tx_descq_reset(struct ci_tx_queue *txq)
 	prev = (uint16_t)(txq->sw_nb_desc - 1);
 	for (i = 0; i < txq->sw_nb_desc; i++) {
 		txe[i].mbuf = NULL;
-		txe[i].last_id = i;
 		txe[prev].next_id = i;
 		prev = i;
 	}
@@ -277,7 +276,6 @@ idpf_qc_single_tx_queue_reset(struct ci_tx_queue *txq)
 		txq->ci_tx_ring[i].cmd_type_offset_bsz =
 			rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE);
 		txe[i].mbuf =  NULL;
-		txe[i].last_id = i;
 		txe[prev].next_id = i;
 		prev = i;
 	}
diff --git a/drivers/net/intel/ixgbe/ixgbe_rxtx.c b/drivers/net/intel/ixgbe/ixgbe_rxtx.c
index 6b8ff20f61..5f4bee4f2f 100644
--- a/drivers/net/intel/ixgbe/ixgbe_rxtx.c
+++ b/drivers/net/intel/ixgbe/ixgbe_rxtx.c
@@ -2407,14 +2407,14 @@ ixgbe_tx_done_cleanup_full(struct ci_tx_queue *txq, uint32_t free_cnt)
 			pkt_cnt < free_cnt &&
 			tx_id != tx_last; i++) {
 			if (swr_ring[tx_id].mbuf != NULL) {
-				rte_pktmbuf_free_seg(swr_ring[tx_id].mbuf);
-				swr_ring[tx_id].mbuf = NULL;
-
 				/*
 				 * last segment in the packet,
 				 * increment packet count
 				 */
-				pkt_cnt += (swr_ring[tx_id].last_id == tx_id);
+				pkt_cnt += swr_ring[tx_id].mbuf->next == NULL ? 1 : 0;
+				rte_pktmbuf_free_seg(swr_ring[tx_id].mbuf);
+				swr_ring[tx_id].mbuf = NULL;
+
 			}
 
 			tx_id = swr_ring[tx_id].next_id;
@@ -2535,7 +2535,6 @@ ixgbe_reset_tx_queue(struct ci_tx_queue *txq)
 
 		txd->wb.status = rte_cpu_to_le_32(IXGBE_TXD_STAT_DD);
 		txe[i].mbuf = NULL;
-		txe[i].last_id = i;
 		txe[prev].next_id = i;
 		prev = i;
 	}
-- 
2.51.0



* [PATCH v2 29/36] net/intel: merge ring writes in simple Tx for ice and i40e
  2026-01-13 15:14 ` [PATCH v2 00/36] combine multiple Intel scalar Tx paths Bruce Richardson
                     ` (27 preceding siblings ...)
  2026-01-13 15:14   ` [PATCH v2 28/36] net/intel: remove index for tracking end of packet Bruce Richardson
@ 2026-01-13 15:14   ` Bruce Richardson
  2026-01-13 15:14   ` [PATCH v2 30/36] net/intel: consolidate ice and i40e buffer free function Bruce Richardson
                     ` (8 subsequent siblings)
  37 siblings, 0 replies; 274+ messages in thread
From: Bruce Richardson @ 2026-01-13 15:14 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson, Anatoly Burakov

The ice and i40e drivers have identical code for writing ring entries
in the simple Tx path, so merge that descriptor-writing code into the
common scalar Tx functions header.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/net/intel/common/tx.h                 |  6 ++
 drivers/net/intel/common/tx_scalar_fns.h      | 60 ++++++++++++++
 drivers/net/intel/i40e/i40e_rxtx.c            | 79 +------------------
 drivers/net/intel/i40e/i40e_rxtx.h            |  3 -
 .../net/intel/i40e/i40e_rxtx_vec_altivec.c    |  4 +-
 drivers/net/intel/i40e/i40e_rxtx_vec_avx2.c   |  4 +-
 drivers/net/intel/i40e/i40e_rxtx_vec_avx512.c |  4 +-
 drivers/net/intel/i40e/i40e_rxtx_vec_neon.c   |  4 +-
 drivers/net/intel/ice/ice_rxtx.c              | 69 +---------------
 drivers/net/intel/ice/ice_rxtx.h              |  2 -
 drivers/net/intel/ice/ice_rxtx_vec_avx2.c     |  4 +-
 drivers/net/intel/ice/ice_rxtx_vec_avx512.c   |  4 +-
 12 files changed, 86 insertions(+), 157 deletions(-)

diff --git a/drivers/net/intel/common/tx.h b/drivers/net/intel/common/tx.h
index 2d3626cbda..502b3f2032 100644
--- a/drivers/net/intel/common/tx.h
+++ b/drivers/net/intel/common/tx.h
@@ -63,6 +63,12 @@ enum ci_tx_l2tag1_field {
 /* Common maximum data per TX descriptor */
 #define CI_MAX_DATA_PER_TXD     (CI_TXD_QW1_TX_BUF_SZ_M >> CI_TXD_QW1_TX_BUF_SZ_S)
 
+/* Common TX maximum burst size for chunked transmission in simple paths */
+#define CI_TX_MAX_BURST 32
+
+/* Common TX descriptor command flags for simple transmit */
+#define CI_TX_DESC_CMD_DEFAULT (CI_TX_DESC_CMD_ICRC | CI_TX_DESC_CMD_EOP)
+
 /* Checksum offload mask to identify packets requesting offload */
 #define CI_TX_CKSUM_OFFLOAD_MASK (RTE_MBUF_F_TX_IP_CKSUM |		 \
 				   RTE_MBUF_F_TX_L4_MASK |		 \
diff --git a/drivers/net/intel/common/tx_scalar_fns.h b/drivers/net/intel/common/tx_scalar_fns.h
index 3a65797c5f..0d64a63e16 100644
--- a/drivers/net/intel/common/tx_scalar_fns.h
+++ b/drivers/net/intel/common/tx_scalar_fns.h
@@ -12,6 +12,66 @@
 /* depends on common Tx definitions. */
 #include "tx.h"
 
+/* Populate 4 descriptors with data from 4 mbufs */
+static inline void
+ci_tx_fill_hw_ring_tx4(volatile struct ci_tx_desc *txdp, struct rte_mbuf **pkts)
+{
+	uint64_t dma_addr;
+	uint32_t i;
+
+	for (i = 0; i < 4; i++, txdp++, pkts++) {
+		dma_addr = rte_mbuf_data_iova(*pkts);
+		txdp->buffer_addr = rte_cpu_to_le_64(dma_addr);
+		txdp->cmd_type_offset_bsz =
+			rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DATA |
+				((uint64_t)CI_TX_DESC_CMD_DEFAULT << CI_TXD_QW1_CMD_S) |
+				((uint64_t)(*pkts)->data_len << CI_TXD_QW1_TX_BUF_SZ_S));
+	}
+}
+
+/* Populate 1 descriptor with data from 1 mbuf */
+static inline void
+ci_tx_fill_hw_ring_tx1(volatile struct ci_tx_desc *txdp, struct rte_mbuf **pkts)
+{
+	uint64_t dma_addr;
+
+	dma_addr = rte_mbuf_data_iova(*pkts);
+	txdp->buffer_addr = rte_cpu_to_le_64(dma_addr);
+	txdp->cmd_type_offset_bsz =
+		rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DATA |
+			((uint64_t)CI_TX_DESC_CMD_DEFAULT << CI_TXD_QW1_CMD_S) |
+			((uint64_t)(*pkts)->data_len << CI_TXD_QW1_TX_BUF_SZ_S));
+}
+
+/* Fill hardware descriptor ring with mbuf data */
+static inline void
+ci_tx_fill_hw_ring(struct ci_tx_queue *txq, struct rte_mbuf **pkts,
+		   uint16_t nb_pkts)
+{
+	volatile struct ci_tx_desc *txdp = &txq->ci_tx_ring[txq->tx_tail];
+	struct ci_tx_entry *txep = &txq->sw_ring[txq->tx_tail];
+	const int N_PER_LOOP = 4;
+	const int N_PER_LOOP_MASK = N_PER_LOOP - 1;
+	int mainpart, leftover;
+	int i, j;
+
+	mainpart = nb_pkts & ((uint32_t)~N_PER_LOOP_MASK);
+	leftover = nb_pkts & ((uint32_t)N_PER_LOOP_MASK);
+	for (i = 0; i < mainpart; i += N_PER_LOOP) {
+		for (j = 0; j < N_PER_LOOP; ++j)
+			(txep + i + j)->mbuf = *(pkts + i + j);
+		ci_tx_fill_hw_ring_tx4(txdp + i, pkts + i);
+	}
+
+	if (unlikely(leftover > 0)) {
+		for (i = 0; i < leftover; ++i) {
+			(txep + mainpart + i)->mbuf = *(pkts + mainpart + i);
+			ci_tx_fill_hw_ring_tx1(txdp + mainpart + i,
+					       pkts + mainpart + i);
+		}
+	}
+}
+
 /*
  * Common transmit descriptor cleanup function for Intel drivers.
  * Used by ice, i40e, iavf, and idpf drivers.
diff --git a/drivers/net/intel/i40e/i40e_rxtx.c b/drivers/net/intel/i40e/i40e_rxtx.c
index 985d84c0f6..cb91eeeab2 100644
--- a/drivers/net/intel/i40e/i40e_rxtx.c
+++ b/drivers/net/intel/i40e/i40e_rxtx.c
@@ -311,19 +311,6 @@ i40e_parse_tunneling_params(uint64_t ol_flags,
 		*cd_tunneling |= I40E_TXD_CTX_QW0_L4T_CS_MASK;
 }
 
-/* Construct the tx flags */
-static inline uint64_t
-i40e_build_ctob(uint32_t td_cmd,
-		uint32_t td_offset,
-		unsigned int size,
-		uint32_t td_tag)
-{
-	return rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DATA |
-			((uint64_t)td_cmd  << CI_TXD_QW1_CMD_S) |
-			((uint64_t)td_offset << CI_TXD_QW1_OFFSET_S) |
-			((uint64_t)size  << CI_TXD_QW1_TX_BUF_SZ_S) |
-			((uint64_t)td_tag  << CI_TXD_QW1_L2TAG1_S));
-}
 
 static inline int
 #ifdef RTE_LIBRTE_I40E_RX_ALLOW_BULK_ALLOC
@@ -1079,64 +1066,6 @@ i40e_tx_free_bufs(struct ci_tx_queue *txq)
 	return txq->tx_rs_thresh;
 }
 
-/* Populate 4 descriptors with data from 4 mbufs */
-static inline void
-tx4(volatile struct ci_tx_desc *txdp, struct rte_mbuf **pkts)
-{
-	uint64_t dma_addr;
-	uint32_t i;
-
-	for (i = 0; i < 4; i++, txdp++, pkts++) {
-		dma_addr = rte_mbuf_data_iova(*pkts);
-		txdp->buffer_addr = rte_cpu_to_le_64(dma_addr);
-		txdp->cmd_type_offset_bsz =
-			i40e_build_ctob((uint32_t)I40E_TD_CMD, 0,
-					(*pkts)->data_len, 0);
-	}
-}
-
-/* Populate 1 descriptor with data from 1 mbuf */
-static inline void
-tx1(volatile struct ci_tx_desc *txdp, struct rte_mbuf **pkts)
-{
-	uint64_t dma_addr;
-
-	dma_addr = rte_mbuf_data_iova(*pkts);
-	txdp->buffer_addr = rte_cpu_to_le_64(dma_addr);
-	txdp->cmd_type_offset_bsz =
-		i40e_build_ctob((uint32_t)I40E_TD_CMD, 0,
-				(*pkts)->data_len, 0);
-}
-
-/* Fill hardware descriptor ring with mbuf data */
-static inline void
-i40e_tx_fill_hw_ring(struct ci_tx_queue *txq,
-		     struct rte_mbuf **pkts,
-		     uint16_t nb_pkts)
-{
-	volatile struct ci_tx_desc *txdp = &txq->ci_tx_ring[txq->tx_tail];
-	struct ci_tx_entry *txep = &txq->sw_ring[txq->tx_tail];
-	const int N_PER_LOOP = 4;
-	const int N_PER_LOOP_MASK = N_PER_LOOP - 1;
-	int mainpart, leftover;
-	int i, j;
-
-	mainpart = (nb_pkts & ((uint32_t) ~N_PER_LOOP_MASK));
-	leftover = (nb_pkts & ((uint32_t)  N_PER_LOOP_MASK));
-	for (i = 0; i < mainpart; i += N_PER_LOOP) {
-		for (j = 0; j < N_PER_LOOP; ++j) {
-			(txep + i + j)->mbuf = *(pkts + i + j);
-		}
-		tx4(txdp + i, pkts + i);
-	}
-	if (unlikely(leftover > 0)) {
-		for (i = 0; i < leftover; ++i) {
-			(txep + mainpart + i)->mbuf = *(pkts + mainpart + i);
-			tx1(txdp + mainpart + i, pkts + mainpart + i);
-		}
-	}
-}
-
 static inline uint16_t
 tx_xmit_pkts(struct ci_tx_queue *txq,
 	     struct rte_mbuf **tx_pkts,
@@ -1161,7 +1090,7 @@ tx_xmit_pkts(struct ci_tx_queue *txq,
 	txq->nb_tx_free = (uint16_t)(txq->nb_tx_free - nb_pkts);
 	if ((txq->tx_tail + nb_pkts) > txq->nb_tx_desc) {
 		n = (uint16_t)(txq->nb_tx_desc - txq->tx_tail);
-		i40e_tx_fill_hw_ring(txq, tx_pkts, n);
+		ci_tx_fill_hw_ring(txq, tx_pkts, n);
 		txr[txq->tx_next_rs].cmd_type_offset_bsz |=
 			rte_cpu_to_le_64(((uint64_t)CI_TX_DESC_CMD_RS) << CI_TXD_QW1_CMD_S);
 		txq->tx_next_rs = (uint16_t)(txq->tx_rs_thresh - 1);
@@ -1169,7 +1098,7 @@ tx_xmit_pkts(struct ci_tx_queue *txq,
 	}
 
 	/* Fill hardware descriptor ring with mbuf data */
-	i40e_tx_fill_hw_ring(txq, tx_pkts + n, (uint16_t)(nb_pkts - n));
+	ci_tx_fill_hw_ring(txq, tx_pkts + n, (uint16_t)(nb_pkts - n));
 	txq->tx_tail = (uint16_t)(txq->tx_tail + (nb_pkts - n));
 
 	/* Determine if RS bit needs to be set */
@@ -1198,13 +1127,13 @@ i40e_xmit_pkts_simple(void *tx_queue,
 {
 	uint16_t nb_tx = 0;
 
-	if (likely(nb_pkts <= I40E_TX_MAX_BURST))
+	if (likely(nb_pkts <= CI_TX_MAX_BURST))
 		return tx_xmit_pkts((struct ci_tx_queue *)tx_queue,
 						tx_pkts, nb_pkts);
 
 	while (nb_pkts) {
 		uint16_t ret, num = (uint16_t)RTE_MIN(nb_pkts,
-						I40E_TX_MAX_BURST);
+						CI_TX_MAX_BURST);
 
 		ret = tx_xmit_pkts((struct ci_tx_queue *)tx_queue,
 						&tx_pkts[nb_tx], num);
diff --git a/drivers/net/intel/i40e/i40e_rxtx.h b/drivers/net/intel/i40e/i40e_rxtx.h
index 307ffa3049..0977342064 100644
--- a/drivers/net/intel/i40e/i40e_rxtx.h
+++ b/drivers/net/intel/i40e/i40e_rxtx.h
@@ -47,9 +47,6 @@
 #define I40E_RX_DESC_EXT_STATUS_FLEXBL_MASK   0x03
 #define I40E_RX_DESC_EXT_STATUS_FLEXBL_FLEX   0x01
 
-#define I40E_TD_CMD (CI_TX_DESC_CMD_ICRC |\
-		     CI_TX_DESC_CMD_EOP)
-
 enum i40e_header_split_mode {
 	i40e_header_split_none = 0,
 	i40e_header_split_enabled = 1,
diff --git a/drivers/net/intel/i40e/i40e_rxtx_vec_altivec.c b/drivers/net/intel/i40e/i40e_rxtx_vec_altivec.c
index 4c36748d94..68667bdc9b 100644
--- a/drivers/net/intel/i40e/i40e_rxtx_vec_altivec.c
+++ b/drivers/net/intel/i40e/i40e_rxtx_vec_altivec.c
@@ -476,8 +476,8 @@ i40e_xmit_fixed_burst_vec(void *tx_queue, struct rte_mbuf **tx_pkts,
 	volatile struct ci_tx_desc *txdp;
 	struct ci_tx_entry_vec *txep;
 	uint16_t n, nb_commit, tx_id;
-	uint64_t flags = I40E_TD_CMD;
-	uint64_t rs = CI_TX_DESC_CMD_RS | I40E_TD_CMD;
+	uint64_t flags = CI_TX_DESC_CMD_DEFAULT;
+	uint64_t rs = CI_TX_DESC_CMD_RS | CI_TX_DESC_CMD_DEFAULT;
 	int i;
 
 	if (txq->nb_tx_free < txq->tx_free_thresh)
diff --git a/drivers/net/intel/i40e/i40e_rxtx_vec_avx2.c b/drivers/net/intel/i40e/i40e_rxtx_vec_avx2.c
index 502a1842c6..e1672c4371 100644
--- a/drivers/net/intel/i40e/i40e_rxtx_vec_avx2.c
+++ b/drivers/net/intel/i40e/i40e_rxtx_vec_avx2.c
@@ -741,8 +741,8 @@ i40e_xmit_fixed_burst_vec_avx2(void *tx_queue, struct rte_mbuf **tx_pkts,
 	volatile struct ci_tx_desc *txdp;
 	struct ci_tx_entry_vec *txep;
 	uint16_t n, nb_commit, tx_id;
-	uint64_t flags = I40E_TD_CMD;
-	uint64_t rs = CI_TX_DESC_CMD_RS | I40E_TD_CMD;
+	uint64_t flags = CI_TX_DESC_CMD_DEFAULT;
+	uint64_t rs = CI_TX_DESC_CMD_RS | CI_TX_DESC_CMD_DEFAULT;
 
 	if (txq->nb_tx_free < txq->tx_free_thresh)
 		ci_tx_free_bufs_vec(txq, i40e_tx_desc_done, false);
diff --git a/drivers/net/intel/i40e/i40e_rxtx_vec_avx512.c b/drivers/net/intel/i40e/i40e_rxtx_vec_avx512.c
index d48ff9f51e..bceb95ad2d 100644
--- a/drivers/net/intel/i40e/i40e_rxtx_vec_avx512.c
+++ b/drivers/net/intel/i40e/i40e_rxtx_vec_avx512.c
@@ -801,8 +801,8 @@ i40e_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pkts,
 	volatile struct ci_tx_desc *txdp;
 	struct ci_tx_entry_vec *txep;
 	uint16_t n, nb_commit, tx_id;
-	uint64_t flags = I40E_TD_CMD;
-	uint64_t rs = CI_TX_DESC_CMD_RS | I40E_TD_CMD;
+	uint64_t flags = CI_TX_DESC_CMD_DEFAULT;
+	uint64_t rs = CI_TX_DESC_CMD_RS | CI_TX_DESC_CMD_DEFAULT;
 
 	if (txq->nb_tx_free < txq->tx_free_thresh)
 		ci_tx_free_bufs_vec(txq, i40e_tx_desc_done, false);
diff --git a/drivers/net/intel/i40e/i40e_rxtx_vec_neon.c b/drivers/net/intel/i40e/i40e_rxtx_vec_neon.c
index be4c64942e..debc9bda28 100644
--- a/drivers/net/intel/i40e/i40e_rxtx_vec_neon.c
+++ b/drivers/net/intel/i40e/i40e_rxtx_vec_neon.c
@@ -626,8 +626,8 @@ i40e_xmit_fixed_burst_vec(void *__rte_restrict tx_queue,
 	volatile struct ci_tx_desc *txdp;
 	struct ci_tx_entry_vec *txep;
 	uint16_t n, nb_commit, tx_id;
-	uint64_t flags = I40E_TD_CMD;
-	uint64_t rs = CI_TX_DESC_CMD_RS | I40E_TD_CMD;
+	uint64_t flags = CI_TX_DESC_CMD_DEFAULT;
+	uint64_t rs = CI_TX_DESC_CMD_RS | CI_TX_DESC_CMD_DEFAULT;
 	int i;
 
 	if (txq->nb_tx_free < txq->tx_free_thresh)
diff --git a/drivers/net/intel/ice/ice_rxtx.c b/drivers/net/intel/ice/ice_rxtx.c
index 7261c07265..5e4391f120 100644
--- a/drivers/net/intel/ice/ice_rxtx.c
+++ b/drivers/net/intel/ice/ice_rxtx.c
@@ -3271,67 +3271,6 @@ ice_tx_done_cleanup(void *txq, uint32_t free_cnt)
 		return ice_tx_done_cleanup_full(q, free_cnt);
 }
 
-/* Populate 4 descriptors with data from 4 mbufs */
-static inline void
-tx4(volatile struct ci_tx_desc *txdp, struct rte_mbuf **pkts)
-{
-	uint64_t dma_addr;
-	uint32_t i;
-
-	for (i = 0; i < 4; i++, txdp++, pkts++) {
-		dma_addr = rte_mbuf_data_iova(*pkts);
-		txdp->buffer_addr = rte_cpu_to_le_64(dma_addr);
-		txdp->cmd_type_offset_bsz =
-			ice_build_ctob((uint32_t)ICE_TD_CMD, 0,
-				       (*pkts)->data_len, 0);
-	}
-}
-
-/* Populate 1 descriptor with data from 1 mbuf */
-static inline void
-tx1(volatile struct ci_tx_desc *txdp, struct rte_mbuf **pkts)
-{
-	uint64_t dma_addr;
-
-	dma_addr = rte_mbuf_data_iova(*pkts);
-	txdp->buffer_addr = rte_cpu_to_le_64(dma_addr);
-	txdp->cmd_type_offset_bsz =
-		ice_build_ctob((uint32_t)ICE_TD_CMD, 0,
-			       (*pkts)->data_len, 0);
-}
-
-static inline void
-ice_tx_fill_hw_ring(struct ci_tx_queue *txq, struct rte_mbuf **pkts,
-		    uint16_t nb_pkts)
-{
-	volatile struct ci_tx_desc *txdp = &txq->ci_tx_ring[txq->tx_tail];
-	struct ci_tx_entry *txep = &txq->sw_ring[txq->tx_tail];
-	const int N_PER_LOOP = 4;
-	const int N_PER_LOOP_MASK = N_PER_LOOP - 1;
-	int mainpart, leftover;
-	int i, j;
-
-	/**
-	 * Process most of the packets in chunks of N pkts.  Any
-	 * leftover packets will get processed one at a time.
-	 */
-	mainpart = nb_pkts & ((uint32_t)~N_PER_LOOP_MASK);
-	leftover = nb_pkts & ((uint32_t)N_PER_LOOP_MASK);
-	for (i = 0; i < mainpart; i += N_PER_LOOP) {
-		/* Copy N mbuf pointers to the S/W ring */
-		for (j = 0; j < N_PER_LOOP; ++j)
-			(txep + i + j)->mbuf = *(pkts + i + j);
-		tx4(txdp + i, pkts + i);
-	}
-
-	if (unlikely(leftover > 0)) {
-		for (i = 0; i < leftover; ++i) {
-			(txep + mainpart + i)->mbuf = *(pkts + mainpart + i);
-			tx1(txdp + mainpart + i, pkts + mainpart + i);
-		}
-	}
-}
-
 static inline uint16_t
 tx_xmit_pkts(struct ci_tx_queue *txq,
 	     struct rte_mbuf **tx_pkts,
@@ -3356,7 +3295,7 @@ tx_xmit_pkts(struct ci_tx_queue *txq,
 	txq->nb_tx_free = (uint16_t)(txq->nb_tx_free - nb_pkts);
 	if ((txq->tx_tail + nb_pkts) > txq->nb_tx_desc) {
 		n = (uint16_t)(txq->nb_tx_desc - txq->tx_tail);
-		ice_tx_fill_hw_ring(txq, tx_pkts, n);
+		ci_tx_fill_hw_ring(txq, tx_pkts, n);
 		txr[txq->tx_next_rs].cmd_type_offset_bsz |=
 			rte_cpu_to_le_64(((uint64_t)CI_TX_DESC_CMD_RS) << CI_TXD_QW1_CMD_S);
 		txq->tx_next_rs = (uint16_t)(txq->tx_rs_thresh - 1);
@@ -3364,7 +3303,7 @@ tx_xmit_pkts(struct ci_tx_queue *txq,
 	}
 
 	/* Fill hardware descriptor ring with mbuf data */
-	ice_tx_fill_hw_ring(txq, tx_pkts + n, (uint16_t)(nb_pkts - n));
+	ci_tx_fill_hw_ring(txq, tx_pkts + n, (uint16_t)(nb_pkts - n));
 	txq->tx_tail = (uint16_t)(txq->tx_tail + (nb_pkts - n));
 
 	/* Determine if RS bit needs to be set */
@@ -3393,13 +3332,13 @@ ice_xmit_pkts_simple(void *tx_queue,
 {
 	uint16_t nb_tx = 0;
 
-	if (likely(nb_pkts <= ICE_TX_MAX_BURST))
+	if (likely(nb_pkts <= CI_TX_MAX_BURST))
 		return tx_xmit_pkts((struct ci_tx_queue *)tx_queue,
 				    tx_pkts, nb_pkts);
 
 	while (nb_pkts) {
 		uint16_t ret, num = (uint16_t)RTE_MIN(nb_pkts,
-						      ICE_TX_MAX_BURST);
+						      CI_TX_MAX_BURST);
 
 		ret = tx_xmit_pkts((struct ci_tx_queue *)tx_queue,
 				   &tx_pkts[nb_tx], num);
diff --git a/drivers/net/intel/ice/ice_rxtx.h b/drivers/net/intel/ice/ice_rxtx.h
index cd5fa93d1c..ddcd012e8b 100644
--- a/drivers/net/intel/ice/ice_rxtx.h
+++ b/drivers/net/intel/ice/ice_rxtx.h
@@ -46,8 +46,6 @@
 
 #define ICE_SUPPORT_CHAIN_NUM 5
 
-#define ICE_TD_CMD                      CI_TX_DESC_CMD_EOP
-
 #define ICE_VPMD_RX_BURST            CI_VPMD_RX_BURST
 #define ICE_VPMD_TX_BURST            32
 #define ICE_VPMD_RXQ_REARM_THRESH    CI_VPMD_RX_REARM_THRESH
diff --git a/drivers/net/intel/ice/ice_rxtx_vec_avx2.c b/drivers/net/intel/ice/ice_rxtx_vec_avx2.c
index 2922671158..d03f2e5b36 100644
--- a/drivers/net/intel/ice/ice_rxtx_vec_avx2.c
+++ b/drivers/net/intel/ice/ice_rxtx_vec_avx2.c
@@ -845,8 +845,8 @@ ice_xmit_fixed_burst_vec_avx2(void *tx_queue, struct rte_mbuf **tx_pkts,
 	volatile struct ci_tx_desc *txdp;
 	struct ci_tx_entry_vec *txep;
 	uint16_t n, nb_commit, tx_id;
-	uint64_t flags = ICE_TD_CMD;
-	uint64_t rs = CI_TX_DESC_CMD_RS | ICE_TD_CMD;
+	uint64_t flags = CI_TX_DESC_CMD_DEFAULT;
+	uint64_t rs = CI_TX_DESC_CMD_RS | CI_TX_DESC_CMD_DEFAULT;
 
 	/* cross rx_thresh boundary is not allowed */
 	nb_pkts = RTE_MIN(nb_pkts, txq->tx_rs_thresh);
diff --git a/drivers/net/intel/ice/ice_rxtx_vec_avx512.c b/drivers/net/intel/ice/ice_rxtx_vec_avx512.c
index e64b6e227b..004c01054a 100644
--- a/drivers/net/intel/ice/ice_rxtx_vec_avx512.c
+++ b/drivers/net/intel/ice/ice_rxtx_vec_avx512.c
@@ -909,8 +909,8 @@ ice_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pkts,
 	volatile struct ci_tx_desc *txdp;
 	struct ci_tx_entry_vec *txep;
 	uint16_t n, nb_commit, tx_id;
-	uint64_t flags = ICE_TD_CMD;
-	uint64_t rs = CI_TX_DESC_CMD_RS | ICE_TD_CMD;
+	uint64_t flags = CI_TX_DESC_CMD_DEFAULT;
+	uint64_t rs = CI_TX_DESC_CMD_RS | CI_TX_DESC_CMD_DEFAULT;
 
 	/* cross rx_thresh boundary is not allowed */
 	nb_pkts = RTE_MIN(nb_pkts, txq->tx_rs_thresh);
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [PATCH v2 30/36] net/intel: consolidate ice and i40e buffer free function
  2026-01-13 15:14 ` [PATCH v2 00/36] combine multiple Intel scalar Tx paths Bruce Richardson
                     ` (28 preceding siblings ...)
  2026-01-13 15:14   ` [PATCH v2 29/36] net/intel: merge ring writes in simple Tx for ice and i40e Bruce Richardson
@ 2026-01-13 15:14   ` Bruce Richardson
  2026-01-13 15:14   ` [PATCH v2 31/36] net/intel: complete merging simple Tx paths Bruce Richardson
                     ` (7 subsequent siblings)
  37 siblings, 0 replies; 274+ messages in thread
From: Bruce Richardson @ 2026-01-13 15:14 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson, Anatoly Burakov

The buffer freeing function for the simple scalar Tx path is almost
identical in the ice and i40e drivers, except that i40e batches the
frees in the FAST_FREE case. Consolidate both functions into a common
one based on the better-performing i40e version.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/net/intel/common/tx.h            |  3 ++
 drivers/net/intel/common/tx_scalar_fns.h | 55 ++++++++++++++++++++++
 drivers/net/intel/i40e/i40e_rxtx.c       | 58 +-----------------------
 drivers/net/intel/ice/ice_rxtx.c         | 40 +---------------
 4 files changed, 62 insertions(+), 94 deletions(-)

diff --git a/drivers/net/intel/common/tx.h b/drivers/net/intel/common/tx.h
index 502b3f2032..753e3a2e9e 100644
--- a/drivers/net/intel/common/tx.h
+++ b/drivers/net/intel/common/tx.h
@@ -66,6 +66,9 @@ enum ci_tx_l2tag1_field {
 /* Common TX maximum burst size for chunked transmission in simple paths */
 #define CI_TX_MAX_BURST 32
 
+/* Common TX maximum free buffer size for batched bulk freeing */
+#define CI_TX_MAX_FREE_BUF_SZ 64
+
 /* Common TX descriptor command flags for simple transmit */
 #define CI_TX_DESC_CMD_DEFAULT (CI_TX_DESC_CMD_ICRC | CI_TX_DESC_CMD_EOP)
 
diff --git a/drivers/net/intel/common/tx_scalar_fns.h b/drivers/net/intel/common/tx_scalar_fns.h
index 0d64a63e16..d472aa24e0 100644
--- a/drivers/net/intel/common/tx_scalar_fns.h
+++ b/drivers/net/intel/common/tx_scalar_fns.h
@@ -72,6 +72,61 @@ ci_tx_fill_hw_ring(struct ci_tx_queue *txq, struct rte_mbuf **pkts,
 	}
 }
 
+/* Free transmitted mbufs from descriptor ring with bulk freeing for Tx simple path */
+static __rte_always_inline int
+ci_tx_free_bufs(struct ci_tx_queue *txq)
+{
+	struct ci_tx_entry *txep;
+	uint16_t tx_rs_thresh = txq->tx_rs_thresh;
+	uint16_t i = 0, j = 0;
+	struct rte_mbuf *free[CI_TX_MAX_FREE_BUF_SZ];
+	const uint16_t k = RTE_ALIGN_FLOOR(tx_rs_thresh, CI_TX_MAX_FREE_BUF_SZ);
+	const uint16_t m = tx_rs_thresh % CI_TX_MAX_FREE_BUF_SZ;
+
+	if ((txq->ci_tx_ring[txq->tx_next_dd].cmd_type_offset_bsz &
+			rte_cpu_to_le_64(CI_TXD_QW1_DTYPE_M)) !=
+			rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE))
+		return 0;
+
+	txep = &txq->sw_ring[txq->tx_next_dd - (tx_rs_thresh - 1)];
+
+	for (i = 0; i < tx_rs_thresh; i++)
+		rte_prefetch0((txep + i)->mbuf);
+
+	if (txq->offloads & RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE) {
+		if (k) {
+			for (j = 0; j != k; j += CI_TX_MAX_FREE_BUF_SZ) {
+				for (i = 0; i < CI_TX_MAX_FREE_BUF_SZ; ++i, ++txep) {
+					free[i] = txep->mbuf;
+					txep->mbuf = NULL;
+				}
+				rte_mbuf_raw_free_bulk(free[0]->pool, free,
+						CI_TX_MAX_FREE_BUF_SZ);
+			}
+		}
+
+		if (m) {
+			for (i = 0; i < m; ++i, ++txep) {
+				free[i] = txep->mbuf;
+				txep->mbuf = NULL;
+			}
+			rte_mbuf_raw_free_bulk(free[0]->pool, free, m);
+		}
+	} else {
+		for (i = 0; i < txq->tx_rs_thresh; ++i, ++txep) {
+			rte_pktmbuf_free_seg(txep->mbuf);
+			txep->mbuf = NULL;
+		}
+	}
+
+	txq->nb_tx_free = (uint16_t)(txq->nb_tx_free + txq->tx_rs_thresh);
+	txq->tx_next_dd = (uint16_t)(txq->tx_next_dd + txq->tx_rs_thresh);
+	if (txq->tx_next_dd >= txq->nb_tx_desc)
+		txq->tx_next_dd = (uint16_t)(txq->tx_rs_thresh - 1);
+
+	return txq->tx_rs_thresh;
+}
+
 /*
  * Common transmit descriptor cleanup function for Intel drivers.
  * Used by ice, i40e, iavf, and idpf drivers.
diff --git a/drivers/net/intel/i40e/i40e_rxtx.c b/drivers/net/intel/i40e/i40e_rxtx.c
index cb91eeeab2..22728af980 100644
--- a/drivers/net/intel/i40e/i40e_rxtx.c
+++ b/drivers/net/intel/i40e/i40e_rxtx.c
@@ -1012,60 +1012,6 @@ i40e_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 			get_context_desc, NULL, NULL);
 }
 
-static __rte_always_inline int
-i40e_tx_free_bufs(struct ci_tx_queue *txq)
-{
-	struct ci_tx_entry *txep;
-	uint16_t tx_rs_thresh = txq->tx_rs_thresh;
-	uint16_t i = 0, j = 0;
-	struct rte_mbuf *free[I40E_TX_MAX_FREE_BUF_SZ];
-	const uint16_t k = RTE_ALIGN_FLOOR(tx_rs_thresh, I40E_TX_MAX_FREE_BUF_SZ);
-	const uint16_t m = tx_rs_thresh % I40E_TX_MAX_FREE_BUF_SZ;
-
-	if ((txq->ci_tx_ring[txq->tx_next_dd].cmd_type_offset_bsz &
-			rte_cpu_to_le_64(CI_TXD_QW1_DTYPE_M)) !=
-			rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE))
-		return 0;
-
-	txep = &txq->sw_ring[txq->tx_next_dd - (tx_rs_thresh - 1)];
-
-	for (i = 0; i < tx_rs_thresh; i++)
-		rte_prefetch0((txep + i)->mbuf);
-
-	if (txq->offloads & RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE) {
-		if (k) {
-			for (j = 0; j != k; j += I40E_TX_MAX_FREE_BUF_SZ) {
-				for (i = 0; i < I40E_TX_MAX_FREE_BUF_SZ; ++i, ++txep) {
-					free[i] = txep->mbuf;
-					txep->mbuf = NULL;
-				}
-				rte_mbuf_raw_free_bulk(free[0]->pool, free,
-						I40E_TX_MAX_FREE_BUF_SZ);
-			}
-		}
-
-		if (m) {
-			for (i = 0; i < m; ++i, ++txep) {
-				free[i] = txep->mbuf;
-				txep->mbuf = NULL;
-			}
-			rte_mbuf_raw_free_bulk(free[0]->pool, free, m);
-		}
-	} else {
-		for (i = 0; i < txq->tx_rs_thresh; ++i, ++txep) {
-			rte_pktmbuf_free_seg(txep->mbuf);
-			txep->mbuf = NULL;
-		}
-	}
-
-	txq->nb_tx_free = (uint16_t)(txq->nb_tx_free + txq->tx_rs_thresh);
-	txq->tx_next_dd = (uint16_t)(txq->tx_next_dd + txq->tx_rs_thresh);
-	if (txq->tx_next_dd >= txq->nb_tx_desc)
-		txq->tx_next_dd = (uint16_t)(txq->tx_rs_thresh - 1);
-
-	return txq->tx_rs_thresh;
-}
-
 static inline uint16_t
 tx_xmit_pkts(struct ci_tx_queue *txq,
 	     struct rte_mbuf **tx_pkts,
@@ -1080,7 +1026,7 @@ tx_xmit_pkts(struct ci_tx_queue *txq,
 	 * descriptor, free the associated buffer.
 	 */
 	if (txq->nb_tx_free < txq->tx_free_thresh)
-		i40e_tx_free_bufs(txq);
+		ci_tx_free_bufs(txq);
 
 	/* Use available descriptor only */
 	nb_pkts = (uint16_t)RTE_MIN(txq->nb_tx_free, nb_pkts);
@@ -2493,7 +2439,7 @@ i40e_tx_done_cleanup_simple(struct ci_tx_queue *txq,
 		if (txq->nb_tx_desc - txq->nb_tx_free < txq->tx_rs_thresh)
 			break;
 
-		n = i40e_tx_free_bufs(txq);
+		n = ci_tx_free_bufs(txq);
 
 		if (n == 0)
 			break;
diff --git a/drivers/net/intel/ice/ice_rxtx.c b/drivers/net/intel/ice/ice_rxtx.c
index 5e4391f120..468c039ab1 100644
--- a/drivers/net/intel/ice/ice_rxtx.c
+++ b/drivers/net/intel/ice/ice_rxtx.c
@@ -3119,42 +3119,6 @@ ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 	return ci_xmit_pkts(txq, tx_pkts, nb_pkts, CI_VLAN_IN_L2TAG1, get_context_desc, NULL, NULL);
 }
 
-static __rte_always_inline int
-ice_tx_free_bufs(struct ci_tx_queue *txq)
-{
-	struct ci_tx_entry *txep;
-	uint16_t i;
-
-	if ((txq->ci_tx_ring[txq->tx_next_dd].cmd_type_offset_bsz &
-	     rte_cpu_to_le_64(CI_TXD_QW1_DTYPE_M)) !=
-	    rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE))
-		return 0;
-
-	txep = &txq->sw_ring[txq->tx_next_dd - (txq->tx_rs_thresh - 1)];
-
-	for (i = 0; i < txq->tx_rs_thresh; i++)
-		rte_prefetch0((txep + i)->mbuf);
-
-	if (txq->offloads & RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE) {
-		for (i = 0; i < txq->tx_rs_thresh; ++i, ++txep) {
-			rte_mempool_put(txep->mbuf->pool, txep->mbuf);
-			txep->mbuf = NULL;
-		}
-	} else {
-		for (i = 0; i < txq->tx_rs_thresh; ++i, ++txep) {
-			rte_pktmbuf_free_seg(txep->mbuf);
-			txep->mbuf = NULL;
-		}
-	}
-
-	txq->nb_tx_free = (uint16_t)(txq->nb_tx_free + txq->tx_rs_thresh);
-	txq->tx_next_dd = (uint16_t)(txq->tx_next_dd + txq->tx_rs_thresh);
-	if (txq->tx_next_dd >= txq->nb_tx_desc)
-		txq->tx_next_dd = (uint16_t)(txq->tx_rs_thresh - 1);
-
-	return txq->tx_rs_thresh;
-}
-
 static int
 ice_tx_done_cleanup_full(struct ci_tx_queue *txq,
 			uint32_t free_cnt)
@@ -3244,7 +3208,7 @@ ice_tx_done_cleanup_simple(struct ci_tx_queue *txq,
 		if (txq->nb_tx_desc - txq->nb_tx_free < txq->tx_rs_thresh)
 			break;
 
-		n = ice_tx_free_bufs(txq);
+		n = ci_tx_free_bufs(txq);
 
 		if (n == 0)
 			break;
@@ -3285,7 +3249,7 @@ tx_xmit_pkts(struct ci_tx_queue *txq,
 	 * descriptor, free the associated buffer.
 	 */
 	if (txq->nb_tx_free < txq->tx_free_thresh)
-		ice_tx_free_bufs(txq);
+		ci_tx_free_bufs(txq);
 
 	/* Use available descriptor only */
 	nb_pkts = (uint16_t)RTE_MIN(txq->nb_tx_free, nb_pkts);
-- 
2.51.0



* [PATCH v2 31/36] net/intel: complete merging simple Tx paths
  2026-01-13 15:14 ` [PATCH v2 00/36] combine multiple Intel scalar Tx paths Bruce Richardson
                     ` (29 preceding siblings ...)
  2026-01-13 15:14   ` [PATCH v2 30/36] net/intel: consolidate ice and i40e buffer free function Bruce Richardson
@ 2026-01-13 15:14   ` Bruce Richardson
  2026-01-13 15:14   ` [PATCH v2 32/36] net/intel: use non-volatile stores in simple Tx function Bruce Richardson
                     ` (6 subsequent siblings)
  37 siblings, 0 replies; 274+ messages in thread
From: Bruce Richardson @ 2026-01-13 15:14 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson, Anatoly Burakov

Complete the deduplication of the ice and i40e simple scalar Tx paths
by merging the remaining transmit logic into common functions.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/net/intel/common/tx_scalar_fns.h | 87 ++++++++++++++++++++++++
 drivers/net/intel/i40e/i40e_rxtx.c       | 74 +-------------------
 drivers/net/intel/ice/ice_rxtx.c         | 74 +-------------------
 3 files changed, 89 insertions(+), 146 deletions(-)

diff --git a/drivers/net/intel/common/tx_scalar_fns.h b/drivers/net/intel/common/tx_scalar_fns.h
index d472aa24e0..dcdec200ac 100644
--- a/drivers/net/intel/common/tx_scalar_fns.h
+++ b/drivers/net/intel/common/tx_scalar_fns.h
@@ -127,6 +127,93 @@ ci_tx_free_bufs(struct ci_tx_queue *txq)
 	return txq->tx_rs_thresh;
 }
 
+/* Simple burst transmit for descriptor-based simple Tx path
+ *
+ * Transmits a burst of packets by filling hardware descriptors with mbuf
+ * data. Handles ring wrap-around and RS bit management. Performs descriptor
+ * cleanup when tx_free_thresh is reached.
+ *
+ * Returns: number of packets transmitted
+ */
+static inline uint16_t
+ci_xmit_burst_simple(struct ci_tx_queue *txq,
+		     struct rte_mbuf **tx_pkts,
+		     uint16_t nb_pkts)
+{
+	volatile struct ci_tx_desc *txr = txq->ci_tx_ring;
+	uint16_t n = 0;
+
+	/**
+	 * Begin scanning the H/W ring for done descriptors when the number
+	 * of available descriptors drops below tx_free_thresh. For each done
+	 * descriptor, free the associated buffer.
+	 */
+	if (txq->nb_tx_free < txq->tx_free_thresh)
+		ci_tx_free_bufs(txq);
+
+	/* Use available descriptor only */
+	nb_pkts = (uint16_t)RTE_MIN(txq->nb_tx_free, nb_pkts);
+	if (unlikely(!nb_pkts))
+		return 0;
+
+	txq->nb_tx_free = (uint16_t)(txq->nb_tx_free - nb_pkts);
+	if ((txq->tx_tail + nb_pkts) > txq->nb_tx_desc) {
+		n = (uint16_t)(txq->nb_tx_desc - txq->tx_tail);
+		ci_tx_fill_hw_ring(txq, tx_pkts, n);
+		txr[txq->tx_next_rs].cmd_type_offset_bsz |=
+			rte_cpu_to_le_64(((uint64_t)CI_TX_DESC_CMD_RS) <<
+					  CI_TXD_QW1_CMD_S);
+		txq->tx_next_rs = (uint16_t)(txq->tx_rs_thresh - 1);
+		txq->tx_tail = 0;
+	}
+
+	/* Fill hardware descriptor ring with mbuf data */
+	ci_tx_fill_hw_ring(txq, tx_pkts + n, (uint16_t)(nb_pkts - n));
+	txq->tx_tail = (uint16_t)(txq->tx_tail + (nb_pkts - n));
+
+	/* Determine if RS bit needs to be set */
+	if (txq->tx_tail > txq->tx_next_rs) {
+		txr[txq->tx_next_rs].cmd_type_offset_bsz |=
+			rte_cpu_to_le_64(((uint64_t)CI_TX_DESC_CMD_RS) <<
+					  CI_TXD_QW1_CMD_S);
+		txq->tx_next_rs =
+			(uint16_t)(txq->tx_next_rs + txq->tx_rs_thresh);
+		if (txq->tx_next_rs >= txq->nb_tx_desc)
+			txq->tx_next_rs = (uint16_t)(txq->tx_rs_thresh - 1);
+	}
+
+	if (txq->tx_tail >= txq->nb_tx_desc)
+		txq->tx_tail = 0;
+
+	/* Update the tx tail register */
+	rte_write32_wc((uint32_t)txq->tx_tail, txq->qtx_tail);
+
+	return nb_pkts;
+}
+
+static __rte_always_inline uint16_t
+ci_xmit_pkts_simple(struct ci_tx_queue *txq,
+		     struct rte_mbuf **tx_pkts,
+		     uint16_t nb_pkts)
+{
+	uint16_t nb_tx = 0;
+
+	if (likely(nb_pkts <= CI_TX_MAX_BURST))
+		return ci_xmit_burst_simple(txq, tx_pkts, nb_pkts);
+
+	while (nb_pkts) {
+		uint16_t ret, num = RTE_MIN(nb_pkts, CI_TX_MAX_BURST);
+
+		ret = ci_xmit_burst_simple(txq, &tx_pkts[nb_tx], num);
+		nb_tx += ret;
+		nb_pkts -= ret;
+		if (ret < num)
+			break;
+	}
+
+	return nb_tx;
+}
+
 /*
  * Common transmit descriptor cleanup function for Intel drivers.
  * Used by ice, i40e, iavf, and idpf drivers.
diff --git a/drivers/net/intel/i40e/i40e_rxtx.c b/drivers/net/intel/i40e/i40e_rxtx.c
index 22728af980..ac53554234 100644
--- a/drivers/net/intel/i40e/i40e_rxtx.c
+++ b/drivers/net/intel/i40e/i40e_rxtx.c
@@ -1012,84 +1012,12 @@ i40e_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 			get_context_desc, NULL, NULL);
 }
 
-static inline uint16_t
-tx_xmit_pkts(struct ci_tx_queue *txq,
-	     struct rte_mbuf **tx_pkts,
-	     uint16_t nb_pkts)
-{
-	volatile struct ci_tx_desc *txr = txq->ci_tx_ring;
-	uint16_t n = 0;
-
-	/**
-	 * Begin scanning the H/W ring for done descriptors when the number
-	 * of available descriptors drops below tx_free_thresh. For each done
-	 * descriptor, free the associated buffer.
-	 */
-	if (txq->nb_tx_free < txq->tx_free_thresh)
-		ci_tx_free_bufs(txq);
-
-	/* Use available descriptor only */
-	nb_pkts = (uint16_t)RTE_MIN(txq->nb_tx_free, nb_pkts);
-	if (unlikely(!nb_pkts))
-		return 0;
-
-	txq->nb_tx_free = (uint16_t)(txq->nb_tx_free - nb_pkts);
-	if ((txq->tx_tail + nb_pkts) > txq->nb_tx_desc) {
-		n = (uint16_t)(txq->nb_tx_desc - txq->tx_tail);
-		ci_tx_fill_hw_ring(txq, tx_pkts, n);
-		txr[txq->tx_next_rs].cmd_type_offset_bsz |=
-			rte_cpu_to_le_64(((uint64_t)CI_TX_DESC_CMD_RS) << CI_TXD_QW1_CMD_S);
-		txq->tx_next_rs = (uint16_t)(txq->tx_rs_thresh - 1);
-		txq->tx_tail = 0;
-	}
-
-	/* Fill hardware descriptor ring with mbuf data */
-	ci_tx_fill_hw_ring(txq, tx_pkts + n, (uint16_t)(nb_pkts - n));
-	txq->tx_tail = (uint16_t)(txq->tx_tail + (nb_pkts - n));
-
-	/* Determine if RS bit needs to be set */
-	if (txq->tx_tail > txq->tx_next_rs) {
-		txr[txq->tx_next_rs].cmd_type_offset_bsz |=
-			rte_cpu_to_le_64(((uint64_t)CI_TX_DESC_CMD_RS) << CI_TXD_QW1_CMD_S);
-		txq->tx_next_rs =
-			(uint16_t)(txq->tx_next_rs + txq->tx_rs_thresh);
-		if (txq->tx_next_rs >= txq->nb_tx_desc)
-			txq->tx_next_rs = (uint16_t)(txq->tx_rs_thresh - 1);
-	}
-
-	if (txq->tx_tail >= txq->nb_tx_desc)
-		txq->tx_tail = 0;
-
-	/* Update the tx tail register */
-	I40E_PCI_REG_WC_WRITE(txq->qtx_tail, txq->tx_tail);
-
-	return nb_pkts;
-}
-
 static uint16_t
 i40e_xmit_pkts_simple(void *tx_queue,
 		      struct rte_mbuf **tx_pkts,
 		      uint16_t nb_pkts)
 {
-	uint16_t nb_tx = 0;
-
-	if (likely(nb_pkts <= CI_TX_MAX_BURST))
-		return tx_xmit_pkts((struct ci_tx_queue *)tx_queue,
-						tx_pkts, nb_pkts);
-
-	while (nb_pkts) {
-		uint16_t ret, num = (uint16_t)RTE_MIN(nb_pkts,
-						CI_TX_MAX_BURST);
-
-		ret = tx_xmit_pkts((struct ci_tx_queue *)tx_queue,
-						&tx_pkts[nb_tx], num);
-		nb_tx = (uint16_t)(nb_tx + ret);
-		nb_pkts = (uint16_t)(nb_pkts - ret);
-		if (ret < num)
-			break;
-	}
-
-	return nb_tx;
+	return ci_xmit_pkts_simple(tx_queue, tx_pkts, nb_pkts);
 }
 
 #ifndef RTE_ARCH_X86
diff --git a/drivers/net/intel/ice/ice_rxtx.c b/drivers/net/intel/ice/ice_rxtx.c
index 468c039ab1..ed82a84dc5 100644
--- a/drivers/net/intel/ice/ice_rxtx.c
+++ b/drivers/net/intel/ice/ice_rxtx.c
@@ -3235,84 +3235,12 @@ ice_tx_done_cleanup(void *txq, uint32_t free_cnt)
 		return ice_tx_done_cleanup_full(q, free_cnt);
 }
 
-static inline uint16_t
-tx_xmit_pkts(struct ci_tx_queue *txq,
-	     struct rte_mbuf **tx_pkts,
-	     uint16_t nb_pkts)
-{
-	volatile struct ci_tx_desc *txr = txq->ci_tx_ring;
-	uint16_t n = 0;
-
-	/**
-	 * Begin scanning the H/W ring for done descriptors when the number
-	 * of available descriptors drops below tx_free_thresh. For each done
-	 * descriptor, free the associated buffer.
-	 */
-	if (txq->nb_tx_free < txq->tx_free_thresh)
-		ci_tx_free_bufs(txq);
-
-	/* Use available descriptor only */
-	nb_pkts = (uint16_t)RTE_MIN(txq->nb_tx_free, nb_pkts);
-	if (unlikely(!nb_pkts))
-		return 0;
-
-	txq->nb_tx_free = (uint16_t)(txq->nb_tx_free - nb_pkts);
-	if ((txq->tx_tail + nb_pkts) > txq->nb_tx_desc) {
-		n = (uint16_t)(txq->nb_tx_desc - txq->tx_tail);
-		ci_tx_fill_hw_ring(txq, tx_pkts, n);
-		txr[txq->tx_next_rs].cmd_type_offset_bsz |=
-			rte_cpu_to_le_64(((uint64_t)CI_TX_DESC_CMD_RS) << CI_TXD_QW1_CMD_S);
-		txq->tx_next_rs = (uint16_t)(txq->tx_rs_thresh - 1);
-		txq->tx_tail = 0;
-	}
-
-	/* Fill hardware descriptor ring with mbuf data */
-	ci_tx_fill_hw_ring(txq, tx_pkts + n, (uint16_t)(nb_pkts - n));
-	txq->tx_tail = (uint16_t)(txq->tx_tail + (nb_pkts - n));
-
-	/* Determine if RS bit needs to be set */
-	if (txq->tx_tail > txq->tx_next_rs) {
-		txr[txq->tx_next_rs].cmd_type_offset_bsz |=
-			rte_cpu_to_le_64(((uint64_t)CI_TX_DESC_CMD_RS) << CI_TXD_QW1_CMD_S);
-		txq->tx_next_rs =
-			(uint16_t)(txq->tx_next_rs + txq->tx_rs_thresh);
-		if (txq->tx_next_rs >= txq->nb_tx_desc)
-			txq->tx_next_rs = (uint16_t)(txq->tx_rs_thresh - 1);
-	}
-
-	if (txq->tx_tail >= txq->nb_tx_desc)
-		txq->tx_tail = 0;
-
-	/* Update the tx tail register */
-	ICE_PCI_REG_WC_WRITE(txq->qtx_tail, txq->tx_tail);
-
-	return nb_pkts;
-}
-
 static uint16_t
 ice_xmit_pkts_simple(void *tx_queue,
 		     struct rte_mbuf **tx_pkts,
 		     uint16_t nb_pkts)
 {
-	uint16_t nb_tx = 0;
-
-	if (likely(nb_pkts <= CI_TX_MAX_BURST))
-		return tx_xmit_pkts((struct ci_tx_queue *)tx_queue,
-				    tx_pkts, nb_pkts);
-
-	while (nb_pkts) {
-		uint16_t ret, num = (uint16_t)RTE_MIN(nb_pkts,
-						      CI_TX_MAX_BURST);
-
-		ret = tx_xmit_pkts((struct ci_tx_queue *)tx_queue,
-				   &tx_pkts[nb_tx], num);
-		nb_tx = (uint16_t)(nb_tx + ret);
-		nb_pkts = (uint16_t)(nb_pkts - ret);
-		if (ret < num)
-			break;
-	}
-
-	return nb_tx;
+	return ci_xmit_pkts_simple(tx_queue, tx_pkts, nb_pkts);
 }
 
 static const struct ci_rx_path_info ice_rx_path_infos[] = {
-- 
2.51.0



* [PATCH v2 32/36] net/intel: use non-volatile stores in simple Tx function
  2026-01-13 15:14 ` [PATCH v2 00/36] combine multiple Intel scalar Tx paths Bruce Richardson
                     ` (30 preceding siblings ...)
  2026-01-13 15:14   ` [PATCH v2 31/36] net/intel: complete merging simple Tx paths Bruce Richardson
@ 2026-01-13 15:14   ` Bruce Richardson
  2026-01-13 15:14   ` [PATCH v2 33/36] net/intel: align scalar simple Tx path with vector logic Bruce Richardson
                     ` (5 subsequent siblings)
  37 siblings, 0 replies; 274+ messages in thread
From: Bruce Richardson @ 2026-01-13 15:14 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson

The simple Tx code path can be reworked to use non-volatile stores, as
the full-featured Tx path already does, by reusing the existing
write_txd function (which just needs to be moved up in the header
file). This gives a small performance boost.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/net/intel/common/tx_scalar_fns.h | 55 +++++++-----------------
 1 file changed, 16 insertions(+), 39 deletions(-)

diff --git a/drivers/net/intel/common/tx_scalar_fns.h b/drivers/net/intel/common/tx_scalar_fns.h
index dcdec200ac..16f802e902 100644
--- a/drivers/net/intel/common/tx_scalar_fns.h
+++ b/drivers/net/intel/common/tx_scalar_fns.h
@@ -12,35 +12,13 @@
 /* depends on common Tx definitions. */
 #include "tx.h"
 
-/* Populate 4 descriptors with data from 4 mbufs */
 static inline void
-ci_tx_fill_hw_ring_tx4(volatile struct ci_tx_desc *txdp, struct rte_mbuf **pkts)
+write_txd(volatile void *txd, uint64_t qw0, uint64_t qw1)
 {
-	uint64_t dma_addr;
-	uint32_t i;
-
-	for (i = 0; i < 4; i++, txdp++, pkts++) {
-		dma_addr = rte_mbuf_data_iova(*pkts);
-		txdp->buffer_addr = rte_cpu_to_le_64(dma_addr);
-		txdp->cmd_type_offset_bsz =
-			rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DATA |
-				((uint64_t)CI_TX_DESC_CMD_DEFAULT << CI_TXD_QW1_CMD_S) |
-				((uint64_t)(*pkts)->data_len << CI_TXD_QW1_TX_BUF_SZ_S));
-	}
-}
+	uint64_t *txd_qw =  __rte_assume_aligned(RTE_CAST_PTR(void *, txd), 16);
 
-/* Populate 1 descriptor with data from 1 mbuf */
-static inline void
-ci_tx_fill_hw_ring_tx1(volatile struct ci_tx_desc *txdp, struct rte_mbuf **pkts)
-{
-	uint64_t dma_addr;
-
-	dma_addr = rte_mbuf_data_iova(*pkts);
-	txdp->buffer_addr = rte_cpu_to_le_64(dma_addr);
-	txdp->cmd_type_offset_bsz =
-		rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DATA |
-			((uint64_t)CI_TX_DESC_CMD_DEFAULT << CI_TXD_QW1_CMD_S) |
-			((uint64_t)(*pkts)->data_len << CI_TXD_QW1_TX_BUF_SZ_S));
+	txd_qw[0] = rte_cpu_to_le_64(qw0);
+	txd_qw[1] = rte_cpu_to_le_64(qw1);
 }
 
 /* Fill hardware descriptor ring with mbuf data */
@@ -60,14 +38,22 @@ ci_tx_fill_hw_ring(struct ci_tx_queue *txq, struct rte_mbuf **pkts,
 	for (i = 0; i < mainpart; i += N_PER_LOOP) {
 		for (j = 0; j < N_PER_LOOP; ++j)
 			(txep + i + j)->mbuf = *(pkts + i + j);
-		ci_tx_fill_hw_ring_tx4(txdp + i, pkts + i);
+		for (j = 0; j < N_PER_LOOP; ++j)
+			write_txd(txdp + i + j, rte_mbuf_data_iova(*(pkts + i + j)),
+				CI_TX_DESC_DTYPE_DATA |
+				((uint64_t)CI_TX_DESC_CMD_DEFAULT << CI_TXD_QW1_CMD_S) |
+				((uint64_t)(*(pkts + i + j))->data_len << CI_TXD_QW1_TX_BUF_SZ_S));
 	}
 
 	if (unlikely(leftover > 0)) {
 		for (i = 0; i < leftover; ++i) {
-			(txep + mainpart + i)->mbuf = *(pkts + mainpart + i);
-			ci_tx_fill_hw_ring_tx1(txdp + mainpart + i,
-					       pkts + mainpart + i);
+			uint16_t idx = mainpart + i;
+			(txep + idx)->mbuf = *(pkts + idx);
+			write_txd(txdp + idx, rte_mbuf_data_iova(*(pkts + idx)),
+				CI_TX_DESC_DTYPE_DATA |
+				((uint64_t)CI_TX_DESC_CMD_DEFAULT << CI_TXD_QW1_CMD_S) |
+				((uint64_t)(*(pkts + idx))->data_len << CI_TXD_QW1_TX_BUF_SZ_S));
+
 		}
 	}
 }
@@ -367,15 +353,6 @@ struct ci_timesstamp_queue_fns {
 	write_ts_tail_t write_ts_tail;
 };
 
-static inline void
-write_txd(volatile void *txd, uint64_t qw0, uint64_t qw1)
-{
-	uint64_t *txd_qw = __rte_assume_aligned(RTE_CAST_PTR(void *, txd), 16);
-
-	txd_qw[0] = rte_cpu_to_le_64(qw0);
-	txd_qw[1] = rte_cpu_to_le_64(qw1);
-}
-
 static inline uint16_t
 ci_xmit_pkts(struct ci_tx_queue *txq,
 	     struct rte_mbuf **tx_pkts,
-- 
2.51.0



* [PATCH v2 33/36] net/intel: align scalar simple Tx path with vector logic
  2026-01-13 15:14 ` [PATCH v2 00/36] combine multiple Intel scalar Tx paths Bruce Richardson
                     ` (31 preceding siblings ...)
  2026-01-13 15:14   ` [PATCH v2 32/36] net/intel: use non-volatile stores in simple Tx function Bruce Richardson
@ 2026-01-13 15:14   ` Bruce Richardson
  2026-01-13 15:14   ` [PATCH v2 34/36] net/intel: use vector SW ring entry for simple path Bruce Richardson
                     ` (4 subsequent siblings)
  37 siblings, 0 replies; 274+ messages in thread
From: Bruce Richardson @ 2026-01-13 15:14 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson

The scalar simple Tx path has the same restrictions as the vector Tx
path, so we can use the same logic flow in both, helping to ensure the
scalar path achieves the best possible performance.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/net/intel/common/tx_scalar_fns.h | 54 +++++++++++++++---------
 1 file changed, 34 insertions(+), 20 deletions(-)

diff --git a/drivers/net/intel/common/tx_scalar_fns.h b/drivers/net/intel/common/tx_scalar_fns.h
index 16f802e902..f85ca741a9 100644
--- a/drivers/net/intel/common/tx_scalar_fns.h
+++ b/drivers/net/intel/common/tx_scalar_fns.h
@@ -21,13 +21,11 @@ write_txd(volatile void *txd, uint64_t qw0, uint64_t qw1)
 	txd_qw[1] = rte_cpu_to_le_64(qw1);
 }
 
-/* Fill hardware descriptor ring with mbuf data */
+/* Fill hardware descriptor ring with mbuf data (simple path) */
 static inline void
-ci_tx_fill_hw_ring(struct ci_tx_queue *txq, struct rte_mbuf **pkts,
-		   uint16_t nb_pkts)
+ci_tx_fill_hw_ring_simple(volatile struct ci_tx_desc *txdp, struct rte_mbuf **pkts,
+			  uint16_t nb_pkts)
 {
-	volatile struct ci_tx_desc *txdp = &txq->ci_tx_ring[txq->tx_tail];
-	struct ci_tx_entry *txep = &txq->sw_ring[txq->tx_tail];
 	const int N_PER_LOOP = 4;
 	const int N_PER_LOOP_MASK = N_PER_LOOP - 1;
 	int mainpart, leftover;
@@ -36,8 +34,6 @@ ci_tx_fill_hw_ring(struct ci_tx_queue *txq, struct rte_mbuf **pkts,
 	mainpart = nb_pkts & ((uint32_t)~N_PER_LOOP_MASK);
 	leftover = nb_pkts & ((uint32_t)N_PER_LOOP_MASK);
 	for (i = 0; i < mainpart; i += N_PER_LOOP) {
-		for (j = 0; j < N_PER_LOOP; ++j)
-			(txep + i + j)->mbuf = *(pkts + i + j);
 		for (j = 0; j < N_PER_LOOP; ++j)
 			write_txd(txdp + i + j, rte_mbuf_data_iova(*(pkts + i + j)),
 				CI_TX_DESC_DTYPE_DATA |
@@ -48,12 +44,10 @@ ci_tx_fill_hw_ring(struct ci_tx_queue *txq, struct rte_mbuf **pkts,
 	if (unlikely(leftover > 0)) {
 		for (i = 0; i < leftover; ++i) {
 			uint16_t idx = mainpart + i;
-			(txep + idx)->mbuf = *(pkts + idx);
 			write_txd(txdp + idx, rte_mbuf_data_iova(*(pkts + idx)),
 				CI_TX_DESC_DTYPE_DATA |
 				((uint64_t)CI_TX_DESC_CMD_DEFAULT << CI_TXD_QW1_CMD_S) |
 				((uint64_t)(*(pkts + idx))->data_len << CI_TXD_QW1_TX_BUF_SZ_S));
-
 		}
 	}
 }
@@ -127,6 +121,9 @@ ci_xmit_burst_simple(struct ci_tx_queue *txq,
 		     uint16_t nb_pkts)
 {
 	volatile struct ci_tx_desc *txr = txq->ci_tx_ring;
+	volatile struct ci_tx_desc *txdp;
+	struct ci_tx_entry *txep;
+	uint16_t tx_id;
 	uint16_t n = 0;
 
 	/**
@@ -142,23 +139,41 @@ ci_xmit_burst_simple(struct ci_tx_queue *txq,
 	if (unlikely(!nb_pkts))
 		return 0;
 
+	tx_id = txq->tx_tail;
+	txdp = &txr[tx_id];
+	txep = &txq->sw_ring[tx_id];
+
 	txq->nb_tx_free = (uint16_t)(txq->nb_tx_free - nb_pkts);
-	if ((txq->tx_tail + nb_pkts) > txq->nb_tx_desc) {
-		n = (uint16_t)(txq->nb_tx_desc - txq->tx_tail);
-		ci_tx_fill_hw_ring(txq, tx_pkts, n);
+
+	if ((tx_id + nb_pkts) > txq->nb_tx_desc) {
+		n = (uint16_t)(txq->nb_tx_desc - tx_id);
+
+		/* Store mbufs in backlog */
+		ci_tx_backlog_entry(txep, tx_pkts, n);
+
+		/* Write descriptors to HW ring */
+		ci_tx_fill_hw_ring_simple(txdp, tx_pkts, n);
+
 		txr[txq->tx_next_rs].cmd_type_offset_bsz |=
 			rte_cpu_to_le_64(((uint64_t)CI_TX_DESC_CMD_RS) <<
 					  CI_TXD_QW1_CMD_S);
 		txq->tx_next_rs = (uint16_t)(txq->tx_rs_thresh - 1);
-		txq->tx_tail = 0;
+
+		tx_id = 0;
+		txdp = &txr[tx_id];
+		txep = &txq->sw_ring[tx_id];
 	}
 
-	/* Fill hardware descriptor ring with mbuf data */
-	ci_tx_fill_hw_ring(txq, tx_pkts + n, (uint16_t)(nb_pkts - n));
-	txq->tx_tail = (uint16_t)(txq->tx_tail + (nb_pkts - n));
+	/* Store remaining mbufs in backlog */
+	ci_tx_backlog_entry(txep, tx_pkts + n, (uint16_t)(nb_pkts - n));
+
+	/* Write remaining descriptors to HW ring */
+	ci_tx_fill_hw_ring_simple(txdp, tx_pkts + n, (uint16_t)(nb_pkts - n));
+
+	tx_id = (uint16_t)(tx_id + (nb_pkts - n));
 
 	/* Determine if RS bit needs to be set */
-	if (txq->tx_tail > txq->tx_next_rs) {
+	if (tx_id > txq->tx_next_rs) {
 		txr[txq->tx_next_rs].cmd_type_offset_bsz |=
 			rte_cpu_to_le_64(((uint64_t)CI_TX_DESC_CMD_RS) <<
 					  CI_TXD_QW1_CMD_S);
@@ -168,11 +183,10 @@ ci_xmit_burst_simple(struct ci_tx_queue *txq,
 			txq->tx_next_rs = (uint16_t)(txq->tx_rs_thresh - 1);
 	}
 
-	if (txq->tx_tail >= txq->nb_tx_desc)
-		txq->tx_tail = 0;
+	txq->tx_tail = tx_id;
 
 	/* Update the tx tail register */
-	rte_write32_wc((uint32_t)txq->tx_tail, txq->qtx_tail);
+	rte_write32_wc((uint32_t)tx_id, txq->qtx_tail);
 
 	return nb_pkts;
 }
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [PATCH v2 34/36] net/intel: use vector SW ring entry for simple path
  2026-01-13 15:14 ` [PATCH v2 00/36] combine multiple Intel scalar Tx paths Bruce Richardson
                     ` (32 preceding siblings ...)
  2026-01-13 15:14   ` [PATCH v2 33/36] net/intel: align scalar simple Tx path with vector logic Bruce Richardson
@ 2026-01-13 15:14   ` Bruce Richardson
  2026-01-13 15:14   ` [PATCH v2 35/36] net/intel: use vector mbuf cleanup from simple scalar path Bruce Richardson
                     ` (3 subsequent siblings)
  37 siblings, 0 replies; 274+ messages in thread
From: Bruce Richardson @ 2026-01-13 15:14 UTC (permalink / raw)
  To: dev
  Cc: Bruce Richardson, Praveen Shetty, Vladimir Medvedkin,
	Anatoly Burakov, Jingjing Wu

The simple scalar Tx path does not need the full sw_entry structure
that the full Tx path uses, so rename the "vector_tx" flag to
"use_vec_entry", since its sole purpose is to indicate use of the
smaller tx_entry_vec structure. Then set this flag for the simple Tx
path too, giving us a perf boost.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/net/intel/common/tx.h                    |  6 ++++--
 drivers/net/intel/common/tx_scalar_fns.h         | 14 +++++++-------
 drivers/net/intel/cpfl/cpfl_rxtx.c               |  4 ++--
 drivers/net/intel/i40e/i40e_rxtx.c               |  2 +-
 drivers/net/intel/iavf/iavf_rxtx_vec_avx2.c      |  2 +-
 drivers/net/intel/ice/ice_rxtx.c                 |  2 +-
 drivers/net/intel/idpf/idpf_common_rxtx_avx512.c |  2 +-
 drivers/net/intel/ixgbe/ixgbe_rxtx_vec_common.c  |  2 +-
 8 files changed, 18 insertions(+), 16 deletions(-)

diff --git a/drivers/net/intel/common/tx.h b/drivers/net/intel/common/tx.h
index 753e3a2e9e..44270bf3e6 100644
--- a/drivers/net/intel/common/tx.h
+++ b/drivers/net/intel/common/tx.h
@@ -160,7 +160,7 @@ struct ci_tx_queue {
 	rte_iova_t tx_ring_dma;        /* TX ring DMA address */
 	bool tx_deferred_start; /* don't start this queue in dev start */
 	bool q_set;             /* indicate if tx queue has been configured */
-	bool vector_tx;         /* port is using vector TX */
+	bool use_vec_entry;     /* use sw_ring_vec (true for vector and simple paths) */
 	union {                  /* the VSI this queue belongs to */
 		struct i40e_vsi *i40e_vsi;
 		struct iavf_vsi *iavf_vsi;
@@ -343,7 +343,8 @@ ci_txq_release_all_mbufs(struct ci_tx_queue *txq, bool use_ctx)
 	if (unlikely(!txq || !txq->sw_ring))
 		return;
 
-	if (!txq->vector_tx) {
+	if (!txq->use_vec_entry) {
+		/* Regular scalar path uses sw_ring with ci_tx_entry */
 		for (uint16_t i = 0; i < txq->nb_tx_desc; i++) {
 			if (txq->sw_ring[i].mbuf != NULL) {
 				rte_pktmbuf_free_seg(txq->sw_ring[i].mbuf);
@@ -354,6 +355,7 @@ ci_txq_release_all_mbufs(struct ci_tx_queue *txq, bool use_ctx)
 	}
 
 	/**
+	 *  Vector and simple paths use sw_ring_vec (ci_tx_entry_vec).
 	 *  vPMD tx will not set sw_ring's mbuf to NULL after free,
 	 *  so determining buffers to free is a little more complex.
 	 */
diff --git a/drivers/net/intel/common/tx_scalar_fns.h b/drivers/net/intel/common/tx_scalar_fns.h
index f85ca741a9..b284b80cbe 100644
--- a/drivers/net/intel/common/tx_scalar_fns.h
+++ b/drivers/net/intel/common/tx_scalar_fns.h
@@ -56,7 +56,7 @@ ci_tx_fill_hw_ring_simple(volatile struct ci_tx_desc *txdp, struct rte_mbuf **pk
 static __rte_always_inline int
 ci_tx_free_bufs(struct ci_tx_queue *txq)
 {
-	struct ci_tx_entry *txep;
+	struct ci_tx_entry_vec *txep;
 	uint16_t tx_rs_thresh = txq->tx_rs_thresh;
 	uint16_t i = 0, j = 0;
 	struct rte_mbuf *free[CI_TX_MAX_FREE_BUF_SZ];
@@ -68,7 +68,7 @@ ci_tx_free_bufs(struct ci_tx_queue *txq)
 			rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE))
 		return 0;
 
-	txep = &txq->sw_ring[txq->tx_next_dd - (tx_rs_thresh - 1)];
+	txep = &txq->sw_ring_vec[txq->tx_next_dd - (tx_rs_thresh - 1)];
 
 	for (i = 0; i < tx_rs_thresh; i++)
 		rte_prefetch0((txep + i)->mbuf);
@@ -122,7 +122,7 @@ ci_xmit_burst_simple(struct ci_tx_queue *txq,
 {
 	volatile struct ci_tx_desc *txr = txq->ci_tx_ring;
 	volatile struct ci_tx_desc *txdp;
-	struct ci_tx_entry *txep;
+	struct ci_tx_entry_vec *txep;
 	uint16_t tx_id;
 	uint16_t n = 0;
 
@@ -141,7 +141,7 @@ ci_xmit_burst_simple(struct ci_tx_queue *txq,
 
 	tx_id = txq->tx_tail;
 	txdp = &txr[tx_id];
-	txep = &txq->sw_ring[tx_id];
+	txep = &txq->sw_ring_vec[tx_id];
 
 	txq->nb_tx_free = (uint16_t)(txq->nb_tx_free - nb_pkts);
 
@@ -149,7 +149,7 @@ ci_xmit_burst_simple(struct ci_tx_queue *txq,
 		n = (uint16_t)(txq->nb_tx_desc - tx_id);
 
 		/* Store mbufs in backlog */
-		ci_tx_backlog_entry(txep, tx_pkts, n);
+		ci_tx_backlog_entry_vec(txep, tx_pkts, n);
 
 		/* Write descriptors to HW ring */
 		ci_tx_fill_hw_ring_simple(txdp, tx_pkts, n);
@@ -161,11 +161,11 @@ ci_xmit_burst_simple(struct ci_tx_queue *txq,
 
 		tx_id = 0;
 		txdp = &txr[tx_id];
-		txep = &txq->sw_ring[tx_id];
+		txep = &txq->sw_ring_vec[tx_id];
 	}
 
 	/* Store remaining mbufs in backlog */
-	ci_tx_backlog_entry(txep, tx_pkts + n, (uint16_t)(nb_pkts - n));
+	ci_tx_backlog_entry_vec(txep, tx_pkts + n, (uint16_t)(nb_pkts - n));
 
 	/* Write remaining descriptors to HW ring */
 	ci_tx_fill_hw_ring_simple(txdp, tx_pkts + n, (uint16_t)(nb_pkts - n));
diff --git a/drivers/net/intel/cpfl/cpfl_rxtx.c b/drivers/net/intel/cpfl/cpfl_rxtx.c
index a3127e7c97..6d8798a60f 100644
--- a/drivers/net/intel/cpfl/cpfl_rxtx.c
+++ b/drivers/net/intel/cpfl/cpfl_rxtx.c
@@ -328,7 +328,7 @@ cpfl_tx_queue_release(void *txq)
 		rte_free(q->complq);
 	}
 
-	ci_txq_release_all_mbufs(q, q->vector_tx);
+	ci_txq_release_all_mbufs(q, q->use_vec_entry);
 	rte_free(q->sw_ring);
 	rte_memzone_free(q->mz);
 	rte_free(cpfl_txq);
@@ -1335,7 +1335,7 @@ cpfl_tx_queue_stop(struct rte_eth_dev *dev, uint16_t tx_queue_id)
 	}
 
 	txq = &cpfl_txq->base;
-	ci_txq_release_all_mbufs(txq, txq->vector_tx);
+	ci_txq_release_all_mbufs(txq, txq->use_vec_entry);
 	if (vport->txq_model == VIRTCHNL2_QUEUE_MODEL_SINGLE) {
 		idpf_qc_single_tx_queue_reset(txq);
 	} else {
diff --git a/drivers/net/intel/i40e/i40e_rxtx.c b/drivers/net/intel/i40e/i40e_rxtx.c
index ac53554234..185e45fb9a 100644
--- a/drivers/net/intel/i40e/i40e_rxtx.c
+++ b/drivers/net/intel/i40e/i40e_rxtx.c
@@ -1453,7 +1453,7 @@ i40e_dev_tx_queue_start(struct rte_eth_dev *dev, uint16_t tx_queue_id)
 		PMD_DRV_LOG(WARNING, "TX queue %u is deferred start",
 			    tx_queue_id);
 
-	txq->vector_tx = ad->tx_vec_allowed;
+	txq->use_vec_entry = ad->tx_vec_allowed || ad->tx_simple_allowed;
 
 	/*
 	 * tx_queue_id is queue id application refers to, while
diff --git a/drivers/net/intel/iavf/iavf_rxtx_vec_avx2.c b/drivers/net/intel/iavf/iavf_rxtx_vec_avx2.c
index cea4ee9863..374c713a94 100644
--- a/drivers/net/intel/iavf/iavf_rxtx_vec_avx2.c
+++ b/drivers/net/intel/iavf/iavf_rxtx_vec_avx2.c
@@ -1803,7 +1803,7 @@ iavf_xmit_pkts_vec_avx2_offload(void *tx_queue, struct rte_mbuf **tx_pkts,
 int __rte_cold
 iavf_txq_vec_setup(struct ci_tx_queue *txq)
 {
-	txq->vector_tx = true;
+	txq->use_vec_entry = true;
 	return 0;
 }
 
diff --git a/drivers/net/intel/ice/ice_rxtx.c b/drivers/net/intel/ice/ice_rxtx.c
index ed82a84dc5..06f7e85c12 100644
--- a/drivers/net/intel/ice/ice_rxtx.c
+++ b/drivers/net/intel/ice/ice_rxtx.c
@@ -882,7 +882,7 @@ ice_tx_queue_start(struct rte_eth_dev *dev, uint16_t tx_queue_id)
 		}
 
 	/* record what kind of descriptor cleanup we need on teardown */
-	txq->vector_tx = ad->tx_vec_allowed;
+	txq->use_vec_entry = ad->tx_vec_allowed || ad->tx_simple_allowed;
 
 	if (txq->tsq != NULL && txq->tsq->ts_flag > 0) {
 		struct ice_aqc_set_txtime_qgrp *ts_elem;
diff --git a/drivers/net/intel/idpf/idpf_common_rxtx_avx512.c b/drivers/net/intel/idpf/idpf_common_rxtx_avx512.c
index 49ace35615..666ad1a4dd 100644
--- a/drivers/net/intel/idpf/idpf_common_rxtx_avx512.c
+++ b/drivers/net/intel/idpf/idpf_common_rxtx_avx512.c
@@ -1365,6 +1365,6 @@ idpf_qc_tx_vec_avx512_setup(struct ci_tx_queue *txq)
 	if (!txq)
 		return 0;
 
-	txq->vector_tx = true;
+	txq->use_vec_entry = true;
 	return 0;
 }
diff --git a/drivers/net/intel/ixgbe/ixgbe_rxtx_vec_common.c b/drivers/net/intel/ixgbe/ixgbe_rxtx_vec_common.c
index 63c7cb50d3..c42b8fc96b 100644
--- a/drivers/net/intel/ixgbe/ixgbe_rxtx_vec_common.c
+++ b/drivers/net/intel/ixgbe/ixgbe_rxtx_vec_common.c
@@ -111,7 +111,7 @@ ixgbe_txq_vec_setup(struct ci_tx_queue *txq)
 	/* leave the first one for overflow */
 	txq->sw_ring_vec = txq->sw_ring_vec + 1;
 	txq->ops = &vec_txq_ops;
-	txq->vector_tx = 1;
+	txq->use_vec_entry = true;
 
 	return 0;
 }
-- 
2.51.0



* [PATCH v2 35/36] net/intel: use vector mbuf cleanup from simple scalar path
  2026-01-13 15:14 ` [PATCH v2 00/36] combine multiple Intel scalar Tx paths Bruce Richardson
                     ` (33 preceding siblings ...)
  2026-01-13 15:14   ` [PATCH v2 34/36] net/intel: use vector SW ring entry for simple path Bruce Richardson
@ 2026-01-13 15:14   ` Bruce Richardson
  2026-01-13 15:15   ` [PATCH v2 36/36] net/idpf: enable simple Tx function Bruce Richardson
                     ` (2 subsequent siblings)
  37 siblings, 0 replies; 274+ messages in thread
From: Bruce Richardson @ 2026-01-13 15:14 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson, Anatoly Burakov

Since the simple scalar path now uses the vector Tx entry struct, we can
leverage the vector mbuf cleanup function from that path and avoid
having a separate cleanup function for it.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/net/intel/common/tx_scalar_fns.h | 71 +++++-------------------
 drivers/net/intel/i40e/i40e_rxtx.c       |  2 +-
 drivers/net/intel/ice/ice_rxtx.c         |  2 +-
 3 files changed, 17 insertions(+), 58 deletions(-)

diff --git a/drivers/net/intel/common/tx_scalar_fns.h b/drivers/net/intel/common/tx_scalar_fns.h
index b284b80cbe..ce3837a201 100644
--- a/drivers/net/intel/common/tx_scalar_fns.h
+++ b/drivers/net/intel/common/tx_scalar_fns.h
@@ -21,6 +21,20 @@ write_txd(volatile void *txd, uint64_t qw0, uint64_t qw1)
 	txd_qw[1] = rte_cpu_to_le_64(qw1);
 }
 
+static __rte_always_inline int
+ci_tx_desc_done_simple(struct ci_tx_queue *txq, uint16_t idx)
+{
+	return (txq->ci_tx_ring[idx].cmd_type_offset_bsz & rte_cpu_to_le_64(CI_TXD_QW1_DTYPE_M)) ==
+			rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE);
+}
+
+/* Free transmitted mbufs using vector-style cleanup */
+static __rte_always_inline int
+ci_tx_free_bufs_simple(struct ci_tx_queue *txq)
+{
+	return ci_tx_free_bufs_vec(txq, ci_tx_desc_done_simple, false);
+}
+
 /* Fill hardware descriptor ring with mbuf data (simple path) */
 static inline void
 ci_tx_fill_hw_ring_simple(volatile struct ci_tx_desc *txdp, struct rte_mbuf **pkts,
@@ -52,61 +66,6 @@ ci_tx_fill_hw_ring_simple(volatile struct ci_tx_desc *txdp, struct rte_mbuf **pk
 	}
 }
 
-/* Free transmitted mbufs from descriptor ring with bulk freeing for Tx simple path */
-static __rte_always_inline int
-ci_tx_free_bufs(struct ci_tx_queue *txq)
-{
-	struct ci_tx_entry_vec *txep;
-	uint16_t tx_rs_thresh = txq->tx_rs_thresh;
-	uint16_t i = 0, j = 0;
-	struct rte_mbuf *free[CI_TX_MAX_FREE_BUF_SZ];
-	const uint16_t k = RTE_ALIGN_FLOOR(tx_rs_thresh, CI_TX_MAX_FREE_BUF_SZ);
-	const uint16_t m = tx_rs_thresh % CI_TX_MAX_FREE_BUF_SZ;
-
-	if ((txq->ci_tx_ring[txq->tx_next_dd].cmd_type_offset_bsz &
-			rte_cpu_to_le_64(CI_TXD_QW1_DTYPE_M)) !=
-			rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE))
-		return 0;
-
-	txep = &txq->sw_ring_vec[txq->tx_next_dd - (tx_rs_thresh - 1)];
-
-	for (i = 0; i < tx_rs_thresh; i++)
-		rte_prefetch0((txep + i)->mbuf);
-
-	if (txq->offloads & RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE) {
-		if (k) {
-			for (j = 0; j != k; j += CI_TX_MAX_FREE_BUF_SZ) {
-				for (i = 0; i < CI_TX_MAX_FREE_BUF_SZ; ++i, ++txep) {
-					free[i] = txep->mbuf;
-					txep->mbuf = NULL;
-				}
-				rte_mbuf_raw_free_bulk(free[0]->pool, free,
-						CI_TX_MAX_FREE_BUF_SZ);
-			}
-		}
-
-		if (m) {
-			for (i = 0; i < m; ++i, ++txep) {
-				free[i] = txep->mbuf;
-				txep->mbuf = NULL;
-			}
-			rte_mbuf_raw_free_bulk(free[0]->pool, free, m);
-		}
-	} else {
-		for (i = 0; i < txq->tx_rs_thresh; ++i, ++txep) {
-			rte_pktmbuf_free_seg(txep->mbuf);
-			txep->mbuf = NULL;
-		}
-	}
-
-	txq->nb_tx_free = (uint16_t)(txq->nb_tx_free + txq->tx_rs_thresh);
-	txq->tx_next_dd = (uint16_t)(txq->tx_next_dd + txq->tx_rs_thresh);
-	if (txq->tx_next_dd >= txq->nb_tx_desc)
-		txq->tx_next_dd = (uint16_t)(txq->tx_rs_thresh - 1);
-
-	return txq->tx_rs_thresh;
-}
-
 /* Simple burst transmit for descriptor-based simple Tx path
  *
  * Transmits a burst of packets by filling hardware descriptors with mbuf
@@ -132,7 +91,7 @@ ci_xmit_burst_simple(struct ci_tx_queue *txq,
 	 * descriptor, free the associated buffer.
 	 */
 	if (txq->nb_tx_free < txq->tx_free_thresh)
-		ci_tx_free_bufs(txq);
+		ci_tx_free_bufs_simple(txq);
 
 	/* Use available descriptor only */
 	nb_pkts = (uint16_t)RTE_MIN(txq->nb_tx_free, nb_pkts);
diff --git a/drivers/net/intel/i40e/i40e_rxtx.c b/drivers/net/intel/i40e/i40e_rxtx.c
index 185e45fb9a..820a955158 100644
--- a/drivers/net/intel/i40e/i40e_rxtx.c
+++ b/drivers/net/intel/i40e/i40e_rxtx.c
@@ -2367,7 +2367,7 @@ i40e_tx_done_cleanup_simple(struct ci_tx_queue *txq,
 		if (txq->nb_tx_desc - txq->nb_tx_free < txq->tx_rs_thresh)
 			break;
 
-		n = ci_tx_free_bufs(txq);
+		n = ci_tx_free_bufs_simple(txq);
 
 		if (n == 0)
 			break;
diff --git a/drivers/net/intel/ice/ice_rxtx.c b/drivers/net/intel/ice/ice_rxtx.c
index 06f7e85c12..be9d88dda6 100644
--- a/drivers/net/intel/ice/ice_rxtx.c
+++ b/drivers/net/intel/ice/ice_rxtx.c
@@ -3208,7 +3208,7 @@ ice_tx_done_cleanup_simple(struct ci_tx_queue *txq,
 		if (txq->nb_tx_desc - txq->nb_tx_free < txq->tx_rs_thresh)
 			break;
 
-		n = ci_tx_free_bufs(txq);
+		n = ci_tx_free_bufs_simple(txq);
 
 		if (n == 0)
 			break;
-- 
2.51.0



* [PATCH v2 36/36] net/idpf: enable simple Tx function
  2026-01-13 15:14 ` [PATCH v2 00/36] combine multiple Intel scalar Tx paths Bruce Richardson
                     ` (34 preceding siblings ...)
  2026-01-13 15:14   ` [PATCH v2 35/36] net/intel: use vector mbuf cleanup from simple scalar path Bruce Richardson
@ 2026-01-13 15:15   ` Bruce Richardson
  2026-01-13 17:17   ` [PATCH v2 00/36] combine multiple Intel scalar Tx paths Stephen Hemminger
  2026-01-23  6:26   ` Stephen Hemminger
  37 siblings, 0 replies; 274+ messages in thread
From: Bruce Richardson @ 2026-01-13 15:15 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson, Jingjing Wu, Praveen Shetty

The common "simple Tx" function - in some ways a scalar version of the
vector Tx functions - can be used by the idpf driver as well as i40e and
ice, so add support for it to the driver.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/net/intel/idpf/idpf_common_device.h |  2 ++
 drivers/net/intel/idpf/idpf_common_rxtx.c   | 19 +++++++++++++++
 drivers/net/intel/idpf/idpf_common_rxtx.h   |  3 +++
 drivers/net/intel/idpf/idpf_rxtx.c          | 26 ++++++++++++++++++++-
 4 files changed, 49 insertions(+), 1 deletion(-)

diff --git a/drivers/net/intel/idpf/idpf_common_device.h b/drivers/net/intel/idpf/idpf_common_device.h
index 31915a03d4..527aa9b3dc 100644
--- a/drivers/net/intel/idpf/idpf_common_device.h
+++ b/drivers/net/intel/idpf/idpf_common_device.h
@@ -78,6 +78,7 @@ enum idpf_rx_func_type {
 enum idpf_tx_func_type {
 	IDPF_TX_DEFAULT,
 	IDPF_TX_SINGLEQ,
+	IDPF_TX_SINGLEQ_SIMPLE,
 	IDPF_TX_SINGLEQ_AVX2,
 	IDPF_TX_AVX512,
 	IDPF_TX_SINGLEQ_AVX512,
@@ -100,6 +101,7 @@ struct idpf_adapter {
 
 	bool is_tx_singleq; /* true - single queue model, false - split queue model */
 	bool is_rx_singleq; /* true - single queue model, false - split queue model */
+	bool tx_simple_allowed; /* true if all queues support simple TX */
 
 	/* For timestamp */
 	uint64_t time_hw;
diff --git a/drivers/net/intel/idpf/idpf_common_rxtx.c b/drivers/net/intel/idpf/idpf_common_rxtx.c
index 781310e564..bf2e9363d4 100644
--- a/drivers/net/intel/idpf/idpf_common_rxtx.c
+++ b/drivers/net/intel/idpf/idpf_common_rxtx.c
@@ -1348,6 +1348,15 @@ idpf_dp_singleq_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 			idpf_set_tso_ctx, NULL, NULL);
 }
 
+RTE_EXPORT_INTERNAL_SYMBOL(idpf_dp_singleq_xmit_pkts_simple)
+uint16_t
+idpf_dp_singleq_xmit_pkts_simple(void *tx_queue, struct rte_mbuf **tx_pkts,
+				   uint16_t nb_pkts)
+{
+	return ci_xmit_pkts_simple(tx_queue, tx_pkts, nb_pkts);
+}
+
+
 /* TX prep functions */
 RTE_EXPORT_INTERNAL_SYMBOL(idpf_dp_prep_pkts)
 uint16_t
@@ -1533,6 +1542,16 @@ const struct ci_tx_path_info idpf_tx_path_infos[] = {
 			.single_queue = true
 		}
 	},
+	[IDPF_TX_SINGLEQ_SIMPLE] = {
+		.pkt_burst = idpf_dp_singleq_xmit_pkts_simple,
+		.info = "Single Queue Scalar Simple",
+		.features = {
+			.tx_offloads = IDPF_TX_VECTOR_OFFLOADS,
+			.single_queue = true,
+			.simple_tx = true,
+		}
+	},
+
 #ifdef RTE_ARCH_X86
 	[IDPF_TX_SINGLEQ_AVX2] = {
 		.pkt_burst = idpf_dp_singleq_xmit_pkts_avx2,
diff --git a/drivers/net/intel/idpf/idpf_common_rxtx.h b/drivers/net/intel/idpf/idpf_common_rxtx.h
index fe7094d434..914cab0f25 100644
--- a/drivers/net/intel/idpf/idpf_common_rxtx.h
+++ b/drivers/net/intel/idpf/idpf_common_rxtx.h
@@ -221,6 +221,9 @@ __rte_internal
 uint16_t idpf_dp_singleq_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 				   uint16_t nb_pkts);
 __rte_internal
+uint16_t idpf_dp_singleq_xmit_pkts_simple(void *tx_queue, struct rte_mbuf **tx_pkts,
+				   uint16_t nb_pkts);
+__rte_internal
 uint16_t idpf_dp_prep_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 			   uint16_t nb_pkts);
 __rte_internal
diff --git a/drivers/net/intel/idpf/idpf_rxtx.c b/drivers/net/intel/idpf/idpf_rxtx.c
index 5c2516f556..a2bb4b766d 100644
--- a/drivers/net/intel/idpf/idpf_rxtx.c
+++ b/drivers/net/intel/idpf/idpf_rxtx.c
@@ -497,6 +497,22 @@ idpf_tx_queue_setup(struct rte_eth_dev *dev, uint16_t queue_idx,
 	txq->q_set = true;
 	dev->data->tx_queues[queue_idx] = txq;
 
+	/* Set tx_simple_allowed flag based on queue configuration.
+	 * For queue 0: explicitly set the flag based on its configuration.
+	 * For other queues: only set to false if this queue cannot use simple_tx.
+	 */
+	if (vport->txq_model == VIRTCHNL2_QUEUE_MODEL_SPLIT)
+		goto out;
+
+	/* for first queue, default to true, disable later if any queue can't meet conditions */
+	if (queue_idx == 0)
+		adapter->tx_simple_allowed = true;
+
+	if ((txq->offloads != (txq->offloads & RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE)) ||
+			txq->tx_rs_thresh < IDPF_VPMD_TX_MAX_BURST)
+		adapter->tx_simple_allowed = false;
+
+out:
 	return 0;
 
 err_complq_setup:
@@ -639,6 +655,7 @@ int
 idpf_tx_queue_start(struct rte_eth_dev *dev, uint16_t tx_queue_id)
 {
 	struct idpf_vport *vport = dev->data->dev_private;
+	struct idpf_adapter *ad = vport->adapter;
 	struct ci_tx_queue *txq = dev->data->tx_queues[tx_queue_id];
 	int err = 0;
 
@@ -655,6 +672,12 @@ idpf_tx_queue_start(struct rte_eth_dev *dev, uint16_t tx_queue_id)
 		return err;
 	}
 
+	/* Record what kind of descriptor cleanup we need on teardown.
+	 * For single queue mode, vector or simple tx paths use vec entry format.
+	 */
+	if (vport->txq_model == VIRTCHNL2_QUEUE_MODEL_SINGLE)
+		txq->use_vec_entry = ad->tx_simple_allowed;
+
 	/* Ready to switch the queue on */
 	err = idpf_vc_queue_switch(vport, tx_queue_id, false, true,
 							VIRTCHNL2_QUEUE_TYPE_TX);
@@ -835,7 +858,8 @@ idpf_set_tx_function(struct rte_eth_dev *dev)
 	struct ci_tx_path_features req_features = {
 		.tx_offloads = dev->data->dev_conf.txmode.offloads,
 		.simd_width = RTE_VECT_SIMD_DISABLED,
-		.single_queue = (vport->txq_model == VIRTCHNL2_QUEUE_MODEL_SINGLE)
+		.single_queue = (vport->txq_model == VIRTCHNL2_QUEUE_MODEL_SINGLE),
+		.simple_tx = ad->tx_simple_allowed
 	};
 
 	/* The primary process selects the tx path for all processes. */
-- 
2.51.0



* Re: [PATCH v2 00/36] combine multiple Intel scalar Tx paths
  2026-01-13 15:14 ` [PATCH v2 00/36] combine multiple Intel scalar Tx paths Bruce Richardson
                     ` (35 preceding siblings ...)
  2026-01-13 15:15   ` [PATCH v2 36/36] net/idpf: enable simple Tx function Bruce Richardson
@ 2026-01-13 17:17   ` Stephen Hemminger
  2026-01-23  6:26   ` Stephen Hemminger
  37 siblings, 0 replies; 274+ messages in thread
From: Stephen Hemminger @ 2026-01-13 17:17 UTC (permalink / raw)
  To: Bruce Richardson; +Cc: dev

On Tue, 13 Jan 2026 15:14:24 +0000
Bruce Richardson <bruce.richardson@intel.com> wrote:

> The scalar Tx paths, with support for offloads and multiple mbufs
> per packet, are almost identical across drivers ice, i40e, iavf and
> the single-queue mode of idpf. Therefore, we can do some rework to
> combine these code paths into a single function which is parameterized
> by compile-time constants, allowing code saving to give us a single
> path to optimize and maintain - apart from edge cases like IPSec
> support in iavf.
> 
> The ixgbe driver has a number of similarities too, which we take
> advantage of where we can, but the overall descriptor format is
> sufficiently different that its main scalar code path is kept
> separate.
> 
> Once merged, we can then optimize the drivers a bit to improve
> performance, and also easily extend some drivers to use additional
> paths for better performance, e.g. add the "simple scalar" path
> to IDPF driver for better performance on platforms without AVX.
> 
> V2:
>  - reworked the simple-scalar path as well as full scalar one
>  - added simple scalar path support to idpf driver
>  - small cleanups, e.g. issues flagged by checkpatch
> 
> Bruce Richardson (36):
>   net/intel: create common Tx descriptor structure
>   net/intel: use common Tx ring structure
>   net/intel: create common post-Tx cleanup function
>   net/intel: consolidate definitions for Tx desc fields
>   net/intel: create separate header for Tx scalar fns
>   net/intel: add common fn to calculate needed descriptors
>   net/ice: refactor context descriptor handling
>   net/i40e: refactor context descriptor handling
>   net/idpf: refactor context descriptor handling
>   net/intel: consolidate checksum mask definition
>   net/intel: create common checksum Tx offload function
>   net/intel: create a common scalar Tx function
>   net/i40e: use common scalar Tx function
>   net/intel: add IPsec hooks to common Tx function
>   net/intel: support configurable VLAN tag insertion on Tx
>   net/iavf: use common scalar Tx function
>   net/i40e: document requirement for QinQ support
>   net/idpf: use common scalar Tx function
>   net/intel: avoid writing the final pkt descriptor twice
>   eal: add macro for marking assumed alignment
>   net/intel: write descriptors using non-volatile pointers
>   net/intel: remove unnecessary flag clearing
>   net/intel: mark mid-burst ring cleanup as unlikely
>   net/intel: add special handling for single desc packets
>   net/intel: use separate array for desc status tracking
>   net/ixgbe: use separate array for desc status tracking
>   net/intel: drop unused Tx queue used count
>   net/intel: remove index for tracking end of packet
>   net/intel: merge ring writes in simple Tx for ice and i40e
>   net/intel: consolidate ice and i40e buffer free function
>   net/intel: complete merging simple Tx paths
>   net/intel: use non-volatile stores in simple Tx function
>   net/intel: align scalar simple Tx path with vector logic
>   net/intel: use vector SW ring entry for simple path
>   net/intel: use vector mbuf cleanup from simple scalar path
>   net/idpf: enable simple Tx function
> 
>  doc/guides/nics/i40e.rst                      |  18 +
>  drivers/net/intel/common/tx.h                 | 116 ++-
>  drivers/net/intel/common/tx_scalar_fns.h      | 595 ++++++++++++++
>  drivers/net/intel/cpfl/cpfl_rxtx.c            |   8 +-
>  drivers/net/intel/i40e/i40e_fdir.c            |  34 +-
>  drivers/net/intel/i40e/i40e_rxtx.c            | 670 +++-------------
>  drivers/net/intel/i40e/i40e_rxtx.h            |  16 -
>  .../net/intel/i40e/i40e_rxtx_vec_altivec.c    |  25 +-
>  drivers/net/intel/i40e/i40e_rxtx_vec_avx2.c   |  36 +-
>  drivers/net/intel/i40e/i40e_rxtx_vec_avx512.c |  52 +-
>  drivers/net/intel/i40e/i40e_rxtx_vec_common.h |   6 +-
>  drivers/net/intel/i40e/i40e_rxtx_vec_neon.c   |  25 +-
>  drivers/net/intel/iavf/iavf_rxtx.c            | 642 ++++-----------
>  drivers/net/intel/iavf/iavf_rxtx.h            |  30 +-
>  drivers/net/intel/iavf/iavf_rxtx_vec_avx2.c   |  55 +-
>  drivers/net/intel/iavf/iavf_rxtx_vec_avx512.c | 104 +--
>  drivers/net/intel/iavf/iavf_rxtx_vec_common.h |  36 +-
>  drivers/net/intel/ice/ice_dcf_ethdev.c        |  10 +-
>  drivers/net/intel/ice/ice_rxtx.c              | 737 ++++--------------
>  drivers/net/intel/ice/ice_rxtx.h              |  15 -
>  drivers/net/intel/ice/ice_rxtx_vec_avx2.c     |  55 +-
>  drivers/net/intel/ice/ice_rxtx_vec_avx512.c   |  53 +-
>  drivers/net/intel/ice/ice_rxtx_vec_common.h   |  43 +-
>  drivers/net/intel/idpf/idpf_common_device.h   |   2 +
>  drivers/net/intel/idpf/idpf_common_rxtx.c     | 315 ++------
>  drivers/net/intel/idpf/idpf_common_rxtx.h     |  24 +-
>  .../net/intel/idpf/idpf_common_rxtx_avx2.c    |  53 +-
>  .../net/intel/idpf/idpf_common_rxtx_avx512.c  |  55 +-
>  drivers/net/intel/idpf/idpf_rxtx.c            |  43 +-
>  drivers/net/intel/idpf/idpf_rxtx_vec_common.h |   6 +-
>  drivers/net/intel/ixgbe/ixgbe_rxtx.c          | 103 ++-
>  .../net/intel/ixgbe/ixgbe_rxtx_vec_common.c   |   3 +-
>  lib/eal/include/rte_common.h                  |   6 +
>  33 files changed, 1565 insertions(+), 2426 deletions(-)
>  create mode 100644 drivers/net/intel/common/tx_scalar_fns.h
> 
> --
> 2.51.0


Series-Acked-by: Stephen Hemminger <stephen@networkplumber.org>

Looks ok to me, asked Claude for second opinion.

Its suggestion about the long log message is overblown, although I
would suggest being more succinct.

# DPDK Patch Review: Intel Tx Consolidation Series (v2)

**Series:** `[PATCH v2 01-36/36]` Intel Tx code consolidation  
**Author:** Bruce Richardson <bruce.richardson@intel.com>  
**Patches Reviewed:** 36  
**Review Date:** January 13, 2026

---

## Executive Summary

This is a substantial refactoring series that consolidates Tx (transmit) descriptor structures and functions across Intel network drivers (i40e, ice, iavf, idpf, ixgbe). The series is well-structured with clear commit messages and proper attribution. A few minor issues were identified.

| Severity | Count |
|----------|-------|
| Error    | 1     |
| Warning  | 4     |
| Info     | 2     |

---

## Errors (Must Fix)

### 1. Patch 17/36: Line exceeds 100 characters

**File:** `drivers/net/intel/i40e/i40e_rxtx.c`  
**Subject:** `net/i40e: document requirement for QinQ support`

```c
PMD_DRV_LOG(WARNING, "Double VLAN insertion may not work correctly without RTE_ETH_RX_OFFLOAD_VLAN_EXTEND set in Rx configuration.");
```

**Issue:** Line is 136 characters, exceeding the 100-character limit for source code.

**Suggested fix:** Split the log message:
```c
PMD_DRV_LOG(WARNING, "Double VLAN insertion may not work correctly "
		"without RTE_ETH_RX_OFFLOAD_VLAN_EXTEND set in Rx configuration.");
```

---

## Warnings (Should Fix)

### 2. Patch 14/36: Implicit pointer comparison

**File:** `drivers/net/intel/common/tx_scalar_fns.h`  
**Subject:** `net/intel: add IPsec hooks to common Tx function`

```c
md = RTE_MBUF_DYNFIELD(mbuf, txq->ipsec_crypto_pkt_md_offset,
			     struct iavf_ipsec_crypto_pkt_metadata *);
if (!md)
```

**Issue:** Pointer comparison uses `!md` instead of explicit `md == NULL`.

**Suggested fix:**
```c
if (md == NULL)
```

### 3. Patch 16/36: Implicit integer comparison

**File:** `drivers/net/intel/iavf/iavf_rxtx.c`  
**Subject:** `net/iavf: use common scalar Tx function`

```c
if (!iavf_calc_context_desc(mbuf, iavf_vlan_flag))
```

**Issue:** `iavf_calc_context_desc()` returns `uint16_t`. Comparison should be explicit.

**Suggested fix:**
```c
if (iavf_calc_context_desc(mbuf, iavf_vlan_flag) == 0)
```

### 4. Patches 19, 25, 26: Implicit integer comparison with rte_is_power_of_2()

**Multiple files across patches**

```c
if (!rte_is_power_of_2(tx_rs_thresh)) {
```

**Issue:** While `rte_is_power_of_2()` acts as a boolean predicate, it returns `int`. Strictly, the comparison should be explicit.

**Suggested fix:**
```c
if (rte_is_power_of_2(tx_rs_thresh) == 0) {
```

*Note: This is a borderline issue as the function is semantically boolean. May be acceptable.*

### 5. Patch 36/36: Double blank lines

**File:** `drivers/net/intel/idpf/idpf_common_rxtx.c`  
**Subject:** `net/idpf: enable simple Tx function`

```c
	return ci_xmit_pkts_simple(tx_queue, tx_pkts, nb_pkts);
}


/* TX prep functions */
```

**Issue:** Two consecutive blank lines after function definition.

**Suggested fix:** Remove one blank line.

---

## Info (Consider)

### 6. Patch 20/36: New EAL macro without release notes

**File:** `lib/eal/include/rte_common.h`  
**Subject:** `eal: add macro for marking assumed alignment`

The patch adds the `__rte_assume_aligned` macro to the EAL common header. While this is an internal optimization helper, significant EAL additions typically warrant a release note entry.

**Suggestion:** Consider adding a brief mention in release notes for the current release cycle.

### 7. Overall: No documentation for new internal APIs

The series adds new internal functions (e.g., `idpf_dp_singleq_xmit_pkts_simple`) marked with `__rte_internal`. While internal APIs don't require Doxygen, brief inline comments explaining their purpose would aid maintainability.

---

## Compliance Summary

### Commit Message Checklist

| Check | Status |
|-------|--------|
| Subject lines ≤60 characters | ✅ All pass (max: 51 chars) |
| Lowercase after colon | ✅ All pass |
| Correct component prefix | ✅ All pass (`net/intel:`, `net/i40e:`, `eal:`, etc.) |
| Imperative mood | ✅ All pass |
| No trailing period | ✅ All pass |
| Signed-off-by present | ✅ All 36 patches |
| Real name and valid email | ✅ Bruce Richardson <bruce.richardson@intel.com> |
| Body wrapped at 75 chars | ✅ All pass |

### Code Style Checklist

| Check | Status |
|-------|--------|
| Lines ≤100 characters | ❌ 1 violation (Patch 17) |
| No trailing whitespace | ✅ Pass |
| `__rte_internal` alone on line | ✅ Correct usage |
| Explicit pointer comparisons | ⚠️ 1 violation (Patch 14) |
| Explicit integer comparisons | ⚠️ ~6 instances |
| No double blank lines | ⚠️ 1 violation (Patch 36) |
| No unnecessary void* casts | ✅ Pass |
| No forbidden tokens | ✅ Pass |

### Structure Checklist

| Check | Status |
|-------|--------|
| Each commit compiles independently | ✅ Appears correct |
| Code and docs updated together | ✅ Patch 17 adds docs with code |
| New internal APIs marked `__rte_internal` | ✅ Correct |
| Release notes updated | ⚠️ Consider for EAL changes |

---

## Technical Assessment

The series accomplishes significant code consolidation:

1. **Common Tx descriptor structure** (`struct ci_tx_desc`) unifies identical 16-byte descriptors across i40e, iavf, ice, and idpf drivers.

2. **Shared scalar Tx function** (`ci_xmit_pkts()`) reduces code duplication significantly.

3. **Simple Tx path** optimization enables scalar code to use the more efficient vector SW ring entry format.

4. **New EAL macro** (`__rte_assume_aligned`) provides a portable way to mark pointer alignment assumptions for compiler optimization.

The refactoring maintains backward compatibility and should not introduce functional regressions.

---

## Recommendation

**Acceptable with minor revisions.** Address the Error and consider fixing the Warnings before merge.

---

*Review generated according to DPDK AGENTS.md guidelines*

^ permalink raw reply	[flat|nested] 274+ messages in thread

* Re: [PATCH v2 00/36] combine multiple Intel scalar Tx paths
  2026-01-13 15:14 ` [PATCH v2 00/36] combine multiple Intel scalar Tx paths Bruce Richardson
                     ` (36 preceding siblings ...)
  2026-01-13 17:17   ` [PATCH v2 00/36] combine multiple Intel scalar Tx paths Stephen Hemminger
@ 2026-01-23  6:26   ` Stephen Hemminger
  2026-01-26  9:02     ` Bruce Richardson
  37 siblings, 1 reply; 274+ messages in thread
From: Stephen Hemminger @ 2026-01-23  6:26 UTC (permalink / raw)
  To: Bruce Richardson; +Cc: dev

On Tue, 13 Jan 2026 15:14:24 +0000
Bruce Richardson <bruce.richardson@intel.com> wrote:

> The scalar Tx paths, with support for offloads and multiple mbufs
> per packet, are almost identical across drivers ice, i40e, iavf and
> the single-queue mode of idpf. Therefore, we can do some rework to
> combine these code paths into a single function which is parameterized
> by compile-time constants, allowing code saving to give us a single
> path to optimize and maintain - apart from edge cases like IPSec
> support in iavf.
> 
> The ixgbe driver has a number of similarities too, which we take
> advantage of where we can, but the overall descriptor format is
> sufficiently different that its main scalar code path is kept
> separate.
> 
> Once merged, we can then optimize the drivers a bit to improve
> performance, and also easily extend some drivers to use additional
> paths for better performance, e.g. add the "simple scalar" path
> to IDPF driver for better performance on platforms without AVX.
> 
> V2:
>  - reworked the simple-scalar path as well as full scalar one
>  - added simple scalar path support to idpf driver
>  - small cleanups, e.g. issues flagged by checkpatch
> 
> Bruce Richardson (36):
>   net/intel: create common Tx descriptor structure
>   net/intel: use common Tx ring structure
>   net/intel: create common post-Tx cleanup function
>   net/intel: consolidate definitions for Tx desc fields
>   net/intel: create separate header for Tx scalar fns
>   net/intel: add common fn to calculate needed descriptors
>   net/ice: refactor context descriptor handling
>   net/i40e: refactor context descriptor handling
>   net/idpf: refactor context descriptor handling
>   net/intel: consolidate checksum mask definition
>   net/intel: create common checksum Tx offload function
>   net/intel: create a common scalar Tx function
>   net/i40e: use common scalar Tx function
>   net/intel: add IPsec hooks to common Tx function
>   net/intel: support configurable VLAN tag insertion on Tx
>   net/iavf: use common scalar Tx function
>   net/i40e: document requirement for QinQ support
>   net/idpf: use common scalar Tx function
>   net/intel: avoid writing the final pkt descriptor twice
>   eal: add macro for marking assumed alignment
>   net/intel: write descriptors using non-volatile pointers
>   net/intel: remove unnecessary flag clearing
>   net/intel: mark mid-burst ring cleanup as unlikely
>   net/intel: add special handling for single desc packets
>   net/intel: use separate array for desc status tracking
>   net/ixgbe: use separate array for desc status tracking
>   net/intel: drop unused Tx queue used count
>   net/intel: remove index for tracking end of packet
>   net/intel: merge ring writes in simple Tx for ice and i40e
>   net/intel: consolidate ice and i40e buffer free function
>   net/intel: complete merging simple Tx paths
>   net/intel: use non-volatile stores in simple Tx function
>   net/intel: align scalar simple Tx path with vector logic
>   net/intel: use vector SW ring entry for simple path
>   net/intel: use vector mbuf cleanup from simple scalar path
>   net/idpf: enable simple Tx function
> 
>  doc/guides/nics/i40e.rst                      |  18 +
>  drivers/net/intel/common/tx.h                 | 116 ++-
>  drivers/net/intel/common/tx_scalar_fns.h      | 595 ++++++++++++++
>  drivers/net/intel/cpfl/cpfl_rxtx.c            |   8 +-
>  drivers/net/intel/i40e/i40e_fdir.c            |  34 +-
>  drivers/net/intel/i40e/i40e_rxtx.c            | 670 +++-------------
>  drivers/net/intel/i40e/i40e_rxtx.h            |  16 -
>  .../net/intel/i40e/i40e_rxtx_vec_altivec.c    |  25 +-
>  drivers/net/intel/i40e/i40e_rxtx_vec_avx2.c   |  36 +-
>  drivers/net/intel/i40e/i40e_rxtx_vec_avx512.c |  52 +-
>  drivers/net/intel/i40e/i40e_rxtx_vec_common.h |   6 +-
>  drivers/net/intel/i40e/i40e_rxtx_vec_neon.c   |  25 +-
>  drivers/net/intel/iavf/iavf_rxtx.c            | 642 ++++-----------
>  drivers/net/intel/iavf/iavf_rxtx.h            |  30 +-
>  drivers/net/intel/iavf/iavf_rxtx_vec_avx2.c   |  55 +-
>  drivers/net/intel/iavf/iavf_rxtx_vec_avx512.c | 104 +--
>  drivers/net/intel/iavf/iavf_rxtx_vec_common.h |  36 +-
>  drivers/net/intel/ice/ice_dcf_ethdev.c        |  10 +-
>  drivers/net/intel/ice/ice_rxtx.c              | 737 ++++--------------
>  drivers/net/intel/ice/ice_rxtx.h              |  15 -
>  drivers/net/intel/ice/ice_rxtx_vec_avx2.c     |  55 +-
>  drivers/net/intel/ice/ice_rxtx_vec_avx512.c   |  53 +-
>  drivers/net/intel/ice/ice_rxtx_vec_common.h   |  43 +-
>  drivers/net/intel/idpf/idpf_common_device.h   |   2 +
>  drivers/net/intel/idpf/idpf_common_rxtx.c     | 315 ++------
>  drivers/net/intel/idpf/idpf_common_rxtx.h     |  24 +-
>  .../net/intel/idpf/idpf_common_rxtx_avx2.c    |  53 +-
>  .../net/intel/idpf/idpf_common_rxtx_avx512.c  |  55 +-
>  drivers/net/intel/idpf/idpf_rxtx.c            |  43 +-
>  drivers/net/intel/idpf/idpf_rxtx_vec_common.h |   6 +-
>  drivers/net/intel/ixgbe/ixgbe_rxtx.c          | 103 ++-
>  .../net/intel/ixgbe/ixgbe_rxtx_vec_common.c   |   3 +-
>  lib/eal/include/rte_common.h                  |   6 +
>  33 files changed, 1565 insertions(+), 2426 deletions(-)
>  create mode 100644 drivers/net/intel/common/tx_scalar_fns.h
> 

Series-Acked-by: Stephen Hemminger <stephen@networkplumber.org>

Love to see common code and "fix it once"
It was too large for the batch scripts, but here is AI review.
It sees only minor stuff which you could fix (or skip).

# DPDK Patch Review: Intel Tx Consolidation Series

**Series:** [PATCH v2 01-36/36] Intel Tx driver consolidation  
**Author:** Bruce Richardson <bruce.richardson@intel.com>  
**Review Date:** 2026-01-22  
**Review Against:** AGENTS.md (DPDK Code Review Guidelines)

---

## Executive Summary

This 36-patch series consolidates Tx descriptor handling across Intel network drivers (i40e, iavf, ice, idpf, ixgbe). Overall the patches are **well-structured** with proper code organization, but there are several **issues requiring attention** before merge.

| Severity | Count | Summary |
|----------|-------|---------|
| **Error** | 1 | Source code line exceeds 100 characters |
| **Warning** | 5 | Commit message style issues, double blank lines |
| **Info** | 2 | Minor style observations |

---

## Detailed Findings

### ERRORS (Must Fix)

#### 1. Source Code Line Length Violation
**Patch 17/36:** `net/i40e: document requirement for QinQ support`  
**Location:** `drivers/net/intel/i40e/i40e_rxtx.c`  
**Issue:** Line exceeds 100-character limit (135 characters)

```c
PMD_DRV_LOG(WARNING, "Double VLAN insertion may not work correctly without RTE_ETH_RX_OFFLOAD_VLAN_EXTEND set in Rx configuration.");
```

**Fix:** Split the log message across multiple lines or into multiple log calls:
```c
PMD_DRV_LOG(WARNING, "Double VLAN insertion may not work correctly "
            "without RTE_ETH_RX_OFFLOAD_VLAN_EXTEND in Rx config.");
```

---

### WARNINGS (Should Fix)

#### 2. Commit Body Starts with Lowercase
**Patches 08/36 and 09/36**

Per DPDK convention, commit message body should start with a capital letter.

| Patch | Current | Should Be |
|-------|---------|-----------|
| 08/36 `net/i40e: refactor context descriptor handling` | "move all context..." | "Move all context..." |
| 09/36 `net/idpf: refactor context descriptor handling` | "move all context..." | "Move all context..." |

---

#### 3. Double Blank Lines in Code
The following patches add consecutive blank lines, which is a minor style violation:

| Patch | Location | Line |
|-------|----------|------|
| 06/36 `net/intel: add common fn to calculate needed` | tx_scalar_fns.h | ~line 5093 |
| 07/36 `net/ice: refactor context descriptor handling` | ice_rxtx.c | ~line 5379 |
| 36/36 `net/idpf: enable simple Tx function` | idpf_common_rxtx.c | After `idpf_dp_singleq_xmit_pkts_simple()` |

**Example from Patch 36:**
```c
+uint16_t
+idpf_dp_singleq_xmit_pkts_simple(void *tx_queue, struct rte_mbuf **tx_pkts,
+				   uint16_t nb_pkts)
+{
+	return ci_xmit_pkts_simple(tx_queue, tx_pkts, nb_pkts);
+}
+
+
+/* TX prep functions */   <-- Extra blank line above
```

---

#### 4. "TX" vs "Tx" in Comments
Multiple patches use "TX" in comments rather than the DPDK-preferred "Tx":

- Patch 04: `/* Common TX Descriptor QW1 Field Definitions */`
- Patch 06: `/* Calculate the number of TX descriptors needed for each pkt */`
- Patch 07: `/* TX context descriptor based double VLAN insert */`
- Patch 12: `/* Setup TX Descriptor */`
- And others...

While not strictly an error (comments aren't checked as strictly as commit messages), consistency with "Tx" is preferred per `devtools/words-case.txt`.

---

#### 5. EAL Macro Addition Without Release Notes
**Patch 20/36:** `eal: add macro for marking assumed alignment`

Adds `__rte_assume_aligned` macro to `lib/eal/include/rte_common.h`. Consider adding a release notes entry for this new public macro, even though it's primarily for internal optimization use.

---

### INFO (Consider)

#### 6. Subject Line Truncation
**Patch 06/36:** Subject line `net/intel: add common fn to calculate needed` appears truncated (missing "descriptors").

While technically within the 60-char limit, the full meaning is lost. Consider:
- `net/intel: add fn to calc needed descriptors` (43 chars)
- Or keep two-line format in cover letter

---

#### 7. New Internal API Properly Tagged
**Patch 36/36** correctly adds `__rte_internal` tag for `idpf_dp_singleq_xmit_pkts_simple()` in the header file (not .c file). ✓

---

## Compliance Summary

### Commit Message Checks

| Check | Status |
|-------|--------|
| Subject ≤60 chars | ✓ All pass (35-51 chars) |
| Lowercase after prefix | ✓ All pass |
| No trailing period | ✓ All pass |
| Signed-off-by present | ✓ All 36 patches |
| Body ≤75 chars | ✓ All pass |
| Imperative mood | ✓ All pass |
| Correct prefix (net/intel, net/ice, etc.) | ✓ All pass |
| Body starts with capital | ✗ 2 failures (patches 8, 9) |

### Code Style Checks

| Check | Status |
|-------|--------|
| Lines ≤100 chars | ✗ 1 failure (patch 17) |
| No trailing whitespace | ✓ All pass |
| SPDX headers on new files | ✓ Pass (patch 5 new file) |
| `__rte_internal` in headers only | ✓ Pass |
| No double blank lines | ✗ 3 failures |
| Proper Tx/Rx capitalization | ⚠ Comments use "TX" |

### License Checks

| Check | Status |
|-------|--------|
| New file has SPDX | ✓ `tx_scalar_fns.h` has BSD-3-Clause |
| Copyright follows SPDX | ✓ Pass |
| Blank line before code | ✓ Pass |

---

## Files Changed Summary

- **22 files** modified in patch 1 alone
- **New file created:** `drivers/net/intel/common/tx_scalar_fns.h`
- **Drivers affected:** i40e, iavf, ice, idpf, cpfl, ixgbe
- **Documentation updated:** `doc/guides/nics/i40e.rst` (patch 17)

---

## Recommendations

1. **Critical:** Fix the 135-char line in patch 17 before merge
2. **Important:** Capitalize "Move" in patches 8 and 9 commit messages
3. **Minor:** Remove extra blank lines in patches 6, 7, and 36
4. **Optional:** Consider release notes entry for new EAL macro
5. **Optional:** Standardize comment style to use "Tx" instead of "TX"

---

## Verdict

**Conditional Accept** - Series is well-designed and the code consolidation is valuable. Fix the error (line length) and warnings (commit message capitalization, double blank lines) before merge.


^ permalink raw reply	[flat|nested] 274+ messages in thread

* Re: [PATCH v2 00/36] combine multiple Intel scalar Tx paths
  2026-01-23  6:26   ` Stephen Hemminger
@ 2026-01-26  9:02     ` Bruce Richardson
  0 siblings, 0 replies; 274+ messages in thread
From: Bruce Richardson @ 2026-01-26  9:02 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: dev

On Thu, Jan 22, 2026 at 10:26:33PM -0800, Stephen Hemminger wrote:
> On Tue, 13 Jan 2026 15:14:24 +0000
> Bruce Richardson <bruce.richardson@intel.com> wrote:
> 
> > The scalar Tx paths, with support for offloads and multiple mbufs
> > per packet, are almost identical across drivers ice, i40e, iavf and
> > the single-queue mode of idpf. Therefore, we can do some rework to
> > combine these code paths into a single function which is parameterized
> > by compile-time constants, allowing code saving to give us a single
> > path to optimize and maintain - apart from edge cases like IPSec
> > support in iavf.
> > 
> > The ixgbe driver has a number of similarities too, which we take
> > advantage of where we can, but the overall descriptor format is
> > sufficiently different that its main scalar code path is kept
> > separate.
> > 
> > Once merged, we can then optimize the drivers a bit to improve
> > performance, and also easily extend some drivers to use additional
> > paths for better performance, e.g. add the "simple scalar" path
> > to IDPF driver for better performance on platforms without AVX.
> > 
> > V2:
> >  - reworked the simple-scalar path as well as full scalar one
> >  - added simple scalar path support to idpf driver
> >  - small cleanups, e.g. issues flagged by checkpatch
> > 
> > Bruce Richardson (36):
> >   net/intel: create common Tx descriptor structure
> >   net/intel: use common Tx ring structure
> >   net/intel: create common post-Tx cleanup function
> >   net/intel: consolidate definitions for Tx desc fields
> >   net/intel: create separate header for Tx scalar fns
> >   net/intel: add common fn to calculate needed descriptors
> >   net/ice: refactor context descriptor handling
> >   net/i40e: refactor context descriptor handling
> >   net/idpf: refactor context descriptor handling
> >   net/intel: consolidate checksum mask definition
> >   net/intel: create common checksum Tx offload function
> >   net/intel: create a common scalar Tx function
> >   net/i40e: use common scalar Tx function
> >   net/intel: add IPsec hooks to common Tx function
> >   net/intel: support configurable VLAN tag insertion on Tx
> >   net/iavf: use common scalar Tx function
> >   net/i40e: document requirement for QinQ support
> >   net/idpf: use common scalar Tx function
> >   net/intel: avoid writing the final pkt descriptor twice
> >   eal: add macro for marking assumed alignment
> >   net/intel: write descriptors using non-volatile pointers
> >   net/intel: remove unnecessary flag clearing
> >   net/intel: mark mid-burst ring cleanup as unlikely
> >   net/intel: add special handling for single desc packets
> >   net/intel: use separate array for desc status tracking
> >   net/ixgbe: use separate array for desc status tracking
> >   net/intel: drop unused Tx queue used count
> >   net/intel: remove index for tracking end of packet
> >   net/intel: merge ring writes in simple Tx for ice and i40e
> >   net/intel: consolidate ice and i40e buffer free function
> >   net/intel: complete merging simple Tx paths
> >   net/intel: use non-volatile stores in simple Tx function
> >   net/intel: align scalar simple Tx path with vector logic
> >   net/intel: use vector SW ring entry for simple path
> >   net/intel: use vector mbuf cleanup from simple scalar path
> >   net/idpf: enable simple Tx function
> > 
> >  doc/guides/nics/i40e.rst                      |  18 +
> >  drivers/net/intel/common/tx.h                 | 116 ++-
> >  drivers/net/intel/common/tx_scalar_fns.h      | 595 ++++++++++++++
> >  drivers/net/intel/cpfl/cpfl_rxtx.c            |   8 +-
> >  drivers/net/intel/i40e/i40e_fdir.c            |  34 +-
> >  drivers/net/intel/i40e/i40e_rxtx.c            | 670 +++-------------
> >  drivers/net/intel/i40e/i40e_rxtx.h            |  16 -
> >  .../net/intel/i40e/i40e_rxtx_vec_altivec.c    |  25 +-
> >  drivers/net/intel/i40e/i40e_rxtx_vec_avx2.c   |  36 +-
> >  drivers/net/intel/i40e/i40e_rxtx_vec_avx512.c |  52 +-
> >  drivers/net/intel/i40e/i40e_rxtx_vec_common.h |   6 +-
> >  drivers/net/intel/i40e/i40e_rxtx_vec_neon.c   |  25 +-
> >  drivers/net/intel/iavf/iavf_rxtx.c            | 642 ++++-----------
> >  drivers/net/intel/iavf/iavf_rxtx.h            |  30 +-
> >  drivers/net/intel/iavf/iavf_rxtx_vec_avx2.c   |  55 +-
> >  drivers/net/intel/iavf/iavf_rxtx_vec_avx512.c | 104 +--
> >  drivers/net/intel/iavf/iavf_rxtx_vec_common.h |  36 +-
> >  drivers/net/intel/ice/ice_dcf_ethdev.c        |  10 +-
> >  drivers/net/intel/ice/ice_rxtx.c              | 737 ++++--------------
> >  drivers/net/intel/ice/ice_rxtx.h              |  15 -
> >  drivers/net/intel/ice/ice_rxtx_vec_avx2.c     |  55 +-
> >  drivers/net/intel/ice/ice_rxtx_vec_avx512.c   |  53 +-
> >  drivers/net/intel/ice/ice_rxtx_vec_common.h   |  43 +-
> >  drivers/net/intel/idpf/idpf_common_device.h   |   2 +
> >  drivers/net/intel/idpf/idpf_common_rxtx.c     | 315 ++------
> >  drivers/net/intel/idpf/idpf_common_rxtx.h     |  24 +-
> >  .../net/intel/idpf/idpf_common_rxtx_avx2.c    |  53 +-
> >  .../net/intel/idpf/idpf_common_rxtx_avx512.c  |  55 +-
> >  drivers/net/intel/idpf/idpf_rxtx.c            |  43 +-
> >  drivers/net/intel/idpf/idpf_rxtx_vec_common.h |   6 +-
> >  drivers/net/intel/ixgbe/ixgbe_rxtx.c          | 103 ++-
> >  .../net/intel/ixgbe/ixgbe_rxtx_vec_common.c   |   3 +-
> >  lib/eal/include/rte_common.h                  |   6 +
> >  33 files changed, 1565 insertions(+), 2426 deletions(-)
> >  create mode 100644 drivers/net/intel/common/tx_scalar_fns.h
> > 
> 
> Series-Acked-by: Stephen Hemminger <stephen@networkplumber.org>
> 
> Love to see common code and "fix it once"
> It was too large for the batch scripts, but here is AI review.
> It sees only minor stuff which you could fix (or skip).
> 

Thanks for running the AI review.

> # DPDK Patch Review: Intel Tx Consolidation Series
> 
> **Series:** [PATCH v2 01-36/36] Intel Tx driver consolidation  
> **Author:** Bruce Richardson <bruce.richardson@intel.com>  
> **Review Date:** 2026-01-22  
> **Review Against:** AGENTS.md (DPDK Code Review Guidelines)
> 
> ---
> 
> ## Executive Summary
> 
> This 36-patch series consolidates Tx descriptor handling across Intel network drivers (i40e, iavf, ice, idpf, ixgbe). Overall the patches are **well-structured** with proper code organization, but there are several **issues requiring attention** before merge.
> 
> | Severity | Count | Summary |
> |----------|-------|---------|
> | **Error** | 1 | Source code line exceeds 100 characters |
> | **Warning** | 5 | Commit message style issues, double blank lines |
> | **Info** | 2 | Minor style observations |
> 
> ---
> 
> ## Detailed Findings
> 
> ### ERRORS (Must Fix)
> 
> #### 1. Source Code Line Length Violation
> **Patch 17/36:** `net/i40e: document requirement for QinQ support`  
> **Location:** `drivers/net/intel/i40e/i40e_rxtx.c`  
> **Issue:** Line exceeds 100-character limit (135 characters)
> 
> ```c
> PMD_DRV_LOG(WARNING, "Double VLAN insertion may not work correctly without RTE_ETH_RX_OFFLOAD_VLAN_EXTEND set in Rx configuration.");
> ```
> 
> **Fix:** Split the log message across multiple lines or into multiple log calls:
> ```c
> PMD_DRV_LOG(WARNING, "Double VLAN insertion may not work correctly "
>             "without RTE_ETH_RX_OFFLOAD_VLAN_EXTEND in Rx config.");
> ```
> 

This is one where the AI guidelines need an update. We need to teach it
that it's better to have a long line than to split error messages.

/Bruce

^ permalink raw reply	[flat|nested] 274+ messages in thread

* [PATCH v3 00/36] combine multiple Intel scalar Tx paths
  2025-12-19 17:25 [RFC PATCH 00/27] combine multiple Intel scalar Tx paths Bruce Richardson
                   ` (27 preceding siblings ...)
  2026-01-13 15:14 ` [PATCH v2 00/36] combine multiple Intel scalar Tx paths Bruce Richardson
@ 2026-01-30 11:41 ` Bruce Richardson
  2026-01-30 11:41   ` [PATCH v3 01/36] net/intel: create common Tx descriptor structure Bruce Richardson
                     ` (35 more replies)
  2026-02-09 16:44 ` [PATCH v4 00/35] combine multiple Intel scalar Tx paths Bruce Richardson
  2026-02-11 18:12 ` [PATCH v5 00/35] combine multiple Intel scalar Tx paths Bruce Richardson
  30 siblings, 36 replies; 274+ messages in thread
From: Bruce Richardson @ 2026-01-30 11:41 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson

The scalar Tx paths, with support for offloads and multiple mbufs
per packet, are almost identical across drivers ice, i40e, iavf and
the single-queue mode of idpf. Therefore, we can do some rework to
combine these code paths into a single function which is parameterized
by compile-time constants, allowing code saving to give us a single
path to optimize and maintain - apart from edge cases like IPSec
support in iavf.

The ixgbe driver has a number of similarities too, which we take
advantage of where we can, but the overall descriptor format is
sufficiently different that its main scalar code path is kept
separate.

Once merged, we can then optimize the drivers a bit to improve
performance, and also easily extend some drivers to use additional
paths for better performance, e.g. add the "simple scalar" path
to IDPF driver for better performance on platforms without AVX.

V3:
- rebase on top of latest next-net-intel tree
- fix issues with iavf and cpfl drivers seen in some testing

V2:
 - reworked the simple-scalar path as well as full scalar one
 - added simple scalar path support to idpf driver
 - small cleanups, e.g. issues flagged by checkpatch


Bruce Richardson (36):
  net/intel: create common Tx descriptor structure
  net/intel: use common Tx ring structure
  net/intel: create common post-Tx cleanup function
  net/intel: consolidate definitions for Tx desc fields
  net/intel: create separate header for Tx scalar fns
  net/intel: add common fn to calculate needed descriptors
  net/ice: refactor context descriptor handling
  net/i40e: refactor context descriptor handling
  net/idpf: refactor context descriptor handling
  net/intel: consolidate checksum mask definition
  net/intel: create common checksum Tx offload function
  net/intel: create a common scalar Tx function
  net/i40e: use common scalar Tx function
  net/intel: add IPsec hooks to common Tx function
  net/intel: support configurable VLAN tag insertion on Tx
  net/iavf: use common scalar Tx function
  net/i40e: document requirement for QinQ support
  net/idpf: use common scalar Tx function
  net/intel: avoid writing the final pkt descriptor twice
  eal: add macro for marking assumed alignment
  net/intel: write descriptors using non-volatile pointers
  net/intel: remove unnecessary flag clearing
  net/intel: mark mid-burst ring cleanup as unlikely
  net/intel: add special handling for single desc packets
  net/intel: use separate array for desc status tracking
  net/ixgbe: use separate array for desc status tracking
  net/intel: drop unused Tx queue used count
  net/intel: remove index for tracking end of packet
  net/intel: merge ring writes in simple Tx for ice and i40e
  net/intel: consolidate ice and i40e buffer free function
  net/intel: complete merging simple Tx paths
  net/intel: use non-volatile stores in simple Tx function
  net/intel: align scalar simple Tx path with vector logic
  net/intel: use vector SW ring entry for simple path
  net/intel: use vector mbuf cleanup from simple scalar path
  net/idpf: enable simple Tx function

 doc/guides/nics/i40e.rst                      |  18 +
 drivers/net/intel/common/tx.h                 | 117 ++-
 drivers/net/intel/common/tx_scalar_fns.h      | 594 ++++++++++++++
 drivers/net/intel/cpfl/cpfl_rxtx.c            |  25 +-
 drivers/net/intel/i40e/i40e_fdir.c            |  34 +-
 drivers/net/intel/i40e/i40e_rxtx.c            | 673 +++-------------
 drivers/net/intel/i40e/i40e_rxtx.h            |  16 -
 .../net/intel/i40e/i40e_rxtx_vec_altivec.c    |  25 +-
 drivers/net/intel/i40e/i40e_rxtx_vec_avx2.c   |  36 +-
 drivers/net/intel/i40e/i40e_rxtx_vec_avx512.c |  52 +-
 drivers/net/intel/i40e/i40e_rxtx_vec_common.h |   6 +-
 drivers/net/intel/i40e/i40e_rxtx_vec_neon.c   |  25 +-
 drivers/net/intel/iavf/iavf_rxtx.c            | 642 ++++-----------
 drivers/net/intel/iavf/iavf_rxtx.h            |  30 +-
 drivers/net/intel/iavf/iavf_rxtx_vec_avx2.c   |  55 +-
 drivers/net/intel/iavf/iavf_rxtx_vec_avx512.c | 104 +--
 drivers/net/intel/iavf/iavf_rxtx_vec_common.h |  36 +-
 drivers/net/intel/ice/ice_dcf_ethdev.c        |  10 +-
 drivers/net/intel/ice/ice_rxtx.c              | 740 ++++--------------
 drivers/net/intel/ice/ice_rxtx.h              |  15 -
 drivers/net/intel/ice/ice_rxtx_vec_avx2.c     |  55 +-
 drivers/net/intel/ice/ice_rxtx_vec_avx512.c   |  53 +-
 drivers/net/intel/ice/ice_rxtx_vec_common.h   |  43 +-
 drivers/net/intel/idpf/idpf_common_device.h   |   2 +
 drivers/net/intel/idpf/idpf_common_rxtx.c     | 314 ++------
 drivers/net/intel/idpf/idpf_common_rxtx.h     |  24 +-
 .../net/intel/idpf/idpf_common_rxtx_avx2.c    |  53 +-
 .../net/intel/idpf/idpf_common_rxtx_avx512.c  |  55 +-
 drivers/net/intel/idpf/idpf_rxtx.c            |  43 +-
 drivers/net/intel/idpf/idpf_rxtx_vec_common.h |   6 +-
 drivers/net/intel/ixgbe/ixgbe_rxtx.c          | 103 ++-
 .../net/intel/ixgbe/ixgbe_rxtx_vec_common.c   |   3 +-
 lib/eal/include/rte_common.h                  |   6 +
 33 files changed, 1577 insertions(+), 2436 deletions(-)
 create mode 100644 drivers/net/intel/common/tx_scalar_fns.h

--
2.51.0


^ permalink raw reply	[flat|nested] 274+ messages in thread

* [PATCH v3 01/36] net/intel: create common Tx descriptor structure
  2026-01-30 11:41 ` [PATCH v3 " Bruce Richardson
@ 2026-01-30 11:41   ` Bruce Richardson
  2026-02-06  9:56     ` Loftus, Ciara
  2026-01-30 11:41   ` [PATCH v3 02/36] net/intel: use common Tx ring structure Bruce Richardson
                     ` (34 subsequent siblings)
  35 siblings, 1 reply; 274+ messages in thread
From: Bruce Richardson @ 2026-01-30 11:41 UTC (permalink / raw)
  To: dev
  Cc: Bruce Richardson, Praveen Shetty, Vladimir Medvedkin,
	Anatoly Burakov, Jingjing Wu

The Tx descriptors used by the i40e, iavf, ice and idpf drivers are all
identical 16-byte descriptors, so define a common struct for them. Since
original struct definitions are in base code, leave them in place, but
only use the new structs in DPDK code.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/net/intel/common/tx.h                 | 16 ++++++---
 drivers/net/intel/cpfl/cpfl_rxtx.c            |  2 +-
 drivers/net/intel/i40e/i40e_fdir.c            |  4 +--
 drivers/net/intel/i40e/i40e_rxtx.c            | 26 +++++++-------
 .../net/intel/i40e/i40e_rxtx_vec_altivec.c    |  6 ++--
 drivers/net/intel/i40e/i40e_rxtx_vec_avx2.c   |  6 ++--
 drivers/net/intel/i40e/i40e_rxtx_vec_avx512.c |  6 ++--
 drivers/net/intel/i40e/i40e_rxtx_vec_neon.c   |  6 ++--
 drivers/net/intel/iavf/iavf_rxtx.c            | 16 ++++-----
 drivers/net/intel/iavf/iavf_rxtx.h            |  2 +-
 drivers/net/intel/iavf/iavf_rxtx_vec_avx2.c   |  6 ++--
 drivers/net/intel/iavf/iavf_rxtx_vec_avx512.c | 12 +++----
 drivers/net/intel/ice/ice_dcf_ethdev.c        |  2 +-
 drivers/net/intel/ice/ice_rxtx.c              | 36 +++++++++----------
 drivers/net/intel/ice/ice_rxtx_vec_avx2.c     |  6 ++--
 drivers/net/intel/ice/ice_rxtx_vec_avx512.c   |  6 ++--
 drivers/net/intel/idpf/idpf_common_rxtx.c     | 20 +++++------
 drivers/net/intel/idpf/idpf_common_rxtx.h     |  2 +-
 .../net/intel/idpf/idpf_common_rxtx_avx2.c    |  8 ++---
 .../net/intel/idpf/idpf_common_rxtx_avx512.c  |  8 ++---
 drivers/net/intel/idpf/idpf_rxtx.c            |  2 +-
 drivers/net/intel/idpf/idpf_rxtx_vec_common.h |  2 +-
 22 files changed, 104 insertions(+), 96 deletions(-)

diff --git a/drivers/net/intel/common/tx.h b/drivers/net/intel/common/tx.h
index e295d83e3a..d7561a2bbb 100644
--- a/drivers/net/intel/common/tx.h
+++ b/drivers/net/intel/common/tx.h
@@ -10,6 +10,14 @@
 #include <rte_ethdev.h>
 #include <rte_vect.h>
 
+/*
+ * Structure of a 16-byte Tx descriptor common across i40e, ice, iavf and idpf drivers
+ */
+struct ci_tx_desc {
+	uint64_t buffer_addr; /* Address of descriptor's data buf */
+	uint64_t cmd_type_offset_bsz;
+};
+
 /* forward declaration of the common intel (ci) queue structure */
 struct ci_tx_queue;
 
@@ -33,10 +41,10 @@ typedef void (*ice_tx_release_mbufs_t)(struct ci_tx_queue *txq);
 
 struct ci_tx_queue {
 	union { /* TX ring virtual address */
-		volatile struct i40e_tx_desc *i40e_tx_ring;
-		volatile struct iavf_tx_desc *iavf_tx_ring;
-		volatile struct ice_tx_desc *ice_tx_ring;
-		volatile struct idpf_base_tx_desc *idpf_tx_ring;
+		volatile struct ci_tx_desc *i40e_tx_ring;
+		volatile struct ci_tx_desc *iavf_tx_ring;
+		volatile struct ci_tx_desc *ice_tx_ring;
+		volatile struct ci_tx_desc *idpf_tx_ring;
 		volatile union ixgbe_adv_tx_desc *ixgbe_tx_ring;
 	};
 	volatile uint8_t *qtx_tail;               /* register address of tail */
diff --git a/drivers/net/intel/cpfl/cpfl_rxtx.c b/drivers/net/intel/cpfl/cpfl_rxtx.c
index d0438b5da0..78bc3e9b49 100644
--- a/drivers/net/intel/cpfl/cpfl_rxtx.c
+++ b/drivers/net/intel/cpfl/cpfl_rxtx.c
@@ -131,7 +131,7 @@ cpfl_dma_zone_reserve(struct rte_eth_dev *dev, uint16_t queue_idx,
 			ring_size = RTE_ALIGN(len * sizeof(struct idpf_flex_tx_sched_desc),
 					      CPFL_DMA_MEM_ALIGN);
 		else
-			ring_size = RTE_ALIGN(len * sizeof(struct idpf_base_tx_desc),
+			ring_size = RTE_ALIGN(len * sizeof(struct ci_tx_desc),
 					      CPFL_DMA_MEM_ALIGN);
 		memcpy(ring_name, "cpfl Tx ring", sizeof("cpfl Tx ring"));
 		break;
diff --git a/drivers/net/intel/i40e/i40e_fdir.c b/drivers/net/intel/i40e/i40e_fdir.c
index 55d18c5d4a..605df73c9e 100644
--- a/drivers/net/intel/i40e/i40e_fdir.c
+++ b/drivers/net/intel/i40e/i40e_fdir.c
@@ -1377,7 +1377,7 @@ i40e_find_available_buffer(struct rte_eth_dev *dev)
 	 */
 	if (fdir_info->txq_available_buf_count <= 0) {
 		uint16_t tmp_tail;
-		volatile struct i40e_tx_desc *tmp_txdp;
+		volatile struct ci_tx_desc *tmp_txdp;
 
 		tmp_tail = txq->tx_tail;
 		tmp_txdp = &txq->i40e_tx_ring[tmp_tail + 1];
@@ -1628,7 +1628,7 @@ i40e_flow_fdir_filter_programming(struct i40e_pf *pf,
 	struct ci_tx_queue *txq = pf->fdir.txq;
 	struct ci_rx_queue *rxq = pf->fdir.rxq;
 	const struct i40e_fdir_action *fdir_action = &filter->action;
-	volatile struct i40e_tx_desc *txdp;
+	volatile struct ci_tx_desc *txdp;
 	volatile struct i40e_filter_program_desc *fdirdp;
 	uint32_t td_cmd;
 	uint16_t vsi_id;
diff --git a/drivers/net/intel/i40e/i40e_rxtx.c b/drivers/net/intel/i40e/i40e_rxtx.c
index 1c3586778c..92d49ccb79 100644
--- a/drivers/net/intel/i40e/i40e_rxtx.c
+++ b/drivers/net/intel/i40e/i40e_rxtx.c
@@ -388,7 +388,7 @@ static inline int
 i40e_xmit_cleanup(struct ci_tx_queue *txq)
 {
 	struct ci_tx_entry *sw_ring = txq->sw_ring;
-	volatile struct i40e_tx_desc *txd = txq->i40e_tx_ring;
+	volatile struct ci_tx_desc *txd = txq->i40e_tx_ring;
 	uint16_t last_desc_cleaned = txq->last_desc_cleaned;
 	uint16_t nb_tx_desc = txq->nb_tx_desc;
 	uint16_t desc_to_clean_to;
@@ -1092,8 +1092,8 @@ i40e_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 	struct ci_tx_queue *txq;
 	struct ci_tx_entry *sw_ring;
 	struct ci_tx_entry *txe, *txn;
-	volatile struct i40e_tx_desc *txd;
-	volatile struct i40e_tx_desc *txr;
+	volatile struct ci_tx_desc *txd;
+	volatile struct ci_tx_desc *txr;
 	struct rte_mbuf *tx_pkt;
 	struct rte_mbuf *m_seg;
 	uint32_t cd_tunneling_params;
@@ -1398,7 +1398,7 @@ i40e_tx_free_bufs(struct ci_tx_queue *txq)
 
 /* Populate 4 descriptors with data from 4 mbufs */
 static inline void
-tx4(volatile struct i40e_tx_desc *txdp, struct rte_mbuf **pkts)
+tx4(volatile struct ci_tx_desc *txdp, struct rte_mbuf **pkts)
 {
 	uint64_t dma_addr;
 	uint32_t i;
@@ -1414,7 +1414,7 @@ tx4(volatile struct i40e_tx_desc *txdp, struct rte_mbuf **pkts)
 
 /* Populate 1 descriptor with data from 1 mbuf */
 static inline void
-tx1(volatile struct i40e_tx_desc *txdp, struct rte_mbuf **pkts)
+tx1(volatile struct ci_tx_desc *txdp, struct rte_mbuf **pkts)
 {
 	uint64_t dma_addr;
 
@@ -1431,7 +1431,7 @@ i40e_tx_fill_hw_ring(struct ci_tx_queue *txq,
 		     struct rte_mbuf **pkts,
 		     uint16_t nb_pkts)
 {
-	volatile struct i40e_tx_desc *txdp = &txq->i40e_tx_ring[txq->tx_tail];
+	volatile struct ci_tx_desc *txdp = &txq->i40e_tx_ring[txq->tx_tail];
 	struct ci_tx_entry *txep = &txq->sw_ring[txq->tx_tail];
 	const int N_PER_LOOP = 4;
 	const int N_PER_LOOP_MASK = N_PER_LOOP - 1;
@@ -1459,7 +1459,7 @@ tx_xmit_pkts(struct ci_tx_queue *txq,
 	     struct rte_mbuf **tx_pkts,
 	     uint16_t nb_pkts)
 {
-	volatile struct i40e_tx_desc *txr = txq->i40e_tx_ring;
+	volatile struct ci_tx_desc *txr = txq->i40e_tx_ring;
 	uint16_t n = 0;
 
 	/**
@@ -2616,7 +2616,7 @@ i40e_dev_tx_queue_setup(struct rte_eth_dev *dev,
 	}
 
 	/* Allocate TX hardware ring descriptors. */
-	ring_size = sizeof(struct i40e_tx_desc) * I40E_MAX_RING_DESC;
+	ring_size = sizeof(struct ci_tx_desc) * I40E_MAX_RING_DESC;
 	ring_size = RTE_ALIGN(ring_size, I40E_DMA_MEM_ALIGN);
 	tz = rte_eth_dma_zone_reserve(dev, "i40e_tx_ring", queue_idx,
 			      ring_size, I40E_RING_BASE_ALIGN, socket_id);
@@ -2640,7 +2640,7 @@ i40e_dev_tx_queue_setup(struct rte_eth_dev *dev,
 	txq->tx_deferred_start = tx_conf->tx_deferred_start;
 
 	txq->tx_ring_dma = tz->iova;
-	txq->i40e_tx_ring = (struct i40e_tx_desc *)tz->addr;
+	txq->i40e_tx_ring = (struct ci_tx_desc *)tz->addr;
 
 	/* Allocate software ring */
 	txq->sw_ring =
@@ -2913,13 +2913,13 @@ i40e_reset_tx_queue(struct ci_tx_queue *txq)
 	}
 
 	txe = txq->sw_ring;
-	size = sizeof(struct i40e_tx_desc) * txq->nb_tx_desc;
+	size = sizeof(struct ci_tx_desc) * txq->nb_tx_desc;
 	for (i = 0; i < size; i++)
 		((volatile char *)txq->i40e_tx_ring)[i] = 0;
 
 	prev = (uint16_t)(txq->nb_tx_desc - 1);
 	for (i = 0; i < txq->nb_tx_desc; i++) {
-		volatile struct i40e_tx_desc *txd = &txq->i40e_tx_ring[i];
+		volatile struct ci_tx_desc *txd = &txq->i40e_tx_ring[i];
 
 		txd->cmd_type_offset_bsz =
 			rte_cpu_to_le_64(I40E_TX_DESC_DTYPE_DESC_DONE);
@@ -3221,7 +3221,7 @@ i40e_fdir_setup_tx_resources(struct i40e_pf *pf)
 	}
 
 	/* Allocate TX hardware ring descriptors. */
-	ring_size = sizeof(struct i40e_tx_desc) * I40E_FDIR_NUM_TX_DESC;
+	ring_size = sizeof(struct ci_tx_desc) * I40E_FDIR_NUM_TX_DESC;
 	ring_size = RTE_ALIGN(ring_size, I40E_DMA_MEM_ALIGN);
 
 	tz = rte_eth_dma_zone_reserve(dev, "fdir_tx_ring",
@@ -3240,7 +3240,7 @@ i40e_fdir_setup_tx_resources(struct i40e_pf *pf)
 	txq->i40e_vsi = pf->fdir.fdir_vsi;
 
 	txq->tx_ring_dma = tz->iova;
-	txq->i40e_tx_ring = (struct i40e_tx_desc *)tz->addr;
+	txq->i40e_tx_ring = (struct ci_tx_desc *)tz->addr;
 
 	/*
 	 * don't need to allocate software ring and reset for the fdir
diff --git a/drivers/net/intel/i40e/i40e_rxtx_vec_altivec.c b/drivers/net/intel/i40e/i40e_rxtx_vec_altivec.c
index bbb6d907cf..ef5b252898 100644
--- a/drivers/net/intel/i40e/i40e_rxtx_vec_altivec.c
+++ b/drivers/net/intel/i40e/i40e_rxtx_vec_altivec.c
@@ -446,7 +446,7 @@ i40e_recv_scattered_pkts_vec(void *rx_queue, struct rte_mbuf **rx_pkts,
 }
 
 static inline void
-vtx1(volatile struct i40e_tx_desc *txdp,
+vtx1(volatile struct ci_tx_desc *txdp,
 	struct rte_mbuf *pkt, uint64_t flags)
 {
 	uint64_t high_qw = (I40E_TX_DESC_DTYPE_DATA |
@@ -459,7 +459,7 @@ vtx1(volatile struct i40e_tx_desc *txdp,
 }
 
 static inline void
-vtx(volatile struct i40e_tx_desc *txdp,
+vtx(volatile struct ci_tx_desc *txdp,
 	struct rte_mbuf **pkt, uint16_t nb_pkts,  uint64_t flags)
 {
 	int i;
@@ -473,7 +473,7 @@ i40e_xmit_fixed_burst_vec(void *tx_queue, struct rte_mbuf **tx_pkts,
 			  uint16_t nb_pkts)
 {
 	struct ci_tx_queue *txq = (struct ci_tx_queue *)tx_queue;
-	volatile struct i40e_tx_desc *txdp;
+	volatile struct ci_tx_desc *txdp;
 	struct ci_tx_entry_vec *txep;
 	uint16_t n, nb_commit, tx_id;
 	uint64_t flags = I40E_TD_CMD;
diff --git a/drivers/net/intel/i40e/i40e_rxtx_vec_avx2.c b/drivers/net/intel/i40e/i40e_rxtx_vec_avx2.c
index 4e398b3140..137c1f9765 100644
--- a/drivers/net/intel/i40e/i40e_rxtx_vec_avx2.c
+++ b/drivers/net/intel/i40e/i40e_rxtx_vec_avx2.c
@@ -681,7 +681,7 @@ i40e_recv_scattered_pkts_vec_avx2(void *rx_queue, struct rte_mbuf **rx_pkts,
 
 
 static inline void
-vtx1(volatile struct i40e_tx_desc *txdp,
+vtx1(volatile struct ci_tx_desc *txdp,
 		struct rte_mbuf *pkt, uint64_t flags)
 {
 	uint64_t high_qw = (I40E_TX_DESC_DTYPE_DATA |
@@ -694,7 +694,7 @@ vtx1(volatile struct i40e_tx_desc *txdp,
 }
 
 static inline void
-vtx(volatile struct i40e_tx_desc *txdp,
+vtx(volatile struct ci_tx_desc *txdp,
 		struct rte_mbuf **pkt, uint16_t nb_pkts,  uint64_t flags)
 {
 	const uint64_t hi_qw_tmpl = (I40E_TX_DESC_DTYPE_DATA |
@@ -739,7 +739,7 @@ i40e_xmit_fixed_burst_vec_avx2(void *tx_queue, struct rte_mbuf **tx_pkts,
 			  uint16_t nb_pkts)
 {
 	struct ci_tx_queue *txq = (struct ci_tx_queue *)tx_queue;
-	volatile struct i40e_tx_desc *txdp;
+	volatile struct ci_tx_desc *txdp;
 	struct ci_tx_entry_vec *txep;
 	uint16_t n, nb_commit, tx_id;
 	uint64_t flags = I40E_TD_CMD;
diff --git a/drivers/net/intel/i40e/i40e_rxtx_vec_avx512.c b/drivers/net/intel/i40e/i40e_rxtx_vec_avx512.c
index 571987d27a..6971488750 100644
--- a/drivers/net/intel/i40e/i40e_rxtx_vec_avx512.c
+++ b/drivers/net/intel/i40e/i40e_rxtx_vec_avx512.c
@@ -750,7 +750,7 @@ i40e_recv_scattered_pkts_vec_avx512(void *rx_queue,
 }
 
 static inline void
-vtx1(volatile struct i40e_tx_desc *txdp, struct rte_mbuf *pkt, uint64_t flags)
+vtx1(volatile struct ci_tx_desc *txdp, struct rte_mbuf *pkt, uint64_t flags)
 {
 	uint64_t high_qw = (I40E_TX_DESC_DTYPE_DATA |
 		((uint64_t)flags  << I40E_TXD_QW1_CMD_SHIFT) |
@@ -762,7 +762,7 @@ vtx1(volatile struct i40e_tx_desc *txdp, struct rte_mbuf *pkt, uint64_t flags)
 }
 
 static inline void
-vtx(volatile struct i40e_tx_desc *txdp,
+vtx(volatile struct ci_tx_desc *txdp,
 	struct rte_mbuf **pkt, uint16_t nb_pkts,  uint64_t flags)
 {
 	const uint64_t hi_qw_tmpl = (I40E_TX_DESC_DTYPE_DATA |
@@ -807,7 +807,7 @@ i40e_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pkts,
 				 uint16_t nb_pkts)
 {
 	struct ci_tx_queue *txq = (struct ci_tx_queue *)tx_queue;
-	volatile struct i40e_tx_desc *txdp;
+	volatile struct ci_tx_desc *txdp;
 	struct ci_tx_entry_vec *txep;
 	uint16_t n, nb_commit, tx_id;
 	uint64_t flags = I40E_TD_CMD;
diff --git a/drivers/net/intel/i40e/i40e_rxtx_vec_neon.c b/drivers/net/intel/i40e/i40e_rxtx_vec_neon.c
index b5be0c1b59..6404b70c56 100644
--- a/drivers/net/intel/i40e/i40e_rxtx_vec_neon.c
+++ b/drivers/net/intel/i40e/i40e_rxtx_vec_neon.c
@@ -597,7 +597,7 @@ i40e_recv_scattered_pkts_vec(void *rx_queue, struct rte_mbuf **rx_pkts,
 }
 
 static inline void
-vtx1(volatile struct i40e_tx_desc *txdp,
+vtx1(volatile struct ci_tx_desc *txdp,
 		struct rte_mbuf *pkt, uint64_t flags)
 {
 	uint64_t high_qw = (I40E_TX_DESC_DTYPE_DATA |
@@ -609,7 +609,7 @@ vtx1(volatile struct i40e_tx_desc *txdp,
 }
 
 static inline void
-vtx(volatile struct i40e_tx_desc *txdp, struct rte_mbuf **pkt,
+vtx(volatile struct ci_tx_desc *txdp, struct rte_mbuf **pkt,
 		uint16_t nb_pkts,  uint64_t flags)
 {
 	int i;
@@ -623,7 +623,7 @@ i40e_xmit_fixed_burst_vec(void *__rte_restrict tx_queue,
 	struct rte_mbuf **__rte_restrict tx_pkts, uint16_t nb_pkts)
 {
 	struct ci_tx_queue *txq = (struct ci_tx_queue *)tx_queue;
-	volatile struct i40e_tx_desc *txdp;
+	volatile struct ci_tx_desc *txdp;
 	struct ci_tx_entry_vec *txep;
 	uint16_t n, nb_commit, tx_id;
 	uint64_t flags = I40E_TD_CMD;
diff --git a/drivers/net/intel/iavf/iavf_rxtx.c b/drivers/net/intel/iavf/iavf_rxtx.c
index 4b763627bc..e4421a9932 100644
--- a/drivers/net/intel/iavf/iavf_rxtx.c
+++ b/drivers/net/intel/iavf/iavf_rxtx.c
@@ -267,7 +267,7 @@ reset_tx_queue(struct ci_tx_queue *txq)
 	}
 
 	txe = txq->sw_ring;
-	size = sizeof(struct iavf_tx_desc) * txq->nb_tx_desc;
+	size = sizeof(struct ci_tx_desc) * txq->nb_tx_desc;
 	for (i = 0; i < size; i++)
 		((volatile char *)txq->iavf_tx_ring)[i] = 0;
 
@@ -827,7 +827,7 @@ iavf_dev_tx_queue_setup(struct rte_eth_dev *dev,
 	}
 
 	/* Allocate TX hardware ring descriptors. */
-	ring_size = sizeof(struct iavf_tx_desc) * IAVF_MAX_RING_DESC;
+	ring_size = sizeof(struct ci_tx_desc) * IAVF_MAX_RING_DESC;
 	ring_size = RTE_ALIGN(ring_size, IAVF_DMA_MEM_ALIGN);
 	mz = rte_eth_dma_zone_reserve(dev, "iavf_tx_ring", queue_idx,
 				      ring_size, IAVF_RING_BASE_ALIGN,
@@ -839,7 +839,7 @@ iavf_dev_tx_queue_setup(struct rte_eth_dev *dev,
 		return -ENOMEM;
 	}
 	txq->tx_ring_dma = mz->iova;
-	txq->iavf_tx_ring = (struct iavf_tx_desc *)mz->addr;
+	txq->iavf_tx_ring = (struct ci_tx_desc *)mz->addr;
 
 	txq->mz = mz;
 	reset_tx_queue(txq);
@@ -2333,7 +2333,7 @@ iavf_xmit_cleanup(struct ci_tx_queue *txq)
 	uint16_t desc_to_clean_to;
 	uint16_t nb_tx_to_clean;
 
-	volatile struct iavf_tx_desc *txd = txq->iavf_tx_ring;
+	volatile struct ci_tx_desc *txd = txq->iavf_tx_ring;
 
 	desc_to_clean_to = (uint16_t)(last_desc_cleaned + txq->tx_rs_thresh);
 	if (desc_to_clean_to >= nb_tx_desc)
@@ -2723,7 +2723,7 @@ iavf_calc_pkt_desc(struct rte_mbuf *tx_pkt)
 }
 
 static inline void
-iavf_fill_data_desc(volatile struct iavf_tx_desc *desc,
+iavf_fill_data_desc(volatile struct ci_tx_desc *desc,
 	uint64_t desc_template,	uint16_t buffsz,
 	uint64_t buffer_addr)
 {
@@ -2756,7 +2756,7 @@ uint16_t
 iavf_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 {
 	struct ci_tx_queue *txq = tx_queue;
-	volatile struct iavf_tx_desc *txr = txq->iavf_tx_ring;
+	volatile struct ci_tx_desc *txr = txq->iavf_tx_ring;
 	struct ci_tx_entry *txe_ring = txq->sw_ring;
 	struct ci_tx_entry *txe, *txn;
 	struct rte_mbuf *mb, *mb_seg;
@@ -2774,7 +2774,7 @@ iavf_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 	txe = &txe_ring[desc_idx];
 
 	for (idx = 0; idx < nb_pkts; idx++) {
-		volatile struct iavf_tx_desc *ddesc;
+		volatile struct ci_tx_desc *ddesc;
 		struct iavf_ipsec_crypto_pkt_metadata *ipsec_md;
 
 		uint16_t nb_desc_ctx, nb_desc_ipsec;
@@ -2895,7 +2895,7 @@ iavf_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 		mb_seg = mb;
 
 		do {
-			ddesc = (volatile struct iavf_tx_desc *)
+			ddesc = (volatile struct ci_tx_desc *)
 					&txr[desc_idx];
 
 			txn = &txe_ring[txe->next_id];
diff --git a/drivers/net/intel/iavf/iavf_rxtx.h b/drivers/net/intel/iavf/iavf_rxtx.h
index e1f78dcde0..dd6d884fc1 100644
--- a/drivers/net/intel/iavf/iavf_rxtx.h
+++ b/drivers/net/intel/iavf/iavf_rxtx.h
@@ -678,7 +678,7 @@ void iavf_dump_tx_descriptor(const struct ci_tx_queue *txq,
 			    const volatile void *desc, uint16_t tx_id)
 {
 	const char *name;
-	const volatile struct iavf_tx_desc *tx_desc = desc;
+	const volatile struct ci_tx_desc *tx_desc = desc;
 	enum iavf_tx_desc_dtype_value type;
 
 
diff --git a/drivers/net/intel/iavf/iavf_rxtx_vec_avx2.c b/drivers/net/intel/iavf/iavf_rxtx_vec_avx2.c
index e29958e0bc..5b62d51cf7 100644
--- a/drivers/net/intel/iavf/iavf_rxtx_vec_avx2.c
+++ b/drivers/net/intel/iavf/iavf_rxtx_vec_avx2.c
@@ -1630,7 +1630,7 @@ iavf_recv_scattered_pkts_vec_avx2_flex_rxd_offload(void *rx_queue,
 
 
 static __rte_always_inline void
-iavf_vtx1(volatile struct iavf_tx_desc *txdp,
+iavf_vtx1(volatile struct ci_tx_desc *txdp,
 	  struct rte_mbuf *pkt, uint64_t flags, bool offload, uint8_t vlan_flag)
 {
 	uint64_t high_qw =
@@ -1646,7 +1646,7 @@ iavf_vtx1(volatile struct iavf_tx_desc *txdp,
 }
 
 static __rte_always_inline void
-iavf_vtx(volatile struct iavf_tx_desc *txdp,
+iavf_vtx(volatile struct ci_tx_desc *txdp,
 	 struct rte_mbuf **pkt, uint16_t nb_pkts,  uint64_t flags, bool offload, uint8_t vlan_flag)
 {
 	const uint64_t hi_qw_tmpl = (IAVF_TX_DESC_DTYPE_DATA |
@@ -1713,7 +1713,7 @@ iavf_xmit_fixed_burst_vec_avx2(void *tx_queue, struct rte_mbuf **tx_pkts,
 			       uint16_t nb_pkts, bool offload)
 {
 	struct ci_tx_queue *txq = (struct ci_tx_queue *)tx_queue;
-	volatile struct iavf_tx_desc *txdp;
+	volatile struct ci_tx_desc *txdp;
 	struct ci_tx_entry_vec *txep;
 	uint16_t n, nb_commit, tx_id;
 	/* bit2 is reserved and must be set to 1 according to Spec */
diff --git a/drivers/net/intel/iavf/iavf_rxtx_vec_avx512.c b/drivers/net/intel/iavf/iavf_rxtx_vec_avx512.c
index 7c0907b7cf..d79d96c7b7 100644
--- a/drivers/net/intel/iavf/iavf_rxtx_vec_avx512.c
+++ b/drivers/net/intel/iavf/iavf_rxtx_vec_avx512.c
@@ -1840,7 +1840,7 @@ tx_backlog_entry_avx512(struct ci_tx_entry_vec *txep,
 }
 
 static __rte_always_inline void
-iavf_vtx1(volatile struct iavf_tx_desc *txdp,
+iavf_vtx1(volatile struct ci_tx_desc *txdp,
 	  struct rte_mbuf *pkt, uint64_t flags,
 	  bool offload, uint8_t vlan_flag)
 {
@@ -1859,7 +1859,7 @@ iavf_vtx1(volatile struct iavf_tx_desc *txdp,
 #define IAVF_TX_LEN_MASK 0xAA
 #define IAVF_TX_OFF_MASK 0x55
 static __rte_always_inline void
-iavf_vtx(volatile struct iavf_tx_desc *txdp,
+iavf_vtx(volatile struct ci_tx_desc *txdp,
 		struct rte_mbuf **pkt, uint16_t nb_pkts,  uint64_t flags,
 		bool offload, uint8_t vlan_flag)
 {
@@ -2068,7 +2068,7 @@ iavf_fill_ctx_desc_tunnelling_field(volatile uint64_t *qw0,
 }
 
 static __rte_always_inline void
-ctx_vtx1(volatile struct iavf_tx_desc *txdp, struct rte_mbuf *pkt,
+ctx_vtx1(volatile struct ci_tx_desc *txdp, struct rte_mbuf *pkt,
 		uint64_t flags, bool offload, uint8_t vlan_flag)
 {
 	uint64_t high_ctx_qw = IAVF_TX_DESC_DTYPE_CONTEXT;
@@ -2106,7 +2106,7 @@ ctx_vtx1(volatile struct iavf_tx_desc *txdp, struct rte_mbuf *pkt,
 }
 
 static __rte_always_inline void
-ctx_vtx(volatile struct iavf_tx_desc *txdp,
+ctx_vtx(volatile struct ci_tx_desc *txdp,
 		struct rte_mbuf **pkt, uint16_t nb_pkts,  uint64_t flags,
 		bool offload, uint8_t vlan_flag)
 {
@@ -2203,7 +2203,7 @@ iavf_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pkts,
 				 uint16_t nb_pkts, bool offload)
 {
 	struct ci_tx_queue *txq = (struct ci_tx_queue *)tx_queue;
-	volatile struct iavf_tx_desc *txdp;
+	volatile struct ci_tx_desc *txdp;
 	struct ci_tx_entry_vec *txep;
 	uint16_t n, nb_commit, tx_id;
 	/* bit2 is reserved and must be set to 1 according to Spec */
@@ -2271,7 +2271,7 @@ iavf_xmit_fixed_burst_vec_avx512_ctx(void *tx_queue, struct rte_mbuf **tx_pkts,
 				 uint16_t nb_pkts, bool offload)
 {
 	struct ci_tx_queue *txq = (struct ci_tx_queue *)tx_queue;
-	volatile struct iavf_tx_desc *txdp;
+	volatile struct ci_tx_desc *txdp;
 	struct ci_tx_entry_vec *txep;
 	uint16_t n, nb_commit, nb_mbuf, tx_id;
 	/* bit2 is reserved and must be set to 1 according to Spec */
diff --git a/drivers/net/intel/ice/ice_dcf_ethdev.c b/drivers/net/intel/ice/ice_dcf_ethdev.c
index 81da5a4656..ab1d499cef 100644
--- a/drivers/net/intel/ice/ice_dcf_ethdev.c
+++ b/drivers/net/intel/ice/ice_dcf_ethdev.c
@@ -399,7 +399,7 @@ reset_tx_queue(struct ci_tx_queue *txq)
 	}
 
 	txe = txq->sw_ring;
-	size = sizeof(struct ice_tx_desc) * txq->nb_tx_desc;
+	size = sizeof(struct ci_tx_desc) * txq->nb_tx_desc;
 	for (i = 0; i < size; i++)
 		((volatile char *)txq->ice_tx_ring)[i] = 0;
 
diff --git a/drivers/net/intel/ice/ice_rxtx.c b/drivers/net/intel/ice/ice_rxtx.c
index f3bc79423d..74b80e7df3 100644
--- a/drivers/net/intel/ice/ice_rxtx.c
+++ b/drivers/net/intel/ice/ice_rxtx.c
@@ -1115,13 +1115,13 @@ ice_reset_tx_queue(struct ci_tx_queue *txq)
 	}
 
 	txe = txq->sw_ring;
-	size = sizeof(struct ice_tx_desc) * txq->nb_tx_desc;
+	size = sizeof(struct ci_tx_desc) * txq->nb_tx_desc;
 	for (i = 0; i < size; i++)
 		((volatile char *)txq->ice_tx_ring)[i] = 0;
 
 	prev = (uint16_t)(txq->nb_tx_desc - 1);
 	for (i = 0; i < txq->nb_tx_desc; i++) {
-		volatile struct ice_tx_desc *txd = &txq->ice_tx_ring[i];
+		volatile struct ci_tx_desc *txd = &txq->ice_tx_ring[i];
 
 		txd->cmd_type_offset_bsz =
 			rte_cpu_to_le_64(ICE_TX_DESC_DTYPE_DESC_DONE);
@@ -1623,7 +1623,7 @@ ice_tx_queue_setup(struct rte_eth_dev *dev,
 	}
 
 	/* Allocate TX hardware ring descriptors. */
-	ring_size = sizeof(struct ice_tx_desc) * ICE_MAX_NUM_DESC_BY_MAC(hw);
+	ring_size = sizeof(struct ci_tx_desc) * ICE_MAX_NUM_DESC_BY_MAC(hw);
 	ring_size = RTE_ALIGN(ring_size, ICE_DMA_MEM_ALIGN);
 	tz = rte_eth_dma_zone_reserve(dev, "ice_tx_ring", queue_idx,
 				      ring_size, ICE_RING_BASE_ALIGN,
@@ -2619,7 +2619,7 @@ ice_fdir_setup_tx_resources(struct ice_pf *pf)
 	}
 
 	/* Allocate TX hardware ring descriptors. */
-	ring_size = sizeof(struct ice_tx_desc) * ICE_FDIR_NUM_TX_DESC;
+	ring_size = sizeof(struct ci_tx_desc) * ICE_FDIR_NUM_TX_DESC;
 	ring_size = RTE_ALIGN(ring_size, ICE_DMA_MEM_ALIGN);
 
 	tz = rte_eth_dma_zone_reserve(dev, "fdir_tx_ring",
@@ -2638,7 +2638,7 @@ ice_fdir_setup_tx_resources(struct ice_pf *pf)
 	txq->ice_vsi = pf->fdir.fdir_vsi;
 
 	txq->tx_ring_dma = tz->iova;
-	txq->ice_tx_ring = (struct ice_tx_desc *)tz->addr;
+	txq->ice_tx_ring = (struct ci_tx_desc *)tz->addr;
 	/*
 	 * don't need to allocate software ring and reset for the fdir
 	 * program queue just set the queue has been configured.
@@ -3027,7 +3027,7 @@ static inline int
 ice_xmit_cleanup(struct ci_tx_queue *txq)
 {
 	struct ci_tx_entry *sw_ring = txq->sw_ring;
-	volatile struct ice_tx_desc *txd = txq->ice_tx_ring;
+	volatile struct ci_tx_desc *txd = txq->ice_tx_ring;
 	uint16_t last_desc_cleaned = txq->last_desc_cleaned;
 	uint16_t nb_tx_desc = txq->nb_tx_desc;
 	uint16_t desc_to_clean_to;
@@ -3148,8 +3148,8 @@ uint16_t
 ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 {
 	struct ci_tx_queue *txq;
-	volatile struct ice_tx_desc *ice_tx_ring;
-	volatile struct ice_tx_desc *txd;
+	volatile struct ci_tx_desc *ice_tx_ring;
+	volatile struct ci_tx_desc *txd;
 	struct ci_tx_entry *sw_ring;
 	struct ci_tx_entry *txe, *txn;
 	struct rte_mbuf *tx_pkt;
@@ -3312,7 +3312,7 @@ ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 
 			while ((ol_flags & (RTE_MBUF_F_TX_TCP_SEG | RTE_MBUF_F_TX_UDP_SEG)) &&
 				unlikely(slen > ICE_MAX_DATA_PER_TXD)) {
-				txd->buf_addr = rte_cpu_to_le_64(buf_dma_addr);
+				txd->buffer_addr = rte_cpu_to_le_64(buf_dma_addr);
 				txd->cmd_type_offset_bsz =
 				rte_cpu_to_le_64(ICE_TX_DESC_DTYPE_DATA |
 				((uint64_t)td_cmd << ICE_TXD_QW1_CMD_S) |
@@ -3331,7 +3331,7 @@ ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 				txn = &sw_ring[txe->next_id];
 			}
 
-			txd->buf_addr = rte_cpu_to_le_64(buf_dma_addr);
+			txd->buffer_addr = rte_cpu_to_le_64(buf_dma_addr);
 			txd->cmd_type_offset_bsz =
 				rte_cpu_to_le_64(ICE_TX_DESC_DTYPE_DATA |
 				((uint64_t)td_cmd << ICE_TXD_QW1_CMD_S) |
@@ -3563,14 +3563,14 @@ ice_tx_done_cleanup(void *txq, uint32_t free_cnt)
 
 /* Populate 4 descriptors with data from 4 mbufs */
 static inline void
-tx4(volatile struct ice_tx_desc *txdp, struct rte_mbuf **pkts)
+tx4(volatile struct ci_tx_desc *txdp, struct rte_mbuf **pkts)
 {
 	uint64_t dma_addr;
 	uint32_t i;
 
 	for (i = 0; i < 4; i++, txdp++, pkts++) {
 		dma_addr = rte_mbuf_data_iova(*pkts);
-		txdp->buf_addr = rte_cpu_to_le_64(dma_addr);
+		txdp->buffer_addr = rte_cpu_to_le_64(dma_addr);
 		txdp->cmd_type_offset_bsz =
 			ice_build_ctob((uint32_t)ICE_TD_CMD, 0,
 				       (*pkts)->data_len, 0);
@@ -3579,12 +3579,12 @@ tx4(volatile struct ice_tx_desc *txdp, struct rte_mbuf **pkts)
 
 /* Populate 1 descriptor with data from 1 mbuf */
 static inline void
-tx1(volatile struct ice_tx_desc *txdp, struct rte_mbuf **pkts)
+tx1(volatile struct ci_tx_desc *txdp, struct rte_mbuf **pkts)
 {
 	uint64_t dma_addr;
 
 	dma_addr = rte_mbuf_data_iova(*pkts);
-	txdp->buf_addr = rte_cpu_to_le_64(dma_addr);
+	txdp->buffer_addr = rte_cpu_to_le_64(dma_addr);
 	txdp->cmd_type_offset_bsz =
 		ice_build_ctob((uint32_t)ICE_TD_CMD, 0,
 			       (*pkts)->data_len, 0);
@@ -3594,7 +3594,7 @@ static inline void
 ice_tx_fill_hw_ring(struct ci_tx_queue *txq, struct rte_mbuf **pkts,
 		    uint16_t nb_pkts)
 {
-	volatile struct ice_tx_desc *txdp = &txq->ice_tx_ring[txq->tx_tail];
+	volatile struct ci_tx_desc *txdp = &txq->ice_tx_ring[txq->tx_tail];
 	struct ci_tx_entry *txep = &txq->sw_ring[txq->tx_tail];
 	const int N_PER_LOOP = 4;
 	const int N_PER_LOOP_MASK = N_PER_LOOP - 1;
@@ -3627,7 +3627,7 @@ tx_xmit_pkts(struct ci_tx_queue *txq,
 	     struct rte_mbuf **tx_pkts,
 	     uint16_t nb_pkts)
 {
-	volatile struct ice_tx_desc *txr = txq->ice_tx_ring;
+	volatile struct ci_tx_desc *txr = txq->ice_tx_ring;
 	uint16_t n = 0;
 
 	/**
@@ -4882,7 +4882,7 @@ ice_fdir_programming(struct ice_pf *pf, struct ice_fltr_desc *fdir_desc)
 	struct ci_tx_queue *txq = pf->fdir.txq;
 	struct ci_rx_queue *rxq = pf->fdir.rxq;
 	volatile struct ice_fltr_desc *fdirdp;
-	volatile struct ice_tx_desc *txdp;
+	volatile struct ci_tx_desc *txdp;
 	uint32_t td_cmd;
 	uint16_t i;
 
@@ -4892,7 +4892,7 @@ ice_fdir_programming(struct ice_pf *pf, struct ice_fltr_desc *fdir_desc)
 	fdirdp->dtype_cmd_vsi_fdid = fdir_desc->dtype_cmd_vsi_fdid;
 
 	txdp = &txq->ice_tx_ring[txq->tx_tail + 1];
-	txdp->buf_addr = rte_cpu_to_le_64(pf->fdir.dma_addr);
+	txdp->buffer_addr = rte_cpu_to_le_64(pf->fdir.dma_addr);
 	td_cmd = ICE_TX_DESC_CMD_EOP |
 		ICE_TX_DESC_CMD_RS  |
 		ICE_TX_DESC_CMD_DUMMY;
diff --git a/drivers/net/intel/ice/ice_rxtx_vec_avx2.c b/drivers/net/intel/ice/ice_rxtx_vec_avx2.c
index 0ba1d557ca..bef7bb00ba 100644
--- a/drivers/net/intel/ice/ice_rxtx_vec_avx2.c
+++ b/drivers/net/intel/ice/ice_rxtx_vec_avx2.c
@@ -774,7 +774,7 @@ ice_recv_scattered_pkts_vec_avx2_offload(void *rx_queue,
 }
 
 static __rte_always_inline void
-ice_vtx1(volatile struct ice_tx_desc *txdp,
+ice_vtx1(volatile struct ci_tx_desc *txdp,
 	 struct rte_mbuf *pkt, uint64_t flags, bool offload)
 {
 	uint64_t high_qw =
@@ -789,7 +789,7 @@ ice_vtx1(volatile struct ice_tx_desc *txdp,
 }
 
 static __rte_always_inline void
-ice_vtx(volatile struct ice_tx_desc *txdp,
+ice_vtx(volatile struct ci_tx_desc *txdp,
 	struct rte_mbuf **pkt, uint16_t nb_pkts,  uint64_t flags, bool offload)
 {
 	const uint64_t hi_qw_tmpl = (ICE_TX_DESC_DTYPE_DATA |
@@ -852,7 +852,7 @@ ice_xmit_fixed_burst_vec_avx2(void *tx_queue, struct rte_mbuf **tx_pkts,
 			      uint16_t nb_pkts, bool offload)
 {
 	struct ci_tx_queue *txq = (struct ci_tx_queue *)tx_queue;
-	volatile struct ice_tx_desc *txdp;
+	volatile struct ci_tx_desc *txdp;
 	struct ci_tx_entry_vec *txep;
 	uint16_t n, nb_commit, tx_id;
 	uint64_t flags = ICE_TD_CMD;
diff --git a/drivers/net/intel/ice/ice_rxtx_vec_avx512.c b/drivers/net/intel/ice/ice_rxtx_vec_avx512.c
index 7c6fe82072..1f6bf5fc8e 100644
--- a/drivers/net/intel/ice/ice_rxtx_vec_avx512.c
+++ b/drivers/net/intel/ice/ice_rxtx_vec_avx512.c
@@ -847,7 +847,7 @@ ice_recv_scattered_pkts_vec_avx512_offload(void *rx_queue,
 }
 
 static __rte_always_inline void
-ice_vtx1(volatile struct ice_tx_desc *txdp,
+ice_vtx1(volatile struct ci_tx_desc *txdp,
 	 struct rte_mbuf *pkt, uint64_t flags, bool do_offload)
 {
 	uint64_t high_qw =
@@ -863,7 +863,7 @@ ice_vtx1(volatile struct ice_tx_desc *txdp,
 }
 
 static __rte_always_inline void
-ice_vtx(volatile struct ice_tx_desc *txdp, struct rte_mbuf **pkt,
+ice_vtx(volatile struct ci_tx_desc *txdp, struct rte_mbuf **pkt,
 	uint16_t nb_pkts,  uint64_t flags, bool do_offload)
 {
 	const uint64_t hi_qw_tmpl = (ICE_TX_DESC_DTYPE_DATA |
@@ -916,7 +916,7 @@ ice_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pkts,
 				uint16_t nb_pkts, bool do_offload)
 {
 	struct ci_tx_queue *txq = (struct ci_tx_queue *)tx_queue;
-	volatile struct ice_tx_desc *txdp;
+	volatile struct ci_tx_desc *txdp;
 	struct ci_tx_entry_vec *txep;
 	uint16_t n, nb_commit, tx_id;
 	uint64_t flags = ICE_TD_CMD;
diff --git a/drivers/net/intel/idpf/idpf_common_rxtx.c b/drivers/net/intel/idpf/idpf_common_rxtx.c
index 797ee515dd..be3c1ef216 100644
--- a/drivers/net/intel/idpf/idpf_common_rxtx.c
+++ b/drivers/net/intel/idpf/idpf_common_rxtx.c
@@ -264,13 +264,13 @@ idpf_qc_single_tx_queue_reset(struct ci_tx_queue *txq)
 	}
 
 	txe = txq->sw_ring;
-	size = sizeof(struct idpf_base_tx_desc) * txq->nb_tx_desc;
+	size = sizeof(struct ci_tx_desc) * txq->nb_tx_desc;
 	for (i = 0; i < size; i++)
 		((volatile char *)txq->idpf_tx_ring)[i] = 0;
 
 	prev = (uint16_t)(txq->nb_tx_desc - 1);
 	for (i = 0; i < txq->nb_tx_desc; i++) {
-		txq->idpf_tx_ring[i].qw1 =
+		txq->idpf_tx_ring[i].cmd_type_offset_bsz =
 			rte_cpu_to_le_64(IDPF_TX_DESC_DTYPE_DESC_DONE);
 		txe[i].mbuf =  NULL;
 		txe[i].last_id = i;
@@ -1335,14 +1335,14 @@ idpf_xmit_cleanup(struct ci_tx_queue *txq)
 	uint16_t desc_to_clean_to;
 	uint16_t nb_tx_to_clean;
 
-	volatile struct idpf_base_tx_desc *txd = txq->idpf_tx_ring;
+	volatile struct ci_tx_desc *txd = txq->idpf_tx_ring;
 
 	desc_to_clean_to = (uint16_t)(last_desc_cleaned + txq->tx_rs_thresh);
 	if (desc_to_clean_to >= nb_tx_desc)
 		desc_to_clean_to = (uint16_t)(desc_to_clean_to - nb_tx_desc);
 
 	desc_to_clean_to = sw_ring[desc_to_clean_to].last_id;
-	if ((txd[desc_to_clean_to].qw1 &
+	if ((txd[desc_to_clean_to].cmd_type_offset_bsz &
 	     rte_cpu_to_le_64(IDPF_TXD_QW1_DTYPE_M)) !=
 	    rte_cpu_to_le_64(IDPF_TX_DESC_DTYPE_DESC_DONE)) {
 		TX_LOG(DEBUG, "TX descriptor %4u is not done "
@@ -1358,7 +1358,7 @@ idpf_xmit_cleanup(struct ci_tx_queue *txq)
 		nb_tx_to_clean = (uint16_t)(desc_to_clean_to -
 					    last_desc_cleaned);
 
-	txd[desc_to_clean_to].qw1 = 0;
+	txd[desc_to_clean_to].cmd_type_offset_bsz = 0;
 
 	txq->last_desc_cleaned = desc_to_clean_to;
 	txq->nb_tx_free = (uint16_t)(txq->nb_tx_free + nb_tx_to_clean);
@@ -1372,8 +1372,8 @@ uint16_t
 idpf_dp_singleq_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 			  uint16_t nb_pkts)
 {
-	volatile struct idpf_base_tx_desc *txd;
-	volatile struct idpf_base_tx_desc *txr;
+	volatile struct ci_tx_desc *txd;
+	volatile struct ci_tx_desc *txr;
 	union idpf_tx_offload tx_offload = {0};
 	struct ci_tx_entry *txe, *txn;
 	struct ci_tx_entry *sw_ring;
@@ -1491,8 +1491,8 @@ idpf_dp_singleq_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 			/* Setup TX Descriptor */
 			slen = m_seg->data_len;
 			buf_dma_addr = rte_mbuf_data_iova(m_seg);
-			txd->buf_addr = rte_cpu_to_le_64(buf_dma_addr);
-			txd->qw1 = rte_cpu_to_le_64(IDPF_TX_DESC_DTYPE_DATA |
+			txd->buffer_addr = rte_cpu_to_le_64(buf_dma_addr);
+			txd->cmd_type_offset_bsz = rte_cpu_to_le_64(IDPF_TX_DESC_DTYPE_DATA |
 				((uint64_t)td_cmd  << IDPF_TXD_QW1_CMD_S) |
 				((uint64_t)td_offset << IDPF_TXD_QW1_OFFSET_S) |
 				((uint64_t)slen << IDPF_TXD_QW1_TX_BUF_SZ_S));
@@ -1519,7 +1519,7 @@ idpf_dp_singleq_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 			txq->nb_tx_used = 0;
 		}
 
-		txd->qw1 |= rte_cpu_to_le_16(td_cmd << IDPF_TXD_QW1_CMD_S);
+		txd->cmd_type_offset_bsz |= rte_cpu_to_le_16(td_cmd << IDPF_TXD_QW1_CMD_S);
 	}
 
 end_of_tx:
diff --git a/drivers/net/intel/idpf/idpf_common_rxtx.h b/drivers/net/intel/idpf/idpf_common_rxtx.h
index 7c6ff5d047..2f2fa153b2 100644
--- a/drivers/net/intel/idpf/idpf_common_rxtx.h
+++ b/drivers/net/intel/idpf/idpf_common_rxtx.h
@@ -182,7 +182,7 @@ union idpf_tx_offload {
 };
 
 union idpf_tx_desc {
-	struct idpf_base_tx_desc *tx_ring;
+	struct ci_tx_desc *tx_ring;
 	struct idpf_flex_tx_sched_desc *desc_ring;
 	struct idpf_splitq_tx_compl_desc *compl_ring;
 };
diff --git a/drivers/net/intel/idpf/idpf_common_rxtx_avx2.c b/drivers/net/intel/idpf/idpf_common_rxtx_avx2.c
index 21c8f79254..5f5d538dcb 100644
--- a/drivers/net/intel/idpf/idpf_common_rxtx_avx2.c
+++ b/drivers/net/intel/idpf/idpf_common_rxtx_avx2.c
@@ -483,7 +483,7 @@ idpf_dp_singleq_recv_pkts_avx2(void *rx_queue, struct rte_mbuf **rx_pkts, uint16
 }
 
 static inline void
-idpf_singleq_vtx1(volatile struct idpf_base_tx_desc *txdp,
+idpf_singleq_vtx1(volatile struct ci_tx_desc *txdp,
 		  struct rte_mbuf *pkt, uint64_t flags)
 {
 	uint64_t high_qw =
@@ -497,7 +497,7 @@ idpf_singleq_vtx1(volatile struct idpf_base_tx_desc *txdp,
 }
 
 static inline void
-idpf_singleq_vtx(volatile struct idpf_base_tx_desc *txdp,
+idpf_singleq_vtx(volatile struct ci_tx_desc *txdp,
 		 struct rte_mbuf **pkt, uint16_t nb_pkts,  uint64_t flags)
 {
 	const uint64_t hi_qw_tmpl = (IDPF_TX_DESC_DTYPE_DATA |
@@ -556,7 +556,7 @@ idpf_singleq_xmit_fixed_burst_vec_avx2(void *tx_queue, struct rte_mbuf **tx_pkts
 				       uint16_t nb_pkts)
 {
 	struct ci_tx_queue *txq = (struct ci_tx_queue *)tx_queue;
-	volatile struct idpf_base_tx_desc *txdp;
+	volatile struct ci_tx_desc *txdp;
 	struct ci_tx_entry_vec *txep;
 	uint16_t n, nb_commit, tx_id;
 	uint64_t flags = IDPF_TX_DESC_CMD_EOP;
@@ -604,7 +604,7 @@ idpf_singleq_xmit_fixed_burst_vec_avx2(void *tx_queue, struct rte_mbuf **tx_pkts
 
 	tx_id = (uint16_t)(tx_id + nb_commit);
 	if (tx_id > txq->tx_next_rs) {
-		txq->idpf_tx_ring[txq->tx_next_rs].qw1 |=
+		txq->idpf_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
 			rte_cpu_to_le_64(((uint64_t)IDPF_TX_DESC_CMD_RS) <<
 					 IDPF_TXD_QW1_CMD_S);
 		txq->tx_next_rs =
diff --git a/drivers/net/intel/idpf/idpf_common_rxtx_avx512.c b/drivers/net/intel/idpf/idpf_common_rxtx_avx512.c
index bc2cadd738..c1ec3d1222 100644
--- a/drivers/net/intel/idpf/idpf_common_rxtx_avx512.c
+++ b/drivers/net/intel/idpf/idpf_common_rxtx_avx512.c
@@ -1000,7 +1000,7 @@ idpf_dp_splitq_recv_pkts_avx512(void *rx_queue, struct rte_mbuf **rx_pkts,
 }
 
 static __rte_always_inline void
-idpf_singleq_vtx1(volatile struct idpf_base_tx_desc *txdp,
+idpf_singleq_vtx1(volatile struct ci_tx_desc *txdp,
 	  struct rte_mbuf *pkt, uint64_t flags)
 {
 	uint64_t high_qw =
@@ -1016,7 +1016,7 @@ idpf_singleq_vtx1(volatile struct idpf_base_tx_desc *txdp,
 #define IDPF_TX_LEN_MASK 0xAA
 #define IDPF_TX_OFF_MASK 0x55
 static __rte_always_inline void
-idpf_singleq_vtx(volatile struct idpf_base_tx_desc *txdp,
+idpf_singleq_vtx(volatile struct ci_tx_desc *txdp,
 	 struct rte_mbuf **pkt, uint16_t nb_pkts,  uint64_t flags)
 {
 	const uint64_t hi_qw_tmpl = (IDPF_TX_DESC_DTYPE_DATA  |
@@ -1072,7 +1072,7 @@ idpf_singleq_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pk
 					 uint16_t nb_pkts)
 {
 	struct ci_tx_queue *txq = tx_queue;
-	volatile struct idpf_base_tx_desc *txdp;
+	volatile struct ci_tx_desc *txdp;
 	struct ci_tx_entry_vec *txep;
 	uint16_t n, nb_commit, tx_id;
 	uint64_t flags = IDPF_TX_DESC_CMD_EOP;
@@ -1123,7 +1123,7 @@ idpf_singleq_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pk
 
 	tx_id = (uint16_t)(tx_id + nb_commit);
 	if (tx_id > txq->tx_next_rs) {
-		txq->idpf_tx_ring[txq->tx_next_rs].qw1 |=
+		txq->idpf_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
 			rte_cpu_to_le_64(((uint64_t)IDPF_TX_DESC_CMD_RS) <<
 					 IDPF_TXD_QW1_CMD_S);
 		txq->tx_next_rs =
diff --git a/drivers/net/intel/idpf/idpf_rxtx.c b/drivers/net/intel/idpf/idpf_rxtx.c
index cee454244f..8aa44585fe 100644
--- a/drivers/net/intel/idpf/idpf_rxtx.c
+++ b/drivers/net/intel/idpf/idpf_rxtx.c
@@ -72,7 +72,7 @@ idpf_dma_zone_reserve(struct rte_eth_dev *dev, uint16_t queue_idx,
 			ring_size = RTE_ALIGN(len * sizeof(struct idpf_flex_tx_sched_desc),
 					      IDPF_DMA_MEM_ALIGN);
 		else
-			ring_size = RTE_ALIGN(len * sizeof(struct idpf_base_tx_desc),
+			ring_size = RTE_ALIGN(len * sizeof(struct ci_tx_desc),
 					      IDPF_DMA_MEM_ALIGN);
 		rte_memcpy(ring_name, "idpf Tx ring", sizeof("idpf Tx ring"));
 		break;
diff --git a/drivers/net/intel/idpf/idpf_rxtx_vec_common.h b/drivers/net/intel/idpf/idpf_rxtx_vec_common.h
index 425f0792a1..4702061484 100644
--- a/drivers/net/intel/idpf/idpf_rxtx_vec_common.h
+++ b/drivers/net/intel/idpf/idpf_rxtx_vec_common.h
@@ -31,7 +31,7 @@ idpf_tx_desc_done(struct ci_tx_queue *txq, uint16_t idx)
 	if (txq->complq != NULL)
 		return 1;
 
-	return (txq->idpf_tx_ring[idx].qw1 &
+	return (txq->idpf_tx_ring[idx].cmd_type_offset_bsz &
 			rte_cpu_to_le_64(IDPF_TXD_QW1_DTYPE_M)) ==
 				rte_cpu_to_le_64(IDPF_TX_DESC_DTYPE_DESC_DONE);
 }
-- 
2.51.0



* [PATCH v3 02/36] net/intel: use common Tx ring structure
  2026-01-30 11:41 ` [PATCH v3 " Bruce Richardson
  2026-01-30 11:41   ` [PATCH v3 01/36] net/intel: create common Tx descriptor structure Bruce Richardson
@ 2026-01-30 11:41   ` Bruce Richardson
  2026-02-06  9:59     ` Loftus, Ciara
  2026-01-30 11:41   ` [PATCH v3 03/36] net/intel: create common post-Tx cleanup function Bruce Richardson
                     ` (33 subsequent siblings)
  35 siblings, 1 reply; 274+ messages in thread
From: Bruce Richardson @ 2026-01-30 11:41 UTC (permalink / raw)
  To: dev
  Cc: Bruce Richardson, Praveen Shetty, Vladimir Medvedkin,
	Anatoly Burakov, Jingjing Wu

Now that we have a common descriptor type, there is no need for
separate per-driver ring pointers in a union: we can merge all but the
ixgbe pointer into a single common pointer.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/net/intel/common/tx.h                 |  5 +--
 drivers/net/intel/cpfl/cpfl_rxtx.c            |  2 +-
 drivers/net/intel/i40e/i40e_fdir.c            |  6 ++--
 drivers/net/intel/i40e/i40e_rxtx.c            | 22 ++++++------
 .../net/intel/i40e/i40e_rxtx_vec_altivec.c    |  6 ++--
 drivers/net/intel/i40e/i40e_rxtx_vec_avx2.c   |  6 ++--
 drivers/net/intel/i40e/i40e_rxtx_vec_avx512.c |  6 ++--
 drivers/net/intel/i40e/i40e_rxtx_vec_common.h |  2 +-
 drivers/net/intel/i40e/i40e_rxtx_vec_neon.c   |  6 ++--
 drivers/net/intel/iavf/iavf_rxtx.c            | 14 ++++----
 drivers/net/intel/iavf/iavf_rxtx_vec_avx2.c   |  6 ++--
 drivers/net/intel/iavf/iavf_rxtx_vec_avx512.c | 12 +++----
 drivers/net/intel/iavf/iavf_rxtx_vec_common.h |  2 +-
 drivers/net/intel/ice/ice_dcf_ethdev.c        |  4 +--
 drivers/net/intel/ice/ice_rxtx.c              | 34 +++++++++----------
 drivers/net/intel/ice/ice_rxtx_vec_avx2.c     |  6 ++--
 drivers/net/intel/ice/ice_rxtx_vec_avx512.c   |  6 ++--
 drivers/net/intel/ice/ice_rxtx_vec_common.h   |  2 +-
 drivers/net/intel/idpf/idpf_common_rxtx.c     |  8 ++---
 .../net/intel/idpf/idpf_common_rxtx_avx2.c    |  6 ++--
 .../net/intel/idpf/idpf_common_rxtx_avx512.c  |  6 ++--
 drivers/net/intel/idpf/idpf_rxtx.c            |  2 +-
 drivers/net/intel/idpf/idpf_rxtx_vec_common.h |  2 +-
 23 files changed, 84 insertions(+), 87 deletions(-)

diff --git a/drivers/net/intel/common/tx.h b/drivers/net/intel/common/tx.h
index d7561a2bbb..8cf63e59ab 100644
--- a/drivers/net/intel/common/tx.h
+++ b/drivers/net/intel/common/tx.h
@@ -41,10 +41,7 @@ typedef void (*ice_tx_release_mbufs_t)(struct ci_tx_queue *txq);
 
 struct ci_tx_queue {
 	union { /* TX ring virtual address */
-		volatile struct ci_tx_desc *i40e_tx_ring;
-		volatile struct ci_tx_desc *iavf_tx_ring;
-		volatile struct ci_tx_desc *ice_tx_ring;
-		volatile struct ci_tx_desc *idpf_tx_ring;
+		volatile struct ci_tx_desc *ci_tx_ring;
 		volatile union ixgbe_adv_tx_desc *ixgbe_tx_ring;
 	};
 	volatile uint8_t *qtx_tail;               /* register address of tail */
diff --git a/drivers/net/intel/cpfl/cpfl_rxtx.c b/drivers/net/intel/cpfl/cpfl_rxtx.c
index 78bc3e9b49..bc5bec65f0 100644
--- a/drivers/net/intel/cpfl/cpfl_rxtx.c
+++ b/drivers/net/intel/cpfl/cpfl_rxtx.c
@@ -606,7 +606,7 @@ cpfl_tx_queue_setup(struct rte_eth_dev *dev, uint16_t queue_idx,
 	}
 
 	if (!is_splitq) {
-		txq->idpf_tx_ring = mz->addr;
+		txq->ci_tx_ring = mz->addr;
 		idpf_qc_single_tx_queue_reset(txq);
 	} else {
 		txq->desc_ring = mz->addr;
diff --git a/drivers/net/intel/i40e/i40e_fdir.c b/drivers/net/intel/i40e/i40e_fdir.c
index 605df73c9e..8a01aec0e2 100644
--- a/drivers/net/intel/i40e/i40e_fdir.c
+++ b/drivers/net/intel/i40e/i40e_fdir.c
@@ -1380,7 +1380,7 @@ i40e_find_available_buffer(struct rte_eth_dev *dev)
 		volatile struct ci_tx_desc *tmp_txdp;
 
 		tmp_tail = txq->tx_tail;
-		tmp_txdp = &txq->i40e_tx_ring[tmp_tail + 1];
+		tmp_txdp = &txq->ci_tx_ring[tmp_tail + 1];
 
 		do {
 			if ((tmp_txdp->cmd_type_offset_bsz &
@@ -1637,7 +1637,7 @@ i40e_flow_fdir_filter_programming(struct i40e_pf *pf,
 
 	PMD_DRV_LOG(INFO, "filling filter programming descriptor.");
 	fdirdp = (volatile struct i40e_filter_program_desc *)
-				(&txq->i40e_tx_ring[txq->tx_tail]);
+				(&txq->ci_tx_ring[txq->tx_tail]);
 
 	fdirdp->qindex_flex_ptype_vsi =
 			rte_cpu_to_le_32((fdir_action->rx_queue <<
@@ -1707,7 +1707,7 @@ i40e_flow_fdir_filter_programming(struct i40e_pf *pf,
 	fdirdp->fd_id = rte_cpu_to_le_32(filter->soft_id);
 
 	PMD_DRV_LOG(INFO, "filling transmit descriptor.");
-	txdp = &txq->i40e_tx_ring[txq->tx_tail + 1];
+	txdp = &txq->ci_tx_ring[txq->tx_tail + 1];
 	txdp->buffer_addr = rte_cpu_to_le_64(pf->fdir.dma_addr[txq->tx_tail >> 1]);
 
 	td_cmd = I40E_TX_DESC_CMD_EOP |
diff --git a/drivers/net/intel/i40e/i40e_rxtx.c b/drivers/net/intel/i40e/i40e_rxtx.c
index 92d49ccb79..210fc0201e 100644
--- a/drivers/net/intel/i40e/i40e_rxtx.c
+++ b/drivers/net/intel/i40e/i40e_rxtx.c
@@ -388,7 +388,7 @@ static inline int
 i40e_xmit_cleanup(struct ci_tx_queue *txq)
 {
 	struct ci_tx_entry *sw_ring = txq->sw_ring;
-	volatile struct ci_tx_desc *txd = txq->i40e_tx_ring;
+	volatile struct ci_tx_desc *txd = txq->ci_tx_ring;
 	uint16_t last_desc_cleaned = txq->last_desc_cleaned;
 	uint16_t nb_tx_desc = txq->nb_tx_desc;
 	uint16_t desc_to_clean_to;
@@ -1112,7 +1112,7 @@ i40e_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 
 	txq = tx_queue;
 	sw_ring = txq->sw_ring;
-	txr = txq->i40e_tx_ring;
+	txr = txq->ci_tx_ring;
 	tx_id = txq->tx_tail;
 	txe = &sw_ring[tx_id];
 
@@ -1347,7 +1347,7 @@ i40e_tx_free_bufs(struct ci_tx_queue *txq)
 	const uint16_t k = RTE_ALIGN_FLOOR(tx_rs_thresh, I40E_TX_MAX_FREE_BUF_SZ);
 	const uint16_t m = tx_rs_thresh % I40E_TX_MAX_FREE_BUF_SZ;
 
-	if ((txq->i40e_tx_ring[txq->tx_next_dd].cmd_type_offset_bsz &
+	if ((txq->ci_tx_ring[txq->tx_next_dd].cmd_type_offset_bsz &
 			rte_cpu_to_le_64(I40E_TXD_QW1_DTYPE_MASK)) !=
 			rte_cpu_to_le_64(I40E_TX_DESC_DTYPE_DESC_DONE))
 		return 0;
@@ -1431,7 +1431,7 @@ i40e_tx_fill_hw_ring(struct ci_tx_queue *txq,
 		     struct rte_mbuf **pkts,
 		     uint16_t nb_pkts)
 {
-	volatile struct ci_tx_desc *txdp = &txq->i40e_tx_ring[txq->tx_tail];
+	volatile struct ci_tx_desc *txdp = &txq->ci_tx_ring[txq->tx_tail];
 	struct ci_tx_entry *txep = &txq->sw_ring[txq->tx_tail];
 	const int N_PER_LOOP = 4;
 	const int N_PER_LOOP_MASK = N_PER_LOOP - 1;
@@ -1459,7 +1459,7 @@ tx_xmit_pkts(struct ci_tx_queue *txq,
 	     struct rte_mbuf **tx_pkts,
 	     uint16_t nb_pkts)
 {
-	volatile struct ci_tx_desc *txr = txq->i40e_tx_ring;
+	volatile struct ci_tx_desc *txr = txq->ci_tx_ring;
 	uint16_t n = 0;
 
 	/**
@@ -2421,7 +2421,7 @@ i40e_dev_tx_descriptor_status(void *tx_queue, uint16_t offset)
 			desc -= txq->nb_tx_desc;
 	}
 
-	status = &txq->i40e_tx_ring[desc].cmd_type_offset_bsz;
+	status = &txq->ci_tx_ring[desc].cmd_type_offset_bsz;
 	mask = rte_le_to_cpu_64(I40E_TXD_QW1_DTYPE_MASK);
 	expect = rte_cpu_to_le_64(
 		I40E_TX_DESC_DTYPE_DESC_DONE << I40E_TXD_QW1_DTYPE_SHIFT);
@@ -2618,7 +2618,7 @@ i40e_dev_tx_queue_setup(struct rte_eth_dev *dev,
 	/* Allocate TX hardware ring descriptors. */
 	ring_size = sizeof(struct ci_tx_desc) * I40E_MAX_RING_DESC;
 	ring_size = RTE_ALIGN(ring_size, I40E_DMA_MEM_ALIGN);
-	tz = rte_eth_dma_zone_reserve(dev, "i40e_tx_ring", queue_idx,
+	tz = rte_eth_dma_zone_reserve(dev, "ci_tx_ring", queue_idx,
 			      ring_size, I40E_RING_BASE_ALIGN, socket_id);
 	if (!tz) {
 		i40e_tx_queue_release(txq);
@@ -2640,7 +2640,7 @@ i40e_dev_tx_queue_setup(struct rte_eth_dev *dev,
 	txq->tx_deferred_start = tx_conf->tx_deferred_start;
 
 	txq->tx_ring_dma = tz->iova;
-	txq->i40e_tx_ring = (struct ci_tx_desc *)tz->addr;
+	txq->ci_tx_ring = (struct ci_tx_desc *)tz->addr;
 
 	/* Allocate software ring */
 	txq->sw_ring =
@@ -2915,11 +2915,11 @@ i40e_reset_tx_queue(struct ci_tx_queue *txq)
 	txe = txq->sw_ring;
 	size = sizeof(struct ci_tx_desc) * txq->nb_tx_desc;
 	for (i = 0; i < size; i++)
-		((volatile char *)txq->i40e_tx_ring)[i] = 0;
+		((volatile char *)txq->ci_tx_ring)[i] = 0;
 
 	prev = (uint16_t)(txq->nb_tx_desc - 1);
 	for (i = 0; i < txq->nb_tx_desc; i++) {
-		volatile struct ci_tx_desc *txd = &txq->i40e_tx_ring[i];
+		volatile struct ci_tx_desc *txd = &txq->ci_tx_ring[i];
 
 		txd->cmd_type_offset_bsz =
 			rte_cpu_to_le_64(I40E_TX_DESC_DTYPE_DESC_DONE);
@@ -3240,7 +3240,7 @@ i40e_fdir_setup_tx_resources(struct i40e_pf *pf)
 	txq->i40e_vsi = pf->fdir.fdir_vsi;
 
 	txq->tx_ring_dma = tz->iova;
-	txq->i40e_tx_ring = (struct ci_tx_desc *)tz->addr;
+	txq->ci_tx_ring = (struct ci_tx_desc *)tz->addr;
 
 	/*
 	 * don't need to allocate software ring and reset for the fdir
diff --git a/drivers/net/intel/i40e/i40e_rxtx_vec_altivec.c b/drivers/net/intel/i40e/i40e_rxtx_vec_altivec.c
index ef5b252898..81e9e2bc0b 100644
--- a/drivers/net/intel/i40e/i40e_rxtx_vec_altivec.c
+++ b/drivers/net/intel/i40e/i40e_rxtx_vec_altivec.c
@@ -489,7 +489,7 @@ i40e_xmit_fixed_burst_vec(void *tx_queue, struct rte_mbuf **tx_pkts,
 		return 0;
 
 	tx_id = txq->tx_tail;
-	txdp = &txq->i40e_tx_ring[tx_id];
+	txdp = &txq->ci_tx_ring[tx_id];
 	txep = &txq->sw_ring_vec[tx_id];
 
 	txq->nb_tx_free = (uint16_t)(txq->nb_tx_free - nb_pkts);
@@ -509,7 +509,7 @@ i40e_xmit_fixed_burst_vec(void *tx_queue, struct rte_mbuf **tx_pkts,
 		txq->tx_next_rs = (uint16_t)(txq->tx_rs_thresh - 1);
 
 		/* avoid reach the end of ring */
-		txdp = &txq->i40e_tx_ring[tx_id];
+		txdp = &txq->ci_tx_ring[tx_id];
 		txep = &txq->sw_ring_vec[tx_id];
 	}
 
@@ -519,7 +519,7 @@ i40e_xmit_fixed_burst_vec(void *tx_queue, struct rte_mbuf **tx_pkts,
 
 	tx_id = (uint16_t)(tx_id + nb_commit);
 	if (tx_id > txq->tx_next_rs) {
-		txq->i40e_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
+		txq->ci_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
 			rte_cpu_to_le_64(((uint64_t)I40E_TX_DESC_CMD_RS) <<
 						I40E_TXD_QW1_CMD_SHIFT);
 		txq->tx_next_rs =
diff --git a/drivers/net/intel/i40e/i40e_rxtx_vec_avx2.c b/drivers/net/intel/i40e/i40e_rxtx_vec_avx2.c
index 137c1f9765..f054bd41bf 100644
--- a/drivers/net/intel/i40e/i40e_rxtx_vec_avx2.c
+++ b/drivers/net/intel/i40e/i40e_rxtx_vec_avx2.c
@@ -753,7 +753,7 @@ i40e_xmit_fixed_burst_vec_avx2(void *tx_queue, struct rte_mbuf **tx_pkts,
 		return 0;
 
 	tx_id = txq->tx_tail;
-	txdp = &txq->i40e_tx_ring[tx_id];
+	txdp = &txq->ci_tx_ring[tx_id];
 	txep = &txq->sw_ring_vec[tx_id];
 
 	txq->nb_tx_free = (uint16_t)(txq->nb_tx_free - nb_pkts);
@@ -774,7 +774,7 @@ i40e_xmit_fixed_burst_vec_avx2(void *tx_queue, struct rte_mbuf **tx_pkts,
 		txq->tx_next_rs = (uint16_t)(txq->tx_rs_thresh - 1);
 
 		/* avoid reach the end of ring */
-		txdp = &txq->i40e_tx_ring[tx_id];
+		txdp = &txq->ci_tx_ring[tx_id];
 		txep = &txq->sw_ring_vec[tx_id];
 	}
 
@@ -784,7 +784,7 @@ i40e_xmit_fixed_burst_vec_avx2(void *tx_queue, struct rte_mbuf **tx_pkts,
 
 	tx_id = (uint16_t)(tx_id + nb_commit);
 	if (tx_id > txq->tx_next_rs) {
-		txq->i40e_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
+		txq->ci_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
 			rte_cpu_to_le_64(((uint64_t)I40E_TX_DESC_CMD_RS) <<
 						I40E_TXD_QW1_CMD_SHIFT);
 		txq->tx_next_rs =
diff --git a/drivers/net/intel/i40e/i40e_rxtx_vec_avx512.c b/drivers/net/intel/i40e/i40e_rxtx_vec_avx512.c
index 6971488750..9a967faeee 100644
--- a/drivers/net/intel/i40e/i40e_rxtx_vec_avx512.c
+++ b/drivers/net/intel/i40e/i40e_rxtx_vec_avx512.c
@@ -821,7 +821,7 @@ i40e_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pkts,
 		return 0;
 
 	tx_id = txq->tx_tail;
-	txdp = &txq->i40e_tx_ring[tx_id];
+	txdp = &txq->ci_tx_ring[tx_id];
 	txep = (void *)txq->sw_ring;
 	txep += tx_id;
 
@@ -843,7 +843,7 @@ i40e_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pkts,
 		txq->tx_next_rs = (uint16_t)(txq->tx_rs_thresh - 1);
 
 		/* avoid reach the end of ring */
-		txdp = txq->i40e_tx_ring;
+		txdp = txq->ci_tx_ring;
 		txep = (void *)txq->sw_ring;
 	}
 
@@ -853,7 +853,7 @@ i40e_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pkts,
 
 	tx_id = (uint16_t)(tx_id + nb_commit);
 	if (tx_id > txq->tx_next_rs) {
-		txq->i40e_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
+		txq->ci_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
 			rte_cpu_to_le_64(((uint64_t)I40E_TX_DESC_CMD_RS) <<
 						I40E_TXD_QW1_CMD_SHIFT);
 		txq->tx_next_rs =
diff --git a/drivers/net/intel/i40e/i40e_rxtx_vec_common.h b/drivers/net/intel/i40e/i40e_rxtx_vec_common.h
index 14651f2f06..1fd7fc75bf 100644
--- a/drivers/net/intel/i40e/i40e_rxtx_vec_common.h
+++ b/drivers/net/intel/i40e/i40e_rxtx_vec_common.h
@@ -15,7 +15,7 @@
 static inline int
 i40e_tx_desc_done(struct ci_tx_queue *txq, uint16_t idx)
 {
-	return (txq->i40e_tx_ring[idx].cmd_type_offset_bsz &
+	return (txq->ci_tx_ring[idx].cmd_type_offset_bsz &
 			rte_cpu_to_le_64(I40E_TXD_QW1_DTYPE_MASK)) ==
 				rte_cpu_to_le_64(I40E_TX_DESC_DTYPE_DESC_DONE);
 }
diff --git a/drivers/net/intel/i40e/i40e_rxtx_vec_neon.c b/drivers/net/intel/i40e/i40e_rxtx_vec_neon.c
index 6404b70c56..0b95152232 100644
--- a/drivers/net/intel/i40e/i40e_rxtx_vec_neon.c
+++ b/drivers/net/intel/i40e/i40e_rxtx_vec_neon.c
@@ -638,7 +638,7 @@ i40e_xmit_fixed_burst_vec(void *__rte_restrict tx_queue,
 		return 0;
 
 	tx_id = txq->tx_tail;
-	txdp = &txq->i40e_tx_ring[tx_id];
+	txdp = &txq->ci_tx_ring[tx_id];
 	txep = &txq->sw_ring_vec[tx_id];
 
 	txq->nb_tx_free = (uint16_t)(txq->nb_tx_free - nb_pkts);
@@ -658,7 +658,7 @@ i40e_xmit_fixed_burst_vec(void *__rte_restrict tx_queue,
 		txq->tx_next_rs = (uint16_t)(txq->tx_rs_thresh - 1);
 
 		/* avoid reach the end of ring */
-		txdp = &txq->i40e_tx_ring[tx_id];
+		txdp = &txq->ci_tx_ring[tx_id];
 		txep = &txq->sw_ring_vec[tx_id];
 	}
 
@@ -668,7 +668,7 @@ i40e_xmit_fixed_burst_vec(void *__rte_restrict tx_queue,
 
 	tx_id = (uint16_t)(tx_id + nb_commit);
 	if (tx_id > txq->tx_next_rs) {
-		txq->i40e_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
+		txq->ci_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
 			rte_cpu_to_le_64(((uint64_t)I40E_TX_DESC_CMD_RS) <<
 						I40E_TXD_QW1_CMD_SHIFT);
 		txq->tx_next_rs =
diff --git a/drivers/net/intel/iavf/iavf_rxtx.c b/drivers/net/intel/iavf/iavf_rxtx.c
index e4421a9932..807bc92a45 100644
--- a/drivers/net/intel/iavf/iavf_rxtx.c
+++ b/drivers/net/intel/iavf/iavf_rxtx.c
@@ -269,11 +269,11 @@ reset_tx_queue(struct ci_tx_queue *txq)
 	txe = txq->sw_ring;
 	size = sizeof(struct ci_tx_desc) * txq->nb_tx_desc;
 	for (i = 0; i < size; i++)
-		((volatile char *)txq->iavf_tx_ring)[i] = 0;
+		((volatile char *)txq->ci_tx_ring)[i] = 0;
 
 	prev = (uint16_t)(txq->nb_tx_desc - 1);
 	for (i = 0; i < txq->nb_tx_desc; i++) {
-		txq->iavf_tx_ring[i].cmd_type_offset_bsz =
+		txq->ci_tx_ring[i].cmd_type_offset_bsz =
 			rte_cpu_to_le_64(IAVF_TX_DESC_DTYPE_DESC_DONE);
 		txe[i].mbuf =  NULL;
 		txe[i].last_id = i;
@@ -829,7 +829,7 @@ iavf_dev_tx_queue_setup(struct rte_eth_dev *dev,
 	/* Allocate TX hardware ring descriptors. */
 	ring_size = sizeof(struct ci_tx_desc) * IAVF_MAX_RING_DESC;
 	ring_size = RTE_ALIGN(ring_size, IAVF_DMA_MEM_ALIGN);
-	mz = rte_eth_dma_zone_reserve(dev, "iavf_tx_ring", queue_idx,
+	mz = rte_eth_dma_zone_reserve(dev, "ci_tx_ring", queue_idx,
 				      ring_size, IAVF_RING_BASE_ALIGN,
 				      socket_id);
 	if (!mz) {
@@ -839,7 +839,7 @@ iavf_dev_tx_queue_setup(struct rte_eth_dev *dev,
 		return -ENOMEM;
 	}
 	txq->tx_ring_dma = mz->iova;
-	txq->iavf_tx_ring = (struct ci_tx_desc *)mz->addr;
+	txq->ci_tx_ring = (struct ci_tx_desc *)mz->addr;
 
 	txq->mz = mz;
 	reset_tx_queue(txq);
@@ -2333,7 +2333,7 @@ iavf_xmit_cleanup(struct ci_tx_queue *txq)
 	uint16_t desc_to_clean_to;
 	uint16_t nb_tx_to_clean;
 
-	volatile struct ci_tx_desc *txd = txq->iavf_tx_ring;
+	volatile struct ci_tx_desc *txd = txq->ci_tx_ring;
 
 	desc_to_clean_to = (uint16_t)(last_desc_cleaned + txq->tx_rs_thresh);
 	if (desc_to_clean_to >= nb_tx_desc)
@@ -2756,7 +2756,7 @@ uint16_t
 iavf_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 {
 	struct ci_tx_queue *txq = tx_queue;
-	volatile struct ci_tx_desc *txr = txq->iavf_tx_ring;
+	volatile struct ci_tx_desc *txr = txq->ci_tx_ring;
 	struct ci_tx_entry *txe_ring = txq->sw_ring;
 	struct ci_tx_entry *txe, *txn;
 	struct rte_mbuf *mb, *mb_seg;
@@ -4462,7 +4462,7 @@ iavf_dev_tx_desc_status(void *tx_queue, uint16_t offset)
 			desc -= txq->nb_tx_desc;
 	}
 
-	status = &txq->iavf_tx_ring[desc].cmd_type_offset_bsz;
+	status = &txq->ci_tx_ring[desc].cmd_type_offset_bsz;
 	mask = rte_le_to_cpu_64(IAVF_TXD_QW1_DTYPE_MASK);
 	expect = rte_cpu_to_le_64(
 		 IAVF_TX_DESC_DTYPE_DESC_DONE << IAVF_TXD_QW1_DTYPE_SHIFT);
diff --git a/drivers/net/intel/iavf/iavf_rxtx_vec_avx2.c b/drivers/net/intel/iavf/iavf_rxtx_vec_avx2.c
index 5b62d51cf7..89ce841b9e 100644
--- a/drivers/net/intel/iavf/iavf_rxtx_vec_avx2.c
+++ b/drivers/net/intel/iavf/iavf_rxtx_vec_avx2.c
@@ -1729,7 +1729,7 @@ iavf_xmit_fixed_burst_vec_avx2(void *tx_queue, struct rte_mbuf **tx_pkts,
 	nb_commit = nb_pkts;
 
 	tx_id = txq->tx_tail;
-	txdp = &txq->iavf_tx_ring[tx_id];
+	txdp = &txq->ci_tx_ring[tx_id];
 	txep = &txq->sw_ring_vec[tx_id];
 
 	txq->nb_tx_free = (uint16_t)(txq->nb_tx_free - nb_pkts);
@@ -1750,7 +1750,7 @@ iavf_xmit_fixed_burst_vec_avx2(void *tx_queue, struct rte_mbuf **tx_pkts,
 		txq->tx_next_rs = (uint16_t)(txq->tx_rs_thresh - 1);
 
 		/* avoid reach the end of ring */
-		txdp = &txq->iavf_tx_ring[tx_id];
+		txdp = &txq->ci_tx_ring[tx_id];
 		txep = &txq->sw_ring_vec[tx_id];
 	}
 
@@ -1760,7 +1760,7 @@ iavf_xmit_fixed_burst_vec_avx2(void *tx_queue, struct rte_mbuf **tx_pkts,
 
 	tx_id = (uint16_t)(tx_id + nb_commit);
 	if (tx_id > txq->tx_next_rs) {
-		txq->iavf_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
+		txq->ci_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
 			rte_cpu_to_le_64(((uint64_t)IAVF_TX_DESC_CMD_RS) <<
 					 IAVF_TXD_QW1_CMD_SHIFT);
 		txq->tx_next_rs =
diff --git a/drivers/net/intel/iavf/iavf_rxtx_vec_avx512.c b/drivers/net/intel/iavf/iavf_rxtx_vec_avx512.c
index d79d96c7b7..ad1b0b90cd 100644
--- a/drivers/net/intel/iavf/iavf_rxtx_vec_avx512.c
+++ b/drivers/net/intel/iavf/iavf_rxtx_vec_avx512.c
@@ -2219,7 +2219,7 @@ iavf_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pkts,
 	nb_commit = nb_pkts;
 
 	tx_id = txq->tx_tail;
-	txdp = &txq->iavf_tx_ring[tx_id];
+	txdp = &txq->ci_tx_ring[tx_id];
 	txep = (void *)txq->sw_ring;
 	txep += tx_id;
 
@@ -2241,7 +2241,7 @@ iavf_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pkts,
 		txq->tx_next_rs = (uint16_t)(txq->tx_rs_thresh - 1);
 
 		/* avoid reach the end of ring */
-		txdp = &txq->iavf_tx_ring[tx_id];
+		txdp = &txq->ci_tx_ring[tx_id];
 		txep = (void *)txq->sw_ring;
 		txep += tx_id;
 	}
@@ -2252,7 +2252,7 @@ iavf_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pkts,
 
 	tx_id = (uint16_t)(tx_id + nb_commit);
 	if (tx_id > txq->tx_next_rs) {
-		txq->iavf_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
+		txq->ci_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
 			rte_cpu_to_le_64(((uint64_t)IAVF_TX_DESC_CMD_RS) <<
 					 IAVF_TXD_QW1_CMD_SHIFT);
 		txq->tx_next_rs =
@@ -2288,7 +2288,7 @@ iavf_xmit_fixed_burst_vec_avx512_ctx(void *tx_queue, struct rte_mbuf **tx_pkts,
 
 	nb_pkts = nb_commit >> 1;
 	tx_id = txq->tx_tail;
-	txdp = &txq->iavf_tx_ring[tx_id];
+	txdp = &txq->ci_tx_ring[tx_id];
 	txep = (void *)txq->sw_ring;
 	txep += (tx_id >> 1);
 
@@ -2309,7 +2309,7 @@ iavf_xmit_fixed_burst_vec_avx512_ctx(void *tx_queue, struct rte_mbuf **tx_pkts,
 		txq->tx_next_rs = (uint16_t)(txq->tx_rs_thresh - 1);
 		tx_id = 0;
 		/* avoid reach the end of ring */
-		txdp = txq->iavf_tx_ring;
+		txdp = txq->ci_tx_ring;
 		txep = (void *)txq->sw_ring;
 	}
 
@@ -2320,7 +2320,7 @@ iavf_xmit_fixed_burst_vec_avx512_ctx(void *tx_queue, struct rte_mbuf **tx_pkts,
 	tx_id = (uint16_t)(tx_id + nb_commit);
 
 	if (tx_id > txq->tx_next_rs) {
-		txq->iavf_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
+		txq->ci_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
 			rte_cpu_to_le_64(((uint64_t)IAVF_TX_DESC_CMD_RS) <<
 					 IAVF_TXD_QW1_CMD_SHIFT);
 		txq->tx_next_rs =
diff --git a/drivers/net/intel/iavf/iavf_rxtx_vec_common.h b/drivers/net/intel/iavf/iavf_rxtx_vec_common.h
index f1ea57034f..1832b76f89 100644
--- a/drivers/net/intel/iavf/iavf_rxtx_vec_common.h
+++ b/drivers/net/intel/iavf/iavf_rxtx_vec_common.h
@@ -14,7 +14,7 @@
 static inline int
 iavf_tx_desc_done(struct ci_tx_queue *txq, uint16_t idx)
 {
-	return (txq->iavf_tx_ring[idx].cmd_type_offset_bsz &
+	return (txq->ci_tx_ring[idx].cmd_type_offset_bsz &
 			rte_cpu_to_le_64(IAVF_TXD_QW1_DTYPE_MASK)) ==
 				rte_cpu_to_le_64(IAVF_TX_DESC_DTYPE_DESC_DONE);
 }
diff --git a/drivers/net/intel/ice/ice_dcf_ethdev.c b/drivers/net/intel/ice/ice_dcf_ethdev.c
index ab1d499cef..5f537b4c12 100644
--- a/drivers/net/intel/ice/ice_dcf_ethdev.c
+++ b/drivers/net/intel/ice/ice_dcf_ethdev.c
@@ -401,11 +401,11 @@ reset_tx_queue(struct ci_tx_queue *txq)
 	txe = txq->sw_ring;
 	size = sizeof(struct ci_tx_desc) * txq->nb_tx_desc;
 	for (i = 0; i < size; i++)
-		((volatile char *)txq->ice_tx_ring)[i] = 0;
+		((volatile char *)txq->ci_tx_ring)[i] = 0;
 
 	prev = (uint16_t)(txq->nb_tx_desc - 1);
 	for (i = 0; i < txq->nb_tx_desc; i++) {
-		txq->ice_tx_ring[i].cmd_type_offset_bsz =
+		txq->ci_tx_ring[i].cmd_type_offset_bsz =
 			rte_cpu_to_le_64(IAVF_TX_DESC_DTYPE_DESC_DONE);
 		txe[i].mbuf =  NULL;
 		txe[i].last_id = i;
diff --git a/drivers/net/intel/ice/ice_rxtx.c b/drivers/net/intel/ice/ice_rxtx.c
index 74b80e7df3..e3ffbdb587 100644
--- a/drivers/net/intel/ice/ice_rxtx.c
+++ b/drivers/net/intel/ice/ice_rxtx.c
@@ -1117,11 +1117,11 @@ ice_reset_tx_queue(struct ci_tx_queue *txq)
 	txe = txq->sw_ring;
 	size = sizeof(struct ci_tx_desc) * txq->nb_tx_desc;
 	for (i = 0; i < size; i++)
-		((volatile char *)txq->ice_tx_ring)[i] = 0;
+		((volatile char *)txq->ci_tx_ring)[i] = 0;
 
 	prev = (uint16_t)(txq->nb_tx_desc - 1);
 	for (i = 0; i < txq->nb_tx_desc; i++) {
-		volatile struct ci_tx_desc *txd = &txq->ice_tx_ring[i];
+		volatile struct ci_tx_desc *txd = &txq->ci_tx_ring[i];
 
 		txd->cmd_type_offset_bsz =
 			rte_cpu_to_le_64(ICE_TX_DESC_DTYPE_DESC_DONE);
@@ -1625,7 +1625,7 @@ ice_tx_queue_setup(struct rte_eth_dev *dev,
 	/* Allocate TX hardware ring descriptors. */
 	ring_size = sizeof(struct ci_tx_desc) * ICE_MAX_NUM_DESC_BY_MAC(hw);
 	ring_size = RTE_ALIGN(ring_size, ICE_DMA_MEM_ALIGN);
-	tz = rte_eth_dma_zone_reserve(dev, "ice_tx_ring", queue_idx,
+	tz = rte_eth_dma_zone_reserve(dev, "ci_tx_ring", queue_idx,
 				      ring_size, ICE_RING_BASE_ALIGN,
 				      socket_id);
 	if (!tz) {
@@ -1649,7 +1649,7 @@ ice_tx_queue_setup(struct rte_eth_dev *dev,
 	txq->tx_deferred_start = tx_conf->tx_deferred_start;
 
 	txq->tx_ring_dma = tz->iova;
-	txq->ice_tx_ring = tz->addr;
+	txq->ci_tx_ring = tz->addr;
 
 	/* Allocate software ring */
 	txq->sw_ring =
@@ -2555,7 +2555,7 @@ ice_tx_descriptor_status(void *tx_queue, uint16_t offset)
 			desc -= txq->nb_tx_desc;
 	}
 
-	status = &txq->ice_tx_ring[desc].cmd_type_offset_bsz;
+	status = &txq->ci_tx_ring[desc].cmd_type_offset_bsz;
 	mask = rte_cpu_to_le_64(ICE_TXD_QW1_DTYPE_M);
 	expect = rte_cpu_to_le_64(ICE_TX_DESC_DTYPE_DESC_DONE <<
 				  ICE_TXD_QW1_DTYPE_S);
@@ -2638,7 +2638,7 @@ ice_fdir_setup_tx_resources(struct ice_pf *pf)
 	txq->ice_vsi = pf->fdir.fdir_vsi;
 
 	txq->tx_ring_dma = tz->iova;
-	txq->ice_tx_ring = (struct ci_tx_desc *)tz->addr;
+	txq->ci_tx_ring = (struct ci_tx_desc *)tz->addr;
 	/*
 	 * don't need to allocate software ring and reset for the fdir
 	 * program queue just set the queue has been configured.
@@ -3027,7 +3027,7 @@ static inline int
 ice_xmit_cleanup(struct ci_tx_queue *txq)
 {
 	struct ci_tx_entry *sw_ring = txq->sw_ring;
-	volatile struct ci_tx_desc *txd = txq->ice_tx_ring;
+	volatile struct ci_tx_desc *txd = txq->ci_tx_ring;
 	uint16_t last_desc_cleaned = txq->last_desc_cleaned;
 	uint16_t nb_tx_desc = txq->nb_tx_desc;
 	uint16_t desc_to_clean_to;
@@ -3148,7 +3148,7 @@ uint16_t
 ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 {
 	struct ci_tx_queue *txq;
-	volatile struct ci_tx_desc *ice_tx_ring;
+	volatile struct ci_tx_desc *ci_tx_ring;
 	volatile struct ci_tx_desc *txd;
 	struct ci_tx_entry *sw_ring;
 	struct ci_tx_entry *txe, *txn;
@@ -3171,7 +3171,7 @@ ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 
 	txq = tx_queue;
 	sw_ring = txq->sw_ring;
-	ice_tx_ring = txq->ice_tx_ring;
+	ci_tx_ring = txq->ci_tx_ring;
 	tx_id = txq->tx_tail;
 	txe = &sw_ring[tx_id];
 
@@ -3257,7 +3257,7 @@ ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 			/* Setup TX context descriptor if required */
 			volatile struct ice_tx_ctx_desc *ctx_txd =
 				(volatile struct ice_tx_ctx_desc *)
-					&ice_tx_ring[tx_id];
+					&ci_tx_ring[tx_id];
 			uint16_t cd_l2tag2 = 0;
 			uint64_t cd_type_cmd_tso_mss = ICE_TX_DESC_DTYPE_CTX;
 
@@ -3299,7 +3299,7 @@ ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 		m_seg = tx_pkt;
 
 		do {
-			txd = &ice_tx_ring[tx_id];
+			txd = &ci_tx_ring[tx_id];
 			txn = &sw_ring[txe->next_id];
 
 			if (txe->mbuf)
@@ -3327,7 +3327,7 @@ ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 				txe->last_id = tx_last;
 				tx_id = txe->next_id;
 				txe = txn;
-				txd = &ice_tx_ring[tx_id];
+				txd = &ci_tx_ring[tx_id];
 				txn = &sw_ring[txe->next_id];
 			}
 
@@ -3410,7 +3410,7 @@ ice_tx_free_bufs(struct ci_tx_queue *txq)
 	struct ci_tx_entry *txep;
 	uint16_t i;
 
-	if ((txq->ice_tx_ring[txq->tx_next_dd].cmd_type_offset_bsz &
+	if ((txq->ci_tx_ring[txq->tx_next_dd].cmd_type_offset_bsz &
 	     rte_cpu_to_le_64(ICE_TXD_QW1_DTYPE_M)) !=
 	    rte_cpu_to_le_64(ICE_TX_DESC_DTYPE_DESC_DONE))
 		return 0;
@@ -3594,7 +3594,7 @@ static inline void
 ice_tx_fill_hw_ring(struct ci_tx_queue *txq, struct rte_mbuf **pkts,
 		    uint16_t nb_pkts)
 {
-	volatile struct ci_tx_desc *txdp = &txq->ice_tx_ring[txq->tx_tail];
+	volatile struct ci_tx_desc *txdp = &txq->ci_tx_ring[txq->tx_tail];
 	struct ci_tx_entry *txep = &txq->sw_ring[txq->tx_tail];
 	const int N_PER_LOOP = 4;
 	const int N_PER_LOOP_MASK = N_PER_LOOP - 1;
@@ -3627,7 +3627,7 @@ tx_xmit_pkts(struct ci_tx_queue *txq,
 	     struct rte_mbuf **tx_pkts,
 	     uint16_t nb_pkts)
 {
-	volatile struct ci_tx_desc *txr = txq->ice_tx_ring;
+	volatile struct ci_tx_desc *txr = txq->ci_tx_ring;
 	uint16_t n = 0;
 
 	/**
@@ -4887,11 +4887,11 @@ ice_fdir_programming(struct ice_pf *pf, struct ice_fltr_desc *fdir_desc)
 	uint16_t i;
 
 	fdirdp = (volatile struct ice_fltr_desc *)
-		(&txq->ice_tx_ring[txq->tx_tail]);
+		(&txq->ci_tx_ring[txq->tx_tail]);
 	fdirdp->qidx_compq_space_stat = fdir_desc->qidx_compq_space_stat;
 	fdirdp->dtype_cmd_vsi_fdid = fdir_desc->dtype_cmd_vsi_fdid;
 
-	txdp = &txq->ice_tx_ring[txq->tx_tail + 1];
+	txdp = &txq->ci_tx_ring[txq->tx_tail + 1];
 	txdp->buffer_addr = rte_cpu_to_le_64(pf->fdir.dma_addr);
 	td_cmd = ICE_TX_DESC_CMD_EOP |
 		ICE_TX_DESC_CMD_RS  |
diff --git a/drivers/net/intel/ice/ice_rxtx_vec_avx2.c b/drivers/net/intel/ice/ice_rxtx_vec_avx2.c
index bef7bb00ba..0a1df0b2f6 100644
--- a/drivers/net/intel/ice/ice_rxtx_vec_avx2.c
+++ b/drivers/net/intel/ice/ice_rxtx_vec_avx2.c
@@ -869,7 +869,7 @@ ice_xmit_fixed_burst_vec_avx2(void *tx_queue, struct rte_mbuf **tx_pkts,
 		return 0;
 
 	tx_id = txq->tx_tail;
-	txdp = &txq->ice_tx_ring[tx_id];
+	txdp = &txq->ci_tx_ring[tx_id];
 	txep = &txq->sw_ring_vec[tx_id];
 
 	txq->nb_tx_free = (uint16_t)(txq->nb_tx_free - nb_pkts);
@@ -890,7 +890,7 @@ ice_xmit_fixed_burst_vec_avx2(void *tx_queue, struct rte_mbuf **tx_pkts,
 		txq->tx_next_rs = (uint16_t)(txq->tx_rs_thresh - 1);
 
 		/* avoid reach the end of ring */
-		txdp = &txq->ice_tx_ring[tx_id];
+		txdp = &txq->ci_tx_ring[tx_id];
 		txep = &txq->sw_ring_vec[tx_id];
 	}
 
@@ -900,7 +900,7 @@ ice_xmit_fixed_burst_vec_avx2(void *tx_queue, struct rte_mbuf **tx_pkts,
 
 	tx_id = (uint16_t)(tx_id + nb_commit);
 	if (tx_id > txq->tx_next_rs) {
-		txq->ice_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
+		txq->ci_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
 			rte_cpu_to_le_64(((uint64_t)ICE_TX_DESC_CMD_RS) <<
 					 ICE_TXD_QW1_CMD_S);
 		txq->tx_next_rs =
diff --git a/drivers/net/intel/ice/ice_rxtx_vec_avx512.c b/drivers/net/intel/ice/ice_rxtx_vec_avx512.c
index 1f6bf5fc8e..d42f41461f 100644
--- a/drivers/net/intel/ice/ice_rxtx_vec_avx512.c
+++ b/drivers/net/intel/ice/ice_rxtx_vec_avx512.c
@@ -933,7 +933,7 @@ ice_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pkts,
 		return 0;
 
 	tx_id = txq->tx_tail;
-	txdp = &txq->ice_tx_ring[tx_id];
+	txdp = &txq->ci_tx_ring[tx_id];
 	txep = (void *)txq->sw_ring;
 	txep += tx_id;
 
@@ -955,7 +955,7 @@ ice_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pkts,
 		txq->tx_next_rs = (uint16_t)(txq->tx_rs_thresh - 1);
 
 		/* avoid reach the end of ring */
-		txdp = txq->ice_tx_ring;
+		txdp = txq->ci_tx_ring;
 		txep = (void *)txq->sw_ring;
 	}
 
@@ -965,7 +965,7 @@ ice_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pkts,
 
 	tx_id = (uint16_t)(tx_id + nb_commit);
 	if (tx_id > txq->tx_next_rs) {
-		txq->ice_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
+		txq->ci_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
 			rte_cpu_to_le_64(((uint64_t)ICE_TX_DESC_CMD_RS) <<
 					 ICE_TXD_QW1_CMD_S);
 		txq->tx_next_rs =
diff --git a/drivers/net/intel/ice/ice_rxtx_vec_common.h b/drivers/net/intel/ice/ice_rxtx_vec_common.h
index ff46a8fb49..8ba591e403 100644
--- a/drivers/net/intel/ice/ice_rxtx_vec_common.h
+++ b/drivers/net/intel/ice/ice_rxtx_vec_common.h
@@ -11,7 +11,7 @@
 static inline int
 ice_tx_desc_done(struct ci_tx_queue *txq, uint16_t idx)
 {
-	return (txq->ice_tx_ring[idx].cmd_type_offset_bsz &
+	return (txq->ci_tx_ring[idx].cmd_type_offset_bsz &
 			rte_cpu_to_le_64(ICE_TXD_QW1_DTYPE_M)) ==
 				rte_cpu_to_le_64(ICE_TX_DESC_DTYPE_DESC_DONE);
 }
diff --git a/drivers/net/intel/idpf/idpf_common_rxtx.c b/drivers/net/intel/idpf/idpf_common_rxtx.c
index be3c1ef216..51074bda3a 100644
--- a/drivers/net/intel/idpf/idpf_common_rxtx.c
+++ b/drivers/net/intel/idpf/idpf_common_rxtx.c
@@ -266,11 +266,11 @@ idpf_qc_single_tx_queue_reset(struct ci_tx_queue *txq)
 	txe = txq->sw_ring;
 	size = sizeof(struct ci_tx_desc) * txq->nb_tx_desc;
 	for (i = 0; i < size; i++)
-		((volatile char *)txq->idpf_tx_ring)[i] = 0;
+		((volatile char *)txq->ci_tx_ring)[i] = 0;
 
 	prev = (uint16_t)(txq->nb_tx_desc - 1);
 	for (i = 0; i < txq->nb_tx_desc; i++) {
-		txq->idpf_tx_ring[i].cmd_type_offset_bsz =
+		txq->ci_tx_ring[i].cmd_type_offset_bsz =
 			rte_cpu_to_le_64(IDPF_TX_DESC_DTYPE_DESC_DONE);
 		txe[i].mbuf =  NULL;
 		txe[i].last_id = i;
@@ -1335,7 +1335,7 @@ idpf_xmit_cleanup(struct ci_tx_queue *txq)
 	uint16_t desc_to_clean_to;
 	uint16_t nb_tx_to_clean;
 
-	volatile struct ci_tx_desc *txd = txq->idpf_tx_ring;
+	volatile struct ci_tx_desc *txd = txq->ci_tx_ring;
 
 	desc_to_clean_to = (uint16_t)(last_desc_cleaned + txq->tx_rs_thresh);
 	if (desc_to_clean_to >= nb_tx_desc)
@@ -1398,7 +1398,7 @@ idpf_dp_singleq_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 		return nb_tx;
 
 	sw_ring = txq->sw_ring;
-	txr = txq->idpf_tx_ring;
+	txr = txq->ci_tx_ring;
 	tx_id = txq->tx_tail;
 	txe = &sw_ring[tx_id];
 
diff --git a/drivers/net/intel/idpf/idpf_common_rxtx_avx2.c b/drivers/net/intel/idpf/idpf_common_rxtx_avx2.c
index 5f5d538dcb..04efee3722 100644
--- a/drivers/net/intel/idpf/idpf_common_rxtx_avx2.c
+++ b/drivers/net/intel/idpf/idpf_common_rxtx_avx2.c
@@ -573,7 +573,7 @@ idpf_singleq_xmit_fixed_burst_vec_avx2(void *tx_queue, struct rte_mbuf **tx_pkts
 		return 0;
 
 	tx_id = txq->tx_tail;
-	txdp = &txq->idpf_tx_ring[tx_id];
+	txdp = &txq->ci_tx_ring[tx_id];
 	txep = &txq->sw_ring_vec[tx_id];
 
 	txq->nb_tx_free = (uint16_t)(txq->nb_tx_free - nb_pkts);
@@ -594,7 +594,7 @@ idpf_singleq_xmit_fixed_burst_vec_avx2(void *tx_queue, struct rte_mbuf **tx_pkts
 		txq->tx_next_rs = (uint16_t)(txq->tx_rs_thresh - 1);
 
 		/* avoid reach the end of ring */
-		txdp = &txq->idpf_tx_ring[tx_id];
+		txdp = &txq->ci_tx_ring[tx_id];
 		txep = &txq->sw_ring_vec[tx_id];
 	}
 
@@ -604,7 +604,7 @@ idpf_singleq_xmit_fixed_burst_vec_avx2(void *tx_queue, struct rte_mbuf **tx_pkts
 
 	tx_id = (uint16_t)(tx_id + nb_commit);
 	if (tx_id > txq->tx_next_rs) {
-		txq->idpf_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
+		txq->ci_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
 			rte_cpu_to_le_64(((uint64_t)IDPF_TX_DESC_CMD_RS) <<
 					 IDPF_TXD_QW1_CMD_S);
 		txq->tx_next_rs =
diff --git a/drivers/net/intel/idpf/idpf_common_rxtx_avx512.c b/drivers/net/intel/idpf/idpf_common_rxtx_avx512.c
index c1ec3d1222..d5e5a2ca5f 100644
--- a/drivers/net/intel/idpf/idpf_common_rxtx_avx512.c
+++ b/drivers/net/intel/idpf/idpf_common_rxtx_avx512.c
@@ -1090,7 +1090,7 @@ idpf_singleq_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pk
 		return 0;
 
 	tx_id = txq->tx_tail;
-	txdp = &txq->idpf_tx_ring[tx_id];
+	txdp = &txq->ci_tx_ring[tx_id];
 	txep = (void *)txq->sw_ring;
 	txep += tx_id;
 
@@ -1112,7 +1112,7 @@ idpf_singleq_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pk
 		txq->tx_next_rs = (uint16_t)(txq->tx_rs_thresh - 1);
 
 		/* avoid reach the end of ring */
-		txdp = &txq->idpf_tx_ring[tx_id];
+		txdp = &txq->ci_tx_ring[tx_id];
 		txep = (void *)txq->sw_ring;
 		txep += tx_id;
 	}
@@ -1123,7 +1123,7 @@ idpf_singleq_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pk
 
 	tx_id = (uint16_t)(tx_id + nb_commit);
 	if (tx_id > txq->tx_next_rs) {
-		txq->idpf_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
+		txq->ci_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
 			rte_cpu_to_le_64(((uint64_t)IDPF_TX_DESC_CMD_RS) <<
 					 IDPF_TXD_QW1_CMD_S);
 		txq->tx_next_rs =
diff --git a/drivers/net/intel/idpf/idpf_rxtx.c b/drivers/net/intel/idpf/idpf_rxtx.c
index 8aa44585fe..0de54d9305 100644
--- a/drivers/net/intel/idpf/idpf_rxtx.c
+++ b/drivers/net/intel/idpf/idpf_rxtx.c
@@ -481,7 +481,7 @@ idpf_tx_queue_setup(struct rte_eth_dev *dev, uint16_t queue_idx,
 	}
 
 	if (!is_splitq) {
-		txq->idpf_tx_ring = mz->addr;
+		txq->ci_tx_ring = mz->addr;
 		idpf_qc_single_tx_queue_reset(txq);
 	} else {
 		txq->desc_ring = mz->addr;
diff --git a/drivers/net/intel/idpf/idpf_rxtx_vec_common.h b/drivers/net/intel/idpf/idpf_rxtx_vec_common.h
index 4702061484..b5e8574667 100644
--- a/drivers/net/intel/idpf/idpf_rxtx_vec_common.h
+++ b/drivers/net/intel/idpf/idpf_rxtx_vec_common.h
@@ -31,7 +31,7 @@ idpf_tx_desc_done(struct ci_tx_queue *txq, uint16_t idx)
 	if (txq->complq != NULL)
 		return 1;
 
-	return (txq->idpf_tx_ring[idx].cmd_type_offset_bsz &
+	return (txq->ci_tx_ring[idx].cmd_type_offset_bsz &
 			rte_cpu_to_le_64(IDPF_TXD_QW1_DTYPE_M)) ==
 				rte_cpu_to_le_64(IDPF_TX_DESC_DTYPE_DESC_DONE);
 }
-- 
2.51.0



* [PATCH v3 03/36] net/intel: create common post-Tx cleanup function
  2026-01-30 11:41 ` [PATCH v3 " Bruce Richardson
  2026-01-30 11:41   ` [PATCH v3 01/36] net/intel: create common Tx descriptor structure Bruce Richardson
  2026-01-30 11:41   ` [PATCH v3 02/36] net/intel: use common Tx ring structure Bruce Richardson
@ 2026-01-30 11:41   ` Bruce Richardson
  2026-02-06 10:07     ` Loftus, Ciara
  2026-01-30 11:41   ` [PATCH v3 04/36] net/intel: consolidate definitions for Tx desc fields Bruce Richardson
                     ` (32 subsequent siblings)
  35 siblings, 1 reply; 274+ messages in thread
From: Bruce Richardson @ 2026-01-30 11:41 UTC (permalink / raw)
  To: dev
  Cc: Bruce Richardson, Vladimir Medvedkin, Anatoly Burakov,
	Jingjing Wu, Praveen Shetty

The code used in the ice, iavf, idpf and i40e drivers for cleaning up mbufs
after they have been transmitted was identical. Therefore, deduplicate it
by moving it to the common code and removing the driver-specific versions.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/net/intel/common/tx.h             | 53 ++++++++++++++++++++
 drivers/net/intel/i40e/i40e_rxtx.c        | 49 ++----------------
 drivers/net/intel/iavf/iavf_rxtx.c        | 50 ++-----------------
 drivers/net/intel/ice/ice_rxtx.c          | 60 ++---------------------
 drivers/net/intel/idpf/idpf_common_rxtx.c | 46 ++---------------
 5 files changed, 71 insertions(+), 187 deletions(-)

diff --git a/drivers/net/intel/common/tx.h b/drivers/net/intel/common/tx.h
index 8cf63e59ab..a89412c195 100644
--- a/drivers/net/intel/common/tx.h
+++ b/drivers/net/intel/common/tx.h
@@ -259,6 +259,59 @@ ci_tx_free_bufs_vec(struct ci_tx_queue *txq, ci_desc_done_fn desc_done, bool ctx
 	return txq->tx_rs_thresh;
 }
 
+/*
+ * Common transmit descriptor cleanup function for Intel drivers.
+ * Used by ice, i40e, iavf, and idpf drivers.
+ *
+ * Returns:
+ *   0 on success
+ *  -1 if cleanup cannot proceed (descriptors not yet processed by HW)
+ */
+static __rte_always_inline int
+ci_tx_xmit_cleanup(struct ci_tx_queue *txq)
+{
+	struct ci_tx_entry *sw_ring = txq->sw_ring;
+	volatile struct ci_tx_desc *txd = txq->ci_tx_ring;
+	uint16_t last_desc_cleaned = txq->last_desc_cleaned;
+	uint16_t nb_tx_desc = txq->nb_tx_desc;
+	uint16_t desc_to_clean_to;
+	uint16_t nb_tx_to_clean;
+
+	/* Determine the last descriptor needing to be cleaned */
+	desc_to_clean_to = (uint16_t)(last_desc_cleaned + txq->tx_rs_thresh);
+	if (desc_to_clean_to >= nb_tx_desc)
+		desc_to_clean_to = (uint16_t)(desc_to_clean_to - nb_tx_desc);
+
+	/* Check to make sure the last descriptor to clean is done */
+	desc_to_clean_to = sw_ring[desc_to_clean_to].last_id;
+
+	/* Check if descriptor is done - all drivers use 0xF as done value in bits 3:0 */
+	if ((txd[desc_to_clean_to].cmd_type_offset_bsz & rte_cpu_to_le_64(0xFUL)) !=
+			rte_cpu_to_le_64(0xFUL)) {
+		/* Descriptor not yet processed by hardware */
+		return -1;
+	}
+
+	/* Figure out how many descriptors will be cleaned */
+	if (last_desc_cleaned > desc_to_clean_to)
+		nb_tx_to_clean = (uint16_t)((nb_tx_desc - last_desc_cleaned) + desc_to_clean_to);
+	else
+		nb_tx_to_clean = (uint16_t)(desc_to_clean_to - last_desc_cleaned);
+
+	/* The last descriptor to clean is done, so that means all the
+	 * descriptors from the last descriptor that was cleaned
+	 * up to the last descriptor with the RS bit set
+	 * are done. Only reset the threshold descriptor.
+	 */
+	txd[desc_to_clean_to].cmd_type_offset_bsz = 0;
+
+	/* Update the txq to reflect the last descriptor that was cleaned */
+	txq->last_desc_cleaned = desc_to_clean_to;
+	txq->nb_tx_free = (uint16_t)(txq->nb_tx_free + nb_tx_to_clean);
+
+	return 0;
+}
+
 static inline void
 ci_txq_release_all_mbufs(struct ci_tx_queue *txq, bool use_ctx)
 {
diff --git a/drivers/net/intel/i40e/i40e_rxtx.c b/drivers/net/intel/i40e/i40e_rxtx.c
index 210fc0201e..2760e76e99 100644
--- a/drivers/net/intel/i40e/i40e_rxtx.c
+++ b/drivers/net/intel/i40e/i40e_rxtx.c
@@ -384,45 +384,6 @@ i40e_build_ctob(uint32_t td_cmd,
 			((uint64_t)td_tag  << I40E_TXD_QW1_L2TAG1_SHIFT));
 }
 
-static inline int
-i40e_xmit_cleanup(struct ci_tx_queue *txq)
-{
-	struct ci_tx_entry *sw_ring = txq->sw_ring;
-	volatile struct ci_tx_desc *txd = txq->ci_tx_ring;
-	uint16_t last_desc_cleaned = txq->last_desc_cleaned;
-	uint16_t nb_tx_desc = txq->nb_tx_desc;
-	uint16_t desc_to_clean_to;
-	uint16_t nb_tx_to_clean;
-
-	desc_to_clean_to = (uint16_t)(last_desc_cleaned + txq->tx_rs_thresh);
-	if (desc_to_clean_to >= nb_tx_desc)
-		desc_to_clean_to = (uint16_t)(desc_to_clean_to - nb_tx_desc);
-
-	desc_to_clean_to = sw_ring[desc_to_clean_to].last_id;
-	if ((txd[desc_to_clean_to].cmd_type_offset_bsz &
-			rte_cpu_to_le_64(I40E_TXD_QW1_DTYPE_MASK)) !=
-			rte_cpu_to_le_64(I40E_TX_DESC_DTYPE_DESC_DONE)) {
-		PMD_TX_LOG(DEBUG, "TX descriptor %4u is not done "
-			   "(port=%d queue=%d)", desc_to_clean_to,
-			   txq->port_id, txq->queue_id);
-		return -1;
-	}
-
-	if (last_desc_cleaned > desc_to_clean_to)
-		nb_tx_to_clean = (uint16_t)((nb_tx_desc - last_desc_cleaned) +
-							desc_to_clean_to);
-	else
-		nb_tx_to_clean = (uint16_t)(desc_to_clean_to -
-					last_desc_cleaned);
-
-	txd[desc_to_clean_to].cmd_type_offset_bsz = 0;
-
-	txq->last_desc_cleaned = desc_to_clean_to;
-	txq->nb_tx_free = (uint16_t)(txq->nb_tx_free + nb_tx_to_clean);
-
-	return 0;
-}
-
 static inline int
 #ifdef RTE_LIBRTE_I40E_RX_ALLOW_BULK_ALLOC
 check_rx_burst_bulk_alloc_preconditions(struct ci_rx_queue *rxq)
@@ -1118,7 +1079,7 @@ i40e_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 
 	/* Check if the descriptor ring needs to be cleaned. */
 	if (txq->nb_tx_free < txq->tx_free_thresh)
-		(void)i40e_xmit_cleanup(txq);
+		(void)ci_tx_xmit_cleanup(txq);
 
 	for (nb_tx = 0; nb_tx < nb_pkts; nb_tx++) {
 		td_cmd = 0;
@@ -1159,14 +1120,14 @@ i40e_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 			tx_last = (uint16_t)(tx_last - txq->nb_tx_desc);
 
 		if (nb_used > txq->nb_tx_free) {
-			if (i40e_xmit_cleanup(txq) != 0) {
+			if (ci_tx_xmit_cleanup(txq) != 0) {
 				if (nb_tx == 0)
 					return 0;
 				goto end_of_tx;
 			}
 			if (unlikely(nb_used > txq->tx_rs_thresh)) {
 				while (nb_used > txq->nb_tx_free) {
-					if (i40e_xmit_cleanup(txq) != 0) {
+					if (ci_tx_xmit_cleanup(txq) != 0) {
 						if (nb_tx == 0)
 							return 0;
 						goto end_of_tx;
@@ -2808,7 +2769,7 @@ i40e_tx_done_cleanup_full(struct ci_tx_queue *txq,
 	tx_last = txq->tx_tail;
 	tx_id  = swr_ring[tx_last].next_id;
 
-	if (txq->nb_tx_free == 0 && i40e_xmit_cleanup(txq))
+	if (txq->nb_tx_free == 0 && ci_tx_xmit_cleanup(txq))
 		return 0;
 
 	nb_tx_to_clean = txq->nb_tx_free;
@@ -2842,7 +2803,7 @@ i40e_tx_done_cleanup_full(struct ci_tx_queue *txq,
 			break;
 
 		if (pkt_cnt < free_cnt) {
-			if (i40e_xmit_cleanup(txq))
+			if (ci_tx_xmit_cleanup(txq))
 				break;
 
 			nb_tx_to_clean = txq->nb_tx_free - nb_tx_free_last;
diff --git a/drivers/net/intel/iavf/iavf_rxtx.c b/drivers/net/intel/iavf/iavf_rxtx.c
index 807bc92a45..560abfc1ef 100644
--- a/drivers/net/intel/iavf/iavf_rxtx.c
+++ b/drivers/net/intel/iavf/iavf_rxtx.c
@@ -2324,46 +2324,6 @@ iavf_recv_pkts_bulk_alloc(void *rx_queue,
 	return nb_rx;
 }
 
-static inline int
-iavf_xmit_cleanup(struct ci_tx_queue *txq)
-{
-	struct ci_tx_entry *sw_ring = txq->sw_ring;
-	uint16_t last_desc_cleaned = txq->last_desc_cleaned;
-	uint16_t nb_tx_desc = txq->nb_tx_desc;
-	uint16_t desc_to_clean_to;
-	uint16_t nb_tx_to_clean;
-
-	volatile struct ci_tx_desc *txd = txq->ci_tx_ring;
-
-	desc_to_clean_to = (uint16_t)(last_desc_cleaned + txq->tx_rs_thresh);
-	if (desc_to_clean_to >= nb_tx_desc)
-		desc_to_clean_to = (uint16_t)(desc_to_clean_to - nb_tx_desc);
-
-	desc_to_clean_to = sw_ring[desc_to_clean_to].last_id;
-	if ((txd[desc_to_clean_to].cmd_type_offset_bsz &
-			rte_cpu_to_le_64(IAVF_TXD_QW1_DTYPE_MASK)) !=
-			rte_cpu_to_le_64(IAVF_TX_DESC_DTYPE_DESC_DONE)) {
-		PMD_TX_LOG(DEBUG, "TX descriptor %4u is not done "
-			   "(port=%d queue=%d)", desc_to_clean_to,
-			   txq->port_id, txq->queue_id);
-		return -1;
-	}
-
-	if (last_desc_cleaned > desc_to_clean_to)
-		nb_tx_to_clean = (uint16_t)((nb_tx_desc - last_desc_cleaned) +
-							desc_to_clean_to);
-	else
-		nb_tx_to_clean = (uint16_t)(desc_to_clean_to -
-					last_desc_cleaned);
-
-	txd[desc_to_clean_to].cmd_type_offset_bsz = 0;
-
-	txq->last_desc_cleaned = desc_to_clean_to;
-	txq->nb_tx_free = (uint16_t)(txq->nb_tx_free + nb_tx_to_clean);
-
-	return 0;
-}
-
 /* Check if the context descriptor is needed for TX offloading */
 static inline uint16_t
 iavf_calc_context_desc(struct rte_mbuf *mb, uint8_t vlan_flag)
@@ -2768,7 +2728,7 @@ iavf_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 
 	/* Check if the descriptor ring needs to be cleaned. */
 	if (txq->nb_tx_free < txq->tx_free_thresh)
-		iavf_xmit_cleanup(txq);
+		ci_tx_xmit_cleanup(txq);
 
 	desc_idx = txq->tx_tail;
 	txe = &txe_ring[desc_idx];
@@ -2823,14 +2783,14 @@ iavf_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 			txq->port_id, txq->queue_id, desc_idx, desc_idx_last);
 
 		if (nb_desc_required > txq->nb_tx_free) {
-			if (iavf_xmit_cleanup(txq)) {
+			if (ci_tx_xmit_cleanup(txq)) {
 				if (idx == 0)
 					return 0;
 				goto end_of_tx;
 			}
 			if (unlikely(nb_desc_required > txq->tx_rs_thresh)) {
 				while (nb_desc_required > txq->nb_tx_free) {
-					if (iavf_xmit_cleanup(txq)) {
+					if (ci_tx_xmit_cleanup(txq)) {
 						if (idx == 0)
 							return 0;
 						goto end_of_tx;
@@ -4300,7 +4260,7 @@ iavf_tx_done_cleanup_full(struct ci_tx_queue *txq,
 	tx_id = txq->tx_tail;
 	tx_last = tx_id;
 
-	if (txq->nb_tx_free == 0 && iavf_xmit_cleanup(txq))
+	if (txq->nb_tx_free == 0 && ci_tx_xmit_cleanup(txq))
 		return 0;
 
 	nb_tx_to_clean = txq->nb_tx_free;
@@ -4332,7 +4292,7 @@ iavf_tx_done_cleanup_full(struct ci_tx_queue *txq,
 			break;
 
 		if (pkt_cnt < free_cnt) {
-			if (iavf_xmit_cleanup(txq))
+			if (ci_tx_xmit_cleanup(txq))
 				break;
 
 			nb_tx_to_clean = txq->nb_tx_free - nb_tx_free_last;
diff --git a/drivers/net/intel/ice/ice_rxtx.c b/drivers/net/intel/ice/ice_rxtx.c
index e3ffbdb587..7a33e1e980 100644
--- a/drivers/net/intel/ice/ice_rxtx.c
+++ b/drivers/net/intel/ice/ice_rxtx.c
@@ -3023,56 +3023,6 @@ ice_txd_enable_checksum(uint64_t ol_flags,
 	}
 }
 
-static inline int
-ice_xmit_cleanup(struct ci_tx_queue *txq)
-{
-	struct ci_tx_entry *sw_ring = txq->sw_ring;
-	volatile struct ci_tx_desc *txd = txq->ci_tx_ring;
-	uint16_t last_desc_cleaned = txq->last_desc_cleaned;
-	uint16_t nb_tx_desc = txq->nb_tx_desc;
-	uint16_t desc_to_clean_to;
-	uint16_t nb_tx_to_clean;
-
-	/* Determine the last descriptor needing to be cleaned */
-	desc_to_clean_to = (uint16_t)(last_desc_cleaned + txq->tx_rs_thresh);
-	if (desc_to_clean_to >= nb_tx_desc)
-		desc_to_clean_to = (uint16_t)(desc_to_clean_to - nb_tx_desc);
-
-	/* Check to make sure the last descriptor to clean is done */
-	desc_to_clean_to = sw_ring[desc_to_clean_to].last_id;
-	if (!(txd[desc_to_clean_to].cmd_type_offset_bsz &
-	    rte_cpu_to_le_64(ICE_TX_DESC_DTYPE_DESC_DONE))) {
-		PMD_TX_LOG(DEBUG, "TX descriptor %4u is not done "
-			   "(port=%d queue=%d) value=0x%"PRIx64,
-			   desc_to_clean_to,
-			   txq->port_id, txq->queue_id,
-			   txd[desc_to_clean_to].cmd_type_offset_bsz);
-		/* Failed to clean any descriptors */
-		return -1;
-	}
-
-	/* Figure out how many descriptors will be cleaned */
-	if (last_desc_cleaned > desc_to_clean_to)
-		nb_tx_to_clean = (uint16_t)((nb_tx_desc - last_desc_cleaned) +
-					    desc_to_clean_to);
-	else
-		nb_tx_to_clean = (uint16_t)(desc_to_clean_to -
-					    last_desc_cleaned);
-
-	/* The last descriptor to clean is done, so that means all the
-	 * descriptors from the last descriptor that was cleaned
-	 * up to the last descriptor with the RS bit set
-	 * are done. Only reset the threshold descriptor.
-	 */
-	txd[desc_to_clean_to].cmd_type_offset_bsz = 0;
-
-	/* Update the txq to reflect the last descriptor that was cleaned */
-	txq->last_desc_cleaned = desc_to_clean_to;
-	txq->nb_tx_free = (uint16_t)(txq->nb_tx_free + nb_tx_to_clean);
-
-	return 0;
-}
-
 /* Construct the tx flags */
 static inline uint64_t
 ice_build_ctob(uint32_t td_cmd,
@@ -3180,7 +3130,7 @@ ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 
 	/* Check if the descriptor ring needs to be cleaned. */
 	if (txq->nb_tx_free < txq->tx_free_thresh)
-		(void)ice_xmit_cleanup(txq);
+		(void)ci_tx_xmit_cleanup(txq);
 
 	for (nb_tx = 0; nb_tx < nb_pkts; nb_tx++) {
 		tx_pkt = *tx_pkts++;
@@ -3217,14 +3167,14 @@ ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 			tx_last = (uint16_t)(tx_last - txq->nb_tx_desc);
 
 		if (nb_used > txq->nb_tx_free) {
-			if (ice_xmit_cleanup(txq) != 0) {
+			if (ci_tx_xmit_cleanup(txq) != 0) {
 				if (nb_tx == 0)
 					return 0;
 				goto end_of_tx;
 			}
 			if (unlikely(nb_used > txq->tx_rs_thresh)) {
 				while (nb_used > txq->nb_tx_free) {
-					if (ice_xmit_cleanup(txq) != 0) {
+					if (ci_tx_xmit_cleanup(txq) != 0) {
 						if (nb_tx == 0)
 							return 0;
 						goto end_of_tx;
@@ -3459,7 +3409,7 @@ ice_tx_done_cleanup_full(struct ci_tx_queue *txq,
 	tx_last = txq->tx_tail;
 	tx_id  = swr_ring[tx_last].next_id;
 
-	if (txq->nb_tx_free == 0 && ice_xmit_cleanup(txq))
+	if (txq->nb_tx_free == 0 && ci_tx_xmit_cleanup(txq))
 		return 0;
 
 	nb_tx_to_clean = txq->nb_tx_free;
@@ -3493,7 +3443,7 @@ ice_tx_done_cleanup_full(struct ci_tx_queue *txq,
 			break;
 
 		if (pkt_cnt < free_cnt) {
-			if (ice_xmit_cleanup(txq))
+			if (ci_tx_xmit_cleanup(txq))
 				break;
 
 			nb_tx_to_clean = txq->nb_tx_free - nb_tx_free_last;
diff --git a/drivers/net/intel/idpf/idpf_common_rxtx.c b/drivers/net/intel/idpf/idpf_common_rxtx.c
index 51074bda3a..23666539ab 100644
--- a/drivers/net/intel/idpf/idpf_common_rxtx.c
+++ b/drivers/net/intel/idpf/idpf_common_rxtx.c
@@ -1326,46 +1326,6 @@ idpf_dp_singleq_recv_scatter_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
 	return nb_rx;
 }
 
-static inline int
-idpf_xmit_cleanup(struct ci_tx_queue *txq)
-{
-	uint16_t last_desc_cleaned = txq->last_desc_cleaned;
-	struct ci_tx_entry *sw_ring = txq->sw_ring;
-	uint16_t nb_tx_desc = txq->nb_tx_desc;
-	uint16_t desc_to_clean_to;
-	uint16_t nb_tx_to_clean;
-
-	volatile struct ci_tx_desc *txd = txq->ci_tx_ring;
-
-	desc_to_clean_to = (uint16_t)(last_desc_cleaned + txq->tx_rs_thresh);
-	if (desc_to_clean_to >= nb_tx_desc)
-		desc_to_clean_to = (uint16_t)(desc_to_clean_to - nb_tx_desc);
-
-	desc_to_clean_to = sw_ring[desc_to_clean_to].last_id;
-	if ((txd[desc_to_clean_to].cmd_type_offset_bsz &
-	     rte_cpu_to_le_64(IDPF_TXD_QW1_DTYPE_M)) !=
-	    rte_cpu_to_le_64(IDPF_TX_DESC_DTYPE_DESC_DONE)) {
-		TX_LOG(DEBUG, "TX descriptor %4u is not done "
-		       "(port=%d queue=%d)", desc_to_clean_to,
-		       txq->port_id, txq->queue_id);
-		return -1;
-	}
-
-	if (last_desc_cleaned > desc_to_clean_to)
-		nb_tx_to_clean = (uint16_t)((nb_tx_desc - last_desc_cleaned) +
-					    desc_to_clean_to);
-	else
-		nb_tx_to_clean = (uint16_t)(desc_to_clean_to -
-					    last_desc_cleaned);
-
-	txd[desc_to_clean_to].cmd_type_offset_bsz = 0;
-
-	txq->last_desc_cleaned = desc_to_clean_to;
-	txq->nb_tx_free = (uint16_t)(txq->nb_tx_free + nb_tx_to_clean);
-
-	return 0;
-}
-
 /* TX function */
 RTE_EXPORT_INTERNAL_SYMBOL(idpf_dp_singleq_xmit_pkts)
 uint16_t
@@ -1404,7 +1364,7 @@ idpf_dp_singleq_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 
 	/* Check if the descriptor ring needs to be cleaned. */
 	if (txq->nb_tx_free < txq->tx_free_thresh)
-		(void)idpf_xmit_cleanup(txq);
+		(void)ci_tx_xmit_cleanup(txq);
 
 	for (nb_tx = 0; nb_tx < nb_pkts; nb_tx++) {
 		td_cmd = 0;
@@ -1437,14 +1397,14 @@ idpf_dp_singleq_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 		       txq->port_id, txq->queue_id, tx_id, tx_last);
 
 		if (nb_used > txq->nb_tx_free) {
-			if (idpf_xmit_cleanup(txq) != 0) {
+			if (ci_tx_xmit_cleanup(txq) != 0) {
 				if (nb_tx == 0)
 					return 0;
 				goto end_of_tx;
 			}
 			if (unlikely(nb_used > txq->tx_rs_thresh)) {
 				while (nb_used > txq->nb_tx_free) {
-					if (idpf_xmit_cleanup(txq) != 0) {
+					if (ci_tx_xmit_cleanup(txq) != 0) {
 						if (nb_tx == 0)
 							return 0;
 						goto end_of_tx;
-- 
2.51.0



* [PATCH v3 04/36] net/intel: consolidate definitions for Tx desc fields
  2026-01-30 11:41 ` [PATCH v3 " Bruce Richardson
                     ` (2 preceding siblings ...)
  2026-01-30 11:41   ` [PATCH v3 03/36] net/intel: create common post-Tx cleanup function Bruce Richardson
@ 2026-01-30 11:41   ` Bruce Richardson
  2026-02-06 10:14     ` Loftus, Ciara
  2026-01-30 11:41   ` [PATCH v3 05/36] net/intel: create separate header for Tx scalar fns Bruce Richardson
                     ` (31 subsequent siblings)
  35 siblings, 1 reply; 274+ messages in thread
From: Bruce Richardson @ 2026-01-30 11:41 UTC (permalink / raw)
  To: dev
  Cc: Bruce Richardson, Vladimir Medvedkin, Anatoly Burakov,
	Jingjing Wu, Praveen Shetty

The offsets of the various fields within the Tx descriptors are common
across i40e, iavf, ice and idpf, so put a single set of defines in tx.h and
use those throughout all the drivers. (NOTE: there was a small difference in
the mask of the CMD field between drivers, depending on whether or not
reserved fields were included. This can be ignored, as those bits are unused
in the drivers for which they are reserved.) Similarly, the various flag
fields, such as end-of-packet (EOP) and report-status (RS), are the same, as
are the offload definitions, so consolidate those too.

The original definitions are in the base code, and are left in place for
that reason, but they are now unused.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/net/intel/common/tx.h                 |  64 +++++++-
 drivers/net/intel/i40e/i40e_fdir.c            |  24 +--
 drivers/net/intel/i40e/i40e_rxtx.c            |  92 ++++++------
 drivers/net/intel/i40e/i40e_rxtx.h            |  17 +--
 .../net/intel/i40e/i40e_rxtx_vec_altivec.c    |  11 +-
 drivers/net/intel/i40e/i40e_rxtx_vec_avx2.c   |  22 ++-
 drivers/net/intel/i40e/i40e_rxtx_vec_avx512.c |  38 ++---
 drivers/net/intel/i40e/i40e_rxtx_vec_common.h |   4 +-
 drivers/net/intel/i40e/i40e_rxtx_vec_neon.c   |  11 +-
 drivers/net/intel/iavf/iavf_rxtx.c            |  68 +++++----
 drivers/net/intel/iavf/iavf_rxtx.h            |  20 +--
 drivers/net/intel/iavf/iavf_rxtx_vec_avx2.c   |  41 ++----
 drivers/net/intel/iavf/iavf_rxtx_vec_avx512.c |  80 ++++------
 drivers/net/intel/iavf/iavf_rxtx_vec_common.h |  34 ++---
 drivers/net/intel/ice/ice_dcf_ethdev.c        |   2 +-
 drivers/net/intel/ice/ice_rxtx.c              | 137 ++++++++----------
 drivers/net/intel/ice/ice_rxtx.h              |  15 +-
 drivers/net/intel/ice/ice_rxtx_vec_avx2.c     |  41 ++----
 drivers/net/intel/ice/ice_rxtx_vec_avx512.c   |  39 ++---
 drivers/net/intel/ice/ice_rxtx_vec_common.h   |  41 +++---
 drivers/net/intel/idpf/idpf_common_rxtx.c     |  22 +--
 drivers/net/intel/idpf/idpf_common_rxtx.h     |  12 --
 .../net/intel/idpf/idpf_common_rxtx_avx2.c    |  41 ++----
 .../net/intel/idpf/idpf_common_rxtx_avx512.c  |  41 ++----
 drivers/net/intel/idpf/idpf_rxtx_vec_common.h |   4 +-
 25 files changed, 408 insertions(+), 513 deletions(-)

diff --git a/drivers/net/intel/common/tx.h b/drivers/net/intel/common/tx.h
index a89412c195..03245d4fba 100644
--- a/drivers/net/intel/common/tx.h
+++ b/drivers/net/intel/common/tx.h
@@ -10,6 +10,66 @@
 #include <rte_ethdev.h>
 #include <rte_vect.h>
 
+/* Common TX Descriptor QW1 Field Definitions */
+#define CI_TXD_QW1_DTYPE_S      0
+#define CI_TXD_QW1_DTYPE_M      (0xFUL << CI_TXD_QW1_DTYPE_S)
+#define CI_TXD_QW1_CMD_S        4
+#define CI_TXD_QW1_CMD_M        (0xFFFUL << CI_TXD_QW1_CMD_S)
+#define CI_TXD_QW1_OFFSET_S     16
+#define CI_TXD_QW1_OFFSET_M     (0x3FFFFULL << CI_TXD_QW1_OFFSET_S)
+#define CI_TXD_QW1_TX_BUF_SZ_S  34
+#define CI_TXD_QW1_TX_BUF_SZ_M  (0x3FFFULL << CI_TXD_QW1_TX_BUF_SZ_S)
+#define CI_TXD_QW1_L2TAG1_S     48
+#define CI_TXD_QW1_L2TAG1_M     (0xFFFFULL << CI_TXD_QW1_L2TAG1_S)
+
+/* Common Descriptor Types */
+#define CI_TX_DESC_DTYPE_DATA           0x0
+#define CI_TX_DESC_DTYPE_CTX            0x1
+#define CI_TX_DESC_DTYPE_DESC_DONE      0xF
+
+/* Common TX Descriptor Command Flags */
+#define CI_TX_DESC_CMD_EOP              0x0001
+#define CI_TX_DESC_CMD_RS               0x0002
+#define CI_TX_DESC_CMD_ICRC             0x0004
+#define CI_TX_DESC_CMD_IL2TAG1          0x0008
+#define CI_TX_DESC_CMD_DUMMY            0x0010
+#define CI_TX_DESC_CMD_IIPT_IPV6        0x0020
+#define CI_TX_DESC_CMD_IIPT_IPV4        0x0040
+#define CI_TX_DESC_CMD_IIPT_IPV4_CSUM   0x0060
+#define CI_TX_DESC_CMD_L4T_EOFT_TCP     0x0100
+#define CI_TX_DESC_CMD_L4T_EOFT_SCTP    0x0200
+#define CI_TX_DESC_CMD_L4T_EOFT_UDP     0x0300
+
+/* Common TX Context Descriptor Commands */
+#define CI_TX_CTX_DESC_TSO              0x01
+#define CI_TX_CTX_DESC_TSYN             0x02
+#define CI_TX_CTX_DESC_IL2TAG2          0x04
+
+/* Common TX Descriptor Length Field Shifts */
+#define CI_TX_DESC_LEN_MACLEN_S         0  /* 7 BITS */
+#define CI_TX_DESC_LEN_IPLEN_S          7  /* 7 BITS */
+#define CI_TX_DESC_LEN_L4_LEN_S         14 /* 4 BITS */
+
+/* Common maximum data per TX descriptor */
+#define CI_MAX_DATA_PER_TXD     (CI_TXD_QW1_TX_BUF_SZ_M >> CI_TXD_QW1_TX_BUF_SZ_S)
+
+/**
+ * Common TX offload union for Intel drivers.
+ * Supports both basic offloads (l2_len, l3_len, l4_len, tso_segsz) and
+ * extended offloads (outer_l2_len, outer_l3_len) for tunneling support.
+ */
+union ci_tx_offload {
+	uint64_t data;
+	struct {
+		uint64_t l2_len:7;        /**< L2 (MAC) Header Length. */
+		uint64_t l3_len:9;        /**< L3 (IP) Header Length. */
+		uint64_t l4_len:8;        /**< L4 Header Length. */
+		uint64_t tso_segsz:16;    /**< TCP TSO segment size */
+		uint64_t outer_l2_len:8;  /**< outer L2 Header Length */
+		uint64_t outer_l3_len:16; /**< outer L3 Header Length */
+	};
+};
+
 /*
  * Structure of a 16-byte Tx descriptor common across i40e, ice, iavf and idpf drivers
  */
@@ -286,8 +346,8 @@ ci_tx_xmit_cleanup(struct ci_tx_queue *txq)
 	desc_to_clean_to = sw_ring[desc_to_clean_to].last_id;
 
 	/* Check if descriptor is done - all drivers use 0xF as done value in bits 3:0 */
-	if ((txd[desc_to_clean_to].cmd_type_offset_bsz & rte_cpu_to_le_64(0xFUL)) !=
-			rte_cpu_to_le_64(0xFUL)) {
+	if ((txd[desc_to_clean_to].cmd_type_offset_bsz & rte_cpu_to_le_64(CI_TXD_QW1_DTYPE_M)) !=
+			rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE)) {
 		/* Descriptor not yet processed by hardware */
 		return -1;
 	}
diff --git a/drivers/net/intel/i40e/i40e_fdir.c b/drivers/net/intel/i40e/i40e_fdir.c
index 8a01aec0e2..3b099d5a9e 100644
--- a/drivers/net/intel/i40e/i40e_fdir.c
+++ b/drivers/net/intel/i40e/i40e_fdir.c
@@ -916,11 +916,11 @@ i40e_build_ctob(uint32_t td_cmd,
 		unsigned int size,
 		uint32_t td_tag)
 {
-	return rte_cpu_to_le_64(I40E_TX_DESC_DTYPE_DATA |
-			((uint64_t)td_cmd  << I40E_TXD_QW1_CMD_SHIFT) |
-			((uint64_t)td_offset << I40E_TXD_QW1_OFFSET_SHIFT) |
-			((uint64_t)size  << I40E_TXD_QW1_TX_BUF_SZ_SHIFT) |
-			((uint64_t)td_tag  << I40E_TXD_QW1_L2TAG1_SHIFT));
+	return rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DATA |
+			((uint64_t)td_cmd  << CI_TXD_QW1_CMD_S) |
+			((uint64_t)td_offset << CI_TXD_QW1_OFFSET_S) |
+			((uint64_t)size  << CI_TXD_QW1_TX_BUF_SZ_S) |
+			((uint64_t)td_tag  << CI_TXD_QW1_L2TAG1_S));
 }
 
 /*
@@ -1384,8 +1384,8 @@ i40e_find_available_buffer(struct rte_eth_dev *dev)
 
 		do {
 			if ((tmp_txdp->cmd_type_offset_bsz &
-					rte_cpu_to_le_64(I40E_TXD_QW1_DTYPE_MASK)) ==
-					rte_cpu_to_le_64(I40E_TX_DESC_DTYPE_DESC_DONE))
+					rte_cpu_to_le_64(CI_TXD_QW1_DTYPE_M)) ==
+					rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE))
 				fdir_info->txq_available_buf_count++;
 			else
 				break;
@@ -1710,9 +1710,9 @@ i40e_flow_fdir_filter_programming(struct i40e_pf *pf,
 	txdp = &txq->ci_tx_ring[txq->tx_tail + 1];
 	txdp->buffer_addr = rte_cpu_to_le_64(pf->fdir.dma_addr[txq->tx_tail >> 1]);
 
-	td_cmd = I40E_TX_DESC_CMD_EOP |
-		 I40E_TX_DESC_CMD_RS  |
-		 I40E_TX_DESC_CMD_DUMMY;
+	td_cmd = CI_TX_DESC_CMD_EOP |
+		 CI_TX_DESC_CMD_RS  |
+		 CI_TX_DESC_CMD_DUMMY;
 
 	txdp->cmd_type_offset_bsz =
 		i40e_build_ctob(td_cmd, 0, I40E_FDIR_PKT_LEN, 0);
@@ -1731,8 +1731,8 @@ i40e_flow_fdir_filter_programming(struct i40e_pf *pf,
 	if (wait_status) {
 		for (i = 0; i < I40E_FDIR_MAX_WAIT_US; i++) {
 			if ((txdp->cmd_type_offset_bsz &
-					rte_cpu_to_le_64(I40E_TXD_QW1_DTYPE_MASK)) ==
-					rte_cpu_to_le_64(I40E_TX_DESC_DTYPE_DESC_DONE))
+					rte_cpu_to_le_64(CI_TXD_QW1_DTYPE_M)) ==
+					rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE))
 				break;
 			rte_delay_us(1);
 		}
diff --git a/drivers/net/intel/i40e/i40e_rxtx.c b/drivers/net/intel/i40e/i40e_rxtx.c
index 2760e76e99..f96c5c7f1e 100644
--- a/drivers/net/intel/i40e/i40e_rxtx.c
+++ b/drivers/net/intel/i40e/i40e_rxtx.c
@@ -45,7 +45,7 @@
 /* Base address of the HW descriptor ring should be 128B aligned. */
 #define I40E_RING_BASE_ALIGN	128
 
-#define I40E_TXD_CMD (I40E_TX_DESC_CMD_EOP | I40E_TX_DESC_CMD_RS)
+#define I40E_TXD_CMD (CI_TX_DESC_CMD_EOP | CI_TX_DESC_CMD_RS)
 
 #ifdef RTE_LIBRTE_IEEE1588
 #define I40E_TX_IEEE1588_TMST RTE_MBUF_F_TX_IEEE1588_TMST
@@ -260,7 +260,7 @@ i40e_rxd_build_fdir(volatile union ci_rx_desc *rxdp, struct rte_mbuf *mb)
 
 static inline void
 i40e_parse_tunneling_params(uint64_t ol_flags,
-			    union i40e_tx_offload tx_offload,
+			    union ci_tx_offload tx_offload,
 			    uint32_t *cd_tunneling)
 {
 	/* EIPT: External (outer) IP header type */
@@ -319,51 +319,51 @@ static inline void
 i40e_txd_enable_checksum(uint64_t ol_flags,
 			uint32_t *td_cmd,
 			uint32_t *td_offset,
-			union i40e_tx_offload tx_offload)
+			union ci_tx_offload tx_offload)
 {
 	/* Set MACLEN */
 	if (!(ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK))
 		*td_offset |= (tx_offload.l2_len >> 1)
-			<< I40E_TX_DESC_LENGTH_MACLEN_SHIFT;
+			<< CI_TX_DESC_LEN_MACLEN_S;
 
 	/* Enable L3 checksum offloads */
 	if (ol_flags & RTE_MBUF_F_TX_IP_CKSUM) {
-		*td_cmd |= I40E_TX_DESC_CMD_IIPT_IPV4_CSUM;
+		*td_cmd |= CI_TX_DESC_CMD_IIPT_IPV4_CSUM;
 		*td_offset |= (tx_offload.l3_len >> 2)
-				<< I40E_TX_DESC_LENGTH_IPLEN_SHIFT;
+				<< CI_TX_DESC_LEN_IPLEN_S;
 	} else if (ol_flags & RTE_MBUF_F_TX_IPV4) {
-		*td_cmd |= I40E_TX_DESC_CMD_IIPT_IPV4;
+		*td_cmd |= CI_TX_DESC_CMD_IIPT_IPV4;
 		*td_offset |= (tx_offload.l3_len >> 2)
-				<< I40E_TX_DESC_LENGTH_IPLEN_SHIFT;
+				<< CI_TX_DESC_LEN_IPLEN_S;
 	} else if (ol_flags & RTE_MBUF_F_TX_IPV6) {
-		*td_cmd |= I40E_TX_DESC_CMD_IIPT_IPV6;
+		*td_cmd |= CI_TX_DESC_CMD_IIPT_IPV6;
 		*td_offset |= (tx_offload.l3_len >> 2)
-				<< I40E_TX_DESC_LENGTH_IPLEN_SHIFT;
+				<< CI_TX_DESC_LEN_IPLEN_S;
 	}
 
 	if (ol_flags & RTE_MBUF_F_TX_TCP_SEG) {
-		*td_cmd |= I40E_TX_DESC_CMD_L4T_EOFT_TCP;
+		*td_cmd |= CI_TX_DESC_CMD_L4T_EOFT_TCP;
 		*td_offset |= (tx_offload.l4_len >> 2)
-			<< I40E_TX_DESC_LENGTH_L4_FC_LEN_SHIFT;
+			<< CI_TX_DESC_LEN_L4_LEN_S;
 		return;
 	}
 
 	/* Enable L4 checksum offloads */
 	switch (ol_flags & RTE_MBUF_F_TX_L4_MASK) {
 	case RTE_MBUF_F_TX_TCP_CKSUM:
-		*td_cmd |= I40E_TX_DESC_CMD_L4T_EOFT_TCP;
+		*td_cmd |= CI_TX_DESC_CMD_L4T_EOFT_TCP;
 		*td_offset |= (sizeof(struct rte_tcp_hdr) >> 2) <<
-				I40E_TX_DESC_LENGTH_L4_FC_LEN_SHIFT;
+				CI_TX_DESC_LEN_L4_LEN_S;
 		break;
 	case RTE_MBUF_F_TX_SCTP_CKSUM:
-		*td_cmd |= I40E_TX_DESC_CMD_L4T_EOFT_SCTP;
+		*td_cmd |= CI_TX_DESC_CMD_L4T_EOFT_SCTP;
 		*td_offset |= (sizeof(struct rte_sctp_hdr) >> 2) <<
-				I40E_TX_DESC_LENGTH_L4_FC_LEN_SHIFT;
+				CI_TX_DESC_LEN_L4_LEN_S;
 		break;
 	case RTE_MBUF_F_TX_UDP_CKSUM:
-		*td_cmd |= I40E_TX_DESC_CMD_L4T_EOFT_UDP;
+		*td_cmd |= CI_TX_DESC_CMD_L4T_EOFT_UDP;
 		*td_offset |= (sizeof(struct rte_udp_hdr) >> 2) <<
-				I40E_TX_DESC_LENGTH_L4_FC_LEN_SHIFT;
+				CI_TX_DESC_LEN_L4_LEN_S;
 		break;
 	default:
 		break;
@@ -377,11 +377,11 @@ i40e_build_ctob(uint32_t td_cmd,
 		unsigned int size,
 		uint32_t td_tag)
 {
-	return rte_cpu_to_le_64(I40E_TX_DESC_DTYPE_DATA |
-			((uint64_t)td_cmd  << I40E_TXD_QW1_CMD_SHIFT) |
-			((uint64_t)td_offset << I40E_TXD_QW1_OFFSET_SHIFT) |
-			((uint64_t)size  << I40E_TXD_QW1_TX_BUF_SZ_SHIFT) |
-			((uint64_t)td_tag  << I40E_TXD_QW1_L2TAG1_SHIFT));
+	return rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DATA |
+			((uint64_t)td_cmd  << CI_TXD_QW1_CMD_S) |
+			((uint64_t)td_offset << CI_TXD_QW1_OFFSET_S) |
+			((uint64_t)size  << CI_TXD_QW1_TX_BUF_SZ_S) |
+			((uint64_t)td_tag  << CI_TXD_QW1_L2TAG1_S));
 }
 
 static inline int
@@ -1004,7 +1004,7 @@ i40e_calc_context_desc(uint64_t flags)
 
 /* set i40e TSO context descriptor */
 static inline uint64_t
-i40e_set_tso_ctx(struct rte_mbuf *mbuf, union i40e_tx_offload tx_offload)
+i40e_set_tso_ctx(struct rte_mbuf *mbuf, union ci_tx_offload tx_offload)
 {
 	uint64_t ctx_desc = 0;
 	uint32_t cd_cmd, hdr_len, cd_tso_len;
@@ -1029,9 +1029,6 @@ i40e_set_tso_ctx(struct rte_mbuf *mbuf, union i40e_tx_offload tx_offload)
 	return ctx_desc;
 }
 
-/* HW requires that Tx buffer size ranges from 1B up to (16K-1)B. */
-#define I40E_MAX_DATA_PER_TXD \
-	(I40E_TXD_QW1_TX_BUF_SZ_MASK >> I40E_TXD_QW1_TX_BUF_SZ_SHIFT)
 /* Calculate the number of TX descriptors needed for each pkt */
 static inline uint16_t
 i40e_calc_pkt_desc(struct rte_mbuf *tx_pkt)
@@ -1040,7 +1037,7 @@ i40e_calc_pkt_desc(struct rte_mbuf *tx_pkt)
 	uint16_t count = 0;
 
 	while (txd != NULL) {
-		count += DIV_ROUND_UP(txd->data_len, I40E_MAX_DATA_PER_TXD);
+		count += DIV_ROUND_UP(txd->data_len, CI_MAX_DATA_PER_TXD);
 		txd = txd->next;
 	}
 
@@ -1069,7 +1066,7 @@ i40e_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 	uint16_t tx_last;
 	uint16_t slen;
 	uint64_t buf_dma_addr;
-	union i40e_tx_offload tx_offload = {0};
+	union ci_tx_offload tx_offload = {0};
 
 	txq = tx_queue;
 	sw_ring = txq->sw_ring;
@@ -1138,18 +1135,18 @@ i40e_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 
 		/* Descriptor based VLAN insertion */
 		if (ol_flags & (RTE_MBUF_F_TX_VLAN | RTE_MBUF_F_TX_QINQ)) {
-			td_cmd |= I40E_TX_DESC_CMD_IL2TAG1;
+			td_cmd |= CI_TX_DESC_CMD_IL2TAG1;
 			td_tag = tx_pkt->vlan_tci;
 		}
 
 		/* Always enable CRC offload insertion */
-		td_cmd |= I40E_TX_DESC_CMD_ICRC;
+		td_cmd |= CI_TX_DESC_CMD_ICRC;
 
 		/* Fill in tunneling parameters if necessary */
 		cd_tunneling_params = 0;
 		if (ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK) {
 			td_offset |= (tx_offload.outer_l2_len >> 1)
-					<< I40E_TX_DESC_LENGTH_MACLEN_SHIFT;
+					<< CI_TX_DESC_LEN_MACLEN_S;
 			i40e_parse_tunneling_params(ol_flags, tx_offload,
 						    &cd_tunneling_params);
 		}
@@ -1229,16 +1226,16 @@ i40e_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 			buf_dma_addr = rte_mbuf_data_iova(m_seg);
 
 			while ((ol_flags & RTE_MBUF_F_TX_TCP_SEG) &&
-				unlikely(slen > I40E_MAX_DATA_PER_TXD)) {
+				unlikely(slen > CI_MAX_DATA_PER_TXD)) {
 				txd->buffer_addr =
 					rte_cpu_to_le_64(buf_dma_addr);
 				txd->cmd_type_offset_bsz =
 					i40e_build_ctob(td_cmd,
-					td_offset, I40E_MAX_DATA_PER_TXD,
+					td_offset, CI_MAX_DATA_PER_TXD,
 					td_tag);
 
-				buf_dma_addr += I40E_MAX_DATA_PER_TXD;
-				slen -= I40E_MAX_DATA_PER_TXD;
+				buf_dma_addr += CI_MAX_DATA_PER_TXD;
+				slen -= CI_MAX_DATA_PER_TXD;
 
 				txe->last_id = tx_last;
 				tx_id = txe->next_id;
@@ -1265,7 +1262,7 @@ i40e_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 		} while (m_seg != NULL);
 
 		/* The last packet data descriptor needs End Of Packet (EOP) */
-		td_cmd |= I40E_TX_DESC_CMD_EOP;
+		td_cmd |= CI_TX_DESC_CMD_EOP;
 		txq->nb_tx_used = (uint16_t)(txq->nb_tx_used + nb_used);
 		txq->nb_tx_free = (uint16_t)(txq->nb_tx_free - nb_used);
 
@@ -1275,15 +1272,14 @@ i40e_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 				   "%4u (port=%d queue=%d)",
 				   tx_last, txq->port_id, txq->queue_id);
 
-			td_cmd |= I40E_TX_DESC_CMD_RS;
+			td_cmd |= CI_TX_DESC_CMD_RS;
 
 			/* Update txq RS bit counters */
 			txq->nb_tx_used = 0;
 		}
 
 		txd->cmd_type_offset_bsz |=
-			rte_cpu_to_le_64(((uint64_t)td_cmd) <<
-					I40E_TXD_QW1_CMD_SHIFT);
+			rte_cpu_to_le_64(((uint64_t)td_cmd) << CI_TXD_QW1_CMD_S);
 	}
 
 end_of_tx:
@@ -1309,8 +1305,8 @@ i40e_tx_free_bufs(struct ci_tx_queue *txq)
 	const uint16_t m = tx_rs_thresh % I40E_TX_MAX_FREE_BUF_SZ;
 
 	if ((txq->ci_tx_ring[txq->tx_next_dd].cmd_type_offset_bsz &
-			rte_cpu_to_le_64(I40E_TXD_QW1_DTYPE_MASK)) !=
-			rte_cpu_to_le_64(I40E_TX_DESC_DTYPE_DESC_DONE))
+			rte_cpu_to_le_64(CI_TXD_QW1_DTYPE_M)) !=
+			rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE))
 		return 0;
 
 	txep = &txq->sw_ring[txq->tx_next_dd - (tx_rs_thresh - 1)];
@@ -1441,8 +1437,7 @@ tx_xmit_pkts(struct ci_tx_queue *txq,
 		n = (uint16_t)(txq->nb_tx_desc - txq->tx_tail);
 		i40e_tx_fill_hw_ring(txq, tx_pkts, n);
 		txr[txq->tx_next_rs].cmd_type_offset_bsz |=
-			rte_cpu_to_le_64(((uint64_t)I40E_TX_DESC_CMD_RS) <<
-						I40E_TXD_QW1_CMD_SHIFT);
+			rte_cpu_to_le_64(((uint64_t)CI_TX_DESC_CMD_RS) << CI_TXD_QW1_CMD_S);
 		txq->tx_next_rs = (uint16_t)(txq->tx_rs_thresh - 1);
 		txq->tx_tail = 0;
 	}
@@ -1454,8 +1449,7 @@ tx_xmit_pkts(struct ci_tx_queue *txq,
 	/* Determine if RS bit needs to be set */
 	if (txq->tx_tail > txq->tx_next_rs) {
 		txr[txq->tx_next_rs].cmd_type_offset_bsz |=
-			rte_cpu_to_le_64(((uint64_t)I40E_TX_DESC_CMD_RS) <<
-						I40E_TXD_QW1_CMD_SHIFT);
+			rte_cpu_to_le_64(((uint64_t)CI_TX_DESC_CMD_RS) << CI_TXD_QW1_CMD_S);
 		txq->tx_next_rs =
 			(uint16_t)(txq->tx_next_rs + txq->tx_rs_thresh);
 		if (txq->tx_next_rs >= txq->nb_tx_desc)
@@ -2383,9 +2377,9 @@ i40e_dev_tx_descriptor_status(void *tx_queue, uint16_t offset)
 	}
 
 	status = &txq->ci_tx_ring[desc].cmd_type_offset_bsz;
-	mask = rte_le_to_cpu_64(I40E_TXD_QW1_DTYPE_MASK);
+	mask = rte_le_to_cpu_64(CI_TXD_QW1_DTYPE_M);
 	expect = rte_cpu_to_le_64(
-		I40E_TX_DESC_DTYPE_DESC_DONE << I40E_TXD_QW1_DTYPE_SHIFT);
+		CI_TX_DESC_DTYPE_DESC_DONE << CI_TXD_QW1_DTYPE_S);
 	if ((*status & mask) == expect)
 		return RTE_ETH_TX_DESC_DONE;
 
@@ -2883,7 +2877,7 @@ i40e_reset_tx_queue(struct ci_tx_queue *txq)
 		volatile struct ci_tx_desc *txd = &txq->ci_tx_ring[i];
 
 		txd->cmd_type_offset_bsz =
-			rte_cpu_to_le_64(I40E_TX_DESC_DTYPE_DESC_DONE);
+			rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE);
 		txe[i].mbuf =  NULL;
 		txe[i].last_id = i;
 		txe[prev].next_id = i;
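The hunks above replace the i40e-specific QW1 field shifts with common `CI_*` names in `i40e_build_ctob()`. A minimal standalone sketch of that quad-word packing, using shift values assumed from the i40e-style descriptor layout (the macro names and values here are illustrative, not taken from the actual common header):

```c
#include <stdint.h>

/* Assumed values for the common CI_* macros (i40e-style QW1 layout). */
#define CI_TX_DESC_DTYPE_DATA   0x0ULL
#define CI_TXD_QW1_CMD_S        4
#define CI_TXD_QW1_OFFSET_S     16
#define CI_TXD_QW1_TX_BUF_SZ_S  34
#define CI_TXD_QW1_L2TAG1_S     48

/* Host-endian sketch of i40e_build_ctob(): pack command, offset,
 * buffer size and L2 tag into the descriptor's second quad-word.
 * The real function additionally converts the result to
 * little-endian with rte_cpu_to_le_64(). */
static inline uint64_t
ci_build_ctob_sketch(uint32_t td_cmd, uint32_t td_offset,
		     unsigned int size, uint32_t td_tag)
{
	return CI_TX_DESC_DTYPE_DATA |
	       ((uint64_t)td_cmd << CI_TXD_QW1_CMD_S) |
	       ((uint64_t)td_offset << CI_TXD_QW1_OFFSET_S) |
	       ((uint64_t)size << CI_TXD_QW1_TX_BUF_SZ_S) |
	       ((uint64_t)td_tag << CI_TXD_QW1_L2TAG1_S);
}
```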
diff --git a/drivers/net/intel/i40e/i40e_rxtx.h b/drivers/net/intel/i40e/i40e_rxtx.h
index ed173d8f17..307ffa3049 100644
--- a/drivers/net/intel/i40e/i40e_rxtx.h
+++ b/drivers/net/intel/i40e/i40e_rxtx.h
@@ -47,8 +47,8 @@
 #define I40E_RX_DESC_EXT_STATUS_FLEXBL_MASK   0x03
 #define I40E_RX_DESC_EXT_STATUS_FLEXBL_FLEX   0x01
 
-#define I40E_TD_CMD (I40E_TX_DESC_CMD_ICRC |\
-		     I40E_TX_DESC_CMD_EOP)
+#define I40E_TD_CMD (CI_TX_DESC_CMD_ICRC |\
+		     CI_TX_DESC_CMD_EOP)
 
 enum i40e_header_split_mode {
 	i40e_header_split_none = 0,
@@ -110,19 +110,6 @@ enum i40e_header_split_mode {
 
 #define I40E_TX_VECTOR_OFFLOADS RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE
 
-/** Offload features */
-union i40e_tx_offload {
-	uint64_t data;
-	struct {
-		uint64_t l2_len:7; /**< L2 (MAC) Header Length. */
-		uint64_t l3_len:9; /**< L3 (IP) Header Length. */
-		uint64_t l4_len:8; /**< L4 Header Length. */
-		uint64_t tso_segsz:16; /**< TCP TSO segment size */
-		uint64_t outer_l2_len:8; /**< outer L2 Header Length */
-		uint64_t outer_l3_len:16; /**< outer L3 Header Length */
-	};
-};
-
 int i40e_dev_rx_queue_start(struct rte_eth_dev *dev, uint16_t rx_queue_id);
 int i40e_dev_rx_queue_stop(struct rte_eth_dev *dev, uint16_t rx_queue_id);
 int i40e_dev_tx_queue_start(struct rte_eth_dev *dev, uint16_t tx_queue_id);
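The hunk above deletes the driver-local `union i40e_tx_offload` in favour of the shared `union ci_tx_offload`. As a sketch of what the shared union looks like, with field widths copied from the removed i40e version (the actual common definition may differ):

```c
#include <stdint.h>

/* Sketch of the shared Tx offload union that replaces the per-driver
 * copies. Field widths are copied from the union i40e_tx_offload
 * removed above; the real common definition may differ. */
union ci_tx_offload_sketch {
	uint64_t data;
	struct {
		uint64_t l2_len:7;        /* L2 (MAC) header length */
		uint64_t l3_len:9;        /* L3 (IP) header length */
		uint64_t l4_len:8;        /* L4 header length */
		uint64_t tso_segsz:16;    /* TCP TSO segment size */
		uint64_t outer_l2_len:8;  /* outer L2 header length */
		uint64_t outer_l3_len:16; /* outer L3 header length */
	};
};
```

Keeping all fields inside one 64-bit word lets the Tx path copy or zero the whole set of offload lengths with a single assignment to `.data`.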
diff --git a/drivers/net/intel/i40e/i40e_rxtx_vec_altivec.c b/drivers/net/intel/i40e/i40e_rxtx_vec_altivec.c
index 81e9e2bc0b..4c36748d94 100644
--- a/drivers/net/intel/i40e/i40e_rxtx_vec_altivec.c
+++ b/drivers/net/intel/i40e/i40e_rxtx_vec_altivec.c
@@ -449,9 +449,9 @@ static inline void
 vtx1(volatile struct ci_tx_desc *txdp,
 	struct rte_mbuf *pkt, uint64_t flags)
 {
-	uint64_t high_qw = (I40E_TX_DESC_DTYPE_DATA |
-		((uint64_t)flags  << I40E_TXD_QW1_CMD_SHIFT) |
-		((uint64_t)pkt->data_len << I40E_TXD_QW1_TX_BUF_SZ_SHIFT));
+	uint64_t high_qw = (CI_TX_DESC_DTYPE_DATA |
+		((uint64_t)flags << CI_TXD_QW1_CMD_S) |
+		((uint64_t)pkt->data_len << CI_TXD_QW1_TX_BUF_SZ_S));
 
 	__vector unsigned long descriptor = (__vector unsigned long){
 		pkt->buf_iova + pkt->data_off, high_qw};
@@ -477,7 +477,7 @@ i40e_xmit_fixed_burst_vec(void *tx_queue, struct rte_mbuf **tx_pkts,
 	struct ci_tx_entry_vec *txep;
 	uint16_t n, nb_commit, tx_id;
 	uint64_t flags = I40E_TD_CMD;
-	uint64_t rs = I40E_TX_DESC_CMD_RS | I40E_TD_CMD;
+	uint64_t rs = CI_TX_DESC_CMD_RS | I40E_TD_CMD;
 	int i;
 
 	if (txq->nb_tx_free < txq->tx_free_thresh)
@@ -520,8 +520,7 @@ i40e_xmit_fixed_burst_vec(void *tx_queue, struct rte_mbuf **tx_pkts,
 	tx_id = (uint16_t)(tx_id + nb_commit);
 	if (tx_id > txq->tx_next_rs) {
 		txq->ci_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
-			rte_cpu_to_le_64(((uint64_t)I40E_TX_DESC_CMD_RS) <<
-						I40E_TXD_QW1_CMD_SHIFT);
+			rte_cpu_to_le_64(((uint64_t)CI_TX_DESC_CMD_RS) << CI_TXD_QW1_CMD_S);
 		txq->tx_next_rs =
 			(uint16_t)(txq->tx_next_rs + txq->tx_rs_thresh);
 	}
diff --git a/drivers/net/intel/i40e/i40e_rxtx_vec_avx2.c b/drivers/net/intel/i40e/i40e_rxtx_vec_avx2.c
index f054bd41bf..502a1842c6 100644
--- a/drivers/net/intel/i40e/i40e_rxtx_vec_avx2.c
+++ b/drivers/net/intel/i40e/i40e_rxtx_vec_avx2.c
@@ -684,9 +684,9 @@ static inline void
 vtx1(volatile struct ci_tx_desc *txdp,
 		struct rte_mbuf *pkt, uint64_t flags)
 {
-	uint64_t high_qw = (I40E_TX_DESC_DTYPE_DATA |
-			((uint64_t)flags  << I40E_TXD_QW1_CMD_SHIFT) |
-			((uint64_t)pkt->data_len << I40E_TXD_QW1_TX_BUF_SZ_SHIFT));
+	uint64_t high_qw = (CI_TX_DESC_DTYPE_DATA |
+			((uint64_t)flags << CI_TXD_QW1_CMD_S) |
+			((uint64_t)pkt->data_len << CI_TXD_QW1_TX_BUF_SZ_S));
 
 	__m128i descriptor = _mm_set_epi64x(high_qw,
 				pkt->buf_iova + pkt->data_off);
@@ -697,8 +697,7 @@ static inline void
 vtx(volatile struct ci_tx_desc *txdp,
 		struct rte_mbuf **pkt, uint16_t nb_pkts,  uint64_t flags)
 {
-	const uint64_t hi_qw_tmpl = (I40E_TX_DESC_DTYPE_DATA |
-			((uint64_t)flags  << I40E_TXD_QW1_CMD_SHIFT));
+	const uint64_t hi_qw_tmpl = (CI_TX_DESC_DTYPE_DATA | (flags << CI_TXD_QW1_CMD_S));
 
 	/* if unaligned on 32-bit boundary, do one to align */
 	if (((uintptr_t)txdp & 0x1F) != 0 && nb_pkts != 0) {
@@ -709,13 +708,13 @@ vtx(volatile struct ci_tx_desc *txdp,
 	/* do two at a time while possible, in bursts */
 	for (; nb_pkts > 3; txdp += 4, pkt += 4, nb_pkts -= 4) {
 		uint64_t hi_qw3 = hi_qw_tmpl |
-				((uint64_t)pkt[3]->data_len << I40E_TXD_QW1_TX_BUF_SZ_SHIFT);
+				((uint64_t)pkt[3]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
 		uint64_t hi_qw2 = hi_qw_tmpl |
-				((uint64_t)pkt[2]->data_len << I40E_TXD_QW1_TX_BUF_SZ_SHIFT);
+				((uint64_t)pkt[2]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
 		uint64_t hi_qw1 = hi_qw_tmpl |
-				((uint64_t)pkt[1]->data_len << I40E_TXD_QW1_TX_BUF_SZ_SHIFT);
+				((uint64_t)pkt[1]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
 		uint64_t hi_qw0 = hi_qw_tmpl |
-				((uint64_t)pkt[0]->data_len << I40E_TXD_QW1_TX_BUF_SZ_SHIFT);
+				((uint64_t)pkt[0]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
 
 		__m256i desc2_3 = _mm256_set_epi64x(
 				hi_qw3, pkt[3]->buf_iova + pkt[3]->data_off,
@@ -743,7 +742,7 @@ i40e_xmit_fixed_burst_vec_avx2(void *tx_queue, struct rte_mbuf **tx_pkts,
 	struct ci_tx_entry_vec *txep;
 	uint16_t n, nb_commit, tx_id;
 	uint64_t flags = I40E_TD_CMD;
-	uint64_t rs = I40E_TX_DESC_CMD_RS | I40E_TD_CMD;
+	uint64_t rs = CI_TX_DESC_CMD_RS | I40E_TD_CMD;
 
 	if (txq->nb_tx_free < txq->tx_free_thresh)
 		ci_tx_free_bufs_vec(txq, i40e_tx_desc_done, false);
@@ -785,8 +784,7 @@ i40e_xmit_fixed_burst_vec_avx2(void *tx_queue, struct rte_mbuf **tx_pkts,
 	tx_id = (uint16_t)(tx_id + nb_commit);
 	if (tx_id > txq->tx_next_rs) {
 		txq->ci_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
-			rte_cpu_to_le_64(((uint64_t)I40E_TX_DESC_CMD_RS) <<
-						I40E_TXD_QW1_CMD_SHIFT);
+			rte_cpu_to_le_64(((uint64_t)CI_TX_DESC_CMD_RS) << CI_TXD_QW1_CMD_S);
 		txq->tx_next_rs =
 			(uint16_t)(txq->tx_next_rs + txq->tx_rs_thresh);
 	}
diff --git a/drivers/net/intel/i40e/i40e_rxtx_vec_avx512.c b/drivers/net/intel/i40e/i40e_rxtx_vec_avx512.c
index 9a967faeee..d48ff9f51e 100644
--- a/drivers/net/intel/i40e/i40e_rxtx_vec_avx512.c
+++ b/drivers/net/intel/i40e/i40e_rxtx_vec_avx512.c
@@ -752,9 +752,9 @@ i40e_recv_scattered_pkts_vec_avx512(void *rx_queue,
 static inline void
 vtx1(volatile struct ci_tx_desc *txdp, struct rte_mbuf *pkt, uint64_t flags)
 {
-	uint64_t high_qw = (I40E_TX_DESC_DTYPE_DATA |
-		((uint64_t)flags  << I40E_TXD_QW1_CMD_SHIFT) |
-		((uint64_t)pkt->data_len << I40E_TXD_QW1_TX_BUF_SZ_SHIFT));
+	uint64_t high_qw = (CI_TX_DESC_DTYPE_DATA |
+		((uint64_t)flags << CI_TXD_QW1_CMD_S) |
+		((uint64_t)pkt->data_len << CI_TXD_QW1_TX_BUF_SZ_S));
 
 	__m128i descriptor = _mm_set_epi64x(high_qw,
 				pkt->buf_iova + pkt->data_off);
@@ -765,26 +765,17 @@ static inline void
 vtx(volatile struct ci_tx_desc *txdp,
 	struct rte_mbuf **pkt, uint16_t nb_pkts,  uint64_t flags)
 {
-	const uint64_t hi_qw_tmpl = (I40E_TX_DESC_DTYPE_DATA |
-			((uint64_t)flags  << I40E_TXD_QW1_CMD_SHIFT));
+	const uint64_t hi_qw_tmpl = (CI_TX_DESC_DTYPE_DATA | (flags << CI_TXD_QW1_CMD_S));
 
 	for (; nb_pkts > 3; txdp += 4, pkt += 4, nb_pkts -= 4) {
-		uint64_t hi_qw3 =
-			hi_qw_tmpl |
-			((uint64_t)pkt[3]->data_len <<
-			 I40E_TXD_QW1_TX_BUF_SZ_SHIFT);
-		uint64_t hi_qw2 =
-			hi_qw_tmpl |
-			((uint64_t)pkt[2]->data_len <<
-			 I40E_TXD_QW1_TX_BUF_SZ_SHIFT);
-		uint64_t hi_qw1 =
-			hi_qw_tmpl |
-			((uint64_t)pkt[1]->data_len <<
-			 I40E_TXD_QW1_TX_BUF_SZ_SHIFT);
-		uint64_t hi_qw0 =
-			hi_qw_tmpl |
-			((uint64_t)pkt[0]->data_len <<
-			 I40E_TXD_QW1_TX_BUF_SZ_SHIFT);
+		uint64_t hi_qw3 = hi_qw_tmpl |
+			((uint64_t)pkt[3]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
+		uint64_t hi_qw2 = hi_qw_tmpl |
+			((uint64_t)pkt[2]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
+		uint64_t hi_qw1 = hi_qw_tmpl |
+			((uint64_t)pkt[1]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
+		uint64_t hi_qw0 = hi_qw_tmpl |
+			((uint64_t)pkt[0]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
 
 		__m512i desc0_3 =
 			_mm512_set_epi64
@@ -811,7 +802,7 @@ i40e_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pkts,
 	struct ci_tx_entry_vec *txep;
 	uint16_t n, nb_commit, tx_id;
 	uint64_t flags = I40E_TD_CMD;
-	uint64_t rs = I40E_TX_DESC_CMD_RS | I40E_TD_CMD;
+	uint64_t rs = CI_TX_DESC_CMD_RS | I40E_TD_CMD;
 
 	if (txq->nb_tx_free < txq->tx_free_thresh)
 		ci_tx_free_bufs_vec(txq, i40e_tx_desc_done, false);
@@ -854,8 +845,7 @@ i40e_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pkts,
 	tx_id = (uint16_t)(tx_id + nb_commit);
 	if (tx_id > txq->tx_next_rs) {
 		txq->ci_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
-			rte_cpu_to_le_64(((uint64_t)I40E_TX_DESC_CMD_RS) <<
-						I40E_TXD_QW1_CMD_SHIFT);
+			rte_cpu_to_le_64(((uint64_t)CI_TX_DESC_CMD_RS) << CI_TXD_QW1_CMD_S);
 		txq->tx_next_rs =
 			(uint16_t)(txq->tx_next_rs + txq->tx_rs_thresh);
 	}
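The `vtx()` loops in the vector paths above hoist the flag/dtype part of QW1 out of the loop: only the buffer size differs per packet, so each descriptor's high quad-word is the template OR'd with `data_len` shifted into the buffer-size field. A scalar sketch of that template reuse (macro values assumed, as in the other sketches):

```c
#include <stdint.h>

#define CI_TX_DESC_DTYPE_DATA  0x0ULL /* assumed value */
#define CI_TXD_QW1_CMD_S       4      /* assumed value */
#define CI_TXD_QW1_TX_BUF_SZ_S 34     /* assumed value */

/* Scalar equivalent of the vtx() template trick: build the constant
 * part of QW1 once, then per packet OR in only the buffer size. */
static inline void
fill_high_qwords(uint64_t flags, const uint16_t *data_len,
		 uint64_t *hi_qw, int n)
{
	const uint64_t tmpl = CI_TX_DESC_DTYPE_DATA |
			      (flags << CI_TXD_QW1_CMD_S);

	for (int i = 0; i < n; i++)
		hi_qw[i] = tmpl |
			((uint64_t)data_len[i] << CI_TXD_QW1_TX_BUF_SZ_S);
}
```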
diff --git a/drivers/net/intel/i40e/i40e_rxtx_vec_common.h b/drivers/net/intel/i40e/i40e_rxtx_vec_common.h
index 1fd7fc75bf..292a39501e 100644
--- a/drivers/net/intel/i40e/i40e_rxtx_vec_common.h
+++ b/drivers/net/intel/i40e/i40e_rxtx_vec_common.h
@@ -16,8 +16,8 @@ static inline int
 i40e_tx_desc_done(struct ci_tx_queue *txq, uint16_t idx)
 {
 	return (txq->ci_tx_ring[idx].cmd_type_offset_bsz &
-			rte_cpu_to_le_64(I40E_TXD_QW1_DTYPE_MASK)) ==
-				rte_cpu_to_le_64(I40E_TX_DESC_DTYPE_DESC_DONE);
+			rte_cpu_to_le_64(CI_TXD_QW1_DTYPE_M)) ==
+				rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE);
 }
 
 static inline void
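The `i40e_tx_desc_done()` change above tests QW1's 4-bit DTYPE field against the DESC_DONE pattern using the common macros. A sketch of that check, assuming DTYPE occupies the low 4 bits of QW1 and the hardware writes dtype 0xF once a descriptor has been processed (endianness conversion omitted):

```c
#include <stdint.h>
#include <stdbool.h>

/* Assumed values: DTYPE is the low 4 bits of QW1 and the hardware
 * writes dtype 0xF when the descriptor has been processed. */
#define CI_TXD_QW1_DTYPE_M         0xFULL
#define CI_TX_DESC_DTYPE_DESC_DONE 0xFULL

/* A descriptor counts as done when its DTYPE bits read back as
 * DESC_DONE, regardless of the other QW1 fields. */
static inline bool
ci_tx_desc_done_sketch(uint64_t cmd_type_offset_bsz)
{
	return (cmd_type_offset_bsz & CI_TXD_QW1_DTYPE_M) ==
	       CI_TX_DESC_DTYPE_DESC_DONE;
}
```

The same test drives both the buffer-free threshold logic and `tx_descriptor_status`, which is why the patch consolidates the mask and done-pattern into one pair of macros.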
diff --git a/drivers/net/intel/i40e/i40e_rxtx_vec_neon.c b/drivers/net/intel/i40e/i40e_rxtx_vec_neon.c
index 0b95152232..be4c64942e 100644
--- a/drivers/net/intel/i40e/i40e_rxtx_vec_neon.c
+++ b/drivers/net/intel/i40e/i40e_rxtx_vec_neon.c
@@ -600,9 +600,9 @@ static inline void
 vtx1(volatile struct ci_tx_desc *txdp,
 		struct rte_mbuf *pkt, uint64_t flags)
 {
-	uint64_t high_qw = (I40E_TX_DESC_DTYPE_DATA |
-			((uint64_t)flags  << I40E_TXD_QW1_CMD_SHIFT) |
-			((uint64_t)pkt->data_len << I40E_TXD_QW1_TX_BUF_SZ_SHIFT));
+	uint64_t high_qw = (CI_TX_DESC_DTYPE_DATA |
+			((uint64_t)flags << CI_TXD_QW1_CMD_S) |
+			((uint64_t)pkt->data_len << CI_TXD_QW1_TX_BUF_SZ_S));
 
 	uint64x2_t descriptor = {pkt->buf_iova + pkt->data_off, high_qw};
 	vst1q_u64(RTE_CAST_PTR(uint64_t *, txdp), descriptor);
@@ -627,7 +627,7 @@ i40e_xmit_fixed_burst_vec(void *__rte_restrict tx_queue,
 	struct ci_tx_entry_vec *txep;
 	uint16_t n, nb_commit, tx_id;
 	uint64_t flags = I40E_TD_CMD;
-	uint64_t rs = I40E_TX_DESC_CMD_RS | I40E_TD_CMD;
+	uint64_t rs = CI_TX_DESC_CMD_RS | I40E_TD_CMD;
 	int i;
 
 	if (txq->nb_tx_free < txq->tx_free_thresh)
@@ -669,8 +669,7 @@ i40e_xmit_fixed_burst_vec(void *__rte_restrict tx_queue,
 	tx_id = (uint16_t)(tx_id + nb_commit);
 	if (tx_id > txq->tx_next_rs) {
 		txq->ci_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
-			rte_cpu_to_le_64(((uint64_t)I40E_TX_DESC_CMD_RS) <<
-						I40E_TXD_QW1_CMD_SHIFT);
+			rte_cpu_to_le_64(((uint64_t)CI_TX_DESC_CMD_RS) << CI_TXD_QW1_CMD_S);
 		txq->tx_next_rs =
 			(uint16_t)(txq->tx_next_rs + txq->tx_rs_thresh);
 	}
diff --git a/drivers/net/intel/iavf/iavf_rxtx.c b/drivers/net/intel/iavf/iavf_rxtx.c
index 560abfc1ef..947b6c24d2 100644
--- a/drivers/net/intel/iavf/iavf_rxtx.c
+++ b/drivers/net/intel/iavf/iavf_rxtx.c
@@ -274,7 +274,7 @@ reset_tx_queue(struct ci_tx_queue *txq)
 	prev = (uint16_t)(txq->nb_tx_desc - 1);
 	for (i = 0; i < txq->nb_tx_desc; i++) {
 		txq->ci_tx_ring[i].cmd_type_offset_bsz =
-			rte_cpu_to_le_64(IAVF_TX_DESC_DTYPE_DESC_DONE);
+			rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE);
 		txe[i].mbuf =  NULL;
 		txe[i].last_id = i;
 		txe[prev].next_id = i;
@@ -2351,12 +2351,12 @@ iavf_fill_ctx_desc_cmd_field(volatile uint64_t *field, struct rte_mbuf *m,
 
 	/* TSO enabled */
 	if (m->ol_flags & (RTE_MBUF_F_TX_TCP_SEG | RTE_MBUF_F_TX_UDP_SEG))
-		cmd = IAVF_TX_CTX_DESC_TSO << IAVF_TXD_CTX_QW1_CMD_SHIFT;
+		cmd = CI_TX_CTX_DESC_TSO << IAVF_TXD_CTX_QW1_CMD_SHIFT;
 
 	if ((m->ol_flags & RTE_MBUF_F_TX_VLAN &&
 			vlan_flag & IAVF_TX_FLAGS_VLAN_TAG_LOC_L2TAG2) ||
 			m->ol_flags & RTE_MBUF_F_TX_QINQ) {
-		cmd |= IAVF_TX_CTX_DESC_IL2TAG2
+		cmd |= CI_TX_CTX_DESC_IL2TAG2
 			<< IAVF_TXD_CTX_QW1_CMD_SHIFT;
 	}
 
@@ -2577,20 +2577,20 @@ iavf_build_data_desc_cmd_offset_fields(volatile uint64_t *qw1,
 	uint64_t offset = 0;
 	uint64_t l2tag1 = 0;
 
-	*qw1 = IAVF_TX_DESC_DTYPE_DATA;
+	*qw1 = CI_TX_DESC_DTYPE_DATA;
 
-	command = (uint64_t)IAVF_TX_DESC_CMD_ICRC;
+	command = (uint64_t)CI_TX_DESC_CMD_ICRC;
 
 	/* Descriptor based VLAN insertion */
 	if ((vlan_flag & IAVF_TX_FLAGS_VLAN_TAG_LOC_L2TAG1) &&
 			m->ol_flags & RTE_MBUF_F_TX_VLAN) {
-		command |= (uint64_t)IAVF_TX_DESC_CMD_IL2TAG1;
+		command |= (uint64_t)CI_TX_DESC_CMD_IL2TAG1;
 		l2tag1 |= m->vlan_tci;
 	}
 
 	/* Descriptor based QinQ insertion. vlan_flag specifies outer tag location. */
 	if (m->ol_flags & RTE_MBUF_F_TX_QINQ) {
-		command |= (uint64_t)IAVF_TX_DESC_CMD_IL2TAG1;
+		command |= (uint64_t)CI_TX_DESC_CMD_IL2TAG1;
 		l2tag1 = vlan_flag & IAVF_TX_FLAGS_VLAN_TAG_LOC_L2TAG1 ? m->vlan_tci_outer :
 									m->vlan_tci;
 	}
@@ -2603,32 +2603,32 @@ iavf_build_data_desc_cmd_offset_fields(volatile uint64_t *qw1,
 	if (m->ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK &&
 			!(m->ol_flags & RTE_MBUF_F_TX_SEC_OFFLOAD))
 		offset |= (m->outer_l2_len >> 1)
-			<< IAVF_TX_DESC_LENGTH_MACLEN_SHIFT;
+			<< CI_TX_DESC_LEN_MACLEN_S;
 	else
 		offset |= (m->l2_len >> 1)
-			<< IAVF_TX_DESC_LENGTH_MACLEN_SHIFT;
+			<< CI_TX_DESC_LEN_MACLEN_S;
 
 	/* Enable L3 checksum offloading inner */
 	if (m->ol_flags & RTE_MBUF_F_TX_IP_CKSUM) {
 		if (m->ol_flags & RTE_MBUF_F_TX_IPV4) {
-			command |= IAVF_TX_DESC_CMD_IIPT_IPV4_CSUM;
-			offset |= (m->l3_len >> 2) << IAVF_TX_DESC_LENGTH_IPLEN_SHIFT;
+			command |= CI_TX_DESC_CMD_IIPT_IPV4_CSUM;
+			offset |= (m->l3_len >> 2) << CI_TX_DESC_LEN_IPLEN_S;
 		}
 	} else if (m->ol_flags & RTE_MBUF_F_TX_IPV4) {
-		command |= IAVF_TX_DESC_CMD_IIPT_IPV4;
-		offset |= (m->l3_len >> 2) << IAVF_TX_DESC_LENGTH_IPLEN_SHIFT;
+		command |= CI_TX_DESC_CMD_IIPT_IPV4;
+		offset |= (m->l3_len >> 2) << CI_TX_DESC_LEN_IPLEN_S;
 	} else if (m->ol_flags & RTE_MBUF_F_TX_IPV6) {
-		command |= IAVF_TX_DESC_CMD_IIPT_IPV6;
-		offset |= (m->l3_len >> 2) << IAVF_TX_DESC_LENGTH_IPLEN_SHIFT;
+		command |= CI_TX_DESC_CMD_IIPT_IPV6;
+		offset |= (m->l3_len >> 2) << CI_TX_DESC_LEN_IPLEN_S;
 	}
 
 	if (m->ol_flags & (RTE_MBUF_F_TX_TCP_SEG | RTE_MBUF_F_TX_UDP_SEG)) {
 		if (m->ol_flags & RTE_MBUF_F_TX_TCP_SEG)
-			command |= IAVF_TX_DESC_CMD_L4T_EOFT_TCP;
+			command |= CI_TX_DESC_CMD_L4T_EOFT_TCP;
 		else
-			command |= IAVF_TX_DESC_CMD_L4T_EOFT_UDP;
+			command |= CI_TX_DESC_CMD_L4T_EOFT_UDP;
 		offset |= (m->l4_len >> 2) <<
-			      IAVF_TX_DESC_LENGTH_L4_FC_LEN_SHIFT;
+			      CI_TX_DESC_LEN_L4_LEN_S;
 
 		*qw1 = rte_cpu_to_le_64((((uint64_t)command <<
 			IAVF_TXD_DATA_QW1_CMD_SHIFT) & IAVF_TXD_DATA_QW1_CMD_MASK) |
@@ -2642,19 +2642,19 @@ iavf_build_data_desc_cmd_offset_fields(volatile uint64_t *qw1,
 	/* Enable L4 checksum offloads */
 	switch (m->ol_flags & RTE_MBUF_F_TX_L4_MASK) {
 	case RTE_MBUF_F_TX_TCP_CKSUM:
-		command |= IAVF_TX_DESC_CMD_L4T_EOFT_TCP;
+		command |= CI_TX_DESC_CMD_L4T_EOFT_TCP;
 		offset |= (sizeof(struct rte_tcp_hdr) >> 2) <<
-				IAVF_TX_DESC_LENGTH_L4_FC_LEN_SHIFT;
+				CI_TX_DESC_LEN_L4_LEN_S;
 		break;
 	case RTE_MBUF_F_TX_SCTP_CKSUM:
-		command |= IAVF_TX_DESC_CMD_L4T_EOFT_SCTP;
+		command |= CI_TX_DESC_CMD_L4T_EOFT_SCTP;
 		offset |= (sizeof(struct rte_sctp_hdr) >> 2) <<
-				IAVF_TX_DESC_LENGTH_L4_FC_LEN_SHIFT;
+				CI_TX_DESC_LEN_L4_LEN_S;
 		break;
 	case RTE_MBUF_F_TX_UDP_CKSUM:
-		command |= IAVF_TX_DESC_CMD_L4T_EOFT_UDP;
+		command |= CI_TX_DESC_CMD_L4T_EOFT_UDP;
 		offset |= (sizeof(struct rte_udp_hdr) >> 2) <<
-				IAVF_TX_DESC_LENGTH_L4_FC_LEN_SHIFT;
+				CI_TX_DESC_LEN_L4_LEN_S;
 		break;
 	}
 
@@ -2674,8 +2674,7 @@ iavf_calc_pkt_desc(struct rte_mbuf *tx_pkt)
 	uint16_t count = 0;
 
 	while (txd != NULL) {
-		count += (txd->data_len + IAVF_MAX_DATA_PER_TXD - 1) /
-			IAVF_MAX_DATA_PER_TXD;
+		count += (txd->data_len + CI_MAX_DATA_PER_TXD - 1) / CI_MAX_DATA_PER_TXD;
 		txd = txd->next;
 	}
 
@@ -2881,14 +2880,14 @@ iavf_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 			buf_dma_addr = rte_mbuf_data_iova(mb_seg);
 			while ((mb_seg->ol_flags & (RTE_MBUF_F_TX_TCP_SEG |
 					RTE_MBUF_F_TX_UDP_SEG)) &&
-					unlikely(slen > IAVF_MAX_DATA_PER_TXD)) {
+					unlikely(slen > CI_MAX_DATA_PER_TXD)) {
 				iavf_fill_data_desc(ddesc, ddesc_template,
-					IAVF_MAX_DATA_PER_TXD, buf_dma_addr);
+					CI_MAX_DATA_PER_TXD, buf_dma_addr);
 
 				IAVF_DUMP_TX_DESC(txq, ddesc, desc_idx);
 
-				buf_dma_addr += IAVF_MAX_DATA_PER_TXD;
-				slen -= IAVF_MAX_DATA_PER_TXD;
+				buf_dma_addr += CI_MAX_DATA_PER_TXD;
+				slen -= CI_MAX_DATA_PER_TXD;
 
 				txe->last_id = desc_idx_last;
 				desc_idx = txe->next_id;
@@ -2909,7 +2908,7 @@ iavf_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 		} while (mb_seg);
 
 		/* The last packet data descriptor needs End Of Packet (EOP) */
-		ddesc_cmd = IAVF_TX_DESC_CMD_EOP;
+		ddesc_cmd = CI_TX_DESC_CMD_EOP;
 
 		txq->nb_tx_used = (uint16_t)(txq->nb_tx_used + nb_desc_required);
 		txq->nb_tx_free = (uint16_t)(txq->nb_tx_free - nb_desc_required);
@@ -2919,7 +2918,7 @@ iavf_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 				   "%4u (port=%d queue=%d)",
 				   desc_idx_last, txq->port_id, txq->queue_id);
 
-			ddesc_cmd |= IAVF_TX_DESC_CMD_RS;
+			ddesc_cmd |= CI_TX_DESC_CMD_RS;
 
 			/* Update txq RS bit counters */
 			txq->nb_tx_used = 0;
@@ -4423,9 +4422,8 @@ iavf_dev_tx_desc_status(void *tx_queue, uint16_t offset)
 	}
 
 	status = &txq->ci_tx_ring[desc].cmd_type_offset_bsz;
-	mask = rte_le_to_cpu_64(IAVF_TXD_QW1_DTYPE_MASK);
-	expect = rte_cpu_to_le_64(
-		 IAVF_TX_DESC_DTYPE_DESC_DONE << IAVF_TXD_QW1_DTYPE_SHIFT);
+	mask = rte_le_to_cpu_64(CI_TXD_QW1_DTYPE_M);
+	expect = rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE << CI_TXD_QW1_DTYPE_S);
 	if ((*status & mask) == expect)
 		return RTE_ETH_TX_DESC_DONE;
 
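The iavf checksum path above encodes the three header lengths into the descriptor offset field: MACLEN in 2-byte words, IPLEN and L4LEN in 4-byte words. A sketch of that packing, with sub-field bit positions assumed from the i40e/iavf-style layout:

```c
#include <stdint.h>

/* Assumed bit positions of the three length sub-fields within the
 * descriptor offset field (i40e/iavf-style layout). */
#define CI_TX_DESC_LEN_MACLEN_S 0  /* MAC length, in 2-byte words */
#define CI_TX_DESC_LEN_IPLEN_S  7  /* IP length, in 4-byte words  */
#define CI_TX_DESC_LEN_L4_LEN_S 14 /* L4 length, in 4-byte words  */

/* Sketch of how the checksum path converts byte lengths into the
 * word units the hardware expects and packs them into one field. */
static inline uint64_t
ci_pack_hdr_lens(uint16_t l2_len, uint16_t l3_len, uint16_t l4_len)
{
	return ((uint64_t)(l2_len >> 1) << CI_TX_DESC_LEN_MACLEN_S) |
	       ((uint64_t)(l3_len >> 2) << CI_TX_DESC_LEN_IPLEN_S)  |
	       ((uint64_t)(l4_len >> 2) << CI_TX_DESC_LEN_L4_LEN_S);
}
```

For a plain TCP/IPv4 frame (14-byte Ethernet, 20-byte IP, 20-byte TCP) this yields MACLEN=7, IPLEN=5, L4LEN=5, matching the `>> 1` and `>> 2` conversions in the hunks above.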
diff --git a/drivers/net/intel/iavf/iavf_rxtx.h b/drivers/net/intel/iavf/iavf_rxtx.h
index dd6d884fc1..395d97b4ee 100644
--- a/drivers/net/intel/iavf/iavf_rxtx.h
+++ b/drivers/net/intel/iavf/iavf_rxtx.h
@@ -162,10 +162,6 @@
 #define IAVF_TX_OFFLOAD_NOTSUP_MASK \
 		(RTE_MBUF_F_TX_OFFLOAD_MASK ^ IAVF_TX_OFFLOAD_MASK)
 
-/* HW requires that TX buffer size ranges from 1B up to (16K-1)B. */
-#define IAVF_MAX_DATA_PER_TXD \
-	(IAVF_TXD_QW1_TX_BUF_SZ_MASK >> IAVF_TXD_QW1_TX_BUF_SZ_SHIFT)
-
 #define IAVF_TX_LLDP_DYNFIELD "intel_pmd_dynfield_tx_lldp"
 #define IAVF_CHECK_TX_LLDP(m) \
 	((rte_pmd_iavf_tx_lldp_dynfield_offset > 0) && \
@@ -195,18 +191,6 @@ struct iavf_rx_queue_stats {
 	struct iavf_ipsec_crypto_stats ipsec_crypto;
 };
 
-/* Offload features */
-union iavf_tx_offload {
-	uint64_t data;
-	struct {
-		uint64_t l2_len:7; /* L2 (MAC) Header Length. */
-		uint64_t l3_len:9; /* L3 (IP) Header Length. */
-		uint64_t l4_len:8; /* L4 Header Length. */
-		uint64_t tso_segsz:16; /* TCP TSO segment size */
-		/* uint64_t unused : 24; */
-	};
-};
-
 /* Rx Flex Descriptor
  * RxDID Profile ID 16-21
  * Flex-field 0: RSS hash lower 16-bits
@@ -409,7 +393,7 @@ enum iavf_rx_flex_desc_ipsec_crypto_status {
 
 
 #define IAVF_TXD_DATA_QW1_DTYPE_SHIFT	(0)
-#define IAVF_TXD_DATA_QW1_DTYPE_MASK	(0xFUL << IAVF_TXD_QW1_DTYPE_SHIFT)
+#define IAVF_TXD_DATA_QW1_DTYPE_MASK	(0xFUL << CI_TXD_QW1_DTYPE_S)
 
 #define IAVF_TXD_DATA_QW1_CMD_SHIFT	(4)
 #define IAVF_TXD_DATA_QW1_CMD_MASK	(0x3FFUL << IAVF_TXD_DATA_QW1_CMD_SHIFT)
@@ -686,7 +670,7 @@ void iavf_dump_tx_descriptor(const struct ci_tx_queue *txq,
 		rte_le_to_cpu_64(tx_desc->cmd_type_offset_bsz &
 			rte_cpu_to_le_64(IAVF_TXD_DATA_QW1_DTYPE_MASK));
 	switch (type) {
-	case IAVF_TX_DESC_DTYPE_DATA:
+	case CI_TX_DESC_DTYPE_DATA:
 		name = "Tx_data_desc";
 		break;
 	case IAVF_TX_DESC_DTYPE_CONTEXT:
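The hunks above drop the per-driver `IAVF_MAX_DATA_PER_TXD` in favour of the common `CI_MAX_DATA_PER_TXD`: the 14-bit buffer-size field caps one data descriptor at (16K-1) bytes, so oversized TSO segments are split across descriptors and `iavf_calc_pkt_desc()` rounds up per mbuf segment. A sketch of that count, assuming the 0x3FFF limit and replacing the mbuf chain with a plain array of segment lengths:

```c
#include <stdint.h>

/* Assumed common limit: the buffer-size field is 14 bits wide, so
 * one data descriptor carries at most 16K-1 bytes. */
#define CI_MAX_DATA_PER_TXD 0x3FFFu

#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Each segment needs ceil(data_len / (16K-1)) data descriptors;
 * the per-packet total is the sum over the chain. */
static inline uint16_t
ci_calc_pkt_desc_sketch(const uint16_t *seg_len, int nb_segs)
{
	uint16_t count = 0;

	for (int i = 0; i < nb_segs; i++)
		count += DIV_ROUND_UP(seg_len[i], CI_MAX_DATA_PER_TXD);
	return count;
}
```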
diff --git a/drivers/net/intel/iavf/iavf_rxtx_vec_avx2.c b/drivers/net/intel/iavf/iavf_rxtx_vec_avx2.c
index 89ce841b9e..cea4ee9863 100644
--- a/drivers/net/intel/iavf/iavf_rxtx_vec_avx2.c
+++ b/drivers/net/intel/iavf/iavf_rxtx_vec_avx2.c
@@ -1633,10 +1633,9 @@ static __rte_always_inline void
 iavf_vtx1(volatile struct ci_tx_desc *txdp,
 	  struct rte_mbuf *pkt, uint64_t flags, bool offload, uint8_t vlan_flag)
 {
-	uint64_t high_qw =
-		(IAVF_TX_DESC_DTYPE_DATA |
-		 ((uint64_t)flags  << IAVF_TXD_QW1_CMD_SHIFT) |
-		 ((uint64_t)pkt->data_len << IAVF_TXD_QW1_TX_BUF_SZ_SHIFT));
+	uint64_t high_qw = (CI_TX_DESC_DTYPE_DATA |
+		 ((uint64_t)flags << CI_TXD_QW1_CMD_S) |
+		 ((uint64_t)pkt->data_len << CI_TXD_QW1_TX_BUF_SZ_S));
 	if (offload)
 		iavf_txd_enable_offload(pkt, &high_qw, vlan_flag);
 
@@ -1649,8 +1648,7 @@ static __rte_always_inline void
 iavf_vtx(volatile struct ci_tx_desc *txdp,
 	 struct rte_mbuf **pkt, uint16_t nb_pkts,  uint64_t flags, bool offload, uint8_t vlan_flag)
 {
-	const uint64_t hi_qw_tmpl = (IAVF_TX_DESC_DTYPE_DATA |
-			((uint64_t)flags  << IAVF_TXD_QW1_CMD_SHIFT));
+	const uint64_t hi_qw_tmpl = (CI_TX_DESC_DTYPE_DATA | (flags << CI_TXD_QW1_CMD_S));
 
 	/* if unaligned on 32-bit boundary, do one to align */
 	if (((uintptr_t)txdp & 0x1F) != 0 && nb_pkts != 0) {
@@ -1660,28 +1658,20 @@ iavf_vtx(volatile struct ci_tx_desc *txdp,
 
 	/* do two at a time while possible, in bursts */
 	for (; nb_pkts > 3; txdp += 4, pkt += 4, nb_pkts -= 4) {
-		uint64_t hi_qw3 =
-			hi_qw_tmpl |
-			((uint64_t)pkt[3]->data_len <<
-			 IAVF_TXD_QW1_TX_BUF_SZ_SHIFT);
+		uint64_t hi_qw3 = hi_qw_tmpl |
+			((uint64_t)pkt[3]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
 		if (offload)
 			iavf_txd_enable_offload(pkt[3], &hi_qw3, vlan_flag);
-		uint64_t hi_qw2 =
-			hi_qw_tmpl |
-			((uint64_t)pkt[2]->data_len <<
-			 IAVF_TXD_QW1_TX_BUF_SZ_SHIFT);
+		uint64_t hi_qw2 = hi_qw_tmpl |
+			((uint64_t)pkt[2]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
 		if (offload)
 			iavf_txd_enable_offload(pkt[2], &hi_qw2, vlan_flag);
-		uint64_t hi_qw1 =
-			hi_qw_tmpl |
-			((uint64_t)pkt[1]->data_len <<
-			 IAVF_TXD_QW1_TX_BUF_SZ_SHIFT);
+		uint64_t hi_qw1 = hi_qw_tmpl |
+			((uint64_t)pkt[1]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
 		if (offload)
 			iavf_txd_enable_offload(pkt[1], &hi_qw1, vlan_flag);
-		uint64_t hi_qw0 =
-			hi_qw_tmpl |
-			((uint64_t)pkt[0]->data_len <<
-			 IAVF_TXD_QW1_TX_BUF_SZ_SHIFT);
+		uint64_t hi_qw0 = hi_qw_tmpl |
+			((uint64_t)pkt[0]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
 		if (offload)
 			iavf_txd_enable_offload(pkt[0], &hi_qw0, vlan_flag);
 
@@ -1717,8 +1707,8 @@ iavf_xmit_fixed_burst_vec_avx2(void *tx_queue, struct rte_mbuf **tx_pkts,
 	struct ci_tx_entry_vec *txep;
 	uint16_t n, nb_commit, tx_id;
 	/* bit2 is reserved and must be set to 1 according to Spec */
-	uint64_t flags = IAVF_TX_DESC_CMD_EOP | IAVF_TX_DESC_CMD_ICRC;
-	uint64_t rs = IAVF_TX_DESC_CMD_RS | flags;
+	uint64_t flags = CI_TX_DESC_CMD_EOP | CI_TX_DESC_CMD_ICRC;
+	uint64_t rs = CI_TX_DESC_CMD_RS | flags;
 
 	if (txq->nb_tx_free < txq->tx_free_thresh)
 		ci_tx_free_bufs_vec(txq, iavf_tx_desc_done, false);
@@ -1761,8 +1751,7 @@ iavf_xmit_fixed_burst_vec_avx2(void *tx_queue, struct rte_mbuf **tx_pkts,
 	tx_id = (uint16_t)(tx_id + nb_commit);
 	if (tx_id > txq->tx_next_rs) {
 		txq->ci_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
-			rte_cpu_to_le_64(((uint64_t)IAVF_TX_DESC_CMD_RS) <<
-					 IAVF_TXD_QW1_CMD_SHIFT);
+			rte_cpu_to_le_64(((uint64_t)CI_TX_DESC_CMD_RS) << CI_TXD_QW1_CMD_S);
 		txq->tx_next_rs =
 			(uint16_t)(txq->tx_next_rs + txq->tx_rs_thresh);
 	}
diff --git a/drivers/net/intel/iavf/iavf_rxtx_vec_avx512.c b/drivers/net/intel/iavf/iavf_rxtx_vec_avx512.c
index ad1b0b90cd..01477fd501 100644
--- a/drivers/net/intel/iavf/iavf_rxtx_vec_avx512.c
+++ b/drivers/net/intel/iavf/iavf_rxtx_vec_avx512.c
@@ -1844,10 +1844,9 @@ iavf_vtx1(volatile struct ci_tx_desc *txdp,
 	  struct rte_mbuf *pkt, uint64_t flags,
 	  bool offload, uint8_t vlan_flag)
 {
-	uint64_t high_qw =
-		(IAVF_TX_DESC_DTYPE_DATA |
-		 ((uint64_t)flags  << IAVF_TXD_QW1_CMD_SHIFT) |
-		 ((uint64_t)pkt->data_len << IAVF_TXD_QW1_TX_BUF_SZ_SHIFT));
+	uint64_t high_qw = (CI_TX_DESC_DTYPE_DATA |
+		 ((uint64_t)flags << CI_TXD_QW1_CMD_S) |
+		 ((uint64_t)pkt->data_len << CI_TXD_QW1_TX_BUF_SZ_S));
 	if (offload)
 		iavf_txd_enable_offload(pkt, &high_qw, vlan_flag);
 
@@ -1863,8 +1862,7 @@ iavf_vtx(volatile struct ci_tx_desc *txdp,
 		struct rte_mbuf **pkt, uint16_t nb_pkts,  uint64_t flags,
 		bool offload, uint8_t vlan_flag)
 {
-	const uint64_t hi_qw_tmpl = (IAVF_TX_DESC_DTYPE_DATA |
-			((uint64_t)flags  << IAVF_TXD_QW1_CMD_SHIFT));
+	const uint64_t hi_qw_tmpl = (CI_TX_DESC_DTYPE_DATA | (flags << CI_TXD_QW1_CMD_S));
 
 	/* if unaligned on 32-bit boundary, do one to align */
 	if (((uintptr_t)txdp & 0x1F) != 0 && nb_pkts != 0) {
@@ -1874,22 +1872,14 @@ iavf_vtx(volatile struct ci_tx_desc *txdp,
 
 	/* do 4 at a time while possible, in bursts */
 	for (; nb_pkts > 3; txdp += 4, pkt += 4, nb_pkts -= 4) {
-		uint64_t hi_qw3 =
-			hi_qw_tmpl |
-			((uint64_t)pkt[3]->data_len <<
-			 IAVF_TXD_QW1_TX_BUF_SZ_SHIFT);
-		uint64_t hi_qw2 =
-			hi_qw_tmpl |
-			((uint64_t)pkt[2]->data_len <<
-			 IAVF_TXD_QW1_TX_BUF_SZ_SHIFT);
-		uint64_t hi_qw1 =
-			hi_qw_tmpl |
-			((uint64_t)pkt[1]->data_len <<
-			 IAVF_TXD_QW1_TX_BUF_SZ_SHIFT);
-		uint64_t hi_qw0 =
-			hi_qw_tmpl |
-			((uint64_t)pkt[0]->data_len <<
-			 IAVF_TXD_QW1_TX_BUF_SZ_SHIFT);
+		uint64_t hi_qw3 = hi_qw_tmpl |
+			((uint64_t)pkt[3]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
+		uint64_t hi_qw2 = hi_qw_tmpl |
+			((uint64_t)pkt[2]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
+		uint64_t hi_qw1 = hi_qw_tmpl |
+			((uint64_t)pkt[1]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
+		uint64_t hi_qw0 = hi_qw_tmpl |
+			((uint64_t)pkt[0]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
 		if (offload) {
 			iavf_txd_enable_offload(pkt[3], &hi_qw3, vlan_flag);
 			iavf_txd_enable_offload(pkt[2], &hi_qw2, vlan_flag);
@@ -2093,9 +2083,9 @@ ctx_vtx1(volatile struct ci_tx_desc *txdp, struct rte_mbuf *pkt,
 	if (IAVF_CHECK_TX_LLDP(pkt))
 		high_ctx_qw |= IAVF_TX_CTX_DESC_SWTCH_UPLINK
 			<< IAVF_TXD_CTX_QW1_CMD_SHIFT;
-	uint64_t high_data_qw = (IAVF_TX_DESC_DTYPE_DATA |
-				((uint64_t)flags  << IAVF_TXD_QW1_CMD_SHIFT) |
-				((uint64_t)pkt->data_len << IAVF_TXD_QW1_TX_BUF_SZ_SHIFT));
+	uint64_t high_data_qw = (CI_TX_DESC_DTYPE_DATA |
+				((uint64_t)flags << CI_TXD_QW1_CMD_S) |
+				((uint64_t)pkt->data_len << CI_TXD_QW1_TX_BUF_SZ_S));
 	if (offload)
 		iavf_txd_enable_offload(pkt, &high_data_qw, vlan_flag);
 
@@ -2110,8 +2100,7 @@ ctx_vtx(volatile struct ci_tx_desc *txdp,
 		struct rte_mbuf **pkt, uint16_t nb_pkts,  uint64_t flags,
 		bool offload, uint8_t vlan_flag)
 {
-	uint64_t hi_data_qw_tmpl = (IAVF_TX_DESC_DTYPE_DATA |
-					((uint64_t)flags  << IAVF_TXD_QW1_CMD_SHIFT));
+	uint64_t hi_data_qw_tmpl = (CI_TX_DESC_DTYPE_DATA | (flags << CI_TXD_QW1_CMD_S));
 
 	/* if unaligned on 32-bit boundary, do one to align */
 	if (((uintptr_t)txdp & 0x1F) != 0 && nb_pkts != 0) {
@@ -2128,11 +2117,9 @@ ctx_vtx(volatile struct ci_tx_desc *txdp,
 		uint64_t hi_data_qw0 = 0;
 
 		hi_data_qw1 = hi_data_qw_tmpl |
-				((uint64_t)pkt[1]->data_len <<
-					IAVF_TXD_QW1_TX_BUF_SZ_SHIFT);
+				((uint64_t)pkt[1]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
 		hi_data_qw0 = hi_data_qw_tmpl |
-				((uint64_t)pkt[0]->data_len <<
-					IAVF_TXD_QW1_TX_BUF_SZ_SHIFT);
+				((uint64_t)pkt[0]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
 
 #ifdef IAVF_TX_VLAN_QINQ_OFFLOAD
 		if (offload) {
@@ -2140,13 +2127,11 @@ ctx_vtx(volatile struct ci_tx_desc *txdp,
 				uint64_t qinq_tag = vlan_flag & IAVF_TX_FLAGS_VLAN_TAG_LOC_L2TAG2 ?
 					(uint64_t)pkt[1]->vlan_tci :
 					(uint64_t)pkt[1]->vlan_tci_outer;
-				hi_ctx_qw1 |= IAVF_TX_CTX_DESC_IL2TAG2 <<
-						IAVF_TXD_CTX_QW1_CMD_SHIFT;
+				hi_ctx_qw1 |= CI_TX_CTX_DESC_IL2TAG2 << CI_TXD_QW1_CMD_S;
 				low_ctx_qw1 |= qinq_tag << IAVF_TXD_CTX_QW0_L2TAG2_PARAM;
 			} else if (pkt[1]->ol_flags & RTE_MBUF_F_TX_VLAN &&
 					vlan_flag & IAVF_TX_FLAGS_VLAN_TAG_LOC_L2TAG2) {
-				hi_ctx_qw1 |=
-					IAVF_TX_CTX_DESC_IL2TAG2 << IAVF_TXD_CTX_QW1_CMD_SHIFT;
+				hi_ctx_qw1 |= IAVF_TX_CTX_DESC_IL2TAG2 << CI_TXD_QW1_CMD_S;
 				low_ctx_qw1 |=
 					(uint64_t)pkt[1]->vlan_tci << IAVF_TXD_CTX_QW0_L2TAG2_PARAM;
 			}
@@ -2154,7 +2139,7 @@ ctx_vtx(volatile struct ci_tx_desc *txdp,
 #endif
 		if (IAVF_CHECK_TX_LLDP(pkt[1]))
 			hi_ctx_qw1 |= IAVF_TX_CTX_DESC_SWTCH_UPLINK
-				<< IAVF_TXD_CTX_QW1_CMD_SHIFT;
+				<< CI_TXD_QW1_CMD_S;
 
 #ifdef IAVF_TX_VLAN_QINQ_OFFLOAD
 		if (offload) {
@@ -2162,21 +2147,18 @@ ctx_vtx(volatile struct ci_tx_desc *txdp,
 				uint64_t qinq_tag = vlan_flag & IAVF_TX_FLAGS_VLAN_TAG_LOC_L2TAG2 ?
 					(uint64_t)pkt[0]->vlan_tci :
 					(uint64_t)pkt[0]->vlan_tci_outer;
-				hi_ctx_qw0 |= IAVF_TX_CTX_DESC_IL2TAG2 <<
-						IAVF_TXD_CTX_QW1_CMD_SHIFT;
+				hi_ctx_qw0 |= IAVF_TX_CTX_DESC_IL2TAG2 << CI_TXD_QW1_CMD_S;
 				low_ctx_qw0 |= qinq_tag << IAVF_TXD_CTX_QW0_L2TAG2_PARAM;
 			} else if (pkt[0]->ol_flags & RTE_MBUF_F_TX_VLAN &&
 					vlan_flag & IAVF_TX_FLAGS_VLAN_TAG_LOC_L2TAG2) {
-				hi_ctx_qw0 |=
-					IAVF_TX_CTX_DESC_IL2TAG2 << IAVF_TXD_CTX_QW1_CMD_SHIFT;
+				hi_ctx_qw0 |= IAVF_TX_CTX_DESC_IL2TAG2 << CI_TXD_QW1_CMD_S;
 				low_ctx_qw0 |=
 					(uint64_t)pkt[0]->vlan_tci << IAVF_TXD_CTX_QW0_L2TAG2_PARAM;
 			}
 		}
 #endif
 		if (IAVF_CHECK_TX_LLDP(pkt[0]))
-			hi_ctx_qw0 |= IAVF_TX_CTX_DESC_SWTCH_UPLINK
-				<< IAVF_TXD_CTX_QW1_CMD_SHIFT;
+			hi_ctx_qw0 |= IAVF_TX_CTX_DESC_SWTCH_UPLINK << CI_TXD_QW1_CMD_S;
 
 		if (offload) {
 			iavf_txd_enable_offload(pkt[1], &hi_data_qw1, vlan_flag);
@@ -2207,8 +2189,8 @@ iavf_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pkts,
 	struct ci_tx_entry_vec *txep;
 	uint16_t n, nb_commit, tx_id;
 	/* bit2 is reserved and must be set to 1 according to Spec */
-	uint64_t flags = IAVF_TX_DESC_CMD_EOP | IAVF_TX_DESC_CMD_ICRC;
-	uint64_t rs = IAVF_TX_DESC_CMD_RS | flags;
+	uint64_t flags = CI_TX_DESC_CMD_EOP | CI_TX_DESC_CMD_ICRC;
+	uint64_t rs = CI_TX_DESC_CMD_RS | flags;
 
 	if (txq->nb_tx_free < txq->tx_free_thresh)
 		ci_tx_free_bufs_vec(txq, iavf_tx_desc_done, false);
@@ -2253,8 +2235,7 @@ iavf_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pkts,
 	tx_id = (uint16_t)(tx_id + nb_commit);
 	if (tx_id > txq->tx_next_rs) {
 		txq->ci_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
-			rte_cpu_to_le_64(((uint64_t)IAVF_TX_DESC_CMD_RS) <<
-					 IAVF_TXD_QW1_CMD_SHIFT);
+			rte_cpu_to_le_64(((uint64_t)CI_TX_DESC_CMD_RS) << CI_TXD_QW1_CMD_S);
 		txq->tx_next_rs =
 			(uint16_t)(txq->tx_next_rs + txq->tx_rs_thresh);
 	}
@@ -2275,8 +2256,8 @@ iavf_xmit_fixed_burst_vec_avx512_ctx(void *tx_queue, struct rte_mbuf **tx_pkts,
 	struct ci_tx_entry_vec *txep;
 	uint16_t n, nb_commit, nb_mbuf, tx_id;
 	/* bit2 is reserved and must be set to 1 according to Spec */
-	uint64_t flags = IAVF_TX_DESC_CMD_EOP | IAVF_TX_DESC_CMD_ICRC;
-	uint64_t rs = IAVF_TX_DESC_CMD_RS | flags;
+	uint64_t flags = CI_TX_DESC_CMD_EOP | CI_TX_DESC_CMD_ICRC;
+	uint64_t rs = CI_TX_DESC_CMD_RS | flags;
 
 	if (txq->nb_tx_free < txq->tx_free_thresh)
 		ci_tx_free_bufs_vec(txq, iavf_tx_desc_done, true);
@@ -2321,8 +2302,7 @@ iavf_xmit_fixed_burst_vec_avx512_ctx(void *tx_queue, struct rte_mbuf **tx_pkts,
 
 	if (tx_id > txq->tx_next_rs) {
 		txq->ci_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
-			rte_cpu_to_le_64(((uint64_t)IAVF_TX_DESC_CMD_RS) <<
-					 IAVF_TXD_QW1_CMD_SHIFT);
+			rte_cpu_to_le_64(((uint64_t)CI_TX_DESC_CMD_RS) << CI_TXD_QW1_CMD_S);
 		txq->tx_next_rs =
 			(uint16_t)(txq->tx_next_rs + txq->tx_rs_thresh);
 	}
diff --git a/drivers/net/intel/iavf/iavf_rxtx_vec_common.h b/drivers/net/intel/iavf/iavf_rxtx_vec_common.h
index 1832b76f89..1538a44892 100644
--- a/drivers/net/intel/iavf/iavf_rxtx_vec_common.h
+++ b/drivers/net/intel/iavf/iavf_rxtx_vec_common.h
@@ -15,8 +15,8 @@ static inline int
 iavf_tx_desc_done(struct ci_tx_queue *txq, uint16_t idx)
 {
 	return (txq->ci_tx_ring[idx].cmd_type_offset_bsz &
-			rte_cpu_to_le_64(IAVF_TXD_QW1_DTYPE_MASK)) ==
-				rte_cpu_to_le_64(IAVF_TX_DESC_DTYPE_DESC_DONE);
+			rte_cpu_to_le_64(CI_TXD_QW1_DTYPE_M)) ==
+				rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE);
 }
 
 static inline void
@@ -147,26 +147,26 @@ iavf_txd_enable_offload(__rte_unused struct rte_mbuf *tx_pkt,
 	/* Set MACLEN */
 	if (ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK)
 		td_offset |= (tx_pkt->outer_l2_len >> 1)
-			<< IAVF_TX_DESC_LENGTH_MACLEN_SHIFT;
+			<< CI_TX_DESC_LEN_MACLEN_S;
 	else
 		td_offset |= (tx_pkt->l2_len >> 1)
-			<< IAVF_TX_DESC_LENGTH_MACLEN_SHIFT;
+			<< CI_TX_DESC_LEN_MACLEN_S;
 
 	/* Enable L3 checksum offloads */
 	if (ol_flags & RTE_MBUF_F_TX_IP_CKSUM) {
 		if (ol_flags & RTE_MBUF_F_TX_IPV4) {
-			td_cmd |= IAVF_TX_DESC_CMD_IIPT_IPV4_CSUM;
+			td_cmd |= CI_TX_DESC_CMD_IIPT_IPV4_CSUM;
 			td_offset |= (tx_pkt->l3_len >> 2) <<
-				     IAVF_TX_DESC_LENGTH_IPLEN_SHIFT;
+				     CI_TX_DESC_LEN_IPLEN_S;
 		}
 	} else if (ol_flags & RTE_MBUF_F_TX_IPV4) {
-		td_cmd |= IAVF_TX_DESC_CMD_IIPT_IPV4;
+		td_cmd |= CI_TX_DESC_CMD_IIPT_IPV4;
 		td_offset |= (tx_pkt->l3_len >> 2) <<
-			     IAVF_TX_DESC_LENGTH_IPLEN_SHIFT;
+			     CI_TX_DESC_LEN_IPLEN_S;
 	} else if (ol_flags & RTE_MBUF_F_TX_IPV6) {
-		td_cmd |= IAVF_TX_DESC_CMD_IIPT_IPV6;
+		td_cmd |= CI_TX_DESC_CMD_IIPT_IPV6;
 		td_offset |= (tx_pkt->l3_len >> 2) <<
-			     IAVF_TX_DESC_LENGTH_IPLEN_SHIFT;
+			     CI_TX_DESC_LEN_IPLEN_S;
 	}
 
 	/* Enable L4 checksum offloads */
@@ -190,7 +190,7 @@ iavf_txd_enable_offload(__rte_unused struct rte_mbuf *tx_pkt,
 		break;
 	}
 
-	*txd_hi |= ((uint64_t)td_offset) << IAVF_TXD_QW1_OFFSET_SHIFT;
+	*txd_hi |= ((uint64_t)td_offset) << CI_TXD_QW1_OFFSET_S;
 #endif
 
 #ifdef IAVF_TX_VLAN_QINQ_OFFLOAD
@@ -198,17 +198,15 @@ iavf_txd_enable_offload(__rte_unused struct rte_mbuf *tx_pkt,
 		td_cmd |= IAVF_TX_DESC_CMD_IL2TAG1;
 		/* vlan_flag specifies outer tag location for QinQ. */
 		if (vlan_flag & IAVF_TX_FLAGS_VLAN_TAG_LOC_L2TAG1)
-			*txd_hi |= ((uint64_t)tx_pkt->vlan_tci_outer <<
-					IAVF_TXD_QW1_L2TAG1_SHIFT);
+			*txd_hi |= ((uint64_t)tx_pkt->vlan_tci_outer << CI_TXD_QW1_L2TAG1_S);
 		else
-			*txd_hi |= ((uint64_t)tx_pkt->vlan_tci <<
-					IAVF_TXD_QW1_L2TAG1_SHIFT);
+			*txd_hi |= ((uint64_t)tx_pkt->vlan_tci << CI_TXD_QW1_L2TAG1_S);
 	} else if (ol_flags & RTE_MBUF_F_TX_VLAN && vlan_flag & IAVF_TX_FLAGS_VLAN_TAG_LOC_L2TAG1) {
-		td_cmd |= IAVF_TX_DESC_CMD_IL2TAG1;
-		*txd_hi |= ((uint64_t)tx_pkt->vlan_tci << IAVF_TXD_QW1_L2TAG1_SHIFT);
+		td_cmd |= CI_TX_DESC_CMD_IL2TAG1;
+		*txd_hi |= ((uint64_t)tx_pkt->vlan_tci << CI_TXD_QW1_L2TAG1_S);
 	}
 #endif
 
-	*txd_hi |= ((uint64_t)td_cmd) << IAVF_TXD_QW1_CMD_SHIFT;
+	*txd_hi |= ((uint64_t)td_cmd) << CI_TXD_QW1_CMD_S;
 }
 #endif
diff --git a/drivers/net/intel/ice/ice_dcf_ethdev.c b/drivers/net/intel/ice/ice_dcf_ethdev.c
index 5f537b4c12..4ceecc15c6 100644
--- a/drivers/net/intel/ice/ice_dcf_ethdev.c
+++ b/drivers/net/intel/ice/ice_dcf_ethdev.c
@@ -406,7 +406,7 @@ reset_tx_queue(struct ci_tx_queue *txq)
 	prev = (uint16_t)(txq->nb_tx_desc - 1);
 	for (i = 0; i < txq->nb_tx_desc; i++) {
 		txq->ci_tx_ring[i].cmd_type_offset_bsz =
-			rte_cpu_to_le_64(IAVF_TX_DESC_DTYPE_DESC_DONE);
+			rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE);
 		txe[i].mbuf =  NULL;
 		txe[i].last_id = i;
 		txe[prev].next_id = i;
diff --git a/drivers/net/intel/ice/ice_rxtx.c b/drivers/net/intel/ice/ice_rxtx.c
index 7a33e1e980..52bbf95967 100644
--- a/drivers/net/intel/ice/ice_rxtx.c
+++ b/drivers/net/intel/ice/ice_rxtx.c
@@ -1124,7 +1124,7 @@ ice_reset_tx_queue(struct ci_tx_queue *txq)
 		volatile struct ci_tx_desc *txd = &txq->ci_tx_ring[i];
 
 		txd->cmd_type_offset_bsz =
-			rte_cpu_to_le_64(ICE_TX_DESC_DTYPE_DESC_DONE);
+			rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE);
 		txe[i].mbuf =  NULL;
 		txe[i].last_id = i;
 		txe[prev].next_id = i;
@@ -2556,9 +2556,8 @@ ice_tx_descriptor_status(void *tx_queue, uint16_t offset)
 	}
 
 	status = &txq->ci_tx_ring[desc].cmd_type_offset_bsz;
-	mask = rte_cpu_to_le_64(ICE_TXD_QW1_DTYPE_M);
-	expect = rte_cpu_to_le_64(ICE_TX_DESC_DTYPE_DESC_DONE <<
-				  ICE_TXD_QW1_DTYPE_S);
+	mask = rte_cpu_to_le_64(CI_TXD_QW1_DTYPE_M);
+	expect = rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE << CI_TXD_QW1_DTYPE_S);
 	if ((*status & mask) == expect)
 		return RTE_ETH_TX_DESC_DONE;
 
@@ -2904,7 +2903,7 @@ ice_recv_pkts(void *rx_queue,
 
 static inline void
 ice_parse_tunneling_params(uint64_t ol_flags,
-			    union ice_tx_offload tx_offload,
+			    union ci_tx_offload tx_offload,
 			    uint32_t *cd_tunneling)
 {
 	/* EIPT: External (outer) IP header type */
@@ -2965,58 +2964,58 @@ static inline void
 ice_txd_enable_checksum(uint64_t ol_flags,
 			uint32_t *td_cmd,
 			uint32_t *td_offset,
-			union ice_tx_offload tx_offload)
+			union ci_tx_offload tx_offload)
 {
 	/* Set MACLEN */
 	if (!(ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK))
 		*td_offset |= (tx_offload.l2_len >> 1)
-			<< ICE_TX_DESC_LEN_MACLEN_S;
+			<< CI_TX_DESC_LEN_MACLEN_S;
 
 	/* Enable L3 checksum offloads */
 	if (ol_flags & RTE_MBUF_F_TX_IP_CKSUM) {
-		*td_cmd |= ICE_TX_DESC_CMD_IIPT_IPV4_CSUM;
+		*td_cmd |= CI_TX_DESC_CMD_IIPT_IPV4_CSUM;
 		*td_offset |= (tx_offload.l3_len >> 2) <<
-			ICE_TX_DESC_LEN_IPLEN_S;
+			CI_TX_DESC_LEN_IPLEN_S;
 	} else if (ol_flags & RTE_MBUF_F_TX_IPV4) {
-		*td_cmd |= ICE_TX_DESC_CMD_IIPT_IPV4;
+		*td_cmd |= CI_TX_DESC_CMD_IIPT_IPV4;
 		*td_offset |= (tx_offload.l3_len >> 2) <<
-			ICE_TX_DESC_LEN_IPLEN_S;
+			CI_TX_DESC_LEN_IPLEN_S;
 	} else if (ol_flags & RTE_MBUF_F_TX_IPV6) {
-		*td_cmd |= ICE_TX_DESC_CMD_IIPT_IPV6;
+		*td_cmd |= CI_TX_DESC_CMD_IIPT_IPV6;
 		*td_offset |= (tx_offload.l3_len >> 2) <<
-			ICE_TX_DESC_LEN_IPLEN_S;
+			CI_TX_DESC_LEN_IPLEN_S;
 	}
 
 	if (ol_flags & RTE_MBUF_F_TX_TCP_SEG) {
-		*td_cmd |= ICE_TX_DESC_CMD_L4T_EOFT_TCP;
+		*td_cmd |= CI_TX_DESC_CMD_L4T_EOFT_TCP;
 		*td_offset |= (tx_offload.l4_len >> 2) <<
-			      ICE_TX_DESC_LEN_L4_LEN_S;
+			      CI_TX_DESC_LEN_L4_LEN_S;
 		return;
 	}
 
 	if (ol_flags & RTE_MBUF_F_TX_UDP_SEG) {
-		*td_cmd |= ICE_TX_DESC_CMD_L4T_EOFT_UDP;
+		*td_cmd |= CI_TX_DESC_CMD_L4T_EOFT_UDP;
 		*td_offset |= (tx_offload.l4_len >> 2) <<
-			      ICE_TX_DESC_LEN_L4_LEN_S;
+			      CI_TX_DESC_LEN_L4_LEN_S;
 		return;
 	}
 
 	/* Enable L4 checksum offloads */
 	switch (ol_flags & RTE_MBUF_F_TX_L4_MASK) {
 	case RTE_MBUF_F_TX_TCP_CKSUM:
-		*td_cmd |= ICE_TX_DESC_CMD_L4T_EOFT_TCP;
+		*td_cmd |= CI_TX_DESC_CMD_L4T_EOFT_TCP;
 		*td_offset |= (sizeof(struct rte_tcp_hdr) >> 2) <<
-			      ICE_TX_DESC_LEN_L4_LEN_S;
+			      CI_TX_DESC_LEN_L4_LEN_S;
 		break;
 	case RTE_MBUF_F_TX_SCTP_CKSUM:
-		*td_cmd |= ICE_TX_DESC_CMD_L4T_EOFT_SCTP;
+		*td_cmd |= CI_TX_DESC_CMD_L4T_EOFT_SCTP;
 		*td_offset |= (sizeof(struct rte_sctp_hdr) >> 2) <<
-			      ICE_TX_DESC_LEN_L4_LEN_S;
+			      CI_TX_DESC_LEN_L4_LEN_S;
 		break;
 	case RTE_MBUF_F_TX_UDP_CKSUM:
-		*td_cmd |= ICE_TX_DESC_CMD_L4T_EOFT_UDP;
+		*td_cmd |= CI_TX_DESC_CMD_L4T_EOFT_UDP;
 		*td_offset |= (sizeof(struct rte_udp_hdr) >> 2) <<
-			      ICE_TX_DESC_LEN_L4_LEN_S;
+			      CI_TX_DESC_LEN_L4_LEN_S;
 		break;
 	default:
 		break;
@@ -3030,11 +3029,11 @@ ice_build_ctob(uint32_t td_cmd,
 	       uint16_t size,
 	       uint32_t td_tag)
 {
-	return rte_cpu_to_le_64(ICE_TX_DESC_DTYPE_DATA |
-				((uint64_t)td_cmd << ICE_TXD_QW1_CMD_S) |
-				((uint64_t)td_offset << ICE_TXD_QW1_OFFSET_S) |
-				((uint64_t)size << ICE_TXD_QW1_TX_BUF_SZ_S) |
-				((uint64_t)td_tag << ICE_TXD_QW1_L2TAG1_S));
+	return rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DATA |
+				((uint64_t)td_cmd << CI_TXD_QW1_CMD_S) |
+				((uint64_t)td_offset << CI_TXD_QW1_OFFSET_S) |
+				((uint64_t)size << CI_TXD_QW1_TX_BUF_SZ_S) |
+				((uint64_t)td_tag << CI_TXD_QW1_L2TAG1_S));
 }
 
 /* Check if the context descriptor is needed for TX offloading */
@@ -3053,7 +3052,7 @@ ice_calc_context_desc(uint64_t flags)
 
 /* set ice TSO context descriptor */
 static inline uint64_t
-ice_set_tso_ctx(struct rte_mbuf *mbuf, union ice_tx_offload tx_offload)
+ice_set_tso_ctx(struct rte_mbuf *mbuf, union ci_tx_offload tx_offload)
 {
 	uint64_t ctx_desc = 0;
 	uint32_t cd_cmd, hdr_len, cd_tso_len;
@@ -3067,18 +3066,15 @@ ice_set_tso_ctx(struct rte_mbuf *mbuf, union ice_tx_offload tx_offload)
 	hdr_len += (mbuf->ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK) ?
 		   tx_offload.outer_l2_len + tx_offload.outer_l3_len : 0;
 
-	cd_cmd = ICE_TX_CTX_DESC_TSO;
+	cd_cmd = CI_TX_CTX_DESC_TSO;
 	cd_tso_len = mbuf->pkt_len - hdr_len;
-	ctx_desc |= ((uint64_t)cd_cmd << ICE_TXD_CTX_QW1_CMD_S) |
+	ctx_desc |= ((uint64_t)cd_cmd << CI_TXD_QW1_CMD_S) |
 		    ((uint64_t)cd_tso_len << ICE_TXD_CTX_QW1_TSO_LEN_S) |
 		    ((uint64_t)mbuf->tso_segsz << ICE_TXD_CTX_QW1_MSS_S);
 
 	return ctx_desc;
 }
 
-/* HW requires that TX buffer size ranges from 1B up to (16K-1)B. */
-#define ICE_MAX_DATA_PER_TXD \
-	(ICE_TXD_QW1_TX_BUF_SZ_M >> ICE_TXD_QW1_TX_BUF_SZ_S)
 /* Calculate the number of TX descriptors needed for each pkt */
 static inline uint16_t
 ice_calc_pkt_desc(struct rte_mbuf *tx_pkt)
@@ -3087,7 +3083,7 @@ ice_calc_pkt_desc(struct rte_mbuf *tx_pkt)
 	uint16_t count = 0;
 
 	while (txd != NULL) {
-		count += DIV_ROUND_UP(txd->data_len, ICE_MAX_DATA_PER_TXD);
+		count += DIV_ROUND_UP(txd->data_len, CI_MAX_DATA_PER_TXD);
 		txd = txd->next;
 	}
 
@@ -3117,7 +3113,7 @@ ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 	uint16_t slen;
 	uint64_t buf_dma_addr;
 	uint64_t ol_flags;
-	union ice_tx_offload tx_offload = {0};
+	union ci_tx_offload tx_offload = {0};
 
 	txq = tx_queue;
 	sw_ring = txq->sw_ring;
@@ -3185,7 +3181,7 @@ ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 
 		/* Descriptor based VLAN insertion */
 		if (ol_flags & (RTE_MBUF_F_TX_VLAN | RTE_MBUF_F_TX_QINQ)) {
-			td_cmd |= ICE_TX_DESC_CMD_IL2TAG1;
+			td_cmd |= CI_TX_DESC_CMD_IL2TAG1;
 			td_tag = tx_pkt->vlan_tci;
 		}
 
@@ -3193,7 +3189,7 @@ ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 		cd_tunneling_params = 0;
 		if (ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK) {
 			td_offset |= (tx_offload.outer_l2_len >> 1)
-				<< ICE_TX_DESC_LEN_MACLEN_S;
+				<< CI_TX_DESC_LEN_MACLEN_S;
 			ice_parse_tunneling_params(ol_flags, tx_offload,
 						   &cd_tunneling_params);
 		}
@@ -3223,8 +3219,8 @@ ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 					ice_set_tso_ctx(tx_pkt, tx_offload);
 			else if (ol_flags & RTE_MBUF_F_TX_IEEE1588_TMST)
 				cd_type_cmd_tso_mss |=
-					((uint64_t)ICE_TX_CTX_DESC_TSYN <<
-					ICE_TXD_CTX_QW1_CMD_S) |
+					((uint64_t)CI_TX_CTX_DESC_TSYN <<
+					CI_TXD_QW1_CMD_S) |
 					 (((uint64_t)txq->ice_vsi->adapter->ptp_tx_index <<
 					 ICE_TXD_CTX_QW1_TSYN_S) & ICE_TXD_CTX_QW1_TSYN_M);
 
@@ -3235,8 +3231,8 @@ ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 			if (ol_flags & RTE_MBUF_F_TX_QINQ) {
 				cd_l2tag2 = tx_pkt->vlan_tci_outer;
 				cd_type_cmd_tso_mss |=
-					((uint64_t)ICE_TX_CTX_DESC_IL2TAG2 <<
-					 ICE_TXD_CTX_QW1_CMD_S);
+					((uint64_t)CI_TX_CTX_DESC_IL2TAG2 <<
+					 CI_TXD_QW1_CMD_S);
 			}
 			ctx_txd->l2tag2 = rte_cpu_to_le_16(cd_l2tag2);
 			ctx_txd->qw1 =
@@ -3261,18 +3257,16 @@ ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 			buf_dma_addr = rte_mbuf_data_iova(m_seg);
 
 			while ((ol_flags & (RTE_MBUF_F_TX_TCP_SEG | RTE_MBUF_F_TX_UDP_SEG)) &&
-				unlikely(slen > ICE_MAX_DATA_PER_TXD)) {
+					unlikely(slen > CI_MAX_DATA_PER_TXD)) {
 				txd->buffer_addr = rte_cpu_to_le_64(buf_dma_addr);
-				txd->cmd_type_offset_bsz =
-				rte_cpu_to_le_64(ICE_TX_DESC_DTYPE_DATA |
-				((uint64_t)td_cmd << ICE_TXD_QW1_CMD_S) |
-				((uint64_t)td_offset << ICE_TXD_QW1_OFFSET_S) |
-				((uint64_t)ICE_MAX_DATA_PER_TXD <<
-				 ICE_TXD_QW1_TX_BUF_SZ_S) |
-				((uint64_t)td_tag << ICE_TXD_QW1_L2TAG1_S));
+				txd->cmd_type_offset_bsz = rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DATA |
+					((uint64_t)td_cmd << CI_TXD_QW1_CMD_S) |
+					((uint64_t)td_offset << CI_TXD_QW1_OFFSET_S) |
+					((uint64_t)CI_MAX_DATA_PER_TXD << CI_TXD_QW1_TX_BUF_SZ_S) |
+					((uint64_t)td_tag << CI_TXD_QW1_L2TAG1_S));
 
-				buf_dma_addr += ICE_MAX_DATA_PER_TXD;
-				slen -= ICE_MAX_DATA_PER_TXD;
+				buf_dma_addr += CI_MAX_DATA_PER_TXD;
+				slen -= CI_MAX_DATA_PER_TXD;
 
 				txe->last_id = tx_last;
 				tx_id = txe->next_id;
@@ -3282,12 +3276,11 @@ ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 			}
 
 			txd->buffer_addr = rte_cpu_to_le_64(buf_dma_addr);
-			txd->cmd_type_offset_bsz =
-				rte_cpu_to_le_64(ICE_TX_DESC_DTYPE_DATA |
-				((uint64_t)td_cmd << ICE_TXD_QW1_CMD_S) |
-				((uint64_t)td_offset << ICE_TXD_QW1_OFFSET_S) |
-				((uint64_t)slen << ICE_TXD_QW1_TX_BUF_SZ_S) |
-				((uint64_t)td_tag << ICE_TXD_QW1_L2TAG1_S));
+			txd->cmd_type_offset_bsz = rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DATA |
+				((uint64_t)td_cmd << CI_TXD_QW1_CMD_S) |
+				((uint64_t)td_offset << CI_TXD_QW1_OFFSET_S) |
+				((uint64_t)slen << CI_TXD_QW1_TX_BUF_SZ_S) |
+				((uint64_t)td_tag << CI_TXD_QW1_L2TAG1_S));
 
 			txe->last_id = tx_last;
 			tx_id = txe->next_id;
@@ -3296,7 +3289,7 @@ ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 		} while (m_seg);
 
 		/* fill the last descriptor with End of Packet (EOP) bit */
-		td_cmd |= ICE_TX_DESC_CMD_EOP;
+		td_cmd |= CI_TX_DESC_CMD_EOP;
 		txq->nb_tx_used = (uint16_t)(txq->nb_tx_used + nb_used);
 		txq->nb_tx_free = (uint16_t)(txq->nb_tx_free - nb_used);
 
@@ -3307,14 +3300,13 @@ ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 				   "%4u (port=%d queue=%d)",
 				   tx_last, txq->port_id, txq->queue_id);
 
-			td_cmd |= ICE_TX_DESC_CMD_RS;
+			td_cmd |= CI_TX_DESC_CMD_RS;
 
 			/* Update txq RS bit counters */
 			txq->nb_tx_used = 0;
 		}
 		txd->cmd_type_offset_bsz |=
-			rte_cpu_to_le_64(((uint64_t)td_cmd) <<
-					 ICE_TXD_QW1_CMD_S);
+			rte_cpu_to_le_64(((uint64_t)td_cmd) << CI_TXD_QW1_CMD_S);
 
 		if (txq->tsq != NULL && txq->tsq->ts_flag > 0) {
 			uint64_t txtime = *RTE_MBUF_DYNFIELD(tx_pkt,
@@ -3361,8 +3353,8 @@ ice_tx_free_bufs(struct ci_tx_queue *txq)
 	uint16_t i;
 
 	if ((txq->ci_tx_ring[txq->tx_next_dd].cmd_type_offset_bsz &
-	     rte_cpu_to_le_64(ICE_TXD_QW1_DTYPE_M)) !=
-	    rte_cpu_to_le_64(ICE_TX_DESC_DTYPE_DESC_DONE))
+	     rte_cpu_to_le_64(CI_TXD_QW1_DTYPE_M)) !=
+	    rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE))
 		return 0;
 
 	txep = &txq->sw_ring[txq->tx_next_dd - (txq->tx_rs_thresh - 1)];
@@ -3598,8 +3590,7 @@ tx_xmit_pkts(struct ci_tx_queue *txq,
 		n = (uint16_t)(txq->nb_tx_desc - txq->tx_tail);
 		ice_tx_fill_hw_ring(txq, tx_pkts, n);
 		txr[txq->tx_next_rs].cmd_type_offset_bsz |=
-			rte_cpu_to_le_64(((uint64_t)ICE_TX_DESC_CMD_RS) <<
-					 ICE_TXD_QW1_CMD_S);
+			rte_cpu_to_le_64(((uint64_t)CI_TX_DESC_CMD_RS) << CI_TXD_QW1_CMD_S);
 		txq->tx_next_rs = (uint16_t)(txq->tx_rs_thresh - 1);
 		txq->tx_tail = 0;
 	}
@@ -3611,8 +3602,7 @@ tx_xmit_pkts(struct ci_tx_queue *txq,
 	/* Determine if RS bit needs to be set */
 	if (txq->tx_tail > txq->tx_next_rs) {
 		txr[txq->tx_next_rs].cmd_type_offset_bsz |=
-			rte_cpu_to_le_64(((uint64_t)ICE_TX_DESC_CMD_RS) <<
-					 ICE_TXD_QW1_CMD_S);
+			rte_cpu_to_le_64(((uint64_t)CI_TX_DESC_CMD_RS) << CI_TXD_QW1_CMD_S);
 		txq->tx_next_rs =
 			(uint16_t)(txq->tx_next_rs + txq->tx_rs_thresh);
 		if (txq->tx_next_rs >= txq->nb_tx_desc)
@@ -4843,9 +4833,9 @@ ice_fdir_programming(struct ice_pf *pf, struct ice_fltr_desc *fdir_desc)
 
 	txdp = &txq->ci_tx_ring[txq->tx_tail + 1];
 	txdp->buffer_addr = rte_cpu_to_le_64(pf->fdir.dma_addr);
-	td_cmd = ICE_TX_DESC_CMD_EOP |
-		ICE_TX_DESC_CMD_RS  |
-		ICE_TX_DESC_CMD_DUMMY;
+	td_cmd = CI_TX_DESC_CMD_EOP |
+		CI_TX_DESC_CMD_RS  |
+		CI_TX_DESC_CMD_DUMMY;
 
 	txdp->cmd_type_offset_bsz =
 		ice_build_ctob(td_cmd, 0, ICE_FDIR_PKT_LEN, 0);
@@ -4856,9 +4846,8 @@ ice_fdir_programming(struct ice_pf *pf, struct ice_fltr_desc *fdir_desc)
 	/* Update the tx tail register */
 	ICE_PCI_REG_WRITE(txq->qtx_tail, txq->tx_tail);
 	for (i = 0; i < ICE_FDIR_MAX_WAIT_US; i++) {
-		if ((txdp->cmd_type_offset_bsz &
-		     rte_cpu_to_le_64(ICE_TXD_QW1_DTYPE_M)) ==
-		    rte_cpu_to_le_64(ICE_TX_DESC_DTYPE_DESC_DONE))
+		if ((txdp->cmd_type_offset_bsz & rte_cpu_to_le_64(CI_TXD_QW1_DTYPE_M)) ==
+		    rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE))
 			break;
 		rte_delay_us(1);
 	}
diff --git a/drivers/net/intel/ice/ice_rxtx.h b/drivers/net/intel/ice/ice_rxtx.h
index c524e9f756..cd5fa93d1c 100644
--- a/drivers/net/intel/ice/ice_rxtx.h
+++ b/drivers/net/intel/ice/ice_rxtx.h
@@ -46,7 +46,7 @@
 
 #define ICE_SUPPORT_CHAIN_NUM 5
 
-#define ICE_TD_CMD                      ICE_TX_DESC_CMD_EOP
+#define ICE_TD_CMD                      CI_TX_DESC_CMD_EOP
 
 #define ICE_VPMD_RX_BURST            CI_VPMD_RX_BURST
 #define ICE_VPMD_TX_BURST            32
@@ -169,19 +169,6 @@ struct ice_txtime {
 	const struct rte_memzone *ts_mz;
 };
 
-/* Offload features */
-union ice_tx_offload {
-	uint64_t data;
-	struct {
-		uint64_t l2_len:7; /* L2 (MAC) Header Length. */
-		uint64_t l3_len:9; /* L3 (IP) Header Length. */
-		uint64_t l4_len:8; /* L4 Header Length. */
-		uint64_t tso_segsz:16; /* TCP TSO segment size */
-		uint64_t outer_l2_len:8; /* outer L2 Header Length */
-		uint64_t outer_l3_len:16; /* outer L3 Header Length */
-	};
-};
-
 /* Rx Flex Descriptor for Comms Package Profile
  * RxDID Profile ID 22 (swap Hash and FlowID)
  * Flex-field 0: Flow ID lower 16-bits
diff --git a/drivers/net/intel/ice/ice_rxtx_vec_avx2.c b/drivers/net/intel/ice/ice_rxtx_vec_avx2.c
index 0a1df0b2f6..2922671158 100644
--- a/drivers/net/intel/ice/ice_rxtx_vec_avx2.c
+++ b/drivers/net/intel/ice/ice_rxtx_vec_avx2.c
@@ -777,10 +777,9 @@ static __rte_always_inline void
 ice_vtx1(volatile struct ci_tx_desc *txdp,
 	 struct rte_mbuf *pkt, uint64_t flags, bool offload)
 {
-	uint64_t high_qw =
-		(ICE_TX_DESC_DTYPE_DATA |
-		 ((uint64_t)flags  << ICE_TXD_QW1_CMD_S) |
-		 ((uint64_t)pkt->data_len << ICE_TXD_QW1_TX_BUF_SZ_S));
+	uint64_t high_qw = (CI_TX_DESC_DTYPE_DATA |
+		 ((uint64_t)flags << CI_TXD_QW1_CMD_S) |
+		 ((uint64_t)pkt->data_len << CI_TXD_QW1_TX_BUF_SZ_S));
 	if (offload)
 		ice_txd_enable_offload(pkt, &high_qw);
 
@@ -792,8 +791,7 @@ static __rte_always_inline void
 ice_vtx(volatile struct ci_tx_desc *txdp,
 	struct rte_mbuf **pkt, uint16_t nb_pkts,  uint64_t flags, bool offload)
 {
-	const uint64_t hi_qw_tmpl = (ICE_TX_DESC_DTYPE_DATA |
-			((uint64_t)flags  << ICE_TXD_QW1_CMD_S));
+	const uint64_t hi_qw_tmpl = (CI_TX_DESC_DTYPE_DATA | (flags << CI_TXD_QW1_CMD_S));
 
 	/* if unaligned on 32-bit boundary, do one to align */
 	if (((uintptr_t)txdp & 0x1F) != 0 && nb_pkts != 0) {
@@ -801,30 +799,22 @@ ice_vtx(volatile struct ci_tx_desc *txdp,
 		nb_pkts--, txdp++, pkt++;
 	}
 
-	/* do two at a time while possible, in bursts */
+	/* do four at a time while possible, in bursts */
 	for (; nb_pkts > 3; txdp += 4, pkt += 4, nb_pkts -= 4) {
-		uint64_t hi_qw3 =
-			hi_qw_tmpl |
-			((uint64_t)pkt[3]->data_len <<
-			 ICE_TXD_QW1_TX_BUF_SZ_S);
+		uint64_t hi_qw3 = hi_qw_tmpl |
+			((uint64_t)pkt[3]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
 		if (offload)
 			ice_txd_enable_offload(pkt[3], &hi_qw3);
-		uint64_t hi_qw2 =
-			hi_qw_tmpl |
-			((uint64_t)pkt[2]->data_len <<
-			 ICE_TXD_QW1_TX_BUF_SZ_S);
+		uint64_t hi_qw2 = hi_qw_tmpl |
+			((uint64_t)pkt[2]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
 		if (offload)
 			ice_txd_enable_offload(pkt[2], &hi_qw2);
-		uint64_t hi_qw1 =
-			hi_qw_tmpl |
-			((uint64_t)pkt[1]->data_len <<
-			 ICE_TXD_QW1_TX_BUF_SZ_S);
+		uint64_t hi_qw1 = hi_qw_tmpl |
+			((uint64_t)pkt[1]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
 		if (offload)
 			ice_txd_enable_offload(pkt[1], &hi_qw1);
-		uint64_t hi_qw0 =
-			hi_qw_tmpl |
-			((uint64_t)pkt[0]->data_len <<
-			 ICE_TXD_QW1_TX_BUF_SZ_S);
+		uint64_t hi_qw0 = hi_qw_tmpl |
+			((uint64_t)pkt[0]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
 		if (offload)
 			ice_txd_enable_offload(pkt[0], &hi_qw0);
 
@@ -856,7 +846,7 @@ ice_xmit_fixed_burst_vec_avx2(void *tx_queue, struct rte_mbuf **tx_pkts,
 	struct ci_tx_entry_vec *txep;
 	uint16_t n, nb_commit, tx_id;
 	uint64_t flags = ICE_TD_CMD;
-	uint64_t rs = ICE_TX_DESC_CMD_RS | ICE_TD_CMD;
+	uint64_t rs = CI_TX_DESC_CMD_RS | ICE_TD_CMD;
 
 	/* cross rx_thresh boundary is not allowed */
 	nb_pkts = RTE_MIN(nb_pkts, txq->tx_rs_thresh);
@@ -901,8 +891,7 @@ ice_xmit_fixed_burst_vec_avx2(void *tx_queue, struct rte_mbuf **tx_pkts,
 	tx_id = (uint16_t)(tx_id + nb_commit);
 	if (tx_id > txq->tx_next_rs) {
 		txq->ci_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
-			rte_cpu_to_le_64(((uint64_t)ICE_TX_DESC_CMD_RS) <<
-					 ICE_TXD_QW1_CMD_S);
+			rte_cpu_to_le_64(((uint64_t)CI_TX_DESC_CMD_RS) << CI_TXD_QW1_CMD_S);
 		txq->tx_next_rs =
 			(uint16_t)(txq->tx_next_rs + txq->tx_rs_thresh);
 	}
diff --git a/drivers/net/intel/ice/ice_rxtx_vec_avx512.c b/drivers/net/intel/ice/ice_rxtx_vec_avx512.c
index d42f41461f..e64b6e227b 100644
--- a/drivers/net/intel/ice/ice_rxtx_vec_avx512.c
+++ b/drivers/net/intel/ice/ice_rxtx_vec_avx512.c
@@ -850,10 +850,9 @@ static __rte_always_inline void
 ice_vtx1(volatile struct ci_tx_desc *txdp,
 	 struct rte_mbuf *pkt, uint64_t flags, bool do_offload)
 {
-	uint64_t high_qw =
-		(ICE_TX_DESC_DTYPE_DATA |
-		 ((uint64_t)flags  << ICE_TXD_QW1_CMD_S) |
-		 ((uint64_t)pkt->data_len << ICE_TXD_QW1_TX_BUF_SZ_S));
+	uint64_t high_qw = (CI_TX_DESC_DTYPE_DATA |
+		 ((uint64_t)flags << CI_TXD_QW1_CMD_S) |
+		 ((uint64_t)pkt->data_len << CI_TXD_QW1_TX_BUF_SZ_S));
 
 	if (do_offload)
 		ice_txd_enable_offload(pkt, &high_qw);
@@ -866,32 +865,23 @@ static __rte_always_inline void
 ice_vtx(volatile struct ci_tx_desc *txdp, struct rte_mbuf **pkt,
 	uint16_t nb_pkts,  uint64_t flags, bool do_offload)
 {
-	const uint64_t hi_qw_tmpl = (ICE_TX_DESC_DTYPE_DATA |
-			((uint64_t)flags  << ICE_TXD_QW1_CMD_S));
+	const uint64_t hi_qw_tmpl = (CI_TX_DESC_DTYPE_DATA | (flags << CI_TXD_QW1_CMD_S));
 
 	for (; nb_pkts > 3; txdp += 4, pkt += 4, nb_pkts -= 4) {
-		uint64_t hi_qw3 =
-			hi_qw_tmpl |
-			((uint64_t)pkt[3]->data_len <<
-			 ICE_TXD_QW1_TX_BUF_SZ_S);
+		uint64_t hi_qw3 = hi_qw_tmpl |
+			((uint64_t)pkt[3]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
 		if (do_offload)
 			ice_txd_enable_offload(pkt[3], &hi_qw3);
-		uint64_t hi_qw2 =
-			hi_qw_tmpl |
-			((uint64_t)pkt[2]->data_len <<
-			 ICE_TXD_QW1_TX_BUF_SZ_S);
+		uint64_t hi_qw2 = hi_qw_tmpl |
+			((uint64_t)pkt[2]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
 		if (do_offload)
 			ice_txd_enable_offload(pkt[2], &hi_qw2);
-		uint64_t hi_qw1 =
-			hi_qw_tmpl |
-			((uint64_t)pkt[1]->data_len <<
-			 ICE_TXD_QW1_TX_BUF_SZ_S);
+		uint64_t hi_qw1 = hi_qw_tmpl |
+			((uint64_t)pkt[1]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
 		if (do_offload)
 			ice_txd_enable_offload(pkt[1], &hi_qw1);
-		uint64_t hi_qw0 =
-			hi_qw_tmpl |
-			((uint64_t)pkt[0]->data_len <<
-			 ICE_TXD_QW1_TX_BUF_SZ_S);
+		uint64_t hi_qw0 = hi_qw_tmpl |
+			((uint64_t)pkt[0]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
 		if (do_offload)
 			ice_txd_enable_offload(pkt[0], &hi_qw0);
 
@@ -920,7 +910,7 @@ ice_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pkts,
 	struct ci_tx_entry_vec *txep;
 	uint16_t n, nb_commit, tx_id;
 	uint64_t flags = ICE_TD_CMD;
-	uint64_t rs = ICE_TX_DESC_CMD_RS | ICE_TD_CMD;
+	uint64_t rs = CI_TX_DESC_CMD_RS | ICE_TD_CMD;
 
 	/* cross rx_thresh boundary is not allowed */
 	nb_pkts = RTE_MIN(nb_pkts, txq->tx_rs_thresh);
@@ -966,8 +956,7 @@ ice_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pkts,
 	tx_id = (uint16_t)(tx_id + nb_commit);
 	if (tx_id > txq->tx_next_rs) {
 		txq->ci_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
-			rte_cpu_to_le_64(((uint64_t)ICE_TX_DESC_CMD_RS) <<
-					 ICE_TXD_QW1_CMD_S);
+			rte_cpu_to_le_64(((uint64_t)CI_TX_DESC_CMD_RS) << CI_TXD_QW1_CMD_S);
 		txq->tx_next_rs =
 			(uint16_t)(txq->tx_next_rs + txq->tx_rs_thresh);
 	}
diff --git a/drivers/net/intel/ice/ice_rxtx_vec_common.h b/drivers/net/intel/ice/ice_rxtx_vec_common.h
index 8ba591e403..1d83a087cc 100644
--- a/drivers/net/intel/ice/ice_rxtx_vec_common.h
+++ b/drivers/net/intel/ice/ice_rxtx_vec_common.h
@@ -12,8 +12,8 @@ static inline int
 ice_tx_desc_done(struct ci_tx_queue *txq, uint16_t idx)
 {
 	return (txq->ci_tx_ring[idx].cmd_type_offset_bsz &
-			rte_cpu_to_le_64(ICE_TXD_QW1_DTYPE_M)) ==
-				rte_cpu_to_le_64(ICE_TX_DESC_DTYPE_DESC_DONE);
+			rte_cpu_to_le_64(CI_TXD_QW1_DTYPE_M)) ==
+				rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE);
 }
 
 static inline void
@@ -124,53 +124,52 @@ ice_txd_enable_offload(struct rte_mbuf *tx_pkt,
 	/* Tx Checksum Offload */
 	/* SET MACLEN */
 	td_offset |= (tx_pkt->l2_len >> 1) <<
-		ICE_TX_DESC_LEN_MACLEN_S;
+		CI_TX_DESC_LEN_MACLEN_S;
 
 	/* Enable L3 checksum offload */
 	if (ol_flags & RTE_MBUF_F_TX_IP_CKSUM) {
-		td_cmd |= ICE_TX_DESC_CMD_IIPT_IPV4_CSUM;
+		td_cmd |= CI_TX_DESC_CMD_IIPT_IPV4_CSUM;
 		td_offset |= (tx_pkt->l3_len >> 2) <<
-			ICE_TX_DESC_LEN_IPLEN_S;
+			CI_TX_DESC_LEN_IPLEN_S;
 	} else if (ol_flags & RTE_MBUF_F_TX_IPV4) {
-		td_cmd |= ICE_TX_DESC_CMD_IIPT_IPV4;
+		td_cmd |= CI_TX_DESC_CMD_IIPT_IPV4;
 		td_offset |= (tx_pkt->l3_len >> 2) <<
-			ICE_TX_DESC_LEN_IPLEN_S;
+			CI_TX_DESC_LEN_IPLEN_S;
 	} else if (ol_flags & RTE_MBUF_F_TX_IPV6) {
-		td_cmd |= ICE_TX_DESC_CMD_IIPT_IPV6;
+		td_cmd |= CI_TX_DESC_CMD_IIPT_IPV6;
 		td_offset |= (tx_pkt->l3_len >> 2) <<
-			ICE_TX_DESC_LEN_IPLEN_S;
+			CI_TX_DESC_LEN_IPLEN_S;
 	}
 
 	/* Enable L4 checksum offloads */
 	switch (ol_flags & RTE_MBUF_F_TX_L4_MASK) {
 	case RTE_MBUF_F_TX_TCP_CKSUM:
-		td_cmd |= ICE_TX_DESC_CMD_L4T_EOFT_TCP;
+		td_cmd |= CI_TX_DESC_CMD_L4T_EOFT_TCP;
 		td_offset |= (sizeof(struct rte_tcp_hdr) >> 2) <<
-			ICE_TX_DESC_LEN_L4_LEN_S;
+			CI_TX_DESC_LEN_L4_LEN_S;
 		break;
 	case RTE_MBUF_F_TX_SCTP_CKSUM:
-		td_cmd |= ICE_TX_DESC_CMD_L4T_EOFT_SCTP;
+		td_cmd |= CI_TX_DESC_CMD_L4T_EOFT_SCTP;
 		td_offset |= (sizeof(struct rte_sctp_hdr) >> 2) <<
-			ICE_TX_DESC_LEN_L4_LEN_S;
+			CI_TX_DESC_LEN_L4_LEN_S;
 		break;
 	case RTE_MBUF_F_TX_UDP_CKSUM:
-		td_cmd |= ICE_TX_DESC_CMD_L4T_EOFT_UDP;
+		td_cmd |= CI_TX_DESC_CMD_L4T_EOFT_UDP;
 		td_offset |= (sizeof(struct rte_udp_hdr) >> 2) <<
-			ICE_TX_DESC_LEN_L4_LEN_S;
+			CI_TX_DESC_LEN_L4_LEN_S;
 		break;
 	default:
 		break;
 	}
 
-	*txd_hi |= ((uint64_t)td_offset) << ICE_TXD_QW1_OFFSET_S;
+	*txd_hi |= ((uint64_t)td_offset) << CI_TXD_QW1_OFFSET_S;
 
-	/* Tx VLAN insertion Offload */
+	/* Tx VLAN/QINQ insertion Offload */
 	if (ol_flags & RTE_MBUF_F_TX_VLAN) {
-		td_cmd |= ICE_TX_DESC_CMD_IL2TAG1;
-		*txd_hi |= ((uint64_t)tx_pkt->vlan_tci <<
-				ICE_TXD_QW1_L2TAG1_S);
+		td_cmd |= CI_TX_DESC_CMD_IL2TAG1;
+		*txd_hi |= ((uint64_t)tx_pkt->vlan_tci << CI_TXD_QW1_L2TAG1_S);
 	}
 
-	*txd_hi |= ((uint64_t)td_cmd) << ICE_TXD_QW1_CMD_S;
+	*txd_hi |= ((uint64_t)td_cmd) << CI_TXD_QW1_CMD_S;
 }
 #endif
diff --git a/drivers/net/intel/idpf/idpf_common_rxtx.c b/drivers/net/intel/idpf/idpf_common_rxtx.c
index 23666539ab..587871b54a 100644
--- a/drivers/net/intel/idpf/idpf_common_rxtx.c
+++ b/drivers/net/intel/idpf/idpf_common_rxtx.c
@@ -271,7 +271,7 @@ idpf_qc_single_tx_queue_reset(struct ci_tx_queue *txq)
 	prev = (uint16_t)(txq->nb_tx_desc - 1);
 	for (i = 0; i < txq->nb_tx_desc; i++) {
 		txq->ci_tx_ring[i].cmd_type_offset_bsz =
-			rte_cpu_to_le_64(IDPF_TX_DESC_DTYPE_DESC_DONE);
+			rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE);
 		txe[i].mbuf =  NULL;
 		txe[i].last_id = i;
 		txe[prev].next_id = i;
@@ -849,7 +849,7 @@ idpf_calc_context_desc(uint64_t flags)
  */
 static inline void
 idpf_set_splitq_tso_ctx(struct rte_mbuf *mbuf,
-			union idpf_tx_offload tx_offload,
+			union ci_tx_offload tx_offload,
 			volatile union idpf_flex_tx_ctx_desc *ctx_desc)
 {
 	uint16_t cmd_dtype;
@@ -887,7 +887,7 @@ idpf_dp_splitq_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 	volatile struct idpf_flex_tx_sched_desc *txr;
 	volatile struct idpf_flex_tx_sched_desc *txd;
 	struct ci_tx_entry *sw_ring;
-	union idpf_tx_offload tx_offload = {0};
+	union ci_tx_offload tx_offload = {0};
 	struct ci_tx_entry *txe, *txn;
 	uint16_t nb_used, tx_id, sw_id;
 	struct rte_mbuf *tx_pkt;
@@ -1334,7 +1334,7 @@ idpf_dp_singleq_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 {
 	volatile struct ci_tx_desc *txd;
 	volatile struct ci_tx_desc *txr;
-	union idpf_tx_offload tx_offload = {0};
+	union ci_tx_offload tx_offload = {0};
 	struct ci_tx_entry *txe, *txn;
 	struct ci_tx_entry *sw_ring;
 	struct ci_tx_queue *txq;
@@ -1452,10 +1452,10 @@ idpf_dp_singleq_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 			slen = m_seg->data_len;
 			buf_dma_addr = rte_mbuf_data_iova(m_seg);
 			txd->buffer_addr = rte_cpu_to_le_64(buf_dma_addr);
-			txd->cmd_type_offset_bsz = rte_cpu_to_le_64(IDPF_TX_DESC_DTYPE_DATA |
-				((uint64_t)td_cmd  << IDPF_TXD_QW1_CMD_S) |
-				((uint64_t)td_offset << IDPF_TXD_QW1_OFFSET_S) |
-				((uint64_t)slen << IDPF_TXD_QW1_TX_BUF_SZ_S));
+			txd->cmd_type_offset_bsz = rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DATA |
+				((uint64_t)td_cmd  << CI_TXD_QW1_CMD_S) |
+				((uint64_t)td_offset << CI_TXD_QW1_OFFSET_S) |
+				((uint64_t)slen << CI_TXD_QW1_TX_BUF_SZ_S));
 
 			txe->last_id = tx_last;
 			tx_id = txe->next_id;
@@ -1464,7 +1464,7 @@ idpf_dp_singleq_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 		} while (m_seg);
 
 		/* The last packet data descriptor needs End Of Packet (EOP) */
-		td_cmd |= IDPF_TX_DESC_CMD_EOP;
+		td_cmd |= CI_TX_DESC_CMD_EOP;
 		txq->nb_tx_used = (uint16_t)(txq->nb_tx_used + nb_used);
 		txq->nb_tx_free = (uint16_t)(txq->nb_tx_free - nb_used);
 
@@ -1473,13 +1473,13 @@ idpf_dp_singleq_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 			       "%4u (port=%d queue=%d)",
 			       tx_last, txq->port_id, txq->queue_id);
 
-			td_cmd |= IDPF_TX_DESC_CMD_RS;
+			td_cmd |= CI_TX_DESC_CMD_RS;
 
 			/* Update txq RS bit counters */
 			txq->nb_tx_used = 0;
 		}
 
-		txd->cmd_type_offset_bsz |= rte_cpu_to_le_16(td_cmd << IDPF_TXD_QW1_CMD_S);
+		txd->cmd_type_offset_bsz |= rte_cpu_to_le_16(td_cmd << CI_TXD_QW1_CMD_S);
 	}
 
 end_of_tx:
diff --git a/drivers/net/intel/idpf/idpf_common_rxtx.h b/drivers/net/intel/idpf/idpf_common_rxtx.h
index 2f2fa153b2..b88a87402d 100644
--- a/drivers/net/intel/idpf/idpf_common_rxtx.h
+++ b/drivers/net/intel/idpf/idpf_common_rxtx.h
@@ -169,18 +169,6 @@ struct idpf_rx_queue {
 	uint32_t hw_register_set;
 };
 
-/* Offload features */
-union idpf_tx_offload {
-	uint64_t data;
-	struct {
-		uint64_t l2_len:7; /* L2 (MAC) Header Length. */
-		uint64_t l3_len:9; /* L3 (IP) Header Length. */
-		uint64_t l4_len:8; /* L4 Header Length. */
-		uint64_t tso_segsz:16; /* TCP TSO segment size */
-		/* uint64_t unused : 24; */
-	};
-};
-
 union idpf_tx_desc {
 	struct ci_tx_desc *tx_ring;
 	struct idpf_flex_tx_sched_desc *desc_ring;
diff --git a/drivers/net/intel/idpf/idpf_common_rxtx_avx2.c b/drivers/net/intel/idpf/idpf_common_rxtx_avx2.c
index 04efee3722..411b171b97 100644
--- a/drivers/net/intel/idpf/idpf_common_rxtx_avx2.c
+++ b/drivers/net/intel/idpf/idpf_common_rxtx_avx2.c
@@ -486,10 +486,9 @@ static inline void
 idpf_singleq_vtx1(volatile struct ci_tx_desc *txdp,
 		  struct rte_mbuf *pkt, uint64_t flags)
 {
-	uint64_t high_qw =
-		(IDPF_TX_DESC_DTYPE_DATA |
-		 ((uint64_t)flags  << IDPF_TXD_QW1_CMD_S) |
-		 ((uint64_t)pkt->data_len << IDPF_TXD_QW1_TX_BUF_SZ_S));
+	uint64_t high_qw = (CI_TX_DESC_DTYPE_DATA |
+		 ((uint64_t)flags << CI_TXD_QW1_CMD_S) |
+		 ((uint64_t)pkt->data_len << CI_TXD_QW1_TX_BUF_SZ_S));
 
 	__m128i descriptor = _mm_set_epi64x(high_qw,
 				pkt->buf_iova + pkt->data_off);
@@ -500,8 +499,7 @@ static inline void
 idpf_singleq_vtx(volatile struct ci_tx_desc *txdp,
 		 struct rte_mbuf **pkt, uint16_t nb_pkts,  uint64_t flags)
 {
-	const uint64_t hi_qw_tmpl = (IDPF_TX_DESC_DTYPE_DATA |
-			((uint64_t)flags  << IDPF_TXD_QW1_CMD_S));
+	const uint64_t hi_qw_tmpl = (CI_TX_DESC_DTYPE_DATA | (flags << CI_TXD_QW1_CMD_S));
 
 	/* if unaligned on 32-bit boundary, do one to align */
 	if (((uintptr_t)txdp & 0x1F) != 0 && nb_pkts != 0) {
@@ -511,22 +509,14 @@ idpf_singleq_vtx(volatile struct ci_tx_desc *txdp,
 
 	/* do two at a time while possible, in bursts */
 	for (; nb_pkts > 3; txdp += 4, pkt += 4, nb_pkts -= 4) {
-		uint64_t hi_qw3 =
-			hi_qw_tmpl |
-			((uint64_t)pkt[3]->data_len <<
-			 IDPF_TXD_QW1_TX_BUF_SZ_S);
-		uint64_t hi_qw2 =
-			hi_qw_tmpl |
-			((uint64_t)pkt[2]->data_len <<
-			 IDPF_TXD_QW1_TX_BUF_SZ_S);
-		uint64_t hi_qw1 =
-			hi_qw_tmpl |
-			((uint64_t)pkt[1]->data_len <<
-			 IDPF_TXD_QW1_TX_BUF_SZ_S);
-		uint64_t hi_qw0 =
-			hi_qw_tmpl |
-			((uint64_t)pkt[0]->data_len <<
-			 IDPF_TXD_QW1_TX_BUF_SZ_S);
+		uint64_t hi_qw3 = hi_qw_tmpl |
+			((uint64_t)pkt[3]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
+		uint64_t hi_qw2 = hi_qw_tmpl |
+			((uint64_t)pkt[2]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
+		uint64_t hi_qw1 = hi_qw_tmpl |
+			((uint64_t)pkt[1]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
+		uint64_t hi_qw0 = hi_qw_tmpl |
+			((uint64_t)pkt[0]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
 
 		__m256i desc2_3 =
 			_mm256_set_epi64x
@@ -559,8 +549,8 @@ idpf_singleq_xmit_fixed_burst_vec_avx2(void *tx_queue, struct rte_mbuf **tx_pkts
 	volatile struct ci_tx_desc *txdp;
 	struct ci_tx_entry_vec *txep;
 	uint16_t n, nb_commit, tx_id;
-	uint64_t flags = IDPF_TX_DESC_CMD_EOP;
-	uint64_t rs = IDPF_TX_DESC_CMD_RS | flags;
+	uint64_t flags = CI_TX_DESC_CMD_EOP;
+	uint64_t rs = CI_TX_DESC_CMD_RS | flags;
 
 	/* cross rx_thresh boundary is not allowed */
 	nb_pkts = RTE_MIN(nb_pkts, txq->tx_rs_thresh);
@@ -605,8 +595,7 @@ idpf_singleq_xmit_fixed_burst_vec_avx2(void *tx_queue, struct rte_mbuf **tx_pkts
 	tx_id = (uint16_t)(tx_id + nb_commit);
 	if (tx_id > txq->tx_next_rs) {
 		txq->ci_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
-			rte_cpu_to_le_64(((uint64_t)IDPF_TX_DESC_CMD_RS) <<
-					 IDPF_TXD_QW1_CMD_S);
+			rte_cpu_to_le_64(((uint64_t)CI_TX_DESC_CMD_RS) << CI_TXD_QW1_CMD_S);
 		txq->tx_next_rs =
 			(uint16_t)(txq->tx_next_rs + txq->tx_rs_thresh);
 	}
diff --git a/drivers/net/intel/idpf/idpf_common_rxtx_avx512.c b/drivers/net/intel/idpf/idpf_common_rxtx_avx512.c
index d5e5a2ca5f..49ace35615 100644
--- a/drivers/net/intel/idpf/idpf_common_rxtx_avx512.c
+++ b/drivers/net/intel/idpf/idpf_common_rxtx_avx512.c
@@ -1003,10 +1003,9 @@ static __rte_always_inline void
 idpf_singleq_vtx1(volatile struct ci_tx_desc *txdp,
 	  struct rte_mbuf *pkt, uint64_t flags)
 {
-	uint64_t high_qw =
-		(IDPF_TX_DESC_DTYPE_DATA |
-		 ((uint64_t)flags  << IDPF_TXD_QW1_CMD_S) |
-		 ((uint64_t)pkt->data_len << IDPF_TXD_QW1_TX_BUF_SZ_S));
+	uint64_t high_qw = (CI_TX_DESC_DTYPE_DATA |
+		 ((uint64_t)flags << CI_TXD_QW1_CMD_S) |
+		 ((uint64_t)pkt->data_len << CI_TXD_QW1_TX_BUF_SZ_S));
 
 	__m128i descriptor = _mm_set_epi64x(high_qw,
 					    pkt->buf_iova + pkt->data_off);
@@ -1019,8 +1018,7 @@ static __rte_always_inline void
 idpf_singleq_vtx(volatile struct ci_tx_desc *txdp,
 	 struct rte_mbuf **pkt, uint16_t nb_pkts,  uint64_t flags)
 {
-	const uint64_t hi_qw_tmpl = (IDPF_TX_DESC_DTYPE_DATA  |
-			((uint64_t)flags  << IDPF_TXD_QW1_CMD_S));
+	const uint64_t hi_qw_tmpl = (CI_TX_DESC_DTYPE_DATA  | (flags << CI_TXD_QW1_CMD_S));
 
 	/* if unaligned on 32-bit boundary, do one to align */
 	if (((uintptr_t)txdp & 0x1F) != 0 && nb_pkts != 0) {
@@ -1030,22 +1028,14 @@ idpf_singleq_vtx(volatile struct ci_tx_desc *txdp,
 
 	/* do 4 at a time while possible, in bursts */
 	for (; nb_pkts > 3; txdp += 4, pkt += 4, nb_pkts -= 4) {
-		uint64_t hi_qw3 =
-			hi_qw_tmpl |
-			((uint64_t)pkt[3]->data_len <<
-			 IDPF_TXD_QW1_TX_BUF_SZ_S);
-		uint64_t hi_qw2 =
-			hi_qw_tmpl |
-			((uint64_t)pkt[2]->data_len <<
-			 IDPF_TXD_QW1_TX_BUF_SZ_S);
-		uint64_t hi_qw1 =
-			hi_qw_tmpl |
-			((uint64_t)pkt[1]->data_len <<
-			 IDPF_TXD_QW1_TX_BUF_SZ_S);
-		uint64_t hi_qw0 =
-			hi_qw_tmpl |
-			((uint64_t)pkt[0]->data_len <<
-			 IDPF_TXD_QW1_TX_BUF_SZ_S);
+		uint64_t hi_qw3 = hi_qw_tmpl |
+			((uint64_t)pkt[3]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
+		uint64_t hi_qw2 = hi_qw_tmpl |
+			((uint64_t)pkt[2]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
+		uint64_t hi_qw1 = hi_qw_tmpl |
+			((uint64_t)pkt[1]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
+		uint64_t hi_qw0 = hi_qw_tmpl |
+			((uint64_t)pkt[0]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
 
 		__m512i desc0_3 =
 			_mm512_set_epi64
@@ -1075,8 +1065,8 @@ idpf_singleq_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pk
 	volatile struct ci_tx_desc *txdp;
 	struct ci_tx_entry_vec *txep;
 	uint16_t n, nb_commit, tx_id;
-	uint64_t flags = IDPF_TX_DESC_CMD_EOP;
-	uint64_t rs = IDPF_TX_DESC_CMD_RS | flags;
+	uint64_t flags = CI_TX_DESC_CMD_EOP;
+	uint64_t rs = CI_TX_DESC_CMD_RS | flags;
 
 	/* cross rx_thresh boundary is not allowed */
 	nb_pkts = RTE_MIN(nb_pkts, txq->tx_rs_thresh);
@@ -1124,8 +1114,7 @@ idpf_singleq_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pk
 	tx_id = (uint16_t)(tx_id + nb_commit);
 	if (tx_id > txq->tx_next_rs) {
 		txq->ci_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
-			rte_cpu_to_le_64(((uint64_t)IDPF_TX_DESC_CMD_RS) <<
-					 IDPF_TXD_QW1_CMD_S);
+			rte_cpu_to_le_64(((uint64_t)CI_TX_DESC_CMD_RS) << CI_TXD_QW1_CMD_S);
 		txq->tx_next_rs =
 			(uint16_t)(txq->tx_next_rs + txq->tx_rs_thresh);
 	}
diff --git a/drivers/net/intel/idpf/idpf_rxtx_vec_common.h b/drivers/net/intel/idpf/idpf_rxtx_vec_common.h
index b5e8574667..a43d8f78e2 100644
--- a/drivers/net/intel/idpf/idpf_rxtx_vec_common.h
+++ b/drivers/net/intel/idpf/idpf_rxtx_vec_common.h
@@ -32,8 +32,8 @@ idpf_tx_desc_done(struct ci_tx_queue *txq, uint16_t idx)
 		return 1;
 
 	return (txq->ci_tx_ring[idx].cmd_type_offset_bsz &
-			rte_cpu_to_le_64(IDPF_TXD_QW1_DTYPE_M)) ==
-				rte_cpu_to_le_64(IDPF_TX_DESC_DTYPE_DESC_DONE);
+			rte_cpu_to_le_64(CI_TXD_QW1_DTYPE_M)) ==
+				rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE);
 }
 
 static inline int
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [PATCH v3 05/36] net/intel: create separate header for Tx scalar fns
  2026-01-30 11:41 ` [PATCH v3 " Bruce Richardson
                     ` (3 preceding siblings ...)
  2026-01-30 11:41   ` [PATCH v3 04/36] net/intel: consolidate definitions for Tx desc fields Bruce Richardson
@ 2026-01-30 11:41   ` Bruce Richardson
  2026-02-06 10:23     ` Loftus, Ciara
  2026-01-30 11:41   ` [PATCH v3 06/36] net/intel: add common fn to calculate needed descriptors Bruce Richardson
                     ` (30 subsequent siblings)
  35 siblings, 1 reply; 274+ messages in thread
From: Bruce Richardson @ 2026-01-30 11:41 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson

Rather than keeping all Tx code in one file, which could grow rather
long, move the scalar datapath functions to a new header file.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/net/intel/common/tx.h            | 58 ++------------------
 drivers/net/intel/common/tx_scalar_fns.h | 67 ++++++++++++++++++++++++
 2 files changed, 72 insertions(+), 53 deletions(-)
 create mode 100644 drivers/net/intel/common/tx_scalar_fns.h

diff --git a/drivers/net/intel/common/tx.h b/drivers/net/intel/common/tx.h
index 03245d4fba..01e42303b4 100644
--- a/drivers/net/intel/common/tx.h
+++ b/drivers/net/intel/common/tx.h
@@ -319,59 +319,6 @@ ci_tx_free_bufs_vec(struct ci_tx_queue *txq, ci_desc_done_fn desc_done, bool ctx
 	return txq->tx_rs_thresh;
 }
 
-/*
- * Common transmit descriptor cleanup function for Intel drivers.
- * Used by ice, i40e, iavf, and idpf drivers.
- *
- * Returns:
- *   0 on success
- *  -1 if cleanup cannot proceed (descriptors not yet processed by HW)
- */
-static __rte_always_inline int
-ci_tx_xmit_cleanup(struct ci_tx_queue *txq)
-{
-	struct ci_tx_entry *sw_ring = txq->sw_ring;
-	volatile struct ci_tx_desc *txd = txq->ci_tx_ring;
-	uint16_t last_desc_cleaned = txq->last_desc_cleaned;
-	uint16_t nb_tx_desc = txq->nb_tx_desc;
-	uint16_t desc_to_clean_to;
-	uint16_t nb_tx_to_clean;
-
-	/* Determine the last descriptor needing to be cleaned */
-	desc_to_clean_to = (uint16_t)(last_desc_cleaned + txq->tx_rs_thresh);
-	if (desc_to_clean_to >= nb_tx_desc)
-		desc_to_clean_to = (uint16_t)(desc_to_clean_to - nb_tx_desc);
-
-	/* Check to make sure the last descriptor to clean is done */
-	desc_to_clean_to = sw_ring[desc_to_clean_to].last_id;
-
-	/* Check if descriptor is done - all drivers use 0xF as done value in bits 3:0 */
-	if ((txd[desc_to_clean_to].cmd_type_offset_bsz & rte_cpu_to_le_64(CI_TXD_QW1_DTYPE_M)) !=
-			rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE)) {
-		/* Descriptor not yet processed by hardware */
-		return -1;
-	}
-
-	/* Figure out how many descriptors will be cleaned */
-	if (last_desc_cleaned > desc_to_clean_to)
-		nb_tx_to_clean = (uint16_t)((nb_tx_desc - last_desc_cleaned) + desc_to_clean_to);
-	else
-		nb_tx_to_clean = (uint16_t)(desc_to_clean_to - last_desc_cleaned);
-
-	/* The last descriptor to clean is done, so that means all the
-	 * descriptors from the last descriptor that was cleaned
-	 * up to the last descriptor with the RS bit set
-	 * are done. Only reset the threshold descriptor.
-	 */
-	txd[desc_to_clean_to].cmd_type_offset_bsz = 0;
-
-	/* Update the txq to reflect the last descriptor that was cleaned */
-	txq->last_desc_cleaned = desc_to_clean_to;
-	txq->nb_tx_free = (uint16_t)(txq->nb_tx_free + nb_tx_to_clean);
-
-	return 0;
-}
-
 static inline void
 ci_txq_release_all_mbufs(struct ci_tx_queue *txq, bool use_ctx)
 {
@@ -490,4 +437,9 @@ ci_tx_path_select(const struct ci_tx_path_features *req_features,
 	return idx;
 }
 
+/* include the scalar functions at the end, so they can use the common definitions.
+ * This is done so drivers can use all functions just by including tx.h
+ */
+#include "tx_scalar_fns.h"
+
 #endif /* _COMMON_INTEL_TX_H_ */
diff --git a/drivers/net/intel/common/tx_scalar_fns.h b/drivers/net/intel/common/tx_scalar_fns.h
new file mode 100644
index 0000000000..c79210d084
--- /dev/null
+++ b/drivers/net/intel/common/tx_scalar_fns.h
@@ -0,0 +1,67 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2025 Intel Corporation
+ */
+
+#ifndef _COMMON_INTEL_TX_SCALAR_FNS_H_
+#define _COMMON_INTEL_TX_SCALAR_FNS_H_
+
+#include <stdint.h>
+#include <rte_byteorder.h>
+
+/* depends on common Tx definitions. */
+#include "tx.h"
+
+/*
+ * Common transmit descriptor cleanup function for Intel drivers.
+ * Used by ice, i40e, iavf, and idpf drivers.
+ *
+ * Returns:
+ *   0 on success
+ *  -1 if cleanup cannot proceed (descriptors not yet processed by HW)
+ */
+static __rte_always_inline int
+ci_tx_xmit_cleanup(struct ci_tx_queue *txq)
+{
+	struct ci_tx_entry *sw_ring = txq->sw_ring;
+	volatile struct ci_tx_desc *txd = txq->ci_tx_ring;
+	uint16_t last_desc_cleaned = txq->last_desc_cleaned;
+	uint16_t nb_tx_desc = txq->nb_tx_desc;
+	uint16_t desc_to_clean_to;
+	uint16_t nb_tx_to_clean;
+
+	/* Determine the last descriptor needing to be cleaned */
+	desc_to_clean_to = (uint16_t)(last_desc_cleaned + txq->tx_rs_thresh);
+	if (desc_to_clean_to >= nb_tx_desc)
+		desc_to_clean_to = (uint16_t)(desc_to_clean_to - nb_tx_desc);
+
+	/* Check to make sure the last descriptor to clean is done */
+	desc_to_clean_to = sw_ring[desc_to_clean_to].last_id;
+
+	/* Check if descriptor is done - all drivers use 0xF as done value in bits 3:0 */
+	if ((txd[desc_to_clean_to].cmd_type_offset_bsz & rte_cpu_to_le_64(CI_TXD_QW1_DTYPE_M)) !=
+			rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE)) {
+		/* Descriptor not yet processed by hardware */
+		return -1;
+	}
+
+	/* Figure out how many descriptors will be cleaned */
+	if (last_desc_cleaned > desc_to_clean_to)
+		nb_tx_to_clean = (uint16_t)((nb_tx_desc - last_desc_cleaned) + desc_to_clean_to);
+	else
+		nb_tx_to_clean = (uint16_t)(desc_to_clean_to - last_desc_cleaned);
+
+	/* The last descriptor to clean is done, so that means all the
+	 * descriptors from the last descriptor that was cleaned
+	 * up to the last descriptor with the RS bit set
+	 * are done. Only reset the threshold descriptor.
+	 */
+	txd[desc_to_clean_to].cmd_type_offset_bsz = 0;
+
+	/* Update the txq to reflect the last descriptor that was cleaned */
+	txq->last_desc_cleaned = desc_to_clean_to;
+	txq->nb_tx_free = (uint16_t)(txq->nb_tx_free + nb_tx_to_clean);
+
+	return 0;
+}
+
+#endif /* _COMMON_INTEL_TX_SCALAR_FNS_H_ */
-- 
2.51.0



* [PATCH v3 06/36] net/intel: add common fn to calculate needed descriptors
  2026-01-30 11:41 ` [PATCH v3 " Bruce Richardson
                     ` (4 preceding siblings ...)
  2026-01-30 11:41   ` [PATCH v3 05/36] net/intel: create separate header for Tx scalar fns Bruce Richardson
@ 2026-01-30 11:41   ` Bruce Richardson
  2026-02-06 10:25     ` Loftus, Ciara
  2026-01-30 11:41   ` [PATCH v3 07/36] net/ice: refactor context descriptor handling Bruce Richardson
                     ` (29 subsequent siblings)
  35 siblings, 1 reply; 274+ messages in thread
From: Bruce Richardson @ 2026-01-30 11:41 UTC (permalink / raw)
  To: dev
  Cc: Bruce Richardson, Vladimir Medvedkin, Anatoly Burakov,
	Jingjing Wu, Praveen Shetty

Multiple drivers used the same logic to calculate how many Tx data
descriptors each packet needs. Move that calculation to common code. In
the process of updating the drivers, fix the idpf driver's calculation
for the TSO case.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/net/intel/common/tx_scalar_fns.h  | 21 +++++++++++++++++++++
 drivers/net/intel/i40e/i40e_rxtx.c        | 18 +-----------------
 drivers/net/intel/iavf/iavf_rxtx.c        | 17 +----------------
 drivers/net/intel/ice/ice_rxtx.c          | 18 +-----------------
 drivers/net/intel/idpf/idpf_common_rxtx.c | 21 +++++++++++++++++----
 5 files changed, 41 insertions(+), 54 deletions(-)

diff --git a/drivers/net/intel/common/tx_scalar_fns.h b/drivers/net/intel/common/tx_scalar_fns.h
index c79210d084..f894cea616 100644
--- a/drivers/net/intel/common/tx_scalar_fns.h
+++ b/drivers/net/intel/common/tx_scalar_fns.h
@@ -64,4 +64,25 @@ ci_tx_xmit_cleanup(struct ci_tx_queue *txq)
 	return 0;
 }
 
+static inline uint16_t
+ci_div_roundup16(uint16_t x, uint16_t y)
+{
+	return (uint16_t)((x + y - 1) / y);
+}
+
+/* Calculate the number of TX descriptors needed for each pkt */
+static inline uint16_t
+ci_calc_pkt_desc(const struct rte_mbuf *tx_pkt)
+{
+	uint16_t count = 0;
+
+	while (tx_pkt != NULL) {
+		count += ci_div_roundup16(tx_pkt->data_len, CI_MAX_DATA_PER_TXD);
+		tx_pkt = tx_pkt->next;
+	}
+
+	return count;
+}
+
+
 #endif /* _COMMON_INTEL_TX_SCALAR_FNS_H_ */
diff --git a/drivers/net/intel/i40e/i40e_rxtx.c b/drivers/net/intel/i40e/i40e_rxtx.c
index f96c5c7f1e..b75306931a 100644
--- a/drivers/net/intel/i40e/i40e_rxtx.c
+++ b/drivers/net/intel/i40e/i40e_rxtx.c
@@ -1029,21 +1029,6 @@ i40e_set_tso_ctx(struct rte_mbuf *mbuf, union ci_tx_offload tx_offload)
 	return ctx_desc;
 }
 
-/* Calculate the number of TX descriptors needed for each pkt */
-static inline uint16_t
-i40e_calc_pkt_desc(struct rte_mbuf *tx_pkt)
-{
-	struct rte_mbuf *txd = tx_pkt;
-	uint16_t count = 0;
-
-	while (txd != NULL) {
-		count += DIV_ROUND_UP(txd->data_len, CI_MAX_DATA_PER_TXD);
-		txd = txd->next;
-	}
-
-	return count;
-}
-
 uint16_t
 i40e_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 {
@@ -1106,8 +1091,7 @@ i40e_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 		 * per tx desc.
 		 */
 		if (ol_flags & RTE_MBUF_F_TX_TCP_SEG)
-			nb_used = (uint16_t)(i40e_calc_pkt_desc(tx_pkt) +
-					     nb_ctx);
+			nb_used = (uint16_t)(ci_calc_pkt_desc(tx_pkt) + nb_ctx);
 		else
 			nb_used = (uint16_t)(tx_pkt->nb_segs + nb_ctx);
 		tx_last = (uint16_t)(tx_id + nb_used - 1);
diff --git a/drivers/net/intel/iavf/iavf_rxtx.c b/drivers/net/intel/iavf/iavf_rxtx.c
index 947b6c24d2..885d9309cc 100644
--- a/drivers/net/intel/iavf/iavf_rxtx.c
+++ b/drivers/net/intel/iavf/iavf_rxtx.c
@@ -2666,21 +2666,6 @@ iavf_build_data_desc_cmd_offset_fields(volatile uint64_t *qw1,
 		((uint64_t)l2tag1 << IAVF_TXD_DATA_QW1_L2TAG1_SHIFT));
 }
 
-/* Calculate the number of TX descriptors needed for each pkt */
-static inline uint16_t
-iavf_calc_pkt_desc(struct rte_mbuf *tx_pkt)
-{
-	struct rte_mbuf *txd = tx_pkt;
-	uint16_t count = 0;
-
-	while (txd != NULL) {
-		count += (txd->data_len + CI_MAX_DATA_PER_TXD - 1) / CI_MAX_DATA_PER_TXD;
-		txd = txd->next;
-	}
-
-	return count;
-}
-
 static inline void
 iavf_fill_data_desc(volatile struct ci_tx_desc *desc,
 	uint64_t desc_template,	uint16_t buffsz,
@@ -2766,7 +2751,7 @@ iavf_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 		 * per tx desc.
 		 */
 		if (mb->ol_flags & RTE_MBUF_F_TX_TCP_SEG)
-			nb_desc_required = iavf_calc_pkt_desc(mb) + nb_desc_ctx + nb_desc_ipsec;
+			nb_desc_required = ci_calc_pkt_desc(mb) + nb_desc_ctx + nb_desc_ipsec;
 		else
 			nb_desc_required = nb_desc_data + nb_desc_ctx + nb_desc_ipsec;
 
diff --git a/drivers/net/intel/ice/ice_rxtx.c b/drivers/net/intel/ice/ice_rxtx.c
index 52bbf95967..2a53b614b2 100644
--- a/drivers/net/intel/ice/ice_rxtx.c
+++ b/drivers/net/intel/ice/ice_rxtx.c
@@ -3075,21 +3075,6 @@ ice_set_tso_ctx(struct rte_mbuf *mbuf, union ci_tx_offload tx_offload)
 	return ctx_desc;
 }
 
-/* Calculate the number of TX descriptors needed for each pkt */
-static inline uint16_t
-ice_calc_pkt_desc(struct rte_mbuf *tx_pkt)
-{
-	struct rte_mbuf *txd = tx_pkt;
-	uint16_t count = 0;
-
-	while (txd != NULL) {
-		count += DIV_ROUND_UP(txd->data_len, CI_MAX_DATA_PER_TXD);
-		txd = txd->next;
-	}
-
-	return count;
-}
-
 uint16_t
 ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 {
@@ -3152,8 +3137,7 @@ ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 		 * per tx desc.
 		 */
 		if (ol_flags & (RTE_MBUF_F_TX_TCP_SEG | RTE_MBUF_F_TX_UDP_SEG))
-			nb_used = (uint16_t)(ice_calc_pkt_desc(tx_pkt) +
-					     nb_ctx);
+			nb_used = (uint16_t)(ci_calc_pkt_desc(tx_pkt) + nb_ctx);
 		else
 			nb_used = (uint16_t)(tx_pkt->nb_segs + nb_ctx);
 		tx_last = (uint16_t)(tx_id + nb_used - 1);
diff --git a/drivers/net/intel/idpf/idpf_common_rxtx.c b/drivers/net/intel/idpf/idpf_common_rxtx.c
index 587871b54a..11d6848430 100644
--- a/drivers/net/intel/idpf/idpf_common_rxtx.c
+++ b/drivers/net/intel/idpf/idpf_common_rxtx.c
@@ -934,7 +934,16 @@ idpf_dp_splitq_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 		tx_offload.tso_segsz = tx_pkt->tso_segsz;
 		/* Calculate the number of context descriptors needed. */
 		nb_ctx = idpf_calc_context_desc(ol_flags);
-		nb_used = tx_pkt->nb_segs + nb_ctx;
+
+		/* Calculate the number of TX descriptors needed for
+		 * each packet. For TSO packets, use ci_calc_pkt_desc as
+		 * the mbuf data size might exceed max data size that hw allows
+		 * per tx desc.
+		 */
+		if (ol_flags & RTE_MBUF_F_TX_TCP_SEG)
+			nb_used = ci_calc_pkt_desc(tx_pkt) + nb_ctx;
+		else
+			nb_used = tx_pkt->nb_segs + nb_ctx;
 
 		if (ol_flags & IDPF_TX_CKSUM_OFFLOAD_MASK)
 			cmd_dtype = IDPF_TXD_FLEX_FLOW_CMD_CS_EN;
@@ -1382,10 +1391,14 @@ idpf_dp_singleq_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 		nb_ctx = idpf_calc_context_desc(ol_flags);
 
 		/* The number of descriptors that must be allocated for
-		 * a packet equals to the number of the segments of that
-		 * packet plus 1 context descriptor if needed.
+		 * a packet. For TSO packets, use ci_calc_pkt_desc as
+		 * the mbuf data size might exceed max data size that hw allows
+		 * per tx desc.
 		 */
-		nb_used = (uint16_t)(tx_pkt->nb_segs + nb_ctx);
+		if (ol_flags & RTE_MBUF_F_TX_TCP_SEG)
+			nb_used = (uint16_t)(ci_calc_pkt_desc(tx_pkt) + nb_ctx);
+		else
+			nb_used = (uint16_t)(tx_pkt->nb_segs + nb_ctx);
 		tx_last = (uint16_t)(tx_id + nb_used - 1);
 
 		/* Circular ring */
-- 
2.51.0



* [PATCH v3 07/36] net/ice: refactor context descriptor handling
  2026-01-30 11:41 ` [PATCH v3 " Bruce Richardson
                     ` (5 preceding siblings ...)
  2026-01-30 11:41   ` [PATCH v3 06/36] net/intel: add common fn to calculate needed descriptors Bruce Richardson
@ 2026-01-30 11:41   ` Bruce Richardson
  2026-02-06 10:47     ` Loftus, Ciara
  2026-01-30 11:41   ` [PATCH v3 08/36] net/i40e: " Bruce Richardson
                     ` (28 subsequent siblings)
  35 siblings, 1 reply; 274+ messages in thread
From: Bruce Richardson @ 2026-01-30 11:41 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson, Anatoly Burakov

Create a single function to manage all context descriptor handling. It
returns 0 or 1 depending on whether a context descriptor is needed,
and when one is needed it also returns the descriptor contents
directly.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/net/intel/ice/ice_rxtx.c | 104 +++++++++++++++++--------------
 1 file changed, 57 insertions(+), 47 deletions(-)

diff --git a/drivers/net/intel/ice/ice_rxtx.c b/drivers/net/intel/ice/ice_rxtx.c
index 2a53b614b2..cc442fed75 100644
--- a/drivers/net/intel/ice/ice_rxtx.c
+++ b/drivers/net/intel/ice/ice_rxtx.c
@@ -2966,10 +2966,6 @@ ice_txd_enable_checksum(uint64_t ol_flags,
 			uint32_t *td_offset,
 			union ci_tx_offload tx_offload)
 {
-	/* Set MACLEN */
-	if (!(ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK))
-		*td_offset |= (tx_offload.l2_len >> 1)
-			<< CI_TX_DESC_LEN_MACLEN_S;
 
 	/* Enable L3 checksum offloads */
 	if (ol_flags & RTE_MBUF_F_TX_IP_CKSUM) {
@@ -3052,7 +3048,7 @@ ice_calc_context_desc(uint64_t flags)
 
 /* set ice TSO context descriptor */
 static inline uint64_t
-ice_set_tso_ctx(struct rte_mbuf *mbuf, union ci_tx_offload tx_offload)
+ice_set_tso_ctx(uint64_t ol_flags, const struct rte_mbuf *mbuf, union ci_tx_offload tx_offload)
 {
 	uint64_t ctx_desc = 0;
 	uint32_t cd_cmd, hdr_len, cd_tso_len;
@@ -3063,7 +3059,7 @@ ice_set_tso_ctx(struct rte_mbuf *mbuf, union ci_tx_offload tx_offload)
 	}
 
 	hdr_len = tx_offload.l2_len + tx_offload.l3_len + tx_offload.l4_len;
-	hdr_len += (mbuf->ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK) ?
+	hdr_len += (ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK) ?
 		   tx_offload.outer_l2_len + tx_offload.outer_l3_len : 0;
 
 	cd_cmd = CI_TX_CTX_DESC_TSO;
@@ -3075,6 +3071,49 @@ ice_set_tso_ctx(struct rte_mbuf *mbuf, union ci_tx_offload tx_offload)
 	return ctx_desc;
 }
 
+/* compute a context descriptor if one is necessary based on the ol_flags
+ *
+ * Returns 0 if no descriptor is necessary.
+ * Returns 1 if one is necessary and the contents of the descriptor are returned
+ *   in the values pointed to by qw0 and qw1. td_offset may also be modified.
+ */
+static __rte_always_inline uint16_t
+get_context_desc(uint64_t ol_flags, const struct rte_mbuf *tx_pkt,
+	const union ci_tx_offload *tx_offload, const struct ci_tx_queue *txq,
+	uint64_t *qw0, uint64_t *qw1)
+{
+	uint16_t cd_l2tag2 = 0;
+	uint64_t cd_type_cmd_tso_mss = ICE_TX_DESC_DTYPE_CTX;
+	uint32_t cd_tunneling_params = 0;
+	uint64_t ptp_tx_index = txq->ice_vsi->adapter->ptp_tx_index;
+
+	if (ice_calc_context_desc(ol_flags) == 0)
+		return 0;
+
+	if (ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK)
+		ice_parse_tunneling_params(ol_flags, *tx_offload, &cd_tunneling_params);
+
+	if (ol_flags & (RTE_MBUF_F_TX_TCP_SEG | RTE_MBUF_F_TX_UDP_SEG))
+		cd_type_cmd_tso_mss |= ice_set_tso_ctx(ol_flags, tx_pkt, *tx_offload);
+	else if (ol_flags & RTE_MBUF_F_TX_IEEE1588_TMST)
+		cd_type_cmd_tso_mss |=
+			((uint64_t)CI_TX_CTX_DESC_TSYN << CI_TXD_QW1_CMD_S) |
+			((ptp_tx_index << ICE_TXD_CTX_QW1_TSYN_S) & ICE_TXD_CTX_QW1_TSYN_M);
+
+
+	/* TX context descriptor based double VLAN insert */
+	if (ol_flags & RTE_MBUF_F_TX_QINQ) {
+		cd_l2tag2 = tx_pkt->vlan_tci_outer;
+		cd_type_cmd_tso_mss |= ((uint64_t)CI_TX_CTX_DESC_IL2TAG2 << CI_TXD_QW1_CMD_S);
+	}
+
+	*qw0 = rte_cpu_to_le_32(cd_tunneling_params) |
+		((uint64_t)rte_cpu_to_le_16(cd_l2tag2) << 32);
+	*qw1 = rte_cpu_to_le_64(cd_type_cmd_tso_mss);
+
+	return 1;
+}
+
 uint16_t
 ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 {
@@ -3085,7 +3124,6 @@ ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 	struct ci_tx_entry *txe, *txn;
 	struct rte_mbuf *tx_pkt;
 	struct rte_mbuf *m_seg;
-	uint32_t cd_tunneling_params;
 	uint16_t tx_id;
 	uint16_t ts_id = -1;
 	uint16_t nb_tx;
@@ -3096,6 +3134,7 @@ ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 	uint32_t td_tag = 0;
 	uint16_t tx_last;
 	uint16_t slen;
+	uint16_t l2_len;
 	uint64_t buf_dma_addr;
 	uint64_t ol_flags;
 	union ci_tx_offload tx_offload = {0};
@@ -3114,20 +3153,25 @@ ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 		(void)ci_tx_xmit_cleanup(txq);
 
 	for (nb_tx = 0; nb_tx < nb_pkts; nb_tx++) {
+		uint64_t cd_qw0, cd_qw1;
 		tx_pkt = *tx_pkts++;
 
+		ol_flags = tx_pkt->ol_flags;
 		td_cmd = 0;
 		td_tag = 0;
-		td_offset = 0;
-		ol_flags = tx_pkt->ol_flags;
+		l2_len = ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK ?
+				tx_pkt->outer_l2_len : tx_pkt->l2_len;
+		td_offset = (l2_len >> 1) << CI_TX_DESC_LEN_MACLEN_S;
+
 		tx_offload.l2_len = tx_pkt->l2_len;
 		tx_offload.l3_len = tx_pkt->l3_len;
 		tx_offload.outer_l2_len = tx_pkt->outer_l2_len;
 		tx_offload.outer_l3_len = tx_pkt->outer_l3_len;
 		tx_offload.l4_len = tx_pkt->l4_len;
 		tx_offload.tso_segsz = tx_pkt->tso_segsz;
+
 		/* Calculate the number of context descriptors needed. */
-		nb_ctx = ice_calc_context_desc(ol_flags);
+		nb_ctx = get_context_desc(ol_flags, tx_pkt, &tx_offload, txq, &cd_qw0, &cd_qw1);
 
 		/* The number of descriptors that must be allocated for
 		 * a packet equals to the number of the segments of that
@@ -3169,15 +3213,6 @@ ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 			td_tag = tx_pkt->vlan_tci;
 		}
 
-		/* Fill in tunneling parameters if necessary */
-		cd_tunneling_params = 0;
-		if (ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK) {
-			td_offset |= (tx_offload.outer_l2_len >> 1)
-				<< CI_TX_DESC_LEN_MACLEN_S;
-			ice_parse_tunneling_params(ol_flags, tx_offload,
-						   &cd_tunneling_params);
-		}
-
 		/* Enable checksum offloading */
 		if (ol_flags & ICE_TX_CKSUM_OFFLOAD_MASK)
 			ice_txd_enable_checksum(ol_flags, &td_cmd,
@@ -3185,11 +3220,7 @@ ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 
 		if (nb_ctx) {
 			/* Setup TX context descriptor if required */
-			volatile struct ice_tx_ctx_desc *ctx_txd =
-				(volatile struct ice_tx_ctx_desc *)
-					&ci_tx_ring[tx_id];
-			uint16_t cd_l2tag2 = 0;
-			uint64_t cd_type_cmd_tso_mss = ICE_TX_DESC_DTYPE_CTX;
+			uint64_t *ctx_txd = RTE_CAST_PTR(uint64_t *, &ci_tx_ring[tx_id]);
 
 			txn = &sw_ring[txe->next_id];
 			RTE_MBUF_PREFETCH_TO_FREE(txn->mbuf);
@@ -3198,29 +3229,8 @@ ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 				txe->mbuf = NULL;
 			}
 
-			if (ol_flags & (RTE_MBUF_F_TX_TCP_SEG | RTE_MBUF_F_TX_UDP_SEG))
-				cd_type_cmd_tso_mss |=
-					ice_set_tso_ctx(tx_pkt, tx_offload);
-			else if (ol_flags & RTE_MBUF_F_TX_IEEE1588_TMST)
-				cd_type_cmd_tso_mss |=
-					((uint64_t)CI_TX_CTX_DESC_TSYN <<
-					CI_TXD_QW1_CMD_S) |
-					 (((uint64_t)txq->ice_vsi->adapter->ptp_tx_index <<
-					 ICE_TXD_CTX_QW1_TSYN_S) & ICE_TXD_CTX_QW1_TSYN_M);
-
-			ctx_txd->tunneling_params =
-				rte_cpu_to_le_32(cd_tunneling_params);
-
-			/* TX context descriptor based double VLAN insert */
-			if (ol_flags & RTE_MBUF_F_TX_QINQ) {
-				cd_l2tag2 = tx_pkt->vlan_tci_outer;
-				cd_type_cmd_tso_mss |=
-					((uint64_t)CI_TX_CTX_DESC_IL2TAG2 <<
-					 CI_TXD_QW1_CMD_S);
-			}
-			ctx_txd->l2tag2 = rte_cpu_to_le_16(cd_l2tag2);
-			ctx_txd->qw1 =
-				rte_cpu_to_le_64(cd_type_cmd_tso_mss);
+			ctx_txd[0] = cd_qw0;
+			ctx_txd[1] = cd_qw1;
 
 			txe->last_id = tx_last;
 			tx_id = txe->next_id;
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [PATCH v3 08/36] net/i40e: refactor context descriptor handling
  2026-01-30 11:41 ` [PATCH v3 " Bruce Richardson
                     ` (6 preceding siblings ...)
  2026-01-30 11:41   ` [PATCH v3 07/36] net/ice: refactor context descriptor handling Bruce Richardson
@ 2026-01-30 11:41   ` Bruce Richardson
  2026-02-06 10:54     ` Loftus, Ciara
  2026-01-30 11:41   ` [PATCH v3 09/36] net/idpf: " Bruce Richardson
                     ` (27 subsequent siblings)
  35 siblings, 1 reply; 274+ messages in thread
From: Bruce Richardson @ 2026-01-30 11:41 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson

Move all context descriptor handling into a single function, as was done
for the ice driver, and use the same function signature as that driver.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/net/intel/i40e/i40e_rxtx.c | 123 +++++++++++++++--------------
 1 file changed, 63 insertions(+), 60 deletions(-)

diff --git a/drivers/net/intel/i40e/i40e_rxtx.c b/drivers/net/intel/i40e/i40e_rxtx.c
index b75306931a..183b70c63f 100644
--- a/drivers/net/intel/i40e/i40e_rxtx.c
+++ b/drivers/net/intel/i40e/i40e_rxtx.c
@@ -321,11 +321,6 @@ i40e_txd_enable_checksum(uint64_t ol_flags,
 			uint32_t *td_offset,
 			union ci_tx_offload tx_offload)
 {
-	/* Set MACLEN */
-	if (!(ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK))
-		*td_offset |= (tx_offload.l2_len >> 1)
-			<< CI_TX_DESC_LEN_MACLEN_S;
-
 	/* Enable L3 checksum offloads */
 	if (ol_flags & RTE_MBUF_F_TX_IP_CKSUM) {
 		*td_cmd |= CI_TX_DESC_CMD_IIPT_IPV4_CSUM;
@@ -1004,7 +999,7 @@ i40e_calc_context_desc(uint64_t flags)
 
 /* set i40e TSO context descriptor */
 static inline uint64_t
-i40e_set_tso_ctx(struct rte_mbuf *mbuf, union ci_tx_offload tx_offload)
+i40e_set_tso_ctx(uint64_t ol_flags, const struct rte_mbuf *mbuf, union ci_tx_offload tx_offload)
 {
 	uint64_t ctx_desc = 0;
 	uint32_t cd_cmd, hdr_len, cd_tso_len;
@@ -1015,7 +1010,7 @@ i40e_set_tso_ctx(struct rte_mbuf *mbuf, union ci_tx_offload tx_offload)
 	}
 
 	hdr_len = tx_offload.l2_len + tx_offload.l3_len + tx_offload.l4_len;
-	hdr_len += (mbuf->ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK) ?
+	hdr_len += (ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK) ?
 		   tx_offload.outer_l2_len + tx_offload.outer_l3_len : 0;
 
 	cd_cmd = I40E_TX_CTX_DESC_TSO;
@@ -1029,6 +1024,52 @@ i40e_set_tso_ctx(struct rte_mbuf *mbuf, union ci_tx_offload tx_offload)
 	return ctx_desc;
 }
 
+/* compute a context descriptor if one is necessary based on the ol_flags
+ *
+ * Returns 0 if no descriptor is necessary.
+ * Returns 1 if one is necessary and the contents of the descriptor are returned
+ *   in the values pointed to by qw0 and qw1.
+ */
+static __rte_always_inline uint16_t
+get_context_desc(uint64_t ol_flags, const struct rte_mbuf *tx_pkt,
+		 const union ci_tx_offload *tx_offload,
+		 const struct ci_tx_queue *txq __rte_unused,
+		 uint64_t *qw0, uint64_t *qw1)
+{
+	uint16_t cd_l2tag2 = 0;
+	uint64_t cd_type_cmd_tso_mss = I40E_TX_DESC_DTYPE_CONTEXT;
+	uint32_t cd_tunneling_params = 0;
+
+	if (i40e_calc_context_desc(ol_flags) == 0)
+		return 0;
+
+	if (ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK)
+		i40e_parse_tunneling_params(ol_flags, *tx_offload, &cd_tunneling_params);
+
+	if (ol_flags & RTE_MBUF_F_TX_TCP_SEG) {
+		cd_type_cmd_tso_mss |= i40e_set_tso_ctx(ol_flags, tx_pkt, *tx_offload);
+	} else {
+#ifdef RTE_LIBRTE_IEEE1588
+		if (ol_flags & RTE_MBUF_F_TX_IEEE1588_TMST)
+			cd_type_cmd_tso_mss |=
+				((uint64_t)I40E_TX_CTX_DESC_TSYN << I40E_TXD_CTX_QW1_CMD_SHIFT);
+#endif
+	}
+
+	/* TX context descriptor based double VLAN insert */
+	if (ol_flags & RTE_MBUF_F_TX_QINQ) {
+		cd_l2tag2 = tx_pkt->vlan_tci_outer;
+		cd_type_cmd_tso_mss |=
+				((uint64_t)I40E_TX_CTX_DESC_IL2TAG2 << I40E_TXD_CTX_QW1_CMD_SHIFT);
+	}
+
+	*qw0 = rte_cpu_to_le_32(cd_tunneling_params) |
+		((uint64_t)rte_cpu_to_le_16(cd_l2tag2) << 32);
+	*qw1 = rte_cpu_to_le_64(cd_type_cmd_tso_mss);
+
+	return 1;
+}
+
 uint16_t
 i40e_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 {
@@ -1039,7 +1080,6 @@ i40e_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 	volatile struct ci_tx_desc *txr;
 	struct rte_mbuf *tx_pkt;
 	struct rte_mbuf *m_seg;
-	uint32_t cd_tunneling_params;
 	uint16_t tx_id;
 	uint16_t nb_tx;
 	uint32_t td_cmd;
@@ -1050,6 +1090,7 @@ i40e_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 	uint16_t nb_ctx;
 	uint16_t tx_last;
 	uint16_t slen;
+	uint16_t l2_len;
 	uint64_t buf_dma_addr;
 	union ci_tx_offload tx_offload = {0};
 
@@ -1064,14 +1105,15 @@ i40e_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 		(void)ci_tx_xmit_cleanup(txq);
 
 	for (nb_tx = 0; nb_tx < nb_pkts; nb_tx++) {
-		td_cmd = 0;
-		td_tag = 0;
-		td_offset = 0;
-
 		tx_pkt = *tx_pkts++;
 		RTE_MBUF_PREFETCH_TO_FREE(txe->mbuf);
 
 		ol_flags = tx_pkt->ol_flags;
+		td_cmd = 0;
+		td_tag = 0;
+		l2_len = ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK ?
+				tx_pkt->outer_l2_len : tx_pkt->l2_len;
+		td_offset = (l2_len >> 1) << CI_TX_DESC_LEN_MACLEN_S;
 		tx_offload.l2_len = tx_pkt->l2_len;
 		tx_offload.l3_len = tx_pkt->l3_len;
 		tx_offload.outer_l2_len = tx_pkt->outer_l2_len;
@@ -1080,7 +1122,9 @@ i40e_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 		tx_offload.tso_segsz = tx_pkt->tso_segsz;
 
 		/* Calculate the number of context descriptors needed. */
-		nb_ctx = i40e_calc_context_desc(ol_flags);
+		uint64_t cd_qw0 = 0, cd_qw1 = 0;
+		nb_ctx = get_context_desc(ol_flags, tx_pkt, &tx_offload, txq,
+				&cd_qw0, &cd_qw1);
 
 		/**
 		 * The number of descriptors that must be allocated for
@@ -1126,14 +1170,6 @@ i40e_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 		/* Always enable CRC offload insertion */
 		td_cmd |= CI_TX_DESC_CMD_ICRC;
 
-		/* Fill in tunneling parameters if necessary */
-		cd_tunneling_params = 0;
-		if (ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK) {
-			td_offset |= (tx_offload.outer_l2_len >> 1)
-					<< CI_TX_DESC_LEN_MACLEN_S;
-			i40e_parse_tunneling_params(ol_flags, tx_offload,
-						    &cd_tunneling_params);
-		}
 		/* Enable checksum offloading */
 		if (ol_flags & I40E_TX_CKSUM_OFFLOAD_MASK)
 			i40e_txd_enable_checksum(ol_flags, &td_cmd,
@@ -1141,12 +1177,7 @@ i40e_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 
 		if (nb_ctx) {
 			/* Setup TX context descriptor if required */
-			volatile struct i40e_tx_context_desc *ctx_txd =
-				(volatile struct i40e_tx_context_desc *)\
-							&txr[tx_id];
-			uint16_t cd_l2tag2 = 0;
-			uint64_t cd_type_cmd_tso_mss =
-				I40E_TX_DESC_DTYPE_CONTEXT;
+			uint64_t *desc = RTE_CAST_PTR(uint64_t *, &txr[tx_id]);
 
 			txn = &sw_ring[txe->next_id];
 			RTE_MBUF_PREFETCH_TO_FREE(txn->mbuf);
@@ -1155,41 +1186,13 @@ i40e_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 				txe->mbuf = NULL;
 			}
 
-			/* TSO enabled means no timestamp */
-			if (ol_flags & RTE_MBUF_F_TX_TCP_SEG)
-				cd_type_cmd_tso_mss |=
-					i40e_set_tso_ctx(tx_pkt, tx_offload);
-			else {
-#ifdef RTE_LIBRTE_IEEE1588
-				if (ol_flags & RTE_MBUF_F_TX_IEEE1588_TMST)
-					cd_type_cmd_tso_mss |=
-						((uint64_t)I40E_TX_CTX_DESC_TSYN <<
-						 I40E_TXD_CTX_QW1_CMD_SHIFT);
-#endif
-			}
-
-			ctx_txd->tunneling_params =
-				rte_cpu_to_le_32(cd_tunneling_params);
-			if (ol_flags & RTE_MBUF_F_TX_QINQ) {
-				cd_l2tag2 = tx_pkt->vlan_tci_outer;
-				cd_type_cmd_tso_mss |=
-					((uint64_t)I40E_TX_CTX_DESC_IL2TAG2 <<
-						I40E_TXD_CTX_QW1_CMD_SHIFT);
-			}
-			ctx_txd->l2tag2 = rte_cpu_to_le_16(cd_l2tag2);
-			ctx_txd->type_cmd_tso_mss =
-				rte_cpu_to_le_64(cd_type_cmd_tso_mss);
+			desc[0] = cd_qw0;
+			desc[1] = cd_qw1;
 
 			PMD_TX_LOG(DEBUG, "mbuf: %p, TCD[%u]: "
-				"tunneling_params: %#x; "
-				"l2tag2: %#hx; "
-				"rsvd: %#hx; "
-				"type_cmd_tso_mss: %#"PRIx64";",
-				tx_pkt, tx_id,
-				ctx_txd->tunneling_params,
-				ctx_txd->l2tag2,
-				ctx_txd->rsvd,
-				ctx_txd->type_cmd_tso_mss);
+				"qw0: %#"PRIx64"; "
+				"qw1: %#"PRIx64";",
+				tx_pkt, tx_id, cd_qw0, cd_qw1);
 
 			txe->last_id = tx_last;
 			tx_id = txe->next_id;
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [PATCH v3 09/36] net/idpf: refactor context descriptor handling
  2026-01-30 11:41 ` [PATCH v3 " Bruce Richardson
                     ` (7 preceding siblings ...)
  2026-01-30 11:41   ` [PATCH v3 08/36] net/i40e: " Bruce Richardson
@ 2026-01-30 11:41   ` Bruce Richardson
  2026-02-06 10:59     ` Loftus, Ciara
  2026-01-30 11:41   ` [PATCH v3 10/36] net/intel: consolidate checksum mask definition Bruce Richardson
                     ` (26 subsequent siblings)
  35 siblings, 1 reply; 274+ messages in thread
From: Bruce Richardson @ 2026-01-30 11:41 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson, Jingjing Wu, Praveen Shetty

Move all context descriptor handling into a single function, as was done
for the ice driver.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/net/intel/idpf/idpf_common_rxtx.c | 61 +++++++++++------------
 1 file changed, 28 insertions(+), 33 deletions(-)

diff --git a/drivers/net/intel/idpf/idpf_common_rxtx.c b/drivers/net/intel/idpf/idpf_common_rxtx.c
index 11d6848430..9219ad9047 100644
--- a/drivers/net/intel/idpf/idpf_common_rxtx.c
+++ b/drivers/net/intel/idpf/idpf_common_rxtx.c
@@ -845,37 +845,36 @@ idpf_calc_context_desc(uint64_t flags)
 	return 0;
 }
 
-/* set TSO context descriptor
+/* set TSO context descriptor, returns 0 if no context needed, 1 if context set
  */
-static inline void
-idpf_set_splitq_tso_ctx(struct rte_mbuf *mbuf,
+static inline uint16_t
+idpf_set_tso_ctx(uint64_t ol_flags, struct rte_mbuf *mbuf,
 			union ci_tx_offload tx_offload,
-			volatile union idpf_flex_tx_ctx_desc *ctx_desc)
+			uint64_t *qw0, uint64_t *qw1)
 {
-	uint16_t cmd_dtype;
+	uint16_t cmd_dtype = IDPF_TX_DESC_DTYPE_FLEX_TSO_CTX | IDPF_TX_FLEX_CTX_DESC_CMD_TSO;
+	uint16_t tso_segsz = mbuf->tso_segsz;
 	uint32_t tso_len;
 	uint8_t hdr_len;
 
+	if (idpf_calc_context_desc(ol_flags) == 0)
+		return 0;
+
+	/* TSO context descriptor setup */
 	if (tx_offload.l4_len == 0) {
 		TX_LOG(DEBUG, "L4 length set to 0");
-		return;
+		return 0;
 	}
 
-	hdr_len = tx_offload.l2_len +
-		tx_offload.l3_len +
-		tx_offload.l4_len;
-	cmd_dtype = IDPF_TX_DESC_DTYPE_FLEX_TSO_CTX |
-		IDPF_TX_FLEX_CTX_DESC_CMD_TSO;
+	hdr_len = tx_offload.l2_len + tx_offload.l3_len + tx_offload.l4_len;
 	tso_len = mbuf->pkt_len - hdr_len;
 
-	ctx_desc->tso.qw1.cmd_dtype = rte_cpu_to_le_16(cmd_dtype);
-	ctx_desc->tso.qw0.hdr_len = hdr_len;
-	ctx_desc->tso.qw0.mss_rt =
-		rte_cpu_to_le_16((uint16_t)mbuf->tso_segsz &
-				 IDPF_TXD_FLEX_CTX_MSS_RT_M);
-	ctx_desc->tso.qw0.flex_tlen =
-		rte_cpu_to_le_32(tso_len &
-				 IDPF_TXD_FLEX_CTX_MSS_RT_M);
+	*qw0 = rte_cpu_to_le_32(tso_len & IDPF_TXD_FLEX_CTX_MSS_RT_M) |
+	       ((uint64_t)rte_cpu_to_le_16(tso_segsz & IDPF_TXD_FLEX_CTX_MSS_RT_M) << 32) |
+	       ((uint64_t)hdr_len << 48);
+	*qw1 = rte_cpu_to_le_16(cmd_dtype);
+
+	return 1;
 }
 
 RTE_EXPORT_INTERNAL_SYMBOL(idpf_dp_splitq_xmit_pkts)
@@ -933,7 +932,8 @@ idpf_dp_splitq_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 		tx_offload.l4_len = tx_pkt->l4_len;
 		tx_offload.tso_segsz = tx_pkt->tso_segsz;
 		/* Calculate the number of context descriptors needed. */
-		nb_ctx = idpf_calc_context_desc(ol_flags);
+		uint64_t cd_qw0, cd_qw1;
+		nb_ctx = idpf_set_tso_ctx(ol_flags, tx_pkt, tx_offload, &cd_qw0, &cd_qw1);
 
 		/* Calculate the number of TX descriptors needed for
 		 * each packet. For TSO packets, use ci_calc_pkt_desc as
@@ -950,12 +950,10 @@ idpf_dp_splitq_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 
 		/* context descriptor */
 		if (nb_ctx != 0) {
-			volatile union idpf_flex_tx_ctx_desc *ctx_desc =
-				(volatile union idpf_flex_tx_ctx_desc *)&txr[tx_id];
+			uint64_t *ctx_desc = RTE_CAST_PTR(uint64_t *, &txr[tx_id]);
 
-			if ((ol_flags & RTE_MBUF_F_TX_TCP_SEG) != 0)
-				idpf_set_splitq_tso_ctx(tx_pkt, tx_offload,
-							ctx_desc);
+			ctx_desc[0] = cd_qw0;
+			ctx_desc[1] = cd_qw1;
 
 			tx_id++;
 			if (tx_id == txq->nb_tx_desc)
@@ -1388,7 +1386,8 @@ idpf_dp_singleq_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 		tx_offload.l4_len = tx_pkt->l4_len;
 		tx_offload.tso_segsz = tx_pkt->tso_segsz;
 		/* Calculate the number of context descriptors needed. */
-		nb_ctx = idpf_calc_context_desc(ol_flags);
+		uint64_t cd_qw0, cd_qw1;
+		nb_ctx = idpf_set_tso_ctx(ol_flags, tx_pkt, tx_offload, &cd_qw0, &cd_qw1);
 
 		/* The number of descriptors that must be allocated for
 		 * a packet. For TSO packets, use ci_calc_pkt_desc as
@@ -1431,9 +1430,7 @@ idpf_dp_singleq_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 
 		if (nb_ctx != 0) {
 			/* Setup TX context descriptor if required */
-			volatile union idpf_flex_tx_ctx_desc *ctx_txd =
-				(volatile union idpf_flex_tx_ctx_desc *)
-				&txr[tx_id];
+			uint64_t *ctx_txd = RTE_CAST_PTR(uint64_t *, &txr[tx_id]);
 
 			txn = &sw_ring[txe->next_id];
 			RTE_MBUF_PREFETCH_TO_FREE(txn->mbuf);
@@ -1442,10 +1439,8 @@ idpf_dp_singleq_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 				txe->mbuf = NULL;
 			}
 
-			/* TSO enabled */
-			if ((ol_flags & RTE_MBUF_F_TX_TCP_SEG) != 0)
-				idpf_set_splitq_tso_ctx(tx_pkt, tx_offload,
-							ctx_txd);
+			ctx_txd[0] = cd_qw0;
+			ctx_txd[1] = cd_qw1;
 
 			txe->last_id = tx_last;
 			tx_id = txe->next_id;
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [PATCH v3 10/36] net/intel: consolidate checksum mask definition
  2026-01-30 11:41 ` [PATCH v3 " Bruce Richardson
                     ` (8 preceding siblings ...)
  2026-01-30 11:41   ` [PATCH v3 09/36] net/idpf: " Bruce Richardson
@ 2026-01-30 11:41   ` Bruce Richardson
  2026-02-06 11:25     ` Loftus, Ciara
  2026-01-30 11:41   ` [PATCH v3 11/36] net/intel: create common checksum Tx offload function Bruce Richardson
                     ` (25 subsequent siblings)
  35 siblings, 1 reply; 274+ messages in thread
From: Bruce Richardson @ 2026-01-30 11:41 UTC (permalink / raw)
  To: dev
  Cc: Bruce Richardson, Vladimir Medvedkin, Anatoly Burakov,
	Jingjing Wu, Praveen Shetty

Create a common definition for checksum masks across iavf, idpf, i40e
and ice drivers.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/net/intel/common/tx.h             | 8 ++++++++
 drivers/net/intel/i40e/i40e_rxtx.c        | 7 +------
 drivers/net/intel/iavf/iavf_rxtx.c        | 2 +-
 drivers/net/intel/iavf/iavf_rxtx.h        | 8 --------
 drivers/net/intel/ice/ice_rxtx.c          | 8 +-------
 drivers/net/intel/idpf/idpf_common_rxtx.c | 4 ++--
 drivers/net/intel/idpf/idpf_common_rxtx.h | 7 +------
 7 files changed, 14 insertions(+), 30 deletions(-)

diff --git a/drivers/net/intel/common/tx.h b/drivers/net/intel/common/tx.h
index 01e42303b4..928fad1df5 100644
--- a/drivers/net/intel/common/tx.h
+++ b/drivers/net/intel/common/tx.h
@@ -53,6 +53,14 @@
 /* Common maximum data per TX descriptor */
 #define CI_MAX_DATA_PER_TXD     (CI_TXD_QW1_TX_BUF_SZ_M >> CI_TXD_QW1_TX_BUF_SZ_S)
 
+/* Checksum offload mask to identify packets requesting offload */
+#define CI_TX_CKSUM_OFFLOAD_MASK (RTE_MBUF_F_TX_IP_CKSUM |		 \
+				   RTE_MBUF_F_TX_L4_MASK |		 \
+				   RTE_MBUF_F_TX_TCP_SEG |		 \
+				   RTE_MBUF_F_TX_UDP_SEG |		 \
+				   RTE_MBUF_F_TX_OUTER_IP_CKSUM |	 \
+				   RTE_MBUF_F_TX_OUTER_UDP_CKSUM)
+
 /**
  * Common TX offload union for Intel drivers.
  * Supports both basic offloads (l2_len, l3_len, l4_len, tso_segsz) and
diff --git a/drivers/net/intel/i40e/i40e_rxtx.c b/drivers/net/intel/i40e/i40e_rxtx.c
index 183b70c63f..a5349990f3 100644
--- a/drivers/net/intel/i40e/i40e_rxtx.c
+++ b/drivers/net/intel/i40e/i40e_rxtx.c
@@ -53,11 +53,6 @@
 #define I40E_TX_IEEE1588_TMST 0
 #endif
 
-#define I40E_TX_CKSUM_OFFLOAD_MASK (RTE_MBUF_F_TX_IP_CKSUM |		 \
-		RTE_MBUF_F_TX_L4_MASK |		 \
-		RTE_MBUF_F_TX_TCP_SEG |		 \
-		RTE_MBUF_F_TX_OUTER_IP_CKSUM)
-
 #define I40E_TX_OFFLOAD_MASK (RTE_MBUF_F_TX_OUTER_IPV4 |	\
 		RTE_MBUF_F_TX_OUTER_IPV6 |	\
 		RTE_MBUF_F_TX_IPV4 |		\
@@ -1171,7 +1166,7 @@ i40e_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 		td_cmd |= CI_TX_DESC_CMD_ICRC;
 
 		/* Enable checksum offloading */
-		if (ol_flags & I40E_TX_CKSUM_OFFLOAD_MASK)
+		if (ol_flags & CI_TX_CKSUM_OFFLOAD_MASK)
 			i40e_txd_enable_checksum(ol_flags, &td_cmd,
 						 &td_offset, tx_offload);
 
diff --git a/drivers/net/intel/iavf/iavf_rxtx.c b/drivers/net/intel/iavf/iavf_rxtx.c
index 885d9309cc..3dbcfd5355 100644
--- a/drivers/net/intel/iavf/iavf_rxtx.c
+++ b/drivers/net/intel/iavf/iavf_rxtx.c
@@ -2596,7 +2596,7 @@ iavf_build_data_desc_cmd_offset_fields(volatile uint64_t *qw1,
 	}
 
 	if ((m->ol_flags &
-	    (IAVF_TX_CKSUM_OFFLOAD_MASK | RTE_MBUF_F_TX_SEC_OFFLOAD)) == 0)
+	    (CI_TX_CKSUM_OFFLOAD_MASK | RTE_MBUF_F_TX_SEC_OFFLOAD)) == 0)
 		goto skip_cksum;
 
 	/* Set MACLEN */
diff --git a/drivers/net/intel/iavf/iavf_rxtx.h b/drivers/net/intel/iavf/iavf_rxtx.h
index 395d97b4ee..cca5c25119 100644
--- a/drivers/net/intel/iavf/iavf_rxtx.h
+++ b/drivers/net/intel/iavf/iavf_rxtx.h
@@ -136,14 +136,6 @@
 
 #define IAVF_TX_MIN_PKT_LEN 17
 
-#define IAVF_TX_CKSUM_OFFLOAD_MASK (		 \
-		RTE_MBUF_F_TX_IP_CKSUM |		 \
-		RTE_MBUF_F_TX_L4_MASK |		 \
-		RTE_MBUF_F_TX_TCP_SEG |          \
-		RTE_MBUF_F_TX_UDP_SEG |          \
-		RTE_MBUF_F_TX_OUTER_IP_CKSUM |   \
-		RTE_MBUF_F_TX_OUTER_UDP_CKSUM)
-
 #define IAVF_TX_OFFLOAD_MASK (  \
 		RTE_MBUF_F_TX_OUTER_IPV6 |		 \
 		RTE_MBUF_F_TX_OUTER_IPV4 |		 \
diff --git a/drivers/net/intel/ice/ice_rxtx.c b/drivers/net/intel/ice/ice_rxtx.c
index cc442fed75..99751bceb7 100644
--- a/drivers/net/intel/ice/ice_rxtx.c
+++ b/drivers/net/intel/ice/ice_rxtx.c
@@ -13,12 +13,6 @@
 #include "../common/rx_vec_x86.h"
 #endif
 
-#define ICE_TX_CKSUM_OFFLOAD_MASK (RTE_MBUF_F_TX_IP_CKSUM |		 \
-		RTE_MBUF_F_TX_L4_MASK |		 \
-		RTE_MBUF_F_TX_TCP_SEG |		 \
-		RTE_MBUF_F_TX_UDP_SEG |		 \
-		RTE_MBUF_F_TX_OUTER_IP_CKSUM)
-
 /**
  * The mbuf dynamic field pointer for protocol extraction metadata.
  */
@@ -3214,7 +3208,7 @@ ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 		}
 
 		/* Enable checksum offloading */
-		if (ol_flags & ICE_TX_CKSUM_OFFLOAD_MASK)
+		if (ol_flags & CI_TX_CKSUM_OFFLOAD_MASK)
 			ice_txd_enable_checksum(ol_flags, &td_cmd,
 						&td_offset, tx_offload);
 
diff --git a/drivers/net/intel/idpf/idpf_common_rxtx.c b/drivers/net/intel/idpf/idpf_common_rxtx.c
index 9219ad9047..b34d545a0a 100644
--- a/drivers/net/intel/idpf/idpf_common_rxtx.c
+++ b/drivers/net/intel/idpf/idpf_common_rxtx.c
@@ -945,7 +945,7 @@ idpf_dp_splitq_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 		else
 			nb_used = tx_pkt->nb_segs + nb_ctx;
 
-		if (ol_flags & IDPF_TX_CKSUM_OFFLOAD_MASK)
+		if (ol_flags & CI_TX_CKSUM_OFFLOAD_MASK)
 			cmd_dtype = IDPF_TXD_FLEX_FLOW_CMD_CS_EN;
 
 		/* context descriptor */
@@ -1425,7 +1425,7 @@ idpf_dp_singleq_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 			}
 		}
 
-		if (ol_flags & IDPF_TX_CKSUM_OFFLOAD_MASK)
+		if (ol_flags & CI_TX_CKSUM_OFFLOAD_MASK)
 			td_cmd |= IDPF_TX_FLEX_DESC_CMD_CS_EN;
 
 		if (nb_ctx != 0) {
diff --git a/drivers/net/intel/idpf/idpf_common_rxtx.h b/drivers/net/intel/idpf/idpf_common_rxtx.h
index b88a87402d..fe7094d434 100644
--- a/drivers/net/intel/idpf/idpf_common_rxtx.h
+++ b/drivers/net/intel/idpf/idpf_common_rxtx.h
@@ -39,13 +39,8 @@
 #define IDPF_RLAN_CTX_DBUF_S	7
 #define IDPF_RX_MAX_DATA_BUF_SIZE	(16 * 1024 - 128)
 
-#define IDPF_TX_CKSUM_OFFLOAD_MASK (		\
-		RTE_MBUF_F_TX_IP_CKSUM |	\
-		RTE_MBUF_F_TX_L4_MASK |		\
-		RTE_MBUF_F_TX_TCP_SEG)
-
 #define IDPF_TX_OFFLOAD_MASK (			\
-		IDPF_TX_CKSUM_OFFLOAD_MASK |	\
+		CI_TX_CKSUM_OFFLOAD_MASK |	\
 		RTE_MBUF_F_TX_IPV4 |		\
 		RTE_MBUF_F_TX_IPV6)
 
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [PATCH v3 11/36] net/intel: create common checksum Tx offload function
  2026-01-30 11:41 ` [PATCH v3 " Bruce Richardson
                     ` (9 preceding siblings ...)
  2026-01-30 11:41   ` [PATCH v3 10/36] net/intel: consolidate checksum mask definition Bruce Richardson
@ 2026-01-30 11:41   ` Bruce Richardson
  2026-02-06 11:37     ` Loftus, Ciara
  2026-01-30 11:41   ` [PATCH v3 12/36] net/intel: create a common scalar Tx function Bruce Richardson
                     ` (24 subsequent siblings)
  35 siblings, 1 reply; 274+ messages in thread
From: Bruce Richardson @ 2026-01-30 11:41 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson, Anatoly Burakov

Since i40e and ice share the same checksum offload logic, merge their two
functions into one. Future rework should enable more drivers to use this
function as well.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/net/intel/common/tx_scalar_fns.h | 58 +++++++++++++++++++++++
 drivers/net/intel/i40e/i40e_rxtx.c       | 52 +-------------------
 drivers/net/intel/ice/ice_rxtx.c         | 60 +-----------------------
 3 files changed, 60 insertions(+), 110 deletions(-)

diff --git a/drivers/net/intel/common/tx_scalar_fns.h b/drivers/net/intel/common/tx_scalar_fns.h
index f894cea616..f88ca7f25a 100644
--- a/drivers/net/intel/common/tx_scalar_fns.h
+++ b/drivers/net/intel/common/tx_scalar_fns.h
@@ -64,6 +64,64 @@ ci_tx_xmit_cleanup(struct ci_tx_queue *txq)
 	return 0;
 }
 
+/* Common checksum enable function for Intel drivers (ice, i40e, etc.) */
+static inline void
+ci_txd_enable_checksum(uint64_t ol_flags,
+		       uint32_t *td_cmd,
+		       uint32_t *td_offset,
+		       union ci_tx_offload tx_offload)
+{
+	/* Enable L3 checksum offloads */
+	if (ol_flags & RTE_MBUF_F_TX_IP_CKSUM) {
+		*td_cmd |= CI_TX_DESC_CMD_IIPT_IPV4_CSUM;
+		*td_offset |= (tx_offload.l3_len >> 2) <<
+			CI_TX_DESC_LEN_IPLEN_S;
+	} else if (ol_flags & RTE_MBUF_F_TX_IPV4) {
+		*td_cmd |= CI_TX_DESC_CMD_IIPT_IPV4;
+		*td_offset |= (tx_offload.l3_len >> 2) <<
+			CI_TX_DESC_LEN_IPLEN_S;
+	} else if (ol_flags & RTE_MBUF_F_TX_IPV6) {
+		*td_cmd |= CI_TX_DESC_CMD_IIPT_IPV6;
+		*td_offset |= (tx_offload.l3_len >> 2) <<
+			CI_TX_DESC_LEN_IPLEN_S;
+	}
+
+	if (ol_flags & RTE_MBUF_F_TX_TCP_SEG) {
+		*td_cmd |= CI_TX_DESC_CMD_L4T_EOFT_TCP;
+		*td_offset |= (tx_offload.l4_len >> 2) <<
+			      CI_TX_DESC_LEN_L4_LEN_S;
+		return;
+	}
+
+	if (ol_flags & RTE_MBUF_F_TX_UDP_SEG) {
+		*td_cmd |= CI_TX_DESC_CMD_L4T_EOFT_UDP;
+		*td_offset |= (tx_offload.l4_len >> 2) <<
+			      CI_TX_DESC_LEN_L4_LEN_S;
+		return;
+	}
+
+	/* Enable L4 checksum offloads */
+	switch (ol_flags & RTE_MBUF_F_TX_L4_MASK) {
+	case RTE_MBUF_F_TX_TCP_CKSUM:
+		*td_cmd |= CI_TX_DESC_CMD_L4T_EOFT_TCP;
+		*td_offset |= (sizeof(struct rte_tcp_hdr) >> 2) <<
+			      CI_TX_DESC_LEN_L4_LEN_S;
+		break;
+	case RTE_MBUF_F_TX_SCTP_CKSUM:
+		*td_cmd |= CI_TX_DESC_CMD_L4T_EOFT_SCTP;
+		*td_offset |= (sizeof(struct rte_sctp_hdr) >> 2) <<
+			      CI_TX_DESC_LEN_L4_LEN_S;
+		break;
+	case RTE_MBUF_F_TX_UDP_CKSUM:
+		*td_cmd |= CI_TX_DESC_CMD_L4T_EOFT_UDP;
+		*td_offset |= (sizeof(struct rte_udp_hdr) >> 2) <<
+			      CI_TX_DESC_LEN_L4_LEN_S;
+		break;
+	default:
+		break;
+	}
+}
+
 static inline uint16_t
 ci_div_roundup16(uint16_t x, uint16_t y)
 {
diff --git a/drivers/net/intel/i40e/i40e_rxtx.c b/drivers/net/intel/i40e/i40e_rxtx.c
index a5349990f3..1ad445c47b 100644
--- a/drivers/net/intel/i40e/i40e_rxtx.c
+++ b/drivers/net/intel/i40e/i40e_rxtx.c
@@ -310,56 +310,6 @@ i40e_parse_tunneling_params(uint64_t ol_flags,
 		*cd_tunneling |= I40E_TXD_CTX_QW0_L4T_CS_MASK;
 }
 
-static inline void
-i40e_txd_enable_checksum(uint64_t ol_flags,
-			uint32_t *td_cmd,
-			uint32_t *td_offset,
-			union ci_tx_offload tx_offload)
-{
-	/* Enable L3 checksum offloads */
-	if (ol_flags & RTE_MBUF_F_TX_IP_CKSUM) {
-		*td_cmd |= CI_TX_DESC_CMD_IIPT_IPV4_CSUM;
-		*td_offset |= (tx_offload.l3_len >> 2)
-				<< CI_TX_DESC_LEN_IPLEN_S;
-	} else if (ol_flags & RTE_MBUF_F_TX_IPV4) {
-		*td_cmd |= CI_TX_DESC_CMD_IIPT_IPV4;
-		*td_offset |= (tx_offload.l3_len >> 2)
-				<< CI_TX_DESC_LEN_IPLEN_S;
-	} else if (ol_flags & RTE_MBUF_F_TX_IPV6) {
-		*td_cmd |= CI_TX_DESC_CMD_IIPT_IPV6;
-		*td_offset |= (tx_offload.l3_len >> 2)
-				<< CI_TX_DESC_LEN_IPLEN_S;
-	}
-
-	if (ol_flags & RTE_MBUF_F_TX_TCP_SEG) {
-		*td_cmd |= CI_TX_DESC_CMD_L4T_EOFT_TCP;
-		*td_offset |= (tx_offload.l4_len >> 2)
-			<< CI_TX_DESC_LEN_L4_LEN_S;
-		return;
-	}
-
-	/* Enable L4 checksum offloads */
-	switch (ol_flags & RTE_MBUF_F_TX_L4_MASK) {
-	case RTE_MBUF_F_TX_TCP_CKSUM:
-		*td_cmd |= CI_TX_DESC_CMD_L4T_EOFT_TCP;
-		*td_offset |= (sizeof(struct rte_tcp_hdr) >> 2) <<
-				CI_TX_DESC_LEN_L4_LEN_S;
-		break;
-	case RTE_MBUF_F_TX_SCTP_CKSUM:
-		*td_cmd |= CI_TX_DESC_CMD_L4T_EOFT_SCTP;
-		*td_offset |= (sizeof(struct rte_sctp_hdr) >> 2) <<
-				CI_TX_DESC_LEN_L4_LEN_S;
-		break;
-	case RTE_MBUF_F_TX_UDP_CKSUM:
-		*td_cmd |= CI_TX_DESC_CMD_L4T_EOFT_UDP;
-		*td_offset |= (sizeof(struct rte_udp_hdr) >> 2) <<
-				CI_TX_DESC_LEN_L4_LEN_S;
-		break;
-	default:
-		break;
-	}
-}
-
 /* Construct the tx flags */
 static inline uint64_t
 i40e_build_ctob(uint32_t td_cmd,
@@ -1167,7 +1117,7 @@ i40e_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 
 		/* Enable checksum offloading */
 		if (ol_flags & CI_TX_CKSUM_OFFLOAD_MASK)
-			i40e_txd_enable_checksum(ol_flags, &td_cmd,
+			ci_txd_enable_checksum(ol_flags, &td_cmd,
 						 &td_offset, tx_offload);
 
 		if (nb_ctx) {
diff --git a/drivers/net/intel/ice/ice_rxtx.c b/drivers/net/intel/ice/ice_rxtx.c
index 99751bceb7..8650925577 100644
--- a/drivers/net/intel/ice/ice_rxtx.c
+++ b/drivers/net/intel/ice/ice_rxtx.c
@@ -2954,64 +2954,6 @@ ice_parse_tunneling_params(uint64_t ol_flags,
 		*cd_tunneling |= ICE_TXD_CTX_QW0_L4T_CS_M;
 }
 
-static inline void
-ice_txd_enable_checksum(uint64_t ol_flags,
-			uint32_t *td_cmd,
-			uint32_t *td_offset,
-			union ci_tx_offload tx_offload)
-{
-
-	/* Enable L3 checksum offloads */
-	if (ol_flags & RTE_MBUF_F_TX_IP_CKSUM) {
-		*td_cmd |= CI_TX_DESC_CMD_IIPT_IPV4_CSUM;
-		*td_offset |= (tx_offload.l3_len >> 2) <<
-			CI_TX_DESC_LEN_IPLEN_S;
-	} else if (ol_flags & RTE_MBUF_F_TX_IPV4) {
-		*td_cmd |= CI_TX_DESC_CMD_IIPT_IPV4;
-		*td_offset |= (tx_offload.l3_len >> 2) <<
-			CI_TX_DESC_LEN_IPLEN_S;
-	} else if (ol_flags & RTE_MBUF_F_TX_IPV6) {
-		*td_cmd |= CI_TX_DESC_CMD_IIPT_IPV6;
-		*td_offset |= (tx_offload.l3_len >> 2) <<
-			CI_TX_DESC_LEN_IPLEN_S;
-	}
-
-	if (ol_flags & RTE_MBUF_F_TX_TCP_SEG) {
-		*td_cmd |= CI_TX_DESC_CMD_L4T_EOFT_TCP;
-		*td_offset |= (tx_offload.l4_len >> 2) <<
-			      CI_TX_DESC_LEN_L4_LEN_S;
-		return;
-	}
-
-	if (ol_flags & RTE_MBUF_F_TX_UDP_SEG) {
-		*td_cmd |= CI_TX_DESC_CMD_L4T_EOFT_UDP;
-		*td_offset |= (tx_offload.l4_len >> 2) <<
-			      CI_TX_DESC_LEN_L4_LEN_S;
-		return;
-	}
-
-	/* Enable L4 checksum offloads */
-	switch (ol_flags & RTE_MBUF_F_TX_L4_MASK) {
-	case RTE_MBUF_F_TX_TCP_CKSUM:
-		*td_cmd |= CI_TX_DESC_CMD_L4T_EOFT_TCP;
-		*td_offset |= (sizeof(struct rte_tcp_hdr) >> 2) <<
-			      CI_TX_DESC_LEN_L4_LEN_S;
-		break;
-	case RTE_MBUF_F_TX_SCTP_CKSUM:
-		*td_cmd |= CI_TX_DESC_CMD_L4T_EOFT_SCTP;
-		*td_offset |= (sizeof(struct rte_sctp_hdr) >> 2) <<
-			      CI_TX_DESC_LEN_L4_LEN_S;
-		break;
-	case RTE_MBUF_F_TX_UDP_CKSUM:
-		*td_cmd |= CI_TX_DESC_CMD_L4T_EOFT_UDP;
-		*td_offset |= (sizeof(struct rte_udp_hdr) >> 2) <<
-			      CI_TX_DESC_LEN_L4_LEN_S;
-		break;
-	default:
-		break;
-	}
-}
-
 /* Construct the tx flags */
 static inline uint64_t
 ice_build_ctob(uint32_t td_cmd,
@@ -3209,7 +3151,7 @@ ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 
 		/* Enable checksum offloading */
 		if (ol_flags & CI_TX_CKSUM_OFFLOAD_MASK)
-			ice_txd_enable_checksum(ol_flags, &td_cmd,
+			ci_txd_enable_checksum(ol_flags, &td_cmd,
 						&td_offset, tx_offload);
 
 		if (nb_ctx) {
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [PATCH v3 12/36] net/intel: create a common scalar Tx function
  2026-01-30 11:41 ` [PATCH v3 " Bruce Richardson
                     ` (10 preceding siblings ...)
  2026-01-30 11:41   ` [PATCH v3 11/36] net/intel: create common checksum Tx offload function Bruce Richardson
@ 2026-01-30 11:41   ` Bruce Richardson
  2026-02-06 12:01     ` Loftus, Ciara
  2026-01-30 11:41   ` [PATCH v3 13/36] net/i40e: use " Bruce Richardson
                     ` (23 subsequent siblings)
  35 siblings, 1 reply; 274+ messages in thread
From: Bruce Richardson @ 2026-01-30 11:41 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson, Anatoly Burakov

Given the similarities between the transmit functions across various
Intel drivers, make a start on consolidating them by moving the ice Tx
function into common, for reuse by other drivers.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/net/intel/common/tx_scalar_fns.h | 218 ++++++++++++++++++
 drivers/net/intel/ice/ice_rxtx.c         | 270 +++++------------------
 2 files changed, 270 insertions(+), 218 deletions(-)

diff --git a/drivers/net/intel/common/tx_scalar_fns.h b/drivers/net/intel/common/tx_scalar_fns.h
index f88ca7f25a..6d01c14283 100644
--- a/drivers/net/intel/common/tx_scalar_fns.h
+++ b/drivers/net/intel/common/tx_scalar_fns.h
@@ -6,6 +6,7 @@
 #define _COMMON_INTEL_TX_SCALAR_FNS_H_
 
 #include <stdint.h>
+#include <rte_io.h>
 #include <rte_byteorder.h>
 
 /* depends on common Tx definitions. */
@@ -142,5 +143,222 @@ ci_calc_pkt_desc(const struct rte_mbuf *tx_pkt)
 	return count;
 }
 
+typedef uint16_t (*ci_get_ctx_desc_fn)(uint64_t ol_flags, const struct rte_mbuf *mbuf,
+		const union ci_tx_offload *tx_offload, const struct ci_tx_queue *txq,
+		uint64_t *qw0, uint64_t *qw1);
+
+/* gets current timestamp tail index */
+typedef uint16_t (*get_ts_tail_t)(struct ci_tx_queue *txq);
+/* writes a timestamp descriptor and returns new tail index */
+typedef uint16_t (*write_ts_desc_t)(struct ci_tx_queue *txq, struct rte_mbuf *mbuf,
+		uint16_t tx_id, uint16_t ts_id);
+/* writes a timestamp tail index - doorbell */
+typedef void (*write_ts_tail_t)(struct ci_tx_queue *txq, uint16_t ts_id);
+
+struct ci_timestamp_queue_fns {
+	get_ts_tail_t get_ts_tail;
+	write_ts_desc_t write_ts_desc;
+	write_ts_tail_t write_ts_tail;
+};
+
+static inline uint16_t
+ci_xmit_pkts(struct ci_tx_queue *txq,
+	     struct rte_mbuf **tx_pkts,
+	     uint16_t nb_pkts,
+	     ci_get_ctx_desc_fn get_ctx_desc,
+	     const struct ci_timestamp_queue_fns *ts_fns)
+{
+	volatile struct ci_tx_desc *ci_tx_ring;
+	volatile struct ci_tx_desc *txd;
+	struct ci_tx_entry *sw_ring;
+	struct ci_tx_entry *txe, *txn;
+	struct rte_mbuf *tx_pkt;
+	struct rte_mbuf *m_seg;
+	uint16_t tx_id;
+	uint16_t ts_id = -1;
+	uint16_t nb_tx;
+	uint16_t nb_used;
+	uint16_t nb_ctx;
+	uint32_t td_cmd = 0;
+	uint32_t td_offset = 0;
+	uint32_t td_tag = 0;
+	uint16_t tx_last;
+	uint16_t slen;
+	uint16_t l2_len;
+	uint64_t buf_dma_addr;
+	uint64_t ol_flags;
+	union ci_tx_offload tx_offload = {0};
+
+	sw_ring = txq->sw_ring;
+	ci_tx_ring = txq->ci_tx_ring;
+	tx_id = txq->tx_tail;
+	txe = &sw_ring[tx_id];
+
+	if (ts_fns != NULL)
+		ts_id = ts_fns->get_ts_tail(txq);
+
+	/* Check if the descriptor ring needs to be cleaned. */
+	if (txq->nb_tx_free < txq->tx_free_thresh)
+		(void)ci_tx_xmit_cleanup(txq);
+
+	for (nb_tx = 0; nb_tx < nb_pkts; nb_tx++) {
+		uint64_t cd_qw0, cd_qw1;
+		tx_pkt = *tx_pkts++;
+
+		ol_flags = tx_pkt->ol_flags;
+		td_cmd = CI_TX_DESC_CMD_ICRC;
+		td_tag = 0;
+		l2_len = ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK ?
+				tx_pkt->outer_l2_len : tx_pkt->l2_len;
+		td_offset = (l2_len >> 1) << CI_TX_DESC_LEN_MACLEN_S;
+
+
+		tx_offload.l2_len = tx_pkt->l2_len;
+		tx_offload.l3_len = tx_pkt->l3_len;
+		tx_offload.outer_l2_len = tx_pkt->outer_l2_len;
+		tx_offload.outer_l3_len = tx_pkt->outer_l3_len;
+		tx_offload.l4_len = tx_pkt->l4_len;
+		tx_offload.tso_segsz = tx_pkt->tso_segsz;
+
+		/* Calculate the number of context descriptors needed. */
+		nb_ctx = get_ctx_desc(ol_flags, tx_pkt, &tx_offload, txq, &cd_qw0, &cd_qw1);
+
+		/* The number of descriptors that must be allocated for
+		 * a packet equals to the number of the segments of that
+		 * packet plus the number of context descriptor if needed.
+		 * Recalculate the needed tx descs when TSO enabled in case
+		 * the mbuf data size exceeds max data size that hw allows
+		 * per tx desc.
+		 */
+		if (ol_flags & (RTE_MBUF_F_TX_TCP_SEG | RTE_MBUF_F_TX_UDP_SEG))
+			nb_used = (uint16_t)(ci_calc_pkt_desc(tx_pkt) + nb_ctx);
+		else
+			nb_used = (uint16_t)(tx_pkt->nb_segs + nb_ctx);
+		tx_last = (uint16_t)(tx_id + nb_used - 1);
+
+		/* Circular ring */
+		if (tx_last >= txq->nb_tx_desc)
+			tx_last = (uint16_t)(tx_last - txq->nb_tx_desc);
+
+		if (nb_used > txq->nb_tx_free) {
+			if (ci_tx_xmit_cleanup(txq) != 0) {
+				if (nb_tx == 0)
+					return 0;
+				goto end_of_tx;
+			}
+			if (unlikely(nb_used > txq->tx_rs_thresh)) {
+				while (nb_used > txq->nb_tx_free) {
+					if (ci_tx_xmit_cleanup(txq) != 0) {
+						if (nb_tx == 0)
+							return 0;
+						goto end_of_tx;
+					}
+				}
+			}
+		}
+
+		/* Descriptor based VLAN insertion */
+		if (ol_flags & (RTE_MBUF_F_TX_VLAN | RTE_MBUF_F_TX_QINQ)) {
+			td_cmd |= CI_TX_DESC_CMD_IL2TAG1;
+			td_tag = tx_pkt->vlan_tci;
+		}
+
+		/* Enable checksum offloading */
+		if (ol_flags & CI_TX_CKSUM_OFFLOAD_MASK)
+			ci_txd_enable_checksum(ol_flags, &td_cmd,
+						&td_offset, tx_offload);
+
+		if (nb_ctx) {
+			/* Setup TX context descriptor if required */
+			uint64_t *ctx_txd = RTE_CAST_PTR(uint64_t *, &ci_tx_ring[tx_id]);
+
+			txn = &sw_ring[txe->next_id];
+			RTE_MBUF_PREFETCH_TO_FREE(txn->mbuf);
+			if (txe->mbuf) {
+				rte_pktmbuf_free_seg(txe->mbuf);
+				txe->mbuf = NULL;
+			}
+
+			ctx_txd[0] = cd_qw0;
+			ctx_txd[1] = cd_qw1;
+
+			txe->last_id = tx_last;
+			tx_id = txe->next_id;
+			txe = txn;
+		}
+		m_seg = tx_pkt;
+
+		do {
+			txd = &ci_tx_ring[tx_id];
+			txn = &sw_ring[txe->next_id];
+
+			if (txe->mbuf)
+				rte_pktmbuf_free_seg(txe->mbuf);
+			txe->mbuf = m_seg;
+
+			/* Setup TX Descriptor */
+			slen = m_seg->data_len;
+			buf_dma_addr = rte_mbuf_data_iova(m_seg);
+
+			while ((ol_flags & (RTE_MBUF_F_TX_TCP_SEG | RTE_MBUF_F_TX_UDP_SEG)) &&
+					unlikely(slen > CI_MAX_DATA_PER_TXD)) {
+				txd->buffer_addr = rte_cpu_to_le_64(buf_dma_addr);
+				txd->cmd_type_offset_bsz = rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DATA |
+					((uint64_t)td_cmd << CI_TXD_QW1_CMD_S) |
+					((uint64_t)td_offset << CI_TXD_QW1_OFFSET_S) |
+					((uint64_t)CI_MAX_DATA_PER_TXD << CI_TXD_QW1_TX_BUF_SZ_S) |
+					((uint64_t)td_tag << CI_TXD_QW1_L2TAG1_S));
+
+				buf_dma_addr += CI_MAX_DATA_PER_TXD;
+				slen -= CI_MAX_DATA_PER_TXD;
+
+				txe->last_id = tx_last;
+				tx_id = txe->next_id;
+				txe = txn;
+				txd = &ci_tx_ring[tx_id];
+				txn = &sw_ring[txe->next_id];
+			}
+
+			txd->buffer_addr = rte_cpu_to_le_64(buf_dma_addr);
+			txd->cmd_type_offset_bsz = rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DATA |
+				((uint64_t)td_cmd << CI_TXD_QW1_CMD_S) |
+				((uint64_t)td_offset << CI_TXD_QW1_OFFSET_S) |
+				((uint64_t)slen << CI_TXD_QW1_TX_BUF_SZ_S) |
+				((uint64_t)td_tag << CI_TXD_QW1_L2TAG1_S));
+
+			txe->last_id = tx_last;
+			tx_id = txe->next_id;
+			txe = txn;
+			m_seg = m_seg->next;
+		} while (m_seg);
+
+		/* fill the last descriptor with End of Packet (EOP) bit */
+		td_cmd |= CI_TX_DESC_CMD_EOP;
+		txq->nb_tx_used = (uint16_t)(txq->nb_tx_used + nb_used);
+		txq->nb_tx_free = (uint16_t)(txq->nb_tx_free - nb_used);
+
+		/* set RS bit on the last descriptor of one packet */
+		if (txq->nb_tx_used >= txq->tx_rs_thresh) {
+			td_cmd |= CI_TX_DESC_CMD_RS;
+
+			/* Update txq RS bit counters */
+			txq->nb_tx_used = 0;
+		}
+		txd->cmd_type_offset_bsz |=
+				rte_cpu_to_le_64(((uint64_t)td_cmd) << CI_TXD_QW1_CMD_S);
+
+		if (ts_fns != NULL)
+			ts_id = ts_fns->write_ts_desc(txq, tx_pkt, tx_id, ts_id);
+	}
+end_of_tx:
+	/* update Tail register */
+	if (ts_fns != NULL)
+		ts_fns->write_ts_tail(txq, ts_id);
+	else
+		rte_write32_wc(tx_id, txq->qtx_tail);
+	txq->tx_tail = tx_id;
+
+	return nb_tx;
+}
 
 #endif /* _COMMON_INTEL_TX_SCALAR_FNS_H_ */
diff --git a/drivers/net/intel/ice/ice_rxtx.c b/drivers/net/intel/ice/ice_rxtx.c
index 8650925577..d805bee4f0 100644
--- a/drivers/net/intel/ice/ice_rxtx.c
+++ b/drivers/net/intel/ice/ice_rxtx.c
@@ -3050,230 +3050,64 @@ get_context_desc(uint64_t ol_flags, const struct rte_mbuf *tx_pkt,
 	return 1;
 }
 
-uint16_t
-ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
+static uint16_t
+ice_get_ts_tail(struct ci_tx_queue *txq)
 {
-	struct ci_tx_queue *txq;
-	volatile struct ci_tx_desc *ci_tx_ring;
-	volatile struct ci_tx_desc *txd;
-	struct ci_tx_entry *sw_ring;
-	struct ci_tx_entry *txe, *txn;
-	struct rte_mbuf *tx_pkt;
-	struct rte_mbuf *m_seg;
-	uint16_t tx_id;
-	uint16_t ts_id = -1;
-	uint16_t nb_tx;
-	uint16_t nb_used;
-	uint16_t nb_ctx;
-	uint32_t td_cmd = 0;
-	uint32_t td_offset = 0;
-	uint32_t td_tag = 0;
-	uint16_t tx_last;
-	uint16_t slen;
-	uint16_t l2_len;
-	uint64_t buf_dma_addr;
-	uint64_t ol_flags;
-	union ci_tx_offload tx_offload = {0};
-
-	txq = tx_queue;
-	sw_ring = txq->sw_ring;
-	ci_tx_ring = txq->ci_tx_ring;
-	tx_id = txq->tx_tail;
-	txe = &sw_ring[tx_id];
-
-	if (txq->tsq != NULL && txq->tsq->ts_flag > 0)
-		ts_id = txq->tsq->ts_tail;
-
-	/* Check if the descriptor ring needs to be cleaned. */
-	if (txq->nb_tx_free < txq->tx_free_thresh)
-		(void)ci_tx_xmit_cleanup(txq);
-
-	for (nb_tx = 0; nb_tx < nb_pkts; nb_tx++) {
-		uint64_t cd_qw0, cd_qw1;
-		tx_pkt = *tx_pkts++;
-
-		ol_flags = tx_pkt->ol_flags;
-		td_cmd = 0;
-		td_tag = 0;
-		l2_len = ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK ?
-				tx_pkt->outer_l2_len : tx_pkt->l2_len;
-		td_offset = (l2_len >> 1) << CI_TX_DESC_LEN_MACLEN_S;
-
-		tx_offload.l2_len = tx_pkt->l2_len;
-		tx_offload.l3_len = tx_pkt->l3_len;
-		tx_offload.outer_l2_len = tx_pkt->outer_l2_len;
-		tx_offload.outer_l3_len = tx_pkt->outer_l3_len;
-		tx_offload.l4_len = tx_pkt->l4_len;
-		tx_offload.tso_segsz = tx_pkt->tso_segsz;
-
-		/* Calculate the number of context descriptors needed. */
-		nb_ctx = get_context_desc(ol_flags, tx_pkt, &tx_offload, txq, &cd_qw0, &cd_qw1);
-
-		/* The number of descriptors that must be allocated for
-		 * a packet equals to the number of the segments of that
-		 * packet plus the number of context descriptor if needed.
-		 * Recalculate the needed tx descs when TSO enabled in case
-		 * the mbuf data size exceeds max data size that hw allows
-		 * per tx desc.
-		 */
-		if (ol_flags & (RTE_MBUF_F_TX_TCP_SEG | RTE_MBUF_F_TX_UDP_SEG))
-			nb_used = (uint16_t)(ci_calc_pkt_desc(tx_pkt) + nb_ctx);
-		else
-			nb_used = (uint16_t)(tx_pkt->nb_segs + nb_ctx);
-		tx_last = (uint16_t)(tx_id + nb_used - 1);
-
-		/* Circular ring */
-		if (tx_last >= txq->nb_tx_desc)
-			tx_last = (uint16_t)(tx_last - txq->nb_tx_desc);
-
-		if (nb_used > txq->nb_tx_free) {
-			if (ci_tx_xmit_cleanup(txq) != 0) {
-				if (nb_tx == 0)
-					return 0;
-				goto end_of_tx;
-			}
-			if (unlikely(nb_used > txq->tx_rs_thresh)) {
-				while (nb_used > txq->nb_tx_free) {
-					if (ci_tx_xmit_cleanup(txq) != 0) {
-						if (nb_tx == 0)
-							return 0;
-						goto end_of_tx;
-					}
-				}
-			}
-		}
-
-		/* Descriptor based VLAN insertion */
-		if (ol_flags & (RTE_MBUF_F_TX_VLAN | RTE_MBUF_F_TX_QINQ)) {
-			td_cmd |= CI_TX_DESC_CMD_IL2TAG1;
-			td_tag = tx_pkt->vlan_tci;
-		}
-
-		/* Enable checksum offloading */
-		if (ol_flags & CI_TX_CKSUM_OFFLOAD_MASK)
-			ci_txd_enable_checksum(ol_flags, &td_cmd,
-						&td_offset, tx_offload);
-
-		if (nb_ctx) {
-			/* Setup TX context descriptor if required */
-			uint64_t *ctx_txd = RTE_CAST_PTR(uint64_t *, &ci_tx_ring[tx_id]);
-
-			txn = &sw_ring[txe->next_id];
-			RTE_MBUF_PREFETCH_TO_FREE(txn->mbuf);
-			if (txe->mbuf) {
-				rte_pktmbuf_free_seg(txe->mbuf);
-				txe->mbuf = NULL;
-			}
-
-			ctx_txd[0] = cd_qw0;
-			ctx_txd[1] = cd_qw1;
-
-			txe->last_id = tx_last;
-			tx_id = txe->next_id;
-			txe = txn;
-		}
-		m_seg = tx_pkt;
-
-		do {
-			txd = &ci_tx_ring[tx_id];
-			txn = &sw_ring[txe->next_id];
-
-			if (txe->mbuf)
-				rte_pktmbuf_free_seg(txe->mbuf);
-			txe->mbuf = m_seg;
-
-			/* Setup TX Descriptor */
-			slen = m_seg->data_len;
-			buf_dma_addr = rte_mbuf_data_iova(m_seg);
-
-			while ((ol_flags & (RTE_MBUF_F_TX_TCP_SEG | RTE_MBUF_F_TX_UDP_SEG)) &&
-					unlikely(slen > CI_MAX_DATA_PER_TXD)) {
-				txd->buffer_addr = rte_cpu_to_le_64(buf_dma_addr);
-				txd->cmd_type_offset_bsz = rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DATA |
-					((uint64_t)td_cmd << CI_TXD_QW1_CMD_S) |
-					((uint64_t)td_offset << CI_TXD_QW1_OFFSET_S) |
-					((uint64_t)CI_MAX_DATA_PER_TXD << CI_TXD_QW1_TX_BUF_SZ_S) |
-					((uint64_t)td_tag << CI_TXD_QW1_L2TAG1_S));
-
-				buf_dma_addr += CI_MAX_DATA_PER_TXD;
-				slen -= CI_MAX_DATA_PER_TXD;
-
-				txe->last_id = tx_last;
-				tx_id = txe->next_id;
-				txe = txn;
-				txd = &ci_tx_ring[tx_id];
-				txn = &sw_ring[txe->next_id];
-			}
-
-			txd->buffer_addr = rte_cpu_to_le_64(buf_dma_addr);
-			txd->cmd_type_offset_bsz = rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DATA |
-				((uint64_t)td_cmd << CI_TXD_QW1_CMD_S) |
-				((uint64_t)td_offset << CI_TXD_QW1_OFFSET_S) |
-				((uint64_t)slen << CI_TXD_QW1_TX_BUF_SZ_S) |
-				((uint64_t)td_tag << CI_TXD_QW1_L2TAG1_S));
-
-			txe->last_id = tx_last;
-			tx_id = txe->next_id;
-			txe = txn;
-			m_seg = m_seg->next;
-		} while (m_seg);
+	return txq->tsq->ts_tail;
+}
 
-		/* fill the last descriptor with End of Packet (EOP) bit */
-		td_cmd |= CI_TX_DESC_CMD_EOP;
-		txq->nb_tx_used = (uint16_t)(txq->nb_tx_used + nb_used);
-		txq->nb_tx_free = (uint16_t)(txq->nb_tx_free - nb_used);
+static uint16_t
+ice_write_ts_desc(struct ci_tx_queue *txq,
+		  struct rte_mbuf *tx_pkt,
+		  uint16_t tx_id,
+		  uint16_t ts_id)
+{
+	uint64_t txtime = *RTE_MBUF_DYNFIELD(tx_pkt, txq->tsq->ts_offset, uint64_t *);
+	uint32_t tstamp = (uint32_t)(txtime % NS_PER_S) >> ICE_TXTIME_CTX_RESOLUTION_128NS;
+	const uint32_t desc_tx_id = (tx_id == 0) ? txq->nb_tx_desc : tx_id;
+	__le32 ts_desc = rte_cpu_to_le_32(FIELD_PREP(ICE_TXTIME_TX_DESC_IDX_M, desc_tx_id) |
+			FIELD_PREP(ICE_TXTIME_STAMP_M, tstamp));
+
+	txq->tsq->ice_ts_ring[ts_id].tx_desc_idx_tstamp = ts_desc;
+	ts_id++;
+
+	/* To prevent an MDD, when wrapping the tstamp
+	 * ring create additional TS descriptors equal
+	 * to the number of the fetch TS descriptors
+	 * value. HW will merge the TS descriptors with
+	 * the same timestamp value into a single
+	 * descriptor.
+	 */
+	if (ts_id == txq->tsq->nb_ts_desc) {
+		uint16_t fetch = txq->tsq->nb_ts_desc - txq->nb_tx_desc;
+		ts_id = 0;
+		for (; ts_id < fetch; ts_id++)
+			txq->tsq->ice_ts_ring[ts_id].tx_desc_idx_tstamp = ts_desc;
+	}
+	return ts_id;
+}
 
-		/* set RS bit on the last descriptor of one packet */
-		if (txq->nb_tx_used >= txq->tx_rs_thresh) {
-			PMD_TX_LOG(DEBUG,
-				   "Setting RS bit on TXD id="
-				   "%4u (port=%d queue=%d)",
-				   tx_last, txq->port_id, txq->queue_id);
+static void
+ice_write_ts_tail(struct ci_tx_queue *txq, uint16_t ts_tail)
+{
+	ICE_PCI_REG_WRITE(txq->qtx_tail, ts_tail);
+	txq->tsq->ts_tail = ts_tail;
+}
 
-			td_cmd |= CI_TX_DESC_CMD_RS;
+uint16_t
+ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
+{
+	const struct ci_timestamp_queue_fns ts_fns = {
+		.get_ts_tail = ice_get_ts_tail,
+		.write_ts_desc = ice_write_ts_desc,
+		.write_ts_tail = ice_write_ts_tail,
+	};
+	struct ci_tx_queue *txq = (struct ci_tx_queue *)tx_queue;
 
-			/* Update txq RS bit counters */
-			txq->nb_tx_used = 0;
-		}
-		txd->cmd_type_offset_bsz |=
-			rte_cpu_to_le_64(((uint64_t)td_cmd) << CI_TXD_QW1_CMD_S);
-
-		if (txq->tsq != NULL && txq->tsq->ts_flag > 0) {
-			uint64_t txtime = *RTE_MBUF_DYNFIELD(tx_pkt,
-					txq->tsq->ts_offset, uint64_t *);
-			uint32_t tstamp = (uint32_t)(txtime % NS_PER_S) >>
-						ICE_TXTIME_CTX_RESOLUTION_128NS;
-			const uint32_t desc_tx_id = (tx_id == 0) ? txq->nb_tx_desc : tx_id;
-			__le32 ts_desc = rte_cpu_to_le_32(FIELD_PREP(ICE_TXTIME_TX_DESC_IDX_M,
-					desc_tx_id) | FIELD_PREP(ICE_TXTIME_STAMP_M, tstamp));
-			txq->tsq->ice_ts_ring[ts_id].tx_desc_idx_tstamp = ts_desc;
-			ts_id++;
-			/* To prevent an MDD, when wrapping the tstamp
-			 * ring create additional TS descriptors equal
-			 * to the number of the fetch TS descriptors
-			 * value. HW will merge the TS descriptors with
-			 * the same timestamp value into a single
-			 * descriptor.
-			 */
-			if (ts_id == txq->tsq->nb_ts_desc) {
-				uint16_t fetch = txq->tsq->nb_ts_desc - txq->nb_tx_desc;
-				ts_id = 0;
-				for (; ts_id < fetch; ts_id++)
-					txq->tsq->ice_ts_ring[ts_id].tx_desc_idx_tstamp = ts_desc;
-			}
-		}
-	}
-end_of_tx:
-	/* update Tail register */
-	if (txq->tsq != NULL && txq->tsq->ts_flag > 0) {
-		ICE_PCI_REG_WRITE(txq->qtx_tail, ts_id);
-		txq->tsq->ts_tail = ts_id;
-	} else {
-		ICE_PCI_REG_WRITE(txq->qtx_tail, tx_id);
-	}
-	txq->tx_tail = tx_id;
+	if (txq->tsq != NULL && txq->tsq->ts_flag > 0)
+		return ci_xmit_pkts(txq, tx_pkts, nb_pkts, get_context_desc, &ts_fns);
 
-	return nb_tx;
+	return ci_xmit_pkts(txq, tx_pkts, nb_pkts, get_context_desc, NULL);
 }
 
 static __rte_always_inline int
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [PATCH v3 13/36] net/i40e: use common scalar Tx function
  2026-01-30 11:41 ` [PATCH v3 " Bruce Richardson
                     ` (11 preceding siblings ...)
  2026-01-30 11:41   ` [PATCH v3 12/36] net/intel: create a common scalar Tx function Bruce Richardson
@ 2026-01-30 11:41   ` Bruce Richardson
  2026-01-30 11:41   ` [PATCH v3 14/36] net/intel: add IPsec hooks to common " Bruce Richardson
                     ` (22 subsequent siblings)
  35 siblings, 0 replies; 274+ messages in thread
From: Bruce Richardson @ 2026-01-30 11:41 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson

Following earlier rework, the scalar transmit function for i40e can use
the common function previously moved over from the ice driver. This
saves hundreds of lines of duplicated code.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/net/intel/i40e/i40e_rxtx.c | 208 +----------------------------
 1 file changed, 2 insertions(+), 206 deletions(-)

diff --git a/drivers/net/intel/i40e/i40e_rxtx.c b/drivers/net/intel/i40e/i40e_rxtx.c
index 1ad445c47b..81fdeb72cb 100644
--- a/drivers/net/intel/i40e/i40e_rxtx.c
+++ b/drivers/net/intel/i40e/i40e_rxtx.c
@@ -1018,212 +1018,8 @@ get_context_desc(uint64_t ol_flags, const struct rte_mbuf *tx_pkt,
 uint16_t
 i40e_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 {
-	struct ci_tx_queue *txq;
-	struct ci_tx_entry *sw_ring;
-	struct ci_tx_entry *txe, *txn;
-	volatile struct ci_tx_desc *txd;
-	volatile struct ci_tx_desc *txr;
-	struct rte_mbuf *tx_pkt;
-	struct rte_mbuf *m_seg;
-	uint16_t tx_id;
-	uint16_t nb_tx;
-	uint32_t td_cmd;
-	uint32_t td_offset;
-	uint32_t td_tag;
-	uint64_t ol_flags;
-	uint16_t nb_used;
-	uint16_t nb_ctx;
-	uint16_t tx_last;
-	uint16_t slen;
-	uint16_t l2_len;
-	uint64_t buf_dma_addr;
-	union ci_tx_offload tx_offload = {0};
-
-	txq = tx_queue;
-	sw_ring = txq->sw_ring;
-	txr = txq->ci_tx_ring;
-	tx_id = txq->tx_tail;
-	txe = &sw_ring[tx_id];
-
-	/* Check if the descriptor ring needs to be cleaned. */
-	if (txq->nb_tx_free < txq->tx_free_thresh)
-		(void)ci_tx_xmit_cleanup(txq);
-
-	for (nb_tx = 0; nb_tx < nb_pkts; nb_tx++) {
-		tx_pkt = *tx_pkts++;
-		RTE_MBUF_PREFETCH_TO_FREE(txe->mbuf);
-
-		ol_flags = tx_pkt->ol_flags;
-		td_cmd = 0;
-		td_tag = 0;
-		l2_len = ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK ?
-				tx_pkt->outer_l2_len : tx_pkt->l2_len;
-		td_offset = (l2_len >> 1) << CI_TX_DESC_LEN_MACLEN_S;
-		tx_offload.l2_len = tx_pkt->l2_len;
-		tx_offload.l3_len = tx_pkt->l3_len;
-		tx_offload.outer_l2_len = tx_pkt->outer_l2_len;
-		tx_offload.outer_l3_len = tx_pkt->outer_l3_len;
-		tx_offload.l4_len = tx_pkt->l4_len;
-		tx_offload.tso_segsz = tx_pkt->tso_segsz;
-
-		/* Calculate the number of context descriptors needed. */
-		uint64_t cd_qw0 = 0, cd_qw1 = 0;
-		nb_ctx = get_context_desc(ol_flags, tx_pkt, &tx_offload, txq,
-				&cd_qw0, &cd_qw1);
-
-		/**
-		 * The number of descriptors that must be allocated for
-		 * a packet equals to the number of the segments of that
-		 * packet plus 1 context descriptor if needed.
-		 * Recalculate the needed tx descs when TSO enabled in case
-		 * the mbuf data size exceeds max data size that hw allows
-		 * per tx desc.
-		 */
-		if (ol_flags & RTE_MBUF_F_TX_TCP_SEG)
-			nb_used = (uint16_t)(ci_calc_pkt_desc(tx_pkt) + nb_ctx);
-		else
-			nb_used = (uint16_t)(tx_pkt->nb_segs + nb_ctx);
-		tx_last = (uint16_t)(tx_id + nb_used - 1);
-
-		/* Circular ring */
-		if (tx_last >= txq->nb_tx_desc)
-			tx_last = (uint16_t)(tx_last - txq->nb_tx_desc);
-
-		if (nb_used > txq->nb_tx_free) {
-			if (ci_tx_xmit_cleanup(txq) != 0) {
-				if (nb_tx == 0)
-					return 0;
-				goto end_of_tx;
-			}
-			if (unlikely(nb_used > txq->tx_rs_thresh)) {
-				while (nb_used > txq->nb_tx_free) {
-					if (ci_tx_xmit_cleanup(txq) != 0) {
-						if (nb_tx == 0)
-							return 0;
-						goto end_of_tx;
-					}
-				}
-			}
-		}
-
-		/* Descriptor based VLAN insertion */
-		if (ol_flags & (RTE_MBUF_F_TX_VLAN | RTE_MBUF_F_TX_QINQ)) {
-			td_cmd |= CI_TX_DESC_CMD_IL2TAG1;
-			td_tag = tx_pkt->vlan_tci;
-		}
-
-		/* Always enable CRC offload insertion */
-		td_cmd |= CI_TX_DESC_CMD_ICRC;
-
-		/* Enable checksum offloading */
-		if (ol_flags & CI_TX_CKSUM_OFFLOAD_MASK)
-			ci_txd_enable_checksum(ol_flags, &td_cmd,
-						 &td_offset, tx_offload);
-
-		if (nb_ctx) {
-			/* Setup TX context descriptor if required */
-			uint64_t *desc = RTE_CAST_PTR(uint64_t *, &txr[tx_id]);
-
-			txn = &sw_ring[txe->next_id];
-			RTE_MBUF_PREFETCH_TO_FREE(txn->mbuf);
-			if (txe->mbuf != NULL) {
-				rte_pktmbuf_free_seg(txe->mbuf);
-				txe->mbuf = NULL;
-			}
-
-			desc[0] = cd_qw0;
-			desc[1] = cd_qw1;
-
-			PMD_TX_LOG(DEBUG, "mbuf: %p, TCD[%u]: "
-				"qw0: %#"PRIx64"; "
-				"qw1: %#"PRIx64";",
-				tx_pkt, tx_id, cd_qw0, cd_qw1);
-
-			txe->last_id = tx_last;
-			tx_id = txe->next_id;
-			txe = txn;
-		}
-
-		m_seg = tx_pkt;
-		do {
-			txd = &txr[tx_id];
-			txn = &sw_ring[txe->next_id];
-
-			if (txe->mbuf)
-				rte_pktmbuf_free_seg(txe->mbuf);
-			txe->mbuf = m_seg;
-
-			/* Setup TX Descriptor */
-			slen = m_seg->data_len;
-			buf_dma_addr = rte_mbuf_data_iova(m_seg);
-
-			while ((ol_flags & RTE_MBUF_F_TX_TCP_SEG) &&
-				unlikely(slen > CI_MAX_DATA_PER_TXD)) {
-				txd->buffer_addr =
-					rte_cpu_to_le_64(buf_dma_addr);
-				txd->cmd_type_offset_bsz =
-					i40e_build_ctob(td_cmd,
-					td_offset, CI_MAX_DATA_PER_TXD,
-					td_tag);
-
-				buf_dma_addr += CI_MAX_DATA_PER_TXD;
-				slen -= CI_MAX_DATA_PER_TXD;
-
-				txe->last_id = tx_last;
-				tx_id = txe->next_id;
-				txe = txn;
-				txd = &txr[tx_id];
-				txn = &sw_ring[txe->next_id];
-			}
-			PMD_TX_LOG(DEBUG, "mbuf: %p, TDD[%u]: "
-				"buf_dma_addr: %#"PRIx64"; "
-				"td_cmd: %#x; "
-				"td_offset: %#x; "
-				"td_len: %u; "
-				"td_tag: %#x;",
-				tx_pkt, tx_id, buf_dma_addr,
-				td_cmd, td_offset, slen, td_tag);
-
-			txd->buffer_addr = rte_cpu_to_le_64(buf_dma_addr);
-			txd->cmd_type_offset_bsz = i40e_build_ctob(td_cmd,
-						td_offset, slen, td_tag);
-			txe->last_id = tx_last;
-			tx_id = txe->next_id;
-			txe = txn;
-			m_seg = m_seg->next;
-		} while (m_seg != NULL);
-
-		/* The last packet data descriptor needs End Of Packet (EOP) */
-		td_cmd |= CI_TX_DESC_CMD_EOP;
-		txq->nb_tx_used = (uint16_t)(txq->nb_tx_used + nb_used);
-		txq->nb_tx_free = (uint16_t)(txq->nb_tx_free - nb_used);
-
-		if (txq->nb_tx_used >= txq->tx_rs_thresh) {
-			PMD_TX_LOG(DEBUG,
-				   "Setting RS bit on TXD id="
-				   "%4u (port=%d queue=%d)",
-				   tx_last, txq->port_id, txq->queue_id);
-
-			td_cmd |= CI_TX_DESC_CMD_RS;
-
-			/* Update txq RS bit counters */
-			txq->nb_tx_used = 0;
-		}
-
-		txd->cmd_type_offset_bsz |=
-			rte_cpu_to_le_64(((uint64_t)td_cmd) << CI_TXD_QW1_CMD_S);
-	}
-
-end_of_tx:
-	PMD_TX_LOG(DEBUG, "port_id=%u queue_id=%u tx_tail=%u nb_tx=%u",
-		   (unsigned) txq->port_id, (unsigned) txq->queue_id,
-		   (unsigned) tx_id, (unsigned) nb_tx);
-
-	rte_io_wmb();
-	I40E_PCI_REG_WC_WRITE_RELAXED(txq->qtx_tail, tx_id);
-	txq->tx_tail = tx_id;
-
-	return nb_tx;
+	/* i40e does not support timestamp queues, so pass NULL for ts_fns */
+	return ci_xmit_pkts(tx_queue, tx_pkts, nb_pkts, get_context_desc, NULL);
 }
 
 static __rte_always_inline int
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [PATCH v3 14/36] net/intel: add IPsec hooks to common Tx function
  2026-01-30 11:41 ` [PATCH v3 " Bruce Richardson
                     ` (12 preceding siblings ...)
  2026-01-30 11:41   ` [PATCH v3 13/36] net/i40e: use " Bruce Richardson
@ 2026-01-30 11:41   ` Bruce Richardson
  2026-01-30 11:41   ` [PATCH v3 15/36] net/intel: support configurable VLAN tag insertion on Tx Bruce Richardson
                     ` (21 subsequent siblings)
  35 siblings, 0 replies; 274+ messages in thread
From: Bruce Richardson @ 2026-01-30 11:41 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson, Anatoly Burakov

The iavf driver has IPsec offload support on Tx, so add hooks to the
common Tx function to support it. Do so in a way that has zero
performance impact for drivers which do not have IPsec support, by
passing in compile-time NULL constants for the function pointers,
allowing the compiler to optimize the unused hooks away.
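
The zero-cost claim can be sketched as follows (illustrative names
only, not the real driver code): when a literal NULL is passed for the
ops pointer at the call site of an inline function, the NULL check
folds to a compile-time constant and the IPsec branch is eliminated
from drivers that do not use it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct ipsec_ops { uint16_t (*extra_descs)(void); };

/* With a literal NULL at the call site of this inline function, the
 * 'ops != NULL' test is a compile-time constant and the whole branch
 * is removed by the compiler. */
static inline uint16_t
descs_needed(uint16_t nb_segs, const struct ipsec_ops *ops)
{
	uint16_t n = nb_segs;

	if (ops != NULL)
		n += ops->extra_descs();
	return n;
}

static uint16_t one_extra(void) { return 1; }

static uint16_t
plain_driver_descs(uint16_t nb_segs)
{
	return descs_needed(nb_segs, NULL);     /* branch folds away */
}

static uint16_t
ipsec_driver_descs(uint16_t nb_segs)
{
	static const struct ipsec_ops ops = { .extra_descs = one_extra };

	return descs_needed(nb_segs, &ops);     /* hook inlined here */
}
```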

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/net/intel/common/tx_scalar_fns.h | 60 ++++++++++++++++++++++--
 drivers/net/intel/i40e/i40e_rxtx.c       |  4 +-
 drivers/net/intel/ice/ice_rxtx.c         |  4 +-
 3 files changed, 60 insertions(+), 8 deletions(-)

diff --git a/drivers/net/intel/common/tx_scalar_fns.h b/drivers/net/intel/common/tx_scalar_fns.h
index 6d01c14283..2fe503a548 100644
--- a/drivers/net/intel/common/tx_scalar_fns.h
+++ b/drivers/net/intel/common/tx_scalar_fns.h
@@ -147,6 +147,24 @@ typedef uint16_t (*ci_get_ctx_desc_fn)(uint64_t ol_flags, const struct rte_mbuf
 		const union ci_tx_offload *tx_offload, const struct ci_tx_queue *txq,
 		uint64_t *qw0, uint64_t *qw1);
 
+/* gets IPsec descriptor information and returns number of descriptors needed (0 or 1) */
+typedef uint16_t (*get_ipsec_desc_t)(const struct rte_mbuf *mbuf,
+		const struct ci_tx_queue *txq,
+		void **ipsec_metadata,
+		uint64_t *qw0,
+		uint64_t *qw1);
+/* calculates segment length for IPsec + TSO combinations */
+typedef uint16_t (*calc_ipsec_segment_len_t)(const struct rte_mbuf *mb_seg,
+		uint64_t ol_flags,
+		const void *ipsec_metadata,
+		uint16_t tlen);
+
+/** IPsec descriptor operations for drivers that support inline IPsec crypto. */
+struct ci_ipsec_ops {
+	get_ipsec_desc_t get_ipsec_desc;
+	calc_ipsec_segment_len_t calc_segment_len;
+};
+
 /* gets current timestamp tail index */
 typedef uint16_t (*get_ts_tail_t)(struct ci_tx_queue *txq);
 /* writes a timestamp descriptor and returns new tail index */
@@ -166,6 +184,7 @@ ci_xmit_pkts(struct ci_tx_queue *txq,
 	     struct rte_mbuf **tx_pkts,
 	     uint16_t nb_pkts,
 	     ci_get_ctx_desc_fn get_ctx_desc,
+	     const struct ci_ipsec_ops *ipsec_ops,
 	     const struct ci_timestamp_queue_fns *ts_fns)
 {
 	volatile struct ci_tx_desc *ci_tx_ring;
@@ -202,6 +221,9 @@ ci_xmit_pkts(struct ci_tx_queue *txq,
 		(void)ci_tx_xmit_cleanup(txq);
 
 	for (nb_tx = 0; nb_tx < nb_pkts; nb_tx++) {
+		void *ipsec_md = NULL;
+		uint16_t nb_ipsec = 0;
+		uint64_t ipsec_qw0 = 0, ipsec_qw1 = 0;
 		uint64_t cd_qw0, cd_qw1;
 		tx_pkt = *tx_pkts++;
 
@@ -223,17 +245,22 @@ ci_xmit_pkts(struct ci_tx_queue *txq,
 		/* Calculate the number of context descriptors needed. */
 		nb_ctx = get_ctx_desc(ol_flags, tx_pkt, &tx_offload, txq, &cd_qw0, &cd_qw1);
 
+		/* Get IPsec descriptor information if IPsec ops provided */
+		if (ipsec_ops != NULL)
+			nb_ipsec = ipsec_ops->get_ipsec_desc(tx_pkt, txq, &ipsec_md,
+					&ipsec_qw0, &ipsec_qw1);
+
 		/* The number of descriptors that must be allocated for
 		 * a packet equals to the number of the segments of that
-		 * packet plus the number of context descriptor if needed.
+		 * packet plus the number of context and IPsec descriptors if needed.
 		 * Recalculate the needed tx descs when TSO enabled in case
 		 * the mbuf data size exceeds max data size that hw allows
 		 * per tx desc.
 		 */
 		if (ol_flags & (RTE_MBUF_F_TX_TCP_SEG | RTE_MBUF_F_TX_UDP_SEG))
-			nb_used = (uint16_t)(ci_calc_pkt_desc(tx_pkt) + nb_ctx);
+			nb_used = (uint16_t)(ci_calc_pkt_desc(tx_pkt) + nb_ctx + nb_ipsec);
 		else
-			nb_used = (uint16_t)(tx_pkt->nb_segs + nb_ctx);
+			nb_used = (uint16_t)(tx_pkt->nb_segs + nb_ctx + nb_ipsec);
 		tx_last = (uint16_t)(tx_id + nb_used - 1);
 
 		/* Circular ring */
@@ -286,6 +313,26 @@ ci_xmit_pkts(struct ci_tx_queue *txq,
 			tx_id = txe->next_id;
 			txe = txn;
 		}
+
+		if (ipsec_ops != NULL && nb_ipsec > 0) {
+			/* Setup TX IPsec descriptor if required */
+			uint64_t *ipsec_txd = RTE_CAST_PTR(uint64_t *, &ci_tx_ring[tx_id]);
+
+			txn = &sw_ring[txe->next_id];
+			RTE_MBUF_PREFETCH_TO_FREE(txn->mbuf);
+			if (txe->mbuf) {
+				rte_pktmbuf_free_seg(txe->mbuf);
+				txe->mbuf = NULL;
+			}
+
+			ipsec_txd[0] = ipsec_qw0;
+			ipsec_txd[1] = ipsec_qw1;
+
+			txe->last_id = tx_last;
+			tx_id = txe->next_id;
+			txe = txn;
+		}
+
 		m_seg = tx_pkt;
 
 		do {
@@ -297,7 +344,12 @@ ci_xmit_pkts(struct ci_tx_queue *txq,
 			txe->mbuf = m_seg;
 
 			/* Setup TX Descriptor */
-			slen = m_seg->data_len;
+			/* Calculate segment length, using IPsec callback if provided */
+			if (ipsec_ops != NULL)
+				slen = ipsec_ops->calc_segment_len(m_seg, ol_flags, ipsec_md, 0);
+			else
+				slen = m_seg->data_len;
+
 			buf_dma_addr = rte_mbuf_data_iova(m_seg);
 
 			while ((ol_flags & (RTE_MBUF_F_TX_TCP_SEG | RTE_MBUF_F_TX_UDP_SEG)) &&
diff --git a/drivers/net/intel/i40e/i40e_rxtx.c b/drivers/net/intel/i40e/i40e_rxtx.c
index 81fdeb72cb..e7dd637931 100644
--- a/drivers/net/intel/i40e/i40e_rxtx.c
+++ b/drivers/net/intel/i40e/i40e_rxtx.c
@@ -1018,8 +1018,8 @@ get_context_desc(uint64_t ol_flags, const struct rte_mbuf *tx_pkt,
 uint16_t
 i40e_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 {
-	/* i40e does not support timestamp queues, so pass NULL for ts_fns */
-	return ci_xmit_pkts(tx_queue, tx_pkts, nb_pkts, get_context_desc, NULL);
+	/* i40e does not support IPsec or timestamp queues, so pass NULL for both */
+	return ci_xmit_pkts(tx_queue, tx_pkts, nb_pkts, get_context_desc, NULL, NULL);
 }
 
 static __rte_always_inline int
diff --git a/drivers/net/intel/ice/ice_rxtx.c b/drivers/net/intel/ice/ice_rxtx.c
index d805bee4f0..eea83bbae1 100644
--- a/drivers/net/intel/ice/ice_rxtx.c
+++ b/drivers/net/intel/ice/ice_rxtx.c
@@ -3105,9 +3105,9 @@ ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 	struct ci_tx_queue *txq = (struct ci_tx_queue *)tx_queue;
 
 	if (txq->tsq != NULL && txq->tsq->ts_flag > 0)
-		return ci_xmit_pkts(txq, tx_pkts, nb_pkts, get_context_desc, &ts_fns);
+		return ci_xmit_pkts(txq, tx_pkts, nb_pkts, get_context_desc, NULL, &ts_fns);
 
-	return ci_xmit_pkts(txq, tx_pkts, nb_pkts, get_context_desc, NULL);
+	return ci_xmit_pkts(txq, tx_pkts, nb_pkts, get_context_desc, NULL, NULL);
 }
 
 static __rte_always_inline int
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [PATCH v3 15/36] net/intel: support configurable VLAN tag insertion on Tx
  2026-01-30 11:41 ` [PATCH v3 " Bruce Richardson
                     ` (13 preceding siblings ...)
  2026-01-30 11:41   ` [PATCH v3 14/36] net/intel: add IPsec hooks to common " Bruce Richardson
@ 2026-01-30 11:41   ` Bruce Richardson
  2026-01-30 11:41   ` [PATCH v3 16/36] net/iavf: use common scalar Tx function Bruce Richardson
                     ` (20 subsequent siblings)
  35 siblings, 0 replies; 274+ messages in thread
From: Bruce Richardson @ 2026-01-30 11:41 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson, Anatoly Burakov

Make the VLAN tag insertion logic in the common code configurable, so
that each driver can specify where the inner and outer tags are placed.
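
A sketch of the resulting placement rule (the flag values and names
here are illustrative stand-ins, not the real RTE_MBUF_F_TX_*
constants): a single VLAN tag goes into the data descriptor's L2TAG1
only when the driver selects that placement, while for QinQ the inner
tag always uses L2TAG1 and the outer tag is left to the driver's
context-descriptor callback.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative flag bits; the real RTE_MBUF_F_TX_* values differ. */
#define F_TX_VLAN  (1ULL << 0)
#define F_TX_QINQ  (1ULL << 1)

enum l2tag1_field { VLAN_IN_L2TAG1, VLAN_IN_L2TAG2 };

/* Mirrors the condition in the common Tx loop: insert the tag in the
 * data descriptor's L2TAG1 field either when the driver chose L2TAG1
 * placement for single VLAN, or unconditionally for QinQ (where the
 * outer tag goes in the context descriptor's L2TAG2). */
static bool
vlan_in_data_desc(uint64_t ol_flags, enum l2tag1_field field)
{
	return ((ol_flags & F_TX_VLAN) && field == VLAN_IN_L2TAG1) ||
			(ol_flags & F_TX_QINQ);
}
```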

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/net/intel/common/tx.h            | 10 ++++++++++
 drivers/net/intel/common/tx_scalar_fns.h |  9 +++++++--
 drivers/net/intel/i40e/i40e_rxtx.c       |  6 +++---
 drivers/net/intel/ice/ice_rxtx.c         |  5 +++--
 4 files changed, 23 insertions(+), 7 deletions(-)

diff --git a/drivers/net/intel/common/tx.h b/drivers/net/intel/common/tx.h
index 928fad1df5..faa9bb9559 100644
--- a/drivers/net/intel/common/tx.h
+++ b/drivers/net/intel/common/tx.h
@@ -45,6 +45,16 @@
 #define CI_TX_CTX_DESC_TSYN             0x02
 #define CI_TX_CTX_DESC_IL2TAG2          0x04
 
+/**
+ * L2TAG1 Field Source Selection
+ * Specifies which mbuf VLAN field to use for the L2TAG1 field in data descriptors.
+ * Context descriptor VLAN handling (L2TAG2) is managed by driver-specific callbacks.
+ */
+enum ci_tx_l2tag1_field {
+	CI_VLAN_IN_L2TAG1,       /**< For VLAN (not QinQ), use L2Tag1 field in data desc */
+	CI_VLAN_IN_L2TAG2,       /**< For VLAN (not QinQ), use L2Tag2 field in ctx desc */
+};
+
 /* Common TX Descriptor Length Field Shifts */
 #define CI_TX_DESC_LEN_MACLEN_S         0  /* 7 BITS */
 #define CI_TX_DESC_LEN_IPLEN_S          7  /* 7 BITS */
diff --git a/drivers/net/intel/common/tx_scalar_fns.h b/drivers/net/intel/common/tx_scalar_fns.h
index 2fe503a548..92eed4a476 100644
--- a/drivers/net/intel/common/tx_scalar_fns.h
+++ b/drivers/net/intel/common/tx_scalar_fns.h
@@ -183,6 +183,7 @@ static inline uint16_t
 ci_xmit_pkts(struct ci_tx_queue *txq,
 	     struct rte_mbuf **tx_pkts,
 	     uint16_t nb_pkts,
+	     enum ci_tx_l2tag1_field l2tag1_field,
 	     ci_get_ctx_desc_fn get_ctx_desc,
 	     const struct ci_ipsec_ops *ipsec_ops,
 	     const struct ci_timesstamp_queue_fns *ts_fns)
@@ -284,8 +285,12 @@ ci_xmit_pkts(struct ci_tx_queue *txq,
 			}
 		}
 
-		/* Descriptor based VLAN insertion */
-		if (ol_flags & (RTE_MBUF_F_TX_VLAN | RTE_MBUF_F_TX_QINQ)) {
+		/* Descriptor based VLAN/QinQ insertion */
+		/* for single VLAN offload, only insert in data desc when VLAN_IN_L2TAG1 is set;
+		 * for QinQ offload, the inner tag always goes in L2Tag1
+		 */
+		if (((ol_flags & RTE_MBUF_F_TX_VLAN) && l2tag1_field == CI_VLAN_IN_L2TAG1) ||
+				(ol_flags & RTE_MBUF_F_TX_QINQ)) {
 			td_cmd |= CI_TX_DESC_CMD_IL2TAG1;
 			td_tag = tx_pkt->vlan_tci;
 		}
diff --git a/drivers/net/intel/i40e/i40e_rxtx.c b/drivers/net/intel/i40e/i40e_rxtx.c
index e7dd637931..4e2071c024 100644
--- a/drivers/net/intel/i40e/i40e_rxtx.c
+++ b/drivers/net/intel/i40e/i40e_rxtx.c
@@ -1004,8 +1004,7 @@ get_context_desc(uint64_t ol_flags, const struct rte_mbuf *tx_pkt,
 	/* TX context descriptor based double VLAN insert */
 	if (ol_flags & RTE_MBUF_F_TX_QINQ) {
 		cd_l2tag2 = tx_pkt->vlan_tci_outer;
-		cd_type_cmd_tso_mss |=
-				((uint64_t)I40E_TX_CTX_DESC_IL2TAG2 << I40E_TXD_CTX_QW1_CMD_SHIFT);
+		cd_type_cmd_tso_mss |= (I40E_TX_CTX_DESC_IL2TAG2 << I40E_TXD_CTX_QW1_CMD_SHIFT);
 	}
 
 	*qw0 = rte_cpu_to_le_32(cd_tunneling_params) |
@@ -1019,7 +1018,8 @@ uint16_t
 i40e_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 {
 	/* i40e does not support IPsec or timestamp queues, so pass NULL for both */
-	return ci_xmit_pkts(tx_queue, tx_pkts, nb_pkts, get_context_desc, NULL, NULL);
+	return ci_xmit_pkts(tx_queue, tx_pkts, nb_pkts, CI_VLAN_IN_L2TAG1,
+			get_context_desc, NULL, NULL);
 }
 
 static __rte_always_inline int
diff --git a/drivers/net/intel/ice/ice_rxtx.c b/drivers/net/intel/ice/ice_rxtx.c
index eea83bbae1..79b021a58f 100644
--- a/drivers/net/intel/ice/ice_rxtx.c
+++ b/drivers/net/intel/ice/ice_rxtx.c
@@ -3105,9 +3105,10 @@ ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 	struct ci_tx_queue *txq = (struct ci_tx_queue *)tx_queue;
 
 	if (txq->tsq != NULL && txq->tsq->ts_flag > 0)
-		return ci_xmit_pkts(txq, tx_pkts, nb_pkts, get_context_desc, NULL, &ts_fns);
+		return ci_xmit_pkts(txq, tx_pkts, nb_pkts, CI_VLAN_IN_L2TAG1,
+				get_context_desc, NULL, &ts_fns);
 
-	return ci_xmit_pkts(txq, tx_pkts, nb_pkts, get_context_desc, NULL, NULL);
+	return ci_xmit_pkts(txq, tx_pkts, nb_pkts, CI_VLAN_IN_L2TAG1, get_context_desc, NULL, NULL);
 }
 
 static __rte_always_inline int
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [PATCH v3 16/36] net/iavf: use common scalar Tx function
  2026-01-30 11:41 ` [PATCH v3 " Bruce Richardson
                     ` (14 preceding siblings ...)
  2026-01-30 11:41   ` [PATCH v3 15/36] net/intel: support configurable VLAN tag insertion on Tx Bruce Richardson
@ 2026-01-30 11:41   ` Bruce Richardson
  2026-01-30 11:41   ` [PATCH v3 17/36] net/i40e: document requirement for QinQ support Bruce Richardson
                     ` (19 subsequent siblings)
  35 siblings, 0 replies; 274+ messages in thread
From: Bruce Richardson @ 2026-01-30 11:41 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson, Vladimir Medvedkin

Now that the common scalar Tx function has all necessary hooks for the
features supported by the iavf driver, use the common function to avoid
duplicated code.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/net/intel/common/tx_scalar_fns.h |   3 +-
 drivers/net/intel/iavf/iavf_rxtx.c       | 534 +++++------------------
 2 files changed, 111 insertions(+), 426 deletions(-)

diff --git a/drivers/net/intel/common/tx_scalar_fns.h b/drivers/net/intel/common/tx_scalar_fns.h
index 92eed4a476..5e301699be 100644
--- a/drivers/net/intel/common/tx_scalar_fns.h
+++ b/drivers/net/intel/common/tx_scalar_fns.h
@@ -231,7 +231,8 @@ ci_xmit_pkts(struct ci_tx_queue *txq,
 		ol_flags = tx_pkt->ol_flags;
 		td_cmd = CI_TX_DESC_CMD_ICRC;
 		td_tag = 0;
-		l2_len = ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK ?
+		l2_len = (ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK &&
+					!(ol_flags & RTE_MBUF_F_TX_SEC_OFFLOAD)) ?
 				tx_pkt->outer_l2_len : tx_pkt->l2_len;
 		td_offset = (l2_len >> 1) << CI_TX_DESC_LEN_MACLEN_S;
 
diff --git a/drivers/net/intel/iavf/iavf_rxtx.c b/drivers/net/intel/iavf/iavf_rxtx.c
index 3dbcfd5355..2ea00e1975 100644
--- a/drivers/net/intel/iavf/iavf_rxtx.c
+++ b/drivers/net/intel/iavf/iavf_rxtx.c
@@ -2326,7 +2326,7 @@ iavf_recv_pkts_bulk_alloc(void *rx_queue,
 
 /* Check if the context descriptor is needed for TX offloading */
 static inline uint16_t
-iavf_calc_context_desc(struct rte_mbuf *mb, uint8_t vlan_flag)
+iavf_calc_context_desc(const struct rte_mbuf *mb, uint8_t vlan_flag)
 {
 	uint64_t flags = mb->ol_flags;
 	if (flags & (RTE_MBUF_F_TX_TCP_SEG | RTE_MBUF_F_TX_UDP_SEG |
@@ -2344,44 +2344,7 @@ iavf_calc_context_desc(struct rte_mbuf *mb, uint8_t vlan_flag)
 }
 
 static inline void
-iavf_fill_ctx_desc_cmd_field(volatile uint64_t *field, struct rte_mbuf *m,
-		uint8_t vlan_flag)
-{
-	uint64_t cmd = 0;
-
-	/* TSO enabled */
-	if (m->ol_flags & (RTE_MBUF_F_TX_TCP_SEG | RTE_MBUF_F_TX_UDP_SEG))
-		cmd = CI_TX_CTX_DESC_TSO << IAVF_TXD_CTX_QW1_CMD_SHIFT;
-
-	if ((m->ol_flags & RTE_MBUF_F_TX_VLAN &&
-			vlan_flag & IAVF_TX_FLAGS_VLAN_TAG_LOC_L2TAG2) ||
-			m->ol_flags & RTE_MBUF_F_TX_QINQ) {
-		cmd |= CI_TX_CTX_DESC_IL2TAG2
-			<< IAVF_TXD_CTX_QW1_CMD_SHIFT;
-	}
-
-	if (IAVF_CHECK_TX_LLDP(m))
-		cmd |= IAVF_TX_CTX_DESC_SWTCH_UPLINK
-			<< IAVF_TXD_CTX_QW1_CMD_SHIFT;
-
-	*field |= cmd;
-}
-
-static inline void
-iavf_fill_ctx_desc_ipsec_field(volatile uint64_t *field,
-	struct iavf_ipsec_crypto_pkt_metadata *ipsec_md)
-{
-	uint64_t ipsec_field =
-		(uint64_t)ipsec_md->ctx_desc_ipsec_params <<
-			IAVF_TXD_CTX_QW1_IPSEC_PARAMS_CIPHERBLK_SHIFT;
-
-	*field |= ipsec_field;
-}
-
-
-static inline void
-iavf_fill_ctx_desc_tunnelling_field(volatile uint64_t *qw0,
-		const struct rte_mbuf *m)
+iavf_fill_ctx_desc_tunnelling_field(uint64_t *qw0, const struct rte_mbuf *m)
 {
 	uint64_t eip_typ = IAVF_TX_CTX_DESC_EIPT_NONE;
 	uint64_t eip_len = 0;
@@ -2456,7 +2419,7 @@ iavf_fill_ctx_desc_tunnelling_field(volatile uint64_t *qw0,
 
 static inline uint16_t
 iavf_fill_ctx_desc_segmentation_field(volatile uint64_t *field,
-	struct rte_mbuf *m, struct iavf_ipsec_crypto_pkt_metadata *ipsec_md)
+	const struct rte_mbuf *m, struct iavf_ipsec_crypto_pkt_metadata *ipsec_md)
 {
 	uint64_t segmentation_field = 0;
 	uint64_t total_length = 0;
@@ -2495,59 +2458,31 @@ struct iavf_tx_context_desc_qws {
 	__le64 qw1;
 };
 
-static inline void
-iavf_fill_context_desc(volatile struct iavf_tx_context_desc *desc,
-	struct rte_mbuf *m, struct iavf_ipsec_crypto_pkt_metadata *ipsec_md,
-	uint16_t *tlen, uint8_t vlan_flag)
+/* IPsec callback for ci_xmit_pkts - gets IPsec descriptor information */
+static uint16_t
+iavf_get_ipsec_desc(const struct rte_mbuf *mbuf, const struct ci_tx_queue *txq,
+		    void **ipsec_metadata, uint64_t *qw0, uint64_t *qw1)
 {
-	volatile struct iavf_tx_context_desc_qws *desc_qws =
-			(volatile struct iavf_tx_context_desc_qws *)desc;
-	/* fill descriptor type field */
-	desc_qws->qw1 = IAVF_TX_DESC_DTYPE_CONTEXT;
-
-	/* fill command field */
-	iavf_fill_ctx_desc_cmd_field(&desc_qws->qw1, m, vlan_flag);
-
-	/* fill segmentation field */
-	if (m->ol_flags & (RTE_MBUF_F_TX_TCP_SEG | RTE_MBUF_F_TX_UDP_SEG)) {
-		/* fill IPsec field */
-		if (m->ol_flags & RTE_MBUF_F_TX_SEC_OFFLOAD)
-			iavf_fill_ctx_desc_ipsec_field(&desc_qws->qw1,
-				ipsec_md);
-
-		*tlen = iavf_fill_ctx_desc_segmentation_field(&desc_qws->qw1,
-				m, ipsec_md);
-	}
-
-	/* fill tunnelling field */
-	if (m->ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK)
-		iavf_fill_ctx_desc_tunnelling_field(&desc_qws->qw0, m);
-	else
-		desc_qws->qw0 = 0;
+	struct iavf_ipsec_crypto_pkt_metadata *md;
 
-	desc_qws->qw0 = rte_cpu_to_le_64(desc_qws->qw0);
-	desc_qws->qw1 = rte_cpu_to_le_64(desc_qws->qw1);
+	if (!(mbuf->ol_flags & RTE_MBUF_F_TX_SEC_OFFLOAD))
+		return 0;
 
-	/* vlan_flag specifies VLAN tag location for VLAN, and outer tag location for QinQ. */
-	if (m->ol_flags & RTE_MBUF_F_TX_QINQ)
-		desc->l2tag2 = vlan_flag & IAVF_TX_FLAGS_VLAN_TAG_LOC_L2TAG2 ? m->vlan_tci_outer :
-						m->vlan_tci;
-	else if (m->ol_flags & RTE_MBUF_F_TX_VLAN && vlan_flag & IAVF_TX_FLAGS_VLAN_TAG_LOC_L2TAG2)
-		desc->l2tag2 = m->vlan_tci;
-}
+	md = RTE_MBUF_DYNFIELD(mbuf, txq->ipsec_crypto_pkt_md_offset,
+				     struct iavf_ipsec_crypto_pkt_metadata *);
+	if (!md)
+		return 0;
 
+	*ipsec_metadata = md;
 
-static inline void
-iavf_fill_ipsec_desc(volatile struct iavf_tx_ipsec_desc *desc,
-	const struct iavf_ipsec_crypto_pkt_metadata *md, uint16_t *ipsec_len)
-{
-	desc->qw0 = rte_cpu_to_le_64(((uint64_t)md->l4_payload_len <<
+	/* Fill IPsec descriptor using existing logic */
+	*qw0 = rte_cpu_to_le_64(((uint64_t)md->l4_payload_len <<
 		IAVF_IPSEC_TX_DESC_QW0_L4PAYLEN_SHIFT) |
 		((uint64_t)md->esn << IAVF_IPSEC_TX_DESC_QW0_IPSECESN_SHIFT) |
 		((uint64_t)md->esp_trailer_len <<
 				IAVF_IPSEC_TX_DESC_QW0_TRAILERLEN_SHIFT));
 
-	desc->qw1 = rte_cpu_to_le_64(((uint64_t)md->sa_idx <<
+	*qw1 = rte_cpu_to_le_64(((uint64_t)md->sa_idx <<
 		IAVF_IPSEC_TX_DESC_QW1_IPSECSA_SHIFT) |
 		((uint64_t)md->next_proto <<
 				IAVF_IPSEC_TX_DESC_QW1_IPSECNH_SHIFT) |
@@ -2556,143 +2491,106 @@ iavf_fill_ipsec_desc(volatile struct iavf_tx_ipsec_desc *desc,
 		((uint64_t)(md->ol_flags & IAVF_IPSEC_CRYPTO_OL_FLAGS_NATT ?
 				1ULL : 0ULL) <<
 				IAVF_IPSEC_TX_DESC_QW1_UDP_SHIFT) |
-		(uint64_t)IAVF_TX_DESC_DTYPE_IPSEC);
+		((uint64_t)IAVF_TX_DESC_DTYPE_IPSEC <<
+				CI_TXD_QW1_DTYPE_S));
 
-	/**
-	 * TODO: Pre-calculate this in the Session initialization
-	 *
-	 * Calculate IPsec length required in data descriptor func when TSO
-	 * offload is enabled
-	 */
-	*ipsec_len = sizeof(struct rte_esp_hdr) + (md->len_iv >> 2) +
-			(md->ol_flags & IAVF_IPSEC_CRYPTO_OL_FLAGS_NATT ?
-			sizeof(struct rte_udp_hdr) : 0);
+	return 1; /* One IPsec descriptor needed */
 }
 
-static inline void
-iavf_build_data_desc_cmd_offset_fields(volatile uint64_t *qw1,
-		struct rte_mbuf *m, uint8_t vlan_flag)
+/* IPsec callback for ci_xmit_pkts - calculates segment length for IPsec+TSO */
+static uint16_t
+iavf_calc_ipsec_segment_len(const struct rte_mbuf *mb_seg, uint64_t ol_flags,
+			    const void *ipsec_metadata, uint16_t tlen)
 {
-	uint64_t command = 0;
-	uint64_t offset = 0;
-	uint64_t l2tag1 = 0;
-
-	*qw1 = CI_TX_DESC_DTYPE_DATA;
-
-	command = (uint64_t)CI_TX_DESC_CMD_ICRC;
-
-	/* Descriptor based VLAN insertion */
-	if ((vlan_flag & IAVF_TX_FLAGS_VLAN_TAG_LOC_L2TAG1) &&
-			m->ol_flags & RTE_MBUF_F_TX_VLAN) {
-		command |= (uint64_t)CI_TX_DESC_CMD_IL2TAG1;
-		l2tag1 |= m->vlan_tci;
-	}
-
-	/* Descriptor based QinQ insertion. vlan_flag specifies outer tag location. */
-	if (m->ol_flags & RTE_MBUF_F_TX_QINQ) {
-		command |= (uint64_t)CI_TX_DESC_CMD_IL2TAG1;
-		l2tag1 = vlan_flag & IAVF_TX_FLAGS_VLAN_TAG_LOC_L2TAG1 ? m->vlan_tci_outer :
-									m->vlan_tci;
+	const struct iavf_ipsec_crypto_pkt_metadata *ipsec_md = ipsec_metadata;
+
+	if ((ol_flags & RTE_MBUF_F_TX_SEC_OFFLOAD) &&
+	    (ol_flags & (RTE_MBUF_F_TX_TCP_SEG | RTE_MBUF_F_TX_UDP_SEG))) {
+		uint16_t ipseclen = ipsec_md ? (ipsec_md->esp_trailer_len +
+						ipsec_md->len_iv) : 0;
+		uint16_t slen = tlen + mb_seg->l2_len + mb_seg->l3_len +
+				mb_seg->outer_l3_len + ipseclen;
+		if (ol_flags & RTE_MBUF_F_TX_L4_MASK)
+			slen += mb_seg->l4_len;
+		return slen;
 	}
 
-	if ((m->ol_flags &
-	    (CI_TX_CKSUM_OFFLOAD_MASK | RTE_MBUF_F_TX_SEC_OFFLOAD)) == 0)
-		goto skip_cksum;
+	return mb_seg->data_len;
+}
 
-	/* Set MACLEN */
-	if (m->ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK &&
-			!(m->ol_flags & RTE_MBUF_F_TX_SEC_OFFLOAD))
-		offset |= (m->outer_l2_len >> 1)
-			<< CI_TX_DESC_LEN_MACLEN_S;
-	else
-		offset |= (m->l2_len >> 1)
-			<< CI_TX_DESC_LEN_MACLEN_S;
+/* Context descriptor callback for ci_xmit_pkts */
+static uint16_t
+iavf_get_context_desc(uint64_t ol_flags, const struct rte_mbuf *mbuf,
+		      const union ci_tx_offload *tx_offload __rte_unused,
+		      const struct ci_tx_queue *txq,
+		      uint64_t *qw0, uint64_t *qw1)
+{
+	uint8_t iavf_vlan_flag;
+	uint16_t cd_l2tag2 = 0;
+	uint64_t cd_type_cmd = IAVF_TX_DESC_DTYPE_CONTEXT;
+	uint64_t cd_tunneling_params = 0;
+	uint16_t tlen = 0;
+	struct iavf_ipsec_crypto_pkt_metadata *ipsec_md = NULL;
+
+	/* Use IAVF-specific vlan_flag from txq */
+	iavf_vlan_flag = txq->vlan_flag;
+
+	/* Check if context descriptor is needed using existing IAVF logic */
+	if (!iavf_calc_context_desc(mbuf, iavf_vlan_flag))
+		return 0;
 
-	/* Enable L3 checksum offloading inner */
-	if (m->ol_flags & RTE_MBUF_F_TX_IP_CKSUM) {
-		if (m->ol_flags & RTE_MBUF_F_TX_IPV4) {
-			command |= CI_TX_DESC_CMD_IIPT_IPV4_CSUM;
-			offset |= (m->l3_len >> 2) << CI_TX_DESC_LEN_IPLEN_S;
-		}
-	} else if (m->ol_flags & RTE_MBUF_F_TX_IPV4) {
-		command |= CI_TX_DESC_CMD_IIPT_IPV4;
-		offset |= (m->l3_len >> 2) << CI_TX_DESC_LEN_IPLEN_S;
-	} else if (m->ol_flags & RTE_MBUF_F_TX_IPV6) {
-		command |= CI_TX_DESC_CMD_IIPT_IPV6;
-		offset |= (m->l3_len >> 2) << CI_TX_DESC_LEN_IPLEN_S;
+	/* Get IPsec metadata if needed */
+	if (ol_flags & RTE_MBUF_F_TX_SEC_OFFLOAD) {
+		ipsec_md = RTE_MBUF_DYNFIELD(mbuf, txq->ipsec_crypto_pkt_md_offset,
+					     struct iavf_ipsec_crypto_pkt_metadata *);
 	}
 
-	if (m->ol_flags & (RTE_MBUF_F_TX_TCP_SEG | RTE_MBUF_F_TX_UDP_SEG)) {
-		if (m->ol_flags & RTE_MBUF_F_TX_TCP_SEG)
-			command |= CI_TX_DESC_CMD_L4T_EOFT_TCP;
-		else
-			command |= CI_TX_DESC_CMD_L4T_EOFT_UDP;
-		offset |= (m->l4_len >> 2) <<
-			      CI_TX_DESC_LEN_L4_LEN_S;
+	/* TSO command field */
+	if (ol_flags & (RTE_MBUF_F_TX_TCP_SEG | RTE_MBUF_F_TX_UDP_SEG)) {
+		cd_type_cmd |= (uint64_t)CI_TX_CTX_DESC_TSO << IAVF_TXD_CTX_QW1_CMD_SHIFT;
 
-		*qw1 = rte_cpu_to_le_64((((uint64_t)command <<
-			IAVF_TXD_DATA_QW1_CMD_SHIFT) & IAVF_TXD_DATA_QW1_CMD_MASK) |
-			(((uint64_t)offset << IAVF_TXD_DATA_QW1_OFFSET_SHIFT) &
-			IAVF_TXD_DATA_QW1_OFFSET_MASK) |
-			((uint64_t)l2tag1 << IAVF_TXD_DATA_QW1_L2TAG1_SHIFT));
+		/* IPsec field for TSO */
+		if (ol_flags & RTE_MBUF_F_TX_SEC_OFFLOAD && ipsec_md) {
+			uint64_t ipsec_field = (uint64_t)ipsec_md->ctx_desc_ipsec_params <<
+				IAVF_TXD_CTX_QW1_IPSEC_PARAMS_CIPHERBLK_SHIFT;
+			cd_type_cmd |= ipsec_field;
+		}
 
-		return;
+		/* TSO segmentation field */
+		tlen = iavf_fill_ctx_desc_segmentation_field(&cd_type_cmd,
+							     mbuf, ipsec_md);
+		(void)tlen; /* Suppress unused variable warning */
 	}
 
-	/* Enable L4 checksum offloads */
-	switch (m->ol_flags & RTE_MBUF_F_TX_L4_MASK) {
-	case RTE_MBUF_F_TX_TCP_CKSUM:
-		command |= CI_TX_DESC_CMD_L4T_EOFT_TCP;
-		offset |= (sizeof(struct rte_tcp_hdr) >> 2) <<
-				CI_TX_DESC_LEN_L4_LEN_S;
-		break;
-	case RTE_MBUF_F_TX_SCTP_CKSUM:
-		command |= CI_TX_DESC_CMD_L4T_EOFT_SCTP;
-		offset |= (sizeof(struct rte_sctp_hdr) >> 2) <<
-				CI_TX_DESC_LEN_L4_LEN_S;
-		break;
-	case RTE_MBUF_F_TX_UDP_CKSUM:
-		command |= CI_TX_DESC_CMD_L4T_EOFT_UDP;
-		offset |= (sizeof(struct rte_udp_hdr) >> 2) <<
-				CI_TX_DESC_LEN_L4_LEN_S;
-		break;
+	/* VLAN field for L2TAG2 */
+	if ((ol_flags & RTE_MBUF_F_TX_VLAN &&
+	     iavf_vlan_flag & IAVF_TX_FLAGS_VLAN_TAG_LOC_L2TAG2) ||
+	    ol_flags & RTE_MBUF_F_TX_QINQ) {
+		cd_type_cmd |= (uint64_t)CI_TX_CTX_DESC_IL2TAG2 << IAVF_TXD_CTX_QW1_CMD_SHIFT;
 	}
 
-skip_cksum:
-	*qw1 = rte_cpu_to_le_64((((uint64_t)command <<
-		IAVF_TXD_DATA_QW1_CMD_SHIFT) & IAVF_TXD_DATA_QW1_CMD_MASK) |
-		(((uint64_t)offset << IAVF_TXD_DATA_QW1_OFFSET_SHIFT) &
-		IAVF_TXD_DATA_QW1_OFFSET_MASK) |
-		((uint64_t)l2tag1 << IAVF_TXD_DATA_QW1_L2TAG1_SHIFT));
-}
-
-static inline void
-iavf_fill_data_desc(volatile struct ci_tx_desc *desc,
-	uint64_t desc_template,	uint16_t buffsz,
-	uint64_t buffer_addr)
-{
-	/* fill data descriptor qw1 from template */
-	desc->cmd_type_offset_bsz = desc_template;
-
-	/* set data buffer size */
-	desc->cmd_type_offset_bsz |=
-		(((uint64_t)buffsz << IAVF_TXD_DATA_QW1_TX_BUF_SZ_SHIFT) &
-		IAVF_TXD_DATA_QW1_TX_BUF_SZ_MASK);
-
-	desc->buffer_addr = rte_cpu_to_le_64(buffer_addr);
-	desc->cmd_type_offset_bsz = rte_cpu_to_le_64(desc->cmd_type_offset_bsz);
-}
-
+	/* LLDP switching field */
+	if (IAVF_CHECK_TX_LLDP(mbuf))
+		cd_type_cmd |= IAVF_TX_CTX_DESC_SWTCH_UPLINK << IAVF_TXD_CTX_QW1_CMD_SHIFT;
+
+	/* Tunneling field */
+	if (ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK)
+		iavf_fill_ctx_desc_tunnelling_field((uint64_t *)&cd_tunneling_params, mbuf);
+
+	/* L2TAG2 field (VLAN) */
+	if (ol_flags & RTE_MBUF_F_TX_QINQ) {
+		cd_l2tag2 = iavf_vlan_flag & IAVF_TX_FLAGS_VLAN_TAG_LOC_L2TAG2 ?
+			    mbuf->vlan_tci_outer : mbuf->vlan_tci;
+	} else if (ol_flags & RTE_MBUF_F_TX_VLAN &&
+		   iavf_vlan_flag & IAVF_TX_FLAGS_VLAN_TAG_LOC_L2TAG2) {
+		cd_l2tag2 = mbuf->vlan_tci;
+	}
 
-static struct iavf_ipsec_crypto_pkt_metadata *
-iavf_ipsec_crypto_get_pkt_metadata(const struct ci_tx_queue *txq,
-		struct rte_mbuf *m)
-{
-	if (m->ol_flags & RTE_MBUF_F_TX_SEC_OFFLOAD)
-		return RTE_MBUF_DYNFIELD(m, txq->ipsec_crypto_pkt_md_offset,
-				struct iavf_ipsec_crypto_pkt_metadata *);
+	/* Set outputs */
+	*qw0 = rte_cpu_to_le_64(cd_tunneling_params | ((uint64_t)cd_l2tag2 << 32));
+	*qw1 = rte_cpu_to_le_64(cd_type_cmd);
 
-	return NULL;
+	return 1; /* One context descriptor needed */
 }
 
 /* TX function */
@@ -2700,231 +2598,17 @@ uint16_t
 iavf_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 {
 	struct ci_tx_queue *txq = tx_queue;
-	volatile struct ci_tx_desc *txr = txq->ci_tx_ring;
-	struct ci_tx_entry *txe_ring = txq->sw_ring;
-	struct ci_tx_entry *txe, *txn;
-	struct rte_mbuf *mb, *mb_seg;
-	uint64_t buf_dma_addr;
-	uint16_t desc_idx, desc_idx_last;
-	uint16_t idx;
-	uint16_t slen;
-
-
-	/* Check if the descriptor ring needs to be cleaned. */
-	if (txq->nb_tx_free < txq->tx_free_thresh)
-		ci_tx_xmit_cleanup(txq);
-
-	desc_idx = txq->tx_tail;
-	txe = &txe_ring[desc_idx];
-
-	for (idx = 0; idx < nb_pkts; idx++) {
-		volatile struct ci_tx_desc *ddesc;
-		struct iavf_ipsec_crypto_pkt_metadata *ipsec_md;
-
-		uint16_t nb_desc_ctx, nb_desc_ipsec;
-		uint16_t nb_desc_data, nb_desc_required;
-		uint16_t tlen = 0, ipseclen = 0;
-		uint64_t ddesc_template = 0;
-		uint64_t ddesc_cmd = 0;
-
-		mb = tx_pkts[idx];
 
-		RTE_MBUF_PREFETCH_TO_FREE(txe->mbuf);
-
-		/**
-		 * Get metadata for ipsec crypto from mbuf dynamic fields if
-		 * security offload is specified.
-		 */
-		ipsec_md = iavf_ipsec_crypto_get_pkt_metadata(txq, mb);
-
-		nb_desc_data = mb->nb_segs;
-		nb_desc_ctx =
-			iavf_calc_context_desc(mb, txq->vlan_flag);
-		nb_desc_ipsec = !!(mb->ol_flags & RTE_MBUF_F_TX_SEC_OFFLOAD);
-
-		/**
-		 * The number of descriptors that must be allocated for
-		 * a packet equals to the number of the segments of that
-		 * packet plus the context and ipsec descriptors if needed.
-		 * Recalculate the needed tx descs when TSO enabled in case
-		 * the mbuf data size exceeds max data size that hw allows
-		 * per tx desc.
-		 */
-		if (mb->ol_flags & RTE_MBUF_F_TX_TCP_SEG)
-			nb_desc_required = ci_calc_pkt_desc(mb) + nb_desc_ctx + nb_desc_ipsec;
-		else
-			nb_desc_required = nb_desc_data + nb_desc_ctx + nb_desc_ipsec;
-
-		desc_idx_last = (uint16_t)(desc_idx + nb_desc_required - 1);
-
-		/* wrap descriptor ring */
-		if (desc_idx_last >= txq->nb_tx_desc)
-			desc_idx_last =
-				(uint16_t)(desc_idx_last - txq->nb_tx_desc);
-
-		PMD_TX_LOG(DEBUG,
-			"port_id=%u queue_id=%u tx_first=%u tx_last=%u",
-			txq->port_id, txq->queue_id, desc_idx, desc_idx_last);
-
-		if (nb_desc_required > txq->nb_tx_free) {
-			if (ci_tx_xmit_cleanup(txq)) {
-				if (idx == 0)
-					return 0;
-				goto end_of_tx;
-			}
-			if (unlikely(nb_desc_required > txq->tx_rs_thresh)) {
-				while (nb_desc_required > txq->nb_tx_free) {
-					if (ci_tx_xmit_cleanup(txq)) {
-						if (idx == 0)
-							return 0;
-						goto end_of_tx;
-					}
-				}
-			}
-		}
-
-		iavf_build_data_desc_cmd_offset_fields(&ddesc_template, mb,
-			txq->vlan_flag);
-
-			/* Setup TX context descriptor if required */
-		if (nb_desc_ctx) {
-			volatile struct iavf_tx_context_desc *ctx_desc =
-				(volatile struct iavf_tx_context_desc *)
-					&txr[desc_idx];
-
-			/* clear QW0 or the previous writeback value
-			 * may impact next write
-			 */
-			*(volatile uint64_t *)ctx_desc = 0;
-
-			txn = &txe_ring[txe->next_id];
-			RTE_MBUF_PREFETCH_TO_FREE(txn->mbuf);
-
-			if (txe->mbuf) {
-				rte_pktmbuf_free_seg(txe->mbuf);
-				txe->mbuf = NULL;
-			}
-
-			iavf_fill_context_desc(ctx_desc, mb, ipsec_md, &tlen,
-				txq->vlan_flag);
-			IAVF_DUMP_TX_DESC(txq, ctx_desc, desc_idx);
-
-			txe->last_id = desc_idx_last;
-			desc_idx = txe->next_id;
-			txe = txn;
-		}
-
-		if (nb_desc_ipsec) {
-			volatile struct iavf_tx_ipsec_desc *ipsec_desc =
-				(volatile struct iavf_tx_ipsec_desc *)
-					&txr[desc_idx];
-
-			txn = &txe_ring[txe->next_id];
-			RTE_MBUF_PREFETCH_TO_FREE(txn->mbuf);
-
-			if (txe->mbuf) {
-				rte_pktmbuf_free_seg(txe->mbuf);
-				txe->mbuf = NULL;
-			}
-
-			iavf_fill_ipsec_desc(ipsec_desc, ipsec_md, &ipseclen);
-
-			IAVF_DUMP_TX_DESC(txq, ipsec_desc, desc_idx);
-
-			txe->last_id = desc_idx_last;
-			desc_idx = txe->next_id;
-			txe = txn;
-		}
-
-		mb_seg = mb;
-
-		do {
-			ddesc = (volatile struct ci_tx_desc *)
-					&txr[desc_idx];
-
-			txn = &txe_ring[txe->next_id];
-			RTE_MBUF_PREFETCH_TO_FREE(txn->mbuf);
-
-			if (txe->mbuf)
-				rte_pktmbuf_free_seg(txe->mbuf);
-
-			txe->mbuf = mb_seg;
-
-			if ((mb_seg->ol_flags & RTE_MBUF_F_TX_SEC_OFFLOAD) &&
-					(mb_seg->ol_flags &
-						(RTE_MBUF_F_TX_TCP_SEG |
-						RTE_MBUF_F_TX_UDP_SEG))) {
-				slen = tlen + mb_seg->l2_len + mb_seg->l3_len +
-						mb_seg->outer_l3_len + ipseclen;
-				if (mb_seg->ol_flags & RTE_MBUF_F_TX_L4_MASK)
-					slen += mb_seg->l4_len;
-			} else {
-				slen = mb_seg->data_len;
-			}
-
-			buf_dma_addr = rte_mbuf_data_iova(mb_seg);
-			while ((mb_seg->ol_flags & (RTE_MBUF_F_TX_TCP_SEG |
-					RTE_MBUF_F_TX_UDP_SEG)) &&
-					unlikely(slen > CI_MAX_DATA_PER_TXD)) {
-				iavf_fill_data_desc(ddesc, ddesc_template,
-					CI_MAX_DATA_PER_TXD, buf_dma_addr);
-
-				IAVF_DUMP_TX_DESC(txq, ddesc, desc_idx);
-
-				buf_dma_addr += CI_MAX_DATA_PER_TXD;
-				slen -= CI_MAX_DATA_PER_TXD;
-
-				txe->last_id = desc_idx_last;
-				desc_idx = txe->next_id;
-				txe = txn;
-				ddesc = &txr[desc_idx];
-				txn = &txe_ring[txe->next_id];
-			}
-
-			iavf_fill_data_desc(ddesc, ddesc_template,
-					slen, buf_dma_addr);
-
-			IAVF_DUMP_TX_DESC(txq, ddesc, desc_idx);
-
-			txe->last_id = desc_idx_last;
-			desc_idx = txe->next_id;
-			txe = txn;
-			mb_seg = mb_seg->next;
-		} while (mb_seg);
-
-		/* The last packet data descriptor needs End Of Packet (EOP) */
-		ddesc_cmd = CI_TX_DESC_CMD_EOP;
-
-		txq->nb_tx_used = (uint16_t)(txq->nb_tx_used + nb_desc_required);
-		txq->nb_tx_free = (uint16_t)(txq->nb_tx_free - nb_desc_required);
-
-		if (txq->nb_tx_used >= txq->tx_rs_thresh) {
-			PMD_TX_LOG(DEBUG, "Setting RS bit on TXD id="
-				   "%4u (port=%d queue=%d)",
-				   desc_idx_last, txq->port_id, txq->queue_id);
-
-			ddesc_cmd |= CI_TX_DESC_CMD_RS;
-
-			/* Update txq RS bit counters */
-			txq->nb_tx_used = 0;
-		}
-
-		ddesc->cmd_type_offset_bsz |= rte_cpu_to_le_64(ddesc_cmd <<
-				IAVF_TXD_DATA_QW1_CMD_SHIFT);
-
-		IAVF_DUMP_TX_DESC(txq, ddesc, desc_idx - 1);
-	}
-
-end_of_tx:
-	rte_wmb();
-
-	PMD_TX_LOG(DEBUG, "port_id=%u queue_id=%u tx_tail=%u nb_tx=%u",
-		   txq->port_id, txq->queue_id, desc_idx, idx);
-
-	IAVF_PCI_REG_WRITE_RELAXED(txq->qtx_tail, desc_idx);
-	txq->tx_tail = desc_idx;
+	const struct ci_ipsec_ops ipsec_ops = {
+		.get_ipsec_desc = iavf_get_ipsec_desc,
+		.calc_segment_len = iavf_calc_ipsec_segment_len,
+	};
 
-	return idx;
+	/* IAVF does not support timestamp queues, so pass NULL for ts_fns */
+	return ci_xmit_pkts(txq, tx_pkts, nb_pkts,
+			    (txq->vlan_flag & IAVF_TX_FLAGS_VLAN_TAG_LOC_L2TAG1) ?
+				CI_VLAN_IN_L2TAG1 : CI_VLAN_IN_L2TAG2,
+			    iavf_get_context_desc, &ipsec_ops, NULL);
 }
 
 /* Check if the packet with vlan user priority is transmitted in the
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [PATCH v3 17/36] net/i40e: document requirement for QinQ support
  2026-01-30 11:41 ` [PATCH v3 " Bruce Richardson
                     ` (15 preceding siblings ...)
  2026-01-30 11:41   ` [PATCH v3 16/36] net/iavf: use common scalar Tx function Bruce Richardson
@ 2026-01-30 11:41   ` Bruce Richardson
  2026-01-30 11:41   ` [PATCH v3 18/36] net/idpf: use common scalar Tx function Bruce Richardson
                     ` (18 subsequent siblings)
  35 siblings, 0 replies; 274+ messages in thread
From: Bruce Richardson @ 2026-01-30 11:41 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson

To have multiple VLAN tags inserted in an outgoing packet with the QinQ
offload, the i40e driver needs to be set to double VLAN mode. This is
done by using the VLAN_EXTEND Rx config flag. Add a code check for this
dependency and update the documentation to describe it.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 doc/guides/nics/i40e.rst           | 18 ++++++++++++++++++
 drivers/net/intel/i40e/i40e_rxtx.c |  9 +++++++++
 2 files changed, 27 insertions(+)

diff --git a/doc/guides/nics/i40e.rst b/doc/guides/nics/i40e.rst
index 45dc083c94..cbfaddbdd8 100644
--- a/doc/guides/nics/i40e.rst
+++ b/doc/guides/nics/i40e.rst
@@ -245,6 +245,24 @@ Runtime Configuration
   * ``segment``: Check number of mbuf segments not exceed hw limitation.
   * ``offload``: Check any unsupported offload flag.
 
+QinQ Configuration
+~~~~~~~~~~~~~~~~~~
+
+When using QinQ TX offload (``RTE_ETH_TX_OFFLOAD_QINQ_INSERT``), you must also
+enable ``RTE_ETH_RX_OFFLOAD_VLAN_EXTEND`` to configure the hardware for double
+VLAN mode. Without this, only the inner VLAN tag will be inserted.
+
+Example::
+
+  struct rte_eth_conf port_conf = {
+      .rxmode = {
+          .offloads = RTE_ETH_RX_OFFLOAD_VLAN_EXTEND,
+      },
+      .txmode = {
+          .offloads = RTE_ETH_TX_OFFLOAD_QINQ_INSERT,
+      },
+  };
+
 Vector RX Pre-conditions
 ~~~~~~~~~~~~~~~~~~~~~~~~
 For Vector RX it is assumed that the number of descriptor rings will be a power
diff --git a/drivers/net/intel/i40e/i40e_rxtx.c b/drivers/net/intel/i40e/i40e_rxtx.c
index 4e2071c024..49ec4dd8a1 100644
--- a/drivers/net/intel/i40e/i40e_rxtx.c
+++ b/drivers/net/intel/i40e/i40e_rxtx.c
@@ -2182,6 +2182,15 @@ i40e_dev_tx_queue_setup(struct rte_eth_dev *dev,
 	vsi = i40e_pf_get_vsi_by_qindex(pf, queue_idx);
 	if (!vsi)
 		return -EINVAL;
+
+	/* Check if QinQ TX offload requires VLAN extend mode */
+	if ((offloads & RTE_ETH_TX_OFFLOAD_QINQ_INSERT) &&
+			!(dev->data->dev_conf.rxmode.offloads & RTE_ETH_RX_OFFLOAD_VLAN_EXTEND)) {
+		PMD_DRV_LOG(WARNING, "Port %u: QinQ TX offload is enabled but VLAN extend mode is not set.",
+				dev->data->port_id);
+		PMD_DRV_LOG(WARNING, "Double VLAN insertion may not work correctly without RTE_ETH_RX_OFFLOAD_VLAN_EXTEND set in Rx configuration.");
+	}
+
 	q_offset = i40e_get_queue_offset_by_qindex(pf, queue_idx);
 	if (q_offset < 0)
 		return -EINVAL;
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [PATCH v3 18/36] net/idpf: use common scalar Tx function
  2026-01-30 11:41 ` [PATCH v3 " Bruce Richardson
                     ` (16 preceding siblings ...)
  2026-01-30 11:41   ` [PATCH v3 17/36] net/i40e: document requirement for QinQ support Bruce Richardson
@ 2026-01-30 11:41   ` Bruce Richardson
  2026-01-30 11:41   ` [PATCH v3 19/36] net/intel: avoid writing the final pkt descriptor twice Bruce Richardson
                     ` (17 subsequent siblings)
  35 siblings, 0 replies; 274+ messages in thread
From: Bruce Richardson @ 2026-01-30 11:41 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson, Jingjing Wu, Praveen Shetty

Update the idpf driver to use the common scalar Tx function in its
single-queue configuration.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/net/intel/idpf/idpf_common_rxtx.c | 178 ++--------------------
 1 file changed, 10 insertions(+), 168 deletions(-)

diff --git a/drivers/net/intel/idpf/idpf_common_rxtx.c b/drivers/net/intel/idpf/idpf_common_rxtx.c
index b34d545a0a..bca5f13c8e 100644
--- a/drivers/net/intel/idpf/idpf_common_rxtx.c
+++ b/drivers/net/intel/idpf/idpf_common_rxtx.c
@@ -8,7 +8,6 @@
 
 #include "idpf_common_rxtx.h"
 #include "idpf_common_device.h"
-#include "../common/rx.h"
 
 int idpf_timestamp_dynfield_offset = -1;
 uint64_t idpf_timestamp_dynflag;
@@ -848,9 +847,10 @@ idpf_calc_context_desc(uint64_t flags)
 /* set TSO context descriptor, returns 0 if no context needed, 1 if context set
  */
 static inline uint16_t
-idpf_set_tso_ctx(uint64_t ol_flags, struct rte_mbuf *mbuf,
-			union ci_tx_offload tx_offload,
-			uint64_t *qw0, uint64_t *qw1)
+idpf_set_tso_ctx(uint64_t ol_flags, const struct rte_mbuf *mbuf,
+		 const union ci_tx_offload *tx_offload,
+		 const struct ci_tx_queue *txq __rte_unused,
+		 uint64_t *qw0, uint64_t *qw1)
 {
 	uint16_t cmd_dtype = IDPF_TX_DESC_DTYPE_FLEX_TSO_CTX | IDPF_TX_FLEX_CTX_DESC_CMD_TSO;
 	uint16_t tso_segsz = mbuf->tso_segsz;
@@ -861,12 +861,12 @@ idpf_set_tso_ctx(uint64_t ol_flags, struct rte_mbuf *mbuf,
 		return 0;
 
 	/* TSO context descriptor setup */
-	if (tx_offload.l4_len == 0) {
+	if (tx_offload->l4_len == 0) {
 		TX_LOG(DEBUG, "L4 length set to 0");
 		return 0;
 	}
 
-	hdr_len = tx_offload.l2_len + tx_offload.l3_len + tx_offload.l4_len;
+	hdr_len = tx_offload->l2_len + tx_offload->l3_len + tx_offload->l4_len;
 	tso_len = mbuf->pkt_len - hdr_len;
 
 	*qw0 = rte_cpu_to_le_32(tso_len & IDPF_TXD_FLEX_CTX_MSS_RT_M) |
@@ -933,7 +933,8 @@ idpf_dp_splitq_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 		tx_offload.tso_segsz = tx_pkt->tso_segsz;
 		/* Calculate the number of context descriptors needed. */
 		uint64_t cd_qw0, cd_qw1;
-		nb_ctx = idpf_set_tso_ctx(ol_flags, tx_pkt, tx_offload, &cd_qw0, &cd_qw1);
+		nb_ctx = idpf_set_tso_ctx(ol_flags, tx_pkt, &tx_offload, txq,
+					  &cd_qw0, &cd_qw1);
 
 		/* Calculate the number of TX descriptors needed for
 		 * each packet. For TSO packets, use ci_calc_pkt_desc as
@@ -1339,167 +1340,8 @@ uint16_t
 idpf_dp_singleq_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 			  uint16_t nb_pkts)
 {
-	volatile struct ci_tx_desc *txd;
-	volatile struct ci_tx_desc *txr;
-	union ci_tx_offload tx_offload = {0};
-	struct ci_tx_entry *txe, *txn;
-	struct ci_tx_entry *sw_ring;
-	struct ci_tx_queue *txq;
-	struct rte_mbuf *tx_pkt;
-	struct rte_mbuf *m_seg;
-	uint64_t buf_dma_addr;
-	uint32_t td_offset;
-	uint64_t ol_flags;
-	uint16_t tx_last;
-	uint16_t nb_used;
-	uint16_t nb_ctx;
-	uint16_t td_cmd;
-	uint16_t tx_id;
-	uint16_t nb_tx;
-	uint16_t slen;
-
-	nb_tx = 0;
-	txq = tx_queue;
-
-	if (unlikely(txq == NULL))
-		return nb_tx;
-
-	sw_ring = txq->sw_ring;
-	txr = txq->ci_tx_ring;
-	tx_id = txq->tx_tail;
-	txe = &sw_ring[tx_id];
-
-	/* Check if the descriptor ring needs to be cleaned. */
-	if (txq->nb_tx_free < txq->tx_free_thresh)
-		(void)ci_tx_xmit_cleanup(txq);
-
-	for (nb_tx = 0; nb_tx < nb_pkts; nb_tx++) {
-		td_cmd = 0;
-		td_offset = 0;
-
-		tx_pkt = *tx_pkts++;
-		RTE_MBUF_PREFETCH_TO_FREE(txe->mbuf);
-
-		ol_flags = tx_pkt->ol_flags;
-		tx_offload.l2_len = tx_pkt->l2_len;
-		tx_offload.l3_len = tx_pkt->l3_len;
-		tx_offload.l4_len = tx_pkt->l4_len;
-		tx_offload.tso_segsz = tx_pkt->tso_segsz;
-		/* Calculate the number of context descriptors needed. */
-		uint64_t cd_qw0, cd_qw1;
-		nb_ctx = idpf_set_tso_ctx(ol_flags, tx_pkt, tx_offload, &cd_qw0, &cd_qw1);
-
-		/* The number of descriptors that must be allocated for
-		 * a packet. For TSO packets, use ci_calc_pkt_desc as
-		 * the mbuf data size might exceed max data size that hw allows
-		 * per tx desc.
-		 */
-		if (ol_flags & RTE_MBUF_F_TX_TCP_SEG)
-			nb_used = (uint16_t)(ci_calc_pkt_desc(tx_pkt) + nb_ctx);
-		else
-			nb_used = (uint16_t)(tx_pkt->nb_segs + nb_ctx);
-		tx_last = (uint16_t)(tx_id + nb_used - 1);
-
-		/* Circular ring */
-		if (tx_last >= txq->nb_tx_desc)
-			tx_last = (uint16_t)(tx_last - txq->nb_tx_desc);
-
-		TX_LOG(DEBUG, "port_id=%u queue_id=%u"
-		       " tx_first=%u tx_last=%u",
-		       txq->port_id, txq->queue_id, tx_id, tx_last);
-
-		if (nb_used > txq->nb_tx_free) {
-			if (ci_tx_xmit_cleanup(txq) != 0) {
-				if (nb_tx == 0)
-					return 0;
-				goto end_of_tx;
-			}
-			if (unlikely(nb_used > txq->tx_rs_thresh)) {
-				while (nb_used > txq->nb_tx_free) {
-					if (ci_tx_xmit_cleanup(txq) != 0) {
-						if (nb_tx == 0)
-							return 0;
-						goto end_of_tx;
-					}
-				}
-			}
-		}
-
-		if (ol_flags & CI_TX_CKSUM_OFFLOAD_MASK)
-			td_cmd |= IDPF_TX_FLEX_DESC_CMD_CS_EN;
-
-		if (nb_ctx != 0) {
-			/* Setup TX context descriptor if required */
-			uint64_t *ctx_txd = RTE_CAST_PTR(uint64_t *, &txr[tx_id]);
-
-			txn = &sw_ring[txe->next_id];
-			RTE_MBUF_PREFETCH_TO_FREE(txn->mbuf);
-			if (txe->mbuf != NULL) {
-				rte_pktmbuf_free_seg(txe->mbuf);
-				txe->mbuf = NULL;
-			}
-
-			ctx_txd[0] = cd_qw0;
-			ctx_txd[1] = cd_qw1;
-
-			txe->last_id = tx_last;
-			tx_id = txe->next_id;
-			txe = txn;
-		}
-
-		m_seg = tx_pkt;
-		do {
-			txd = &txr[tx_id];
-			txn = &sw_ring[txe->next_id];
-
-			if (txe->mbuf != NULL)
-				rte_pktmbuf_free_seg(txe->mbuf);
-			txe->mbuf = m_seg;
-
-			/* Setup TX Descriptor */
-			slen = m_seg->data_len;
-			buf_dma_addr = rte_mbuf_data_iova(m_seg);
-			txd->buffer_addr = rte_cpu_to_le_64(buf_dma_addr);
-			txd->cmd_type_offset_bsz = rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DATA |
-				((uint64_t)td_cmd  << CI_TXD_QW1_CMD_S) |
-				((uint64_t)td_offset << CI_TXD_QW1_OFFSET_S) |
-				((uint64_t)slen << CI_TXD_QW1_TX_BUF_SZ_S));
-
-			txe->last_id = tx_last;
-			tx_id = txe->next_id;
-			txe = txn;
-			m_seg = m_seg->next;
-		} while (m_seg);
-
-		/* The last packet data descriptor needs End Of Packet (EOP) */
-		td_cmd |= CI_TX_DESC_CMD_EOP;
-		txq->nb_tx_used = (uint16_t)(txq->nb_tx_used + nb_used);
-		txq->nb_tx_free = (uint16_t)(txq->nb_tx_free - nb_used);
-
-		if (txq->nb_tx_used >= txq->tx_rs_thresh) {
-			TX_LOG(DEBUG, "Setting RS bit on TXD id="
-			       "%4u (port=%d queue=%d)",
-			       tx_last, txq->port_id, txq->queue_id);
-
-			td_cmd |= CI_TX_DESC_CMD_RS;
-
-			/* Update txq RS bit counters */
-			txq->nb_tx_used = 0;
-		}
-
-		txd->cmd_type_offset_bsz |= rte_cpu_to_le_16(td_cmd << CI_TXD_QW1_CMD_S);
-	}
-
-end_of_tx:
-	rte_wmb();
-
-	TX_LOG(DEBUG, "port_id=%u queue_id=%u tx_tail=%u nb_tx=%u",
-	       txq->port_id, txq->queue_id, tx_id, nb_tx);
-
-	IDPF_PCI_REG_WRITE(txq->qtx_tail, tx_id);
-	txq->tx_tail = tx_id;
-
-	return nb_tx;
+	return ci_xmit_pkts(tx_queue, tx_pkts, nb_pkts, CI_VLAN_IN_L2TAG1,
+			idpf_set_tso_ctx, NULL, NULL);
 }
 
 /* TX prep functions */
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [PATCH v3 19/36] net/intel: avoid writing the final pkt descriptor twice
  2026-01-30 11:41 ` [PATCH v3 " Bruce Richardson
                     ` (17 preceding siblings ...)
  2026-01-30 11:41   ` [PATCH v3 18/36] net/idpf: use common scalar Tx function Bruce Richardson
@ 2026-01-30 11:41   ` Bruce Richardson
  2026-01-30 11:41   ` [PATCH v3 20/36] eal: add macro for marking assumed alignment Bruce Richardson
                     ` (16 subsequent siblings)
  35 siblings, 0 replies; 274+ messages in thread
From: Bruce Richardson @ 2026-01-30 11:41 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson

In the scalar datapath, there is a loop to handle multi-segment,
multi-descriptor packets on Tx. After that loop, the end-of-packet
(EOP) bit was written to the descriptor separately, meaning that for
each single-descriptor packet there were two writes to the second
quad-word - effectively 3 x 64-bit writes rather than just 2. Setting
the EOP bit inside the loop instead saves that extra write per packet
and so improves performance.
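
As an illustrative sketch - not the driver code itself; struct desc,
CMD_EOP and the fill_* helpers are made-up stand-ins - the saving can
be seen by counting the 64-bit stores each scheme performs for a
single-descriptor packet:

```c
#include <assert.h>
#include <stdint.h>

#define CMD_EOP   0x1ULL
#define QW1_CMD_S 4

struct desc { uint64_t addr; uint64_t qw1; };

/* Old scheme: write qw1 in the loop, then OR in the EOP bit afterwards,
 * an extra read-modify-write of the second quad-word. */
static int fill_old(struct desc *d, uint64_t addr, uint64_t cmd)
{
	int stores = 0;
	d->addr = addr; stores++;
	d->qw1 = cmd << QW1_CMD_S; stores++;
	d->qw1 |= CMD_EOP << QW1_CMD_S; stores++;
	return stores;
}

/* New scheme: fold EOP into cmd before the single qw1 write. */
static int fill_new(struct desc *d, uint64_t addr, uint64_t cmd, int last_seg)
{
	int stores = 0;
	if (last_seg)
		cmd |= CMD_EOP;
	d->addr = addr; stores++;
	d->qw1 = cmd << QW1_CMD_S; stores++;
	return stores;
}
```

Both schemes produce an identical descriptor; only the store count
differs, and the saving is proportionally largest for the common
single-descriptor case.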

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/net/intel/common/tx_scalar_fns.h | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/net/intel/common/tx_scalar_fns.h b/drivers/net/intel/common/tx_scalar_fns.h
index 5e301699be..bd8053f58c 100644
--- a/drivers/net/intel/common/tx_scalar_fns.h
+++ b/drivers/net/intel/common/tx_scalar_fns.h
@@ -377,6 +377,10 @@ ci_xmit_pkts(struct ci_tx_queue *txq,
 				txn = &sw_ring[txe->next_id];
 			}
 
+			/* fill the last descriptor with End of Packet (EOP) bit */
+			if (m_seg->next == NULL)
+				td_cmd |= CI_TX_DESC_CMD_EOP;
+
 			txd->buffer_addr = rte_cpu_to_le_64(buf_dma_addr);
 			txd->cmd_type_offset_bsz = rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DATA |
 				((uint64_t)td_cmd << CI_TXD_QW1_CMD_S) |
@@ -389,21 +393,17 @@ ci_xmit_pkts(struct ci_tx_queue *txq,
 			txe = txn;
 			m_seg = m_seg->next;
 		} while (m_seg);
-
-		/* fill the last descriptor with End of Packet (EOP) bit */
-		td_cmd |= CI_TX_DESC_CMD_EOP;
 		txq->nb_tx_used = (uint16_t)(txq->nb_tx_used + nb_used);
 		txq->nb_tx_free = (uint16_t)(txq->nb_tx_free - nb_used);
 
 		/* set RS bit on the last descriptor of one packet */
 		if (txq->nb_tx_used >= txq->tx_rs_thresh) {
-			td_cmd |= CI_TX_DESC_CMD_RS;
+			txd->cmd_type_offset_bsz |=
+					rte_cpu_to_le_64(CI_TX_DESC_CMD_RS << CI_TXD_QW1_CMD_S);
 
 			/* Update txq RS bit counters */
 			txq->nb_tx_used = 0;
 		}
-		txd->cmd_type_offset_bsz |=
-				rte_cpu_to_le_64(((uint64_t)td_cmd) << CI_TXD_QW1_CMD_S);
 
 		if (ts_fns != NULL)
 			ts_id = ts_fns->write_ts_desc(txq, tx_pkt, tx_id, ts_id);
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [PATCH v3 20/36] eal: add macro for marking assumed alignment
  2026-01-30 11:41 ` [PATCH v3 " Bruce Richardson
                     ` (18 preceding siblings ...)
  2026-01-30 11:41   ` [PATCH v3 19/36] net/intel: avoid writing the final pkt descriptor twice Bruce Richardson
@ 2026-01-30 11:41   ` Bruce Richardson
  2026-01-30 11:41   ` [PATCH v3 21/36] net/intel: write descriptors using non-volatile pointers Bruce Richardson
                     ` (15 subsequent siblings)
  35 siblings, 0 replies; 274+ messages in thread
From: Bruce Richardson @ 2026-01-30 11:41 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson

Provide a common DPDK macro, __rte_assume_aligned, wrapping the
gcc/clang builtin __builtin_assume_aligned, to mark a pointer as
pointing to data with a known minimum alignment.
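
As a rough sketch of how such a macro is used - my_assume_aligned and
write_desc16 are hypothetical names standing in for the proposed
__rte_assume_aligned and its callers:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* On gcc/clang the macro maps straight to the builtin; on MSVC it
 * would reduce to (ptr), dropping the hint. */
#define my_assume_aligned __builtin_assume_aligned

/* Write a 16-byte descriptor through a pointer the compiler may treat
 * as 16-byte aligned, enabling a single 128-bit store. */
static void write_desc16(void *slot, uint64_t qw0, uint64_t qw1)
{
	uint64_t *txd = my_assume_aligned(slot, 16);

	txd[0] = qw0;
	txd[1] = qw1;
}
```

The hint is purely an optimization aid; passing a pointer that is not
actually aligned as promised is undefined behaviour.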

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 lib/eal/include/rte_common.h | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/lib/eal/include/rte_common.h b/lib/eal/include/rte_common.h
index 573bf4f2ce..51a2eaf8b4 100644
--- a/lib/eal/include/rte_common.h
+++ b/lib/eal/include/rte_common.h
@@ -121,6 +121,12 @@ extern "C" {
 #define __rte_aligned(a) __attribute__((__aligned__(a)))
 #endif
 
+#ifdef RTE_TOOLCHAIN_MSVC
+#define __rte_assume_aligned(ptr, align) (ptr)
+#else
+#define __rte_assume_aligned __builtin_assume_aligned
+#endif
+
 #ifdef RTE_ARCH_STRICT_ALIGN
 typedef uint64_t unaligned_uint64_t __rte_aligned(1);
 typedef uint32_t unaligned_uint32_t __rte_aligned(1);
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [PATCH v3 21/36] net/intel: write descriptors using non-volatile pointers
  2026-01-30 11:41 ` [PATCH v3 " Bruce Richardson
                     ` (19 preceding siblings ...)
  2026-01-30 11:41   ` [PATCH v3 20/36] eal: add macro for marking assumed alignment Bruce Richardson
@ 2026-01-30 11:41   ` Bruce Richardson
  2026-01-30 11:41   ` [PATCH v3 22/36] net/intel: remove unnecessary flag clearing Bruce Richardson
                     ` (14 subsequent siblings)
  35 siblings, 0 replies; 274+ messages in thread
From: Bruce Richardson @ 2026-01-30 11:41 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson

Use a non-volatile uint64_t pointer when storing to the descriptor
ring. This allows the compiler to merge the stores as it sees fit.
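
A minimal sketch of the pattern, with write_qwords standing in for the
patch's write_txd helper (the byte-order conversion and alignment hint
of the real helper are omitted here). Casting away volatile once lets
the compiler combine the two adjacent 64-bit stores:

```c
#include <assert.h>
#include <stdint.h>

/* The descriptor ring is declared volatile, but a descriptor slot is
 * written exactly once per use, so the two quad-word stores can go
 * through a plain pointer and be merged by the compiler. */
static inline void
write_qwords(volatile void *txd, uint64_t qw0, uint64_t qw1)
{
	uint64_t *p = (uint64_t *)(uintptr_t)txd;

	p[0] = qw0;
	p[1] = qw1;
}
```

Ordering against the tail-register write is still enforced separately
by the memory barrier before the doorbell, so dropping volatile on the
descriptor stores does not change correctness.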

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/net/intel/common/tx_scalar_fns.h | 24 ++++++++++++++++--------
 1 file changed, 16 insertions(+), 8 deletions(-)

diff --git a/drivers/net/intel/common/tx_scalar_fns.h b/drivers/net/intel/common/tx_scalar_fns.h
index bd8053f58c..ee93ce5811 100644
--- a/drivers/net/intel/common/tx_scalar_fns.h
+++ b/drivers/net/intel/common/tx_scalar_fns.h
@@ -179,6 +179,15 @@ struct ci_timesstamp_queue_fns {
 	write_ts_tail_t write_ts_tail;
 };
 
+static inline void
+write_txd(volatile void *txd, uint64_t qw0, uint64_t qw1)
+{
+	uint64_t *txd_qw = __rte_assume_aligned(RTE_CAST_PTR(void *, txd), 16);
+
+	txd_qw[0] = rte_cpu_to_le_64(qw0);
+	txd_qw[1] = rte_cpu_to_le_64(qw1);
+}
+
 static inline uint16_t
 ci_xmit_pkts(struct ci_tx_queue *txq,
 	     struct rte_mbuf **tx_pkts,
@@ -312,8 +321,7 @@ ci_xmit_pkts(struct ci_tx_queue *txq,
 				txe->mbuf = NULL;
 			}
 
-			ctx_txd[0] = cd_qw0;
-			ctx_txd[1] = cd_qw1;
+			write_txd(ctx_txd, cd_qw0, cd_qw1);
 
 			txe->last_id = tx_last;
 			tx_id = txe->next_id;
@@ -360,12 +368,12 @@ ci_xmit_pkts(struct ci_tx_queue *txq,
 
 			while ((ol_flags & (RTE_MBUF_F_TX_TCP_SEG | RTE_MBUF_F_TX_UDP_SEG)) &&
 					unlikely(slen > CI_MAX_DATA_PER_TXD)) {
-				txd->buffer_addr = rte_cpu_to_le_64(buf_dma_addr);
-				txd->cmd_type_offset_bsz = rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DATA |
+				const uint64_t cmd_type_offset_bsz = CI_TX_DESC_DTYPE_DATA |
 					((uint64_t)td_cmd << CI_TXD_QW1_CMD_S) |
 					((uint64_t)td_offset << CI_TXD_QW1_OFFSET_S) |
 					((uint64_t)CI_MAX_DATA_PER_TXD << CI_TXD_QW1_TX_BUF_SZ_S) |
-					((uint64_t)td_tag << CI_TXD_QW1_L2TAG1_S));
+					((uint64_t)td_tag << CI_TXD_QW1_L2TAG1_S);
+				write_txd(txd, buf_dma_addr, cmd_type_offset_bsz);
 
 				buf_dma_addr += CI_MAX_DATA_PER_TXD;
 				slen -= CI_MAX_DATA_PER_TXD;
@@ -381,12 +389,12 @@ ci_xmit_pkts(struct ci_tx_queue *txq,
 			if (m_seg->next == NULL)
 				td_cmd |= CI_TX_DESC_CMD_EOP;
 
-			txd->buffer_addr = rte_cpu_to_le_64(buf_dma_addr);
-			txd->cmd_type_offset_bsz = rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DATA |
+			const uint64_t cmd_type_offset_bsz = CI_TX_DESC_DTYPE_DATA |
 				((uint64_t)td_cmd << CI_TXD_QW1_CMD_S) |
 				((uint64_t)td_offset << CI_TXD_QW1_OFFSET_S) |
 				((uint64_t)slen << CI_TXD_QW1_TX_BUF_SZ_S) |
-				((uint64_t)td_tag << CI_TXD_QW1_L2TAG1_S));
+				((uint64_t)td_tag << CI_TXD_QW1_L2TAG1_S);
+			write_txd(txd, buf_dma_addr, cmd_type_offset_bsz);
 
 			txe->last_id = tx_last;
 			tx_id = txe->next_id;
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [PATCH v3 22/36] net/intel: remove unnecessary flag clearing
  2026-01-30 11:41 ` [PATCH v3 " Bruce Richardson
                     ` (20 preceding siblings ...)
  2026-01-30 11:41   ` [PATCH v3 21/36] net/intel: write descriptors using non-volatile pointers Bruce Richardson
@ 2026-01-30 11:41   ` Bruce Richardson
  2026-01-30 11:41   ` [PATCH v3 23/36] net/intel: mark mid-burst ring cleanup as unlikely Bruce Richardson
                     ` (13 subsequent siblings)
  35 siblings, 0 replies; 274+ messages in thread
From: Bruce Richardson @ 2026-01-30 11:41 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson

When cleaning the Tx ring, there is no need to zero out the done flag
from the completed entry. That flag will be automatically cleared when
the descriptor is next written. This gives a small performance benefit.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/net/intel/common/tx_scalar_fns.h | 7 -------
 1 file changed, 7 deletions(-)

diff --git a/drivers/net/intel/common/tx_scalar_fns.h b/drivers/net/intel/common/tx_scalar_fns.h
index ee93ce5811..90dc6ae423 100644
--- a/drivers/net/intel/common/tx_scalar_fns.h
+++ b/drivers/net/intel/common/tx_scalar_fns.h
@@ -51,13 +51,6 @@ ci_tx_xmit_cleanup(struct ci_tx_queue *txq)
 	else
 		nb_tx_to_clean = (uint16_t)(desc_to_clean_to - last_desc_cleaned);
 
-	/* The last descriptor to clean is done, so that means all the
-	 * descriptors from the last descriptor that was cleaned
-	 * up to the last descriptor with the RS bit set
-	 * are done. Only reset the threshold descriptor.
-	 */
-	txd[desc_to_clean_to].cmd_type_offset_bsz = 0;
-
 	/* Update the txq to reflect the last descriptor that was cleaned */
 	txq->last_desc_cleaned = desc_to_clean_to;
 	txq->nb_tx_free = (uint16_t)(txq->nb_tx_free + nb_tx_to_clean);
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [PATCH v3 23/36] net/intel: mark mid-burst ring cleanup as unlikely
  2026-01-30 11:41 ` [PATCH v3 " Bruce Richardson
                     ` (21 preceding siblings ...)
  2026-01-30 11:41   ` [PATCH v3 22/36] net/intel: remove unnecessary flag clearing Bruce Richardson
@ 2026-01-30 11:41   ` Bruce Richardson
  2026-01-30 11:41   ` [PATCH v3 24/36] net/intel: add special handling for single desc packets Bruce Richardson
                     ` (12 subsequent siblings)
  35 siblings, 0 replies; 274+ messages in thread
From: Bruce Richardson @ 2026-01-30 11:41 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson

It should rarely be the case that we need to clean up the descriptor
ring mid-burst, so mark that branch as unlikely to help performance.
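
For illustration, DPDK's unlikely() expands to the gcc/clang
__builtin_expect hint; my_unlikely and maybe_cleanup below are made-up
stand-ins showing the shape of the hinted branch:

```c
#include <assert.h>

/* Hint that the condition is expected to be false, so the compiler
 * lays out the cleanup path as the cold, out-of-line branch. */
#define my_unlikely(x) __builtin_expect(!!(x), 0)

static int cleanups;

static int maybe_cleanup(int nb_used, int nb_free)
{
	if (my_unlikely(nb_used > nb_free)) {
		cleanups++; /* rare mid-burst cleanup path */
		return 1;
	}
	return 0;
}
```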

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/net/intel/common/tx_scalar_fns.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/intel/common/tx_scalar_fns.h b/drivers/net/intel/common/tx_scalar_fns.h
index 90dc6ae423..488130c813 100644
--- a/drivers/net/intel/common/tx_scalar_fns.h
+++ b/drivers/net/intel/common/tx_scalar_fns.h
@@ -271,7 +271,7 @@ ci_xmit_pkts(struct ci_tx_queue *txq,
 		if (tx_last >= txq->nb_tx_desc)
 			tx_last = (uint16_t)(tx_last - txq->nb_tx_desc);
 
-		if (nb_used > txq->nb_tx_free) {
+		if (unlikely(nb_used > txq->nb_tx_free)) {
 			if (ci_tx_xmit_cleanup(txq) != 0) {
 				if (nb_tx == 0)
 					return 0;
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [PATCH v3 24/36] net/intel: add special handling for single desc packets
  2026-01-30 11:41 ` [PATCH v3 " Bruce Richardson
                     ` (22 preceding siblings ...)
  2026-01-30 11:41   ` [PATCH v3 23/36] net/intel: mark mid-burst ring cleanup as unlikely Bruce Richardson
@ 2026-01-30 11:41   ` Bruce Richardson
  2026-01-30 11:41   ` [PATCH v3 25/36] net/intel: use separate array for desc status tracking Bruce Richardson
                     ` (11 subsequent siblings)
  35 siblings, 0 replies; 274+ messages in thread
From: Bruce Richardson @ 2026-01-30 11:41 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson

Within the scalar Tx path, add a shortcut for packets that don't use
TSO and need only a single data descriptor.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/net/intel/common/tx_scalar_fns.h | 26 ++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

diff --git a/drivers/net/intel/common/tx_scalar_fns.h b/drivers/net/intel/common/tx_scalar_fns.h
index 488130c813..8c4086296e 100644
--- a/drivers/net/intel/common/tx_scalar_fns.h
+++ b/drivers/net/intel/common/tx_scalar_fns.h
@@ -303,6 +303,31 @@ ci_xmit_pkts(struct ci_tx_queue *txq,
 			ci_txd_enable_checksum(ol_flags, &td_cmd,
 						&td_offset, tx_offload);
 
+		/* special case for single descriptor packet, without TSO offload */
+		if (nb_used == 1 &&
+				(ol_flags & (RTE_MBUF_F_TX_TCP_SEG | RTE_MBUF_F_TX_UDP_SEG)) == 0) {
+			txd = &ci_tx_ring[tx_id];
+			tx_id = txe->next_id;
+
+			if (txe->mbuf)
+				rte_pktmbuf_free_seg(txe->mbuf);
+			*txe = (struct ci_tx_entry){
+				.mbuf = tx_pkt, .last_id = tx_last, .next_id = tx_id
+			};
+
+			/* Setup TX Descriptor */
+			td_cmd |= CI_TX_DESC_CMD_EOP;
+			const uint64_t cmd_type_offset_bsz = CI_TX_DESC_DTYPE_DATA |
+				((uint64_t)td_cmd << CI_TXD_QW1_CMD_S) |
+				((uint64_t)td_offset << CI_TXD_QW1_OFFSET_S) |
+				((uint64_t)tx_pkt->data_len << CI_TXD_QW1_TX_BUF_SZ_S) |
+				((uint64_t)td_tag << CI_TXD_QW1_L2TAG1_S);
+			write_txd(txd, rte_mbuf_data_iova(tx_pkt), cmd_type_offset_bsz);
+
+			txe = &sw_ring[tx_id];
+			goto end_pkt;
+		}
+
 		if (nb_ctx) {
 			/* Setup TX context descriptor if required */
 			uint64_t *ctx_txd = RTE_CAST_PTR(uint64_t *, &ci_tx_ring[tx_id]);
@@ -394,6 +419,7 @@ ci_xmit_pkts(struct ci_tx_queue *txq,
 			txe = txn;
 			m_seg = m_seg->next;
 		} while (m_seg);
+end_pkt:
 		txq->nb_tx_used = (uint16_t)(txq->nb_tx_used + nb_used);
 		txq->nb_tx_free = (uint16_t)(txq->nb_tx_free - nb_used);
 
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [PATCH v3 25/36] net/intel: use separate array for desc status tracking
  2026-01-30 11:41 ` [PATCH v3 " Bruce Richardson
                     ` (23 preceding siblings ...)
  2026-01-30 11:41   ` [PATCH v3 24/36] net/intel: add special handling for single desc packets Bruce Richardson
@ 2026-01-30 11:41   ` Bruce Richardson
  2026-01-30 11:41   ` [PATCH v3 26/36] net/ixgbe: " Bruce Richardson
                     ` (10 subsequent siblings)
  35 siblings, 0 replies; 274+ messages in thread
From: Bruce Richardson @ 2026-01-30 11:41 UTC (permalink / raw)
  To: dev
  Cc: Bruce Richardson, Praveen Shetty, Vladimir Medvedkin,
	Anatoly Burakov, Jingjing Wu

Rather than writing a last_id for each individual descriptor, we can
write one only at the points where the "report status" (RS) bit is
set, i.e. for the descriptors which will be written back when done.
The method used for marking which descriptors are free is also changed
in the process: even if the last descriptor with the "done" bits set
is past the expected point, we only track up to the expected point and
leave the rest to be counted as freed next time. This means that we
always have the RS/DD bits set at fixed intervals, and we always track
free slots in units of the same tx_rs_thresh intervals.
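
The bucket arithmetic can be sketched as follows, using tx_rs_thresh =
32 (so log2_rs_thresh = 5) and a 5-descriptor packet occupying slots
30-34; bucket() and crosses_bucket() are illustrative helpers, not the
driver code:

```c
#include <assert.h>
#include <stdint.h>

/* With tx_rs_thresh a power of two, the RS "bucket" of a ring slot is
 * its index shifted right by log2(tx_rs_thresh). */
static uint16_t bucket(uint16_t slot, uint8_t log2_rs_thresh)
{
	return slot >> log2_rs_thresh;
}

/* The RS bit is set on tx_last when the descriptor after the packet
 * falls in a different bucket from the packet's first descriptor. */
static int crosses_bucket(uint16_t first, uint16_t tx_last,
			  uint8_t log2_rs_thresh)
{
	return bucket(first, log2_rs_thresh) !=
	       bucket((uint16_t)(tx_last + 1), log2_rs_thresh);
}
```

When a crossing is detected, rs_last_id[] for the bucket being left
records tx_last, which is the descriptor the cleanup code later polls
for the done bits.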

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/net/intel/common/tx.h             |  4 ++
 drivers/net/intel/common/tx_scalar_fns.h  | 62 ++++++++++-------------
 drivers/net/intel/cpfl/cpfl_rxtx.c        | 17 +++++++
 drivers/net/intel/i40e/i40e_rxtx.c        | 20 ++++++++
 drivers/net/intel/iavf/iavf_rxtx.c        | 19 +++++++
 drivers/net/intel/ice/ice_rxtx.c          | 20 ++++++++
 drivers/net/intel/idpf/idpf_common_rxtx.c |  7 +++
 drivers/net/intel/idpf/idpf_rxtx.c        | 13 +++++
 8 files changed, 128 insertions(+), 34 deletions(-)

diff --git a/drivers/net/intel/common/tx.h b/drivers/net/intel/common/tx.h
index faa9bb9559..a1372b145f 100644
--- a/drivers/net/intel/common/tx.h
+++ b/drivers/net/intel/common/tx.h
@@ -127,6 +127,8 @@ struct ci_tx_queue {
 		struct ci_tx_entry *sw_ring; /* virtual address of SW ring */
 		struct ci_tx_entry_vec *sw_ring_vec;
 	};
+	/* Scalar TX path: Array tracking last_id at each RS threshold boundary */
+	uint16_t *rs_last_id;
 	uint16_t nb_tx_desc;           /* number of TX descriptors */
 	uint16_t tx_tail; /* current value of tail register */
 	uint16_t nb_tx_used; /* number of TX desc used since RS bit set */
@@ -140,6 +142,8 @@ struct ci_tx_queue {
 	uint16_t tx_free_thresh;
 	/* Number of TX descriptors to use before RS bit is set. */
 	uint16_t tx_rs_thresh;
+	/* Scalar TX path: log2 of tx_rs_thresh for efficient bit operations */
+	uint8_t log2_rs_thresh;
 	uint16_t port_id;  /* Device port identifier. */
 	uint16_t queue_id; /* TX queue index. */
 	uint16_t reg_idx;
diff --git a/drivers/net/intel/common/tx_scalar_fns.h b/drivers/net/intel/common/tx_scalar_fns.h
index 8c4086296e..f0f89e5e9e 100644
--- a/drivers/net/intel/common/tx_scalar_fns.h
+++ b/drivers/net/intel/common/tx_scalar_fns.h
@@ -23,37 +23,25 @@
 static __rte_always_inline int
 ci_tx_xmit_cleanup(struct ci_tx_queue *txq)
 {
-	struct ci_tx_entry *sw_ring = txq->sw_ring;
 	volatile struct ci_tx_desc *txd = txq->ci_tx_ring;
-	uint16_t last_desc_cleaned = txq->last_desc_cleaned;
-	uint16_t nb_tx_desc = txq->nb_tx_desc;
-	uint16_t desc_to_clean_to;
-	uint16_t nb_tx_to_clean;
+	const uint16_t last_desc_cleaned = txq->last_desc_cleaned;
+	const uint16_t nb_tx_desc = txq->nb_tx_desc;
 
-	/* Determine the last descriptor needing to be cleaned */
-	desc_to_clean_to = (uint16_t)(last_desc_cleaned + txq->tx_rs_thresh);
-	if (desc_to_clean_to >= nb_tx_desc)
-		desc_to_clean_to = (uint16_t)(desc_to_clean_to - nb_tx_desc);
-
-	/* Check to make sure the last descriptor to clean is done */
-	desc_to_clean_to = sw_ring[desc_to_clean_to].last_id;
+	const uint16_t rs_idx = (last_desc_cleaned == nb_tx_desc - 1) ?
+			0 :
+			(last_desc_cleaned + 1) >> txq->log2_rs_thresh;
+	uint16_t desc_to_clean_to = (rs_idx << txq->log2_rs_thresh) + (txq->tx_rs_thresh - 1);
 
 	/* Check if descriptor is done - all drivers use 0xF as done value in bits 3:0 */
-	if ((txd[desc_to_clean_to].cmd_type_offset_bsz & rte_cpu_to_le_64(CI_TXD_QW1_DTYPE_M)) !=
-			rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE)) {
+	if ((txd[txq->rs_last_id[rs_idx]].cmd_type_offset_bsz &
+			rte_cpu_to_le_64(CI_TXD_QW1_DTYPE_M)) !=
+				rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE))
 		/* Descriptor not yet processed by hardware */
 		return -1;
-	}
-
-	/* Figure out how many descriptors will be cleaned */
-	if (last_desc_cleaned > desc_to_clean_to)
-		nb_tx_to_clean = (uint16_t)((nb_tx_desc - last_desc_cleaned) + desc_to_clean_to);
-	else
-		nb_tx_to_clean = (uint16_t)(desc_to_clean_to - last_desc_cleaned);
 
 	/* Update the txq to reflect the last descriptor that was cleaned */
 	txq->last_desc_cleaned = desc_to_clean_to;
-	txq->nb_tx_free = (uint16_t)(txq->nb_tx_free + nb_tx_to_clean);
+	txq->nb_tx_free += txq->tx_rs_thresh;
 
 	return 0;
 }
@@ -228,6 +216,7 @@ ci_xmit_pkts(struct ci_tx_queue *txq,
 		uint16_t nb_ipsec = 0;
 		uint64_t ipsec_qw0 = 0, ipsec_qw1 = 0;
 		uint64_t cd_qw0, cd_qw1;
+		uint16_t pkt_rs_idx;
 		tx_pkt = *tx_pkts++;
 
 		ol_flags = tx_pkt->ol_flags;
@@ -271,6 +260,9 @@ ci_xmit_pkts(struct ci_tx_queue *txq,
 		if (tx_last >= txq->nb_tx_desc)
 			tx_last = (uint16_t)(tx_last - txq->nb_tx_desc);
 
+		/* Track the RS threshold bucket at packet start */
+		pkt_rs_idx = (uint16_t)(tx_id >> txq->log2_rs_thresh);
+
 		if (unlikely(nb_used > txq->nb_tx_free)) {
 			if (ci_tx_xmit_cleanup(txq) != 0) {
 				if (nb_tx == 0)
@@ -311,10 +303,7 @@ ci_xmit_pkts(struct ci_tx_queue *txq,
 
 			if (txe->mbuf)
 				rte_pktmbuf_free_seg(txe->mbuf);
-			*txe = (struct ci_tx_entry){
-				.mbuf = tx_pkt, .last_id = tx_last, .next_id = tx_id
-			};
-
+			txe->mbuf = tx_pkt;
 			/* Setup TX Descriptor */
 			td_cmd |= CI_TX_DESC_CMD_EOP;
 			const uint64_t cmd_type_offset_bsz = CI_TX_DESC_DTYPE_DATA |
@@ -341,7 +330,6 @@ ci_xmit_pkts(struct ci_tx_queue *txq,
 
 			write_txd(ctx_txd, cd_qw0, cd_qw1);
 
-			txe->last_id = tx_last;
 			tx_id = txe->next_id;
 			txe = txn;
 		}
@@ -360,7 +348,6 @@ ci_xmit_pkts(struct ci_tx_queue *txq,
 			ipsec_txd[0] = ipsec_qw0;
 			ipsec_txd[1] = ipsec_qw1;
 
-			txe->last_id = tx_last;
 			tx_id = txe->next_id;
 			txe = txn;
 		}
@@ -396,7 +383,6 @@ ci_xmit_pkts(struct ci_tx_queue *txq,
 				buf_dma_addr += CI_MAX_DATA_PER_TXD;
 				slen -= CI_MAX_DATA_PER_TXD;
 
-				txe->last_id = tx_last;
 				tx_id = txe->next_id;
 				txe = txn;
 				txd = &ci_tx_ring[tx_id];
@@ -414,7 +400,6 @@ ci_xmit_pkts(struct ci_tx_queue *txq,
 				((uint64_t)td_tag << CI_TXD_QW1_L2TAG1_S);
 			write_txd(txd, buf_dma_addr, cmd_type_offset_bsz);
 
-			txe->last_id = tx_last;
 			tx_id = txe->next_id;
 			txe = txn;
 			m_seg = m_seg->next;
@@ -423,13 +408,22 @@ ci_xmit_pkts(struct ci_tx_queue *txq,
 		txq->nb_tx_used = (uint16_t)(txq->nb_tx_used + nb_used);
 		txq->nb_tx_free = (uint16_t)(txq->nb_tx_free - nb_used);
 
-		/* set RS bit on the last descriptor of one packet */
-		if (txq->nb_tx_used >= txq->tx_rs_thresh) {
+		/* Check if packet crosses into a new RS threshold bucket.
+		 * The RS bit is set on the last descriptor when we move from one bucket to another.
+		 * For example, with tx_rs_thresh=32 and a 5-descriptor packet using slots 30-34:
+		 *   - pkt_rs_idx = 30 >> 5 = 0 (started in bucket 0)
+		 *   - tx_last = 34, so 35 >> 5 = 1 (next packet is in bucket 1)
+		 *   - Since 0 != 1, set RS bit on descriptor 34, and record rs_last_id[0] = 34
+		 */
+		uint16_t next_rs_idx = ((tx_last + 1) >> txq->log2_rs_thresh);
+
+		if (next_rs_idx != pkt_rs_idx) {
+			/* Packet crossed into a new bucket - set RS bit on last descriptor */
 			txd->cmd_type_offset_bsz |=
 					rte_cpu_to_le_64(CI_TX_DESC_CMD_RS << CI_TXD_QW1_CMD_S);
 
-			/* Update txq RS bit counters */
-			txq->nb_tx_used = 0;
+			/* Record the last descriptor ID for the bucket we're leaving */
+			txq->rs_last_id[pkt_rs_idx] = tx_last;
 		}
 
 		if (ts_fns != NULL)
diff --git a/drivers/net/intel/cpfl/cpfl_rxtx.c b/drivers/net/intel/cpfl/cpfl_rxtx.c
index bc5bec65f0..e7a98ed4f6 100644
--- a/drivers/net/intel/cpfl/cpfl_rxtx.c
+++ b/drivers/net/intel/cpfl/cpfl_rxtx.c
@@ -5,6 +5,7 @@
 #include <ethdev_driver.h>
 #include <rte_net.h>
 #include <rte_vect.h>
+#include <rte_bitops.h>
 
 #include "cpfl_ethdev.h"
 #include "cpfl_rxtx.h"
@@ -330,6 +331,7 @@ cpfl_tx_queue_release(void *txq)
 
 	ci_txq_release_all_mbufs(q, q->vector_tx);
 	rte_free(q->sw_ring);
+	rte_free(q->rs_last_id);
 	rte_memzone_free(q->mz);
 	rte_free(cpfl_txq);
 }
@@ -572,6 +574,7 @@ cpfl_tx_queue_setup(struct rte_eth_dev *dev, uint16_t queue_idx,
 
 	txq->nb_tx_desc = nb_desc;
 	txq->tx_rs_thresh = tx_rs_thresh;
+	txq->log2_rs_thresh = rte_log2_u32(tx_rs_thresh);
 	txq->tx_free_thresh = tx_free_thresh;
 	txq->queue_id = vport->chunks_info.tx_start_qid + queue_idx;
 	txq->port_id = dev->data->port_id;
@@ -605,6 +608,17 @@ cpfl_tx_queue_setup(struct rte_eth_dev *dev, uint16_t queue_idx,
 		goto err_sw_ring_alloc;
 	}
 
+	/* Allocate RS last_id tracking array */
+	uint16_t num_rs_buckets = nb_desc / tx_rs_thresh;
+	txq->rs_last_id = rte_zmalloc_socket("cpfl tx rs_last_id",
+			sizeof(txq->rs_last_id[0]) * num_rs_buckets,
+			RTE_CACHE_LINE_SIZE, socket_id);
+	if (txq->rs_last_id == NULL) {
+		PMD_INIT_LOG(ERR, "Failed to allocate memory for RS last_id array");
+		ret = -ENOMEM;
+		goto err_rs_last_id_alloc;
+	}
+
 	if (!is_splitq) {
 		txq->ci_tx_ring = mz->addr;
 		idpf_qc_single_tx_queue_reset(txq);
@@ -628,6 +642,9 @@ cpfl_tx_queue_setup(struct rte_eth_dev *dev, uint16_t queue_idx,
 	return 0;
 
 err_complq_setup:
+	rte_free(txq->rs_last_id);
+err_rs_last_id_alloc:
+	rte_free(txq->sw_ring);
 err_sw_ring_alloc:
 	cpfl_dma_zone_release(mz);
 err_mz_reserve:
diff --git a/drivers/net/intel/i40e/i40e_rxtx.c b/drivers/net/intel/i40e/i40e_rxtx.c
index 49ec4dd8a1..4fcc5c2f54 100644
--- a/drivers/net/intel/i40e/i40e_rxtx.c
+++ b/drivers/net/intel/i40e/i40e_rxtx.c
@@ -24,6 +24,7 @@
 #include <rte_ip.h>
 #include <rte_net.h>
 #include <rte_vect.h>
+#include <rte_bitops.h>
 
 #include "i40e_logs.h"
 #include "base/i40e_prototype.h"
@@ -2280,6 +2281,13 @@ i40e_dev_tx_queue_setup(struct rte_eth_dev *dev,
 			     (int)queue_idx);
 		return I40E_ERR_PARAM;
 	}
+	if (!rte_is_power_of_2(tx_rs_thresh)) {
+		PMD_INIT_LOG(ERR, "tx_rs_thresh must be a power of 2. (tx_rs_thresh=%u port=%d queue=%d)",
+			     (unsigned int)tx_rs_thresh,
+			     (int)dev->data->port_id,
+			     (int)queue_idx);
+		return I40E_ERR_PARAM;
+	}
 	if ((tx_rs_thresh > 1) && (tx_conf->tx_thresh.wthresh != 0)) {
 		PMD_INIT_LOG(ERR, "TX WTHRESH must be set to 0 if "
 			     "tx_rs_thresh is greater than 1. "
@@ -2321,6 +2329,7 @@ i40e_dev_tx_queue_setup(struct rte_eth_dev *dev,
 	txq->mz = tz;
 	txq->nb_tx_desc = nb_desc;
 	txq->tx_rs_thresh = tx_rs_thresh;
+	txq->log2_rs_thresh = rte_log2_u32(tx_rs_thresh);
 	txq->tx_free_thresh = tx_free_thresh;
 	txq->queue_id = queue_idx;
 	txq->reg_idx = reg_idx;
@@ -2346,6 +2355,16 @@ i40e_dev_tx_queue_setup(struct rte_eth_dev *dev,
 		return -ENOMEM;
 	}
 
+	/* Allocate RS last_id tracking array */
+	uint16_t num_rs_buckets = nb_desc / tx_rs_thresh;
+	txq->rs_last_id = rte_zmalloc_socket(NULL, sizeof(txq->rs_last_id[0]) * num_rs_buckets,
+			RTE_CACHE_LINE_SIZE, socket_id);
+	if (txq->rs_last_id == NULL) {
+		i40e_tx_queue_release(txq);
+		PMD_DRV_LOG(ERR, "Failed to allocate memory for RS last_id array");
+		return -ENOMEM;
+	}
+
 	i40e_reset_tx_queue(txq);
 	txq->q_set = TRUE;
 
@@ -2391,6 +2410,7 @@ i40e_tx_queue_release(void *txq)
 
 	ci_txq_release_all_mbufs(q, false);
 	rte_free(q->sw_ring);
+	rte_free(q->rs_last_id);
 	rte_memzone_free(q->mz);
 	rte_free(q);
 }
diff --git a/drivers/net/intel/iavf/iavf_rxtx.c b/drivers/net/intel/iavf/iavf_rxtx.c
index 2ea00e1975..e7187f713d 100644
--- a/drivers/net/intel/iavf/iavf_rxtx.c
+++ b/drivers/net/intel/iavf/iavf_rxtx.c
@@ -25,6 +25,7 @@
 #include <rte_ip.h>
 #include <rte_net.h>
 #include <rte_vect.h>
+#include <rte_bitops.h>
 #include <rte_vxlan.h>
 #include <rte_gtp.h>
 #include <rte_geneve.h>
@@ -194,6 +195,11 @@ check_tx_thresh(uint16_t nb_desc, uint16_t tx_rs_thresh,
 			     tx_rs_thresh, nb_desc);
 		return -EINVAL;
 	}
+	if (!rte_is_power_of_2(tx_rs_thresh)) {
+		PMD_INIT_LOG(ERR, "tx_rs_thresh must be a power of 2. (tx_rs_thresh=%u)",
+			     tx_rs_thresh);
+		return -EINVAL;
+	}
 
 	return 0;
 }
@@ -801,6 +807,7 @@ iavf_dev_tx_queue_setup(struct rte_eth_dev *dev,
 
 	txq->nb_tx_desc = nb_desc;
 	txq->tx_rs_thresh = tx_rs_thresh;
+	txq->log2_rs_thresh = rte_log2_u32(tx_rs_thresh);
 	txq->tx_free_thresh = tx_free_thresh;
 	txq->queue_id = queue_idx;
 	txq->port_id = dev->data->port_id;
@@ -826,6 +833,17 @@ iavf_dev_tx_queue_setup(struct rte_eth_dev *dev,
 		return -ENOMEM;
 	}
 
+	/* Allocate RS last_id tracking array */
+	uint16_t num_rs_buckets = nb_desc / tx_rs_thresh;
+	txq->rs_last_id = rte_zmalloc_socket(NULL, sizeof(txq->rs_last_id[0]) * num_rs_buckets,
+			RTE_CACHE_LINE_SIZE, socket_id);
+	if (txq->rs_last_id == NULL) {
+		PMD_INIT_LOG(ERR, "Failed to allocate memory for RS last_id array");
+		rte_free(txq->sw_ring);
+		rte_free(txq);
+		return -ENOMEM;
+	}
+
 	/* Allocate TX hardware ring descriptors. */
 	ring_size = sizeof(struct ci_tx_desc) * IAVF_MAX_RING_DESC;
 	ring_size = RTE_ALIGN(ring_size, IAVF_DMA_MEM_ALIGN);
@@ -1050,6 +1068,7 @@ iavf_dev_tx_queue_release(struct rte_eth_dev *dev, uint16_t qid)
 
 	ci_txq_release_all_mbufs(q, q->use_ctx);
 	rte_free(q->sw_ring);
+	rte_free(q->rs_last_id);
 	rte_memzone_free(q->mz);
 	rte_free(q);
 }
diff --git a/drivers/net/intel/ice/ice_rxtx.c b/drivers/net/intel/ice/ice_rxtx.c
index 79b021a58f..32cb938c84 100644
--- a/drivers/net/intel/ice/ice_rxtx.c
+++ b/drivers/net/intel/ice/ice_rxtx.c
@@ -5,6 +5,7 @@
 #include <ethdev_driver.h>
 #include <rte_net.h>
 #include <rte_vect.h>
+#include <rte_bitops.h>
 
 #include "ice_rxtx.h"
 #include "ice_rxtx_vec_common.h"
@@ -1589,6 +1590,13 @@ ice_tx_queue_setup(struct rte_eth_dev *dev,
 			     (int)queue_idx);
 		return -EINVAL;
 	}
+	if (!rte_is_power_of_2(tx_rs_thresh)) {
+		PMD_INIT_LOG(ERR, "tx_rs_thresh must be a power of 2. (tx_rs_thresh=%u port=%d queue=%d)",
+			     (unsigned int)tx_rs_thresh,
+			     (int)dev->data->port_id,
+			     (int)queue_idx);
+		return -EINVAL;
+	}
 	if (tx_rs_thresh > 1 && tx_conf->tx_thresh.wthresh != 0) {
 		PMD_INIT_LOG(ERR, "TX WTHRESH must be set to 0 if "
 			     "tx_rs_thresh is greater than 1. "
@@ -1631,6 +1639,7 @@ ice_tx_queue_setup(struct rte_eth_dev *dev,
 	txq->mz = tz;
 	txq->nb_tx_desc = nb_desc;
 	txq->tx_rs_thresh = tx_rs_thresh;
+	txq->log2_rs_thresh = rte_log2_u32(tx_rs_thresh);
 	txq->tx_free_thresh = tx_free_thresh;
 	txq->queue_id = queue_idx;
 
@@ -1657,6 +1666,16 @@ ice_tx_queue_setup(struct rte_eth_dev *dev,
 		return -ENOMEM;
 	}
 
+	/* Allocate RS last_id tracking array */
+	uint16_t num_rs_buckets = nb_desc / tx_rs_thresh;
+	txq->rs_last_id = rte_zmalloc_socket(NULL, sizeof(txq->rs_last_id[0]) * num_rs_buckets,
+			RTE_CACHE_LINE_SIZE, socket_id);
+	if (txq->rs_last_id == NULL) {
+		ice_tx_queue_release(txq);
+		PMD_INIT_LOG(ERR, "Failed to allocate memory for RS last_id array");
+		return -ENOMEM;
+	}
+
 	if (vsi->type == ICE_VSI_PF && (offloads & RTE_ETH_TX_OFFLOAD_SEND_ON_TIMESTAMP)) {
 		if (hw->phy_model != ICE_PHY_E830) {
 			ice_tx_queue_release(txq);
@@ -1729,6 +1748,7 @@ ice_tx_queue_release(void *txq)
 
 	ci_txq_release_all_mbufs(q, false);
 	rte_free(q->sw_ring);
+	rte_free(q->rs_last_id);
 	if (q->tsq) {
 		rte_memzone_free(q->tsq->ts_mz);
 		rte_free(q->tsq);
diff --git a/drivers/net/intel/idpf/idpf_common_rxtx.c b/drivers/net/intel/idpf/idpf_common_rxtx.c
index bca5f13c8e..8859bcca86 100644
--- a/drivers/net/intel/idpf/idpf_common_rxtx.c
+++ b/drivers/net/intel/idpf/idpf_common_rxtx.c
@@ -5,6 +5,7 @@
 #include <eal_export.h>
 #include <rte_mbuf_dyn.h>
 #include <rte_errno.h>
+#include <rte_bitops.h>
 
 #include "idpf_common_rxtx.h"
 #include "idpf_common_device.h"
@@ -73,6 +74,11 @@ idpf_qc_tx_thresh_check(uint16_t nb_desc, uint16_t tx_rs_thresh,
 			tx_rs_thresh, nb_desc);
 		return -EINVAL;
 	}
+	if (!rte_is_power_of_2(tx_rs_thresh)) {
+		DRV_LOG(ERR, "tx_rs_thresh must be a power of 2. (tx_rs_thresh=%u)",
+			tx_rs_thresh);
+		return -EINVAL;
+	}
 
 	return 0;
 }
@@ -333,6 +339,7 @@ idpf_qc_tx_queue_release(void *txq)
 	}
 
 	ci_txq_release_all_mbufs(q, false);
+	rte_free(q->rs_last_id);
 	rte_free(q->sw_ring);
 	rte_memzone_free(q->mz);
 	rte_free(q);
diff --git a/drivers/net/intel/idpf/idpf_rxtx.c b/drivers/net/intel/idpf/idpf_rxtx.c
index 0de54d9305..9420200f6d 100644
--- a/drivers/net/intel/idpf/idpf_rxtx.c
+++ b/drivers/net/intel/idpf/idpf_rxtx.c
@@ -447,6 +447,7 @@ idpf_tx_queue_setup(struct rte_eth_dev *dev, uint16_t queue_idx,
 
 	txq->nb_tx_desc = nb_desc;
 	txq->tx_rs_thresh = tx_rs_thresh;
+	txq->log2_rs_thresh = rte_log2_u32(tx_rs_thresh);
 	txq->tx_free_thresh = tx_free_thresh;
 	txq->queue_id = vport->chunks_info.tx_start_qid + queue_idx;
 	txq->port_id = dev->data->port_id;
@@ -480,6 +481,15 @@ idpf_tx_queue_setup(struct rte_eth_dev *dev, uint16_t queue_idx,
 		goto err_sw_ring_alloc;
 	}
 
+	txq->rs_last_id = rte_zmalloc_socket("idpf tx rs_last_id",
+			sizeof(txq->rs_last_id[0]) * (nb_desc >> txq->log2_rs_thresh),
+			RTE_CACHE_LINE_SIZE, socket_id);
+	if (txq->rs_last_id == NULL) {
+		PMD_INIT_LOG(ERR, "Failed to allocate memory for TX RS tracking");
+		ret = -ENOMEM;
+		goto err_rs_last_id_alloc;
+	}
+
 	if (!is_splitq) {
 		txq->ci_tx_ring = mz->addr;
 		idpf_qc_single_tx_queue_reset(txq);
@@ -502,6 +512,9 @@ idpf_tx_queue_setup(struct rte_eth_dev *dev, uint16_t queue_idx,
 	return 0;
 
 err_complq_setup:
+	rte_free(txq->rs_last_id);
+err_rs_last_id_alloc:
+	rte_free(txq->sw_ring);
 err_sw_ring_alloc:
 	idpf_dma_zone_release(mz);
 err_mz_reserve:
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [PATCH v3 26/36] net/ixgbe: use separate array for desc status tracking
  2026-01-30 11:41 ` [PATCH v3 " Bruce Richardson
                     ` (24 preceding siblings ...)
  2026-01-30 11:41   ` [PATCH v3 25/36] net/intel: use separate array for desc status tracking Bruce Richardson
@ 2026-01-30 11:41   ` Bruce Richardson
  2026-01-30 11:41   ` [PATCH v3 27/36] net/intel: drop unused Tx queue used count Bruce Richardson
                     ` (9 subsequent siblings)
  35 siblings, 0 replies; 274+ messages in thread
From: Bruce Richardson @ 2026-01-30 11:41 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson, Anatoly Burakov, Vladimir Medvedkin

Due to significant differences in the ixgbe transmit descriptors, the
ixgbe driver does not use the common scalar Tx functionality. Update the
driver directly so its use of the rs_last_id array matches that of the
common Tx code.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/net/intel/ixgbe/ixgbe_rxtx.c | 86 +++++++++++++++-------------
 1 file changed, 47 insertions(+), 39 deletions(-)

diff --git a/drivers/net/intel/ixgbe/ixgbe_rxtx.c b/drivers/net/intel/ixgbe/ixgbe_rxtx.c
index 0af04c9b0d..3e37ccc50d 100644
--- a/drivers/net/intel/ixgbe/ixgbe_rxtx.c
+++ b/drivers/net/intel/ixgbe/ixgbe_rxtx.c
@@ -43,6 +43,7 @@
 #include <rte_ip.h>
 #include <rte_net.h>
 #include <rte_vect.h>
+#include <rte_bitops.h>
 
 #include "ixgbe_logs.h"
 #include "base/ixgbe_api.h"
@@ -571,57 +572,35 @@ tx_desc_ol_flags_to_cmdtype(uint64_t ol_flags)
 static inline int
 ixgbe_xmit_cleanup(struct ci_tx_queue *txq)
 {
-	struct ci_tx_entry *sw_ring = txq->sw_ring;
 	volatile union ixgbe_adv_tx_desc *txr = txq->ixgbe_tx_ring;
-	uint16_t last_desc_cleaned = txq->last_desc_cleaned;
-	uint16_t nb_tx_desc = txq->nb_tx_desc;
-	uint16_t desc_to_clean_to;
-	uint16_t nb_tx_to_clean;
-	uint32_t status;
+	const uint16_t last_desc_cleaned = txq->last_desc_cleaned;
+	const uint16_t nb_tx_desc = txq->nb_tx_desc;
 
-	/* Determine the last descriptor needing to be cleaned */
-	desc_to_clean_to = (uint16_t)(last_desc_cleaned + txq->tx_rs_thresh);
-	if (desc_to_clean_to >= nb_tx_desc)
-		desc_to_clean_to = (uint16_t)(desc_to_clean_to - nb_tx_desc);
+	const uint16_t rs_idx = (last_desc_cleaned == nb_tx_desc - 1) ?
+			0 :
+			(last_desc_cleaned + 1) >> txq->log2_rs_thresh;
+	uint16_t desc_to_clean_to = (rs_idx << txq->log2_rs_thresh) + (txq->tx_rs_thresh - 1);
 
-	/* Check to make sure the last descriptor to clean is done */
-	desc_to_clean_to = sw_ring[desc_to_clean_to].last_id;
-	status = txr[desc_to_clean_to].wb.status;
+	uint32_t status = txr[txq->rs_last_id[rs_idx]].wb.status;
 	if (!(status & rte_cpu_to_le_32(IXGBE_TXD_STAT_DD))) {
 		PMD_TX_LOG(DEBUG,
 			   "TX descriptor %4u is not done"
 			   "(port=%d queue=%d)",
-			   desc_to_clean_to,
+			   txq->rs_last_id[rs_idx],
 			   txq->port_id, txq->queue_id);
 		/* Failed to clean any descriptors, better luck next time */
 		return -(1);
 	}
 
-	/* Figure out how many descriptors will be cleaned */
-	if (last_desc_cleaned > desc_to_clean_to)
-		nb_tx_to_clean = (uint16_t)((nb_tx_desc - last_desc_cleaned) +
-							desc_to_clean_to);
-	else
-		nb_tx_to_clean = (uint16_t)(desc_to_clean_to -
-						last_desc_cleaned);
-
 	PMD_TX_LOG(DEBUG,
 		   "Cleaning %4u TX descriptors: %4u to %4u "
 		   "(port=%d queue=%d)",
-		   nb_tx_to_clean, last_desc_cleaned, desc_to_clean_to,
+		   txq->tx_rs_thresh, last_desc_cleaned, desc_to_clean_to,
 		   txq->port_id, txq->queue_id);
 
-	/*
-	 * The last descriptor to clean is done, so that means all the
-	 * descriptors from the last descriptor that was cleaned
-	 * up to the last descriptor with the RS bit set
-	 * are done. Only reset the threshold descriptor.
-	 */
-	txr[desc_to_clean_to].wb.status = 0;
-
 	/* Update the txq to reflect the last descriptor that was cleaned */
 	txq->last_desc_cleaned = desc_to_clean_to;
-	txq->nb_tx_free = (uint16_t)(txq->nb_tx_free + nb_tx_to_clean);
+	txq->nb_tx_free = (uint16_t)(txq->nb_tx_free + txq->tx_rs_thresh);
 
 	/* No Error */
 	return 0;
@@ -749,6 +728,9 @@ ixgbe_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 		if (tx_last >= txq->nb_tx_desc)
 			tx_last = (uint16_t) (tx_last - txq->nb_tx_desc);
 
+		/* Track the RS threshold bucket at packet start */
+		uint16_t pkt_rs_idx = (uint16_t)(tx_id >> txq->log2_rs_thresh);
+
 		PMD_TX_LOG(DEBUG, "port_id=%u queue_id=%u pktlen=%u"
 			   " tx_first=%u tx_last=%u",
 			   (unsigned) txq->port_id,
@@ -876,7 +858,6 @@ ixgbe_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 					tx_offload,
 					rte_security_dynfield(tx_pkt));
 
-				txe->last_id = tx_last;
 				tx_id = txe->next_id;
 				txe = txn;
 			}
@@ -922,7 +903,6 @@ ixgbe_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 				rte_cpu_to_le_32(cmd_type_len | slen);
 			txd->read.olinfo_status =
 				rte_cpu_to_le_32(olinfo_status);
-			txe->last_id = tx_last;
 			tx_id = txe->next_id;
 			txe = txn;
 			m_seg = m_seg->next;
@@ -935,8 +915,18 @@ ixgbe_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 		txq->nb_tx_used = (uint16_t)(txq->nb_tx_used + nb_used);
 		txq->nb_tx_free = (uint16_t)(txq->nb_tx_free - nb_used);
 
-		/* Set RS bit only on threshold packets' last descriptor */
-		if (txq->nb_tx_used >= txq->tx_rs_thresh) {
+		/*
+		 * Check if packet crosses into a new RS threshold bucket.
+		 * The RS bit is set on the last descriptor when we move from one bucket to another.
+		 * For example, with tx_rs_thresh=32 and a 5-descriptor packet using slots 30-34:
+		 *   - pkt_rs_idx = 30 >> 5 = 0 (started in bucket 0)
+		 *   - tx_last = 34, so 35 >> 5 = 1 (next packet is in bucket 1)
+		 *   - Since 0 != 1, set RS bit on descriptor 34, and record rs_last_id[0] = 34
+		 */
+		uint16_t next_rs_idx = ((tx_last + 1) >> txq->log2_rs_thresh);
+
+		if (next_rs_idx != pkt_rs_idx) {
+			/* Packet crossed into a new bucket - set RS bit on last descriptor */
 			PMD_TX_LOG(DEBUG,
 				   "Setting RS bit on TXD id="
 				   "%4u (port=%d queue=%d)",
@@ -944,9 +934,8 @@ ixgbe_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 
 			cmd_type_len |= IXGBE_TXD_CMD_RS;
 
-			/* Update txq RS bit counters */
-			txq->nb_tx_used = 0;
-			txp = NULL;
+			/* Record the last descriptor ID for the bucket we're leaving */
+			txq->rs_last_id[pkt_rs_idx] = tx_last;
 		} else
 			txp = txd;
 
@@ -2521,6 +2510,7 @@ ixgbe_tx_queue_release(struct ci_tx_queue *txq)
 	if (txq != NULL && txq->ops != NULL) {
 		ci_txq_release_all_mbufs(txq, false);
 		txq->ops->free_swring(txq);
+		rte_free(txq->rs_last_id);
 		rte_memzone_free(txq->mz);
 		rte_free(txq);
 	}
@@ -2825,6 +2815,13 @@ ixgbe_dev_tx_queue_setup(struct rte_eth_dev *dev,
 			     (int)dev->data->port_id, (int)queue_idx);
 		return -(EINVAL);
 	}
+	if (!rte_is_power_of_2(tx_rs_thresh)) {
+		PMD_INIT_LOG(ERR, "tx_rs_thresh must be a power of 2. (tx_rs_thresh=%u port=%d queue=%d)",
+			     (unsigned int)tx_rs_thresh,
+			     (int)dev->data->port_id,
+			     (int)queue_idx);
+		return -(EINVAL);
+	}
 
 	/*
 	 * If rs_bit_thresh is greater than 1, then TX WTHRESH should be
@@ -2870,6 +2867,7 @@ ixgbe_dev_tx_queue_setup(struct rte_eth_dev *dev,
 	txq->mz = tz;
 	txq->nb_tx_desc = nb_desc;
 	txq->tx_rs_thresh = tx_rs_thresh;
+	txq->log2_rs_thresh = rte_log2_u32(tx_rs_thresh);
 	txq->tx_free_thresh = tx_free_thresh;
 	txq->pthresh = tx_conf->tx_thresh.pthresh;
 	txq->hthresh = tx_conf->tx_thresh.hthresh;
@@ -2913,6 +2911,16 @@ ixgbe_dev_tx_queue_setup(struct rte_eth_dev *dev,
 	PMD_INIT_LOG(DEBUG, "sw_ring=%p hw_ring=%p dma_addr=0x%"PRIx64,
 		     txq->sw_ring, txq->ixgbe_tx_ring, txq->tx_ring_dma);
 
+	/* Allocate RS last_id tracking array */
+	uint16_t num_rs_buckets = nb_desc / tx_rs_thresh;
+	txq->rs_last_id = rte_zmalloc_socket(NULL, sizeof(txq->rs_last_id[0]) * num_rs_buckets,
+			RTE_CACHE_LINE_SIZE, socket_id);
+	if (txq->rs_last_id == NULL) {
+		ixgbe_tx_queue_release(txq);
+		PMD_DRV_LOG(ERR, "Failed to allocate memory for RS last_id array");
+		return -ENOMEM;
+	}
+
 	/* set up vector or scalar TX function as appropriate */
 	ixgbe_set_tx_function(dev, txq);
 
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [PATCH v3 27/36] net/intel: drop unused Tx queue used count
  2026-01-30 11:41 ` [PATCH v3 " Bruce Richardson
                     ` (25 preceding siblings ...)
  2026-01-30 11:41   ` [PATCH v3 26/36] net/ixgbe: " Bruce Richardson
@ 2026-01-30 11:41   ` Bruce Richardson
  2026-01-30 11:41   ` [PATCH v3 28/36] net/intel: remove index for tracking end of packet Bruce Richardson
                     ` (8 subsequent siblings)
  35 siblings, 0 replies; 274+ messages in thread
From: Bruce Richardson @ 2026-01-30 11:41 UTC (permalink / raw)
  To: dev
  Cc: Bruce Richardson, Vladimir Medvedkin, Anatoly Burakov,
	Jingjing Wu, Praveen Shetty

Since drivers now set the RS bit at fixed threshold boundaries rather
than after a fixed number of descriptors, we no longer need to track the
number of descriptors used from one call to the next. Therefore we can
remove the nb_tx_used value from the Tx queue structure.

This value was still being used inside the IDPF splitq scalar code;
however, the idpf driver-specific section of the Tx queue structure also
had an rs_compl_count value that was only used by the vector code paths,
so we can use it to replace the old nb_tx_used value in the scalar path.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/net/intel/common/tx.h                   | 1 -
 drivers/net/intel/common/tx_scalar_fns.h        | 1 -
 drivers/net/intel/i40e/i40e_rxtx.c              | 1 -
 drivers/net/intel/iavf/iavf_rxtx.c              | 1 -
 drivers/net/intel/ice/ice_dcf_ethdev.c          | 1 -
 drivers/net/intel/ice/ice_rxtx.c                | 1 -
 drivers/net/intel/idpf/idpf_common_rxtx.c       | 8 +++-----
 drivers/net/intel/ixgbe/ixgbe_rxtx.c            | 8 --------
 drivers/net/intel/ixgbe/ixgbe_rxtx_vec_common.c | 1 -
 9 files changed, 3 insertions(+), 20 deletions(-)

diff --git a/drivers/net/intel/common/tx.h b/drivers/net/intel/common/tx.h
index a1372b145f..0d11fcc142 100644
--- a/drivers/net/intel/common/tx.h
+++ b/drivers/net/intel/common/tx.h
@@ -131,7 +131,6 @@ struct ci_tx_queue {
 	uint16_t *rs_last_id;
 	uint16_t nb_tx_desc;           /* number of TX descriptors */
 	uint16_t tx_tail; /* current value of tail register */
-	uint16_t nb_tx_used; /* number of TX desc used since RS bit set */
 	/* index to last TX descriptor to have been cleaned */
 	uint16_t last_desc_cleaned;
 	/* Total number of TX descriptors ready to be allocated. */
diff --git a/drivers/net/intel/common/tx_scalar_fns.h b/drivers/net/intel/common/tx_scalar_fns.h
index f0f89e5e9e..f82331101a 100644
--- a/drivers/net/intel/common/tx_scalar_fns.h
+++ b/drivers/net/intel/common/tx_scalar_fns.h
@@ -405,7 +405,6 @@ ci_xmit_pkts(struct ci_tx_queue *txq,
 			m_seg = m_seg->next;
 		} while (m_seg);
 end_pkt:
-		txq->nb_tx_used = (uint16_t)(txq->nb_tx_used + nb_used);
 		txq->nb_tx_free = (uint16_t)(txq->nb_tx_free - nb_used);
 
 		/* Check if packet crosses into a new RS threshold bucket.
diff --git a/drivers/net/intel/i40e/i40e_rxtx.c b/drivers/net/intel/i40e/i40e_rxtx.c
index 4fcc5c2f54..33a0ae332a 100644
--- a/drivers/net/intel/i40e/i40e_rxtx.c
+++ b/drivers/net/intel/i40e/i40e_rxtx.c
@@ -2645,7 +2645,6 @@ i40e_reset_tx_queue(struct ci_tx_queue *txq)
 	txq->tx_next_rs = (uint16_t)(txq->tx_rs_thresh - 1);
 
 	txq->tx_tail = 0;
-	txq->nb_tx_used = 0;
 
 	txq->last_desc_cleaned = (uint16_t)(txq->nb_tx_desc - 1);
 	txq->nb_tx_free = (uint16_t)(txq->nb_tx_desc - 1);
diff --git a/drivers/net/intel/iavf/iavf_rxtx.c b/drivers/net/intel/iavf/iavf_rxtx.c
index e7187f713d..3fcb8d7b79 100644
--- a/drivers/net/intel/iavf/iavf_rxtx.c
+++ b/drivers/net/intel/iavf/iavf_rxtx.c
@@ -288,7 +288,6 @@ reset_tx_queue(struct ci_tx_queue *txq)
 	}
 
 	txq->tx_tail = 0;
-	txq->nb_tx_used = 0;
 
 	txq->last_desc_cleaned = txq->nb_tx_desc - 1;
 	txq->nb_tx_free = txq->nb_tx_desc - 1;
diff --git a/drivers/net/intel/ice/ice_dcf_ethdev.c b/drivers/net/intel/ice/ice_dcf_ethdev.c
index 4ceecc15c6..02a23629d6 100644
--- a/drivers/net/intel/ice/ice_dcf_ethdev.c
+++ b/drivers/net/intel/ice/ice_dcf_ethdev.c
@@ -414,7 +414,6 @@ reset_tx_queue(struct ci_tx_queue *txq)
 	}
 
 	txq->tx_tail = 0;
-	txq->nb_tx_used = 0;
 
 	txq->last_desc_cleaned = txq->nb_tx_desc - 1;
 	txq->nb_tx_free = txq->nb_tx_desc - 1;
diff --git a/drivers/net/intel/ice/ice_rxtx.c b/drivers/net/intel/ice/ice_rxtx.c
index 32cb938c84..14c4683ad7 100644
--- a/drivers/net/intel/ice/ice_rxtx.c
+++ b/drivers/net/intel/ice/ice_rxtx.c
@@ -1130,7 +1130,6 @@ ice_reset_tx_queue(struct ci_tx_queue *txq)
 	txq->tx_next_rs = (uint16_t)(txq->tx_rs_thresh - 1);
 
 	txq->tx_tail = 0;
-	txq->nb_tx_used = 0;
 
 	txq->last_desc_cleaned = (uint16_t)(txq->nb_tx_desc - 1);
 	txq->nb_tx_free = (uint16_t)(txq->nb_tx_desc - 1);
diff --git a/drivers/net/intel/idpf/idpf_common_rxtx.c b/drivers/net/intel/idpf/idpf_common_rxtx.c
index 8859bcca86..95f2e1deea 100644
--- a/drivers/net/intel/idpf/idpf_common_rxtx.c
+++ b/drivers/net/intel/idpf/idpf_common_rxtx.c
@@ -224,7 +224,6 @@ idpf_qc_split_tx_descq_reset(struct ci_tx_queue *txq)
 	}
 
 	txq->tx_tail = 0;
-	txq->nb_tx_used = 0;
 
 	/* Use this as next to clean for split desc queue */
 	txq->last_desc_cleaned = 0;
@@ -284,7 +283,6 @@ idpf_qc_single_tx_queue_reset(struct ci_tx_queue *txq)
 	}
 
 	txq->tx_tail = 0;
-	txq->nb_tx_used = 0;
 
 	txq->last_desc_cleaned = txq->nb_tx_desc - 1;
 	txq->nb_tx_free = txq->nb_tx_desc - 1;
@@ -992,12 +990,12 @@ idpf_dp_splitq_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 		txd->qw1.cmd_dtype |= IDPF_TXD_FLEX_FLOW_CMD_EOP;
 
 		txq->nb_tx_free = (uint16_t)(txq->nb_tx_free - nb_used);
-		txq->nb_tx_used = (uint16_t)(txq->nb_tx_used + nb_used);
+		txq->rs_compl_count += nb_used;
 
-		if (txq->nb_tx_used >= 32) {
+		if (txq->rs_compl_count >= 32) {
 			txd->qw1.cmd_dtype |= IDPF_TXD_FLEX_FLOW_CMD_RE;
 			/* Update txq RE bit counters */
-			txq->nb_tx_used = 0;
+			txq->rs_compl_count = 0;
 		}
 	}
 
diff --git a/drivers/net/intel/ixgbe/ixgbe_rxtx.c b/drivers/net/intel/ixgbe/ixgbe_rxtx.c
index 3e37ccc50d..ea609d926a 100644
--- a/drivers/net/intel/ixgbe/ixgbe_rxtx.c
+++ b/drivers/net/intel/ixgbe/ixgbe_rxtx.c
@@ -708,12 +708,6 @@ ixgbe_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 		 */
 		nb_used = (uint16_t)(tx_pkt->nb_segs + new_ctx);
 
-		if (txp != NULL &&
-				nb_used + txq->nb_tx_used >= txq->tx_rs_thresh)
-			/* set RS on the previous packet in the burst */
-			txp->read.cmd_type_len |=
-				rte_cpu_to_le_32(IXGBE_TXD_CMD_RS);
-
 		/*
 		 * The number of descriptors that must be allocated for a
 		 * packet is the number of segments of that packet, plus 1
@@ -912,7 +906,6 @@ ixgbe_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 		 * The last packet data descriptor needs End Of Packet (EOP)
 		 */
 		cmd_type_len |= IXGBE_TXD_CMD_EOP;
-		txq->nb_tx_used = (uint16_t)(txq->nb_tx_used + nb_used);
 		txq->nb_tx_free = (uint16_t)(txq->nb_tx_free - nb_used);
 
 		/*
@@ -2551,7 +2544,6 @@ ixgbe_reset_tx_queue(struct ci_tx_queue *txq)
 	txq->tx_next_rs = (uint16_t)(txq->tx_rs_thresh - 1);
 
 	txq->tx_tail = 0;
-	txq->nb_tx_used = 0;
 	/*
 	 * Always allow 1 descriptor to be un-allocated to avoid
 	 * a H/W race condition
diff --git a/drivers/net/intel/ixgbe/ixgbe_rxtx_vec_common.c b/drivers/net/intel/ixgbe/ixgbe_rxtx_vec_common.c
index eb7c79eaf9..63c7cb50d3 100644
--- a/drivers/net/intel/ixgbe/ixgbe_rxtx_vec_common.c
+++ b/drivers/net/intel/ixgbe/ixgbe_rxtx_vec_common.c
@@ -47,7 +47,6 @@ ixgbe_reset_tx_queue_vec(struct ci_tx_queue *txq)
 	txq->tx_next_rs = (uint16_t)(txq->tx_rs_thresh - 1);
 
 	txq->tx_tail = 0;
-	txq->nb_tx_used = 0;
 	/*
 	 * Always allow 1 descriptor to be un-allocated to avoid
 	 * a H/W race condition
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [PATCH v3 28/36] net/intel: remove index for tracking end of packet
  2026-01-30 11:41 ` [PATCH v3 " Bruce Richardson
                     ` (26 preceding siblings ...)
  2026-01-30 11:41   ` [PATCH v3 27/36] net/intel: drop unused Tx queue used count Bruce Richardson
@ 2026-01-30 11:41   ` Bruce Richardson
  2026-01-30 11:41   ` [PATCH v3 29/36] net/intel: merge ring writes in simple Tx for ice and i40e Bruce Richardson
                     ` (7 subsequent siblings)
  35 siblings, 0 replies; 274+ messages in thread
From: Bruce Richardson @ 2026-01-30 11:41 UTC (permalink / raw)
  To: dev
  Cc: Bruce Richardson, Vladimir Medvedkin, Anatoly Burakov,
	Jingjing Wu, Praveen Shetty

The last_id value in each tx_sw_queue entry is no longer used in the
datapath, so remove it and its initialization. In the function releasing
mbufs back to the pool, rather than relying on "last_id" to identify the
end of a packet, check for the mbuf's next pointer being NULL instead.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/net/intel/common/tx.h             | 1 -
 drivers/net/intel/i40e/i40e_rxtx.c        | 8 +++-----
 drivers/net/intel/iavf/iavf_rxtx.c        | 9 ++++-----
 drivers/net/intel/ice/ice_dcf_ethdev.c    | 1 -
 drivers/net/intel/ice/ice_rxtx.c          | 9 ++++-----
 drivers/net/intel/idpf/idpf_common_rxtx.c | 2 --
 drivers/net/intel/ixgbe/ixgbe_rxtx.c      | 9 ++++-----
 7 files changed, 15 insertions(+), 24 deletions(-)

diff --git a/drivers/net/intel/common/tx.h b/drivers/net/intel/common/tx.h
index 0d11fcc142..16f8256304 100644
--- a/drivers/net/intel/common/tx.h
+++ b/drivers/net/intel/common/tx.h
@@ -105,7 +105,6 @@ struct ci_tx_queue;
 struct ci_tx_entry {
 	struct rte_mbuf *mbuf; /* mbuf associated with TX desc, if any. */
 	uint16_t next_id; /* Index of next descriptor in ring. */
-	uint16_t last_id; /* Index of last scattered descriptor. */
 };
 
 /**
diff --git a/drivers/net/intel/i40e/i40e_rxtx.c b/drivers/net/intel/i40e/i40e_rxtx.c
index 33a0ae332a..792fb5a86a 100644
--- a/drivers/net/intel/i40e/i40e_rxtx.c
+++ b/drivers/net/intel/i40e/i40e_rxtx.c
@@ -2536,14 +2536,13 @@ i40e_tx_done_cleanup_full(struct ci_tx_queue *txq,
 			pkt_cnt < free_cnt &&
 			tx_id != tx_last; i++) {
 			if (swr_ring[tx_id].mbuf != NULL) {
-				rte_pktmbuf_free_seg(swr_ring[tx_id].mbuf);
-				swr_ring[tx_id].mbuf = NULL;
-
 				/*
 				 * last segment in the packet,
 				 * increment packet count
 				 */
-				pkt_cnt += (swr_ring[tx_id].last_id == tx_id);
+				pkt_cnt += (swr_ring[tx_id].mbuf->next == NULL) ? 1 : 0;
+				rte_pktmbuf_free_seg(swr_ring[tx_id].mbuf);
+				swr_ring[tx_id].mbuf = NULL;
 			}
 
 			tx_id = swr_ring[tx_id].next_id;
@@ -2636,7 +2635,6 @@ i40e_reset_tx_queue(struct ci_tx_queue *txq)
 		txd->cmd_type_offset_bsz =
 			rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE);
 		txe[i].mbuf =  NULL;
-		txe[i].last_id = i;
 		txe[prev].next_id = i;
 		prev = i;
 	}
diff --git a/drivers/net/intel/iavf/iavf_rxtx.c b/drivers/net/intel/iavf/iavf_rxtx.c
index 3fcb8d7b79..cb3b579d20 100644
--- a/drivers/net/intel/iavf/iavf_rxtx.c
+++ b/drivers/net/intel/iavf/iavf_rxtx.c
@@ -282,7 +282,6 @@ reset_tx_queue(struct ci_tx_queue *txq)
 		txq->ci_tx_ring[i].cmd_type_offset_bsz =
 			rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE);
 		txe[i].mbuf =  NULL;
-		txe[i].last_id = i;
 		txe[prev].next_id = i;
 		prev = i;
 	}
@@ -3960,14 +3959,14 @@ iavf_tx_done_cleanup_full(struct ci_tx_queue *txq,
 	while (pkt_cnt < free_cnt) {
 		do {
 			if (swr_ring[tx_id].mbuf != NULL) {
-				rte_pktmbuf_free_seg(swr_ring[tx_id].mbuf);
-				swr_ring[tx_id].mbuf = NULL;
-
 				/*
 				 * last segment in the packet,
 				 * increment packet count
 				 */
-				pkt_cnt += (swr_ring[tx_id].last_id == tx_id);
+				pkt_cnt += (swr_ring[tx_id].mbuf->next == NULL) ? 1 : 0;
+				rte_pktmbuf_free_seg(swr_ring[tx_id].mbuf);
+				swr_ring[tx_id].mbuf = NULL;
+
 			}
 
 			tx_id = swr_ring[tx_id].next_id;
diff --git a/drivers/net/intel/ice/ice_dcf_ethdev.c b/drivers/net/intel/ice/ice_dcf_ethdev.c
index 02a23629d6..abd7875e7b 100644
--- a/drivers/net/intel/ice/ice_dcf_ethdev.c
+++ b/drivers/net/intel/ice/ice_dcf_ethdev.c
@@ -408,7 +408,6 @@ reset_tx_queue(struct ci_tx_queue *txq)
 		txq->ci_tx_ring[i].cmd_type_offset_bsz =
 			rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE);
 		txe[i].mbuf =  NULL;
-		txe[i].last_id = i;
 		txe[prev].next_id = i;
 		prev = i;
 	}
diff --git a/drivers/net/intel/ice/ice_rxtx.c b/drivers/net/intel/ice/ice_rxtx.c
index 14c4683ad7..b79e195b5c 100644
--- a/drivers/net/intel/ice/ice_rxtx.c
+++ b/drivers/net/intel/ice/ice_rxtx.c
@@ -1121,7 +1121,6 @@ ice_reset_tx_queue(struct ci_tx_queue *txq)
 		txd->cmd_type_offset_bsz =
 			rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE);
 		txe[i].mbuf =  NULL;
-		txe[i].last_id = i;
 		txe[prev].next_id = i;
 		prev = i;
 	}
@@ -3201,14 +3200,14 @@ ice_tx_done_cleanup_full(struct ci_tx_queue *txq,
 			pkt_cnt < free_cnt &&
 			tx_id != tx_last; i++) {
 			if (swr_ring[tx_id].mbuf != NULL) {
-				rte_pktmbuf_free_seg(swr_ring[tx_id].mbuf);
-				swr_ring[tx_id].mbuf = NULL;
-
 				/*
 				 * last segment in the packet,
 				 * increment packet count
 				 */
-				pkt_cnt += (swr_ring[tx_id].last_id == tx_id);
+				pkt_cnt += swr_ring[tx_id].mbuf->next == NULL ? 1 : 0;
+				rte_pktmbuf_free_seg(swr_ring[tx_id].mbuf);
+				swr_ring[tx_id].mbuf = NULL;
+
 			}
 
 			tx_id = swr_ring[tx_id].next_id;
diff --git a/drivers/net/intel/idpf/idpf_common_rxtx.c b/drivers/net/intel/idpf/idpf_common_rxtx.c
index 95f2e1deea..bd77113551 100644
--- a/drivers/net/intel/idpf/idpf_common_rxtx.c
+++ b/drivers/net/intel/idpf/idpf_common_rxtx.c
@@ -218,7 +218,6 @@ idpf_qc_split_tx_descq_reset(struct ci_tx_queue *txq)
 	prev = (uint16_t)(txq->sw_nb_desc - 1);
 	for (i = 0; i < txq->sw_nb_desc; i++) {
 		txe[i].mbuf = NULL;
-		txe[i].last_id = i;
 		txe[prev].next_id = i;
 		prev = i;
 	}
@@ -277,7 +276,6 @@ idpf_qc_single_tx_queue_reset(struct ci_tx_queue *txq)
 		txq->ci_tx_ring[i].cmd_type_offset_bsz =
 			rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE);
 		txe[i].mbuf =  NULL;
-		txe[i].last_id = i;
 		txe[prev].next_id = i;
 		prev = i;
 	}
diff --git a/drivers/net/intel/ixgbe/ixgbe_rxtx.c b/drivers/net/intel/ixgbe/ixgbe_rxtx.c
index ea609d926a..dc9fda8e21 100644
--- a/drivers/net/intel/ixgbe/ixgbe_rxtx.c
+++ b/drivers/net/intel/ixgbe/ixgbe_rxtx.c
@@ -2407,14 +2407,14 @@ ixgbe_tx_done_cleanup_full(struct ci_tx_queue *txq, uint32_t free_cnt)
 			pkt_cnt < free_cnt &&
 			tx_id != tx_last; i++) {
 			if (swr_ring[tx_id].mbuf != NULL) {
-				rte_pktmbuf_free_seg(swr_ring[tx_id].mbuf);
-				swr_ring[tx_id].mbuf = NULL;
-
 				/*
 				 * last segment in the packet,
 				 * increment packet count
 				 */
-				pkt_cnt += (swr_ring[tx_id].last_id == tx_id);
+				pkt_cnt += swr_ring[tx_id].mbuf->next == NULL ? 1 : 0;
+				rte_pktmbuf_free_seg(swr_ring[tx_id].mbuf);
+				swr_ring[tx_id].mbuf = NULL;
+
 			}
 
 			tx_id = swr_ring[tx_id].next_id;
@@ -2535,7 +2535,6 @@ ixgbe_reset_tx_queue(struct ci_tx_queue *txq)
 
 		txd->wb.status = rte_cpu_to_le_32(IXGBE_TXD_STAT_DD);
 		txe[i].mbuf = NULL;
-		txe[i].last_id = i;
 		txe[prev].next_id = i;
 		prev = i;
 	}
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [PATCH v3 29/36] net/intel: merge ring writes in simple Tx for ice and i40e
  2026-01-30 11:41 ` [PATCH v3 " Bruce Richardson
                     ` (27 preceding siblings ...)
  2026-01-30 11:41   ` [PATCH v3 28/36] net/intel: remove index for tracking end of packet Bruce Richardson
@ 2026-01-30 11:41   ` Bruce Richardson
  2026-01-30 11:41   ` [PATCH v3 30/36] net/intel: consolidate ice and i40e buffer free function Bruce Richardson
                     ` (6 subsequent siblings)
  35 siblings, 0 replies; 274+ messages in thread
From: Bruce Richardson @ 2026-01-30 11:41 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson, Anatoly Burakov

The ice and i40e drivers have identical code for writing ring entries in
the simple Tx path, so merge that descriptor-writing code into the
common code.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/net/intel/common/tx.h                 |  6 ++
 drivers/net/intel/common/tx_scalar_fns.h      | 60 ++++++++++++++
 drivers/net/intel/i40e/i40e_rxtx.c            | 79 +------------------
 drivers/net/intel/i40e/i40e_rxtx.h            |  3 -
 .../net/intel/i40e/i40e_rxtx_vec_altivec.c    |  4 +-
 drivers/net/intel/i40e/i40e_rxtx_vec_avx2.c   |  4 +-
 drivers/net/intel/i40e/i40e_rxtx_vec_avx512.c |  4 +-
 drivers/net/intel/i40e/i40e_rxtx_vec_neon.c   |  4 +-
 drivers/net/intel/ice/ice_rxtx.c              | 69 +---------------
 drivers/net/intel/ice/ice_rxtx.h              |  2 -
 drivers/net/intel/ice/ice_rxtx_vec_avx2.c     |  4 +-
 drivers/net/intel/ice/ice_rxtx_vec_avx512.c   |  4 +-
 12 files changed, 86 insertions(+), 157 deletions(-)

diff --git a/drivers/net/intel/common/tx.h b/drivers/net/intel/common/tx.h
index 16f8256304..0f545631af 100644
--- a/drivers/net/intel/common/tx.h
+++ b/drivers/net/intel/common/tx.h
@@ -63,6 +63,12 @@ enum ci_tx_l2tag1_field {
 /* Common maximum data per TX descriptor */
 #define CI_MAX_DATA_PER_TXD     (CI_TXD_QW1_TX_BUF_SZ_M >> CI_TXD_QW1_TX_BUF_SZ_S)
 
+/* Common TX maximum burst size for chunked transmission in simple paths */
+#define CI_TX_MAX_BURST 32
+
+/* Common TX descriptor command flags for simple transmit */
+#define CI_TX_DESC_CMD_DEFAULT (CI_TX_DESC_CMD_ICRC | CI_TX_DESC_CMD_EOP)
+
 /* Checksum offload mask to identify packets requesting offload */
 #define CI_TX_CKSUM_OFFLOAD_MASK (RTE_MBUF_F_TX_IP_CKSUM |		 \
 				   RTE_MBUF_F_TX_L4_MASK |		 \
diff --git a/drivers/net/intel/common/tx_scalar_fns.h b/drivers/net/intel/common/tx_scalar_fns.h
index f82331101a..d09d118197 100644
--- a/drivers/net/intel/common/tx_scalar_fns.h
+++ b/drivers/net/intel/common/tx_scalar_fns.h
@@ -12,6 +12,66 @@
 /* depends on common Tx definitions. */
 #include "tx.h"
 
+/* Populate 4 descriptors with data from 4 mbufs */
+static inline void
+ci_tx_fill_hw_ring_tx4(volatile struct ci_tx_desc *txdp, struct rte_mbuf **pkts)
+{
+	uint64_t dma_addr;
+	uint32_t i;
+
+	for (i = 0; i < 4; i++, txdp++, pkts++) {
+		dma_addr = rte_mbuf_data_iova(*pkts);
+		txdp->buffer_addr = rte_cpu_to_le_64(dma_addr);
+		txdp->cmd_type_offset_bsz =
+			rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DATA |
+				((uint64_t)CI_TX_DESC_CMD_DEFAULT << CI_TXD_QW1_CMD_S) |
+				((uint64_t)(*pkts)->data_len << CI_TXD_QW1_TX_BUF_SZ_S));
+	}
+}
+
+/* Populate 1 descriptor with data from 1 mbuf */
+static inline void
+ci_tx_fill_hw_ring_tx1(volatile struct ci_tx_desc *txdp, struct rte_mbuf **pkts)
+{
+	uint64_t dma_addr;
+
+	dma_addr = rte_mbuf_data_iova(*pkts);
+	txdp->buffer_addr = rte_cpu_to_le_64(dma_addr);
+	txdp->cmd_type_offset_bsz =
+		rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DATA |
+			((uint64_t)CI_TX_DESC_CMD_DEFAULT << CI_TXD_QW1_CMD_S) |
+			((uint64_t)(*pkts)->data_len << CI_TXD_QW1_TX_BUF_SZ_S));
+}
+
+/* Fill hardware descriptor ring with mbuf data */
+static inline void
+ci_tx_fill_hw_ring(struct ci_tx_queue *txq, struct rte_mbuf **pkts,
+		   uint16_t nb_pkts)
+{
+	volatile struct ci_tx_desc *txdp = &txq->ci_tx_ring[txq->tx_tail];
+	struct ci_tx_entry *txep = &txq->sw_ring[txq->tx_tail];
+	const int N_PER_LOOP = 4;
+	const int N_PER_LOOP_MASK = N_PER_LOOP - 1;
+	int mainpart, leftover;
+	int i, j;
+
+	mainpart = nb_pkts & ((uint32_t)~N_PER_LOOP_MASK);
+	leftover = nb_pkts & ((uint32_t)N_PER_LOOP_MASK);
+	for (i = 0; i < mainpart; i += N_PER_LOOP) {
+		for (j = 0; j < N_PER_LOOP; ++j)
+			(txep + i + j)->mbuf = *(pkts + i + j);
+		ci_tx_fill_hw_ring_tx4(txdp + i, pkts + i);
+	}
+
+	if (unlikely(leftover > 0)) {
+		for (i = 0; i < leftover; ++i) {
+			(txep + mainpart + i)->mbuf = *(pkts + mainpart + i);
+			ci_tx_fill_hw_ring_tx1(txdp + mainpart + i,
+					       pkts + mainpart + i);
+		}
+	}
+}
+
 /*
  * Common transmit descriptor cleanup function for Intel drivers.
  * Used by ice, i40e, iavf, and idpf drivers.
diff --git a/drivers/net/intel/i40e/i40e_rxtx.c b/drivers/net/intel/i40e/i40e_rxtx.c
index 792fb5a86a..bd85c7324d 100644
--- a/drivers/net/intel/i40e/i40e_rxtx.c
+++ b/drivers/net/intel/i40e/i40e_rxtx.c
@@ -311,19 +311,6 @@ i40e_parse_tunneling_params(uint64_t ol_flags,
 		*cd_tunneling |= I40E_TXD_CTX_QW0_L4T_CS_MASK;
 }
 
-/* Construct the tx flags */
-static inline uint64_t
-i40e_build_ctob(uint32_t td_cmd,
-		uint32_t td_offset,
-		unsigned int size,
-		uint32_t td_tag)
-{
-	return rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DATA |
-			((uint64_t)td_cmd  << CI_TXD_QW1_CMD_S) |
-			((uint64_t)td_offset << CI_TXD_QW1_OFFSET_S) |
-			((uint64_t)size  << CI_TXD_QW1_TX_BUF_SZ_S) |
-			((uint64_t)td_tag  << CI_TXD_QW1_L2TAG1_S));
-}
 
 static inline int
 #ifdef RTE_LIBRTE_I40E_RX_ALLOW_BULK_ALLOC
@@ -1082,64 +1069,6 @@ i40e_tx_free_bufs(struct ci_tx_queue *txq)
 	return tx_rs_thresh;
 }
 
-/* Populate 4 descriptors with data from 4 mbufs */
-static inline void
-tx4(volatile struct ci_tx_desc *txdp, struct rte_mbuf **pkts)
-{
-	uint64_t dma_addr;
-	uint32_t i;
-
-	for (i = 0; i < 4; i++, txdp++, pkts++) {
-		dma_addr = rte_mbuf_data_iova(*pkts);
-		txdp->buffer_addr = rte_cpu_to_le_64(dma_addr);
-		txdp->cmd_type_offset_bsz =
-			i40e_build_ctob((uint32_t)I40E_TD_CMD, 0,
-					(*pkts)->data_len, 0);
-	}
-}
-
-/* Populate 1 descriptor with data from 1 mbuf */
-static inline void
-tx1(volatile struct ci_tx_desc *txdp, struct rte_mbuf **pkts)
-{
-	uint64_t dma_addr;
-
-	dma_addr = rte_mbuf_data_iova(*pkts);
-	txdp->buffer_addr = rte_cpu_to_le_64(dma_addr);
-	txdp->cmd_type_offset_bsz =
-		i40e_build_ctob((uint32_t)I40E_TD_CMD, 0,
-				(*pkts)->data_len, 0);
-}
-
-/* Fill hardware descriptor ring with mbuf data */
-static inline void
-i40e_tx_fill_hw_ring(struct ci_tx_queue *txq,
-		     struct rte_mbuf **pkts,
-		     uint16_t nb_pkts)
-{
-	volatile struct ci_tx_desc *txdp = &txq->ci_tx_ring[txq->tx_tail];
-	struct ci_tx_entry *txep = &txq->sw_ring[txq->tx_tail];
-	const int N_PER_LOOP = 4;
-	const int N_PER_LOOP_MASK = N_PER_LOOP - 1;
-	int mainpart, leftover;
-	int i, j;
-
-	mainpart = (nb_pkts & ((uint32_t) ~N_PER_LOOP_MASK));
-	leftover = (nb_pkts & ((uint32_t)  N_PER_LOOP_MASK));
-	for (i = 0; i < mainpart; i += N_PER_LOOP) {
-		for (j = 0; j < N_PER_LOOP; ++j) {
-			(txep + i + j)->mbuf = *(pkts + i + j);
-		}
-		tx4(txdp + i, pkts + i);
-	}
-	if (unlikely(leftover > 0)) {
-		for (i = 0; i < leftover; ++i) {
-			(txep + mainpart + i)->mbuf = *(pkts + mainpart + i);
-			tx1(txdp + mainpart + i, pkts + mainpart + i);
-		}
-	}
-}
-
 static inline uint16_t
 tx_xmit_pkts(struct ci_tx_queue *txq,
 	     struct rte_mbuf **tx_pkts,
@@ -1164,7 +1093,7 @@ tx_xmit_pkts(struct ci_tx_queue *txq,
 	txq->nb_tx_free = (uint16_t)(txq->nb_tx_free - nb_pkts);
 	if ((txq->tx_tail + nb_pkts) > txq->nb_tx_desc) {
 		n = (uint16_t)(txq->nb_tx_desc - txq->tx_tail);
-		i40e_tx_fill_hw_ring(txq, tx_pkts, n);
+		ci_tx_fill_hw_ring(txq, tx_pkts, n);
 		txr[txq->tx_next_rs].cmd_type_offset_bsz |=
 			rte_cpu_to_le_64(((uint64_t)CI_TX_DESC_CMD_RS) << CI_TXD_QW1_CMD_S);
 		txq->tx_next_rs = (uint16_t)(txq->tx_rs_thresh - 1);
@@ -1172,7 +1101,7 @@ tx_xmit_pkts(struct ci_tx_queue *txq,
 	}
 
 	/* Fill hardware descriptor ring with mbuf data */
-	i40e_tx_fill_hw_ring(txq, tx_pkts + n, (uint16_t)(nb_pkts - n));
+	ci_tx_fill_hw_ring(txq, tx_pkts + n, (uint16_t)(nb_pkts - n));
 	txq->tx_tail = (uint16_t)(txq->tx_tail + (nb_pkts - n));
 
 	/* Determine if RS bit needs to be set */
@@ -1201,13 +1130,13 @@ i40e_xmit_pkts_simple(void *tx_queue,
 {
 	uint16_t nb_tx = 0;
 
-	if (likely(nb_pkts <= I40E_TX_MAX_BURST))
+	if (likely(nb_pkts <= CI_TX_MAX_BURST))
 		return tx_xmit_pkts((struct ci_tx_queue *)tx_queue,
 						tx_pkts, nb_pkts);
 
 	while (nb_pkts) {
 		uint16_t ret, num = (uint16_t)RTE_MIN(nb_pkts,
-						I40E_TX_MAX_BURST);
+						CI_TX_MAX_BURST);
 
 		ret = tx_xmit_pkts((struct ci_tx_queue *)tx_queue,
 						&tx_pkts[nb_tx], num);
diff --git a/drivers/net/intel/i40e/i40e_rxtx.h b/drivers/net/intel/i40e/i40e_rxtx.h
index 307ffa3049..0977342064 100644
--- a/drivers/net/intel/i40e/i40e_rxtx.h
+++ b/drivers/net/intel/i40e/i40e_rxtx.h
@@ -47,9 +47,6 @@
 #define I40E_RX_DESC_EXT_STATUS_FLEXBL_MASK   0x03
 #define I40E_RX_DESC_EXT_STATUS_FLEXBL_FLEX   0x01
 
-#define I40E_TD_CMD (CI_TX_DESC_CMD_ICRC |\
-		     CI_TX_DESC_CMD_EOP)
-
 enum i40e_header_split_mode {
 	i40e_header_split_none = 0,
 	i40e_header_split_enabled = 1,
diff --git a/drivers/net/intel/i40e/i40e_rxtx_vec_altivec.c b/drivers/net/intel/i40e/i40e_rxtx_vec_altivec.c
index 4c36748d94..68667bdc9b 100644
--- a/drivers/net/intel/i40e/i40e_rxtx_vec_altivec.c
+++ b/drivers/net/intel/i40e/i40e_rxtx_vec_altivec.c
@@ -476,8 +476,8 @@ i40e_xmit_fixed_burst_vec(void *tx_queue, struct rte_mbuf **tx_pkts,
 	volatile struct ci_tx_desc *txdp;
 	struct ci_tx_entry_vec *txep;
 	uint16_t n, nb_commit, tx_id;
-	uint64_t flags = I40E_TD_CMD;
-	uint64_t rs = CI_TX_DESC_CMD_RS | I40E_TD_CMD;
+	uint64_t flags = CI_TX_DESC_CMD_DEFAULT;
+	uint64_t rs = CI_TX_DESC_CMD_RS | CI_TX_DESC_CMD_DEFAULT;
 	int i;
 
 	if (txq->nb_tx_free < txq->tx_free_thresh)
diff --git a/drivers/net/intel/i40e/i40e_rxtx_vec_avx2.c b/drivers/net/intel/i40e/i40e_rxtx_vec_avx2.c
index 502a1842c6..e1672c4371 100644
--- a/drivers/net/intel/i40e/i40e_rxtx_vec_avx2.c
+++ b/drivers/net/intel/i40e/i40e_rxtx_vec_avx2.c
@@ -741,8 +741,8 @@ i40e_xmit_fixed_burst_vec_avx2(void *tx_queue, struct rte_mbuf **tx_pkts,
 	volatile struct ci_tx_desc *txdp;
 	struct ci_tx_entry_vec *txep;
 	uint16_t n, nb_commit, tx_id;
-	uint64_t flags = I40E_TD_CMD;
-	uint64_t rs = CI_TX_DESC_CMD_RS | I40E_TD_CMD;
+	uint64_t flags = CI_TX_DESC_CMD_DEFAULT;
+	uint64_t rs = CI_TX_DESC_CMD_RS | CI_TX_DESC_CMD_DEFAULT;
 
 	if (txq->nb_tx_free < txq->tx_free_thresh)
 		ci_tx_free_bufs_vec(txq, i40e_tx_desc_done, false);
diff --git a/drivers/net/intel/i40e/i40e_rxtx_vec_avx512.c b/drivers/net/intel/i40e/i40e_rxtx_vec_avx512.c
index d48ff9f51e..bceb95ad2d 100644
--- a/drivers/net/intel/i40e/i40e_rxtx_vec_avx512.c
+++ b/drivers/net/intel/i40e/i40e_rxtx_vec_avx512.c
@@ -801,8 +801,8 @@ i40e_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pkts,
 	volatile struct ci_tx_desc *txdp;
 	struct ci_tx_entry_vec *txep;
 	uint16_t n, nb_commit, tx_id;
-	uint64_t flags = I40E_TD_CMD;
-	uint64_t rs = CI_TX_DESC_CMD_RS | I40E_TD_CMD;
+	uint64_t flags = CI_TX_DESC_CMD_DEFAULT;
+	uint64_t rs = CI_TX_DESC_CMD_RS | CI_TX_DESC_CMD_DEFAULT;
 
 	if (txq->nb_tx_free < txq->tx_free_thresh)
 		ci_tx_free_bufs_vec(txq, i40e_tx_desc_done, false);
diff --git a/drivers/net/intel/i40e/i40e_rxtx_vec_neon.c b/drivers/net/intel/i40e/i40e_rxtx_vec_neon.c
index be4c64942e..debc9bda28 100644
--- a/drivers/net/intel/i40e/i40e_rxtx_vec_neon.c
+++ b/drivers/net/intel/i40e/i40e_rxtx_vec_neon.c
@@ -626,8 +626,8 @@ i40e_xmit_fixed_burst_vec(void *__rte_restrict tx_queue,
 	volatile struct ci_tx_desc *txdp;
 	struct ci_tx_entry_vec *txep;
 	uint16_t n, nb_commit, tx_id;
-	uint64_t flags = I40E_TD_CMD;
-	uint64_t rs = CI_TX_DESC_CMD_RS | I40E_TD_CMD;
+	uint64_t flags = CI_TX_DESC_CMD_DEFAULT;
+	uint64_t rs = CI_TX_DESC_CMD_RS | CI_TX_DESC_CMD_DEFAULT;
 	int i;
 
 	if (txq->nb_tx_free < txq->tx_free_thresh)
diff --git a/drivers/net/intel/ice/ice_rxtx.c b/drivers/net/intel/ice/ice_rxtx.c
index b79e195b5c..26b4c73eb6 100644
--- a/drivers/net/intel/ice/ice_rxtx.c
+++ b/drivers/net/intel/ice/ice_rxtx.c
@@ -3286,67 +3286,6 @@ ice_tx_done_cleanup(void *txq, uint32_t free_cnt)
 		return ice_tx_done_cleanup_full(q, free_cnt);
 }
 
-/* Populate 4 descriptors with data from 4 mbufs */
-static inline void
-tx4(volatile struct ci_tx_desc *txdp, struct rte_mbuf **pkts)
-{
-	uint64_t dma_addr;
-	uint32_t i;
-
-	for (i = 0; i < 4; i++, txdp++, pkts++) {
-		dma_addr = rte_mbuf_data_iova(*pkts);
-		txdp->buffer_addr = rte_cpu_to_le_64(dma_addr);
-		txdp->cmd_type_offset_bsz =
-			ice_build_ctob((uint32_t)ICE_TD_CMD, 0,
-				       (*pkts)->data_len, 0);
-	}
-}
-
-/* Populate 1 descriptor with data from 1 mbuf */
-static inline void
-tx1(volatile struct ci_tx_desc *txdp, struct rte_mbuf **pkts)
-{
-	uint64_t dma_addr;
-
-	dma_addr = rte_mbuf_data_iova(*pkts);
-	txdp->buffer_addr = rte_cpu_to_le_64(dma_addr);
-	txdp->cmd_type_offset_bsz =
-		ice_build_ctob((uint32_t)ICE_TD_CMD, 0,
-			       (*pkts)->data_len, 0);
-}
-
-static inline void
-ice_tx_fill_hw_ring(struct ci_tx_queue *txq, struct rte_mbuf **pkts,
-		    uint16_t nb_pkts)
-{
-	volatile struct ci_tx_desc *txdp = &txq->ci_tx_ring[txq->tx_tail];
-	struct ci_tx_entry *txep = &txq->sw_ring[txq->tx_tail];
-	const int N_PER_LOOP = 4;
-	const int N_PER_LOOP_MASK = N_PER_LOOP - 1;
-	int mainpart, leftover;
-	int i, j;
-
-	/**
-	 * Process most of the packets in chunks of N pkts.  Any
-	 * leftover packets will get processed one at a time.
-	 */
-	mainpart = nb_pkts & ((uint32_t)~N_PER_LOOP_MASK);
-	leftover = nb_pkts & ((uint32_t)N_PER_LOOP_MASK);
-	for (i = 0; i < mainpart; i += N_PER_LOOP) {
-		/* Copy N mbuf pointers to the S/W ring */
-		for (j = 0; j < N_PER_LOOP; ++j)
-			(txep + i + j)->mbuf = *(pkts + i + j);
-		tx4(txdp + i, pkts + i);
-	}
-
-	if (unlikely(leftover > 0)) {
-		for (i = 0; i < leftover; ++i) {
-			(txep + mainpart + i)->mbuf = *(pkts + mainpart + i);
-			tx1(txdp + mainpart + i, pkts + mainpart + i);
-		}
-	}
-}
-
 static inline uint16_t
 tx_xmit_pkts(struct ci_tx_queue *txq,
 	     struct rte_mbuf **tx_pkts,
@@ -3371,7 +3310,7 @@ tx_xmit_pkts(struct ci_tx_queue *txq,
 	txq->nb_tx_free = (uint16_t)(txq->nb_tx_free - nb_pkts);
 	if ((txq->tx_tail + nb_pkts) > txq->nb_tx_desc) {
 		n = (uint16_t)(txq->nb_tx_desc - txq->tx_tail);
-		ice_tx_fill_hw_ring(txq, tx_pkts, n);
+		ci_tx_fill_hw_ring(txq, tx_pkts, n);
 		txr[txq->tx_next_rs].cmd_type_offset_bsz |=
 			rte_cpu_to_le_64(((uint64_t)CI_TX_DESC_CMD_RS) << CI_TXD_QW1_CMD_S);
 		txq->tx_next_rs = (uint16_t)(txq->tx_rs_thresh - 1);
@@ -3379,7 +3318,7 @@ tx_xmit_pkts(struct ci_tx_queue *txq,
 	}
 
 	/* Fill hardware descriptor ring with mbuf data */
-	ice_tx_fill_hw_ring(txq, tx_pkts + n, (uint16_t)(nb_pkts - n));
+	ci_tx_fill_hw_ring(txq, tx_pkts + n, (uint16_t)(nb_pkts - n));
 	txq->tx_tail = (uint16_t)(txq->tx_tail + (nb_pkts - n));
 
 	/* Determine if RS bit needs to be set */
@@ -3408,13 +3347,13 @@ ice_xmit_pkts_simple(void *tx_queue,
 {
 	uint16_t nb_tx = 0;
 
-	if (likely(nb_pkts <= ICE_TX_MAX_BURST))
+	if (likely(nb_pkts <= CI_TX_MAX_BURST))
 		return tx_xmit_pkts((struct ci_tx_queue *)tx_queue,
 				    tx_pkts, nb_pkts);
 
 	while (nb_pkts) {
 		uint16_t ret, num = (uint16_t)RTE_MIN(nb_pkts,
-						      ICE_TX_MAX_BURST);
+						      CI_TX_MAX_BURST);
 
 		ret = tx_xmit_pkts((struct ci_tx_queue *)tx_queue,
 				   &tx_pkts[nb_tx], num);
diff --git a/drivers/net/intel/ice/ice_rxtx.h b/drivers/net/intel/ice/ice_rxtx.h
index cd5fa93d1c..ddcd012e8b 100644
--- a/drivers/net/intel/ice/ice_rxtx.h
+++ b/drivers/net/intel/ice/ice_rxtx.h
@@ -46,8 +46,6 @@
 
 #define ICE_SUPPORT_CHAIN_NUM 5
 
-#define ICE_TD_CMD                      CI_TX_DESC_CMD_EOP
-
 #define ICE_VPMD_RX_BURST            CI_VPMD_RX_BURST
 #define ICE_VPMD_TX_BURST            32
 #define ICE_VPMD_RXQ_REARM_THRESH    CI_VPMD_RX_REARM_THRESH
diff --git a/drivers/net/intel/ice/ice_rxtx_vec_avx2.c b/drivers/net/intel/ice/ice_rxtx_vec_avx2.c
index 2922671158..d03f2e5b36 100644
--- a/drivers/net/intel/ice/ice_rxtx_vec_avx2.c
+++ b/drivers/net/intel/ice/ice_rxtx_vec_avx2.c
@@ -845,8 +845,8 @@ ice_xmit_fixed_burst_vec_avx2(void *tx_queue, struct rte_mbuf **tx_pkts,
 	volatile struct ci_tx_desc *txdp;
 	struct ci_tx_entry_vec *txep;
 	uint16_t n, nb_commit, tx_id;
-	uint64_t flags = ICE_TD_CMD;
-	uint64_t rs = CI_TX_DESC_CMD_RS | ICE_TD_CMD;
+	uint64_t flags = CI_TX_DESC_CMD_DEFAULT;
+	uint64_t rs = CI_TX_DESC_CMD_RS | CI_TX_DESC_CMD_DEFAULT;
 
 	/* cross rx_thresh boundary is not allowed */
 	nb_pkts = RTE_MIN(nb_pkts, txq->tx_rs_thresh);
diff --git a/drivers/net/intel/ice/ice_rxtx_vec_avx512.c b/drivers/net/intel/ice/ice_rxtx_vec_avx512.c
index e64b6e227b..004c01054a 100644
--- a/drivers/net/intel/ice/ice_rxtx_vec_avx512.c
+++ b/drivers/net/intel/ice/ice_rxtx_vec_avx512.c
@@ -909,8 +909,8 @@ ice_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pkts,
 	volatile struct ci_tx_desc *txdp;
 	struct ci_tx_entry_vec *txep;
 	uint16_t n, nb_commit, tx_id;
-	uint64_t flags = ICE_TD_CMD;
-	uint64_t rs = CI_TX_DESC_CMD_RS | ICE_TD_CMD;
+	uint64_t flags = CI_TX_DESC_CMD_DEFAULT;
+	uint64_t rs = CI_TX_DESC_CMD_RS | CI_TX_DESC_CMD_DEFAULT;
 
 	/* cross rx_thresh boundary is not allowed */
 	nb_pkts = RTE_MIN(nb_pkts, txq->tx_rs_thresh);
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [PATCH v3 30/36] net/intel: consolidate ice and i40e buffer free function
  2026-01-30 11:41 ` [PATCH v3 " Bruce Richardson
                     ` (28 preceding siblings ...)
  2026-01-30 11:41   ` [PATCH v3 29/36] net/intel: merge ring writes in simple Tx for ice and i40e Bruce Richardson
@ 2026-01-30 11:41   ` Bruce Richardson
  2026-01-30 11:41   ` [PATCH v3 31/36] net/intel: complete merging simple Tx paths Bruce Richardson
                     ` (5 subsequent siblings)
  35 siblings, 0 replies; 274+ messages in thread
From: Bruce Richardson @ 2026-01-30 11:41 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson, Anatoly Burakov

The buffer freeing function for the simple scalar Tx path is almost
identical in the ice and i40e drivers, except that the i40e version
batches frees in the FAST_FREE case. Consolidate both functions into a
common one based on the more efficient i40e version.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/net/intel/common/tx.h            |  3 ++
 drivers/net/intel/common/tx_scalar_fns.h | 58 ++++++++++++++++++++++
 drivers/net/intel/i40e/i40e_rxtx.c       | 63 +-----------------------
 drivers/net/intel/ice/ice_rxtx.c         | 45 +----------------
 4 files changed, 65 insertions(+), 104 deletions(-)

diff --git a/drivers/net/intel/common/tx.h b/drivers/net/intel/common/tx.h
index 0f545631af..3c388857a7 100644
--- a/drivers/net/intel/common/tx.h
+++ b/drivers/net/intel/common/tx.h
@@ -66,6 +66,9 @@ enum ci_tx_l2tag1_field {
 /* Common TX maximum burst size for chunked transmission in simple paths */
 #define CI_TX_MAX_BURST 32
 
+/* Common TX maximum free buffer size for batched bulk freeing */
+#define CI_TX_MAX_FREE_BUF_SZ 64
+
 /* Common TX descriptor command flags for simple transmit */
 #define CI_TX_DESC_CMD_DEFAULT (CI_TX_DESC_CMD_ICRC | CI_TX_DESC_CMD_EOP)
 
diff --git a/drivers/net/intel/common/tx_scalar_fns.h b/drivers/net/intel/common/tx_scalar_fns.h
index d09d118197..185fcdfa72 100644
--- a/drivers/net/intel/common/tx_scalar_fns.h
+++ b/drivers/net/intel/common/tx_scalar_fns.h
@@ -72,6 +72,64 @@ ci_tx_fill_hw_ring(struct ci_tx_queue *txq, struct rte_mbuf **pkts,
 	}
 }
 
+/* Free transmitted mbufs from descriptor ring with bulk freeing for Tx simple path */
+static __rte_always_inline int
+ci_tx_free_bufs(struct ci_tx_queue *txq)
+{
+	const uint16_t rs_thresh = txq->tx_rs_thresh;
+	const uint16_t k = RTE_ALIGN_FLOOR(rs_thresh, CI_TX_MAX_FREE_BUF_SZ);
+	const uint16_t m = rs_thresh % CI_TX_MAX_FREE_BUF_SZ;
+	struct rte_mbuf *free[CI_TX_MAX_FREE_BUF_SZ];
+	struct ci_tx_entry *txep;
+
+	if ((txq->ci_tx_ring[txq->tx_next_dd].cmd_type_offset_bsz &
+			rte_cpu_to_le_64(CI_TXD_QW1_DTYPE_M)) !=
+			rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE))
+		return 0;
+
+	txep = &txq->sw_ring[txq->tx_next_dd - (rs_thresh - 1)];
+
+	struct rte_mempool *fast_free_mp =
+			likely(txq->fast_free_mp != (void *)UINTPTR_MAX) ?
+				txq->fast_free_mp :
+				(txq->fast_free_mp = txep[0].mbuf->pool);
+
+	if (fast_free_mp) {
+		if (k) {
+			for (uint16_t j = 0; j != k; j += CI_TX_MAX_FREE_BUF_SZ) {
+				for (uint16_t i = 0; i < CI_TX_MAX_FREE_BUF_SZ; ++i, ++txep) {
+					free[i] = txep->mbuf;
+					txep->mbuf = NULL;
+				}
+				rte_mbuf_raw_free_bulk(fast_free_mp, free, CI_TX_MAX_FREE_BUF_SZ);
+			}
+		}
+
+		if (m) {
+			for (uint16_t i = 0; i < m; ++i, ++txep) {
+				free[i] = txep->mbuf;
+				txep->mbuf = NULL;
+			}
+			rte_mbuf_raw_free_bulk(fast_free_mp, free, m);
+		}
+	} else {
+		for (uint16_t i = 0; i < rs_thresh; ++i)
+			rte_prefetch0((txep + i)->mbuf);
+
+		for (uint16_t i = 0; i < rs_thresh; ++i, ++txep) {
+			rte_pktmbuf_free_seg(txep->mbuf);
+			txep->mbuf = NULL;
+		}
+	}
+
+	txq->nb_tx_free = (uint16_t)(txq->nb_tx_free + rs_thresh);
+	txq->tx_next_dd = (uint16_t)(txq->tx_next_dd + rs_thresh);
+	if (txq->tx_next_dd >= txq->nb_tx_desc)
+		txq->tx_next_dd = (uint16_t)(rs_thresh - 1);
+
+	return rs_thresh;
+}
+
 /*
  * Common transmit descriptor cleanup function for Intel drivers.
  * Used by ice, i40e, iavf, and idpf drivers.
diff --git a/drivers/net/intel/i40e/i40e_rxtx.c b/drivers/net/intel/i40e/i40e_rxtx.c
index bd85c7324d..395808ff7c 100644
--- a/drivers/net/intel/i40e/i40e_rxtx.c
+++ b/drivers/net/intel/i40e/i40e_rxtx.c
@@ -1010,65 +1010,6 @@ i40e_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 			get_context_desc, NULL, NULL);
 }
 
-static __rte_always_inline int
-i40e_tx_free_bufs(struct ci_tx_queue *txq)
-{
-	struct ci_tx_entry *txep;
-	const uint16_t tx_rs_thresh = txq->tx_rs_thresh;
-	uint16_t i, j;
-	struct rte_mbuf *free[I40E_TX_MAX_FREE_BUF_SZ];
-	const uint16_t k = RTE_ALIGN_FLOOR(tx_rs_thresh, I40E_TX_MAX_FREE_BUF_SZ);
-	const uint16_t m = tx_rs_thresh % I40E_TX_MAX_FREE_BUF_SZ;
-
-	if ((txq->ci_tx_ring[txq->tx_next_dd].cmd_type_offset_bsz &
-			rte_cpu_to_le_64(CI_TXD_QW1_DTYPE_M)) !=
-			rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE))
-		return 0;
-
-	txep = &txq->sw_ring[txq->tx_next_dd - (tx_rs_thresh - 1)];
-
-	struct rte_mempool *fast_free_mp =
-			likely(txq->fast_free_mp != (void *)UINTPTR_MAX) ?
-			txq->fast_free_mp :
-			(txq->fast_free_mp = txep[0].mbuf->pool);
-
-	if (fast_free_mp != NULL) {
-		if (k) {
-			for (j = 0; j != k; j += I40E_TX_MAX_FREE_BUF_SZ) {
-				for (i = 0; i < I40E_TX_MAX_FREE_BUF_SZ; ++i, ++txep) {
-					free[i] = txep->mbuf;
-					txep->mbuf = NULL;
-				}
-				rte_mbuf_raw_free_bulk(fast_free_mp, free,
-						I40E_TX_MAX_FREE_BUF_SZ);
-			}
-		}
-
-		if (m) {
-			for (i = 0; i < m; ++i, ++txep) {
-				free[i] = txep->mbuf;
-				txep->mbuf = NULL;
-			}
-			rte_mbuf_raw_free_bulk(fast_free_mp, free, m);
-		}
-	} else {
-		for (i = 0; i < tx_rs_thresh; i++)
-			rte_prefetch0((txep + i)->mbuf);
-
-		for (i = 0; i < tx_rs_thresh; ++i, ++txep) {
-			rte_pktmbuf_free_seg(txep->mbuf);
-			txep->mbuf = NULL;
-		}
-	}
-
-	txq->nb_tx_free = (uint16_t)(txq->nb_tx_free + tx_rs_thresh);
-	txq->tx_next_dd = (uint16_t)(txq->tx_next_dd + tx_rs_thresh);
-	if (txq->tx_next_dd >= txq->nb_tx_desc)
-		txq->tx_next_dd = (uint16_t)(tx_rs_thresh - 1);
-
-	return tx_rs_thresh;
-}
-
 static inline uint16_t
 tx_xmit_pkts(struct ci_tx_queue *txq,
 	     struct rte_mbuf **tx_pkts,
@@ -1083,7 +1024,7 @@ tx_xmit_pkts(struct ci_tx_queue *txq,
 	 * descriptor, free the associated buffer.
 	 */
 	if (txq->nb_tx_free < txq->tx_free_thresh)
-		i40e_tx_free_bufs(txq);
+		ci_tx_free_bufs(txq);
 
 	/* Use available descriptor only */
 	nb_pkts = (uint16_t)RTE_MIN(txq->nb_tx_free, nb_pkts);
@@ -2508,7 +2449,7 @@ i40e_tx_done_cleanup_simple(struct ci_tx_queue *txq,
 		if (txq->nb_tx_desc - txq->nb_tx_free < txq->tx_rs_thresh)
 			break;
 
-		n = i40e_tx_free_bufs(txq);
+		n = ci_tx_free_bufs(txq);
 
 		if (n == 0)
 			break;
diff --git a/drivers/net/intel/ice/ice_rxtx.c b/drivers/net/intel/ice/ice_rxtx.c
index 26b4c73eb6..c1477f3e87 100644
--- a/drivers/net/intel/ice/ice_rxtx.c
+++ b/drivers/net/intel/ice/ice_rxtx.c
@@ -3129,47 +3129,6 @@ ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 	return ci_xmit_pkts(txq, tx_pkts, nb_pkts, CI_VLAN_IN_L2TAG1, get_context_desc, NULL, NULL);
 }
 
-static __rte_always_inline int
-ice_tx_free_bufs(struct ci_tx_queue *txq)
-{
-	struct ci_tx_entry *txep;
-	uint16_t i;
-
-	if ((txq->ci_tx_ring[txq->tx_next_dd].cmd_type_offset_bsz &
-	     rte_cpu_to_le_64(CI_TXD_QW1_DTYPE_M)) !=
-	    rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE))
-		return 0;
-
-	txep = &txq->sw_ring[txq->tx_next_dd - (txq->tx_rs_thresh - 1)];
-
-	struct rte_mempool *fast_free_mp =
-			likely(txq->fast_free_mp != (void *)UINTPTR_MAX) ?
-			txq->fast_free_mp :
-			(txq->fast_free_mp = txep[0].mbuf->pool);
-
-	if (fast_free_mp != NULL) {
-		for (i = 0; i < txq->tx_rs_thresh; ++i, ++txep) {
-			rte_mempool_put(fast_free_mp, txep->mbuf);
-			txep->mbuf = NULL;
-		}
-	} else {
-		for (i = 0; i < txq->tx_rs_thresh; i++)
-			rte_prefetch0((txep + i)->mbuf);
-
-		for (i = 0; i < txq->tx_rs_thresh; ++i, ++txep) {
-			rte_pktmbuf_free_seg(txep->mbuf);
-			txep->mbuf = NULL;
-		}
-	}
-
-	txq->nb_tx_free = (uint16_t)(txq->nb_tx_free + txq->tx_rs_thresh);
-	txq->tx_next_dd = (uint16_t)(txq->tx_next_dd + txq->tx_rs_thresh);
-	if (txq->tx_next_dd >= txq->nb_tx_desc)
-		txq->tx_next_dd = (uint16_t)(txq->tx_rs_thresh - 1);
-
-	return txq->tx_rs_thresh;
-}
-
 static int
 ice_tx_done_cleanup_full(struct ci_tx_queue *txq,
 			uint32_t free_cnt)
@@ -3259,7 +3218,7 @@ ice_tx_done_cleanup_simple(struct ci_tx_queue *txq,
 		if (txq->nb_tx_desc - txq->nb_tx_free < txq->tx_rs_thresh)
 			break;
 
-		n = ice_tx_free_bufs(txq);
+		n = ci_tx_free_bufs(txq);
 
 		if (n == 0)
 			break;
@@ -3300,7 +3259,7 @@ tx_xmit_pkts(struct ci_tx_queue *txq,
 	 * descriptor, free the associated buffer.
 	 */
 	if (txq->nb_tx_free < txq->tx_free_thresh)
-		ice_tx_free_bufs(txq);
+		ci_tx_free_bufs(txq);
 
 	/* Use available descriptor only */
 	nb_pkts = (uint16_t)RTE_MIN(txq->nb_tx_free, nb_pkts);
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [PATCH v3 31/36] net/intel: complete merging simple Tx paths
  2026-01-30 11:41 ` [PATCH v3 " Bruce Richardson
                     ` (29 preceding siblings ...)
  2026-01-30 11:41   ` [PATCH v3 30/36] net/intel: consolidate ice and i40e buffer free function Bruce Richardson
@ 2026-01-30 11:41   ` Bruce Richardson
  2026-01-30 11:41   ` [PATCH v3 32/36] net/intel: use non-volatile stores in simple Tx function Bruce Richardson
                     ` (4 subsequent siblings)
  35 siblings, 0 replies; 274+ messages in thread
From: Bruce Richardson @ 2026-01-30 11:41 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson, Anatoly Burakov

Complete the deduplication of the ice and i40e simple scalar Tx paths
by moving the remaining duplicated transmit functions into the common
header.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/net/intel/common/tx_scalar_fns.h | 87 ++++++++++++++++++++++++
 drivers/net/intel/i40e/i40e_rxtx.c       | 74 +-------------------
 drivers/net/intel/ice/ice_rxtx.c         | 74 +-------------------
 3 files changed, 89 insertions(+), 146 deletions(-)

diff --git a/drivers/net/intel/common/tx_scalar_fns.h b/drivers/net/intel/common/tx_scalar_fns.h
index 185fcdfa72..5c4f09c197 100644
--- a/drivers/net/intel/common/tx_scalar_fns.h
+++ b/drivers/net/intel/common/tx_scalar_fns.h
@@ -130,6 +130,93 @@ ci_tx_free_bufs(struct ci_tx_queue *txq)
 	return rs_thresh;
 }
 
+/* Burst transmit function for the simple descriptor-based Tx path
+ *
+ * Transmits a burst of packets by filling hardware descriptors with mbuf
+ * data. Handles ring wrap-around and RS bit management. Performs descriptor
+ * cleanup when tx_free_thresh is reached.
+ *
+ * Returns: number of packets transmitted
+ */
+static inline uint16_t
+ci_xmit_burst_simple(struct ci_tx_queue *txq,
+		     struct rte_mbuf **tx_pkts,
+		     uint16_t nb_pkts)
+{
+	volatile struct ci_tx_desc *txr = txq->ci_tx_ring;
+	uint16_t n = 0;
+
+	/**
+	 * Begin scanning the H/W ring for done descriptors when the number
+	 * of available descriptors drops below tx_free_thresh. For each done
+	 * descriptor, free the associated buffer.
+	 */
+	if (txq->nb_tx_free < txq->tx_free_thresh)
+		ci_tx_free_bufs(txq);
+
+	/* Use available descriptor only */
+	nb_pkts = (uint16_t)RTE_MIN(txq->nb_tx_free, nb_pkts);
+	if (unlikely(!nb_pkts))
+		return 0;
+
+	txq->nb_tx_free = (uint16_t)(txq->nb_tx_free - nb_pkts);
+	if ((txq->tx_tail + nb_pkts) > txq->nb_tx_desc) {
+		n = (uint16_t)(txq->nb_tx_desc - txq->tx_tail);
+		ci_tx_fill_hw_ring(txq, tx_pkts, n);
+		txr[txq->tx_next_rs].cmd_type_offset_bsz |=
+			rte_cpu_to_le_64(((uint64_t)CI_TX_DESC_CMD_RS) <<
+					  CI_TXD_QW1_CMD_S);
+		txq->tx_next_rs = (uint16_t)(txq->tx_rs_thresh - 1);
+		txq->tx_tail = 0;
+	}
+
+	/* Fill hardware descriptor ring with mbuf data */
+	ci_tx_fill_hw_ring(txq, tx_pkts + n, (uint16_t)(nb_pkts - n));
+	txq->tx_tail = (uint16_t)(txq->tx_tail + (nb_pkts - n));
+
+	/* Determine if RS bit needs to be set */
+	if (txq->tx_tail > txq->tx_next_rs) {
+		txr[txq->tx_next_rs].cmd_type_offset_bsz |=
+			rte_cpu_to_le_64(((uint64_t)CI_TX_DESC_CMD_RS) <<
+					  CI_TXD_QW1_CMD_S);
+		txq->tx_next_rs =
+			(uint16_t)(txq->tx_next_rs + txq->tx_rs_thresh);
+		if (txq->tx_next_rs >= txq->nb_tx_desc)
+			txq->tx_next_rs = (uint16_t)(txq->tx_rs_thresh - 1);
+	}
+
+	if (txq->tx_tail >= txq->nb_tx_desc)
+		txq->tx_tail = 0;
+
+	/* Update the tx tail register */
+	rte_write32_wc((uint32_t)txq->tx_tail, txq->qtx_tail);
+
+	return nb_pkts;
+}
+
+static __rte_always_inline uint16_t
+ci_xmit_pkts_simple(struct ci_tx_queue *txq,
+		     struct rte_mbuf **tx_pkts,
+		     uint16_t nb_pkts)
+{
+	uint16_t nb_tx = 0;
+
+	if (likely(nb_pkts <= CI_TX_MAX_BURST))
+		return ci_xmit_burst_simple(txq, tx_pkts, nb_pkts);
+
+	while (nb_pkts) {
+		uint16_t ret, num = RTE_MIN(nb_pkts, CI_TX_MAX_BURST);
+
+		ret = ci_xmit_burst_simple(txq, &tx_pkts[nb_tx], num);
+		nb_tx += ret;
+		nb_pkts -= ret;
+		if (ret < num)
+			break;
+	}
+
+	return nb_tx;
+}
+
 /*
  * Common transmit descriptor cleanup function for Intel drivers.
  * Used by ice, i40e, iavf, and idpf drivers.
diff --git a/drivers/net/intel/i40e/i40e_rxtx.c b/drivers/net/intel/i40e/i40e_rxtx.c
index 395808ff7c..b286e89b1b 100644
--- a/drivers/net/intel/i40e/i40e_rxtx.c
+++ b/drivers/net/intel/i40e/i40e_rxtx.c
@@ -1010,84 +1010,12 @@ i40e_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 			get_context_desc, NULL, NULL);
 }
 
-static inline uint16_t
-tx_xmit_pkts(struct ci_tx_queue *txq,
-	     struct rte_mbuf **tx_pkts,
-	     uint16_t nb_pkts)
-{
-	volatile struct ci_tx_desc *txr = txq->ci_tx_ring;
-	uint16_t n = 0;
-
-	/**
-	 * Begin scanning the H/W ring for done descriptors when the number
-	 * of available descriptors drops below tx_free_thresh. For each done
-	 * descriptor, free the associated buffer.
-	 */
-	if (txq->nb_tx_free < txq->tx_free_thresh)
-		ci_tx_free_bufs(txq);
-
-	/* Use available descriptor only */
-	nb_pkts = (uint16_t)RTE_MIN(txq->nb_tx_free, nb_pkts);
-	if (unlikely(!nb_pkts))
-		return 0;
-
-	txq->nb_tx_free = (uint16_t)(txq->nb_tx_free - nb_pkts);
-	if ((txq->tx_tail + nb_pkts) > txq->nb_tx_desc) {
-		n = (uint16_t)(txq->nb_tx_desc - txq->tx_tail);
-		ci_tx_fill_hw_ring(txq, tx_pkts, n);
-		txr[txq->tx_next_rs].cmd_type_offset_bsz |=
-			rte_cpu_to_le_64(((uint64_t)CI_TX_DESC_CMD_RS) << CI_TXD_QW1_CMD_S);
-		txq->tx_next_rs = (uint16_t)(txq->tx_rs_thresh - 1);
-		txq->tx_tail = 0;
-	}
-
-	/* Fill hardware descriptor ring with mbuf data */
-	ci_tx_fill_hw_ring(txq, tx_pkts + n, (uint16_t)(nb_pkts - n));
-	txq->tx_tail = (uint16_t)(txq->tx_tail + (nb_pkts - n));
-
-	/* Determine if RS bit needs to be set */
-	if (txq->tx_tail > txq->tx_next_rs) {
-		txr[txq->tx_next_rs].cmd_type_offset_bsz |=
-			rte_cpu_to_le_64(((uint64_t)CI_TX_DESC_CMD_RS) << CI_TXD_QW1_CMD_S);
-		txq->tx_next_rs =
-			(uint16_t)(txq->tx_next_rs + txq->tx_rs_thresh);
-		if (txq->tx_next_rs >= txq->nb_tx_desc)
-			txq->tx_next_rs = (uint16_t)(txq->tx_rs_thresh - 1);
-	}
-
-	if (txq->tx_tail >= txq->nb_tx_desc)
-		txq->tx_tail = 0;
-
-	/* Update the tx tail register */
-	I40E_PCI_REG_WC_WRITE(txq->qtx_tail, txq->tx_tail);
-
-	return nb_pkts;
-}
-
 static uint16_t
 i40e_xmit_pkts_simple(void *tx_queue,
 		      struct rte_mbuf **tx_pkts,
 		      uint16_t nb_pkts)
 {
-	uint16_t nb_tx = 0;
-
-	if (likely(nb_pkts <= CI_TX_MAX_BURST))
-		return tx_xmit_pkts((struct ci_tx_queue *)tx_queue,
-						tx_pkts, nb_pkts);
-
-	while (nb_pkts) {
-		uint16_t ret, num = (uint16_t)RTE_MIN(nb_pkts,
-						CI_TX_MAX_BURST);
-
-		ret = tx_xmit_pkts((struct ci_tx_queue *)tx_queue,
-						&tx_pkts[nb_tx], num);
-		nb_tx = (uint16_t)(nb_tx + ret);
-		nb_pkts = (uint16_t)(nb_pkts - ret);
-		if (ret < num)
-			break;
-	}
-
-	return nb_tx;
+	return ci_xmit_pkts_simple(tx_queue, tx_pkts, nb_pkts);
 }
 
 #ifndef RTE_ARCH_X86
diff --git a/drivers/net/intel/ice/ice_rxtx.c b/drivers/net/intel/ice/ice_rxtx.c
index c1477f3e87..eae57a08fc 100644
--- a/drivers/net/intel/ice/ice_rxtx.c
+++ b/drivers/net/intel/ice/ice_rxtx.c
@@ -3245,84 +3245,12 @@ ice_tx_done_cleanup(void *txq, uint32_t free_cnt)
 		return ice_tx_done_cleanup_full(q, free_cnt);
 }
 
-static inline uint16_t
-tx_xmit_pkts(struct ci_tx_queue *txq,
-	     struct rte_mbuf **tx_pkts,
-	     uint16_t nb_pkts)
-{
-	volatile struct ci_tx_desc *txr = txq->ci_tx_ring;
-	uint16_t n = 0;
-
-	/**
-	 * Begin scanning the H/W ring for done descriptors when the number
-	 * of available descriptors drops below tx_free_thresh. For each done
-	 * descriptor, free the associated buffer.
-	 */
-	if (txq->nb_tx_free < txq->tx_free_thresh)
-		ci_tx_free_bufs(txq);
-
-	/* Use available descriptor only */
-	nb_pkts = (uint16_t)RTE_MIN(txq->nb_tx_free, nb_pkts);
-	if (unlikely(!nb_pkts))
-		return 0;
-
-	txq->nb_tx_free = (uint16_t)(txq->nb_tx_free - nb_pkts);
-	if ((txq->tx_tail + nb_pkts) > txq->nb_tx_desc) {
-		n = (uint16_t)(txq->nb_tx_desc - txq->tx_tail);
-		ci_tx_fill_hw_ring(txq, tx_pkts, n);
-		txr[txq->tx_next_rs].cmd_type_offset_bsz |=
-			rte_cpu_to_le_64(((uint64_t)CI_TX_DESC_CMD_RS) << CI_TXD_QW1_CMD_S);
-		txq->tx_next_rs = (uint16_t)(txq->tx_rs_thresh - 1);
-		txq->tx_tail = 0;
-	}
-
-	/* Fill hardware descriptor ring with mbuf data */
-	ci_tx_fill_hw_ring(txq, tx_pkts + n, (uint16_t)(nb_pkts - n));
-	txq->tx_tail = (uint16_t)(txq->tx_tail + (nb_pkts - n));
-
-	/* Determine if RS bit needs to be set */
-	if (txq->tx_tail > txq->tx_next_rs) {
-		txr[txq->tx_next_rs].cmd_type_offset_bsz |=
-			rte_cpu_to_le_64(((uint64_t)CI_TX_DESC_CMD_RS) << CI_TXD_QW1_CMD_S);
-		txq->tx_next_rs =
-			(uint16_t)(txq->tx_next_rs + txq->tx_rs_thresh);
-		if (txq->tx_next_rs >= txq->nb_tx_desc)
-			txq->tx_next_rs = (uint16_t)(txq->tx_rs_thresh - 1);
-	}
-
-	if (txq->tx_tail >= txq->nb_tx_desc)
-		txq->tx_tail = 0;
-
-	/* Update the tx tail register */
-	ICE_PCI_REG_WC_WRITE(txq->qtx_tail, txq->tx_tail);
-
-	return nb_pkts;
-}
-
 static uint16_t
 ice_xmit_pkts_simple(void *tx_queue,
 		     struct rte_mbuf **tx_pkts,
 		     uint16_t nb_pkts)
 {
-	uint16_t nb_tx = 0;
-
-	if (likely(nb_pkts <= CI_TX_MAX_BURST))
-		return tx_xmit_pkts((struct ci_tx_queue *)tx_queue,
-				    tx_pkts, nb_pkts);
-
-	while (nb_pkts) {
-		uint16_t ret, num = (uint16_t)RTE_MIN(nb_pkts,
-						      CI_TX_MAX_BURST);
-
-		ret = tx_xmit_pkts((struct ci_tx_queue *)tx_queue,
-				   &tx_pkts[nb_tx], num);
-		nb_tx = (uint16_t)(nb_tx + ret);
-		nb_pkts = (uint16_t)(nb_pkts - ret);
-		if (ret < num)
-			break;
-	}
-
-	return nb_tx;
+	return ci_xmit_pkts_simple(tx_queue, tx_pkts, nb_pkts);
 }
 
 static const struct ci_rx_path_info ice_rx_path_infos[] = {
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [PATCH v3 32/36] net/intel: use non-volatile stores in simple Tx function
  2026-01-30 11:41 ` [PATCH v3 " Bruce Richardson
                     ` (30 preceding siblings ...)
  2026-01-30 11:41   ` [PATCH v3 31/36] net/intel: complete merging simple Tx paths Bruce Richardson
@ 2026-01-30 11:41   ` Bruce Richardson
  2026-01-30 11:42   ` [PATCH v3 33/36] net/intel: align scalar simple Tx path with vector logic Bruce Richardson
                     ` (3 subsequent siblings)
  35 siblings, 0 replies; 274+ messages in thread
From: Bruce Richardson @ 2026-01-30 11:41 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson

The simple Tx code path can be reworked to use non-volatile stores - as
is the case with the full-featured Tx path - by reusing the existing
write_txd function (which just needs to be moved up in the header file).
This gives a small performance boost.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/net/intel/common/tx_scalar_fns.h | 55 +++++++-----------------
 1 file changed, 16 insertions(+), 39 deletions(-)

diff --git a/drivers/net/intel/common/tx_scalar_fns.h b/drivers/net/intel/common/tx_scalar_fns.h
index 5c4f09c197..579206f7ab 100644
--- a/drivers/net/intel/common/tx_scalar_fns.h
+++ b/drivers/net/intel/common/tx_scalar_fns.h
@@ -12,35 +12,13 @@
 /* depends on common Tx definitions. */
 #include "tx.h"
 
-/* Populate 4 descriptors with data from 4 mbufs */
 static inline void
-ci_tx_fill_hw_ring_tx4(volatile struct ci_tx_desc *txdp, struct rte_mbuf **pkts)
+write_txd(volatile void *txd, uint64_t qw0, uint64_t qw1)
 {
-	uint64_t dma_addr;
-	uint32_t i;
-
-	for (i = 0; i < 4; i++, txdp++, pkts++) {
-		dma_addr = rte_mbuf_data_iova(*pkts);
-		txdp->buffer_addr = rte_cpu_to_le_64(dma_addr);
-		txdp->cmd_type_offset_bsz =
-			rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DATA |
-				((uint64_t)CI_TX_DESC_CMD_DEFAULT << CI_TXD_QW1_CMD_S) |
-				((uint64_t)(*pkts)->data_len << CI_TXD_QW1_TX_BUF_SZ_S));
-	}
-}
+	uint64_t *txd_qw = __rte_assume_aligned(RTE_CAST_PTR(void *, txd), 16);
 
-/* Populate 1 descriptor with data from 1 mbuf */
-static inline void
-ci_tx_fill_hw_ring_tx1(volatile struct ci_tx_desc *txdp, struct rte_mbuf **pkts)
-{
-	uint64_t dma_addr;
-
-	dma_addr = rte_mbuf_data_iova(*pkts);
-	txdp->buffer_addr = rte_cpu_to_le_64(dma_addr);
-	txdp->cmd_type_offset_bsz =
-		rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DATA |
-			((uint64_t)CI_TX_DESC_CMD_DEFAULT << CI_TXD_QW1_CMD_S) |
-			((uint64_t)(*pkts)->data_len << CI_TXD_QW1_TX_BUF_SZ_S));
+	txd_qw[0] = rte_cpu_to_le_64(qw0);
+	txd_qw[1] = rte_cpu_to_le_64(qw1);
 }
 
 /* Fill hardware descriptor ring with mbuf data */
@@ -60,14 +38,22 @@ ci_tx_fill_hw_ring(struct ci_tx_queue *txq, struct rte_mbuf **pkts,
 	for (i = 0; i < mainpart; i += N_PER_LOOP) {
 		for (j = 0; j < N_PER_LOOP; ++j)
 			(txep + i + j)->mbuf = *(pkts + i + j);
-		ci_tx_fill_hw_ring_tx4(txdp + i, pkts + i);
+		for (j = 0; j < N_PER_LOOP; ++j)
+			write_txd(txdp + i + j, rte_mbuf_data_iova(*(pkts + i + j)),
+				CI_TX_DESC_DTYPE_DATA |
+				((uint64_t)CI_TX_DESC_CMD_DEFAULT << CI_TXD_QW1_CMD_S) |
+				((uint64_t)(*(pkts + i + j))->data_len << CI_TXD_QW1_TX_BUF_SZ_S));
 	}
 
 	if (unlikely(leftover > 0)) {
 		for (i = 0; i < leftover; ++i) {
-			(txep + mainpart + i)->mbuf = *(pkts + mainpart + i);
-			ci_tx_fill_hw_ring_tx1(txdp + mainpart + i,
-					       pkts + mainpart + i);
+			uint16_t idx = mainpart + i;
+			(txep + idx)->mbuf = *(pkts + idx);
+			write_txd(txdp + idx, rte_mbuf_data_iova(*(pkts + idx)),
+				CI_TX_DESC_DTYPE_DATA |
+				((uint64_t)CI_TX_DESC_CMD_DEFAULT << CI_TXD_QW1_CMD_S) |
+				((uint64_t)(*(pkts + idx))->data_len << CI_TXD_QW1_TX_BUF_SZ_S));
+
 		}
 	}
 }
@@ -365,15 +351,6 @@ struct ci_timesstamp_queue_fns {
 	write_ts_tail_t write_ts_tail;
 };
 
-static inline void
-write_txd(volatile void *txd, uint64_t qw0, uint64_t qw1)
-{
-	uint64_t *txd_qw = __rte_assume_aligned(RTE_CAST_PTR(void *, txd), 16);
-
-	txd_qw[0] = rte_cpu_to_le_64(qw0);
-	txd_qw[1] = rte_cpu_to_le_64(qw1);
-}
-
 static inline uint16_t
 ci_xmit_pkts(struct ci_tx_queue *txq,
 	     struct rte_mbuf **tx_pkts,
-- 
2.51.0


* [PATCH v3 33/36] net/intel: align scalar simple Tx path with vector logic
  2026-01-30 11:41 ` [PATCH v3 " Bruce Richardson
                     ` (31 preceding siblings ...)
  2026-01-30 11:41   ` [PATCH v3 32/36] net/intel: use non-volatile stores in simple Tx function Bruce Richardson
@ 2026-01-30 11:42   ` Bruce Richardson
  2026-01-30 11:42   ` [PATCH v3 34/36] net/intel: use vector SW ring entry for simple path Bruce Richardson
                     ` (2 subsequent siblings)
  35 siblings, 0 replies; 274+ messages in thread
From: Bruce Richardson @ 2026-01-30 11:42 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson

The scalar simple Tx path has the same restrictions as the vector Tx
path, so we can use the same logic flow in both, to try to ensure we
get the best performance from the scalar path.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/net/intel/common/tx_scalar_fns.h | 54 +++++++++++++++---------
 1 file changed, 34 insertions(+), 20 deletions(-)

diff --git a/drivers/net/intel/common/tx_scalar_fns.h b/drivers/net/intel/common/tx_scalar_fns.h
index 579206f7ab..3f02fc00d6 100644
--- a/drivers/net/intel/common/tx_scalar_fns.h
+++ b/drivers/net/intel/common/tx_scalar_fns.h
@@ -21,13 +21,11 @@ write_txd(volatile void *txd, uint64_t qw0, uint64_t qw1)
 	txd_qw[1] = rte_cpu_to_le_64(qw1);
 }
 
-/* Fill hardware descriptor ring with mbuf data */
+/* Fill hardware descriptor ring with mbuf data (simple path) */
 static inline void
-ci_tx_fill_hw_ring(struct ci_tx_queue *txq, struct rte_mbuf **pkts,
-		   uint16_t nb_pkts)
+ci_tx_fill_hw_ring_simple(volatile struct ci_tx_desc *txdp, struct rte_mbuf **pkts,
+			  uint16_t nb_pkts)
 {
-	volatile struct ci_tx_desc *txdp = &txq->ci_tx_ring[txq->tx_tail];
-	struct ci_tx_entry *txep = &txq->sw_ring[txq->tx_tail];
 	const int N_PER_LOOP = 4;
 	const int N_PER_LOOP_MASK = N_PER_LOOP - 1;
 	int mainpart, leftover;
@@ -36,8 +34,6 @@ ci_tx_fill_hw_ring(struct ci_tx_queue *txq, struct rte_mbuf **pkts,
 	mainpart = nb_pkts & ((uint32_t)~N_PER_LOOP_MASK);
 	leftover = nb_pkts & ((uint32_t)N_PER_LOOP_MASK);
 	for (i = 0; i < mainpart; i += N_PER_LOOP) {
-		for (j = 0; j < N_PER_LOOP; ++j)
-			(txep + i + j)->mbuf = *(pkts + i + j);
 		for (j = 0; j < N_PER_LOOP; ++j)
 			write_txd(txdp + i + j, rte_mbuf_data_iova(*(pkts + i + j)),
 				CI_TX_DESC_DTYPE_DATA |
@@ -48,12 +44,10 @@ ci_tx_fill_hw_ring(struct ci_tx_queue *txq, struct rte_mbuf **pkts,
 	if (unlikely(leftover > 0)) {
 		for (i = 0; i < leftover; ++i) {
 			uint16_t idx = mainpart + i;
-			(txep + idx)->mbuf = *(pkts + idx);
 			write_txd(txdp + idx, rte_mbuf_data_iova(*(pkts + idx)),
 				CI_TX_DESC_DTYPE_DATA |
 				((uint64_t)CI_TX_DESC_CMD_DEFAULT << CI_TXD_QW1_CMD_S) |
 				((uint64_t)(*(pkts + idx))->data_len << CI_TXD_QW1_TX_BUF_SZ_S));
-
 		}
 	}
 }
@@ -130,6 +124,9 @@ ci_xmit_burst_simple(struct ci_tx_queue *txq,
 		     uint16_t nb_pkts)
 {
 	volatile struct ci_tx_desc *txr = txq->ci_tx_ring;
+	volatile struct ci_tx_desc *txdp;
+	struct ci_tx_entry *txep;
+	uint16_t tx_id;
 	uint16_t n = 0;
 
 	/**
@@ -145,23 +142,41 @@ ci_xmit_burst_simple(struct ci_tx_queue *txq,
 	if (unlikely(!nb_pkts))
 		return 0;
 
+	tx_id = txq->tx_tail;
+	txdp = &txr[tx_id];
+	txep = &txq->sw_ring[tx_id];
+
 	txq->nb_tx_free = (uint16_t)(txq->nb_tx_free - nb_pkts);
-	if ((txq->tx_tail + nb_pkts) > txq->nb_tx_desc) {
-		n = (uint16_t)(txq->nb_tx_desc - txq->tx_tail);
-		ci_tx_fill_hw_ring(txq, tx_pkts, n);
+
+	if ((tx_id + nb_pkts) > txq->nb_tx_desc) {
+		n = (uint16_t)(txq->nb_tx_desc - tx_id);
+
+		/* Store mbufs in backlog */
+		ci_tx_backlog_entry(txep, tx_pkts, n);
+
+		/* Write descriptors to HW ring */
+		ci_tx_fill_hw_ring_simple(txdp, tx_pkts, n);
+
 		txr[txq->tx_next_rs].cmd_type_offset_bsz |=
 			rte_cpu_to_le_64(((uint64_t)CI_TX_DESC_CMD_RS) <<
 					  CI_TXD_QW1_CMD_S);
 		txq->tx_next_rs = (uint16_t)(txq->tx_rs_thresh - 1);
-		txq->tx_tail = 0;
+
+		tx_id = 0;
+		txdp = &txr[tx_id];
+		txep = &txq->sw_ring[tx_id];
 	}
 
-	/* Fill hardware descriptor ring with mbuf data */
-	ci_tx_fill_hw_ring(txq, tx_pkts + n, (uint16_t)(nb_pkts - n));
-	txq->tx_tail = (uint16_t)(txq->tx_tail + (nb_pkts - n));
+	/* Store remaining mbufs in backlog */
+	ci_tx_backlog_entry(txep, tx_pkts + n, (uint16_t)(nb_pkts - n));
+
+	/* Write remaining descriptors to HW ring */
+	ci_tx_fill_hw_ring_simple(txdp, tx_pkts + n, (uint16_t)(nb_pkts - n));
+
+	tx_id = (uint16_t)(tx_id + (nb_pkts - n));
 
 	/* Determine if RS bit needs to be set */
-	if (txq->tx_tail > txq->tx_next_rs) {
+	if (tx_id > txq->tx_next_rs) {
 		txr[txq->tx_next_rs].cmd_type_offset_bsz |=
 			rte_cpu_to_le_64(((uint64_t)CI_TX_DESC_CMD_RS) <<
 					  CI_TXD_QW1_CMD_S);
@@ -171,11 +186,10 @@ ci_xmit_burst_simple(struct ci_tx_queue *txq,
 			txq->tx_next_rs = (uint16_t)(txq->tx_rs_thresh - 1);
 	}
 
-	if (txq->tx_tail >= txq->nb_tx_desc)
-		txq->tx_tail = 0;
+	txq->tx_tail = tx_id;
 
 	/* Update the tx tail register */
-	rte_write32_wc((uint32_t)txq->tx_tail, txq->qtx_tail);
+	rte_write32_wc((uint32_t)tx_id, txq->qtx_tail);
 
 	return nb_pkts;
 }
-- 
2.51.0


* [PATCH v3 34/36] net/intel: use vector SW ring entry for simple path
  2026-01-30 11:41 ` [PATCH v3 " Bruce Richardson
                     ` (32 preceding siblings ...)
  2026-01-30 11:42   ` [PATCH v3 33/36] net/intel: align scalar simple Tx path with vector logic Bruce Richardson
@ 2026-01-30 11:42   ` Bruce Richardson
  2026-01-30 11:42   ` [PATCH v3 35/36] net/intel: use vector mbuf cleanup from simple scalar path Bruce Richardson
  2026-01-30 11:42   ` [PATCH v3 36/36] net/idpf: enable simple Tx function Bruce Richardson
  35 siblings, 0 replies; 274+ messages in thread
From: Bruce Richardson @ 2026-01-30 11:42 UTC (permalink / raw)
  To: dev
  Cc: Bruce Richardson, Praveen Shetty, Vladimir Medvedkin,
	Anatoly Burakov, Jingjing Wu

The simple scalar Tx path does not need the full sw_entry structure
that the full-featured Tx path uses, so rename the "vector_tx" flag to
"use_vec_entry", since its sole purpose is to flag the use of the
smaller tx_entry_vec structure. Then set this flag for the simple Tx
path, giving us a performance boost.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/net/intel/common/tx.h                    |  6 ++++--
 drivers/net/intel/common/tx_scalar_fns.h         | 14 +++++++-------
 drivers/net/intel/cpfl/cpfl_rxtx.c               |  4 ++--
 drivers/net/intel/i40e/i40e_rxtx.c               |  2 +-
 drivers/net/intel/iavf/iavf_rxtx_vec_avx2.c      |  2 +-
 drivers/net/intel/ice/ice_rxtx.c                 |  2 +-
 drivers/net/intel/idpf/idpf_common_rxtx_avx512.c |  2 +-
 drivers/net/intel/ixgbe/ixgbe_rxtx_vec_common.c  |  2 +-
 8 files changed, 18 insertions(+), 16 deletions(-)

diff --git a/drivers/net/intel/common/tx.h b/drivers/net/intel/common/tx.h
index 3c388857a7..dc21a4c906 100644
--- a/drivers/net/intel/common/tx.h
+++ b/drivers/net/intel/common/tx.h
@@ -166,7 +166,7 @@ struct ci_tx_queue {
 	rte_iova_t tx_ring_dma;        /* TX ring DMA address */
 	bool tx_deferred_start; /* don't start this queue in dev start */
 	bool q_set;             /* indicate if tx queue has been configured */
-	bool vector_tx;         /* port is using vector TX */
+	bool use_vec_entry;     /* use sw_ring_vec (true for vector and simple paths) */
 	union {                  /* the VSI this queue belongs to */
 		struct i40e_vsi *i40e_vsi;
 		struct iavf_vsi *iavf_vsi;
@@ -354,7 +354,8 @@ ci_txq_release_all_mbufs(struct ci_tx_queue *txq, bool use_ctx)
 	if (unlikely(!txq || !txq->sw_ring))
 		return;
 
-	if (!txq->vector_tx) {
+	if (!txq->use_vec_entry) {
+		/* Regular scalar path uses sw_ring with ci_tx_entry */
 		for (uint16_t i = 0; i < txq->nb_tx_desc; i++) {
 			if (txq->sw_ring[i].mbuf != NULL) {
 				rte_pktmbuf_free_seg(txq->sw_ring[i].mbuf);
@@ -365,6 +366,7 @@ ci_txq_release_all_mbufs(struct ci_tx_queue *txq, bool use_ctx)
 	}
 
 	/**
+	 *  Vector and simple paths use sw_ring_vec (ci_tx_entry_vec).
 	 *  vPMD tx will not set sw_ring's mbuf to NULL after free,
 	 *  so determining buffers to free is a little more complex.
 	 */
diff --git a/drivers/net/intel/common/tx_scalar_fns.h b/drivers/net/intel/common/tx_scalar_fns.h
index 3f02fc00d6..c8d370a921 100644
--- a/drivers/net/intel/common/tx_scalar_fns.h
+++ b/drivers/net/intel/common/tx_scalar_fns.h
@@ -60,14 +60,14 @@ ci_tx_free_bufs(struct ci_tx_queue *txq)
 	const uint16_t k = RTE_ALIGN_FLOOR(rs_thresh, CI_TX_MAX_FREE_BUF_SZ);
 	const uint16_t m = rs_thresh % CI_TX_MAX_FREE_BUF_SZ;
 	struct rte_mbuf *free[CI_TX_MAX_FREE_BUF_SZ];
-	struct ci_tx_entry *txep;
+	struct ci_tx_entry_vec *txep;
 
 	if ((txq->ci_tx_ring[txq->tx_next_dd].cmd_type_offset_bsz &
 			rte_cpu_to_le_64(CI_TXD_QW1_DTYPE_M)) !=
 			rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE))
 		return 0;
 
-	txep = &txq->sw_ring[txq->tx_next_dd - (rs_thresh - 1)];
+	txep = &txq->sw_ring_vec[txq->tx_next_dd - (rs_thresh - 1)];
 
 	struct rte_mempool *fast_free_mp =
 			likely(txq->fast_free_mp != (void *)UINTPTR_MAX) ?
@@ -125,7 +125,7 @@ ci_xmit_burst_simple(struct ci_tx_queue *txq,
 {
 	volatile struct ci_tx_desc *txr = txq->ci_tx_ring;
 	volatile struct ci_tx_desc *txdp;
-	struct ci_tx_entry *txep;
+	struct ci_tx_entry_vec *txep;
 	uint16_t tx_id;
 	uint16_t n = 0;
 
@@ -144,7 +144,7 @@ ci_xmit_burst_simple(struct ci_tx_queue *txq,
 
 	tx_id = txq->tx_tail;
 	txdp = &txr[tx_id];
-	txep = &txq->sw_ring[tx_id];
+	txep = &txq->sw_ring_vec[tx_id];
 
 	txq->nb_tx_free = (uint16_t)(txq->nb_tx_free - nb_pkts);
 
@@ -152,7 +152,7 @@ ci_xmit_burst_simple(struct ci_tx_queue *txq,
 		n = (uint16_t)(txq->nb_tx_desc - tx_id);
 
 		/* Store mbufs in backlog */
-		ci_tx_backlog_entry(txep, tx_pkts, n);
+		ci_tx_backlog_entry_vec(txep, tx_pkts, n);
 
 		/* Write descriptors to HW ring */
 		ci_tx_fill_hw_ring_simple(txdp, tx_pkts, n);
@@ -164,11 +164,11 @@ ci_xmit_burst_simple(struct ci_tx_queue *txq,
 
 		tx_id = 0;
 		txdp = &txr[tx_id];
-		txep = &txq->sw_ring[tx_id];
+		txep = &txq->sw_ring_vec[tx_id];
 	}
 
 	/* Store remaining mbufs in backlog */
-	ci_tx_backlog_entry(txep, tx_pkts + n, (uint16_t)(nb_pkts - n));
+	ci_tx_backlog_entry_vec(txep, tx_pkts + n, (uint16_t)(nb_pkts - n));
 
 	/* Write remaining descriptors to HW ring */
 	ci_tx_fill_hw_ring_simple(txdp, tx_pkts + n, (uint16_t)(nb_pkts - n));
diff --git a/drivers/net/intel/cpfl/cpfl_rxtx.c b/drivers/net/intel/cpfl/cpfl_rxtx.c
index e7a98ed4f6..b5b9015310 100644
--- a/drivers/net/intel/cpfl/cpfl_rxtx.c
+++ b/drivers/net/intel/cpfl/cpfl_rxtx.c
@@ -329,7 +329,7 @@ cpfl_tx_queue_release(void *txq)
 		rte_free(q->complq);
 	}
 
-	ci_txq_release_all_mbufs(q, q->vector_tx);
+	ci_txq_release_all_mbufs(q, q->use_vec_entry);
 	rte_free(q->sw_ring);
 	rte_free(q->rs_last_id);
 	rte_memzone_free(q->mz);
@@ -1364,7 +1364,7 @@ cpfl_tx_queue_stop(struct rte_eth_dev *dev, uint16_t tx_queue_id)
 	}
 
 	txq = &cpfl_txq->base;
-	ci_txq_release_all_mbufs(txq, txq->vector_tx);
+	ci_txq_release_all_mbufs(txq, txq->use_vec_entry);
 	if (vport->txq_model == VIRTCHNL2_QUEUE_MODEL_SINGLE) {
 		idpf_qc_single_tx_queue_reset(txq);
 	} else {
diff --git a/drivers/net/intel/i40e/i40e_rxtx.c b/drivers/net/intel/i40e/i40e_rxtx.c
index b286e89b1b..ba63d42b85 100644
--- a/drivers/net/intel/i40e/i40e_rxtx.c
+++ b/drivers/net/intel/i40e/i40e_rxtx.c
@@ -1451,7 +1451,7 @@ i40e_dev_tx_queue_start(struct rte_eth_dev *dev, uint16_t tx_queue_id)
 		PMD_DRV_LOG(WARNING, "TX queue %u is deferred start",
 			    tx_queue_id);
 
-	txq->vector_tx = ad->tx_vec_allowed;
+	txq->use_vec_entry = ad->tx_vec_allowed || ad->tx_simple_allowed;
 
 	/*
 	 * tx_queue_id is queue id application refers to, while
diff --git a/drivers/net/intel/iavf/iavf_rxtx_vec_avx2.c b/drivers/net/intel/iavf/iavf_rxtx_vec_avx2.c
index cea4ee9863..374c713a94 100644
--- a/drivers/net/intel/iavf/iavf_rxtx_vec_avx2.c
+++ b/drivers/net/intel/iavf/iavf_rxtx_vec_avx2.c
@@ -1803,7 +1803,7 @@ iavf_xmit_pkts_vec_avx2_offload(void *tx_queue, struct rte_mbuf **tx_pkts,
 int __rte_cold
 iavf_txq_vec_setup(struct ci_tx_queue *txq)
 {
-	txq->vector_tx = true;
+	txq->use_vec_entry = true;
 	return 0;
 }
 
diff --git a/drivers/net/intel/ice/ice_rxtx.c b/drivers/net/intel/ice/ice_rxtx.c
index eae57a08fc..94951369fb 100644
--- a/drivers/net/intel/ice/ice_rxtx.c
+++ b/drivers/net/intel/ice/ice_rxtx.c
@@ -882,7 +882,7 @@ ice_tx_queue_start(struct rte_eth_dev *dev, uint16_t tx_queue_id)
 		}
 
 	/* record what kind of descriptor cleanup we need on teardown */
-	txq->vector_tx = ad->tx_vec_allowed;
+	txq->use_vec_entry = ad->tx_vec_allowed || ad->tx_simple_allowed;
 
 	if (txq->tsq != NULL && txq->tsq->ts_flag > 0) {
 		struct ice_aqc_set_txtime_qgrp *ts_elem;
diff --git a/drivers/net/intel/idpf/idpf_common_rxtx_avx512.c b/drivers/net/intel/idpf/idpf_common_rxtx_avx512.c
index 49ace35615..666ad1a4dd 100644
--- a/drivers/net/intel/idpf/idpf_common_rxtx_avx512.c
+++ b/drivers/net/intel/idpf/idpf_common_rxtx_avx512.c
@@ -1365,6 +1365,6 @@ idpf_qc_tx_vec_avx512_setup(struct ci_tx_queue *txq)
 	if (!txq)
 		return 0;
 
-	txq->vector_tx = true;
+	txq->use_vec_entry = true;
 	return 0;
 }
diff --git a/drivers/net/intel/ixgbe/ixgbe_rxtx_vec_common.c b/drivers/net/intel/ixgbe/ixgbe_rxtx_vec_common.c
index 63c7cb50d3..c42b8fc96b 100644
--- a/drivers/net/intel/ixgbe/ixgbe_rxtx_vec_common.c
+++ b/drivers/net/intel/ixgbe/ixgbe_rxtx_vec_common.c
@@ -111,7 +111,7 @@ ixgbe_txq_vec_setup(struct ci_tx_queue *txq)
 	/* leave the first one for overflow */
 	txq->sw_ring_vec = txq->sw_ring_vec + 1;
 	txq->ops = &vec_txq_ops;
-	txq->vector_tx = 1;
+	txq->use_vec_entry = true;
 
 	return 0;
 }
-- 
2.51.0


* [PATCH v3 35/36] net/intel: use vector mbuf cleanup from simple scalar path
  2026-01-30 11:41 ` [PATCH v3 " Bruce Richardson
                     ` (33 preceding siblings ...)
  2026-01-30 11:42   ` [PATCH v3 34/36] net/intel: use vector SW ring entry for simple path Bruce Richardson
@ 2026-01-30 11:42   ` Bruce Richardson
  2026-01-30 11:42   ` [PATCH v3 36/36] net/idpf: enable simple Tx function Bruce Richardson
  35 siblings, 0 replies; 274+ messages in thread
From: Bruce Richardson @ 2026-01-30 11:42 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson, Anatoly Burakov

Since the simple scalar path now uses the vector Tx entry struct, we can
leverage the vector mbuf cleanup function from that path and avoid
having a separate cleanup function for it.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/net/intel/common/tx_scalar_fns.h | 74 +++++-------------------
 drivers/net/intel/i40e/i40e_rxtx.c       |  2 +-
 drivers/net/intel/ice/ice_rxtx.c         |  2 +-
 3 files changed, 17 insertions(+), 61 deletions(-)

diff --git a/drivers/net/intel/common/tx_scalar_fns.h b/drivers/net/intel/common/tx_scalar_fns.h
index c8d370a921..27a07cf9e9 100644
--- a/drivers/net/intel/common/tx_scalar_fns.h
+++ b/drivers/net/intel/common/tx_scalar_fns.h
@@ -21,6 +21,20 @@ write_txd(volatile void *txd, uint64_t qw0, uint64_t qw1)
 	txd_qw[1] = rte_cpu_to_le_64(qw1);
 }
 
+static __rte_always_inline int
+ci_tx_desc_done_simple(struct ci_tx_queue *txq, uint16_t idx)
+{
+	return (txq->ci_tx_ring[idx].cmd_type_offset_bsz & rte_cpu_to_le_64(CI_TXD_QW1_DTYPE_M)) ==
+			rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE);
+}
+
+/* Free transmitted mbufs using vector-style cleanup */
+static __rte_always_inline int
+ci_tx_free_bufs_simple(struct ci_tx_queue *txq)
+{
+	return ci_tx_free_bufs_vec(txq, ci_tx_desc_done_simple, false);
+}
+
 /* Fill hardware descriptor ring with mbuf data (simple path) */
 static inline void
 ci_tx_fill_hw_ring_simple(volatile struct ci_tx_desc *txdp, struct rte_mbuf **pkts,
@@ -52,64 +66,6 @@ ci_tx_fill_hw_ring_simple(volatile struct ci_tx_desc *txdp, struct rte_mbuf **pk
 	}
 }
 
-/* Free transmitted mbufs from descriptor ring with bulk freeing for Tx simple path */
-static __rte_always_inline int
-ci_tx_free_bufs(struct ci_tx_queue *txq)
-{
-	const uint16_t rs_thresh = txq->tx_rs_thresh;
-	const uint16_t k = RTE_ALIGN_FLOOR(rs_thresh, CI_TX_MAX_FREE_BUF_SZ);
-	const uint16_t m = rs_thresh % CI_TX_MAX_FREE_BUF_SZ;
-	struct rte_mbuf *free[CI_TX_MAX_FREE_BUF_SZ];
-	struct ci_tx_entry_vec *txep;
-
-	if ((txq->ci_tx_ring[txq->tx_next_dd].cmd_type_offset_bsz &
-			rte_cpu_to_le_64(CI_TXD_QW1_DTYPE_M)) !=
-			rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE))
-		return 0;
-
-	txep = &txq->sw_ring_vec[txq->tx_next_dd - (rs_thresh - 1)];
-
-	struct rte_mempool *fast_free_mp =
-			likely(txq->fast_free_mp != (void *)UINTPTR_MAX) ?
-				txq->fast_free_mp :
-				(txq->fast_free_mp = txep[0].mbuf->pool);
-
-	if (fast_free_mp) {
-		if (k) {
-			for (uint16_t j = 0; j != k; j += CI_TX_MAX_FREE_BUF_SZ) {
-				for (uint16_t i = 0; i < CI_TX_MAX_FREE_BUF_SZ; ++i, ++txep) {
-					free[i] = txep->mbuf;
-					txep->mbuf = NULL;
-				}
-				rte_mbuf_raw_free_bulk(fast_free_mp, free, CI_TX_MAX_FREE_BUF_SZ);
-			}
-		}
-
-		if (m) {
-			for (uint16_t i = 0; i < m; ++i, ++txep) {
-				free[i] = txep->mbuf;
-				txep->mbuf = NULL;
-			}
-			rte_mbuf_raw_free_bulk(fast_free_mp, free, m);
-		}
-	} else {
-		for (uint16_t i = 0; i < rs_thresh; ++i, ++txep)
-			rte_prefetch0((txep + i)->mbuf);
-
-		for (uint16_t i = 0; i < rs_thresh; ++i, ++txep) {
-			rte_pktmbuf_free_seg(txep->mbuf);
-			txep->mbuf = NULL;
-		}
-	}
-
-	txq->nb_tx_free = (uint16_t)(txq->nb_tx_free + rs_thresh);
-	txq->tx_next_dd = (uint16_t)(txq->tx_next_dd + rs_thresh);
-	if (txq->tx_next_dd >= txq->nb_tx_desc)
-		txq->tx_next_dd = (uint16_t)(rs_thresh - 1);
-
-	return rs_thresh;
-}
-
 /* Simple burst transmit for descriptor-based simple Tx path
  *
  * Transmits a burst of packets by filling hardware descriptors with mbuf
@@ -135,7 +91,7 @@ ci_xmit_burst_simple(struct ci_tx_queue *txq,
 	 * descriptor, free the associated buffer.
 	 */
 	if (txq->nb_tx_free < txq->tx_free_thresh)
-		ci_tx_free_bufs(txq);
+		ci_tx_free_bufs_simple(txq);
 
 	/* Use available descriptor only */
 	nb_pkts = (uint16_t)RTE_MIN(txq->nb_tx_free, nb_pkts);
diff --git a/drivers/net/intel/i40e/i40e_rxtx.c b/drivers/net/intel/i40e/i40e_rxtx.c
index ba63d42b85..6ea6ffbb2f 100644
--- a/drivers/net/intel/i40e/i40e_rxtx.c
+++ b/drivers/net/intel/i40e/i40e_rxtx.c
@@ -2377,7 +2377,7 @@ i40e_tx_done_cleanup_simple(struct ci_tx_queue *txq,
 		if (txq->nb_tx_desc - txq->nb_tx_free < txq->tx_rs_thresh)
 			break;
 
-		n = ci_tx_free_bufs(txq);
+		n = ci_tx_free_bufs_simple(txq);
 
 		if (n == 0)
 			break;
diff --git a/drivers/net/intel/ice/ice_rxtx.c b/drivers/net/intel/ice/ice_rxtx.c
index 94951369fb..ece6ef6e2d 100644
--- a/drivers/net/intel/ice/ice_rxtx.c
+++ b/drivers/net/intel/ice/ice_rxtx.c
@@ -3218,7 +3218,7 @@ ice_tx_done_cleanup_simple(struct ci_tx_queue *txq,
 		if (txq->nb_tx_desc - txq->nb_tx_free < txq->tx_rs_thresh)
 			break;
 
-		n = ci_tx_free_bufs(txq);
+		n = ci_tx_free_bufs_simple(txq);
 
 		if (n == 0)
 			break;
-- 
2.51.0


* [PATCH v3 36/36] net/idpf: enable simple Tx function
  2026-01-30 11:41 ` [PATCH v3 " Bruce Richardson
                     ` (34 preceding siblings ...)
  2026-01-30 11:42   ` [PATCH v3 35/36] net/intel: use vector mbuf cleanup from simple scalar path Bruce Richardson
@ 2026-01-30 11:42   ` Bruce Richardson
  2026-01-30 17:56     ` [REVIEW] " Stephen Hemminger
  35 siblings, 1 reply; 274+ messages in thread
From: Bruce Richardson @ 2026-01-30 11:42 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson, Jingjing Wu, Praveen Shetty

The common "simple Tx" function - in some ways a scalar version of the
vector Tx functions - can be used by the idpf driver as well as i40e and
ice, so add support for it to the driver.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/net/intel/idpf/idpf_common_device.h |  2 ++
 drivers/net/intel/idpf/idpf_common_rxtx.c   | 19 +++++++++++++++
 drivers/net/intel/idpf/idpf_common_rxtx.h   |  3 +++
 drivers/net/intel/idpf/idpf_rxtx.c          | 26 ++++++++++++++++++++-
 4 files changed, 49 insertions(+), 1 deletion(-)

diff --git a/drivers/net/intel/idpf/idpf_common_device.h b/drivers/net/intel/idpf/idpf_common_device.h
index 31915a03d4..527aa9b3dc 100644
--- a/drivers/net/intel/idpf/idpf_common_device.h
+++ b/drivers/net/intel/idpf/idpf_common_device.h
@@ -78,6 +78,7 @@ enum idpf_rx_func_type {
 enum idpf_tx_func_type {
 	IDPF_TX_DEFAULT,
 	IDPF_TX_SINGLEQ,
+	IDPF_TX_SINGLEQ_SIMPLE,
 	IDPF_TX_SINGLEQ_AVX2,
 	IDPF_TX_AVX512,
 	IDPF_TX_SINGLEQ_AVX512,
@@ -100,6 +101,7 @@ struct idpf_adapter {
 
 	bool is_tx_singleq; /* true - single queue model, false - split queue model */
 	bool is_rx_singleq; /* true - single queue model, false - split queue model */
+	bool tx_simple_allowed; /* true if all queues support simple TX */
 
 	/* For timestamp */
 	uint64_t time_hw;
diff --git a/drivers/net/intel/idpf/idpf_common_rxtx.c b/drivers/net/intel/idpf/idpf_common_rxtx.c
index bd77113551..0da2506bf0 100644
--- a/drivers/net/intel/idpf/idpf_common_rxtx.c
+++ b/drivers/net/intel/idpf/idpf_common_rxtx.c
@@ -1347,6 +1347,15 @@ idpf_dp_singleq_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 			idpf_set_tso_ctx, NULL, NULL);
 }
 
+RTE_EXPORT_INTERNAL_SYMBOL(idpf_dp_singleq_xmit_pkts_simple)
+uint16_t
+idpf_dp_singleq_xmit_pkts_simple(void *tx_queue, struct rte_mbuf **tx_pkts,
+				   uint16_t nb_pkts)
+{
+	return ci_xmit_pkts_simple(tx_queue, tx_pkts, nb_pkts);
+}
+
+
 /* TX prep functions */
 RTE_EXPORT_INTERNAL_SYMBOL(idpf_dp_prep_pkts)
 uint16_t
@@ -1532,6 +1541,16 @@ const struct ci_tx_path_info idpf_tx_path_infos[] = {
 			.single_queue = true
 		}
 	},
+	[IDPF_TX_SINGLEQ_SIMPLE] = {
+		.pkt_burst = idpf_dp_singleq_xmit_pkts_simple,
+		.info = "Single Queue Scalar Simple",
+		.features = {
+			.tx_offloads = IDPF_TX_VECTOR_OFFLOADS,
+			.single_queue = true,
+			.simple_tx = true,
+		}
+	},
+
 #ifdef RTE_ARCH_X86
 	[IDPF_TX_SINGLEQ_AVX2] = {
 		.pkt_burst = idpf_dp_singleq_xmit_pkts_avx2,
diff --git a/drivers/net/intel/idpf/idpf_common_rxtx.h b/drivers/net/intel/idpf/idpf_common_rxtx.h
index fe7094d434..914cab0f25 100644
--- a/drivers/net/intel/idpf/idpf_common_rxtx.h
+++ b/drivers/net/intel/idpf/idpf_common_rxtx.h
@@ -221,6 +221,9 @@ __rte_internal
 uint16_t idpf_dp_singleq_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 				   uint16_t nb_pkts);
 __rte_internal
+uint16_t idpf_dp_singleq_xmit_pkts_simple(void *tx_queue, struct rte_mbuf **tx_pkts,
+				   uint16_t nb_pkts);
+__rte_internal
 uint16_t idpf_dp_prep_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 			   uint16_t nb_pkts);
 __rte_internal
diff --git a/drivers/net/intel/idpf/idpf_rxtx.c b/drivers/net/intel/idpf/idpf_rxtx.c
index 9420200f6d..f2e202d57d 100644
--- a/drivers/net/intel/idpf/idpf_rxtx.c
+++ b/drivers/net/intel/idpf/idpf_rxtx.c
@@ -509,6 +509,22 @@ idpf_tx_queue_setup(struct rte_eth_dev *dev, uint16_t queue_idx,
 	txq->q_set = true;
 	dev->data->tx_queues[queue_idx] = txq;
 
+	/* Set tx_simple_allowed flag based on queue configuration.
+	 * For queue 0: explicitly set the flag based on its configuration.
+	 * For other queues: only set to false if this queue cannot use simple_tx.
+	 */
+	if (vport->txq_model == VIRTCHNL2_QUEUE_MODEL_SPLIT)
+		goto out;
+
+	/* for first queue, default to true, disable later if any queue can't meet conditions */
+	if (queue_idx == 0)
+		adapter->tx_simple_allowed = true;
+
+	if ((txq->offloads != (txq->offloads & RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE)) ||
+			txq->tx_rs_thresh < IDPF_VPMD_TX_MAX_BURST)
+		adapter->tx_simple_allowed = false;
+
+out:
 	return 0;
 
 err_complq_setup:
@@ -651,6 +667,7 @@ int
 idpf_tx_queue_start(struct rte_eth_dev *dev, uint16_t tx_queue_id)
 {
 	struct idpf_vport *vport = dev->data->dev_private;
+	struct idpf_adapter *ad = vport->adapter;
 	struct ci_tx_queue *txq = dev->data->tx_queues[tx_queue_id];
 	int err = 0;
 
@@ -667,6 +684,12 @@ idpf_tx_queue_start(struct rte_eth_dev *dev, uint16_t tx_queue_id)
 		return err;
 	}
 
+	/* Record what kind of descriptor cleanup we need on teardown.
+	 * For single queue mode, vector or simple tx paths use vec entry format.
+	 */
+	if (vport->txq_model == VIRTCHNL2_QUEUE_MODEL_SINGLE)
+		txq->use_vec_entry = ad->tx_simple_allowed;
+
 	/* Ready to switch the queue on */
 	err = idpf_vc_queue_switch(vport, tx_queue_id, false, true,
 							VIRTCHNL2_QUEUE_TYPE_TX);
@@ -847,7 +870,8 @@ idpf_set_tx_function(struct rte_eth_dev *dev)
 	struct ci_tx_path_features req_features = {
 		.tx_offloads = dev->data->dev_conf.txmode.offloads,
 		.simd_width = RTE_VECT_SIMD_DISABLED,
-		.single_queue = (vport->txq_model == VIRTCHNL2_QUEUE_MODEL_SINGLE)
+		.single_queue = (vport->txq_model == VIRTCHNL2_QUEUE_MODEL_SINGLE),
+		.simple_tx = ad->tx_simple_allowed
 	};
 
 	/* The primary process selects the tx path for all processes. */
-- 
2.51.0


* [REVIEW] net/idpf: enable simple Tx function
  2026-01-30 11:42   ` [PATCH v3 36/36] net/idpf: enable simple Tx function Bruce Richardson
@ 2026-01-30 17:56     ` Stephen Hemminger
  0 siblings, 0 replies; 274+ messages in thread
From: Stephen Hemminger @ 2026-01-30 17:56 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

AI-generated review of bundle-1715-intel-scalar.mbox
Reviewed using Claude (claude-opus-4-5-20251101) on 2026-01-30

This is an automated review. Please verify all suggestions.

---



=============================================================== Patch 1/36 ===

## Patch Review: [PATCH v3 01/36] net/intel: create common Tx descriptor structure

### Overall Assessment
This patch creates a common Tx descriptor structure (`struct ci_tx_desc`) shared across i40e, iavf, ice, and idpf drivers. The change is well-structured and follows DPDK conventions.

---

### Errors

None identified.

---

### Warnings

1. **Missing release notes for internal API change**
   
   While this is an internal structure change (not public API), the patch introduces a new common structure (`struct ci_tx_desc`) that affects multiple drivers. Consider documenting this refactoring in the release notes under a "drivers/net/intel" section, especially since this is patch 1 of a 36-patch series that likely represents significant internal restructuring.

---

### Info

1. **Consistent naming convention**
   
   The `ci_` prefix (common intel) is used consistently throughout the change, matching the existing `ci_tx_queue` and `ci_tx_entry` patterns. This is good practice.

2. **Field naming**
   
   The new structure uses `buffer_addr` and `cmd_type_offset_bsz` which matches the ice/i40e naming. The idpf driver previously used `buf_addr` and `qw1` - the patch correctly updates all idpf references to use the new field names.

3. **Base code preservation**
   
   The commit message correctly notes that original struct definitions in base code are left in place. This is appropriate for vendor-shared base code.

4. **Line length in idpf_common_rxtx.c**
   
   Line 1494-1495 creates a long line after the field rename:
   ```c
   txd->cmd_type_offset_bsz = rte_cpu_to_le_64(IDPF_TX_DESC_DTYPE_DATA |
   ```
   This appears to remain within the 100-character limit but is close. The existing style is maintained which is acceptable.

---

### Summary

This is a clean refactoring patch that consolidates identical descriptor structures across Intel drivers. The implementation is correct and follows existing code patterns. The only suggestion is to consider adding release notes documentation for this driver restructuring work.

=== Patch 2/36 ===

## Review of DPDK Patch 2/36: "net/intel: use common Tx ring structure"

### Summary
This patch consolidates multiple per-driver TX ring pointers (`i40e_tx_ring`, `iavf_tx_ring`, `ice_tx_ring`, `idpf_tx_ring`) into a single `ci_tx_ring` pointer, since they all use the same `struct ci_tx_desc` type.

---

### Errors

None identified.

---

### Warnings

1. **DMA zone name change may affect ABI/debugging**
   
   The patch changes DMA zone names from driver-specific names to a generic name:
   ```c
   -	tz = rte_eth_dma_zone_reserve(dev, "i40e_tx_ring", queue_idx,
   +	tz = rte_eth_dma_zone_reserve(dev, "ci_tx_ring", queue_idx,
   ```
   This change affects i40e, iavf, and ice drivers. While not strictly an error, this could impact debugging tools or scripts that look for driver-specific zone names. Consider documenting this change in the commit message.

---

### Info

1. **Clean refactoring** - The consolidation from four separate union members to one is a sensible simplification given the common descriptor type introduced in the previous patch.

2. **Consistent naming** - All references are consistently updated across all affected files (23 files modified).

3. **Variable rename in ice_xmit_pkts** - The local variable rename from `ice_tx_ring` to `ci_tx_ring` (lines 3151, 3174) maintains consistency with the structure member name change.

---

### Overall Assessment

This is a straightforward mechanical refactoring patch. The changes are consistent across all affected drivers and vector implementations. The commit message adequately explains the rationale for the change.

=== Patch 3/36 ===

## Review of DPDK Patch 3/36: net/intel: create common post-Tx cleanup function

### Summary
This patch deduplicates the TX descriptor cleanup function across ice, iavf, idpf, and i40e drivers by moving it to a common header file.

---

### Errors

None found.

---

### Warnings

1. **Loss of debug logging in common function**
   
   The original driver-specific functions included debug logging when descriptors were not yet processed:
   ```c
   PMD_TX_LOG(DEBUG, "TX descriptor %4u is not done "
              "(port=%d queue=%d)", desc_to_clean_to,
              txq->port_id, txq->queue_id);
   ```
   
   The new common function silently returns -1 without any debug output. This could make debugging hardware issues more difficult. Consider adding a generic debug log mechanism or documenting why debug logging was intentionally removed.

2. **Comment accuracy regarding descriptor done value**
   
   Line 287 in tx.h states:
   ```c
   /* Check if descriptor is done - all drivers use 0xF as done value in bits 3:0 */
   ```
   
   The original drivers used different masks:
   - i40e: `I40E_TXD_QW1_DTYPE_MASK` / `I40E_TX_DESC_DTYPE_DESC_DONE`
   - iavf: `IAVF_TXD_QW1_DTYPE_MASK` / `IAVF_TX_DESC_DTYPE_DESC_DONE`
   - ice: `ICE_TX_DESC_DTYPE_DESC_DONE`
   - idpf: `IDPF_TXD_QW1_DTYPE_M` / `IDPF_TX_DESC_DTYPE_DESC_DONE`
   
   While the hardcoded `0xF` value may be correct for all these drivers, the comment should reference the actual descriptor type definitions or verify that `0xF` is indeed the universal done value across all supported hardware.
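The done-check in question can be sketched as below; macro names here are illustrative stand-ins for the per-driver definitions, which all place the descriptor type in bits 3:0 of qword 1 with 0xF meaning "done":

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative values: each driver names these differently, but all use
 * bits 3:0 of qword 1 as the descriptor type, with 0xF meaning "done". */
#define CI_TXD_QW1_DTYPE_M    0xFULL
#define CI_TX_DESC_DTYPE_DONE 0xFULL

/* Returns non-zero when hardware has marked the descriptor as processed. */
static int
ci_desc_done(uint64_t qw1)
{
	return (qw1 & CI_TXD_QW1_DTYPE_M) == CI_TX_DESC_DTYPE_DONE;
}
```

Verifying that every supported device really reports 0xF here, and saying so in the comment, would address the concern above.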

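If a generic debug hook is wanted for the lost logging noted in warning 1, one minimal sketch could look like the following; the macro names are hypothetical, and a real driver would route through `PMD_TX_LOG`/`rte_log` rather than `fprintf`:

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical compile-time debug hook (illustrative, not the DPDK API). */
#ifdef CI_TX_DEBUG
#define CI_TX_LOG(fmt, ...) fprintf(stderr, "ci_tx: " fmt "\n", ##__VA_ARGS__)
#else
#define CI_TX_LOG(fmt, ...) do { } while (0)
#endif

/* Stand-in for the common cleanup path: log (when enabled) and return -1
 * when the descriptor has not yet been processed by hardware. */
static int
ci_tx_cleanup_stub(unsigned int desc, unsigned int port, unsigned int queue,
		   int done)
{
	if (!done) {
		CI_TX_LOG("TX descriptor %u not done (port=%u queue=%u)",
			  desc, port, queue);
		return -1;
	}
	return 0;
}
```

With `CI_TX_DEBUG` undefined, the hook compiles away entirely, preserving the zero-overhead fast path.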
---

### Info

1. **Code deduplication is well-executed**
   
   The patch successfully removes ~160 lines of duplicated code across four drivers while maintaining identical functionality.

2. **Function placement in header file**
   
   Using `static __rte_always_inline` in the common header is appropriate for this performance-critical path, avoiding function call overhead.

3. **Consistent style**
   
   The new common function follows DPDK coding conventions with proper comment formatting and explicit comparisons.

=== Patch 4/36 ===

## Code Review for Patch 4/36: net/intel: consolidate definitions for Tx desc fields

### Overview
This patch consolidates TX descriptor field definitions across i40e, iavf, ice, and idpf drivers into a common `tx.h` header, eliminating duplicate definitions.

### Errors
None found.

### Warnings

1. **Missing Doxygen documentation for new public macros and union**
   - The new `CI_TXD_QW1_*`, `CI_TX_DESC_*`, and `CI_TX_CTX_DESC_*` macros in `drivers/net/intel/common/tx.h` lack documentation comments explaining their purpose and valid values.
   - The `union ci_tx_offload` has brief field comments but could benefit from a more complete description of usage context.

2. **Inconsistent use of `IAVF_TX_DESC_CMD_IL2TAG1` vs `CI_TX_DESC_CMD_IL2TAG1`**
   - In `drivers/net/intel/iavf/iavf_rxtx_vec_common.h` line 197-198, the code still uses `IAVF_TX_DESC_CMD_IL2TAG1` while other similar usages in the same file were converted to `CI_TX_DESC_CMD_IL2TAG1`.

   ```c
   /* Line 197-198 still uses old define */
   td_cmd |= IAVF_TX_DESC_CMD_IL2TAG1;
   ```

3. **Inconsistent macro usage in `iavf_rxtx_vec_avx512.c`**
   - Lines 2127, 2130, 2139, 2147, 2150 mix `CI_TXD_QW1_CMD_S` with `IAVF_TXD_CTX_QW1_CMD_SHIFT` in the same expressions. While this may be intentional (context descriptor vs data descriptor), the inconsistency could be confusing.

4. **Unused old macros remain in headers**
   - The commit message states "Original definitions are in base code, and are left in place because of that, but are unused." Consider adding a comment in the code explaining this to prevent future cleanup attempts that could break base code compatibility.

### Info

1. **Code style improvement**
   - The patch improves code formatting in several places by reducing line length and improving readability of multi-line expressions (e.g., in `ice_rxtx_vec_avx512.c` and `iavf_rxtx_vec_avx512.c`).

2. **Comment update in ice_rxtx_vec_common.h**
   - Line 166 changes comment from "Tx VLAN insertion Offload" to "Tx VLAN/QINQ insertion Offload" - this is a documentation improvement that accurately reflects the code's behavior.

3. **Naming convention**
   - The `CI_` prefix is consistent with the common Intel driver namespace established in earlier patches of this series.

4. **Macro suffix conventions**
   - The new macros use `_S` for shift values and `_M` for masks, which is consistent with existing DPDK Intel driver conventions.

### Summary
The patch successfully consolidates duplicate TX descriptor definitions into a common location. The main concern is the incomplete conversion in `iavf_rxtx_vec_common.h` where `IAVF_TX_DESC_CMD_IL2TAG1` is still used instead of the new `CI_TX_DESC_CMD_IL2TAG1`. This should be verified as intentional or corrected for consistency.

=== Patch 5/36 ===

## Review of Patch 5/36: net/intel: create separate header for Tx scalar fns

### Errors

1. **Copyright year**: The new file `tx_scalar_fns.h` carries a 2025 copyright. If the file is merged in 2026, the year should be updated to 2026; for a series posted in December 2025 the existing year is otherwise defensible, so this should be confirmed against the merge date.

### Warnings

1. **Circular include dependency**: The new file `tx_scalar_fns.h` includes `tx.h` at line 13, but `tx.h` includes `tx_scalar_fns.h` at line 443. While header guards prevent infinite recursion, this creates a confusing circular dependency. Consider restructuring so that `tx_scalar_fns.h` is truly standalone or moving shared definitions to a separate base header.

2. **Comment style inconsistency**: In `tx.h` line 440-442, the multi-line comment uses lowercase start after `/*`:
   ```c
   /* include the scalar functions at the end, so they can use the common definitions.
    * This is done so drivers can use all functions just by including tx.h
    */
   ```
   Per DPDK style, explanatory multi-line comments should start with a capital letter and read as complete sentences.
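For reference, the circular include flagged in warning 1 stays finite only because of the guards; the interplay looks roughly like this (file contents sketched, not verbatim):

```c
/* tx.h (sketch) */
#ifndef _COMMON_INTEL_TX_H_
#define _COMMON_INTEL_TX_H_
/* ... common Tx definitions ... */
#include "tx_scalar_fns.h"	/* included last, per the patch */
#endif

/* tx_scalar_fns.h (sketch) */
#ifndef _COMMON_INTEL_TX_SCALAR_FNS_H_
#define _COMMON_INTEL_TX_SCALAR_FNS_H_
#include "tx.h"	/* re-entry: guard already defined, expands to nothing */
/* ... scalar Tx functions ... */
#endif
```

The re-inclusion is harmless but misleading to readers, which is why a shared base header would be cleaner.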

### Info

1. The patch cleanly moves code without modification, which is good for reviewability. The `ci_tx_xmit_cleanup` function appears identical in both the removed and added sections.

2. The approach of including the scalar functions header at the end of `tx.h` is a reasonable way to maintain a single include point for drivers while organizing the code into separate files.

### Summary

The patch is a straightforward code reorganization moving the `ci_tx_xmit_cleanup` function to a new header file. The main item to verify is the copyright year on the new file. The circular include pattern, while functional, could be improved architecturally but is not a blocking issue.

=== Patch 6/36 ===

## Code Review: Patch 6/36 - net/intel: add common fn to calculate needed descriptors

### Overall Assessment
This patch consolidates duplicate descriptor calculation logic into common code and fixes a bug in the idpf driver. The implementation is clean and well-structured.

---

### Errors
None.

---

### Warnings

1. **Extra blank line at end of function block**
   
   In `drivers/net/intel/common/tx_scalar_fns.h`, there are two blank lines before `#endif`:
   ```c
   	return count;
   }
   
   
   #endif /* _COMMON_INTEL_TX_SCALAR_FNS_H_ */
   ```
   
   **Suggested fix**: Remove one blank line to have just a single blank line before the `#endif`.

2. **Function parameter could use const qualifier**
   
   The original driver functions took a non-const `struct rte_mbuf *tx_pkt`, while the new common function correctly uses `const struct rte_mbuf *tx_pkt`. Callers passing non-const pointers to a const parameter is valid C, so this is an improvement in const-correctness rather than a problem.

3. **Missing release notes for bug fix**
   
   The commit message mentions "fix idpf driver calculation for the TSO case" - this is a bug fix that affects the idpf driver's TSO functionality. Bug fixes that affect driver behavior should typically have release notes.
   
   **Suggested fix**: Consider adding a note to the release notes about the idpf TSO descriptor calculation fix.

---

### Info

1. **Good consolidation of duplicate code**
   
   The patch removes identical `i40e_calc_pkt_desc`, `iavf_calc_pkt_desc`, and `ice_calc_pkt_desc` functions and replaces them with a single `ci_calc_pkt_desc` function. This is good code hygiene.

2. **Useful helper function added**
   
   The `ci_div_roundup16` function is a reasonable addition for 16-bit division with rounding up, avoiding potential issues with the generic `DIV_ROUND_UP` macro.

3. **Bug fix for idpf is a good catch**
   
   The idpf driver was previously not accounting for TSO packets potentially needing more descriptors than segments. The fix aligns it with the other Intel drivers.
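A hedged sketch of the `ci_div_roundup16` helper mentioned above (the exact DPDK signature may differ):

```c
#include <assert.h>
#include <stdint.h>

/* Round-up division on 16-bit operands. Keeping the types narrow avoids
 * the wider intermediates of a generic DIV_ROUND_UP macro; the sum cannot
 * overflow because both operands promote to int before the addition. */
static inline uint16_t
ci_div_roundup16(uint16_t a, uint16_t b)
{
	return (uint16_t)((a + b - 1) / b);
}
```

In the descriptor-count context this computes, for example, how many data descriptors a segment of a given length needs at a fixed per-descriptor maximum.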

---

### Summary

The patch is well-constructed and achieves its goal of consolidating common code while fixing a bug. The only items to address are the extra blank line in the header file and potentially adding release notes for the idpf TSO fix.

=== Patch 7/36 ===

## Review of Patch 7/36: net/ice: refactor context descriptor handling

### Overall Assessment
This patch refactors the context descriptor handling in the ice driver by consolidating the logic into a single `get_context_desc()` function. The refactoring is generally well done, but there are some issues to address.

---

### Errors

**1. Extra blank line in function**
Line 3092-3093 has two consecutive blank lines inside `get_context_desc()`, which is inconsistent with DPDK style.

```c
			((ptp_tx_index << ICE_TXD_CTX_QW1_TSYN_S) & ICE_TXD_CTX_QW1_TSYN_M);


	/* TX context descriptor based double VLAN insert */
```

**Suggested fix:** Remove one of the blank lines.

---

### Warnings

**1. Blank line removal at start of function changes code unnecessarily**
At line 2969, there's now a blank line at the start of `ice_txd_enable_checksum()` after the code removal. While not strictly wrong, a function should not start with a blank line before any statements.

```c
{
-	/* Set MACLEN */
-	if (!(ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK))
-		*td_offset |= (tx_offload.l2_len >> 1)
-			<< CI_TX_DESC_LEN_MACLEN_S;

	/* Enable L3 checksum offloads */
```

**Suggested fix:** Remove the extra blank line that remains after deleting the MACLEN code block.

**2. Function parameter alignment inconsistency**
In `get_context_desc()` definition (line 3082-3084), the continuation lines use single tab indentation rather than aligning with the opening parenthesis or using double indentation for continuation:

```c
static __rte_always_inline uint16_t
get_context_desc(uint64_t ol_flags, const struct rte_mbuf *tx_pkt,
	const union ci_tx_offload *tx_offload, const struct ci_tx_queue *txq,
	uint64_t *qw0, uint64_t *qw1)
```

**Suggested fix:** Use double-tab indentation for continuation lines in function parameters for consistency.

**3. Variable initialization**
Line 3085 initializes `cd_l2tag2` to 0, but it's only written when `RTE_MBUF_F_TX_QINQ` is set. The early return at line 3091 means this initialization is necessary for correctness, so this is acceptable. However, `cd_type_cmd_tso_mss` could be left uninitialized until after the early return check for better compiler analysis.

---

### Info

**1. Casting approach**
Line 3223 uses `RTE_CAST_PTR` macro:
```c
uint64_t *ctx_txd = RTE_CAST_PTR(uint64_t *, &ci_tx_ring[tx_id]);
```
This is cleaner than the previous volatile struct pointer cast. The change from using struct field names (`tunneling_params`, `l2tag2`, `qw1`) to raw `uint64_t` array access (`ctx_txd[0]`, `ctx_txd[1]`) is less self-documenting but acceptable for performance-critical code.

**2. Function return type**
The function returns `uint16_t` but only ever returns 0 or 1. A `bool` might be more semantically appropriate, though `uint16_t` matches how `nb_ctx` is used, so this is acceptable.

---

### Summary

The refactoring consolidates context descriptor logic well and should improve maintainability. The main issues are:
- Double blank line inside `get_context_desc()`
- Residual blank line in `ice_txd_enable_checksum()`

=== Patch 8/36 ===

## Review of DPDK Patch 8/36: net/i40e: refactor context descriptor handling

### Errors

1. **Commit message body starts with lowercase**
   - The body begins with "move all context descriptor handling..." which should be capitalized: "Move all context descriptor handling..."

### Warnings

1. **Subject line capitalization**
   - Subject uses lowercase "refactor" which is acceptable, but the body starting with lowercase "move" is inconsistent with standard English sentence structure.

2. **Explicit zero comparison**
   - Line 1040: `if (i40e_calc_context_desc(ol_flags) == 0)` is correct as written; since the function returns `uint16_t`, the explicit comparison with 0 is appropriate and no change is needed.

3. **Declaration in middle of block**
   - Lines 1125-1127: Variables `cd_qw0` and `cd_qw1` are declared in the middle of the function body, after statements. While C99 allows this, it's mixed with the existing declaration-at-top style in this function. Consider moving these declarations to the top of the `for` loop block or function for consistency.

   ```c
   /* Calculate the number of context descriptors needed. */
   uint64_t cd_qw0 = 0, cd_qw1 = 0;
   nb_ctx = get_context_desc(ol_flags, tx_pkt, &tx_offload, txq,
           &cd_qw0, &cd_qw1);
   ```

4. **Unused parameter annotation style**
   - Line 1034: The `txq` parameter is marked `__rte_unused`. This is acceptable, but consider whether this parameter will be needed in future development given this is a refactoring patch meant to align with the ice driver.

### Info

1. **Code consolidation is well-structured**
   - The new `get_context_desc()` function provides a clean abstraction matching the ice driver pattern as mentioned in the commit message.

2. **MACLEN handling moved appropriately**
   - The MACLEN calculation was removed from `i40e_txd_enable_checksum()` and moved to the main transmit loop where `td_offset` is initialized. This correctly handles both tunneled and non-tunneled cases in a single location.

3. **RTE_CAST_PTR usage**
   - Line 1180: `RTE_CAST_PTR(uint64_t *, &txr[tx_id])` - Good use of the DPDK cast macro.

4. **Debug logging simplified**
   - The PMD_TX_LOG message was simplified from printing individual context descriptor fields to printing the raw qw0/qw1 values. This is a reasonable simplification for debugging purposes.

### Summary

The patch is a reasonable refactoring that consolidates context descriptor handling into a single function. The main issue is the lowercase start of the commit message body. The code changes are well-structured and follow the existing patterns in the codebase.

=== Patch 9/36 ===

## Review of DPDK Patch: net/idpf: refactor context descriptor handling

### Errors

1. **Commit body starts with lowercase**
   - The body begins: "move all context descriptor handling to a single function, as with the ice driver."
   - As the first sentence of the commit body, it should be capitalized: "Move all context descriptor handling..."

2. **Body line exceeds 75 characters**
   - Line: "move all context descriptor handling to a single function, as with the ice driver."
   - This line is 82 characters and should be wrapped at 75.

### Warnings

1. **Variable declaration in the middle of a block**
   - At lines where `uint64_t cd_qw0, cd_qw1;` is declared after executable statements:
   ```c
   tx_offload.tso_segsz = tx_pkt->tso_segsz;
   /* Calculate the number of context descriptors needed. */
   uint64_t cd_qw0, cd_qw1;
   nb_ctx = idpf_set_tso_ctx(ol_flags, tx_pkt, tx_offload, &cd_qw0, &cd_qw1);
   ```
   While C99 allows mixed declarations, DPDK style prefers declarations at the start of blocks or in inner blocks. Consider moving the declaration to the beginning of the loop or creating an inner block.

2. **Potential uninitialized variable usage**
   - `cd_qw0` and `cd_qw1` are used when `nb_ctx != 0`, but if `idpf_set_tso_ctx()` returns 0 due to the early return at line 857 (`if (tx_offload.l4_len == 0)`), the function returns 0 without setting `*qw0` and `*qw1`. However, since `nb_ctx` would be 0 in this case, the values wouldn't be used. This is correct but could be clearer - consider initializing the output parameters to 0 at the start of `idpf_set_tso_ctx()` for defensive programming.

3. **Inconsistent function naming**
   - The function was renamed from `idpf_set_splitq_tso_ctx` to `idpf_set_tso_ctx`, which is good as it's now used by both splitq and singleq paths. However, the comment on line 848 still only mentions "TSO context descriptor" - consider documenting that this handles the context descriptor for both queue types.
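The defensive-initialization suggestion in warning 2 can be sketched with a simplified stand-in for `idpf_set_tso_ctx()` (field names reduced to the essentials):

```c
#include <assert.h>
#include <stdint.h>

/* Toy version of the suggested pattern: zero the output parameters up
 * front so a 0 return can never leave them indeterminate at the caller. */
static uint16_t
set_tso_ctx(uint16_t l4_len, uint64_t *qw0, uint64_t *qw1)
{
	*qw0 = 0;
	*qw1 = 0;
	if (l4_len == 0)	/* early return now yields defined outputs */
		return 0;
	*qw0 = 0x10;		/* stand-ins for the real context fields */
	*qw1 = 0x20;
	return 1;
}
```

This costs two stores on the early-return path but removes any reliance on the caller checking `nb_ctx` before reading the outputs.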

### Info

1. **Good refactoring pattern**
   - The consolidation of context descriptor handling into a single function that returns quad-word values rather than writing directly to volatile memory is a clean approach that mirrors the ice driver pattern mentioned in the commit message.

2. **Code simplification**
   - The removal of the separate `if ((ol_flags & RTE_MBUF_F_TX_TCP_SEG) != 0)` checks in both transmit functions is a good cleanup, as the check is now internal to `idpf_set_tso_ctx()`.

### Summary

The patch performs a reasonable refactoring to consolidate context descriptor handling. The main issues are:
- Commit message body formatting (first word capitalization, line length)
- Variable declarations could be moved to block start for style consistency

=== Patch 10/36 ===

## Review of Patch 10/36: net/intel: consolidate checksum mask definition

### Overall Assessment
This patch consolidates duplicate checksum offload mask definitions across multiple Intel drivers into a common header. The change is clean and improves maintainability.

---

### Errors
None identified.

---

### Warnings

**1. Inconsistent flags in consolidated mask vs. original i40e/ice masks**

The new `CI_TX_CKSUM_OFFLOAD_MASK` includes `RTE_MBUF_F_TX_OUTER_UDP_CKSUM` which was NOT present in the original `I40E_TX_CKSUM_OFFLOAD_MASK` or `ICE_TX_CKSUM_OFFLOAD_MASK`:

Original i40e:
```c
#define I40E_TX_CKSUM_OFFLOAD_MASK (RTE_MBUF_F_TX_IP_CKSUM |	\
		RTE_MBUF_F_TX_L4_MASK |		\
		RTE_MBUF_F_TX_TCP_SEG |		\
		RTE_MBUF_F_TX_OUTER_IP_CKSUM)
```

Original ice:
```c
#define ICE_TX_CKSUM_OFFLOAD_MASK (RTE_MBUF_F_TX_IP_CKSUM |	\
		RTE_MBUF_F_TX_L4_MASK |		\
		RTE_MBUF_F_TX_TCP_SEG |		\
		RTE_MBUF_F_TX_UDP_SEG |		\
		RTE_MBUF_F_TX_OUTER_IP_CKSUM)
```

New common mask:
```c
#define CI_TX_CKSUM_OFFLOAD_MASK (RTE_MBUF_F_TX_IP_CKSUM |		\
				   RTE_MBUF_F_TX_L4_MASK |		\
				   RTE_MBUF_F_TX_TCP_SEG |		\
				   RTE_MBUF_F_TX_UDP_SEG |		\
				   RTE_MBUF_F_TX_OUTER_IP_CKSUM |	\
				   RTE_MBUF_F_TX_OUTER_UDP_CKSUM)  /* NEW for i40e/ice */
```

Also, i40e didn't have `RTE_MBUF_F_TX_UDP_SEG`. This expands the behavior for i40e and ice drivers. Please confirm this is intentional and won't cause issues with hardware that doesn't support these offloads.
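The behavioral difference can be made concrete with a toy model; the flag bit values below are illustrative, not the real `RTE_MBUF_F_TX_*` values:

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative bit positions only. */
#define F_IP_CKSUM        (1ULL << 0)
#define F_L4_MASK         (3ULL << 1)
#define F_TCP_SEG         (1ULL << 3)
#define F_UDP_SEG         (1ULL << 4)
#define F_OUTER_IP_CKSUM  (1ULL << 5)
#define F_OUTER_UDP_CKSUM (1ULL << 6)

/* Old i40e-style mask vs. the new common superset. */
#define OLD_I40E_MASK (F_IP_CKSUM | F_L4_MASK | F_TCP_SEG | F_OUTER_IP_CKSUM)
#define NEW_CI_MASK   (OLD_I40E_MASK | F_UDP_SEG | F_OUTER_UDP_CKSUM)

/* A packet takes the checksum-offload path if any masked flag is set. */
static int
needs_cksum_path(uint64_t ol_flags, uint64_t mask)
{
	return (ol_flags & mask) != 0;
}
```

A packet requesting only outer UDP checksum now enters the checksum path on i40e, where it previously did not; this is the change that needs confirming against hardware capabilities.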

**2. Alignment inconsistency in macro definition**

The continuation lines in `CI_TX_CKSUM_OFFLOAD_MASK` use inconsistent alignment with extra spaces:
```c
#define CI_TX_CKSUM_OFFLOAD_MASK (RTE_MBUF_F_TX_IP_CKSUM |		 \
				   RTE_MBUF_F_TX_L4_MASK |		 \
```

Should align using tabs consistently per DPDK style, matching the pattern used in the removed driver-specific macros.

---

### Info

**1. Commit message could mention behavioral change**

Since the consolidated mask is a superset of some driver-specific masks, the commit message could note that this extends the checksum offload flags checked for i40e and ice drivers (adding `RTE_MBUF_F_TX_OUTER_UDP_CKSUM` to both, and `RTE_MBUF_F_TX_UDP_SEG` to i40e).

---

### Summary
The consolidation approach is good for maintainability. The main concern is whether the expanded flag mask is intentional for i40e and ice drivers, as this changes behavior by checking additional offload flags. If intentional, a brief mention in the commit message would be helpful.

=== Patch 11/36 ===

## Review of Patch 11/36: net/intel: create common checksum Tx offload function

### Errors

None identified.

### Warnings

1. **Minor alignment inconsistency in function call (i40e_rxtx.c)**
   
   The replacement call has inconsistent alignment:
   ```c
   ci_txd_enable_checksum(ol_flags, &td_cmd,
   					 &td_offset, tx_offload);
   ```
   The continuation line uses tabs that don't align with the opening parenthesis. This should be aligned consistently:
   ```c
   ci_txd_enable_checksum(ol_flags, &td_cmd,
   			&td_offset, tx_offload);
   ```

2. **Functional difference between merged implementations**
   
   The original `i40e_txd_enable_checksum()` did NOT handle `RTE_MBUF_F_TX_UDP_SEG`, but `ice_txd_enable_checksum()` did. The new common function includes the UDP segmentation handling (lines 93-98 of the new header). This extends i40e's behavior - verify this is intentional and that i40e hardware supports UDP segmentation offload in this manner.

### Info

1. **Well-structured consolidation**
   
   The patch effectively merges two nearly identical functions into a shared implementation, reducing code duplication between i40e and ice drivers.

2. **Commit message is clear and appropriate**
   
   The subject is within 60 characters, uses correct prefix format (`net/intel:`), and the body adequately explains the change and notes future extensibility.

3. **Code follows DPDK style**
   
   The new `ci_txd_enable_checksum()` function follows proper formatting with return type on its own line and consistent brace/indentation style.

=== Patch 12/36 ===

## DPDK Patch Review: net/intel: create a common scalar Tx function

### Overall Summary
This patch moves the ice driver's transmit function into a common location for reuse by other Intel drivers. The refactoring is generally well-done, but there are a few issues to address.

---

### Errors

**1. Typo in struct name: "timesstamp" should be "timestamp"**

The struct `ci_timesstamp_queue_fns` has a spelling error ("timesstamp" with double 's'):

```c
struct ci_timesstamp_queue_fns {
```

Should be:
```c
struct ci_timestamp_queue_fns {
```

This typo appears in both the header file and the ice_rxtx.c usage.

---

### Warnings

**1. Behavioral change: `td_cmd` initialization differs between original and common code**

In the original `ice_xmit_pkts()`:
```c
td_cmd = 0;
```

In the new `ci_xmit_pkts()`:
```c
td_cmd = CI_TX_DESC_CMD_ICRC;
```

This changes the behavior - the ice driver previously did not set the ICRC bit initially. This may be intentional for consolidation across drivers, but should be verified and documented in the commit message if it's a functional change.

**2. Unnecessary cast of void pointer**

In `ice_xmit_pkts()`:
```c
struct ci_tx_queue *txq = (struct ci_tx_queue *)tx_queue;
```

The cast from `void *` is unnecessary in C:
```c
struct ci_tx_queue *txq = tx_queue;
```

**3. Missing null check in `ice_get_ts_tail()`**

The function directly dereferences `txq->tsq` without checking for NULL:
```c
static uint16_t
ice_get_ts_tail(struct ci_tx_queue *txq)
{
    return txq->tsq->ts_tail;
}
```

While the caller checks `txq->tsq != NULL` before passing `&ts_fns`, this creates a fragile API contract. Consider adding a defensive check or documenting the precondition.

**4. Double blank line in header file**

Line 205-206 in `tx_scalar_fns.h` has an extra blank line before the `#endif`:
```c
	return nb_tx;
}


#endif /* _COMMON_INTEL_TX_SCALAR_FNS_H_ */
```

Should be a single blank line.

---

### Info

**1. Good refactoring pattern**

The use of function pointers for timestamp handling (`ci_timesstamp_queue_fns`) is a clean way to abstract driver-specific behavior while sharing the common transmit logic.

**2. Removed debug logging**

The original code had:
```c
PMD_TX_LOG(DEBUG, "Setting RS bit on TXD id=%4u ...", ...);
```

This debug log statement was removed in the refactoring. This is acceptable but worth noting if debug tracing is important for the ice driver.

**3. Consider const correctness for function pointer struct**

The `ts_fns` local variable in `ice_xmit_pkts()` is already `const`, which is good practice.

---

### Summary

The main issues to address:
1. **Error**: Fix the "timesstamp" typo to "timestamp"
2. **Warning**: Verify the `CI_TX_DESC_CMD_ICRC` behavioral change is intentional
3. **Warning**: Remove unnecessary void* cast

=== Patch 13/36 ===

# DPDK Patch Review: net/i40e: use common scalar Tx function

## Summary
This patch refactors the i40e driver to use a common scalar transmit function (`ci_xmit_pkts`) that was previously moved from the ice driver, eliminating approximately 206 lines of duplicated code.

---

## Errors
None identified.

---

## Warnings

### 1. Missing Release Notes
**Issue**: This patch removes significant driver code (206 lines) and changes the internal implementation of the transmit path. While this is an internal refactoring, changes to driver implementation that could affect behavior typically warrant release notes.

**Assessment**: Since this is purely internal refactoring with no public API changes and the commit message indicates the behavior should be identical, this may be acceptable without release notes. However, if there are any observable differences (e.g., performance characteristics), release notes would be appropriate.

---

## Info

### 1. Code Consolidation
The patch correctly consolidates duplicated transmit logic into a common function `ci_xmit_pkts()`. The comment explaining the NULL parameter for timestamp functions is helpful:
```c
/* i40e does not support timestamp queues, so pass NULL for ts_fns */
return ci_xmit_pkts(tx_queue, tx_pkts, nb_pkts, get_context_desc, NULL);
```

### 2. Commit Message Quality
The commit message is well-written:
- Clear subject line within 60 characters
- Explains the rationale (code deduplication)
- Mentions the benefit (saves hundreds of duplicated lines)
- Proper Signed-off-by tag

---

## Review Checklist

| Item | Status |
|------|--------|
| Subject line ≤60 characters | ✓ Pass (44 chars) |
| Subject lowercase after prefix | ✓ Pass |
| Correct component prefix (`net/i40e:`) | ✓ Pass |
| Imperative mood | ✓ Pass |
| No trailing period | ✓ Pass |
| Body wrapped appropriately | ✓ Pass |
| Signed-off-by present | ✓ Pass |
| SPDX/Copyright (no new files) | N/A |
| Code style | ✓ Pass |
| No forbidden tokens | ✓ Pass |

---

## Overall Assessment

**APPROVE** - This is a clean refactoring patch that reduces code duplication by leveraging a common transmit function. The implementation is straightforward, the comment explaining the NULL parameter is helpful, and the commit message adequately describes the change. The patch follows DPDK coding standards and contribution guidelines.

=== Patch 14/36 ===

## Review of Patch 14/36: net/intel: add IPsec hooks to common Tx function

### Commit Message

**No issues found.** The subject line is within 60 characters, uses correct prefix format, imperative mood, and lowercase. The body adequately explains the change and its design rationale. Signed-off-by is present.

### Code Review

#### Errors

None identified.

#### Warnings

1. **Segment length callback must handle non-IPsec packets (line 347-350)**
   
   The `calc_segment_len` callback is invoked unconditionally whenever `ipsec_ops != NULL`, even though the original simple assignment (`slen = m_seg->data_len`) is what non-IPsec packets require. The callback must therefore detect the non-IPsec case internally.
   
   Consider whether passing `ipsec_md` (which may be NULL) and `ol_flags` gives the callback enough information to do so, or whether the call-site check should be `if (ipsec_ops != NULL && ipsec_md != NULL)` instead.

2. **Missing parameter documentation in typedef comments**
   
   The `tlen` parameter in `calc_ipsec_segment_len_t` typedef (line 131) is not documented in the comment. Its purpose is unclear from context.

#### Info

1. **Consistent NULL check pattern**
   
   The code correctly uses compile-time NULL checks that can be optimized away:
   ```c
   if (ipsec_ops != NULL)
   ```
   This is good for the stated zero-performance-impact goal.

2. **Extra blank line (line 334)**
   
   There's a blank line after the closing brace of the IPsec descriptor block that creates inconsistent spacing. While not strictly wrong, it differs from the style of the context descriptor block above it.

3. **Comment style consistency**
   
   The inline comment at line 347 (`/* Calculate segment length, using IPsec callback if provided */`) is helpful but extends the visual complexity. Consider whether this level of documentation is needed given the self-documenting nature of `calc_segment_len`.
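The zero-cost intent behind the NULL-check pattern in Info 1 can be sketched as follows; the names are illustrative, not the patch's exact API:

```c
#include <assert.h>
#include <stddef.h>

/* An always-inline Tx body takes an ops pointer. When a driver passes a
 * literal NULL at the (inlined) call site, the compiler constant-folds
 * the check and the whole IPsec branch vanishes from that driver's path. */
struct ipsec_ops {
	int (*get_desc)(void);
};

static inline int
xmit_one(const struct ipsec_ops *ops)
{
	int extra = 0;

	if (ops != NULL)	/* folded away when ops is a literal NULL */
		extra = ops->get_desc();
	return 1 + extra;	/* 1 data descriptor + any IPsec descriptor */
}

static int demo_get_desc(void) { return 1; }

static const struct ipsec_ops demo_ops = { .get_desc = demo_get_desc };
```

A driver without IPsec support calls `xmit_one(NULL)` and pays nothing; an iavf-style driver passes `&demo_ops` and gets the extra descriptor accounted for.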

### Summary

This patch cleanly extends the common Tx function to support IPsec offload through a callback mechanism. The design of passing compile-time NULL constants for drivers without IPsec support is sound for optimization. The main concern is ensuring the `calc_segment_len` callback correctly handles the case where IPsec descriptors are available but not used for a particular packet.

=== Patch 15/36 ===

## Review of Patch 15/36: net/intel: support configurable VLAN tag insertion on Tx

### Errors

None identified.

### Warnings

1. **Commit body line potentially exceeds 75 characters**
   - Line "Make the VLAN tag insertion logic configurable in the common code, as to" appears to be close to or at the 75 character limit. Consider rewording for better readability.

2. **Comment style inconsistency in tx_scalar_fns.h**
   - Lines 288-290 use a non-standard multi-line comment format:
   ```c
   /* Descriptor based VLAN/QinQ insertion */
   /* for single vlan offload, only insert in data desc with VLAN_IN_L2TAG1 is set
    * for qinq offload, we always put inner tag in L2Tag1
    */
   ```
   The second comment should either be a proper multi-line comment starting with `/*` on its own line or merged with the first. Also, comments should be proper sentences (capitalize "For", end with period).

3. **Inconsistent condition formatting**
   - In tx_scalar_fns.h lines 291-292, the condition continuation uses double-indent with tabs, but the alignment appears inconsistent:
   ```c
   if (((ol_flags & RTE_MBUF_F_TX_VLAN) && l2tag1_field == CI_VLAN_IN_L2TAG1) ||
   				(ol_flags & RTE_MBUF_F_TX_QINQ)) {
   ```
   The continuation line appears to use excessive indentation.

4. **Cast in i40e_rxtx.c may have been unintentionally removed**
   - The change from `((uint64_t)I40E_TX_CTX_DESC_IL2TAG2 << ...)` to `(I40E_TX_CTX_DESC_IL2TAG2 << ...)` removes an explicit cast. This should be verified to ensure no truncation or sign-extension issues occur. If the cast was unnecessary, this is acceptable cleanup; otherwise it could be a bug.
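For illustration, here is a case where dropping such a cast would be unsafe; the values are hypothetical, not the real i40e definitions, and whether the removal is safe in the patch depends on the actual shift count:

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_FLAG  0x1
#define DEMO_SHIFT 48	/* a shift count >= 32 needs a 64-bit operand */

static uint64_t
build_qword(void)
{
	/* Correct: promote to 64 bits before shifting. Without the cast,
	 * DEMO_FLAG << DEMO_SHIFT would shift a plain int by 48 bits,
	 * which is undefined behaviour in C. Removing a cast like this is
	 * only safe when the shift count stays below the width of int. */
	return (uint64_t)DEMO_FLAG << DEMO_SHIFT;
}
```

If the i40e shift in question is small (below 31), the removal is harmless cleanup; otherwise it is a bug, which is why it should be verified.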

### Info

1. **Enum naming convention**
   - The enum values `CI_VLAN_IN_L2TAG1` and `CI_VLAN_IN_L2TAG2` follow DPDK uppercase naming conventions correctly.

2. **Doxygen documentation is well-formed**
   - The documentation for `enum ci_tx_l2tag1_field` is clear and properly formatted.

3. **Function parameter order**
   - Adding `l2tag1_field` as the 4th parameter (after `nb_pkts`, before callback functions) is a reasonable placement in the API.

### Summary

The patch is mostly well-formed. The main concerns are the comment style in the modified code section, the condition line indentation, and verifying that the removed uint64_t cast in i40e_rxtx.c is intentional and safe. The commit message and overall structure are acceptable.

=== Patch 16/36 ===

# Code Review: DPDK Patch 16/36

## Review Summary

This patch refactors the iavf driver to use the common scalar Tx function, eliminating duplicated code. The approach is sound - creating callbacks for IPsec and context descriptor handling to integrate with the common infrastructure.

---

## Errors

None found.

---

## Warnings

### 1. Potential NULL Pointer Dereference in `iavf_get_ipsec_desc`

**File:** `drivers/net/intel/iavf/iavf_rxtx.c`, lines 2477-2480

The `RTE_MBUF_DYNFIELD` macro returns a pointer computed from the mbuf address and offset. The check `if (!md)` after this call is ineffective - the macro will return a non-NULL pointer even if the dynfield doesn't contain valid data.

```c
md = RTE_MBUF_DYNFIELD(mbuf, txq->ipsec_crypto_pkt_md_offset,
                       struct iavf_ipsec_crypto_pkt_metadata *);
if (!md)
    return 0;
```

The NULL check should verify that `ipsec_crypto_pkt_md_offset` is valid (not -1 or similar sentinel) rather than checking `md`.
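A minimal sketch of the suggested check, under the assumption that the driver uses -1 (or any negative value) as the "dynfield not registered" sentinel; `demo_ipsec_md_available` is a hypothetical helper, with `md_offset` standing in for `txq->ipsec_crypto_pkt_md_offset`:

```c
/* RTE_MBUF_DYNFIELD computes mbuf_addr + offset, so its result is never
 * NULL. The meaningful check is whether the dynfield offset was ever
 * registered; a negative offset here means "not registered". */
static inline int demo_ipsec_md_available(int md_offset)
{
	return md_offset >= 0;
}
```

With this helper, the caller would test the offset once (ideally at queue setup) instead of testing the computed pointer per packet.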

### 2. Unused Variable with Suppression Comment

**File:** `drivers/net/intel/iavf/iavf_rxtx.c`, line 2558

```c
tlen = iavf_fill_ctx_desc_segmentation_field(&cd_type_cmd,
                                             mbuf, ipsec_md);
(void)tlen; /* Suppress unused variable warning */
```

If `tlen` is not used, consider whether the function call is necessary or if the code is incomplete. The cast to void works but suggests the return value should either be used or the function redesigned.

### 3. Implicit Boolean Conversion

**File:** `drivers/net/intel/iavf/iavf_rxtx.c`, line 2543

```c
if (ol_flags & RTE_MBUF_F_TX_SEC_OFFLOAD && ipsec_md) {
```

Per guidelines, pointer checks should use explicit comparison: `ipsec_md != NULL`.

### 4. Stack-Allocated Callback Structure

**File:** `drivers/net/intel/iavf/iavf_rxtx.c`, lines 2601-2604

```c
const struct ci_ipsec_ops ipsec_ops = {
    .get_ipsec_desc = iavf_get_ipsec_desc,
    .calc_segment_len = iavf_calc_ipsec_segment_len,
};
```

While this works, creating this structure on every function call adds unnecessary overhead. Consider making this a static const at file scope.
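A sketch of the file-scope alternative, using hypothetical types that mirror the shape of `struct ci_ipsec_ops` (the `demo_` names are illustrative, not the driver's):

```c
#include <stdint.h>

/* Hypothetical callback types mirroring struct ci_ipsec_ops. */
typedef uint16_t (*demo_get_desc_t)(void);
typedef uint16_t (*demo_calc_len_t)(void);

struct demo_ipsec_ops {
	demo_get_desc_t get_ipsec_desc;
	demo_calc_len_t calc_segment_len;
};

static uint16_t demo_get_desc(void) { return 1; }
static uint16_t demo_calc_len(void) { return 2; }

/* File-scope static const: initialized once at load time, shared by
 * every call, and eligible for placement in read-only data - no
 * per-call construction cost. */
static const struct demo_ipsec_ops demo_ops = {
	.get_ipsec_desc = demo_get_desc,
	.calc_segment_len = demo_calc_len,
};

static uint16_t demo_xmit(const struct demo_ipsec_ops *ops)
{
	return ops->get_ipsec_desc() + ops->calc_segment_len();
}
```

In practice a compiler will often optimize away the stack construction of a small const struct, but the file-scope form makes the intent explicit and costs nothing.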

---

## Info

### 1. Significant Code Reduction

The patch removes ~220 lines of code by leveraging the common Tx function, which is a good consolidation effort.

### 2. Consistent Refactoring Pattern

The callback-based approach for `iavf_get_context_desc`, `iavf_get_ipsec_desc`, and `iavf_calc_ipsec_segment_len` provides clean integration points with the common code.

### 3. Function Signature Improvement

Good change making `iavf_calc_context_desc` take `const struct rte_mbuf *` instead of non-const, indicating the function doesn't modify the mbuf.

---

## Summary

The patch is well-structured and achieves its goal of code deduplication. The main concerns are:
1. The ineffective NULL check on the dynfield result
2. Minor style issues with implicit boolean comparisons
3. Performance consideration for the stack-allocated callback structure

=== Patch 17/36 ===

## Review of Patch 17/36: net/i40e: document requirement for QinQ support

### Commit Message

**No issues found.** The commit message is well-formed:
- Subject line is 47 characters, within the 60-character limit
- Uses correct prefix `net/i40e:`
- Lowercase after colon, imperative mood, no trailing period
- Body explains the issue and solution clearly
- Signed-off-by tag present

### Code Review

#### Errors

None identified.

#### Warnings

1. **Line length exceeds 100 characters in driver code**
   
   The warning log messages exceed the 100-character limit:
   ```c
   PMD_DRV_LOG(WARNING, "Port %u: QinQ TX offload is enabled but VLAN extend mode is not set. ",
   ```
   This line is approximately 107 characters.
   
   **Suggested fix:** Break the format string:
   ```c
   PMD_DRV_LOG(WARNING,
       "Port %u: QinQ TX offload enabled but VLAN extend mode not set",
       dev->data->port_id);
   PMD_DRV_LOG(WARNING,
       "Double VLAN insertion requires RTE_ETH_RX_OFFLOAD_VLAN_EXTEND in Rx config");
   ```

2. **Trailing space in log message**
   
   The first `PMD_DRV_LOG` format string ends with a trailing space before the closing quote:
   ```c
   "...VLAN extend mode is not set. ",
   ```
   
   **Suggested fix:** Remove the trailing space.

3. **Consider combining log messages or returning an error**
   
   Two separate WARNING logs for a single condition is unusual. Consider either:
   - Combining into a single log message
   - Returning an error (`-EINVAL`) since the configuration is invalid, rather than just warning

#### Info

1. **Documentation improvement is good**
   
   Adding explicit documentation about the QinQ/VLAN_EXTEND dependency is helpful for users.

2. **Code example in documentation**
   
   The RST code example is clear and demonstrates proper configuration.

### Summary

The patch adds useful documentation and a runtime check for a hardware requirement. The main issues are:
- Line length violations in the warning messages
- Trailing whitespace in a string literal
- Consider whether this should be an error rather than a warning (if the configuration truly won't work)

=== Patch 18/36 ===

## Review of Patch 18/36: net/idpf: use common scalar Tx function

### Overall Assessment
This patch refactors the IDPF driver's single-queue transmit function to use a common scalar Tx implementation. The changes are straightforward code consolidation.

---

### Errors
None identified.

---

### Warnings

1. **Function signature change for `idpf_set_tso_ctx` may need documentation**
   
   The `idpf_set_tso_ctx` function signature changed significantly:
   - `tx_offload` changed from pass-by-value to `const` pointer
   - Added `const struct ci_tx_queue *txq __rte_unused` parameter
   
   This appears to be adapting to a common callback interface. If this callback signature is part of a documented internal API, ensure consistency is maintained across all drivers using it.

2. **Removed include may affect other code paths**
   
   The removal of `#include "../common/rx.h"` should be verified to not break any dependencies. If this header was only used by the removed code, this is fine.

---

### Info

1. **Significant code reduction**
   
   This patch removes ~168 lines of driver-specific Tx code and replaces it with a 3-line call to `ci_xmit_pkts()`. This is good for maintainability.

2. **Parameter formatting improvement**
   
   The reformatting of `idpf_set_tso_ctx` parameters to align properly is a style improvement.

3. **Minor style note on function call**
   
   In the new implementation:
   ```c
   return ci_xmit_pkts(tx_queue, tx_pkts, nb_pkts, CI_VLAN_IN_L2TAG1,
   		idpf_set_tso_ctx, NULL, NULL);
   ```
   The continuation line uses tabs for indentation, which is correct per DPDK style.

---

### Commit Message Review
- **Subject line**: "net/idpf: use common scalar Tx function" - 39 characters, good format
- **Body**: Describes the change adequately
- **Signed-off-by**: Present with valid format

The commit message is acceptable. The subject could note that the change applies specifically to the single-queue configuration; that detail appears only in the body and adds useful context.

---

### Summary
This is a clean refactoring patch that consolidates driver-specific Tx code into a common implementation. No blocking issues identified.

=== Patch 19/36 ===

## Review of Patch 19/36: net/intel: avoid writing the final pkt descriptor twice

### Overall Assessment
This is a well-structured optimization patch that reduces memory writes in the scalar Tx datapath.

---

### Errors
None identified.

---

### Warnings

1. **Subject line is 55 characters, close to the 60-character limit**
   
   Current: `net/intel: avoid writing the final pkt descriptor twice`
   
   This fits within the limit, but shorter alternatives are available if desired:
   - `net/intel: avoid double write to final Tx descriptor` (52 chars)
   - `net/intel: reduce writes for final pkt descriptor` (49 chars)

---

### Info

1. **Clear optimization with good explanation**
   
   The commit message effectively explains the performance issue (3 x 64-bit writes instead of 2) and how the fix addresses it by computing the EOP bit inside the loop.

2. **Code change is straightforward and correct**
   
   The logic change moves the EOP bit computation inside the `do...while` loop, adding it only when `m_seg->next == NULL` (i.e., the last segment). This correctly handles both single-segment and multi-segment packets while eliminating the redundant write.

3. **RS bit handling is appropriately updated**
   
   The RS (Report Status) bit handling is correctly updated to use a direct OR operation on the already-written descriptor rather than going through `td_cmd`, since `td_cmd` is no longer written after the loop.
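The pattern the review describes can be sketched as follows; `demo_write_chain` is a hypothetical reduction of the real loop, with the descriptor store replaced by a returned value so the logic is checkable in isolation:

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_CMD_EOP 0x1ULL

struct demo_seg {
	struct demo_seg *next;
};

/* Walk a segment chain, building each descriptor's command word once.
 * EOP is ORed in inside the loop when the current segment is the last
 * one, instead of re-writing the final descriptor after the loop.
 * Returns the last command word "written". */
static uint64_t demo_write_chain(struct demo_seg *m_seg, uint64_t td_cmd)
{
	uint64_t last = 0;

	do {
		uint64_t cmd = td_cmd;

		if (m_seg->next == NULL)
			cmd |= DEMO_CMD_EOP;
		last = cmd;		/* stands in for the descriptor store */
		m_seg = m_seg->next;
	} while (m_seg != NULL);

	/* RS, when needed, is ORed directly into the already-written
	 * descriptor rather than routed through td_cmd. */
	return last;
}
```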

---

### Summary
This is a clean optimization patch with no blocking issues; the subject line is close to, but within, the 60-character limit.

=== Patch 20/36 ===

## Review of Patch 20/36: "eal: add macro for marking assumed alignment"

### Overall Assessment
This patch adds a portability macro wrapping `__builtin_assume_aligned` for use across different toolchains. The implementation is straightforward and appropriate.

---

### Errors
None.

---

### Warnings

1. **Missing documentation for new macro**
   
   The new `__rte_assume_aligned` macro lacks a Doxygen comment explaining its purpose, parameters, and usage. Public macros in `rte_common.h` should be documented.
   
   **Suggested fix:**
   ```c
   /**
    * Mark a pointer as having a known minimum alignment.
    *
    * This allows the compiler to optimize code based on the alignment assumption.
    *
    * @param ptr
    *   Pointer to mark as aligned.
    * @param align
    *   Minimum alignment in bytes (must be a power of 2).
    * @return
    *   The pointer, with alignment information for the compiler.
    */
   #ifdef RTE_TOOLCHAIN_MSVC
   #define __rte_assume_aligned(ptr, align) (ptr)
   #else
   #define __rte_assume_aligned __builtin_assume_aligned
   #endif
   ```

2. **Commit message could be more descriptive**
   
   The commit body mentions this is for "gcc/clang builtin __rte_assume_aligned" but the builtin is actually `__builtin_assume_aligned`. The description also doesn't explain why this macro is needed or what benefits it provides.

---

### Info

1. **MSVC fallback is a no-op**
   
   The MSVC implementation simply returns the pointer unchanged, meaning MSVC builds won't benefit from alignment hints. This is acceptable as a portability measure, but could be noted in documentation. MSVC has `__assume` which could potentially be used, though it works differently.

2. **Consider including `__builtin_assume_aligned` behavior note**
   
   The GCC/Clang builtin can take an optional third parameter for offset. The current macro alias supports this implicitly since it directly maps to the builtin, which is good for flexibility.
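A usage sketch, assuming a GCC/Clang toolchain; `demo_assume_aligned` mirrors the patch's wrapper approach (no-op fallback where the builtin is unavailable) rather than being the actual DPDK macro:

```c
#include <stddef.h>
#include <stdint.h>

/* Portable wrapper mirroring the patch's approach: a no-op where the
 * builtin is unavailable. */
#if defined(__GNUC__) || defined(__clang__)
#define demo_assume_aligned(ptr, align) __builtin_assume_aligned((ptr), (align))
#else
#define demo_assume_aligned(ptr, align) (ptr)
#endif

static _Alignas(16) uint64_t demo_buf[4] = { 1, 2, 3, 4 };

/* Sum data known to be 16-byte aligned; the hint lets the compiler use
 * aligned vector loads without a runtime alignment check. */
static uint64_t demo_sum_aligned(const uint64_t *data, size_t n)
{
	const uint64_t *d = demo_assume_aligned(data, 16);
	uint64_t sum = 0;

	for (size_t i = 0; i < n; i++)
		sum += d[i];
	return sum;
}
```

Note the hint is purely an optimization promise: passing a misaligned pointer through it is undefined behaviour, not a checked error.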

---

### Summary
The patch is functionally correct and follows coding style guidelines. The main suggestion is to add Doxygen documentation for the new macro, which is expected for public API additions in DPDK header files.

=== Patch 21/36 ===

## Review of Patch 21/36: net/intel: write descriptors using non-volatile pointers

### Commit Message

**Subject Line:**
- Length: 56 characters ✓ (within 60 character limit)
- Format: Correct prefix `net/intel:` ✓
- Lowercase after prefix ✓
- Imperative mood ✓
- No trailing period ✓

**Body:**
- Describes the change and rationale adequately ✓
- Lines within 75 characters ✓
- Does not start with "It" ✓

**Tags:**
- Signed-off-by present with valid format ✓

### License/Copyright

This patch modifies an existing file without adding new copyright - this is acceptable as it's a modification to existing Intel code.

### Code Style Review

**Errors:** None

**Warnings:**

1. **Variable declaration placement in loop body (lines ~370, ~392)**
   
   The `const uint64_t cmd_type_offset_bsz` declarations appear in the middle of statement blocks. While C99 allows this, DPDK style prefers declarations at the start of blocks for consistency. However, since these are inside loops and immediately used, this is acceptable.

2. **Casting away volatile in `write_txd()` function**
   
   The function casts away volatile from `txd`:
   ```c
   uint64_t *txd_qw = __rte_assume_aligned(RTE_CAST_PTR(void *, txd), 16);
   ```
   
   This is intentional per the commit message ("allow the compiler to optionally merge the stores"), but deserves scrutiny. The comment in the commit message explains the rationale, which is acceptable for TX descriptors where the volatile was likely overly conservative.

**Info:**

1. **Function placement consideration**
   
   The new `write_txd()` function is static inline and placed appropriately before its first use. Good.

2. **Consistency improvement**
   
   The patch consistently replaces three separate direct descriptor write locations with the unified `write_txd()` function, improving maintainability.

### Technical Observations

1. The use of `RTE_CAST_PTR` to remove volatile and then using `__rte_assume_aligned` is a clean way to handle the cast while providing alignment hints to the compiler.

2. The byte-ordering conversion (`rte_cpu_to_le_64`) is correctly moved into the `write_txd()` function, which centralizes this logic.

3. The change from:
   ```c
   ctx_txd[0] = cd_qw0;
   ctx_txd[1] = cd_qw1;
   ```
   to:
   ```c
   write_txd(ctx_txd, cd_qw0, cd_qw1);
   ```
   Note that the original code did NOT have `rte_cpu_to_le_64()` on the context descriptor writes, but the new `write_txd()` function adds it. This could be a behavioral change if `cd_qw0` and `cd_qw1` were already in little-endian format.

### Potential Issue

**Warning: Possible double byte-swap for context descriptors**

Looking at line 315 where `write_txd(ctx_txd, cd_qw0, cd_qw1)` is called - the original code did not apply `rte_cpu_to_le_64()` to `cd_qw0` and `cd_qw1`. If these values are already constructed in little-endian format elsewhere, the new code would incorrectly byte-swap them on big-endian systems.

The reviewer should verify that `cd_qw0` and `cd_qw1` are constructed in host byte order and need conversion, or if they're already in little-endian and this introduces a bug.
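The invariant at stake can be sketched as follows. This is an illustrative reduction of the `write_txd()` idea, not the driver's code: `demo_store_le64` stands in for `rte_cpu_to_le_64` plus the store, serializing explicitly so the result is little-endian regardless of host byte order.

```c
#include <stdint.h>

/* Explicit little-endian serialization, portable stand-in for
 * rte_cpu_to_le_64 followed by a 64-bit store. */
static void demo_store_le64(uint8_t *dst, uint64_t v)
{
	for (int i = 0; i < 8; i++)
		dst[i] = (uint8_t)(v >> (8 * i));
}

/* Sketch of the write_txd() pattern: one helper owns the byte-order
 * conversion for both descriptor quadwords. Callers must therefore
 * pass host-order values; passing an already-converted value would be
 * swapped twice on a big-endian host. */
static void demo_write_txd(uint8_t desc[16], uint64_t qw0, uint64_t qw1)
{
	demo_store_le64(&desc[0], qw0);
	demo_store_le64(&desc[8], qw1);
}
```

Centralizing the conversion is only correct if every caller is audited to pass host-order quadwords, which is exactly the verification the review asks for.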

### Summary

The patch is generally well-formed and follows DPDK conventions. The main concern is verifying that the byte-ordering change for context descriptors (ctx_txd) is intentional and correct.

=== Patch 22/36 ===

## DPDK Patch Review: net/intel: remove unnecessary flag clearing

### Overall Assessment
This patch is well-structured and follows DPDK guidelines. It removes unnecessary code that clears a flag during Tx ring cleanup, with the rationale that the flag will be automatically cleared during subsequent descriptor writes.

---

### Commit Message Review

**No issues found.**

- **Subject line**: "net/intel: remove unnecessary flag clearing" (43 characters) ✓
- **Format**: Correct prefix `net/intel:`, lowercase, imperative mood, no trailing period ✓
- **Body**: Clearly explains what is being removed and why (performance benefit) ✓
- **Line length**: Body lines are within 75 characters ✓
- **Signed-off-by**: Present with proper format ✓

---

### Code Review

**No issues found.**

The change is straightforward and correct:
- Removes a `memset`-style zeroing operation (`txd[desc_to_clean_to].cmd_type_offset_bsz = 0`) that is unnecessary
- The removed comment correctly explains the original intent, and the commit message explains why this is no longer needed
- The logic for updating `last_desc_cleaned` and `nb_tx_free` is preserved

---

### License/Copyright

**Not applicable** - This patch only modifies existing code; no new files are added, so no SPDX or copyright changes are expected.

---

### Documentation

**Info**: The patch claims "a small performance benefit" - while release notes are not strictly required for performance optimizations in internal code paths, if this is part of a larger series with measurable performance improvements, consider documenting the cumulative benefit in the release notes for the series.

---

### Summary

This is a clean patch that removes unnecessary code. No errors or warnings.

=== Patch 23/36 ===

## DPDK Patch Review: net/intel: mark mid-burst ring cleanup as unlikely

### Summary
This patch adds an `unlikely()` hint to a conditional check in the transmit path to help branch prediction when the descriptor ring cleanup is needed mid-burst.

---

### Errors
None identified.

---

### Warnings
None identified.

---

### Info

1. **Simple and focused change**: This is a straightforward performance optimization that adds appropriate branch prediction hints. The rationale in the commit message is clear - mid-burst cleanup should be rare, so marking it `unlikely` is sensible.

2. **Commit message is well-formed**:
   - Subject line: "net/intel: mark mid-burst ring cleanup as unlikely" (50 characters, within limit)
   - Proper component prefix (`net/intel:`)
   - Lowercase after colon
   - Imperative mood
   - Body explains the "why" concisely
   - Signed-off-by present with valid name and email

3. **Code change is appropriate**: The `unlikely()` macro is the standard DPDK way to provide branch hints, and the usage here follows the pattern of wrapping a comparison with explicit operators (`>`), which is correct.
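For reference, DPDK's `unlikely()` (defined in `rte_branch_prediction.h`) expands to the GCC/Clang builtin; a minimal equivalent, assuming a GCC-compatible compiler:

```c
/* Equivalent to DPDK's definition: the double negation normalizes any
 * non-zero value to exactly 1 before handing it to the builtin. */
#define demo_unlikely(x) __builtin_expect(!!(x), 0)

/* The hint only influences code layout and static branch prediction;
 * the computed result is unchanged. */
static int demo_need_cleanup(int nb_free, int thresh)
{
	if (demo_unlikely(nb_free < thresh))
		return 1;
	return 0;
}
```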

---

### Verdict
This patch appears to comply with DPDK coding standards and contribution guidelines. The change is minimal, well-justified, and correctly implemented.

=== Patch 24/36 ===

## Review of Patch 24/36: net/intel: add special handling for single desc packets

### Overall Assessment
This patch adds an optimization for single-descriptor packets without TSO. The implementation is functional but has some style issues.

---

### Errors

None identified.

---

### Warnings

1. **Variable declaration after statement (C99 style in middle of block)**
   ```c
   const uint64_t cmd_type_offset_bsz = CI_TX_DESC_DTYPE_DATA |
   ```
   This declaration appears after executable statements within the same block (`if (txe->mbuf)` and the assignment to `*txe`). While C99 allows this, DPDK style prefers declarations at the start of a block or using an inner block. Consider:
   ```c
   if (nb_used == 1 && ...) {
       uint64_t cmd_type_offset_bsz;
       
       txd = &ci_tx_ring[tx_id];
       /* ... */
       cmd_type_offset_bsz = CI_TX_DESC_DTYPE_DATA | ...;
   }
   ```

2. **Use of `goto` for control flow**
   The `goto end_pkt;` pattern, while sometimes acceptable for error handling, is used here as a shortcut past the normal path. This is a minor readability concern, though it may be justified for performance in a hot path. Consider documenting why this approach was chosen if it's performance-critical.

3. **Line length close to limit**
   The lines building `cmd_type_offset_bsz` are long (around 75-80 characters), which is acceptable but approaches readability limits. The formatting is adequate.

---

### Info

1. **Comment style suggestion**
   The comment `/* special case for single descriptor packet, without TSO offload */` could end with a period for consistency with multi-line comment style, though single-line comments without periods are acceptable.

2. **Compound literal for structure assignment**
   The use of a compound literal for `*txe` assignment is clean and modern C:
   ```c
   *txe = (struct ci_tx_entry){
       .mbuf = tx_pkt, .last_id = tx_last, .next_id = tx_id
   };
   ```
   This is good practice with designated initializers.

3. **Label placement**
   The `end_pkt:` label placement is correct, appearing at the appropriate indentation level before the common exit code.

---

### Summary

The patch is acceptable. The main concern is the variable declaration style within the conditional block, which could be improved for clarity. The `goto` usage is a debatable style choice but is a common pattern in performance-critical packet processing code. The commit message is clear and the implementation appears correct for the stated optimization goal.

=== Patch 25/36 ===

## Code Review: Patch 25/36 - net/intel: use separate array for desc status tracking

### Overview
This patch changes the TX descriptor status tracking mechanism to use a separate array indexed by RS threshold boundaries instead of storing last_id in each software ring entry.

### Errors

1. **Variable declaration in middle of block after statements (C99 style inconsistency)**
   
   In `ci_xmit_pkts()` function in `tx_scalar_fns.h`:
   ```c
   txq->nb_tx_free = (uint16_t)(txq->nb_tx_free - nb_used);

   /* Check if packet crosses into a new RS threshold bucket... */
   uint16_t next_rs_idx = ((tx_last + 1) >> txq->log2_rs_thresh);
   ```
   
   While C99 allows this, mixing declaration styles within the same function (some at the start, some mid-block) reduces consistency. `pkt_rs_idx` is similarly declared inside the loop at line ~219; either placement is acceptable on its own, but the patch should pick one style and apply it throughout.

2. **sw_ring free added to cpfl error path**
   
   In `cpfl_tx_queue_setup()`, the error handling at `err_rs_last_id_alloc` correctly frees `sw_ring`. This free was missing from the original error path, so the addition is a good catch rather than a defect.

### Warnings

1. **Implicit assumption that nb_desc is a power of 2**
   
   The RS bucket calculation assumes `nb_desc / tx_rs_thresh` divides evenly. While `tx_rs_thresh` is now required to be power-of-2, the code should verify `nb_desc` is also suitable. Looking at the checks:
   ```c
   if (nb_desc % tx_rs_thresh != 0) {
       // error
   }
   ```
   This check exists in some drivers but should be verified in all affected drivers to ensure `num_rs_buckets` calculation is correct.

2. **Inconsistent error handling patterns across drivers**
   
   - `i40e_rxtx.c`: Calls `i40e_tx_queue_release(txq)` which will try to free `rs_last_id` even if allocation failed
   - `ice_rxtx.c`: Same pattern with `ice_tx_queue_release(txq)`
   - `iavf_rxtx.c`: Manually frees `sw_ring` and `txq` but doesn't call a release function
   
   The release functions should handle NULL `rs_last_id` safely (which `rte_free(NULL)` does), but the inconsistency could lead to maintenance issues.

3. **Missing documentation update**
   
   This is a significant change to the TX path algorithm. Consider adding or updating documentation about the RS threshold requirements (power-of-2 constraint) in the relevant driver documentation.

4. **Potential integer overflow in bucket index calculation**
   
   In `ci_tx_xmit_cleanup()`:
   ```c
   const uint16_t rs_idx = (last_desc_cleaned == nb_tx_desc - 1) ?
           0 :
           (last_desc_cleaned + 1) >> txq->log2_rs_thresh;
   ```
   If `last_desc_cleaned` is `UINT16_MAX` (though unlikely given it's bounded by `nb_tx_desc`), adding 1 could overflow. The bounds should be safe given the context, but explicit casting or comments would clarify intent.
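The bucket index math can be checked in isolation; the sketch below assumes, as the patch does, that `tx_rs_thresh` is a power of two so that `log2_rs_thresh = log2(tx_rs_thresh)` and a right shift replaces the division:

```c
#include <stdint.h>

/* Illustrative version of the rs_idx computation: the descriptor index
 * after the last cleaned one, shifted right by log2(tx_rs_thresh),
 * selects the RS bucket; the end-of-ring case wraps to bucket 0. */
static uint16_t demo_next_rs_idx(uint16_t last_desc_cleaned,
				 uint16_t nb_tx_desc,
				 uint16_t log2_rs_thresh)
{
	if (last_desc_cleaned == (uint16_t)(nb_tx_desc - 1))
		return 0;
	return (uint16_t)((last_desc_cleaned + 1) >> log2_rs_thresh);
}
```

Because `last_desc_cleaned < nb_tx_desc <= 65536`, the `+ 1` cannot overflow the `int` it is promoted to, and the result is always below `nb_desc / tx_rs_thresh`, i.e. within the bucket array.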

### Info

1. **Algorithm change is well-documented**
   
   The commit message clearly explains the change from per-descriptor tracking to bucket-based tracking. The inline comments in `ci_xmit_pkts()` explaining the bucket crossing logic are helpful.

2. **Consistent field additions**
   
   The new fields `rs_last_id` and `log2_rs_thresh` are added consistently to the `ci_tx_queue` structure and initialized across all affected drivers.

3. **Memory allocation uses appropriate socket**
   
   All `rte_zmalloc_socket()` calls correctly use the `socket_id` parameter for NUMA-aware allocation.

4. **The removal of `last_id` assignments throughout the xmit loop**
   
   The changes removing `txe->last_id = tx_last;` assignments are consistent with the new design where only RS boundary tracking is needed.

### Summary

The patch is well-structured and implements a cleaner tracking mechanism. The main concerns are:
- Ensuring error handling is consistent across all driver implementations
- Verifying the power-of-2 requirement for `tx_rs_thresh` is enforced in all code paths before this change takes effect
- The pattern of calling release functions during setup failures when allocations are partial needs careful review to ensure no double-frees or null pointer issues

=== Patch 26/36 ===

## Review of Patch 26/36: net/ixgbe: use separate array for desc status tracking

### Commit Message

**No Issues Found**

- Subject line: "net/ixgbe: use separate array for desc status tracking" (54 characters) ✓
- Correct component prefix `net/ixgbe:` ✓
- Lowercase after colon ✓
- Imperative mood ✓
- No trailing period ✓
- Body explains the change and its purpose ✓
- Signed-off-by present ✓

### Code Review

#### Errors

1. **Missing check for division by zero / invalid tx_rs_thresh**
   ```c
   uint16_t num_rs_buckets = nb_desc / tx_rs_thresh;
   ```
   While there is a power-of-2 check added, if `tx_rs_thresh` is 0, this would cause division by zero. The existing validation should catch this, but it's worth verifying the check order.

2. **Potential integer overflow in rs_idx calculation**
   ```c
   const uint16_t rs_idx = (last_desc_cleaned == nb_tx_desc - 1) ?
           0 :
           (last_desc_cleaned + 1) >> txq->log2_rs_thresh;
   ```
   When `last_desc_cleaned` equals `nb_tx_desc - 1`, the code returns 0. However, when it doesn't, `(last_desc_cleaned + 1)` could be `nb_tx_desc`, and the subsequent shift and array access `txq->rs_last_id[rs_idx]` should be validated to ensure it stays within bounds.

3. **Variable `txp` update logic may be incomplete**
   ```c
   if (next_rs_idx != pkt_rs_idx) {
       /* ... */
       txq->rs_last_id[pkt_rs_idx] = tx_last;
   } else
       txp = txd;
   ```
   In the original code, when RS bit was set, `txp = NULL` was executed. The new code removes this, but `txp` is used later for setting `IXGBE_TXD_CMD_RS` on the last descriptor. This logic change needs verification that the transmit completion still works correctly.

#### Warnings

1. **Missing braces for multi-statement else block style**
   ```c
   if (next_rs_idx != pkt_rs_idx) {
       /* multiple statements */
   } else
       txp = txd;
   ```
   Per coding style, when the `if` block has braces, the `else` block should also have braces for consistency.

2. **C99-style declaration in the middle of code block**
   ```c
   /* Track the RS threshold bucket at packet start */
   uint16_t pkt_rs_idx = (uint16_t)(tx_id >> txq->log2_rs_thresh);
   ```
   While C99 declarations are acceptable, this is inconsistent with the existing code in this function which declares variables at the start of the function. Consider moving to function start or ensuring consistency.

3. **Implicit comparison with zero**
   ```c
   if (!(status & rte_cpu_to_le_32(IXGBE_TXD_STAT_DD)))
   ```
   This is a bitwise operation result being used in a boolean context. While common, explicit comparison `== 0` would be clearer.

4. **Array bounds validation for rs_last_id access**
   ```c
   status = txr[txq->rs_last_id[rs_idx]].wb.status;
   ```
   The `rs_idx` calculation should be verified to never exceed `num_rs_buckets - 1`. Consider adding a bounds check or assertion in debug builds.

5. **Missing initialization of rs_last_id array values**
   The array is allocated with `rte_zmalloc_socket()` which zeros it, but the cleanup function `ixgbe_xmit_cleanup` accesses `txq->rs_last_id[rs_idx]` which will be 0 initially. Verify this is safe for the first cleanup pass.

### Info

- The removal of `txe->last_id` assignments and `nb_tx_to_clean` calculation simplifies the code, assuming the bucket-based tracking is correct.
- The comment explaining the RS bit bucket logic is helpful for understanding the algorithm.

### Summary

The patch makes significant changes to the descriptor status tracking mechanism. The main concerns are around the `txp` handling logic change and ensuring array bounds are respected for `rs_last_id` accesses. The power-of-2 requirement for `tx_rs_thresh` is a reasonable addition to support the `log2_rs_thresh` optimization.

=== Patch 27/36 ===

## Review of Patch 27/36: net/intel: drop unused Tx queue used count

### Commit Message Analysis

**Subject Line:**
- "net/intel: drop unused Tx queue used count" = 42 characters ✓
- Lowercase after prefix ✓
- No trailing period ✓
- Imperative mood ✓

**Body:**
- Explains the rationale clearly
- Lines appear within 75 character limit ✓
- Does not start with "It" ✓
- Properly signed off ✓

### Issues Found

#### Errors

None identified.

#### Warnings

1. **Potential typo in commit message body:**
   The commit message contains "ipdf driver" which should be "idpf driver" (line 2 of the second paragraph).

2. **Variable type mismatch in IDPF code:**
   In `drivers/net/intel/idpf/idpf_common_rxtx.c`, the replacement code uses:
   ```c
   txq->rs_compl_count += nb_used;
   ```
   The original code used explicit casting:
   ```c
   txq->nb_tx_used = (uint16_t)(txq->nb_tx_used + nb_used);
   ```
   If `rs_compl_count` is a different type than `uint16_t`, this could represent a change in overflow behavior. The review cannot determine the type of `rs_compl_count` from this patch alone, but this should be verified to ensure consistent behavior.

3. **Magic number in code:**
   The threshold `32` in the IDPF split queue code:
   ```c
   if (txq->rs_compl_count >= 32) {
   ```
   This magic number should ideally be a named constant for maintainability, though this is pre-existing code not introduced by this patch.
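The type-width concern in warning 2 above can be illustrated with a self-contained sketch; both helpers are illustrative, not the driver's actual fields:

```c
#include <stdint.h>

/* The old code accumulated into a uint16_t with an explicit cast, so
 * the counter wraps at 65536. */
static uint16_t demo_accum_u16(uint16_t cur, uint16_t add)
{
	return (uint16_t)(cur + add);	/* wraps at 2^16 */
}

/* If rs_compl_count is wider (e.g. uint32_t), the wrap point moves to
 * 2^32, changing overflow behaviour even though normal-range values
 * are identical. */
static uint32_t demo_accum_u32(uint32_t cur, uint16_t add)
{
	return cur + add;		/* wraps at 2^32 */
}
```

For a counter that is compared against a small threshold and reset, either width is likely fine, but the equivalence should be confirmed rather than assumed.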

#### Info

1. **Clean refactoring:** The patch cleanly removes the `nb_tx_used` field from the structure and all its usages across multiple driver files. The IDPF splitq case appropriately repurposes an existing field (`rs_compl_count`) that was previously only used in vector code paths.

2. **No release notes needed:** This is an internal implementation change that doesn't affect the public API.

### Summary

The patch is well-structured and accomplishes its stated goal. The main concern is the typo "ipdf" in the commit message which should be corrected to "idpf". The code changes appear correct and consistent across all affected files.

=== Patch 28/36 ===

## Review of Patch 28/36: net/intel: remove index for tracking end of packet

### Commit Message

**Subject Line:**
- Length: 50 characters ✓
- Format: `net/intel: remove index for tracking end of packet` - proper prefix and lowercase ✓
- Imperative mood ✓
- No trailing period ✓

**Body:**
- Explains the change clearly - removing `last_id` field and using `mbuf->next == NULL` check instead
- Lines within 75 characters ✓
- Does not start with "It" ✓
- Signed-off-by present ✓

### Code Review

**Errors:** None identified.

**Warnings:**

1. **Inconsistent ternary expression style across files:**
   - In `i40e_rxtx.c`: `(swr_ring[tx_id].mbuf->next == NULL) ? 1 : 0`
   - In `ice_rxtx.c`, `ixgbe_rxtx.c`, `iavf_rxtx.c`: `swr_ring[tx_id].mbuf->next == NULL ? 1 : 0`
   
   Some have parentheses around the condition, some don't. Should be consistent across all files.

2. **Inconsistent blank line placement:**
   - In `iavf_rxtx.c` (lines ~3966-3969) and `ice_rxtx.c` (lines ~3206-3209) and `ixgbe_rxtx.c` (lines ~2413-2416): There's a blank line after `swr_ring[tx_id].mbuf = NULL;` before the closing brace
   - In `i40e_rxtx.c`: No such blank line
   
   Consistency within the patch would be preferred.

**Info:**

1. **Simplification opportunity:** The ternary `condition ? 1 : 0` could be simplified to just `(condition)` since a boolean comparison already yields 0 or 1 in C. For example:
   ```c
   pkt_cnt += (swr_ring[tx_id].mbuf->next == NULL);
   ```
   This is more idiomatic and slightly cleaner.

2. **Logic correctness:** The change correctly moves the `mbuf->next` check before `rte_pktmbuf_free_seg()` since the mbuf data is needed to determine if it's the last segment. This is a necessary reordering.
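The ordering constraint and the ternary simplification can both be seen in a minimal sketch; `demo_mbuf` and `demo_count_pkts` are illustrative stand-ins, with the actual `rte_pktmbuf_free_seg()` call reduced to clearing the ring slot:

```c
#include <stddef.h>

struct demo_mbuf {
	struct demo_mbuf *next;
};

/* Count completed packets while walking the software ring: the
 * mbuf->next check must happen before the segment is freed, and the
 * comparison itself already yields 0 or 1, so no ternary is needed. */
static unsigned int demo_count_pkts(struct demo_mbuf **swr_ring,
				    unsigned int n)
{
	unsigned int pkt_cnt = 0;

	for (unsigned int i = 0; i < n; i++) {
		/* check before freeing the segment */
		pkt_cnt += (swr_ring[i]->next == NULL);
		/* rte_pktmbuf_free_seg(swr_ring[i]) would go here */
		swr_ring[i] = NULL;
	}
	return pkt_cnt;
}
```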

### Summary

The patch cleanly removes the `last_id` tracking mechanism and replaces it with a direct check of the mbuf's next pointer. The approach is sound - checking `mbuf->next == NULL` to identify the last segment of a packet is correct. The main suggestions are minor style consistency issues across the different driver files being modified.

=== Patch 29/36 ===

## Review of DPDK Patch 29/36: net/intel: merge ring writes in simple Tx for ice and i40e

### Commit Message Analysis

**Subject Line:**
- "net/intel: merge ring writes in simple Tx for ice and i40e" = 58 characters ✓
- Format and case are correct ✓
- Imperative mood ✓

**Body:**
- Clear description of the change
- Properly wrapped
- Signed-off-by present ✓

### Code Review

#### Errors

None identified.

#### Warnings

1. **SPDX header in new header file content (tx_scalar_fns.h additions)**
   
   The file already exists and appears to carry proper SPDX licensing from an earlier patch in the series, so the new functions added here require no licensing change.

2. **Inconsistency between ICE_TD_CMD and CI_TX_DESC_CMD_DEFAULT**
   
   In `ice_rxtx.h`, `ICE_TD_CMD` was defined as only `CI_TX_DESC_CMD_EOP`, but the new `CI_TX_DESC_CMD_DEFAULT` includes both `CI_TX_DESC_CMD_ICRC | CI_TX_DESC_CMD_EOP`. This changes the behavior for ICE vector paths:
   ```c
   // OLD ICE definition:
   #define ICE_TD_CMD  CI_TX_DESC_CMD_EOP
   
   // NEW common definition:
   #define CI_TX_DESC_CMD_DEFAULT (CI_TX_DESC_CMD_ICRC | CI_TX_DESC_CMD_EOP)
   ```
   
   The i40e definition included ICRC, but ICE did not. This functional change should be explicitly mentioned in the commit message if intentional, or investigated if not. The ICE vector paths (`ice_rxtx_vec_avx2.c`, `ice_rxtx_vec_avx512.c`) now include ICRC when they previously did not.

3. **Removed function `i40e_build_ctob` may be needed elsewhere**
   
   The function `i40e_build_ctob` was removed but it handles `td_offset` and `td_tag` parameters that the new common functions don't support. Verify this function isn't called from other code paths that need those parameters.

#### Info

1. **Good code consolidation**: The patch successfully merges identical code from ice and i40e drivers into a common implementation.

2. **Minor style inconsistency**: In the original i40e code:
   ```c
   mainpart = (nb_pkts & ((uint32_t) ~N_PER_LOOP_MASK));
   ```
   The new common code has slightly different spacing:
   ```c
   mainpart = nb_pkts & ((uint32_t)~N_PER_LOOP_MASK);
   ```
   This is fine and actually cleaner.

3. **Removed comment in ice version**: The original ICE code had a comment explaining the loop:
   ```c
   /**
    * Process most of the packets in chunks of N pkts.  Any
    * leftover packets will get processed one at a time.
    */
   ```
   This explanatory comment could be preserved in the common function for maintainability.

### Summary

The patch is well-structured for code consolidation. The main concern is the **behavioral change for ICE vector transmit paths** where `CI_TX_DESC_CMD_ICRC` is now being set when it wasn't before. This should be verified as intentional and documented in the commit message if it's a deliberate fix.

=== Patch 30/36 ===

## Review of Patch 30/36: net/intel: consolidate ice and i40e buffer free function

### Commit Message

**Subject Line**: `net/intel: consolidate ice and i40e buffer free function`
- Length: 56 characters ✓
- Format: Correct prefix, lowercase, imperative mood ✓
- No trailing period ✓

**Body**: Well-written explanation of the consolidation and notes that the i40e version with batching is used as the base.

**Tags**: Signed-off-by present ✓

### Code Review

#### drivers/net/intel/common/tx.h

The new constant definition is appropriate:
```c
#define CI_TX_MAX_FREE_BUF_SZ 64
```

#### drivers/net/intel/common/tx_scalar_fns.h - ci_tx_free_bufs()

**Warning: Double increment of txep in else branch**

In the non-fast-free path:
```c
} else {
    for (uint16_t i = 0; i < rs_thresh; ++i, ++txep)
        rte_prefetch0((txep + i)->mbuf);

    for (uint16_t i = 0; i < rs_thresh; ++i, ++txep) {
        rte_pktmbuf_free_seg(txep->mbuf);
        txep->mbuf = NULL;
    }
}
```

The first loop advances both `i` and `txep`, so it prefetches only every other entry (`txep[0]`, `txep[2]`, ...) and leaves `txep` pointing `rs_thresh` entries past the start of the batch; the second loop then frees from that wrong position.

The original i40e code had the same structure, whereas ice's original code correctly used a separate index variable. This appears to be a latent i40e bug that the consolidation now propagates to both drivers.

**Suggested fix**:
```c
} else {
    for (uint16_t i = 0; i < rs_thresh; i++)
        rte_prefetch0((txep + i)->mbuf);

    for (uint16_t i = 0; i < rs_thresh; ++i, ++txep) {
        rte_pktmbuf_free_seg(txep->mbuf);
        txep->mbuf = NULL;
    }
}
```

Remove `++txep` from the prefetch loop (use only `i` for indexing).

**Info: Variable declaration style**

The function uses C99 style variable declarations inside for-loops (`uint16_t i`, `uint16_t j`), which is acceptable per guidelines.

**Info: Explicit NULL comparison**

```c
if (fast_free_mp) {
```
Could be `if (fast_free_mp != NULL)` for consistency with guidelines, though this is a minor style point since `fast_free_mp` is used as a boolean condition.

### Summary

**Errors**: None

**Warnings**:
1. **Bug in else branch**: The prefetch loop increments `txep` but shouldn't - this will cause the subsequent free loop to access memory beyond the intended range, potentially causing memory corruption or crashes. Remove `++txep` from the prefetch for-loop.

**Info**:
1. Consider using explicit `!= NULL` comparison for pointer check on line with `if (fast_free_mp)`

=== Patch 31/36 ===

## Patch Review: net/intel: complete merging simple Tx paths

### Commit Message

**No issues found.** The commit message is well-formed:
- Subject line is 43 characters, within the 60 character limit
- Correct prefix `net/intel:`
- Lowercase after colon
- Imperative mood, no trailing period
- Body adequately describes the change
- Signed-off-by present with valid format

### License/Copyright

**No issues.** This patch modifies existing files and does not add new files requiring SPDX headers.

### Code Style Issues

#### Errors

None identified.

#### Warnings

1. **Inconsistent alignment in function parameters** (tx_scalar_fns.h, lines 137-139 and 195-197):
   The continuation lines use extra spaces for alignment, which is permitted, but the spacing differs slightly from typical DPDK continuation style. This is minor.

2. **Implicit integer conversion** (tx_scalar_fns.h, line 208):
   ```c
   nb_pkts -= ret;
   ```
   While technically fine since both are `uint16_t`, this differs from the original code which used explicit casts like `nb_pkts = (uint16_t)(nb_pkts - ret)`. Consider maintaining consistency with surrounding code style.

3. **Missing explicit cast in RTE_MIN** (tx_scalar_fns.h, line 205):
   ```c
   uint16_t ret, num = RTE_MIN(nb_pkts, CI_TX_MAX_BURST);
   ```
   The original i40e code used `(uint16_t)RTE_MIN(...)`. While likely unnecessary if the macro handles this, consistency with the existing codebase is preferred.

### Functional Observations

1. **Register write macro change**: The original code used driver-specific macros (`I40E_PCI_REG_WC_WRITE` and `ICE_PCI_REG_WC_WRITE`), while the common code uses `rte_write32_wc()`. This appears intentional for the common path, but it should be verified that the two have identical semantics.

2. **Code deduplication is correct**: The common `ci_xmit_burst_simple` and `ci_xmit_pkts_simple` functions properly consolidate the nearly-identical implementations from both drivers.

### Documentation

No documentation updates appear necessary for this internal refactoring.

### Summary

This patch is a straightforward code consolidation that moves duplicated Tx simple path code into a common header. The implementation is functionally equivalent to the original driver-specific versions. The patch is ready for merge with only minor style considerations noted above.

=== Patch 32/36 ===

## Review of DPDK Patch 32/36: net/intel: use non-volatile stores in simple Tx function

### Commit Message

**Subject Line:** `net/intel: use non-volatile stores in simple Tx function`
- Length: 56 characters ✓
- Format: Correct prefix and lowercase after colon ✓
- Imperative mood ✓
- No trailing period ✓

**Body:**
- Explains the change and motivation clearly ✓
- Lines within 75 characters ✓
- Does not start with "It" ✓

**Tags:**
- Signed-off-by present with valid format ✓

### Code Review

**File: `drivers/net/intel/common/tx_scalar_fns.h`**

#### Info: Code Simplification
The refactoring removes redundant `ci_tx_fill_hw_ring_tx4()` and `ci_tx_fill_hw_ring_tx1()` functions and consolidates them into calls to the existing `write_txd()` function. This is a good code deduplication.

#### Warning: Extra whitespace in write_txd()
```c
	uint64_t *txd_qw =  __rte_assume_aligned(RTE_CAST_PTR(void *, txd), 16);
```
There are two spaces between `=` and `__rte_assume_aligned`. Should be single space.

#### Warning: Trailing blank line in code block
```c
			write_txd(txdp + idx, rte_mbuf_data_iova(*(pkts + idx)),
				CI_TX_DESC_DTYPE_DATA |
				((uint64_t)CI_TX_DESC_CMD_DEFAULT << CI_TXD_QW1_CMD_S) |
				((uint64_t)(*(pkts + idx))->data_len << CI_TXD_QW1_TX_BUF_SZ_S));

		}
```
There is a trailing blank line before the closing brace in the `leftover` handling block.

#### Info: Long lines in function calls
The `write_txd()` calls have long lines due to the descriptor construction expressions. While these are within the 100-character limit, consider whether extracting the descriptor construction to a local variable would improve readability.

### Summary

The patch is well-structured and achieves its goal of consolidating descriptor writes through a single function to enable non-volatile stores. The changes are functionally correct and the commit message adequately explains the purpose.

**Errors:** None

**Warnings:**
1. Double space in `write_txd()` function at the assignment (line with `uint64_t *txd_qw`)
2. Extra blank line before closing brace in the `leftover` handling block

=== Patch 33/36 ===

## Review of Patch 33/36: net/intel: align scalar simple Tx path with vector logic

### Commit Message

**No Issues Found**

- Subject line: "net/intel: align scalar simple Tx path with vector logic" (56 characters) ✓
- Uses correct prefix format `net/intel:` ✓
- Lowercase after colon ✓
- Imperative mood ✓
- No trailing period ✓
- Body explains the rationale for the change ✓
- Signed-off-by present ✓

### Code Review

**No Errors or Warnings Found**

The patch makes reasonable changes to align the scalar simple Tx path with the vector logic:

1. **Function refactoring**: The `ci_tx_fill_hw_ring()` function is renamed to `ci_tx_fill_hw_ring_simple()` and now takes a `txdp` pointer directly instead of computing it from `txq->tx_tail`. This separates concerns appropriately.

2. **Backlog entry handling moved to caller**: The mbuf storage in the sw_ring (via `ci_tx_backlog_entry()`) is now done in `ci_xmit_burst_simple()` rather than inside the fill function. This matches the vector path approach.

3. **Local variable usage**: The function now uses a local `tx_id` variable rather than repeatedly accessing `txq->tx_tail`, which can improve performance.

4. **Removed unnecessary blank line**: Line 52 in the original had an extra blank line that was removed - this is correct cleanup.

5. **Logic preservation**: The RS (Report Status) bit setting logic and ring wraparound handling remain functionally equivalent while being restructured to match the vector path.

6. **Final write optimization**: The final `rte_write32_wc()` now uses `tx_id` directly rather than `txq->tx_tail`, which is consistent since `tx_id` has just been assigned to `txq->tx_tail`.

### Info

- The comment change from `/* Fill hardware descriptor ring with mbuf data */` to `/* Fill hardware descriptor ring with mbuf data (simple path) */` helps clarify this is for the simple path specifically, which is good documentation practice given the refactoring context.

=== Patch 34/36 ===

# Code Review: DPDK Patch 34/36

## Patch Overview
This patch renames the `vector_tx` flag to `use_vec_entry` to more accurately reflect its purpose - indicating use of the smaller `tx_entry_vec` structure. It also extends this flag's usage to the simple TX path for performance benefits.

## Review Items

### Commit Message
**Status: OK**

- Subject line: "net/intel: use vector SW ring entry for simple path" (51 characters) ✓
- Lowercase after prefix ✓
- Imperative mood ✓
- No trailing period ✓
- Body explains the change adequately ✓
- Signed-off-by present ✓

### Code Review

#### Info: Comment formatting inconsistency in tx.h

In `drivers/net/intel/common/tx.h` at line 368-370:
```c
	/**
	 *  vPMD tx will not set sw_ring's mbuf to NULL after free,
```

The `/**` style is typically for Doxygen documentation of APIs. For implementation comments, a regular `/*` would be more appropriate. This is minor and consistent with existing code, so just noting it.

#### Warning: Potential inconsistency in cpfl_tx_queue_release call

In `drivers/net/intel/cpfl/cpfl_rxtx.c` at line 329:
```c
	ci_txq_release_all_mbufs(q, q->use_vec_entry);
```

The function `ci_txq_release_all_mbufs` takes a `use_ctx` parameter (as seen in the header), but here it is passed `use_vec_entry` (previously `q->vector_tx`). The two have different semantic meanings: `use_ctx` presumably indicates whether context descriptors are in use, not whether vector SW ring entries are.

That said, the signature in tx.h is `ci_txq_release_all_mbufs(struct ci_tx_queue *txq, bool use_ctx)`, and this patch does not change the function's signature or semantics. The mismatch may therefore be pre-existing, with `vector_tx` having been used to approximate `use_ctx`. This should be verified, but it is outside the scope of this patch.

#### Info: Boolean assignment style in ixgbe

In `drivers/net/intel/ixgbe/ixgbe_rxtx_vec_common.c` at line 114:
```c
-	txq->vector_tx = 1;
+	txq->use_vec_entry = true;
```

Good change - using `true` for a boolean field is cleaner than using `1`.

### Code Correctness

The logic changes appear correct:
- `i40e_rxtx.c` and `ice_rxtx.c` now set `use_vec_entry` when either vector TX OR simple TX is allowed, which aligns with the stated purpose
- The simple TX path in `tx_scalar_fns.h` now uses `sw_ring_vec` and `ci_tx_entry_vec` types consistently
- The cleanup path in `ci_txq_release_all_mbufs` is updated to use the new flag name

### Documentation

No documentation updates required for this internal refactoring.

## Summary

**Errors**: None

**Warnings**: None identified as definitive issues

**Info**:
- The patch is a well-structured internal refactoring
- The rename from `vector_tx` to `use_vec_entry` improves code clarity by describing what the flag actually controls
- All usages are consistently updated across the affected drivers

The patch looks correct and ready for integration.

=== Patch 35/36 ===

## Review of Patch 35/36: net/intel: use vector mbuf cleanup from simple scalar path

### Commit Message Analysis

**Subject Line:**
- "net/intel: use vector mbuf cleanup from simple scalar path" = 58 characters ✓
- Proper prefix format ✓
- Lowercase after colon ✓
- Imperative mood ✓
- No trailing period ✓

**Body:**
- Explains the rationale clearly ✓
- Does not start with "It" ✓
- Properly signed off ✓

### Code Review

**Info: Good Code Consolidation**

This patch simplifies the codebase by removing the separate `ci_tx_free_bufs()` function (approximately 50 lines) and replacing it with a thin wrapper (`ci_tx_free_bufs_simple()`) that calls the existing vector cleanup function. This reduces code duplication, which is positive.

**File: drivers/net/intel/common/tx_scalar_fns.h**

1. **Style - Function declarations look correct:**
   - `ci_tx_desc_done_simple()` - properly uses `__rte_always_inline` ✓
   - `ci_tx_free_bufs_simple()` - properly uses `__rte_always_inline` ✓

2. **Line length check on lines 24-25:**
   ```c
   return (txq->ci_tx_ring[idx].cmd_type_offset_bsz & rte_cpu_to_le_64(CI_TXD_QW1_DTYPE_M)) ==
   		rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE);
   ```
   The first line appears to be around 95-100 characters, which is within the 100-character limit. The continuation uses double-indent (tabs), which is correct for continuation lines. ✓

3. **No SPDX header visible in this diff** - This is a modification to an existing file, so the header should already be present at the top of the file. ✓

4. **Function comment style:**
   - Line 29: `/* Free transmitted mbufs using vector-style cleanup */` - Good single-line comment style ✓

5. **Dependency concern:** The new function `ci_tx_free_bufs_simple()` calls `ci_tx_free_bufs_vec()`, which must therefore be declared earlier in this header or in one it includes. Since this is patch 35/36 in a series, that dependency should have been established by earlier patches.

**Files: drivers/net/intel/i40e/i40e_rxtx.c and drivers/net/intel/ice/ice_rxtx.c**

The changes are straightforward function renames from `ci_tx_free_bufs()` to `ci_tx_free_bufs_simple()` - correct and consistent.

### Summary

**Errors:** None

**Warnings:** None

**Info:**
- This is a good refactoring patch that removes code duplication by leveraging existing vector cleanup infrastructure for the scalar path. The approach is sound and the implementation is clean.

=== Patch 36/36 ===

## Review of DPDK Patch: net/idpf: enable simple Tx function

### Commit Message

**No Issues Found**

- Subject line: "net/idpf: enable simple Tx function" (35 characters) ✓
- Lowercase after colon ✓
- Correct component prefix `net/idpf:` ✓
- Imperative mood ✓
- No trailing period ✓
- Body adequately describes the change ✓
- Signed-off-by present with valid format ✓

### Code Review

#### Errors

None identified.

#### Warnings

1. **Extra blank line in idpf_common_rxtx.c**
   
   There are two consecutive blank lines after the `idpf_dp_singleq_xmit_pkts_simple` function (line 1357) and also after the `IDPF_TX_SINGLEQ_SIMPLE` entry (line 1552). DPDK style prefers single blank lines between functions/blocks.

   ```c
   }
   
   
   /* TX prep functions */
   ```
   Should be:
   ```c
   }
   
   /* TX prep functions */
   ```

2. **Inconsistent alignment in function parameter**
   
   In `idpf_common_rxtx.c` line 1350-1351, the continuation line uses extra spaces for alignment:
   ```c
   idpf_dp_singleq_xmit_pkts_simple(void *tx_queue, struct rte_mbuf **tx_pkts,
   				   uint16_t nb_pkts)
   ```
   The same inconsistent alignment appears in the header file declaration. The parameters should align consistently with the opening parenthesis or use standard double-indent for continuation.

3. **Missing release notes**
   
   This patch adds new functionality (simple Tx support for idpf driver). Consider adding a note to the release notes for this feature enhancement, as it's a user-visible performance optimization path.

#### Info

1. **Comment style suggestion**
   
   In `idpf_rxtx.c` lines 512-515, the comment block could use the multi-line format:
   ```c
   /*
    * Set tx_simple_allowed flag based on queue configuration.
    * For queue 0: explicitly set the flag based on its configuration.
    * For other queues: only set to false if this queue cannot use simple_tx.
    */
   ```
   This is a minor style preference - the current format is acceptable.

2. **goto label naming**
   
   The `goto out;` at line 524 jumps to `out:` at line 528, but this label only precedes a `return 0;`. While functional, this could be simplified to just `return 0;` directly, avoiding the goto. However, this may be intentional for future extensibility or consistency with other queue setup functions.
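
   For illustration, the suggested simplification (function names hypothetical, bodies reduced to the control flow in question):

   ```c
   #include <assert.h>

   /* Shape in the patch: a goto whose label only precedes a bare return */
   static int setup_with_goto(int is_queue_zero)
   {
   	if (!is_queue_zero)
   		goto out;
   	/* ... configure the tx_simple_allowed flag for queue 0 ... */
   out:
   	return 0;
   }

   /* Equivalent without the goto: return directly */
   static int setup_direct(int is_queue_zero)
   {
   	if (!is_queue_zero)
   		return 0;
   	/* ... configure the tx_simple_allowed flag for queue 0 ... */
   	return 0;
   }

   int main(void)
   {
   	assert(setup_with_goto(0) == setup_direct(0));
   	assert(setup_with_goto(1) == setup_direct(1));
   	return 0;
   }
   ```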

### Summary

The patch is well-structured and follows DPDK conventions. The main issues are minor style inconsistencies (extra blank lines and parameter alignment). The code logic for enabling the simple Tx path based on queue configuration appears correct.

^ permalink raw reply	[flat|nested] 274+ messages in thread

* RE: [PATCH v3 01/36] net/intel: create common Tx descriptor structure
  2026-01-30 11:41   ` [PATCH v3 01/36] net/intel: create common Tx descriptor structure Bruce Richardson
@ 2026-02-06  9:56     ` Loftus, Ciara
  0 siblings, 0 replies; 274+ messages in thread
From: Loftus, Ciara @ 2026-02-06  9:56 UTC (permalink / raw)
  To: Richardson, Bruce, dev@dpdk.org
  Cc: Richardson, Bruce, Shetty, Praveen, Medvedkin, Vladimir,
	Burakov, Anatoly, Wu, Jingjing

> Subject: [PATCH v3 01/36] net/intel: create common Tx descriptor structure
> 
> The Tx descriptors used by the i40e, iavf, ice and idpf drivers are all
> identical 16-byte descriptors, so define a common struct for them. Since
> original struct definitions are in base code, leave them in place, but
> only use the new structs in DPDK code.
> 
> Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>

Acked-by: Ciara Loftus <ciara.loftus@intel.com>

> ---
>  drivers/net/intel/common/tx.h                 | 16 ++++++---
>  drivers/net/intel/cpfl/cpfl_rxtx.c            |  2 +-
>  drivers/net/intel/i40e/i40e_fdir.c            |  4 +--
>  drivers/net/intel/i40e/i40e_rxtx.c            | 26 +++++++-------
>  .../net/intel/i40e/i40e_rxtx_vec_altivec.c    |  6 ++--
>  drivers/net/intel/i40e/i40e_rxtx_vec_avx2.c   |  6 ++--
>  drivers/net/intel/i40e/i40e_rxtx_vec_avx512.c |  6 ++--
>  drivers/net/intel/i40e/i40e_rxtx_vec_neon.c   |  6 ++--
>  drivers/net/intel/iavf/iavf_rxtx.c            | 16 ++++-----
>  drivers/net/intel/iavf/iavf_rxtx.h            |  2 +-
>  drivers/net/intel/iavf/iavf_rxtx_vec_avx2.c   |  6 ++--
>  drivers/net/intel/iavf/iavf_rxtx_vec_avx512.c | 12 +++----
>  drivers/net/intel/ice/ice_dcf_ethdev.c        |  2 +-
>  drivers/net/intel/ice/ice_rxtx.c              | 36 +++++++++----------
>  drivers/net/intel/ice/ice_rxtx_vec_avx2.c     |  6 ++--
>  drivers/net/intel/ice/ice_rxtx_vec_avx512.c   |  6 ++--
>  drivers/net/intel/idpf/idpf_common_rxtx.c     | 20 +++++------
>  drivers/net/intel/idpf/idpf_common_rxtx.h     |  2 +-
>  .../net/intel/idpf/idpf_common_rxtx_avx2.c    |  8 ++---
>  .../net/intel/idpf/idpf_common_rxtx_avx512.c  |  8 ++---
>  drivers/net/intel/idpf/idpf_rxtx.c            |  2 +-
>  drivers/net/intel/idpf/idpf_rxtx_vec_common.h |  2 +-
>  22 files changed, 104 insertions(+), 96 deletions(-)
> 
> diff --git a/drivers/net/intel/common/tx.h b/drivers/net/intel/common/tx.h
> index e295d83e3a..d7561a2bbb 100644
> --- a/drivers/net/intel/common/tx.h
> +++ b/drivers/net/intel/common/tx.h
> @@ -10,6 +10,14 @@
>  #include <rte_ethdev.h>
>  #include <rte_vect.h>
> 
> +/*
> + * Structure of a 16-byte Tx descriptor common across i40e, ice, iavf and idpf
> drivers
> + */
> +struct ci_tx_desc {
> +	uint64_t buffer_addr; /* Address of descriptor's data buf */
> +	uint64_t cmd_type_offset_bsz;
> +};
> +
>  /* forward declaration of the common intel (ci) queue structure */
>  struct ci_tx_queue;
> 
> @@ -33,10 +41,10 @@ typedef void (*ice_tx_release_mbufs_t)(struct
> ci_tx_queue *txq);
> 
>  struct ci_tx_queue {
>  	union { /* TX ring virtual address */
> -		volatile struct i40e_tx_desc *i40e_tx_ring;
> -		volatile struct iavf_tx_desc *iavf_tx_ring;
> -		volatile struct ice_tx_desc *ice_tx_ring;
> -		volatile struct idpf_base_tx_desc *idpf_tx_ring;
> +		volatile struct ci_tx_desc *i40e_tx_ring;
> +		volatile struct ci_tx_desc *iavf_tx_ring;
> +		volatile struct ci_tx_desc *ice_tx_ring;
> +		volatile struct ci_tx_desc *idpf_tx_ring;
>  		volatile union ixgbe_adv_tx_desc *ixgbe_tx_ring;
>  	};
>  	volatile uint8_t *qtx_tail;               /* register address of tail */
> diff --git a/drivers/net/intel/cpfl/cpfl_rxtx.c
> b/drivers/net/intel/cpfl/cpfl_rxtx.c
> index d0438b5da0..78bc3e9b49 100644
> --- a/drivers/net/intel/cpfl/cpfl_rxtx.c
> +++ b/drivers/net/intel/cpfl/cpfl_rxtx.c
> @@ -131,7 +131,7 @@ cpfl_dma_zone_reserve(struct rte_eth_dev *dev,
> uint16_t queue_idx,
>  			ring_size = RTE_ALIGN(len * sizeof(struct
> idpf_flex_tx_sched_desc),
>  					      CPFL_DMA_MEM_ALIGN);
>  		else
> -			ring_size = RTE_ALIGN(len * sizeof(struct
> idpf_base_tx_desc),
> +			ring_size = RTE_ALIGN(len * sizeof(struct ci_tx_desc),
>  					      CPFL_DMA_MEM_ALIGN);
>  		memcpy(ring_name, "cpfl Tx ring", sizeof("cpfl Tx ring"));
>  		break;
> diff --git a/drivers/net/intel/i40e/i40e_fdir.c
> b/drivers/net/intel/i40e/i40e_fdir.c
> index 55d18c5d4a..605df73c9e 100644
> --- a/drivers/net/intel/i40e/i40e_fdir.c
> +++ b/drivers/net/intel/i40e/i40e_fdir.c
> @@ -1377,7 +1377,7 @@ i40e_find_available_buffer(struct rte_eth_dev
> *dev)
>  	 */
>  	if (fdir_info->txq_available_buf_count <= 0) {
>  		uint16_t tmp_tail;
> -		volatile struct i40e_tx_desc *tmp_txdp;
> +		volatile struct ci_tx_desc *tmp_txdp;
> 
>  		tmp_tail = txq->tx_tail;
>  		tmp_txdp = &txq->i40e_tx_ring[tmp_tail + 1];
> @@ -1628,7 +1628,7 @@ i40e_flow_fdir_filter_programming(struct i40e_pf
> *pf,
>  	struct ci_tx_queue *txq = pf->fdir.txq;
>  	struct ci_rx_queue *rxq = pf->fdir.rxq;
>  	const struct i40e_fdir_action *fdir_action = &filter->action;
> -	volatile struct i40e_tx_desc *txdp;
> +	volatile struct ci_tx_desc *txdp;
>  	volatile struct i40e_filter_program_desc *fdirdp;
>  	uint32_t td_cmd;
>  	uint16_t vsi_id;
> diff --git a/drivers/net/intel/i40e/i40e_rxtx.c
> b/drivers/net/intel/i40e/i40e_rxtx.c
> index 1c3586778c..92d49ccb79 100644
> --- a/drivers/net/intel/i40e/i40e_rxtx.c
> +++ b/drivers/net/intel/i40e/i40e_rxtx.c
> @@ -388,7 +388,7 @@ static inline int
>  i40e_xmit_cleanup(struct ci_tx_queue *txq)
>  {
>  	struct ci_tx_entry *sw_ring = txq->sw_ring;
> -	volatile struct i40e_tx_desc *txd = txq->i40e_tx_ring;
> +	volatile struct ci_tx_desc *txd = txq->i40e_tx_ring;
>  	uint16_t last_desc_cleaned = txq->last_desc_cleaned;
>  	uint16_t nb_tx_desc = txq->nb_tx_desc;
>  	uint16_t desc_to_clean_to;
> @@ -1092,8 +1092,8 @@ i40e_xmit_pkts(void *tx_queue, struct rte_mbuf
> **tx_pkts, uint16_t nb_pkts)
>  	struct ci_tx_queue *txq;
>  	struct ci_tx_entry *sw_ring;
>  	struct ci_tx_entry *txe, *txn;
> -	volatile struct i40e_tx_desc *txd;
> -	volatile struct i40e_tx_desc *txr;
> +	volatile struct ci_tx_desc *txd;
> +	volatile struct ci_tx_desc *txr;
>  	struct rte_mbuf *tx_pkt;
>  	struct rte_mbuf *m_seg;
>  	uint32_t cd_tunneling_params;
> @@ -1398,7 +1398,7 @@ i40e_tx_free_bufs(struct ci_tx_queue *txq)
> 
>  /* Populate 4 descriptors with data from 4 mbufs */
>  static inline void
> -tx4(volatile struct i40e_tx_desc *txdp, struct rte_mbuf **pkts)
> +tx4(volatile struct ci_tx_desc *txdp, struct rte_mbuf **pkts)
>  {
>  	uint64_t dma_addr;
>  	uint32_t i;
> @@ -1414,7 +1414,7 @@ tx4(volatile struct i40e_tx_desc *txdp, struct
> rte_mbuf **pkts)
> 
>  /* Populate 1 descriptor with data from 1 mbuf */
>  static inline void
> -tx1(volatile struct i40e_tx_desc *txdp, struct rte_mbuf **pkts)
> +tx1(volatile struct ci_tx_desc *txdp, struct rte_mbuf **pkts)
>  {
>  	uint64_t dma_addr;
> 
> @@ -1431,7 +1431,7 @@ i40e_tx_fill_hw_ring(struct ci_tx_queue *txq,
>  		     struct rte_mbuf **pkts,
>  		     uint16_t nb_pkts)
>  {
> -	volatile struct i40e_tx_desc *txdp = &txq->i40e_tx_ring[txq->tx_tail];
> +	volatile struct ci_tx_desc *txdp = &txq->i40e_tx_ring[txq->tx_tail];
>  	struct ci_tx_entry *txep = &txq->sw_ring[txq->tx_tail];
>  	const int N_PER_LOOP = 4;
>  	const int N_PER_LOOP_MASK = N_PER_LOOP - 1;
> @@ -1459,7 +1459,7 @@ tx_xmit_pkts(struct ci_tx_queue *txq,
>  	     struct rte_mbuf **tx_pkts,
>  	     uint16_t nb_pkts)
>  {
> -	volatile struct i40e_tx_desc *txr = txq->i40e_tx_ring;
> +	volatile struct ci_tx_desc *txr = txq->i40e_tx_ring;
>  	uint16_t n = 0;
> 
>  	/**
> @@ -2616,7 +2616,7 @@ i40e_dev_tx_queue_setup(struct rte_eth_dev
> *dev,
>  	}
> 
>  	/* Allocate TX hardware ring descriptors. */
> -	ring_size = sizeof(struct i40e_tx_desc) * I40E_MAX_RING_DESC;
> +	ring_size = sizeof(struct ci_tx_desc) * I40E_MAX_RING_DESC;
>  	ring_size = RTE_ALIGN(ring_size, I40E_DMA_MEM_ALIGN);
>  	tz = rte_eth_dma_zone_reserve(dev, "i40e_tx_ring", queue_idx,
>  			      ring_size, I40E_RING_BASE_ALIGN, socket_id);
> @@ -2640,7 +2640,7 @@ i40e_dev_tx_queue_setup(struct rte_eth_dev
> *dev,
>  	txq->tx_deferred_start = tx_conf->tx_deferred_start;
> 
>  	txq->tx_ring_dma = tz->iova;
> -	txq->i40e_tx_ring = (struct i40e_tx_desc *)tz->addr;
> +	txq->i40e_tx_ring = (struct ci_tx_desc *)tz->addr;
> 
>  	/* Allocate software ring */
>  	txq->sw_ring =
> @@ -2913,13 +2913,13 @@ i40e_reset_tx_queue(struct ci_tx_queue *txq)
>  	}
> 
>  	txe = txq->sw_ring;
> -	size = sizeof(struct i40e_tx_desc) * txq->nb_tx_desc;
> +	size = sizeof(struct ci_tx_desc) * txq->nb_tx_desc;
>  	for (i = 0; i < size; i++)
>  		((volatile char *)txq->i40e_tx_ring)[i] = 0;
> 
>  	prev = (uint16_t)(txq->nb_tx_desc - 1);
>  	for (i = 0; i < txq->nb_tx_desc; i++) {
> -		volatile struct i40e_tx_desc *txd = &txq->i40e_tx_ring[i];
> +		volatile struct ci_tx_desc *txd = &txq->i40e_tx_ring[i];
> 
>  		txd->cmd_type_offset_bsz =
> 
> 	rte_cpu_to_le_64(I40E_TX_DESC_DTYPE_DESC_DONE);
> @@ -3221,7 +3221,7 @@ i40e_fdir_setup_tx_resources(struct i40e_pf *pf)
>  	}
> 
>  	/* Allocate TX hardware ring descriptors. */
> -	ring_size = sizeof(struct i40e_tx_desc) * I40E_FDIR_NUM_TX_DESC;
> +	ring_size = sizeof(struct ci_tx_desc) * I40E_FDIR_NUM_TX_DESC;
>  	ring_size = RTE_ALIGN(ring_size, I40E_DMA_MEM_ALIGN);
> 
>  	tz = rte_eth_dma_zone_reserve(dev, "fdir_tx_ring",
> @@ -3240,7 +3240,7 @@ i40e_fdir_setup_tx_resources(struct i40e_pf *pf)
>  	txq->i40e_vsi = pf->fdir.fdir_vsi;
> 
>  	txq->tx_ring_dma = tz->iova;
> -	txq->i40e_tx_ring = (struct i40e_tx_desc *)tz->addr;
> +	txq->i40e_tx_ring = (struct ci_tx_desc *)tz->addr;
> 
>  	/*
>  	 * don't need to allocate software ring and reset for the fdir
> diff --git a/drivers/net/intel/i40e/i40e_rxtx_vec_altivec.c
> b/drivers/net/intel/i40e/i40e_rxtx_vec_altivec.c
> index bbb6d907cf..ef5b252898 100644
> --- a/drivers/net/intel/i40e/i40e_rxtx_vec_altivec.c
> +++ b/drivers/net/intel/i40e/i40e_rxtx_vec_altivec.c
> @@ -446,7 +446,7 @@ i40e_recv_scattered_pkts_vec(void *rx_queue, struct
> rte_mbuf **rx_pkts,
>  }
> 
>  static inline void
> -vtx1(volatile struct i40e_tx_desc *txdp,
> +vtx1(volatile struct ci_tx_desc *txdp,
>  	struct rte_mbuf *pkt, uint64_t flags)
>  {
>  	uint64_t high_qw = (I40E_TX_DESC_DTYPE_DATA |
> @@ -459,7 +459,7 @@ vtx1(volatile struct i40e_tx_desc *txdp,
>  }
> 
>  static inline void
> -vtx(volatile struct i40e_tx_desc *txdp,
> +vtx(volatile struct ci_tx_desc *txdp,
>  	struct rte_mbuf **pkt, uint16_t nb_pkts,  uint64_t flags)
>  {
>  	int i;
> @@ -473,7 +473,7 @@ i40e_xmit_fixed_burst_vec(void *tx_queue, struct
> rte_mbuf **tx_pkts,
>  			  uint16_t nb_pkts)
>  {
>  	struct ci_tx_queue *txq = (struct ci_tx_queue *)tx_queue;
> -	volatile struct i40e_tx_desc *txdp;
> +	volatile struct ci_tx_desc *txdp;
>  	struct ci_tx_entry_vec *txep;
>  	uint16_t n, nb_commit, tx_id;
>  	uint64_t flags = I40E_TD_CMD;
> diff --git a/drivers/net/intel/i40e/i40e_rxtx_vec_avx2.c
> b/drivers/net/intel/i40e/i40e_rxtx_vec_avx2.c
> index 4e398b3140..137c1f9765 100644
> --- a/drivers/net/intel/i40e/i40e_rxtx_vec_avx2.c
> +++ b/drivers/net/intel/i40e/i40e_rxtx_vec_avx2.c
> @@ -681,7 +681,7 @@ i40e_recv_scattered_pkts_vec_avx2(void *rx_queue,
> struct rte_mbuf **rx_pkts,
> 
> 
>  static inline void
> -vtx1(volatile struct i40e_tx_desc *txdp,
> +vtx1(volatile struct ci_tx_desc *txdp,
>  		struct rte_mbuf *pkt, uint64_t flags)
>  {
>  	uint64_t high_qw = (I40E_TX_DESC_DTYPE_DATA |
> @@ -694,7 +694,7 @@ vtx1(volatile struct i40e_tx_desc *txdp,
>  }
> 
>  static inline void
> -vtx(volatile struct i40e_tx_desc *txdp,
> +vtx(volatile struct ci_tx_desc *txdp,
>  		struct rte_mbuf **pkt, uint16_t nb_pkts,  uint64_t flags)
>  {
>  	const uint64_t hi_qw_tmpl = (I40E_TX_DESC_DTYPE_DATA |
> @@ -739,7 +739,7 @@ i40e_xmit_fixed_burst_vec_avx2(void *tx_queue,
> struct rte_mbuf **tx_pkts,
>  			  uint16_t nb_pkts)
>  {
>  	struct ci_tx_queue *txq = (struct ci_tx_queue *)tx_queue;
> -	volatile struct i40e_tx_desc *txdp;
> +	volatile struct ci_tx_desc *txdp;
>  	struct ci_tx_entry_vec *txep;
>  	uint16_t n, nb_commit, tx_id;
>  	uint64_t flags = I40E_TD_CMD;
> diff --git a/drivers/net/intel/i40e/i40e_rxtx_vec_avx512.c
> b/drivers/net/intel/i40e/i40e_rxtx_vec_avx512.c
> index 571987d27a..6971488750 100644
> --- a/drivers/net/intel/i40e/i40e_rxtx_vec_avx512.c
> +++ b/drivers/net/intel/i40e/i40e_rxtx_vec_avx512.c
> @@ -750,7 +750,7 @@ i40e_recv_scattered_pkts_vec_avx512(void
> *rx_queue,
>  }
> 
>  static inline void
> -vtx1(volatile struct i40e_tx_desc *txdp, struct rte_mbuf *pkt, uint64_t flags)
> +vtx1(volatile struct ci_tx_desc *txdp, struct rte_mbuf *pkt, uint64_t flags)
>  {
>  	uint64_t high_qw = (I40E_TX_DESC_DTYPE_DATA |
>  		((uint64_t)flags  << I40E_TXD_QW1_CMD_SHIFT) |
> @@ -762,7 +762,7 @@ vtx1(volatile struct i40e_tx_desc *txdp, struct rte_mbuf *pkt, uint64_t flags)
>  }
> 
>  static inline void
> -vtx(volatile struct i40e_tx_desc *txdp,
> +vtx(volatile struct ci_tx_desc *txdp,
>  	struct rte_mbuf **pkt, uint16_t nb_pkts,  uint64_t flags)
>  {
>  	const uint64_t hi_qw_tmpl = (I40E_TX_DESC_DTYPE_DATA |
> @@ -807,7 +807,7 @@ i40e_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pkts,
>  				 uint16_t nb_pkts)
>  {
>  	struct ci_tx_queue *txq = (struct ci_tx_queue *)tx_queue;
> -	volatile struct i40e_tx_desc *txdp;
> +	volatile struct ci_tx_desc *txdp;
>  	struct ci_tx_entry_vec *txep;
>  	uint16_t n, nb_commit, tx_id;
>  	uint64_t flags = I40E_TD_CMD;
> diff --git a/drivers/net/intel/i40e/i40e_rxtx_vec_neon.c b/drivers/net/intel/i40e/i40e_rxtx_vec_neon.c
> index b5be0c1b59..6404b70c56 100644
> --- a/drivers/net/intel/i40e/i40e_rxtx_vec_neon.c
> +++ b/drivers/net/intel/i40e/i40e_rxtx_vec_neon.c
> @@ -597,7 +597,7 @@ i40e_recv_scattered_pkts_vec(void *rx_queue, struct rte_mbuf **rx_pkts,
>  }
> 
>  static inline void
> -vtx1(volatile struct i40e_tx_desc *txdp,
> +vtx1(volatile struct ci_tx_desc *txdp,
>  		struct rte_mbuf *pkt, uint64_t flags)
>  {
>  	uint64_t high_qw = (I40E_TX_DESC_DTYPE_DATA |
> @@ -609,7 +609,7 @@ vtx1(volatile struct i40e_tx_desc *txdp,
>  }
> 
>  static inline void
> -vtx(volatile struct i40e_tx_desc *txdp, struct rte_mbuf **pkt,
> +vtx(volatile struct ci_tx_desc *txdp, struct rte_mbuf **pkt,
>  		uint16_t nb_pkts,  uint64_t flags)
>  {
>  	int i;
> @@ -623,7 +623,7 @@ i40e_xmit_fixed_burst_vec(void *__rte_restrict tx_queue,
>  	struct rte_mbuf **__rte_restrict tx_pkts, uint16_t nb_pkts)
>  {
>  	struct ci_tx_queue *txq = (struct ci_tx_queue *)tx_queue;
> -	volatile struct i40e_tx_desc *txdp;
> +	volatile struct ci_tx_desc *txdp;
>  	struct ci_tx_entry_vec *txep;
>  	uint16_t n, nb_commit, tx_id;
>  	uint64_t flags = I40E_TD_CMD;
> diff --git a/drivers/net/intel/iavf/iavf_rxtx.c b/drivers/net/intel/iavf/iavf_rxtx.c
> index 4b763627bc..e4421a9932 100644
> --- a/drivers/net/intel/iavf/iavf_rxtx.c
> +++ b/drivers/net/intel/iavf/iavf_rxtx.c
> @@ -267,7 +267,7 @@ reset_tx_queue(struct ci_tx_queue *txq)
>  	}
> 
>  	txe = txq->sw_ring;
> -	size = sizeof(struct iavf_tx_desc) * txq->nb_tx_desc;
> +	size = sizeof(struct ci_tx_desc) * txq->nb_tx_desc;
>  	for (i = 0; i < size; i++)
>  		((volatile char *)txq->iavf_tx_ring)[i] = 0;
> 
> @@ -827,7 +827,7 @@ iavf_dev_tx_queue_setup(struct rte_eth_dev *dev,
>  	}
> 
>  	/* Allocate TX hardware ring descriptors. */
> -	ring_size = sizeof(struct iavf_tx_desc) * IAVF_MAX_RING_DESC;
> +	ring_size = sizeof(struct ci_tx_desc) * IAVF_MAX_RING_DESC;
>  	ring_size = RTE_ALIGN(ring_size, IAVF_DMA_MEM_ALIGN);
>  	mz = rte_eth_dma_zone_reserve(dev, "iavf_tx_ring", queue_idx,
>  				      ring_size, IAVF_RING_BASE_ALIGN,
> @@ -839,7 +839,7 @@ iavf_dev_tx_queue_setup(struct rte_eth_dev *dev,
>  		return -ENOMEM;
>  	}
>  	txq->tx_ring_dma = mz->iova;
> -	txq->iavf_tx_ring = (struct iavf_tx_desc *)mz->addr;
> +	txq->iavf_tx_ring = (struct ci_tx_desc *)mz->addr;
> 
>  	txq->mz = mz;
>  	reset_tx_queue(txq);
> @@ -2333,7 +2333,7 @@ iavf_xmit_cleanup(struct ci_tx_queue *txq)
>  	uint16_t desc_to_clean_to;
>  	uint16_t nb_tx_to_clean;
> 
> -	volatile struct iavf_tx_desc *txd = txq->iavf_tx_ring;
> +	volatile struct ci_tx_desc *txd = txq->iavf_tx_ring;
> 
>  	desc_to_clean_to = (uint16_t)(last_desc_cleaned + txq->tx_rs_thresh);
>  	if (desc_to_clean_to >= nb_tx_desc)
> @@ -2723,7 +2723,7 @@ iavf_calc_pkt_desc(struct rte_mbuf *tx_pkt)
>  }
> 
>  static inline void
> -iavf_fill_data_desc(volatile struct iavf_tx_desc *desc,
> +iavf_fill_data_desc(volatile struct ci_tx_desc *desc,
>  	uint64_t desc_template,	uint16_t buffsz,
>  	uint64_t buffer_addr)
>  {
> @@ -2756,7 +2756,7 @@ uint16_t
>  iavf_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
>  {
>  	struct ci_tx_queue *txq = tx_queue;
> -	volatile struct iavf_tx_desc *txr = txq->iavf_tx_ring;
> +	volatile struct ci_tx_desc *txr = txq->iavf_tx_ring;
>  	struct ci_tx_entry *txe_ring = txq->sw_ring;
>  	struct ci_tx_entry *txe, *txn;
>  	struct rte_mbuf *mb, *mb_seg;
> @@ -2774,7 +2774,7 @@ iavf_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
>  	txe = &txe_ring[desc_idx];
> 
>  	for (idx = 0; idx < nb_pkts; idx++) {
> -		volatile struct iavf_tx_desc *ddesc;
> +		volatile struct ci_tx_desc *ddesc;
>  		struct iavf_ipsec_crypto_pkt_metadata *ipsec_md;
> 
>  		uint16_t nb_desc_ctx, nb_desc_ipsec;
> @@ -2895,7 +2895,7 @@ iavf_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
>  		mb_seg = mb;
> 
>  		do {
> -			ddesc = (volatile struct iavf_tx_desc *)
> +			ddesc = (volatile struct ci_tx_desc *)
>  					&txr[desc_idx];
> 
>  			txn = &txe_ring[txe->next_id];
> diff --git a/drivers/net/intel/iavf/iavf_rxtx.h b/drivers/net/intel/iavf/iavf_rxtx.h
> index e1f78dcde0..dd6d884fc1 100644
> --- a/drivers/net/intel/iavf/iavf_rxtx.h
> +++ b/drivers/net/intel/iavf/iavf_rxtx.h
> @@ -678,7 +678,7 @@ void iavf_dump_tx_descriptor(const struct ci_tx_queue *txq,
>  			    const volatile void *desc, uint16_t tx_id)
>  {
>  	const char *name;
> -	const volatile struct iavf_tx_desc *tx_desc = desc;
> +	const volatile struct ci_tx_desc *tx_desc = desc;
>  	enum iavf_tx_desc_dtype_value type;
> 
> 
> diff --git a/drivers/net/intel/iavf/iavf_rxtx_vec_avx2.c b/drivers/net/intel/iavf/iavf_rxtx_vec_avx2.c
> index e29958e0bc..5b62d51cf7 100644
> --- a/drivers/net/intel/iavf/iavf_rxtx_vec_avx2.c
> +++ b/drivers/net/intel/iavf/iavf_rxtx_vec_avx2.c
> @@ -1630,7 +1630,7 @@ iavf_recv_scattered_pkts_vec_avx2_flex_rxd_offload(void *rx_queue,
> 
> 
>  static __rte_always_inline void
> -iavf_vtx1(volatile struct iavf_tx_desc *txdp,
> +iavf_vtx1(volatile struct ci_tx_desc *txdp,
>  	  struct rte_mbuf *pkt, uint64_t flags, bool offload, uint8_t vlan_flag)
>  {
>  	uint64_t high_qw =
> @@ -1646,7 +1646,7 @@ iavf_vtx1(volatile struct iavf_tx_desc *txdp,
>  }
> 
>  static __rte_always_inline void
> -iavf_vtx(volatile struct iavf_tx_desc *txdp,
> +iavf_vtx(volatile struct ci_tx_desc *txdp,
>  	 struct rte_mbuf **pkt, uint16_t nb_pkts,  uint64_t flags, bool offload, uint8_t vlan_flag)
>  {
>  	const uint64_t hi_qw_tmpl = (IAVF_TX_DESC_DTYPE_DATA |
> @@ -1713,7 +1713,7 @@ iavf_xmit_fixed_burst_vec_avx2(void *tx_queue, struct rte_mbuf **tx_pkts,
>  			       uint16_t nb_pkts, bool offload)
>  {
>  	struct ci_tx_queue *txq = (struct ci_tx_queue *)tx_queue;
> -	volatile struct iavf_tx_desc *txdp;
> +	volatile struct ci_tx_desc *txdp;
>  	struct ci_tx_entry_vec *txep;
>  	uint16_t n, nb_commit, tx_id;
>  	/* bit2 is reserved and must be set to 1 according to Spec */
> diff --git a/drivers/net/intel/iavf/iavf_rxtx_vec_avx512.c b/drivers/net/intel/iavf/iavf_rxtx_vec_avx512.c
> index 7c0907b7cf..d79d96c7b7 100644
> --- a/drivers/net/intel/iavf/iavf_rxtx_vec_avx512.c
> +++ b/drivers/net/intel/iavf/iavf_rxtx_vec_avx512.c
> @@ -1840,7 +1840,7 @@ tx_backlog_entry_avx512(struct ci_tx_entry_vec *txep,
>  }
> 
>  static __rte_always_inline void
> -iavf_vtx1(volatile struct iavf_tx_desc *txdp,
> +iavf_vtx1(volatile struct ci_tx_desc *txdp,
>  	  struct rte_mbuf *pkt, uint64_t flags,
>  	  bool offload, uint8_t vlan_flag)
>  {
> @@ -1859,7 +1859,7 @@ iavf_vtx1(volatile struct iavf_tx_desc *txdp,
>  #define IAVF_TX_LEN_MASK 0xAA
>  #define IAVF_TX_OFF_MASK 0x55
>  static __rte_always_inline void
> -iavf_vtx(volatile struct iavf_tx_desc *txdp,
> +iavf_vtx(volatile struct ci_tx_desc *txdp,
>  		struct rte_mbuf **pkt, uint16_t nb_pkts,  uint64_t flags,
>  		bool offload, uint8_t vlan_flag)
>  {
> @@ -2068,7 +2068,7 @@ iavf_fill_ctx_desc_tunnelling_field(volatile uint64_t *qw0,
>  }
> 
>  static __rte_always_inline void
> -ctx_vtx1(volatile struct iavf_tx_desc *txdp, struct rte_mbuf *pkt,
> +ctx_vtx1(volatile struct ci_tx_desc *txdp, struct rte_mbuf *pkt,
>  		uint64_t flags, bool offload, uint8_t vlan_flag)
>  {
>  	uint64_t high_ctx_qw = IAVF_TX_DESC_DTYPE_CONTEXT;
> @@ -2106,7 +2106,7 @@ ctx_vtx1(volatile struct iavf_tx_desc *txdp, struct rte_mbuf *pkt,
>  }
> 
>  static __rte_always_inline void
> -ctx_vtx(volatile struct iavf_tx_desc *txdp,
> +ctx_vtx(volatile struct ci_tx_desc *txdp,
>  		struct rte_mbuf **pkt, uint16_t nb_pkts,  uint64_t flags,
>  		bool offload, uint8_t vlan_flag)
>  {
> @@ -2203,7 +2203,7 @@ iavf_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pkts,
>  				 uint16_t nb_pkts, bool offload)
>  {
>  	struct ci_tx_queue *txq = (struct ci_tx_queue *)tx_queue;
> -	volatile struct iavf_tx_desc *txdp;
> +	volatile struct ci_tx_desc *txdp;
>  	struct ci_tx_entry_vec *txep;
>  	uint16_t n, nb_commit, tx_id;
>  	/* bit2 is reserved and must be set to 1 according to Spec */
> @@ -2271,7 +2271,7 @@ iavf_xmit_fixed_burst_vec_avx512_ctx(void *tx_queue, struct rte_mbuf **tx_pkts,
>  				 uint16_t nb_pkts, bool offload)
>  {
>  	struct ci_tx_queue *txq = (struct ci_tx_queue *)tx_queue;
> -	volatile struct iavf_tx_desc *txdp;
> +	volatile struct ci_tx_desc *txdp;
>  	struct ci_tx_entry_vec *txep;
>  	uint16_t n, nb_commit, nb_mbuf, tx_id;
>  	/* bit2 is reserved and must be set to 1 according to Spec */
> diff --git a/drivers/net/intel/ice/ice_dcf_ethdev.c b/drivers/net/intel/ice/ice_dcf_ethdev.c
> index 81da5a4656..ab1d499cef 100644
> --- a/drivers/net/intel/ice/ice_dcf_ethdev.c
> +++ b/drivers/net/intel/ice/ice_dcf_ethdev.c
> @@ -399,7 +399,7 @@ reset_tx_queue(struct ci_tx_queue *txq)
>  	}
> 
>  	txe = txq->sw_ring;
> -	size = sizeof(struct ice_tx_desc) * txq->nb_tx_desc;
> +	size = sizeof(struct ci_tx_desc) * txq->nb_tx_desc;
>  	for (i = 0; i < size; i++)
>  		((volatile char *)txq->ice_tx_ring)[i] = 0;
> 
> diff --git a/drivers/net/intel/ice/ice_rxtx.c b/drivers/net/intel/ice/ice_rxtx.c
> index f3bc79423d..74b80e7df3 100644
> --- a/drivers/net/intel/ice/ice_rxtx.c
> +++ b/drivers/net/intel/ice/ice_rxtx.c
> @@ -1115,13 +1115,13 @@ ice_reset_tx_queue(struct ci_tx_queue *txq)
>  	}
> 
>  	txe = txq->sw_ring;
> -	size = sizeof(struct ice_tx_desc) * txq->nb_tx_desc;
> +	size = sizeof(struct ci_tx_desc) * txq->nb_tx_desc;
>  	for (i = 0; i < size; i++)
>  		((volatile char *)txq->ice_tx_ring)[i] = 0;
> 
>  	prev = (uint16_t)(txq->nb_tx_desc - 1);
>  	for (i = 0; i < txq->nb_tx_desc; i++) {
> -		volatile struct ice_tx_desc *txd = &txq->ice_tx_ring[i];
> +		volatile struct ci_tx_desc *txd = &txq->ice_tx_ring[i];
> 
>  		txd->cmd_type_offset_bsz =
>  			rte_cpu_to_le_64(ICE_TX_DESC_DTYPE_DESC_DONE);
> @@ -1623,7 +1623,7 @@ ice_tx_queue_setup(struct rte_eth_dev *dev,
>  	}
> 
>  	/* Allocate TX hardware ring descriptors. */
> -	ring_size = sizeof(struct ice_tx_desc) * ICE_MAX_NUM_DESC_BY_MAC(hw);
> +	ring_size = sizeof(struct ci_tx_desc) * ICE_MAX_NUM_DESC_BY_MAC(hw);
>  	ring_size = RTE_ALIGN(ring_size, ICE_DMA_MEM_ALIGN);
>  	tz = rte_eth_dma_zone_reserve(dev, "ice_tx_ring", queue_idx,
>  				      ring_size, ICE_RING_BASE_ALIGN,
> @@ -2619,7 +2619,7 @@ ice_fdir_setup_tx_resources(struct ice_pf *pf)
>  	}
> 
>  	/* Allocate TX hardware ring descriptors. */
> -	ring_size = sizeof(struct ice_tx_desc) * ICE_FDIR_NUM_TX_DESC;
> +	ring_size = sizeof(struct ci_tx_desc) * ICE_FDIR_NUM_TX_DESC;
>  	ring_size = RTE_ALIGN(ring_size, ICE_DMA_MEM_ALIGN);
> 
>  	tz = rte_eth_dma_zone_reserve(dev, "fdir_tx_ring",
> @@ -2638,7 +2638,7 @@ ice_fdir_setup_tx_resources(struct ice_pf *pf)
>  	txq->ice_vsi = pf->fdir.fdir_vsi;
> 
>  	txq->tx_ring_dma = tz->iova;
> -	txq->ice_tx_ring = (struct ice_tx_desc *)tz->addr;
> +	txq->ice_tx_ring = (struct ci_tx_desc *)tz->addr;
>  	/*
>  	 * don't need to allocate software ring and reset for the fdir
>  	 * program queue just set the queue has been configured.
> @@ -3027,7 +3027,7 @@ static inline int
>  ice_xmit_cleanup(struct ci_tx_queue *txq)
>  {
>  	struct ci_tx_entry *sw_ring = txq->sw_ring;
> -	volatile struct ice_tx_desc *txd = txq->ice_tx_ring;
> +	volatile struct ci_tx_desc *txd = txq->ice_tx_ring;
>  	uint16_t last_desc_cleaned = txq->last_desc_cleaned;
>  	uint16_t nb_tx_desc = txq->nb_tx_desc;
>  	uint16_t desc_to_clean_to;
> @@ -3148,8 +3148,8 @@ uint16_t
>  ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
>  {
>  	struct ci_tx_queue *txq;
> -	volatile struct ice_tx_desc *ice_tx_ring;
> -	volatile struct ice_tx_desc *txd;
> +	volatile struct ci_tx_desc *ice_tx_ring;
> +	volatile struct ci_tx_desc *txd;
>  	struct ci_tx_entry *sw_ring;
>  	struct ci_tx_entry *txe, *txn;
>  	struct rte_mbuf *tx_pkt;
> @@ -3312,7 +3312,7 @@ ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
> 
>  			while ((ol_flags & (RTE_MBUF_F_TX_TCP_SEG | RTE_MBUF_F_TX_UDP_SEG)) &&
>  				unlikely(slen > ICE_MAX_DATA_PER_TXD)) {
> -				txd->buf_addr = rte_cpu_to_le_64(buf_dma_addr);
> +				txd->buffer_addr = rte_cpu_to_le_64(buf_dma_addr);
>  				txd->cmd_type_offset_bsz =
>  				rte_cpu_to_le_64(ICE_TX_DESC_DTYPE_DATA |
>  				((uint64_t)td_cmd << ICE_TXD_QW1_CMD_S) |
> @@ -3331,7 +3331,7 @@ ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
>  				txn = &sw_ring[txe->next_id];
>  			}
> 
> -			txd->buf_addr = rte_cpu_to_le_64(buf_dma_addr);
> +			txd->buffer_addr = rte_cpu_to_le_64(buf_dma_addr);
>  			txd->cmd_type_offset_bsz =
>  				rte_cpu_to_le_64(ICE_TX_DESC_DTYPE_DATA |
>  				((uint64_t)td_cmd << ICE_TXD_QW1_CMD_S) |
> @@ -3563,14 +3563,14 @@ ice_tx_done_cleanup(void *txq, uint32_t free_cnt)
> 
>  /* Populate 4 descriptors with data from 4 mbufs */
>  static inline void
> -tx4(volatile struct ice_tx_desc *txdp, struct rte_mbuf **pkts)
> +tx4(volatile struct ci_tx_desc *txdp, struct rte_mbuf **pkts)
>  {
>  	uint64_t dma_addr;
>  	uint32_t i;
> 
>  	for (i = 0; i < 4; i++, txdp++, pkts++) {
>  		dma_addr = rte_mbuf_data_iova(*pkts);
> -		txdp->buf_addr = rte_cpu_to_le_64(dma_addr);
> +		txdp->buffer_addr = rte_cpu_to_le_64(dma_addr);
>  		txdp->cmd_type_offset_bsz =
>  			ice_build_ctob((uint32_t)ICE_TD_CMD, 0,
>  				       (*pkts)->data_len, 0);
> @@ -3579,12 +3579,12 @@ tx4(volatile struct ice_tx_desc *txdp, struct rte_mbuf **pkts)
> 
>  /* Populate 1 descriptor with data from 1 mbuf */
>  static inline void
> -tx1(volatile struct ice_tx_desc *txdp, struct rte_mbuf **pkts)
> +tx1(volatile struct ci_tx_desc *txdp, struct rte_mbuf **pkts)
>  {
>  	uint64_t dma_addr;
> 
>  	dma_addr = rte_mbuf_data_iova(*pkts);
> -	txdp->buf_addr = rte_cpu_to_le_64(dma_addr);
> +	txdp->buffer_addr = rte_cpu_to_le_64(dma_addr);
>  	txdp->cmd_type_offset_bsz =
>  		ice_build_ctob((uint32_t)ICE_TD_CMD, 0,
>  			       (*pkts)->data_len, 0);
> @@ -3594,7 +3594,7 @@ static inline void
>  ice_tx_fill_hw_ring(struct ci_tx_queue *txq, struct rte_mbuf **pkts,
>  		    uint16_t nb_pkts)
>  {
> -	volatile struct ice_tx_desc *txdp = &txq->ice_tx_ring[txq->tx_tail];
> +	volatile struct ci_tx_desc *txdp = &txq->ice_tx_ring[txq->tx_tail];
>  	struct ci_tx_entry *txep = &txq->sw_ring[txq->tx_tail];
>  	const int N_PER_LOOP = 4;
>  	const int N_PER_LOOP_MASK = N_PER_LOOP - 1;
> @@ -3627,7 +3627,7 @@ tx_xmit_pkts(struct ci_tx_queue *txq,
>  	     struct rte_mbuf **tx_pkts,
>  	     uint16_t nb_pkts)
>  {
> -	volatile struct ice_tx_desc *txr = txq->ice_tx_ring;
> +	volatile struct ci_tx_desc *txr = txq->ice_tx_ring;
>  	uint16_t n = 0;
> 
>  	/**
> @@ -4882,7 +4882,7 @@ ice_fdir_programming(struct ice_pf *pf, struct ice_fltr_desc *fdir_desc)
>  	struct ci_tx_queue *txq = pf->fdir.txq;
>  	struct ci_rx_queue *rxq = pf->fdir.rxq;
>  	volatile struct ice_fltr_desc *fdirdp;
> -	volatile struct ice_tx_desc *txdp;
> +	volatile struct ci_tx_desc *txdp;
>  	uint32_t td_cmd;
>  	uint16_t i;
> 
> @@ -4892,7 +4892,7 @@ ice_fdir_programming(struct ice_pf *pf, struct ice_fltr_desc *fdir_desc)
>  	fdirdp->dtype_cmd_vsi_fdid = fdir_desc->dtype_cmd_vsi_fdid;
> 
>  	txdp = &txq->ice_tx_ring[txq->tx_tail + 1];
> -	txdp->buf_addr = rte_cpu_to_le_64(pf->fdir.dma_addr);
> +	txdp->buffer_addr = rte_cpu_to_le_64(pf->fdir.dma_addr);
>  	td_cmd = ICE_TX_DESC_CMD_EOP |
>  		ICE_TX_DESC_CMD_RS  |
>  		ICE_TX_DESC_CMD_DUMMY;
> diff --git a/drivers/net/intel/ice/ice_rxtx_vec_avx2.c b/drivers/net/intel/ice/ice_rxtx_vec_avx2.c
> index 0ba1d557ca..bef7bb00ba 100644
> --- a/drivers/net/intel/ice/ice_rxtx_vec_avx2.c
> +++ b/drivers/net/intel/ice/ice_rxtx_vec_avx2.c
> @@ -774,7 +774,7 @@ ice_recv_scattered_pkts_vec_avx2_offload(void *rx_queue,
>  }
> 
>  static __rte_always_inline void
> -ice_vtx1(volatile struct ice_tx_desc *txdp,
> +ice_vtx1(volatile struct ci_tx_desc *txdp,
>  	 struct rte_mbuf *pkt, uint64_t flags, bool offload)
>  {
>  	uint64_t high_qw =
> @@ -789,7 +789,7 @@ ice_vtx1(volatile struct ice_tx_desc *txdp,
>  }
> 
>  static __rte_always_inline void
> -ice_vtx(volatile struct ice_tx_desc *txdp,
> +ice_vtx(volatile struct ci_tx_desc *txdp,
>  	struct rte_mbuf **pkt, uint16_t nb_pkts,  uint64_t flags, bool offload)
>  {
>  	const uint64_t hi_qw_tmpl = (ICE_TX_DESC_DTYPE_DATA |
> @@ -852,7 +852,7 @@ ice_xmit_fixed_burst_vec_avx2(void *tx_queue, struct rte_mbuf **tx_pkts,
>  			      uint16_t nb_pkts, bool offload)
>  {
>  	struct ci_tx_queue *txq = (struct ci_tx_queue *)tx_queue;
> -	volatile struct ice_tx_desc *txdp;
> +	volatile struct ci_tx_desc *txdp;
>  	struct ci_tx_entry_vec *txep;
>  	uint16_t n, nb_commit, tx_id;
>  	uint64_t flags = ICE_TD_CMD;
> diff --git a/drivers/net/intel/ice/ice_rxtx_vec_avx512.c b/drivers/net/intel/ice/ice_rxtx_vec_avx512.c
> index 7c6fe82072..1f6bf5fc8e 100644
> --- a/drivers/net/intel/ice/ice_rxtx_vec_avx512.c
> +++ b/drivers/net/intel/ice/ice_rxtx_vec_avx512.c
> @@ -847,7 +847,7 @@ ice_recv_scattered_pkts_vec_avx512_offload(void *rx_queue,
>  }
> 
>  static __rte_always_inline void
> -ice_vtx1(volatile struct ice_tx_desc *txdp,
> +ice_vtx1(volatile struct ci_tx_desc *txdp,
>  	 struct rte_mbuf *pkt, uint64_t flags, bool do_offload)
>  {
>  	uint64_t high_qw =
> @@ -863,7 +863,7 @@ ice_vtx1(volatile struct ice_tx_desc *txdp,
>  }
> 
>  static __rte_always_inline void
> -ice_vtx(volatile struct ice_tx_desc *txdp, struct rte_mbuf **pkt,
> +ice_vtx(volatile struct ci_tx_desc *txdp, struct rte_mbuf **pkt,
>  	uint16_t nb_pkts,  uint64_t flags, bool do_offload)
>  {
>  	const uint64_t hi_qw_tmpl = (ICE_TX_DESC_DTYPE_DATA |
> @@ -916,7 +916,7 @@ ice_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pkts,
>  				uint16_t nb_pkts, bool do_offload)
>  {
>  	struct ci_tx_queue *txq = (struct ci_tx_queue *)tx_queue;
> -	volatile struct ice_tx_desc *txdp;
> +	volatile struct ci_tx_desc *txdp;
>  	struct ci_tx_entry_vec *txep;
>  	uint16_t n, nb_commit, tx_id;
>  	uint64_t flags = ICE_TD_CMD;
> diff --git a/drivers/net/intel/idpf/idpf_common_rxtx.c b/drivers/net/intel/idpf/idpf_common_rxtx.c
> index 797ee515dd..be3c1ef216 100644
> --- a/drivers/net/intel/idpf/idpf_common_rxtx.c
> +++ b/drivers/net/intel/idpf/idpf_common_rxtx.c
> @@ -264,13 +264,13 @@ idpf_qc_single_tx_queue_reset(struct ci_tx_queue *txq)
>  	}
> 
>  	txe = txq->sw_ring;
> -	size = sizeof(struct idpf_base_tx_desc) * txq->nb_tx_desc;
> +	size = sizeof(struct ci_tx_desc) * txq->nb_tx_desc;
>  	for (i = 0; i < size; i++)
>  		((volatile char *)txq->idpf_tx_ring)[i] = 0;
> 
>  	prev = (uint16_t)(txq->nb_tx_desc - 1);
>  	for (i = 0; i < txq->nb_tx_desc; i++) {
> -		txq->idpf_tx_ring[i].qw1 =
> +		txq->idpf_tx_ring[i].cmd_type_offset_bsz =
>  			rte_cpu_to_le_64(IDPF_TX_DESC_DTYPE_DESC_DONE);
>  		txe[i].mbuf =  NULL;
>  		txe[i].last_id = i;
> @@ -1335,14 +1335,14 @@ idpf_xmit_cleanup(struct ci_tx_queue *txq)
>  	uint16_t desc_to_clean_to;
>  	uint16_t nb_tx_to_clean;
> 
> -	volatile struct idpf_base_tx_desc *txd = txq->idpf_tx_ring;
> +	volatile struct ci_tx_desc *txd = txq->idpf_tx_ring;
> 
>  	desc_to_clean_to = (uint16_t)(last_desc_cleaned + txq->tx_rs_thresh);
>  	if (desc_to_clean_to >= nb_tx_desc)
>  		desc_to_clean_to = (uint16_t)(desc_to_clean_to - nb_tx_desc);
> 
>  	desc_to_clean_to = sw_ring[desc_to_clean_to].last_id;
> -	if ((txd[desc_to_clean_to].qw1 &
> +	if ((txd[desc_to_clean_to].cmd_type_offset_bsz &
>  	     rte_cpu_to_le_64(IDPF_TXD_QW1_DTYPE_M)) !=
>  	    rte_cpu_to_le_64(IDPF_TX_DESC_DTYPE_DESC_DONE)) {
>  		TX_LOG(DEBUG, "TX descriptor %4u is not done "
> @@ -1358,7 +1358,7 @@ idpf_xmit_cleanup(struct ci_tx_queue *txq)
>  		nb_tx_to_clean = (uint16_t)(desc_to_clean_to -
>  					    last_desc_cleaned);
> 
> -	txd[desc_to_clean_to].qw1 = 0;
> +	txd[desc_to_clean_to].cmd_type_offset_bsz = 0;
> 
>  	txq->last_desc_cleaned = desc_to_clean_to;
>  	txq->nb_tx_free = (uint16_t)(txq->nb_tx_free + nb_tx_to_clean);
> @@ -1372,8 +1372,8 @@ uint16_t
>  idpf_dp_singleq_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
>  			  uint16_t nb_pkts)
>  {
> -	volatile struct idpf_base_tx_desc *txd;
> -	volatile struct idpf_base_tx_desc *txr;
> +	volatile struct ci_tx_desc *txd;
> +	volatile struct ci_tx_desc *txr;
>  	union idpf_tx_offload tx_offload = {0};
>  	struct ci_tx_entry *txe, *txn;
>  	struct ci_tx_entry *sw_ring;
> @@ -1491,8 +1491,8 @@ idpf_dp_singleq_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
>  			/* Setup TX Descriptor */
>  			slen = m_seg->data_len;
>  			buf_dma_addr = rte_mbuf_data_iova(m_seg);
> -			txd->buf_addr = rte_cpu_to_le_64(buf_dma_addr);
> -			txd->qw1 = rte_cpu_to_le_64(IDPF_TX_DESC_DTYPE_DATA |
> +			txd->buffer_addr = rte_cpu_to_le_64(buf_dma_addr);
> +			txd->cmd_type_offset_bsz = rte_cpu_to_le_64(IDPF_TX_DESC_DTYPE_DATA |
>  				((uint64_t)td_cmd  << IDPF_TXD_QW1_CMD_S) |
>  				((uint64_t)td_offset << IDPF_TXD_QW1_OFFSET_S) |
>  				((uint64_t)slen << IDPF_TXD_QW1_TX_BUF_SZ_S));
> @@ -1519,7 +1519,7 @@ idpf_dp_singleq_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
>  			txq->nb_tx_used = 0;
>  		}
> 
> -		txd->qw1 |= rte_cpu_to_le_16(td_cmd << IDPF_TXD_QW1_CMD_S);
> +		txd->cmd_type_offset_bsz |= rte_cpu_to_le_16(td_cmd << IDPF_TXD_QW1_CMD_S);
>  	}
> 
>  end_of_tx:
> diff --git a/drivers/net/intel/idpf/idpf_common_rxtx.h b/drivers/net/intel/idpf/idpf_common_rxtx.h
> index 7c6ff5d047..2f2fa153b2 100644
> --- a/drivers/net/intel/idpf/idpf_common_rxtx.h
> +++ b/drivers/net/intel/idpf/idpf_common_rxtx.h
> @@ -182,7 +182,7 @@ union idpf_tx_offload {
>  };
> 
>  union idpf_tx_desc {
> -	struct idpf_base_tx_desc *tx_ring;
> +	struct ci_tx_desc *tx_ring;
>  	struct idpf_flex_tx_sched_desc *desc_ring;
>  	struct idpf_splitq_tx_compl_desc *compl_ring;
>  };
> diff --git a/drivers/net/intel/idpf/idpf_common_rxtx_avx2.c b/drivers/net/intel/idpf/idpf_common_rxtx_avx2.c
> index 21c8f79254..5f5d538dcb 100644
> --- a/drivers/net/intel/idpf/idpf_common_rxtx_avx2.c
> +++ b/drivers/net/intel/idpf/idpf_common_rxtx_avx2.c
> @@ -483,7 +483,7 @@ idpf_dp_singleq_recv_pkts_avx2(void *rx_queue, struct rte_mbuf **rx_pkts, uint16
>  }
> 
>  static inline void
> -idpf_singleq_vtx1(volatile struct idpf_base_tx_desc *txdp,
> +idpf_singleq_vtx1(volatile struct ci_tx_desc *txdp,
>  		  struct rte_mbuf *pkt, uint64_t flags)
>  {
>  	uint64_t high_qw =
> @@ -497,7 +497,7 @@ idpf_singleq_vtx1(volatile struct idpf_base_tx_desc *txdp,
>  }
> 
>  static inline void
> -idpf_singleq_vtx(volatile struct idpf_base_tx_desc *txdp,
> +idpf_singleq_vtx(volatile struct ci_tx_desc *txdp,
>  		 struct rte_mbuf **pkt, uint16_t nb_pkts,  uint64_t flags)
>  {
>  	const uint64_t hi_qw_tmpl = (IDPF_TX_DESC_DTYPE_DATA |
> @@ -556,7 +556,7 @@ idpf_singleq_xmit_fixed_burst_vec_avx2(void *tx_queue, struct rte_mbuf **tx_pkts
>  				       uint16_t nb_pkts)
>  {
>  	struct ci_tx_queue *txq = (struct ci_tx_queue *)tx_queue;
> -	volatile struct idpf_base_tx_desc *txdp;
> +	volatile struct ci_tx_desc *txdp;
>  	struct ci_tx_entry_vec *txep;
>  	uint16_t n, nb_commit, tx_id;
>  	uint64_t flags = IDPF_TX_DESC_CMD_EOP;
> @@ -604,7 +604,7 @@ idpf_singleq_xmit_fixed_burst_vec_avx2(void *tx_queue, struct rte_mbuf **tx_pkts
> 
>  	tx_id = (uint16_t)(tx_id + nb_commit);
>  	if (tx_id > txq->tx_next_rs) {
> -		txq->idpf_tx_ring[txq->tx_next_rs].qw1 |=
> +		txq->idpf_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
>  			rte_cpu_to_le_64(((uint64_t)IDPF_TX_DESC_CMD_RS) <<
>  					 IDPF_TXD_QW1_CMD_S);
>  		txq->tx_next_rs =
> diff --git a/drivers/net/intel/idpf/idpf_common_rxtx_avx512.c b/drivers/net/intel/idpf/idpf_common_rxtx_avx512.c
> index bc2cadd738..c1ec3d1222 100644
> --- a/drivers/net/intel/idpf/idpf_common_rxtx_avx512.c
> +++ b/drivers/net/intel/idpf/idpf_common_rxtx_avx512.c
> @@ -1000,7 +1000,7 @@ idpf_dp_splitq_recv_pkts_avx512(void *rx_queue, struct rte_mbuf **rx_pkts,
>  }
> 
>  static __rte_always_inline void
> -idpf_singleq_vtx1(volatile struct idpf_base_tx_desc *txdp,
> +idpf_singleq_vtx1(volatile struct ci_tx_desc *txdp,
>  	  struct rte_mbuf *pkt, uint64_t flags)
>  {
>  	uint64_t high_qw =
> @@ -1016,7 +1016,7 @@ idpf_singleq_vtx1(volatile struct idpf_base_tx_desc *txdp,
>  #define IDPF_TX_LEN_MASK 0xAA
>  #define IDPF_TX_OFF_MASK 0x55
>  static __rte_always_inline void
> -idpf_singleq_vtx(volatile struct idpf_base_tx_desc *txdp,
> +idpf_singleq_vtx(volatile struct ci_tx_desc *txdp,
>  	 struct rte_mbuf **pkt, uint16_t nb_pkts,  uint64_t flags)
>  {
>  	const uint64_t hi_qw_tmpl = (IDPF_TX_DESC_DTYPE_DATA  |
> @@ -1072,7 +1072,7 @@ idpf_singleq_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pk
>  					 uint16_t nb_pkts)
>  {
>  	struct ci_tx_queue *txq = tx_queue;
> -	volatile struct idpf_base_tx_desc *txdp;
> +	volatile struct ci_tx_desc *txdp;
>  	struct ci_tx_entry_vec *txep;
>  	uint16_t n, nb_commit, tx_id;
>  	uint64_t flags = IDPF_TX_DESC_CMD_EOP;
> @@ -1123,7 +1123,7 @@ idpf_singleq_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pk
> 
>  	tx_id = (uint16_t)(tx_id + nb_commit);
>  	if (tx_id > txq->tx_next_rs) {
> -		txq->idpf_tx_ring[txq->tx_next_rs].qw1 |=
> +		txq->idpf_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
>  			rte_cpu_to_le_64(((uint64_t)IDPF_TX_DESC_CMD_RS) <<
>  					 IDPF_TXD_QW1_CMD_S);
>  		txq->tx_next_rs =
> diff --git a/drivers/net/intel/idpf/idpf_rxtx.c b/drivers/net/intel/idpf/idpf_rxtx.c
> index cee454244f..8aa44585fe 100644
> --- a/drivers/net/intel/idpf/idpf_rxtx.c
> +++ b/drivers/net/intel/idpf/idpf_rxtx.c
> @@ -72,7 +72,7 @@ idpf_dma_zone_reserve(struct rte_eth_dev *dev, uint16_t queue_idx,
>  			ring_size = RTE_ALIGN(len * sizeof(struct idpf_flex_tx_sched_desc),
>  					      IDPF_DMA_MEM_ALIGN);
>  		else
> -			ring_size = RTE_ALIGN(len * sizeof(struct idpf_base_tx_desc),
> +			ring_size = RTE_ALIGN(len * sizeof(struct ci_tx_desc),
>  					      IDPF_DMA_MEM_ALIGN);
>  		rte_memcpy(ring_name, "idpf Tx ring", sizeof("idpf Tx ring"));
>  		break;
> diff --git a/drivers/net/intel/idpf/idpf_rxtx_vec_common.h b/drivers/net/intel/idpf/idpf_rxtx_vec_common.h
> index 425f0792a1..4702061484 100644
> --- a/drivers/net/intel/idpf/idpf_rxtx_vec_common.h
> +++ b/drivers/net/intel/idpf/idpf_rxtx_vec_common.h
> @@ -31,7 +31,7 @@ idpf_tx_desc_done(struct ci_tx_queue *txq, uint16_t idx)
>  	if (txq->complq != NULL)
>  		return 1;
> 
> -	return (txq->idpf_tx_ring[idx].qw1 &
> +	return (txq->idpf_tx_ring[idx].cmd_type_offset_bsz &
>  			rte_cpu_to_le_64(IDPF_TXD_QW1_DTYPE_M)) ==
>  		rte_cpu_to_le_64(IDPF_TX_DESC_DTYPE_DESC_DONE);
>  }
> --
> 2.51.0


^ permalink raw reply	[flat|nested] 274+ messages in thread

* RE: [PATCH v3 02/36] net/intel: use common Tx ring structure
  2026-01-30 11:41   ` [PATCH v3 02/36] net/intel: use common Tx ring structure Bruce Richardson
@ 2026-02-06  9:59     ` Loftus, Ciara
  0 siblings, 0 replies; 274+ messages in thread
From: Loftus, Ciara @ 2026-02-06  9:59 UTC (permalink / raw)
  To: Richardson, Bruce, dev@dpdk.org
  Cc: Richardson, Bruce, Shetty, Praveen, Medvedkin, Vladimir,
	Burakov, Anatoly, Wu, Jingjing

> Subject: [PATCH v3 02/36] net/intel: use common Tx ring structure
> 
> Rather than having separate per-driver ring pointers in a union, since
> we now have a common descriptor type, we can merge all but the ixgbe
> pointer into one value.
> 
> Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>

Acked-by: Ciara Loftus <ciara.loftus@intel.com>

> ---
>  drivers/net/intel/common/tx.h                 |  5 +--
>  drivers/net/intel/cpfl/cpfl_rxtx.c            |  2 +-
>  drivers/net/intel/i40e/i40e_fdir.c            |  6 ++--
>  drivers/net/intel/i40e/i40e_rxtx.c            | 22 ++++++------
>  .../net/intel/i40e/i40e_rxtx_vec_altivec.c    |  6 ++--
>  drivers/net/intel/i40e/i40e_rxtx_vec_avx2.c   |  6 ++--
>  drivers/net/intel/i40e/i40e_rxtx_vec_avx512.c |  6 ++--
>  drivers/net/intel/i40e/i40e_rxtx_vec_common.h |  2 +-
>  drivers/net/intel/i40e/i40e_rxtx_vec_neon.c   |  6 ++--
>  drivers/net/intel/iavf/iavf_rxtx.c            | 14 ++++----
>  drivers/net/intel/iavf/iavf_rxtx_vec_avx2.c   |  6 ++--
>  drivers/net/intel/iavf/iavf_rxtx_vec_avx512.c | 12 +++----
>  drivers/net/intel/iavf/iavf_rxtx_vec_common.h |  2 +-
>  drivers/net/intel/ice/ice_dcf_ethdev.c        |  4 +--
>  drivers/net/intel/ice/ice_rxtx.c              | 34 +++++++++----------
>  drivers/net/intel/ice/ice_rxtx_vec_avx2.c     |  6 ++--
>  drivers/net/intel/ice/ice_rxtx_vec_avx512.c   |  6 ++--
>  drivers/net/intel/ice/ice_rxtx_vec_common.h   |  2 +-
>  drivers/net/intel/idpf/idpf_common_rxtx.c     |  8 ++---
>  .../net/intel/idpf/idpf_common_rxtx_avx2.c    |  6 ++--
>  .../net/intel/idpf/idpf_common_rxtx_avx512.c  |  6 ++--
>  drivers/net/intel/idpf/idpf_rxtx.c            |  2 +-
>  drivers/net/intel/idpf/idpf_rxtx_vec_common.h |  2 +-
>  23 files changed, 84 insertions(+), 87 deletions(-)
> 
> diff --git a/drivers/net/intel/common/tx.h b/drivers/net/intel/common/tx.h
> index d7561a2bbb..8cf63e59ab 100644
> --- a/drivers/net/intel/common/tx.h
> +++ b/drivers/net/intel/common/tx.h
> @@ -41,10 +41,7 @@ typedef void (*ice_tx_release_mbufs_t)(struct ci_tx_queue *txq);
> 
>  struct ci_tx_queue {
>  	union { /* TX ring virtual address */
> -		volatile struct ci_tx_desc *i40e_tx_ring;
> -		volatile struct ci_tx_desc *iavf_tx_ring;
> -		volatile struct ci_tx_desc *ice_tx_ring;
> -		volatile struct ci_tx_desc *idpf_tx_ring;
> +		volatile struct ci_tx_desc *ci_tx_ring;
>  		volatile union ixgbe_adv_tx_desc *ixgbe_tx_ring;
>  	};
>  	volatile uint8_t *qtx_tail;               /* register address of tail */
> diff --git a/drivers/net/intel/cpfl/cpfl_rxtx.c b/drivers/net/intel/cpfl/cpfl_rxtx.c
> index 78bc3e9b49..bc5bec65f0 100644
> --- a/drivers/net/intel/cpfl/cpfl_rxtx.c
> +++ b/drivers/net/intel/cpfl/cpfl_rxtx.c
> @@ -606,7 +606,7 @@ cpfl_tx_queue_setup(struct rte_eth_dev *dev, uint16_t queue_idx,
>  	}
> 
>  	if (!is_splitq) {
> -		txq->idpf_tx_ring = mz->addr;
> +		txq->ci_tx_ring = mz->addr;
>  		idpf_qc_single_tx_queue_reset(txq);
>  	} else {
>  		txq->desc_ring = mz->addr;
> diff --git a/drivers/net/intel/i40e/i40e_fdir.c b/drivers/net/intel/i40e/i40e_fdir.c
> index 605df73c9e..8a01aec0e2 100644
> --- a/drivers/net/intel/i40e/i40e_fdir.c
> +++ b/drivers/net/intel/i40e/i40e_fdir.c
> @@ -1380,7 +1380,7 @@ i40e_find_available_buffer(struct rte_eth_dev *dev)
>  		volatile struct ci_tx_desc *tmp_txdp;
> 
>  		tmp_tail = txq->tx_tail;
> -		tmp_txdp = &txq->i40e_tx_ring[tmp_tail + 1];
> +		tmp_txdp = &txq->ci_tx_ring[tmp_tail + 1];
> 
>  		do {
>  			if ((tmp_txdp->cmd_type_offset_bsz &
> @@ -1637,7 +1637,7 @@ i40e_flow_fdir_filter_programming(struct i40e_pf *pf,
> 
>  	PMD_DRV_LOG(INFO, "filling filter programming descriptor.");
>  	fdirdp = (volatile struct i40e_filter_program_desc *)
> -				(&txq->i40e_tx_ring[txq->tx_tail]);
> +				(&txq->ci_tx_ring[txq->tx_tail]);
> 
>  	fdirdp->qindex_flex_ptype_vsi =
>  			rte_cpu_to_le_32((fdir_action->rx_queue <<
> @@ -1707,7 +1707,7 @@ i40e_flow_fdir_filter_programming(struct i40e_pf *pf,
>  	fdirdp->fd_id = rte_cpu_to_le_32(filter->soft_id);
> 
>  	PMD_DRV_LOG(INFO, "filling transmit descriptor.");
> -	txdp = &txq->i40e_tx_ring[txq->tx_tail + 1];
> +	txdp = &txq->ci_tx_ring[txq->tx_tail + 1];
>  	txdp->buffer_addr = rte_cpu_to_le_64(pf->fdir.dma_addr[txq->tx_tail >> 1]);
> 
>  	td_cmd = I40E_TX_DESC_CMD_EOP |
> diff --git a/drivers/net/intel/i40e/i40e_rxtx.c b/drivers/net/intel/i40e/i40e_rxtx.c
> index 92d49ccb79..210fc0201e 100644
> --- a/drivers/net/intel/i40e/i40e_rxtx.c
> +++ b/drivers/net/intel/i40e/i40e_rxtx.c
> @@ -388,7 +388,7 @@ static inline int
>  i40e_xmit_cleanup(struct ci_tx_queue *txq)
>  {
>  	struct ci_tx_entry *sw_ring = txq->sw_ring;
> -	volatile struct ci_tx_desc *txd = txq->i40e_tx_ring;
> +	volatile struct ci_tx_desc *txd = txq->ci_tx_ring;
>  	uint16_t last_desc_cleaned = txq->last_desc_cleaned;
>  	uint16_t nb_tx_desc = txq->nb_tx_desc;
>  	uint16_t desc_to_clean_to;
> @@ -1112,7 +1112,7 @@ i40e_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
> 
>  	txq = tx_queue;
>  	sw_ring = txq->sw_ring;
> -	txr = txq->i40e_tx_ring;
> +	txr = txq->ci_tx_ring;
>  	tx_id = txq->tx_tail;
>  	txe = &sw_ring[tx_id];
> 
> @@ -1347,7 +1347,7 @@ i40e_tx_free_bufs(struct ci_tx_queue *txq)
>  	const uint16_t k = RTE_ALIGN_FLOOR(tx_rs_thresh, I40E_TX_MAX_FREE_BUF_SZ);
>  	const uint16_t m = tx_rs_thresh % I40E_TX_MAX_FREE_BUF_SZ;
> 
> -	if ((txq->i40e_tx_ring[txq->tx_next_dd].cmd_type_offset_bsz &
> +	if ((txq->ci_tx_ring[txq->tx_next_dd].cmd_type_offset_bsz &
>  			rte_cpu_to_le_64(I40E_TXD_QW1_DTYPE_MASK)) !=
>  			rte_cpu_to_le_64(I40E_TX_DESC_DTYPE_DESC_DONE))
>  		return 0;
> @@ -1431,7 +1431,7 @@ i40e_tx_fill_hw_ring(struct ci_tx_queue *txq,
>  		     struct rte_mbuf **pkts,
>  		     uint16_t nb_pkts)
>  {
> -	volatile struct ci_tx_desc *txdp = &txq->i40e_tx_ring[txq->tx_tail];
> +	volatile struct ci_tx_desc *txdp = &txq->ci_tx_ring[txq->tx_tail];
>  	struct ci_tx_entry *txep = &txq->sw_ring[txq->tx_tail];
>  	const int N_PER_LOOP = 4;
>  	const int N_PER_LOOP_MASK = N_PER_LOOP - 1;
> @@ -1459,7 +1459,7 @@ tx_xmit_pkts(struct ci_tx_queue *txq,
>  	     struct rte_mbuf **tx_pkts,
>  	     uint16_t nb_pkts)
>  {
> -	volatile struct ci_tx_desc *txr = txq->i40e_tx_ring;
> +	volatile struct ci_tx_desc *txr = txq->ci_tx_ring;
>  	uint16_t n = 0;
> 
>  	/**
> @@ -2421,7 +2421,7 @@ i40e_dev_tx_descriptor_status(void *tx_queue, uint16_t offset)
>  			desc -= txq->nb_tx_desc;
>  	}
> 
> -	status = &txq->i40e_tx_ring[desc].cmd_type_offset_bsz;
> +	status = &txq->ci_tx_ring[desc].cmd_type_offset_bsz;
>  	mask = rte_le_to_cpu_64(I40E_TXD_QW1_DTYPE_MASK);
>  	expect = rte_cpu_to_le_64(
>  		I40E_TX_DESC_DTYPE_DESC_DONE << I40E_TXD_QW1_DTYPE_SHIFT);
> @@ -2618,7 +2618,7 @@ i40e_dev_tx_queue_setup(struct rte_eth_dev *dev,
>  	/* Allocate TX hardware ring descriptors. */
>  	ring_size = sizeof(struct ci_tx_desc) * I40E_MAX_RING_DESC;
>  	ring_size = RTE_ALIGN(ring_size, I40E_DMA_MEM_ALIGN);
> -	tz = rte_eth_dma_zone_reserve(dev, "i40e_tx_ring", queue_idx,
> +	tz = rte_eth_dma_zone_reserve(dev, "ci_tx_ring", queue_idx,
>  			      ring_size, I40E_RING_BASE_ALIGN, socket_id);
>  	if (!tz) {
>  		i40e_tx_queue_release(txq);
> @@ -2640,7 +2640,7 @@ i40e_dev_tx_queue_setup(struct rte_eth_dev *dev,
>  	txq->tx_deferred_start = tx_conf->tx_deferred_start;
> 
>  	txq->tx_ring_dma = tz->iova;
> -	txq->i40e_tx_ring = (struct ci_tx_desc *)tz->addr;
> +	txq->ci_tx_ring = (struct ci_tx_desc *)tz->addr;
> 
>  	/* Allocate software ring */
>  	txq->sw_ring =
> @@ -2915,11 +2915,11 @@ i40e_reset_tx_queue(struct ci_tx_queue *txq)
>  	txe = txq->sw_ring;
>  	size = sizeof(struct ci_tx_desc) * txq->nb_tx_desc;
>  	for (i = 0; i < size; i++)
> -		((volatile char *)txq->i40e_tx_ring)[i] = 0;
> +		((volatile char *)txq->ci_tx_ring)[i] = 0;
> 
>  	prev = (uint16_t)(txq->nb_tx_desc - 1);
>  	for (i = 0; i < txq->nb_tx_desc; i++) {
> -		volatile struct ci_tx_desc *txd = &txq->i40e_tx_ring[i];
> +		volatile struct ci_tx_desc *txd = &txq->ci_tx_ring[i];
> 
>  		txd->cmd_type_offset_bsz =
>  			rte_cpu_to_le_64(I40E_TX_DESC_DTYPE_DESC_DONE);
> @@ -3240,7 +3240,7 @@ i40e_fdir_setup_tx_resources(struct i40e_pf *pf)
>  	txq->i40e_vsi = pf->fdir.fdir_vsi;
> 
>  	txq->tx_ring_dma = tz->iova;
> -	txq->i40e_tx_ring = (struct ci_tx_desc *)tz->addr;
> +	txq->ci_tx_ring = (struct ci_tx_desc *)tz->addr;
> 
>  	/*
>  	 * don't need to allocate software ring and reset for the fdir
> diff --git a/drivers/net/intel/i40e/i40e_rxtx_vec_altivec.c b/drivers/net/intel/i40e/i40e_rxtx_vec_altivec.c
> index ef5b252898..81e9e2bc0b 100644
> --- a/drivers/net/intel/i40e/i40e_rxtx_vec_altivec.c
> +++ b/drivers/net/intel/i40e/i40e_rxtx_vec_altivec.c
> @@ -489,7 +489,7 @@ i40e_xmit_fixed_burst_vec(void *tx_queue, struct rte_mbuf **tx_pkts,
>  		return 0;
> 
>  	tx_id = txq->tx_tail;
> -	txdp = &txq->i40e_tx_ring[tx_id];
> +	txdp = &txq->ci_tx_ring[tx_id];
>  	txep = &txq->sw_ring_vec[tx_id];
> 
>  	txq->nb_tx_free = (uint16_t)(txq->nb_tx_free - nb_pkts);
> @@ -509,7 +509,7 @@ i40e_xmit_fixed_burst_vec(void *tx_queue, struct rte_mbuf **tx_pkts,
>  		txq->tx_next_rs = (uint16_t)(txq->tx_rs_thresh - 1);
> 
>  		/* avoid reach the end of ring */
> -		txdp = &txq->i40e_tx_ring[tx_id];
> +		txdp = &txq->ci_tx_ring[tx_id];
>  		txep = &txq->sw_ring_vec[tx_id];
>  	}
> 
> @@ -519,7 +519,7 @@ i40e_xmit_fixed_burst_vec(void *tx_queue, struct rte_mbuf **tx_pkts,
> 
>  	tx_id = (uint16_t)(tx_id + nb_commit);
>  	if (tx_id > txq->tx_next_rs) {
> -		txq->i40e_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
> +		txq->ci_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
>  			rte_cpu_to_le_64(((uint64_t)I40E_TX_DESC_CMD_RS) <<
>  					 I40E_TXD_QW1_CMD_SHIFT);
>  		txq->tx_next_rs =
> diff --git a/drivers/net/intel/i40e/i40e_rxtx_vec_avx2.c b/drivers/net/intel/i40e/i40e_rxtx_vec_avx2.c
> index 137c1f9765..f054bd41bf 100644
> --- a/drivers/net/intel/i40e/i40e_rxtx_vec_avx2.c
> +++ b/drivers/net/intel/i40e/i40e_rxtx_vec_avx2.c
> @@ -753,7 +753,7 @@ i40e_xmit_fixed_burst_vec_avx2(void *tx_queue, struct rte_mbuf **tx_pkts,
>  		return 0;
> 
>  	tx_id = txq->tx_tail;
> -	txdp = &txq->i40e_tx_ring[tx_id];
> +	txdp = &txq->ci_tx_ring[tx_id];
>  	txep = &txq->sw_ring_vec[tx_id];
> 
>  	txq->nb_tx_free = (uint16_t)(txq->nb_tx_free - nb_pkts);
> @@ -774,7 +774,7 @@ i40e_xmit_fixed_burst_vec_avx2(void *tx_queue, struct rte_mbuf **tx_pkts,
>  		txq->tx_next_rs = (uint16_t)(txq->tx_rs_thresh - 1);
> 
>  		/* avoid reach the end of ring */
> -		txdp = &txq->i40e_tx_ring[tx_id];
> +		txdp = &txq->ci_tx_ring[tx_id];
>  		txep = &txq->sw_ring_vec[tx_id];
>  	}
> 
> @@ -784,7 +784,7 @@ i40e_xmit_fixed_burst_vec_avx2(void *tx_queue, struct rte_mbuf **tx_pkts,
> 
>  	tx_id = (uint16_t)(tx_id + nb_commit);
>  	if (tx_id > txq->tx_next_rs) {
> -		txq->i40e_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
> +		txq->ci_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
>  			rte_cpu_to_le_64(((uint64_t)I40E_TX_DESC_CMD_RS) <<
>  					 I40E_TXD_QW1_CMD_SHIFT);
>  		txq->tx_next_rs =
> diff --git a/drivers/net/intel/i40e/i40e_rxtx_vec_avx512.c b/drivers/net/intel/i40e/i40e_rxtx_vec_avx512.c
> index 6971488750..9a967faeee 100644
> --- a/drivers/net/intel/i40e/i40e_rxtx_vec_avx512.c
> +++ b/drivers/net/intel/i40e/i40e_rxtx_vec_avx512.c
> @@ -821,7 +821,7 @@ i40e_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pkts,
>  		return 0;
> 
>  	tx_id = txq->tx_tail;
> -	txdp = &txq->i40e_tx_ring[tx_id];
> +	txdp = &txq->ci_tx_ring[tx_id];
>  	txep = (void *)txq->sw_ring;
>  	txep += tx_id;
> 
> @@ -843,7 +843,7 @@ i40e_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pkts,
>  		txq->tx_next_rs = (uint16_t)(txq->tx_rs_thresh - 1);
> 
>  		/* avoid reach the end of ring */
> -		txdp = txq->i40e_tx_ring;
> +		txdp = txq->ci_tx_ring;
>  		txep = (void *)txq->sw_ring;
>  	}
> 
> @@ -853,7 +853,7 @@ i40e_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pkts,
> 
>  	tx_id = (uint16_t)(tx_id + nb_commit);
>  	if (tx_id > txq->tx_next_rs) {
> -		txq->i40e_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
> +		txq->ci_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
>  			rte_cpu_to_le_64(((uint64_t)I40E_TX_DESC_CMD_RS) <<
>  					 I40E_TXD_QW1_CMD_SHIFT);
>  		txq->tx_next_rs =
> diff --git a/drivers/net/intel/i40e/i40e_rxtx_vec_common.h b/drivers/net/intel/i40e/i40e_rxtx_vec_common.h
> index 14651f2f06..1fd7fc75bf 100644
> --- a/drivers/net/intel/i40e/i40e_rxtx_vec_common.h
> +++ b/drivers/net/intel/i40e/i40e_rxtx_vec_common.h
> @@ -15,7 +15,7 @@
>  static inline int
>  i40e_tx_desc_done(struct ci_tx_queue *txq, uint16_t idx)
>  {
> -	return (txq->i40e_tx_ring[idx].cmd_type_offset_bsz &
> +	return (txq->ci_tx_ring[idx].cmd_type_offset_bsz &
>  			rte_cpu_to_le_64(I40E_TXD_QW1_DTYPE_MASK)) ==
>  			rte_cpu_to_le_64(I40E_TX_DESC_DTYPE_DESC_DONE);
>  }
> diff --git a/drivers/net/intel/i40e/i40e_rxtx_vec_neon.c b/drivers/net/intel/i40e/i40e_rxtx_vec_neon.c
> index 6404b70c56..0b95152232 100644
> --- a/drivers/net/intel/i40e/i40e_rxtx_vec_neon.c
> +++ b/drivers/net/intel/i40e/i40e_rxtx_vec_neon.c
> @@ -638,7 +638,7 @@ i40e_xmit_fixed_burst_vec(void *__rte_restrict tx_queue,
>  		return 0;
> 
>  	tx_id = txq->tx_tail;
> -	txdp = &txq->i40e_tx_ring[tx_id];
> +	txdp = &txq->ci_tx_ring[tx_id];
>  	txep = &txq->sw_ring_vec[tx_id];
> 
>  	txq->nb_tx_free = (uint16_t)(txq->nb_tx_free - nb_pkts);
> @@ -658,7 +658,7 @@ i40e_xmit_fixed_burst_vec(void *__rte_restrict tx_queue,
>  		txq->tx_next_rs = (uint16_t)(txq->tx_rs_thresh - 1);
> 
>  		/* avoid reach the end of ring */
> -		txdp = &txq->i40e_tx_ring[tx_id];
> +		txdp = &txq->ci_tx_ring[tx_id];
>  		txep = &txq->sw_ring_vec[tx_id];
>  	}
> 
> @@ -668,7 +668,7 @@ i40e_xmit_fixed_burst_vec(void *__rte_restrict tx_queue,
> 
>  	tx_id = (uint16_t)(tx_id + nb_commit);
>  	if (tx_id > txq->tx_next_rs) {
> -		txq->i40e_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
> +		txq->ci_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
>  			rte_cpu_to_le_64(((uint64_t)I40E_TX_DESC_CMD_RS) <<
>  					 I40E_TXD_QW1_CMD_SHIFT);
>  		txq->tx_next_rs =
> diff --git a/drivers/net/intel/iavf/iavf_rxtx.c b/drivers/net/intel/iavf/iavf_rxtx.c
> index e4421a9932..807bc92a45 100644
> --- a/drivers/net/intel/iavf/iavf_rxtx.c
> +++ b/drivers/net/intel/iavf/iavf_rxtx.c
> @@ -269,11 +269,11 @@ reset_tx_queue(struct ci_tx_queue *txq)
>  	txe = txq->sw_ring;
>  	size = sizeof(struct ci_tx_desc) * txq->nb_tx_desc;
>  	for (i = 0; i < size; i++)
> -		((volatile char *)txq->iavf_tx_ring)[i] = 0;
> +		((volatile char *)txq->ci_tx_ring)[i] = 0;
> 
>  	prev = (uint16_t)(txq->nb_tx_desc - 1);
>  	for (i = 0; i < txq->nb_tx_desc; i++) {
> -		txq->iavf_tx_ring[i].cmd_type_offset_bsz =
> +		txq->ci_tx_ring[i].cmd_type_offset_bsz =
>  			rte_cpu_to_le_64(IAVF_TX_DESC_DTYPE_DESC_DONE);
>  		txe[i].mbuf =  NULL;
>  		txe[i].last_id = i;
> @@ -829,7 +829,7 @@ iavf_dev_tx_queue_setup(struct rte_eth_dev *dev,
>  	/* Allocate TX hardware ring descriptors. */
>  	ring_size = sizeof(struct ci_tx_desc) * IAVF_MAX_RING_DESC;
>  	ring_size = RTE_ALIGN(ring_size, IAVF_DMA_MEM_ALIGN);
> -	mz = rte_eth_dma_zone_reserve(dev, "iavf_tx_ring", queue_idx,
> +	mz = rte_eth_dma_zone_reserve(dev, "ci_tx_ring", queue_idx,
>  				      ring_size, IAVF_RING_BASE_ALIGN,
>  				      socket_id);
>  	if (!mz) {
> @@ -839,7 +839,7 @@ iavf_dev_tx_queue_setup(struct rte_eth_dev *dev,
>  		return -ENOMEM;
>  	}
>  	txq->tx_ring_dma = mz->iova;
> -	txq->iavf_tx_ring = (struct ci_tx_desc *)mz->addr;
> +	txq->ci_tx_ring = (struct ci_tx_desc *)mz->addr;
> 
>  	txq->mz = mz;
>  	reset_tx_queue(txq);
> @@ -2333,7 +2333,7 @@ iavf_xmit_cleanup(struct ci_tx_queue *txq)
>  	uint16_t desc_to_clean_to;
>  	uint16_t nb_tx_to_clean;
> 
> -	volatile struct ci_tx_desc *txd = txq->iavf_tx_ring;
> +	volatile struct ci_tx_desc *txd = txq->ci_tx_ring;
> 
>  	desc_to_clean_to = (uint16_t)(last_desc_cleaned + txq->tx_rs_thresh);
>  	if (desc_to_clean_to >= nb_tx_desc)
> @@ -2756,7 +2756,7 @@ uint16_t
>  iavf_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
>  {
>  	struct ci_tx_queue *txq = tx_queue;
> -	volatile struct ci_tx_desc *txr = txq->iavf_tx_ring;
> +	volatile struct ci_tx_desc *txr = txq->ci_tx_ring;
>  	struct ci_tx_entry *txe_ring = txq->sw_ring;
>  	struct ci_tx_entry *txe, *txn;
>  	struct rte_mbuf *mb, *mb_seg;
> @@ -4462,7 +4462,7 @@ iavf_dev_tx_desc_status(void *tx_queue, uint16_t offset)
>  			desc -= txq->nb_tx_desc;
>  	}
> 
> -	status = &txq->iavf_tx_ring[desc].cmd_type_offset_bsz;
> +	status = &txq->ci_tx_ring[desc].cmd_type_offset_bsz;
>  	mask = rte_le_to_cpu_64(IAVF_TXD_QW1_DTYPE_MASK);
>  	expect = rte_cpu_to_le_64(
>  		 IAVF_TX_DESC_DTYPE_DESC_DONE << IAVF_TXD_QW1_DTYPE_SHIFT);
> diff --git a/drivers/net/intel/iavf/iavf_rxtx_vec_avx2.c b/drivers/net/intel/iavf/iavf_rxtx_vec_avx2.c
> index 5b62d51cf7..89ce841b9e 100644
> --- a/drivers/net/intel/iavf/iavf_rxtx_vec_avx2.c
> +++ b/drivers/net/intel/iavf/iavf_rxtx_vec_avx2.c
> @@ -1729,7 +1729,7 @@ iavf_xmit_fixed_burst_vec_avx2(void *tx_queue, struct rte_mbuf **tx_pkts,
>  	nb_commit = nb_pkts;
> 
>  	tx_id = txq->tx_tail;
> -	txdp = &txq->iavf_tx_ring[tx_id];
> +	txdp = &txq->ci_tx_ring[tx_id];
>  	txep = &txq->sw_ring_vec[tx_id];
> 
>  	txq->nb_tx_free = (uint16_t)(txq->nb_tx_free - nb_pkts);
> @@ -1750,7 +1750,7 @@ iavf_xmit_fixed_burst_vec_avx2(void *tx_queue, struct rte_mbuf **tx_pkts,
>  		txq->tx_next_rs = (uint16_t)(txq->tx_rs_thresh - 1);
> 
>  		/* avoid reach the end of ring */
> -		txdp = &txq->iavf_tx_ring[tx_id];
> +		txdp = &txq->ci_tx_ring[tx_id];
>  		txep = &txq->sw_ring_vec[tx_id];
>  	}
> 
> @@ -1760,7 +1760,7 @@ iavf_xmit_fixed_burst_vec_avx2(void *tx_queue, struct rte_mbuf **tx_pkts,
> 
>  	tx_id = (uint16_t)(tx_id + nb_commit);
>  	if (tx_id > txq->tx_next_rs) {
> -		txq->iavf_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
> +		txq->ci_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
>  			rte_cpu_to_le_64(((uint64_t)IAVF_TX_DESC_CMD_RS) <<
>  					 IAVF_TXD_QW1_CMD_SHIFT);
>  		txq->tx_next_rs =
> diff --git a/drivers/net/intel/iavf/iavf_rxtx_vec_avx512.c b/drivers/net/intel/iavf/iavf_rxtx_vec_avx512.c
> index d79d96c7b7..ad1b0b90cd 100644
> --- a/drivers/net/intel/iavf/iavf_rxtx_vec_avx512.c
> +++ b/drivers/net/intel/iavf/iavf_rxtx_vec_avx512.c
> @@ -2219,7 +2219,7 @@ iavf_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pkts,
>  	nb_commit = nb_pkts;
> 
>  	tx_id = txq->tx_tail;
> -	txdp = &txq->iavf_tx_ring[tx_id];
> +	txdp = &txq->ci_tx_ring[tx_id];
>  	txep = (void *)txq->sw_ring;
>  	txep += tx_id;
> 
> @@ -2241,7 +2241,7 @@ iavf_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pkts,
>  		txq->tx_next_rs = (uint16_t)(txq->tx_rs_thresh - 1);
> 
>  		/* avoid reach the end of ring */
> -		txdp = &txq->iavf_tx_ring[tx_id];
> +		txdp = &txq->ci_tx_ring[tx_id];
>  		txep = (void *)txq->sw_ring;
>  		txep += tx_id;
>  	}
> @@ -2252,7 +2252,7 @@ iavf_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pkts,
> 
>  	tx_id = (uint16_t)(tx_id + nb_commit);
>  	if (tx_id > txq->tx_next_rs) {
> -		txq->iavf_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
> +		txq->ci_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
>  			rte_cpu_to_le_64(((uint64_t)IAVF_TX_DESC_CMD_RS) <<
>  					 IAVF_TXD_QW1_CMD_SHIFT);
>  		txq->tx_next_rs =
> @@ -2288,7 +2288,7 @@ iavf_xmit_fixed_burst_vec_avx512_ctx(void *tx_queue, struct rte_mbuf **tx_pkts,
> 
>  	nb_pkts = nb_commit >> 1;
>  	tx_id = txq->tx_tail;
> -	txdp = &txq->iavf_tx_ring[tx_id];
> +	txdp = &txq->ci_tx_ring[tx_id];
>  	txep = (void *)txq->sw_ring;
>  	txep += (tx_id >> 1);
> 
> @@ -2309,7 +2309,7 @@ iavf_xmit_fixed_burst_vec_avx512_ctx(void *tx_queue, struct rte_mbuf **tx_pkts,
>  		txq->tx_next_rs = (uint16_t)(txq->tx_rs_thresh - 1);
>  		tx_id = 0;
>  		/* avoid reach the end of ring */
> -		txdp = txq->iavf_tx_ring;
> +		txdp = txq->ci_tx_ring;
>  		txep = (void *)txq->sw_ring;
>  	}
> 
> @@ -2320,7 +2320,7 @@ iavf_xmit_fixed_burst_vec_avx512_ctx(void *tx_queue, struct rte_mbuf **tx_pkts,
>  	tx_id = (uint16_t)(tx_id + nb_commit);
> 
>  	if (tx_id > txq->tx_next_rs) {
> -		txq->iavf_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
> +		txq->ci_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
>  			rte_cpu_to_le_64(((uint64_t)IAVF_TX_DESC_CMD_RS) <<
>  					 IAVF_TXD_QW1_CMD_SHIFT);
>  		txq->tx_next_rs =
> diff --git a/drivers/net/intel/iavf/iavf_rxtx_vec_common.h b/drivers/net/intel/iavf/iavf_rxtx_vec_common.h
> index f1ea57034f..1832b76f89 100644
> --- a/drivers/net/intel/iavf/iavf_rxtx_vec_common.h
> +++ b/drivers/net/intel/iavf/iavf_rxtx_vec_common.h
> @@ -14,7 +14,7 @@
>  static inline int
>  iavf_tx_desc_done(struct ci_tx_queue *txq, uint16_t idx)
>  {
> -	return (txq->iavf_tx_ring[idx].cmd_type_offset_bsz &
> +	return (txq->ci_tx_ring[idx].cmd_type_offset_bsz &
>  			rte_cpu_to_le_64(IAVF_TXD_QW1_DTYPE_MASK)) ==
>  			rte_cpu_to_le_64(IAVF_TX_DESC_DTYPE_DESC_DONE);
>  }
> diff --git a/drivers/net/intel/ice/ice_dcf_ethdev.c b/drivers/net/intel/ice/ice_dcf_ethdev.c
> index ab1d499cef..5f537b4c12 100644
> --- a/drivers/net/intel/ice/ice_dcf_ethdev.c
> +++ b/drivers/net/intel/ice/ice_dcf_ethdev.c
> @@ -401,11 +401,11 @@ reset_tx_queue(struct ci_tx_queue *txq)
>  	txe = txq->sw_ring;
>  	size = sizeof(struct ci_tx_desc) * txq->nb_tx_desc;
>  	for (i = 0; i < size; i++)
> -		((volatile char *)txq->ice_tx_ring)[i] = 0;
> +		((volatile char *)txq->ci_tx_ring)[i] = 0;
> 
>  	prev = (uint16_t)(txq->nb_tx_desc - 1);
>  	for (i = 0; i < txq->nb_tx_desc; i++) {
> -		txq->ice_tx_ring[i].cmd_type_offset_bsz =
> +		txq->ci_tx_ring[i].cmd_type_offset_bsz =
>  			rte_cpu_to_le_64(IAVF_TX_DESC_DTYPE_DESC_DONE);
>  		txe[i].mbuf =  NULL;
>  		txe[i].last_id = i;
> diff --git a/drivers/net/intel/ice/ice_rxtx.c b/drivers/net/intel/ice/ice_rxtx.c
> index 74b80e7df3..e3ffbdb587 100644
> --- a/drivers/net/intel/ice/ice_rxtx.c
> +++ b/drivers/net/intel/ice/ice_rxtx.c
> @@ -1117,11 +1117,11 @@ ice_reset_tx_queue(struct ci_tx_queue *txq)
>  	txe = txq->sw_ring;
>  	size = sizeof(struct ci_tx_desc) * txq->nb_tx_desc;
>  	for (i = 0; i < size; i++)
> -		((volatile char *)txq->ice_tx_ring)[i] = 0;
> +		((volatile char *)txq->ci_tx_ring)[i] = 0;
> 
>  	prev = (uint16_t)(txq->nb_tx_desc - 1);
>  	for (i = 0; i < txq->nb_tx_desc; i++) {
> -		volatile struct ci_tx_desc *txd = &txq->ice_tx_ring[i];
> +		volatile struct ci_tx_desc *txd = &txq->ci_tx_ring[i];
> 
>  		txd->cmd_type_offset_bsz =
>  			rte_cpu_to_le_64(ICE_TX_DESC_DTYPE_DESC_DONE);
> @@ -1625,7 +1625,7 @@ ice_tx_queue_setup(struct rte_eth_dev *dev,
>  	/* Allocate TX hardware ring descriptors. */
>  	ring_size = sizeof(struct ci_tx_desc) * ICE_MAX_NUM_DESC_BY_MAC(hw);
>  	ring_size = RTE_ALIGN(ring_size, ICE_DMA_MEM_ALIGN);
> -	tz = rte_eth_dma_zone_reserve(dev, "ice_tx_ring", queue_idx,
> +	tz = rte_eth_dma_zone_reserve(dev, "ci_tx_ring", queue_idx,
>  				      ring_size, ICE_RING_BASE_ALIGN,
>  				      socket_id);
>  	if (!tz) {
> @@ -1649,7 +1649,7 @@ ice_tx_queue_setup(struct rte_eth_dev *dev,
>  	txq->tx_deferred_start = tx_conf->tx_deferred_start;
> 
>  	txq->tx_ring_dma = tz->iova;
> -	txq->ice_tx_ring = tz->addr;
> +	txq->ci_tx_ring = tz->addr;
> 
>  	/* Allocate software ring */
>  	txq->sw_ring =
> @@ -2555,7 +2555,7 @@ ice_tx_descriptor_status(void *tx_queue, uint16_t offset)
>  			desc -= txq->nb_tx_desc;
>  	}
> 
> -	status = &txq->ice_tx_ring[desc].cmd_type_offset_bsz;
> +	status = &txq->ci_tx_ring[desc].cmd_type_offset_bsz;
>  	mask = rte_cpu_to_le_64(ICE_TXD_QW1_DTYPE_M);
>  	expect = rte_cpu_to_le_64(ICE_TX_DESC_DTYPE_DESC_DONE <<
>  				  ICE_TXD_QW1_DTYPE_S);
> @@ -2638,7 +2638,7 @@ ice_fdir_setup_tx_resources(struct ice_pf *pf)
>  	txq->ice_vsi = pf->fdir.fdir_vsi;
> 
>  	txq->tx_ring_dma = tz->iova;
> -	txq->ice_tx_ring = (struct ci_tx_desc *)tz->addr;
> +	txq->ci_tx_ring = (struct ci_tx_desc *)tz->addr;
>  	/*
>  	 * don't need to allocate software ring and reset for the fdir
>  	 * program queue just set the queue has been configured.
> @@ -3027,7 +3027,7 @@ static inline int
>  ice_xmit_cleanup(struct ci_tx_queue *txq)
>  {
>  	struct ci_tx_entry *sw_ring = txq->sw_ring;
> -	volatile struct ci_tx_desc *txd = txq->ice_tx_ring;
> +	volatile struct ci_tx_desc *txd = txq->ci_tx_ring;
>  	uint16_t last_desc_cleaned = txq->last_desc_cleaned;
>  	uint16_t nb_tx_desc = txq->nb_tx_desc;
>  	uint16_t desc_to_clean_to;
> @@ -3148,7 +3148,7 @@ uint16_t
>  ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
>  {
>  	struct ci_tx_queue *txq;
> -	volatile struct ci_tx_desc *ice_tx_ring;
> +	volatile struct ci_tx_desc *ci_tx_ring;
>  	volatile struct ci_tx_desc *txd;
>  	struct ci_tx_entry *sw_ring;
>  	struct ci_tx_entry *txe, *txn;
> @@ -3171,7 +3171,7 @@ ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
> 
>  	txq = tx_queue;
>  	sw_ring = txq->sw_ring;
> -	ice_tx_ring = txq->ice_tx_ring;
> +	ci_tx_ring = txq->ci_tx_ring;
>  	tx_id = txq->tx_tail;
>  	txe = &sw_ring[tx_id];
> 
> @@ -3257,7 +3257,7 @@ ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
>  			/* Setup TX context descriptor if required */
>  			volatile struct ice_tx_ctx_desc *ctx_txd =
>  				(volatile struct ice_tx_ctx_desc *)
> -					&ice_tx_ring[tx_id];
> +					&ci_tx_ring[tx_id];
>  			uint16_t cd_l2tag2 = 0;
>  			uint64_t cd_type_cmd_tso_mss = ICE_TX_DESC_DTYPE_CTX;
> 
> @@ -3299,7 +3299,7 @@ ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
>  		m_seg = tx_pkt;
> 
>  		do {
> -			txd = &ice_tx_ring[tx_id];
> +			txd = &ci_tx_ring[tx_id];
>  			txn = &sw_ring[txe->next_id];
> 
>  			if (txe->mbuf)
> @@ -3327,7 +3327,7 @@ ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
>  				txe->last_id = tx_last;
>  				tx_id = txe->next_id;
>  				txe = txn;
> -				txd = &ice_tx_ring[tx_id];
> +				txd = &ci_tx_ring[tx_id];
>  				txn = &sw_ring[txe->next_id];
>  			}
> 
> @@ -3410,7 +3410,7 @@ ice_tx_free_bufs(struct ci_tx_queue *txq)
>  	struct ci_tx_entry *txep;
>  	uint16_t i;
> 
> -	if ((txq->ice_tx_ring[txq->tx_next_dd].cmd_type_offset_bsz &
> +	if ((txq->ci_tx_ring[txq->tx_next_dd].cmd_type_offset_bsz &
>  	     rte_cpu_to_le_64(ICE_TXD_QW1_DTYPE_M)) !=
>  	    rte_cpu_to_le_64(ICE_TX_DESC_DTYPE_DESC_DONE))
>  		return 0;
> @@ -3594,7 +3594,7 @@ static inline void
>  ice_tx_fill_hw_ring(struct ci_tx_queue *txq, struct rte_mbuf **pkts,
>  		    uint16_t nb_pkts)
>  {
> -	volatile struct ci_tx_desc *txdp = &txq->ice_tx_ring[txq->tx_tail];
> +	volatile struct ci_tx_desc *txdp = &txq->ci_tx_ring[txq->tx_tail];
>  	struct ci_tx_entry *txep = &txq->sw_ring[txq->tx_tail];
>  	const int N_PER_LOOP = 4;
>  	const int N_PER_LOOP_MASK = N_PER_LOOP - 1;
> @@ -3627,7 +3627,7 @@ tx_xmit_pkts(struct ci_tx_queue *txq,
>  	     struct rte_mbuf **tx_pkts,
>  	     uint16_t nb_pkts)
>  {
> -	volatile struct ci_tx_desc *txr = txq->ice_tx_ring;
> +	volatile struct ci_tx_desc *txr = txq->ci_tx_ring;
>  	uint16_t n = 0;
> 
>  	/**
> @@ -4887,11 +4887,11 @@ ice_fdir_programming(struct ice_pf *pf, struct ice_fltr_desc *fdir_desc)
>  	uint16_t i;
> 
>  	fdirdp = (volatile struct ice_fltr_desc *)
> -		(&txq->ice_tx_ring[txq->tx_tail]);
> +		(&txq->ci_tx_ring[txq->tx_tail]);
>  	fdirdp->qidx_compq_space_stat = fdir_desc->qidx_compq_space_stat;
>  	fdirdp->dtype_cmd_vsi_fdid = fdir_desc->dtype_cmd_vsi_fdid;
> 
> -	txdp = &txq->ice_tx_ring[txq->tx_tail + 1];
> +	txdp = &txq->ci_tx_ring[txq->tx_tail + 1];
>  	txdp->buffer_addr = rte_cpu_to_le_64(pf->fdir.dma_addr);
>  	td_cmd = ICE_TX_DESC_CMD_EOP |
>  		ICE_TX_DESC_CMD_RS  |
> diff --git a/drivers/net/intel/ice/ice_rxtx_vec_avx2.c b/drivers/net/intel/ice/ice_rxtx_vec_avx2.c
> index bef7bb00ba..0a1df0b2f6 100644
> --- a/drivers/net/intel/ice/ice_rxtx_vec_avx2.c
> +++ b/drivers/net/intel/ice/ice_rxtx_vec_avx2.c
> @@ -869,7 +869,7 @@ ice_xmit_fixed_burst_vec_avx2(void *tx_queue, struct rte_mbuf **tx_pkts,
>  		return 0;
> 
>  	tx_id = txq->tx_tail;
> -	txdp = &txq->ice_tx_ring[tx_id];
> +	txdp = &txq->ci_tx_ring[tx_id];
>  	txep = &txq->sw_ring_vec[tx_id];
> 
>  	txq->nb_tx_free = (uint16_t)(txq->nb_tx_free - nb_pkts);
> @@ -890,7 +890,7 @@ ice_xmit_fixed_burst_vec_avx2(void *tx_queue, struct rte_mbuf **tx_pkts,
>  		txq->tx_next_rs = (uint16_t)(txq->tx_rs_thresh - 1);
> 
>  		/* avoid reach the end of ring */
> -		txdp = &txq->ice_tx_ring[tx_id];
> +		txdp = &txq->ci_tx_ring[tx_id];
>  		txep = &txq->sw_ring_vec[tx_id];
>  	}
> 
> @@ -900,7 +900,7 @@ ice_xmit_fixed_burst_vec_avx2(void *tx_queue, struct rte_mbuf **tx_pkts,
> 
>  	tx_id = (uint16_t)(tx_id + nb_commit);
>  	if (tx_id > txq->tx_next_rs) {
> -		txq->ice_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
> +		txq->ci_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
>  			rte_cpu_to_le_64(((uint64_t)ICE_TX_DESC_CMD_RS) <<
>  					 ICE_TXD_QW1_CMD_S);
>  		txq->tx_next_rs =
> diff --git a/drivers/net/intel/ice/ice_rxtx_vec_avx512.c b/drivers/net/intel/ice/ice_rxtx_vec_avx512.c
> index 1f6bf5fc8e..d42f41461f 100644
> --- a/drivers/net/intel/ice/ice_rxtx_vec_avx512.c
> +++ b/drivers/net/intel/ice/ice_rxtx_vec_avx512.c
> @@ -933,7 +933,7 @@ ice_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pkts,
>  		return 0;
> 
>  	tx_id = txq->tx_tail;
> -	txdp = &txq->ice_tx_ring[tx_id];
> +	txdp = &txq->ci_tx_ring[tx_id];
>  	txep = (void *)txq->sw_ring;
>  	txep += tx_id;
> 
> @@ -955,7 +955,7 @@ ice_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pkts,
>  		txq->tx_next_rs = (uint16_t)(txq->tx_rs_thresh - 1);
> 
>  		/* avoid reach the end of ring */
> -		txdp = txq->ice_tx_ring;
> +		txdp = txq->ci_tx_ring;
>  		txep = (void *)txq->sw_ring;
>  	}
> 
> @@ -965,7 +965,7 @@ ice_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pkts,
> 
>  	tx_id = (uint16_t)(tx_id + nb_commit);
>  	if (tx_id > txq->tx_next_rs) {
> -		txq->ice_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
> +		txq->ci_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
>  			rte_cpu_to_le_64(((uint64_t)ICE_TX_DESC_CMD_RS) <<
>  					 ICE_TXD_QW1_CMD_S);
>  		txq->tx_next_rs =
> diff --git a/drivers/net/intel/ice/ice_rxtx_vec_common.h b/drivers/net/intel/ice/ice_rxtx_vec_common.h
> index ff46a8fb49..8ba591e403 100644
> --- a/drivers/net/intel/ice/ice_rxtx_vec_common.h
> +++ b/drivers/net/intel/ice/ice_rxtx_vec_common.h
> @@ -11,7 +11,7 @@
>  static inline int
>  ice_tx_desc_done(struct ci_tx_queue *txq, uint16_t idx)
>  {
> -	return (txq->ice_tx_ring[idx].cmd_type_offset_bsz &
> +	return (txq->ci_tx_ring[idx].cmd_type_offset_bsz &
>  			rte_cpu_to_le_64(ICE_TXD_QW1_DTYPE_M)) ==
>  			rte_cpu_to_le_64(ICE_TX_DESC_DTYPE_DESC_DONE);
>  }
> diff --git a/drivers/net/intel/idpf/idpf_common_rxtx.c b/drivers/net/intel/idpf/idpf_common_rxtx.c
> index be3c1ef216..51074bda3a 100644
> --- a/drivers/net/intel/idpf/idpf_common_rxtx.c
> +++ b/drivers/net/intel/idpf/idpf_common_rxtx.c
> @@ -266,11 +266,11 @@ idpf_qc_single_tx_queue_reset(struct ci_tx_queue *txq)
>  	txe = txq->sw_ring;
>  	size = sizeof(struct ci_tx_desc) * txq->nb_tx_desc;
>  	for (i = 0; i < size; i++)
> -		((volatile char *)txq->idpf_tx_ring)[i] = 0;
> +		((volatile char *)txq->ci_tx_ring)[i] = 0;
> 
>  	prev = (uint16_t)(txq->nb_tx_desc - 1);
>  	for (i = 0; i < txq->nb_tx_desc; i++) {
> -		txq->idpf_tx_ring[i].cmd_type_offset_bsz =
> +		txq->ci_tx_ring[i].cmd_type_offset_bsz =
>  			rte_cpu_to_le_64(IDPF_TX_DESC_DTYPE_DESC_DONE);
>  		txe[i].mbuf =  NULL;
>  		txe[i].last_id = i;
> @@ -1335,7 +1335,7 @@ idpf_xmit_cleanup(struct ci_tx_queue *txq)
>  	uint16_t desc_to_clean_to;
>  	uint16_t nb_tx_to_clean;
> 
> -	volatile struct ci_tx_desc *txd = txq->idpf_tx_ring;
> +	volatile struct ci_tx_desc *txd = txq->ci_tx_ring;
> 
>  	desc_to_clean_to = (uint16_t)(last_desc_cleaned + txq->tx_rs_thresh);
>  	if (desc_to_clean_to >= nb_tx_desc)
> @@ -1398,7 +1398,7 @@ idpf_dp_singleq_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
>  		return nb_tx;
> 
>  	sw_ring = txq->sw_ring;
> -	txr = txq->idpf_tx_ring;
> +	txr = txq->ci_tx_ring;
>  	tx_id = txq->tx_tail;
>  	txe = &sw_ring[tx_id];
> 
> diff --git a/drivers/net/intel/idpf/idpf_common_rxtx_avx2.c b/drivers/net/intel/idpf/idpf_common_rxtx_avx2.c
> index 5f5d538dcb..04efee3722 100644
> --- a/drivers/net/intel/idpf/idpf_common_rxtx_avx2.c
> +++ b/drivers/net/intel/idpf/idpf_common_rxtx_avx2.c
> @@ -573,7 +573,7 @@ idpf_singleq_xmit_fixed_burst_vec_avx2(void *tx_queue, struct rte_mbuf **tx_pkts
>  		return 0;
> 
>  	tx_id = txq->tx_tail;
> -	txdp = &txq->idpf_tx_ring[tx_id];
> +	txdp = &txq->ci_tx_ring[tx_id];
>  	txep = &txq->sw_ring_vec[tx_id];
> 
>  	txq->nb_tx_free = (uint16_t)(txq->nb_tx_free - nb_pkts);
> @@ -594,7 +594,7 @@ idpf_singleq_xmit_fixed_burst_vec_avx2(void *tx_queue, struct rte_mbuf **tx_pkts
>  		txq->tx_next_rs = (uint16_t)(txq->tx_rs_thresh - 1);
> 
>  		/* avoid reach the end of ring */
> -		txdp = &txq->idpf_tx_ring[tx_id];
> +		txdp = &txq->ci_tx_ring[tx_id];
>  		txep = &txq->sw_ring_vec[tx_id];
>  	}
> 
> @@ -604,7 +604,7 @@ idpf_singleq_xmit_fixed_burst_vec_avx2(void *tx_queue, struct rte_mbuf **tx_pkts
> 
>  	tx_id = (uint16_t)(tx_id + nb_commit);
>  	if (tx_id > txq->tx_next_rs) {
> -		txq->idpf_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
> +		txq->ci_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
>  			rte_cpu_to_le_64(((uint64_t)IDPF_TX_DESC_CMD_RS) <<
>  					 IDPF_TXD_QW1_CMD_S);
>  		txq->tx_next_rs =
> diff --git a/drivers/net/intel/idpf/idpf_common_rxtx_avx512.c b/drivers/net/intel/idpf/idpf_common_rxtx_avx512.c
> index c1ec3d1222..d5e5a2ca5f 100644
> --- a/drivers/net/intel/idpf/idpf_common_rxtx_avx512.c
> +++ b/drivers/net/intel/idpf/idpf_common_rxtx_avx512.c
> @@ -1090,7 +1090,7 @@ idpf_singleq_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pk
>  		return 0;
> 
>  	tx_id = txq->tx_tail;
> -	txdp = &txq->idpf_tx_ring[tx_id];
> +	txdp = &txq->ci_tx_ring[tx_id];
>  	txep = (void *)txq->sw_ring;
>  	txep += tx_id;
> 
> @@ -1112,7 +1112,7 @@ idpf_singleq_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pk
>  		txq->tx_next_rs = (uint16_t)(txq->tx_rs_thresh - 1);
> 
>  		/* avoid reach the end of ring */
> -		txdp = &txq->idpf_tx_ring[tx_id];
> +		txdp = &txq->ci_tx_ring[tx_id];
>  		txep = (void *)txq->sw_ring;
>  		txep += tx_id;
>  	}
> @@ -1123,7 +1123,7 @@ idpf_singleq_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pk
> 
>  	tx_id = (uint16_t)(tx_id + nb_commit);
>  	if (tx_id > txq->tx_next_rs) {
> -		txq->idpf_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
> +		txq->ci_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
>  			rte_cpu_to_le_64(((uint64_t)IDPF_TX_DESC_CMD_RS) <<
>  					 IDPF_TXD_QW1_CMD_S);
>  		txq->tx_next_rs =
> diff --git a/drivers/net/intel/idpf/idpf_rxtx.c b/drivers/net/intel/idpf/idpf_rxtx.c
> index 8aa44585fe..0de54d9305 100644
> --- a/drivers/net/intel/idpf/idpf_rxtx.c
> +++ b/drivers/net/intel/idpf/idpf_rxtx.c
> @@ -481,7 +481,7 @@ idpf_tx_queue_setup(struct rte_eth_dev *dev, uint16_t queue_idx,
>  	}
> 
>  	if (!is_splitq) {
> -		txq->idpf_tx_ring = mz->addr;
> +		txq->ci_tx_ring = mz->addr;
>  		idpf_qc_single_tx_queue_reset(txq);
>  	} else {
>  		txq->desc_ring = mz->addr;
> diff --git a/drivers/net/intel/idpf/idpf_rxtx_vec_common.h b/drivers/net/intel/idpf/idpf_rxtx_vec_common.h
> index 4702061484..b5e8574667 100644
> --- a/drivers/net/intel/idpf/idpf_rxtx_vec_common.h
> +++ b/drivers/net/intel/idpf/idpf_rxtx_vec_common.h
> @@ -31,7 +31,7 @@ idpf_tx_desc_done(struct ci_tx_queue *txq, uint16_t idx)
>  	if (txq->complq != NULL)
>  		return 1;
> 
> -	return (txq->idpf_tx_ring[idx].cmd_type_offset_bsz &
> +	return (txq->ci_tx_ring[idx].cmd_type_offset_bsz &
>  			rte_cpu_to_le_64(IDPF_TXD_QW1_DTYPE_M)) ==
>  			rte_cpu_to_le_64(IDPF_TX_DESC_DTYPE_DESC_DONE);
>  }
> --
> 2.51.0


^ permalink raw reply	[flat|nested] 274+ messages in thread

* RE: [PATCH v3 03/36] net/intel: create common post-Tx cleanup function
  2026-01-30 11:41   ` [PATCH v3 03/36] net/intel: create common post-Tx cleanup function Bruce Richardson
@ 2026-02-06 10:07     ` Loftus, Ciara
  2026-02-09 10:41       ` Bruce Richardson
  0 siblings, 1 reply; 274+ messages in thread
From: Loftus, Ciara @ 2026-02-06 10:07 UTC (permalink / raw)
  To: Richardson, Bruce, dev@dpdk.org
  Cc: Richardson, Bruce, Medvedkin, Vladimir, Burakov, Anatoly,
	Wu, Jingjing, Shetty,  Praveen



> -----Original Message-----
> From: Bruce Richardson <bruce.richardson@intel.com>
> Sent: Friday 30 January 2026 11:42
> To: dev@dpdk.org
> Cc: Richardson, Bruce <bruce.richardson@intel.com>; Medvedkin, Vladimir
> <vladimir.medvedkin@intel.com>; Burakov, Anatoly
> <anatoly.burakov@intel.com>; Wu, Jingjing <jingjing.wu@intel.com>; Shetty,
> Praveen <praveen.shetty@intel.com>
> Subject: [PATCH v3 03/36] net/intel: create common post-Tx cleanup function
> 
> The code used in ice, iavf, idpf and i40e for cleaning up mbufs after
> they have been transmitted was identical. Therefore, deduplicate it by
> moving it into the common code and removing the driver-specific versions.
> 
> Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> ---
>  drivers/net/intel/common/tx.h             | 53 ++++++++++++++++++++
>  drivers/net/intel/i40e/i40e_rxtx.c        | 49 ++----------------
>  drivers/net/intel/iavf/iavf_rxtx.c        | 50 ++-----------------
>  drivers/net/intel/ice/ice_rxtx.c          | 60 ++---------------------
>  drivers/net/intel/idpf/idpf_common_rxtx.c | 46 ++---------------
>  5 files changed, 71 insertions(+), 187 deletions(-)
> 
> diff --git a/drivers/net/intel/common/tx.h b/drivers/net/intel/common/tx.h
> index 8cf63e59ab..a89412c195 100644
> --- a/drivers/net/intel/common/tx.h
> +++ b/drivers/net/intel/common/tx.h
> @@ -259,6 +259,59 @@ ci_tx_free_bufs_vec(struct ci_tx_queue *txq, ci_desc_done_fn desc_done, bool ctx
>  	return txq->tx_rs_thresh;
>  }
> 
> +/*
> + * Common transmit descriptor cleanup function for Intel drivers.
> + * Used by ice, i40e, iavf, and idpf drivers.
> + *
> + * Returns:
> + *   0 on success
> + *  -1 if cleanup cannot proceed (descriptors not yet processed by HW)
> + */
> +static __rte_always_inline int
> +ci_tx_xmit_cleanup(struct ci_tx_queue *txq)
> +{
> +	struct ci_tx_entry *sw_ring = txq->sw_ring;
> +	volatile struct ci_tx_desc *txd = txq->ci_tx_ring;
> +	uint16_t last_desc_cleaned = txq->last_desc_cleaned;
> +	uint16_t nb_tx_desc = txq->nb_tx_desc;
> +	uint16_t desc_to_clean_to;
> +	uint16_t nb_tx_to_clean;
> +
> +	/* Determine the last descriptor needing to be cleaned */
> +	desc_to_clean_to = (uint16_t)(last_desc_cleaned + txq->tx_rs_thresh);
> +	if (desc_to_clean_to >= nb_tx_desc)
> +		desc_to_clean_to = (uint16_t)(desc_to_clean_to - nb_tx_desc);
> +
> +	/* Check to make sure the last descriptor to clean is done */

This comment is similar to the next one. Maybe merge them?
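[Editorial aside: the index arithmetic being reviewed here can be exercised in isolation. The sketch below is a hypothetical standalone model, not the DPDK driver code; the function name and values are invented for illustration, and uint16_t truncation stands in for the casts in the quoted patch.]

```c
#include <stdint.h>

/* Hypothetical standalone model of the quoted cleanup-index arithmetic:
 * advance last_desc_cleaned by tx_rs_thresh, wrapping at nb_tx_desc.
 * Illustration only, not the driver code. */
static uint16_t
next_desc_to_clean(uint16_t last_desc_cleaned, uint16_t tx_rs_thresh,
		   uint16_t nb_tx_desc)
{
	uint16_t desc_to_clean_to = (uint16_t)(last_desc_cleaned + tx_rs_thresh);

	/* ring wrap: index past the end maps back to the start */
	if (desc_to_clean_to >= nb_tx_desc)
		desc_to_clean_to = (uint16_t)(desc_to_clean_to - nb_tx_desc);
	return desc_to_clean_to;
}
```

With a 512-entry ring and tx_rs_thresh of 32, index 100 advances to 132, while index 500 wraps to 20.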

> +	desc_to_clean_to = sw_ring[desc_to_clean_to].last_id;
> +
> +	/* Check if descriptor is done - all drivers use 0xF as done value in bits 3:0 */
> +	if ((txd[desc_to_clean_to].cmd_type_offset_bsz & rte_cpu_to_le_64(0xFUL)) !=
> +			rte_cpu_to_le_64(0xFUL)) {
> +		/* Descriptor not yet processed by hardware */
> +		return -1;
> +	}
> +
> +	/* Figure out how many descriptors will be cleaned */
> +	if (last_desc_cleaned > desc_to_clean_to)
> +		nb_tx_to_clean = (uint16_t)((nb_tx_desc - last_desc_cleaned) + desc_to_clean_to);
> +	else
> +		nb_tx_to_clean = (uint16_t)(desc_to_clean_to - last_desc_cleaned);
> +
> +	/* The last descriptor to clean is done, so that means all the
> +	 * descriptors from the last descriptor that was cleaned
> +	 * up to the last descriptor with the RS bit set
> +	 * are done. Only reset the threshold descriptor.
> +	 */
> +	txd[desc_to_clean_to].cmd_type_offset_bsz = 0;
> +
> +	/* Update the txq to reflect the last descriptor that was cleaned */
> +	txq->last_desc_cleaned = desc_to_clean_to;
> +	txq->nb_tx_free = (uint16_t)(txq->nb_tx_free + nb_tx_to_clean);
> +
> +	return 0;
> +}
> +
>  static inline void
>  ci_txq_release_all_mbufs(struct ci_tx_queue *txq, bool use_ctx)
>  {
> diff --git a/drivers/net/intel/i40e/i40e_rxtx.c b/drivers/net/intel/i40e/i40e_rxtx.c
> index 210fc0201e..2760e76e99 100644
> --- a/drivers/net/intel/i40e/i40e_rxtx.c
> +++ b/drivers/net/intel/i40e/i40e_rxtx.c
> @@ -384,45 +384,6 @@ i40e_build_ctob(uint32_t td_cmd,
>  			((uint64_t)td_tag  << I40E_TXD_QW1_L2TAG1_SHIFT));
>  }
> 
> -static inline int
> -i40e_xmit_cleanup(struct ci_tx_queue *txq)
> -{
> -	struct ci_tx_entry *sw_ring = txq->sw_ring;
> -	volatile struct ci_tx_desc *txd = txq->ci_tx_ring;
> -	uint16_t last_desc_cleaned = txq->last_desc_cleaned;
> -	uint16_t nb_tx_desc = txq->nb_tx_desc;
> -	uint16_t desc_to_clean_to;
> -	uint16_t nb_tx_to_clean;
> -
> -	desc_to_clean_to = (uint16_t)(last_desc_cleaned + txq->tx_rs_thresh);
> -	if (desc_to_clean_to >= nb_tx_desc)
> -		desc_to_clean_to = (uint16_t)(desc_to_clean_to - nb_tx_desc);
> -
> -	desc_to_clean_to = sw_ring[desc_to_clean_to].last_id;
> -	if ((txd[desc_to_clean_to].cmd_type_offset_bsz &
> -			rte_cpu_to_le_64(I40E_TXD_QW1_DTYPE_MASK)) !=
> -			rte_cpu_to_le_64(I40E_TX_DESC_DTYPE_DESC_DONE)) {
> -		PMD_TX_LOG(DEBUG, "TX descriptor %4u is not done "
> -			   "(port=%d queue=%d)", desc_to_clean_to,
> -			   txq->port_id, txq->queue_id);

These logs are lost in each of the drivers. I'm not sure they're terribly
helpful, though, so I think it's fine that they're dropped.
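[Editorial aside: the wrap-aware count that follows the done-check can also be verified standalone. This is an illustrative sketch with a hypothetical helper name, not the driver code.]

```c
#include <stdint.h>

/* Illustrative sketch (not driver code): number of descriptors cleaned
 * between last_desc_cleaned (exclusive) and desc_to_clean_to (inclusive)
 * on a ring of nb_tx_desc entries, handling the wrap-around case. */
static uint16_t
count_descs_to_clean(uint16_t last_desc_cleaned, uint16_t desc_to_clean_to,
		     uint16_t nb_tx_desc)
{
	if (last_desc_cleaned > desc_to_clean_to)
		/* cleaned region wraps past the end of the ring */
		return (uint16_t)((nb_tx_desc - last_desc_cleaned) + desc_to_clean_to);
	return (uint16_t)(desc_to_clean_to - last_desc_cleaned);
}
```

Both the straight case (100 to 131 on a 512-entry ring) and the wrapped case (500 to 19) yield 31 descriptors.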

> -		return -1;
> -	}
> -
> -	if (last_desc_cleaned > desc_to_clean_to)
> -		nb_tx_to_clean = (uint16_t)((nb_tx_desc - last_desc_cleaned) +
> -							desc_to_clean_to);
> -	else
> -		nb_tx_to_clean = (uint16_t)(desc_to_clean_to -
> -					last_desc_cleaned);
> -
> -	txd[desc_to_clean_to].cmd_type_offset_bsz = 0;
> -
> -	txq->last_desc_cleaned = desc_to_clean_to;
> -	txq->nb_tx_free = (uint16_t)(txq->nb_tx_free + nb_tx_to_clean);
> -
> -	return 0;
> -}
> -
>  static inline int
>  #ifdef RTE_LIBRTE_I40E_RX_ALLOW_BULK_ALLOC
>  check_rx_burst_bulk_alloc_preconditions(struct ci_rx_queue *rxq)
> @@ -1118,7 +1079,7 @@ i40e_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
> 
>  	/* Check if the descriptor ring needs to be cleaned. */
>  	if (txq->nb_tx_free < txq->tx_free_thresh)
> -		(void)i40e_xmit_cleanup(txq);
> +		(void)ci_tx_xmit_cleanup(txq);
> 
>  	for (nb_tx = 0; nb_tx < nb_pkts; nb_tx++) {
>  		td_cmd = 0;
> @@ -1159,14 +1120,14 @@ i40e_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
>  			tx_last = (uint16_t)(tx_last - txq->nb_tx_desc);
> 
>  		if (nb_used > txq->nb_tx_free) {
> -			if (i40e_xmit_cleanup(txq) != 0) {
> +			if (ci_tx_xmit_cleanup(txq) != 0) {
>  				if (nb_tx == 0)
>  					return 0;
>  				goto end_of_tx;
>  			}
>  			if (unlikely(nb_used > txq->tx_rs_thresh)) {
>  				while (nb_used > txq->nb_tx_free) {
> -					if (i40e_xmit_cleanup(txq) != 0) {
> +					if (ci_tx_xmit_cleanup(txq) != 0) {
>  						if (nb_tx == 0)
>  							return 0;
>  						goto end_of_tx;
> @@ -2808,7 +2769,7 @@ i40e_tx_done_cleanup_full(struct ci_tx_queue *txq,
>  	tx_last = txq->tx_tail;
>  	tx_id  = swr_ring[tx_last].next_id;
> 
> -	if (txq->nb_tx_free == 0 && i40e_xmit_cleanup(txq))
> +	if (txq->nb_tx_free == 0 && ci_tx_xmit_cleanup(txq))
>  		return 0;
> 
>  	nb_tx_to_clean = txq->nb_tx_free;
> @@ -2842,7 +2803,7 @@ i40e_tx_done_cleanup_full(struct ci_tx_queue *txq,
>  			break;
> 
>  		if (pkt_cnt < free_cnt) {
> -			if (i40e_xmit_cleanup(txq))
> +			if (ci_tx_xmit_cleanup(txq))
>  				break;
> 
>  			nb_tx_to_clean = txq->nb_tx_free - nb_tx_free_last;
> diff --git a/drivers/net/intel/iavf/iavf_rxtx.c b/drivers/net/intel/iavf/iavf_rxtx.c
> index 807bc92a45..560abfc1ef 100644
> --- a/drivers/net/intel/iavf/iavf_rxtx.c
> +++ b/drivers/net/intel/iavf/iavf_rxtx.c
> @@ -2324,46 +2324,6 @@ iavf_recv_pkts_bulk_alloc(void *rx_queue,
>  	return nb_rx;
>  }
> 
> -static inline int
> -iavf_xmit_cleanup(struct ci_tx_queue *txq)
> -{
> -	struct ci_tx_entry *sw_ring = txq->sw_ring;
> -	uint16_t last_desc_cleaned = txq->last_desc_cleaned;
> -	uint16_t nb_tx_desc = txq->nb_tx_desc;
> -	uint16_t desc_to_clean_to;
> -	uint16_t nb_tx_to_clean;
> -
> -	volatile struct ci_tx_desc *txd = txq->ci_tx_ring;
> -
> -	desc_to_clean_to = (uint16_t)(last_desc_cleaned + txq->tx_rs_thresh);
> -	if (desc_to_clean_to >= nb_tx_desc)
> -		desc_to_clean_to = (uint16_t)(desc_to_clean_to - nb_tx_desc);
> -
> -	desc_to_clean_to = sw_ring[desc_to_clean_to].last_id;
> -	if ((txd[desc_to_clean_to].cmd_type_offset_bsz &
> -			rte_cpu_to_le_64(IAVF_TXD_QW1_DTYPE_MASK)) !=
> -			rte_cpu_to_le_64(IAVF_TX_DESC_DTYPE_DESC_DONE)) {
> -		PMD_TX_LOG(DEBUG, "TX descriptor %4u is not done "
> -			   "(port=%d queue=%d)", desc_to_clean_to,
> -			   txq->port_id, txq->queue_id);
> -		return -1;
> -	}
> -
> -	if (last_desc_cleaned > desc_to_clean_to)
> -		nb_tx_to_clean = (uint16_t)((nb_tx_desc - last_desc_cleaned) +
> -							desc_to_clean_to);
> -	else
> -		nb_tx_to_clean = (uint16_t)(desc_to_clean_to -
> -					last_desc_cleaned);
> -
> -	txd[desc_to_clean_to].cmd_type_offset_bsz = 0;
> -
> -	txq->last_desc_cleaned = desc_to_clean_to;
> -	txq->nb_tx_free = (uint16_t)(txq->nb_tx_free + nb_tx_to_clean);
> -
> -	return 0;
> -}
> -
>  /* Check if the context descriptor is needed for TX offloading */
>  static inline uint16_t
>  iavf_calc_context_desc(struct rte_mbuf *mb, uint8_t vlan_flag)
> @@ -2768,7 +2728,7 @@ iavf_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
> 
>  	/* Check if the descriptor ring needs to be cleaned. */
>  	if (txq->nb_tx_free < txq->tx_free_thresh)
> -		iavf_xmit_cleanup(txq);
> +		ci_tx_xmit_cleanup(txq);
> 
>  	desc_idx = txq->tx_tail;
>  	txe = &txe_ring[desc_idx];
> @@ -2823,14 +2783,14 @@ iavf_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
>  			txq->port_id, txq->queue_id, desc_idx, desc_idx_last);
> 
>  		if (nb_desc_required > txq->nb_tx_free) {
> -			if (iavf_xmit_cleanup(txq)) {
> +			if (ci_tx_xmit_cleanup(txq)) {
>  				if (idx == 0)
>  					return 0;
>  				goto end_of_tx;
>  			}
>  			if (unlikely(nb_desc_required > txq->tx_rs_thresh)) {
>  				while (nb_desc_required > txq->nb_tx_free) {
> -					if (iavf_xmit_cleanup(txq)) {
> +					if (ci_tx_xmit_cleanup(txq)) {
>  						if (idx == 0)
>  							return 0;
>  						goto end_of_tx;
> @@ -4300,7 +4260,7 @@ iavf_tx_done_cleanup_full(struct ci_tx_queue *txq,
>  	tx_id = txq->tx_tail;
>  	tx_last = tx_id;
> 
> -	if (txq->nb_tx_free == 0 && iavf_xmit_cleanup(txq))
> +	if (txq->nb_tx_free == 0 && ci_tx_xmit_cleanup(txq))
>  		return 0;
> 
>  	nb_tx_to_clean = txq->nb_tx_free;
> @@ -4332,7 +4292,7 @@ iavf_tx_done_cleanup_full(struct ci_tx_queue *txq,
>  			break;
> 
>  		if (pkt_cnt < free_cnt) {
> -			if (iavf_xmit_cleanup(txq))
> +			if (ci_tx_xmit_cleanup(txq))
>  				break;
> 
>  			nb_tx_to_clean = txq->nb_tx_free - nb_tx_free_last;
> diff --git a/drivers/net/intel/ice/ice_rxtx.c b/drivers/net/intel/ice/ice_rxtx.c
> index e3ffbdb587..7a33e1e980 100644
> --- a/drivers/net/intel/ice/ice_rxtx.c
> +++ b/drivers/net/intel/ice/ice_rxtx.c
> @@ -3023,56 +3023,6 @@ ice_txd_enable_checksum(uint64_t ol_flags,
>  	}
>  }
> 
> -static inline int
> -ice_xmit_cleanup(struct ci_tx_queue *txq)
> -{
> -	struct ci_tx_entry *sw_ring = txq->sw_ring;
> -	volatile struct ci_tx_desc *txd = txq->ci_tx_ring;
> -	uint16_t last_desc_cleaned = txq->last_desc_cleaned;
> -	uint16_t nb_tx_desc = txq->nb_tx_desc;
> -	uint16_t desc_to_clean_to;
> -	uint16_t nb_tx_to_clean;
> -
> -	/* Determine the last descriptor needing to be cleaned */
> -	desc_to_clean_to = (uint16_t)(last_desc_cleaned + txq->tx_rs_thresh);
> -	if (desc_to_clean_to >= nb_tx_desc)
> -		desc_to_clean_to = (uint16_t)(desc_to_clean_to - nb_tx_desc);
> -
> -	/* Check to make sure the last descriptor to clean is done */
> -	desc_to_clean_to = sw_ring[desc_to_clean_to].last_id;
> -	if (!(txd[desc_to_clean_to].cmd_type_offset_bsz &
> -	    rte_cpu_to_le_64(ICE_TX_DESC_DTYPE_DESC_DONE))) {
> -		PMD_TX_LOG(DEBUG, "TX descriptor %4u is not done "
> -			   "(port=%d queue=%d) value=0x%"PRIx64,
> -			   desc_to_clean_to,
> -			   txq->port_id, txq->queue_id,
> -			   txd[desc_to_clean_to].cmd_type_offset_bsz);
> -		/* Failed to clean any descriptors */
> -		return -1;
> -	}
> -
> -	/* Figure out how many descriptors will be cleaned */
> -	if (last_desc_cleaned > desc_to_clean_to)
> -		nb_tx_to_clean = (uint16_t)((nb_tx_desc - last_desc_cleaned) +
> -					    desc_to_clean_to);
> -	else
> -		nb_tx_to_clean = (uint16_t)(desc_to_clean_to -
> -					    last_desc_cleaned);
> -
> -	/* The last descriptor to clean is done, so that means all the
> -	 * descriptors from the last descriptor that was cleaned
> -	 * up to the last descriptor with the RS bit set
> -	 * are done. Only reset the threshold descriptor.
> -	 */
> -	txd[desc_to_clean_to].cmd_type_offset_bsz = 0;
> -
> -	/* Update the txq to reflect the last descriptor that was cleaned */
> -	txq->last_desc_cleaned = desc_to_clean_to;
> -	txq->nb_tx_free = (uint16_t)(txq->nb_tx_free + nb_tx_to_clean);
> -
> -	return 0;
> -}
> -
>  /* Construct the tx flags */
>  static inline uint64_t
>  ice_build_ctob(uint32_t td_cmd,
> @@ -3180,7 +3130,7 @@ ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
> 
>  	/* Check if the descriptor ring needs to be cleaned. */
>  	if (txq->nb_tx_free < txq->tx_free_thresh)
> -		(void)ice_xmit_cleanup(txq);
> +		(void)ci_tx_xmit_cleanup(txq);
> 
>  	for (nb_tx = 0; nb_tx < nb_pkts; nb_tx++) {
>  		tx_pkt = *tx_pkts++;
> @@ -3217,14 +3167,14 @@ ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
>  			tx_last = (uint16_t)(tx_last - txq->nb_tx_desc);
> 
>  		if (nb_used > txq->nb_tx_free) {
> -			if (ice_xmit_cleanup(txq) != 0) {
> +			if (ci_tx_xmit_cleanup(txq) != 0) {
>  				if (nb_tx == 0)
>  					return 0;
>  				goto end_of_tx;
>  			}
>  			if (unlikely(nb_used > txq->tx_rs_thresh)) {
>  				while (nb_used > txq->nb_tx_free) {
> -					if (ice_xmit_cleanup(txq) != 0) {
> +					if (ci_tx_xmit_cleanup(txq) != 0) {
>  						if (nb_tx == 0)
>  							return 0;
>  						goto end_of_tx;
> @@ -3459,7 +3409,7 @@ ice_tx_done_cleanup_full(struct ci_tx_queue *txq,
>  	tx_last = txq->tx_tail;
>  	tx_id  = swr_ring[tx_last].next_id;
> 
> -	if (txq->nb_tx_free == 0 && ice_xmit_cleanup(txq))
> +	if (txq->nb_tx_free == 0 && ci_tx_xmit_cleanup(txq))
>  		return 0;
> 
>  	nb_tx_to_clean = txq->nb_tx_free;
> @@ -3493,7 +3443,7 @@ ice_tx_done_cleanup_full(struct ci_tx_queue *txq,
>  			break;
> 
>  		if (pkt_cnt < free_cnt) {
> -			if (ice_xmit_cleanup(txq))
> +			if (ci_tx_xmit_cleanup(txq))
>  				break;
> 
>  			nb_tx_to_clean = txq->nb_tx_free - nb_tx_free_last;
> diff --git a/drivers/net/intel/idpf/idpf_common_rxtx.c b/drivers/net/intel/idpf/idpf_common_rxtx.c
> index 51074bda3a..23666539ab 100644
> --- a/drivers/net/intel/idpf/idpf_common_rxtx.c
> +++ b/drivers/net/intel/idpf/idpf_common_rxtx.c
> @@ -1326,46 +1326,6 @@ idpf_dp_singleq_recv_scatter_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
>  	return nb_rx;
>  }
> 
> -static inline int
> -idpf_xmit_cleanup(struct ci_tx_queue *txq)
> -{
> -	uint16_t last_desc_cleaned = txq->last_desc_cleaned;
> -	struct ci_tx_entry *sw_ring = txq->sw_ring;
> -	uint16_t nb_tx_desc = txq->nb_tx_desc;
> -	uint16_t desc_to_clean_to;
> -	uint16_t nb_tx_to_clean;
> -
> -	volatile struct ci_tx_desc *txd = txq->ci_tx_ring;
> -
> -	desc_to_clean_to = (uint16_t)(last_desc_cleaned + txq->tx_rs_thresh);
> -	if (desc_to_clean_to >= nb_tx_desc)
> -		desc_to_clean_to = (uint16_t)(desc_to_clean_to - nb_tx_desc);
> -
> -	desc_to_clean_to = sw_ring[desc_to_clean_to].last_id;
> -	if ((txd[desc_to_clean_to].cmd_type_offset_bsz &
> -	     rte_cpu_to_le_64(IDPF_TXD_QW1_DTYPE_M)) !=
> -	    rte_cpu_to_le_64(IDPF_TX_DESC_DTYPE_DESC_DONE)) {
> -		TX_LOG(DEBUG, "TX descriptor %4u is not done "
> -		       "(port=%d queue=%d)", desc_to_clean_to,
> -		       txq->port_id, txq->queue_id);
> -		return -1;
> -	}
> -
> -	if (last_desc_cleaned > desc_to_clean_to)
> -		nb_tx_to_clean = (uint16_t)((nb_tx_desc - last_desc_cleaned) +
> -					    desc_to_clean_to);
> -	else
> -		nb_tx_to_clean = (uint16_t)(desc_to_clean_to -
> -					    last_desc_cleaned);
> -
> -	txd[desc_to_clean_to].cmd_type_offset_bsz = 0;
> -
> -	txq->last_desc_cleaned = desc_to_clean_to;
> -	txq->nb_tx_free = (uint16_t)(txq->nb_tx_free + nb_tx_to_clean);
> -
> -	return 0;
> -}
> -
>  /* TX function */
>  RTE_EXPORT_INTERNAL_SYMBOL(idpf_dp_singleq_xmit_pkts)
>  uint16_t
> @@ -1404,7 +1364,7 @@ idpf_dp_singleq_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
> 
>  	/* Check if the descriptor ring needs to be cleaned. */
>  	if (txq->nb_tx_free < txq->tx_free_thresh)
> -		(void)idpf_xmit_cleanup(txq);
> +		(void)ci_tx_xmit_cleanup(txq);
> 
>  	for (nb_tx = 0; nb_tx < nb_pkts; nb_tx++) {
>  		td_cmd = 0;
> @@ -1437,14 +1397,14 @@ idpf_dp_singleq_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
>  		       txq->port_id, txq->queue_id, tx_id, tx_last);
> 
>  		if (nb_used > txq->nb_tx_free) {
> -			if (idpf_xmit_cleanup(txq) != 0) {
> +			if (ci_tx_xmit_cleanup(txq) != 0) {
>  				if (nb_tx == 0)
>  					return 0;
>  				goto end_of_tx;
>  			}
>  			if (unlikely(nb_used > txq->tx_rs_thresh)) {
>  				while (nb_used > txq->nb_tx_free) {
> -					if (idpf_xmit_cleanup(txq) != 0) {
> +					if (ci_tx_xmit_cleanup(txq) != 0) {
>  						if (nb_tx == 0)
>  							return 0;
>  						goto end_of_tx;
> --
> 2.51.0


^ permalink raw reply	[flat|nested] 274+ messages in thread

* RE: [PATCH v3 04/36] net/intel: consolidate definitions for Tx desc fields
  2026-01-30 11:41   ` [PATCH v3 04/36] net/intel: consolidate definitions for Tx desc fields Bruce Richardson
@ 2026-02-06 10:14     ` Loftus, Ciara
  2026-02-09 10:43       ` Bruce Richardson
  0 siblings, 1 reply; 274+ messages in thread
From: Loftus, Ciara @ 2026-02-06 10:14 UTC (permalink / raw)
  To: Richardson, Bruce, dev@dpdk.org
  Cc: Richardson, Bruce, Medvedkin, Vladimir, Burakov, Anatoly,
	Wu, Jingjing, Shetty,  Praveen



> -----Original Message-----
> From: Bruce Richardson <bruce.richardson@intel.com>
> Sent: Friday 30 January 2026 11:42
> To: dev@dpdk.org
> Cc: Richardson, Bruce <bruce.richardson@intel.com>; Medvedkin, Vladimir
> <vladimir.medvedkin@intel.com>; Burakov, Anatoly
> <anatoly.burakov@intel.com>; Wu, Jingjing <jingjing.wu@intel.com>; Shetty,
> Praveen <praveen.shetty@intel.com>
> Subject: [PATCH v3 04/36] net/intel: consolidate definitions for Tx desc fields
> 
> The offsets of the various fields within the Tx descriptors are common
> for i40e, iavf, ice and idpf, so put a single set of defines in tx.h and
> use those throughout all drivers. (NOTE: there was a small difference in
> the mask of the CMD field between drivers, depending on whether or not
> reserved fields were included. Those differences can be ignored, as those
> bits are unused in the drivers for which they are reserved.) Similarly, the various flag
> fields, such as End-of-packet (EOP) and Report-status (RS) are the same,
> as are offload definitions so consolidate them.
> 
> Original definitions are in base code, and are left in place because of
> that, but are unused.
> 
> Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> ---
>  drivers/net/intel/common/tx.h                 |  64 +++++++-
>  drivers/net/intel/i40e/i40e_fdir.c            |  24 +--
>  drivers/net/intel/i40e/i40e_rxtx.c            |  92 ++++++------
>  drivers/net/intel/i40e/i40e_rxtx.h            |  17 +--
>  .../net/intel/i40e/i40e_rxtx_vec_altivec.c    |  11 +-
>  drivers/net/intel/i40e/i40e_rxtx_vec_avx2.c   |  22 ++-
>  drivers/net/intel/i40e/i40e_rxtx_vec_avx512.c |  38 ++---
>  drivers/net/intel/i40e/i40e_rxtx_vec_common.h |   4 +-
>  drivers/net/intel/i40e/i40e_rxtx_vec_neon.c   |  11 +-
>  drivers/net/intel/iavf/iavf_rxtx.c            |  68 +++++----
>  drivers/net/intel/iavf/iavf_rxtx.h            |  20 +--
>  drivers/net/intel/iavf/iavf_rxtx_vec_avx2.c   |  41 ++----
>  drivers/net/intel/iavf/iavf_rxtx_vec_avx512.c |  80 ++++------
>  drivers/net/intel/iavf/iavf_rxtx_vec_common.h |  34 ++---
>  drivers/net/intel/ice/ice_dcf_ethdev.c        |   2 +-
>  drivers/net/intel/ice/ice_rxtx.c              | 137 ++++++++----------
>  drivers/net/intel/ice/ice_rxtx.h              |  15 +-
>  drivers/net/intel/ice/ice_rxtx_vec_avx2.c     |  41 ++----
>  drivers/net/intel/ice/ice_rxtx_vec_avx512.c   |  39 ++---
>  drivers/net/intel/ice/ice_rxtx_vec_common.h   |  41 +++---
>  drivers/net/intel/idpf/idpf_common_rxtx.c     |  22 +--
>  drivers/net/intel/idpf/idpf_common_rxtx.h     |  12 --
>  .../net/intel/idpf/idpf_common_rxtx_avx2.c    |  41 ++----
>  .../net/intel/idpf/idpf_common_rxtx_avx512.c  |  41 ++----
>  drivers/net/intel/idpf/idpf_rxtx_vec_common.h |   4 +-
>  25 files changed, 408 insertions(+), 513 deletions(-)
> 
> diff --git a/drivers/net/intel/common/tx.h b/drivers/net/intel/common/tx.h
> index a89412c195..03245d4fba 100644
> --- a/drivers/net/intel/common/tx.h
> +++ b/drivers/net/intel/common/tx.h
> @@ -10,6 +10,66 @@
>  #include <rte_ethdev.h>
>  #include <rte_vect.h>
> 
> +/* Common TX Descriptor QW1 Field Definitions */
> +#define CI_TXD_QW1_DTYPE_S      0
> +#define CI_TXD_QW1_DTYPE_M      (0xFUL << CI_TXD_QW1_DTYPE_S)
> +#define CI_TXD_QW1_CMD_S        4
> +#define CI_TXD_QW1_CMD_M        (0xFFFUL << CI_TXD_QW1_CMD_S)

This define is unused in the series.
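[Editorial aside: to make the QW1 field layout concrete, the shifts above pack into the descriptor's first quadword roughly as in the sketch below. This is a host-endian illustration only; the real i40e_build_ctob() in the patch additionally applies rte_cpu_to_le_64(), and ci_build_qw1 is a hypothetical name.]

```c
#include <stdint.h>

/* Shift values copied from the quoted patch; packing sketch only. */
#define CI_TXD_QW1_DTYPE_S      0
#define CI_TXD_QW1_CMD_S        4
#define CI_TXD_QW1_OFFSET_S     16
#define CI_TXD_QW1_TX_BUF_SZ_S  34
#define CI_TXD_QW1_L2TAG1_S     48
#define CI_TX_DESC_DTYPE_DATA   0x0

/* Hypothetical host-endian analogue of i40e_build_ctob() packing
 * (byte-order conversion deliberately omitted). */
static uint64_t
ci_build_qw1(uint32_t td_cmd, uint32_t td_offset, unsigned int size,
	     uint32_t td_tag)
{
	return CI_TX_DESC_DTYPE_DATA |
		((uint64_t)td_cmd << CI_TXD_QW1_CMD_S) |
		((uint64_t)td_offset << CI_TXD_QW1_OFFSET_S) |
		((uint64_t)size << CI_TXD_QW1_TX_BUF_SZ_S) |
		((uint64_t)td_tag << CI_TXD_QW1_L2TAG1_S);
}
```

Shifting the result back down by CI_TXD_QW1_TX_BUF_SZ_S and masking with 0x3FFF recovers the buffer size, which is how the CI_MAX_DATA_PER_TXD macro in the patch derives the per-descriptor limit.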

> +#define CI_TXD_QW1_OFFSET_S     16
> +#define CI_TXD_QW1_OFFSET_M     (0x3FFFFULL << CI_TXD_QW1_OFFSET_S)
> +#define CI_TXD_QW1_TX_BUF_SZ_S  34
> +#define CI_TXD_QW1_TX_BUF_SZ_M  (0x3FFFULL << CI_TXD_QW1_TX_BUF_SZ_S)
> +#define CI_TXD_QW1_L2TAG1_S     48
> +#define CI_TXD_QW1_L2TAG1_M     (0xFFFFULL << CI_TXD_QW1_L2TAG1_S)
> +
> +/* Common Descriptor Types */
> +#define CI_TX_DESC_DTYPE_DATA           0x0
> +#define CI_TX_DESC_DTYPE_CTX            0x1

This define is also unused, although there is scope to use it in
patch 7 (net/ice: refactor context descriptor handling).

> +#define CI_TX_DESC_DTYPE_DESC_DONE      0xF
> +
> +/* Common TX Descriptor Command Flags */
> +#define CI_TX_DESC_CMD_EOP              0x0001
> +#define CI_TX_DESC_CMD_RS               0x0002
> +#define CI_TX_DESC_CMD_ICRC             0x0004
> +#define CI_TX_DESC_CMD_IL2TAG1          0x0008
> +#define CI_TX_DESC_CMD_DUMMY            0x0010
> +#define CI_TX_DESC_CMD_IIPT_IPV6        0x0020
> +#define CI_TX_DESC_CMD_IIPT_IPV4        0x0040
> +#define CI_TX_DESC_CMD_IIPT_IPV4_CSUM   0x0060
> +#define CI_TX_DESC_CMD_L4T_EOFT_TCP     0x0100
> +#define CI_TX_DESC_CMD_L4T_EOFT_SCTP    0x0200
> +#define CI_TX_DESC_CMD_L4T_EOFT_UDP     0x0300
> +
> +/* Common TX Context Descriptor Commands */
> +#define CI_TX_CTX_DESC_TSO              0x01
> +#define CI_TX_CTX_DESC_TSYN             0x02
> +#define CI_TX_CTX_DESC_IL2TAG2          0x04
> +
> +/* Common TX Descriptor Length Field Shifts */
> +#define CI_TX_DESC_LEN_MACLEN_S         0  /* 7 BITS */
> +#define CI_TX_DESC_LEN_IPLEN_S          7  /* 7 BITS */
> +#define CI_TX_DESC_LEN_L4_LEN_S         14 /* 4 BITS */
> +
> +/* Common maximum data per TX descriptor */
> +#define CI_MAX_DATA_PER_TXD     (CI_TXD_QW1_TX_BUF_SZ_M >> CI_TXD_QW1_TX_BUF_SZ_S)
> +
> +/**
> + * Common TX offload union for Intel drivers.
> + * Supports both basic offloads (l2_len, l3_len, l4_len, tso_segsz) and
> + * extended offloads (outer_l2_len, outer_l3_len) for tunneling support.
> + */
> +union ci_tx_offload {
> +	uint64_t data;
> +	struct {
> +		uint64_t l2_len:7;        /**< L2 (MAC) Header Length. */
> +		uint64_t l3_len:9;        /**< L3 (IP) Header Length. */
> +		uint64_t l4_len:8;        /**< L4 Header Length. */
> +		uint64_t tso_segsz:16;    /**< TCP TSO segment size */
> +		uint64_t outer_l2_len:8;  /**< outer L2 Header Length */
> +		uint64_t outer_l3_len:16; /**< outer L3 Header Length */
> +	};
> +};
> +
>  /*
>   * Structure of a 16-byte Tx descriptor common across i40e, ice, iavf and idpf
> drivers
>   */
> @@ -286,8 +346,8 @@ ci_tx_xmit_cleanup(struct ci_tx_queue *txq)
>  	desc_to_clean_to = sw_ring[desc_to_clean_to].last_id;
> 
>  	/* Check if descriptor is done - all drivers use 0xF as done value in bits 3:0 */

I think this comment referencing 0xF is out of place now that the code
below uses CI_TX_DESC_DTYPE_DESC_DONE rather than the literal 0xF.
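[Editorial aside: the done-test itself reduces to a 4-bit mask compare. Below is a host-endian sketch with the byte-order conversion omitted; desc_done() is a hypothetical helper name, and the field values are copied from the quoted patch.]

```c
#include <stdint.h>

/* Values copied from the quoted patch: DTYPE occupies bits 3:0 of QW1,
 * and hardware writes 0xF there when the descriptor is done. */
#define CI_TXD_QW1_DTYPE_S		0
#define CI_TXD_QW1_DTYPE_M		(0xFUL << CI_TXD_QW1_DTYPE_S)
#define CI_TX_DESC_DTYPE_DESC_DONE	0xF

/* Host-endian sketch of the done-check; the real code wraps both sides
 * in rte_cpu_to_le_64(). */
static int
desc_done(uint64_t cmd_type_offset_bsz)
{
	return (cmd_type_offset_bsz & CI_TXD_QW1_DTYPE_M) ==
			CI_TX_DESC_DTYPE_DESC_DONE;
}
```

Any value with bits 3:0 equal to 0xF tests as done regardless of the upper field contents, which is why masking with CI_TXD_QW1_DTYPE_M first is required.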

> -	if ((txd[desc_to_clean_to].cmd_type_offset_bsz & rte_cpu_to_le_64(0xFUL)) !=
> -			rte_cpu_to_le_64(0xFUL)) {
> +	if ((txd[desc_to_clean_to].cmd_type_offset_bsz & rte_cpu_to_le_64(CI_TXD_QW1_DTYPE_M)) !=
> +			rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE)) {
>  		/* Descriptor not yet processed by hardware */
>  		return -1;
>  	}
> diff --git a/drivers/net/intel/i40e/i40e_fdir.c b/drivers/net/intel/i40e/i40e_fdir.c
> index 8a01aec0e2..3b099d5a9e 100644
> --- a/drivers/net/intel/i40e/i40e_fdir.c
> +++ b/drivers/net/intel/i40e/i40e_fdir.c
> @@ -916,11 +916,11 @@ i40e_build_ctob(uint32_t td_cmd,
>  		unsigned int size,
>  		uint32_t td_tag)
>  {
> -	return rte_cpu_to_le_64(I40E_TX_DESC_DTYPE_DATA |
> -			((uint64_t)td_cmd  << I40E_TXD_QW1_CMD_SHIFT) |
> -			((uint64_t)td_offset << I40E_TXD_QW1_OFFSET_SHIFT) |
> -			((uint64_t)size  << I40E_TXD_QW1_TX_BUF_SZ_SHIFT) |
> -			((uint64_t)td_tag  << I40E_TXD_QW1_L2TAG1_SHIFT));
> +	return rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DATA |
> +			((uint64_t)td_cmd  << CI_TXD_QW1_CMD_S) |
> +			((uint64_t)td_offset << CI_TXD_QW1_OFFSET_S) |
> +			((uint64_t)size  << CI_TXD_QW1_TX_BUF_SZ_S) |
> +			((uint64_t)td_tag  << CI_TXD_QW1_L2TAG1_S));
>  }
> 
>  /*
> @@ -1384,8 +1384,8 @@ i40e_find_available_buffer(struct rte_eth_dev *dev)
> 
>  		do {
>  			if ((tmp_txdp->cmd_type_offset_bsz &
> -
> 	rte_cpu_to_le_64(I40E_TXD_QW1_DTYPE_MASK)) ==
> -
> 	rte_cpu_to_le_64(I40E_TX_DESC_DTYPE_DESC_DONE))
> +
> 	rte_cpu_to_le_64(CI_TXD_QW1_DTYPE_M)) ==
> +
> 	rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE))
>  				fdir_info->txq_available_buf_count++;
>  			else
>  				break;
> @@ -1710,9 +1710,9 @@ i40e_flow_fdir_filter_programming(struct i40e_pf *pf,
>  	txdp = &txq->ci_tx_ring[txq->tx_tail + 1];
>  	txdp->buffer_addr = rte_cpu_to_le_64(pf->fdir.dma_addr[txq->tx_tail >> 1]);
> 
> -	td_cmd = I40E_TX_DESC_CMD_EOP |
> -		 I40E_TX_DESC_CMD_RS  |
> -		 I40E_TX_DESC_CMD_DUMMY;
> +	td_cmd = CI_TX_DESC_CMD_EOP |
> +		 CI_TX_DESC_CMD_RS  |
> +		 CI_TX_DESC_CMD_DUMMY;
> 
>  	txdp->cmd_type_offset_bsz =
>  		i40e_build_ctob(td_cmd, 0, I40E_FDIR_PKT_LEN, 0);
> @@ -1731,8 +1731,8 @@ i40e_flow_fdir_filter_programming(struct i40e_pf *pf,
>  	if (wait_status) {
>  		for (i = 0; i < I40E_FDIR_MAX_WAIT_US; i++) {
>  			if ((txdp->cmd_type_offset_bsz &
> -					rte_cpu_to_le_64(I40E_TXD_QW1_DTYPE_MASK)) ==
> -					rte_cpu_to_le_64(I40E_TX_DESC_DTYPE_DESC_DONE))
> +					rte_cpu_to_le_64(CI_TXD_QW1_DTYPE_M)) ==
> +					rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE))
>  				break;
>  			rte_delay_us(1);
>  		}
> diff --git a/drivers/net/intel/i40e/i40e_rxtx.c b/drivers/net/intel/i40e/i40e_rxtx.c
> index 2760e76e99..f96c5c7f1e 100644
> --- a/drivers/net/intel/i40e/i40e_rxtx.c
> +++ b/drivers/net/intel/i40e/i40e_rxtx.c
> @@ -45,7 +45,7 @@
>  /* Base address of the HW descriptor ring should be 128B aligned. */
>  #define I40E_RING_BASE_ALIGN	128
> 
> -#define I40E_TXD_CMD (I40E_TX_DESC_CMD_EOP | I40E_TX_DESC_CMD_RS)
> +#define I40E_TXD_CMD (CI_TX_DESC_CMD_EOP | CI_TX_DESC_CMD_RS)
> 
>  #ifdef RTE_LIBRTE_IEEE1588
>  #define I40E_TX_IEEE1588_TMST RTE_MBUF_F_TX_IEEE1588_TMST
> @@ -260,7 +260,7 @@ i40e_rxd_build_fdir(volatile union ci_rx_desc *rxdp, struct rte_mbuf *mb)
> 
>  static inline void
>  i40e_parse_tunneling_params(uint64_t ol_flags,
> -			    union i40e_tx_offload tx_offload,
> +			    union ci_tx_offload tx_offload,
>  			    uint32_t *cd_tunneling)
>  {
>  	/* EIPT: External (outer) IP header type */
> @@ -319,51 +319,51 @@ static inline void
>  i40e_txd_enable_checksum(uint64_t ol_flags,
>  			uint32_t *td_cmd,
>  			uint32_t *td_offset,
> -			union i40e_tx_offload tx_offload)
> +			union ci_tx_offload tx_offload)
>  {
>  	/* Set MACLEN */
>  	if (!(ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK))
>  		*td_offset |= (tx_offload.l2_len >> 1)
> -			<< I40E_TX_DESC_LENGTH_MACLEN_SHIFT;
> +			<< CI_TX_DESC_LEN_MACLEN_S;
> 
>  	/* Enable L3 checksum offloads */
>  	if (ol_flags & RTE_MBUF_F_TX_IP_CKSUM) {
> -		*td_cmd |= I40E_TX_DESC_CMD_IIPT_IPV4_CSUM;
> +		*td_cmd |= CI_TX_DESC_CMD_IIPT_IPV4_CSUM;
>  		*td_offset |= (tx_offload.l3_len >> 2)
> -				<< I40E_TX_DESC_LENGTH_IPLEN_SHIFT;
> +				<< CI_TX_DESC_LEN_IPLEN_S;
>  	} else if (ol_flags & RTE_MBUF_F_TX_IPV4) {
> -		*td_cmd |= I40E_TX_DESC_CMD_IIPT_IPV4;
> +		*td_cmd |= CI_TX_DESC_CMD_IIPT_IPV4;
>  		*td_offset |= (tx_offload.l3_len >> 2)
> -				<< I40E_TX_DESC_LENGTH_IPLEN_SHIFT;
> +				<< CI_TX_DESC_LEN_IPLEN_S;
>  	} else if (ol_flags & RTE_MBUF_F_TX_IPV6) {
> -		*td_cmd |= I40E_TX_DESC_CMD_IIPT_IPV6;
> +		*td_cmd |= CI_TX_DESC_CMD_IIPT_IPV6;
>  		*td_offset |= (tx_offload.l3_len >> 2)
> -				<< I40E_TX_DESC_LENGTH_IPLEN_SHIFT;
> +				<< CI_TX_DESC_LEN_IPLEN_S;
>  	}
> 
>  	if (ol_flags & RTE_MBUF_F_TX_TCP_SEG) {
> -		*td_cmd |= I40E_TX_DESC_CMD_L4T_EOFT_TCP;
> +		*td_cmd |= CI_TX_DESC_CMD_L4T_EOFT_TCP;
>  		*td_offset |= (tx_offload.l4_len >> 2)
> -			<< I40E_TX_DESC_LENGTH_L4_FC_LEN_SHIFT;
> +			<< CI_TX_DESC_LEN_L4_LEN_S;
>  		return;
>  	}
> 
>  	/* Enable L4 checksum offloads */
>  	switch (ol_flags & RTE_MBUF_F_TX_L4_MASK) {
>  	case RTE_MBUF_F_TX_TCP_CKSUM:
> -		*td_cmd |= I40E_TX_DESC_CMD_L4T_EOFT_TCP;
> +		*td_cmd |= CI_TX_DESC_CMD_L4T_EOFT_TCP;
>  		*td_offset |= (sizeof(struct rte_tcp_hdr) >> 2) <<
> -				I40E_TX_DESC_LENGTH_L4_FC_LEN_SHIFT;
> +				CI_TX_DESC_LEN_L4_LEN_S;
>  		break;
>  	case RTE_MBUF_F_TX_SCTP_CKSUM:
> -		*td_cmd |= I40E_TX_DESC_CMD_L4T_EOFT_SCTP;
> +		*td_cmd |= CI_TX_DESC_CMD_L4T_EOFT_SCTP;
>  		*td_offset |= (sizeof(struct rte_sctp_hdr) >> 2) <<
> -				I40E_TX_DESC_LENGTH_L4_FC_LEN_SHIFT;
> +				CI_TX_DESC_LEN_L4_LEN_S;
>  		break;
>  	case RTE_MBUF_F_TX_UDP_CKSUM:
> -		*td_cmd |= I40E_TX_DESC_CMD_L4T_EOFT_UDP;
> +		*td_cmd |= CI_TX_DESC_CMD_L4T_EOFT_UDP;
>  		*td_offset |= (sizeof(struct rte_udp_hdr) >> 2) <<
> -				I40E_TX_DESC_LENGTH_L4_FC_LEN_SHIFT;
> +				CI_TX_DESC_LEN_L4_LEN_S;
>  		break;
>  	default:
>  		break;
> @@ -377,11 +377,11 @@ i40e_build_ctob(uint32_t td_cmd,
>  		unsigned int size,
>  		uint32_t td_tag)
>  {
> -	return rte_cpu_to_le_64(I40E_TX_DESC_DTYPE_DATA |
> -			((uint64_t)td_cmd  << I40E_TXD_QW1_CMD_SHIFT) |
> -			((uint64_t)td_offset << I40E_TXD_QW1_OFFSET_SHIFT) |
> -			((uint64_t)size  << I40E_TXD_QW1_TX_BUF_SZ_SHIFT) |
> -			((uint64_t)td_tag  << I40E_TXD_QW1_L2TAG1_SHIFT));
> +	return rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DATA |
> +			((uint64_t)td_cmd  << CI_TXD_QW1_CMD_S) |
> +			((uint64_t)td_offset << CI_TXD_QW1_OFFSET_S) |
> +			((uint64_t)size  << CI_TXD_QW1_TX_BUF_SZ_S) |
> +			((uint64_t)td_tag  << CI_TXD_QW1_L2TAG1_S));
>  }
> 
>  static inline int
> @@ -1004,7 +1004,7 @@ i40e_calc_context_desc(uint64_t flags)
> 
>  /* set i40e TSO context descriptor */
>  static inline uint64_t
> -i40e_set_tso_ctx(struct rte_mbuf *mbuf, union i40e_tx_offload tx_offload)
> +i40e_set_tso_ctx(struct rte_mbuf *mbuf, union ci_tx_offload tx_offload)
>  {
>  	uint64_t ctx_desc = 0;
>  	uint32_t cd_cmd, hdr_len, cd_tso_len;
> @@ -1029,9 +1029,6 @@ i40e_set_tso_ctx(struct rte_mbuf *mbuf, union i40e_tx_offload tx_offload)
>  	return ctx_desc;
>  }
> 
> -/* HW requires that Tx buffer size ranges from 1B up to (16K-1)B. */
> -#define I40E_MAX_DATA_PER_TXD \
> -	(I40E_TXD_QW1_TX_BUF_SZ_MASK >> I40E_TXD_QW1_TX_BUF_SZ_SHIFT)
>  /* Calculate the number of TX descriptors needed for each pkt */
>  static inline uint16_t
>  i40e_calc_pkt_desc(struct rte_mbuf *tx_pkt)
> @@ -1040,7 +1037,7 @@ i40e_calc_pkt_desc(struct rte_mbuf *tx_pkt)
>  	uint16_t count = 0;
> 
>  	while (txd != NULL) {
> -		count += DIV_ROUND_UP(txd->data_len, I40E_MAX_DATA_PER_TXD);
> +		count += DIV_ROUND_UP(txd->data_len, CI_MAX_DATA_PER_TXD);
>  		txd = txd->next;
>  	}
> 
> @@ -1069,7 +1066,7 @@ i40e_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
>  	uint16_t tx_last;
>  	uint16_t slen;
>  	uint64_t buf_dma_addr;
> -	union i40e_tx_offload tx_offload = {0};
> +	union ci_tx_offload tx_offload = {0};
> 
>  	txq = tx_queue;
>  	sw_ring = txq->sw_ring;
> @@ -1138,18 +1135,18 @@ i40e_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
> 
>  		/* Descriptor based VLAN insertion */
>  		if (ol_flags & (RTE_MBUF_F_TX_VLAN | RTE_MBUF_F_TX_QINQ)) {
> -			td_cmd |= I40E_TX_DESC_CMD_IL2TAG1;
> +			td_cmd |= CI_TX_DESC_CMD_IL2TAG1;
>  			td_tag = tx_pkt->vlan_tci;
>  		}
> 
>  		/* Always enable CRC offload insertion */
> -		td_cmd |= I40E_TX_DESC_CMD_ICRC;
> +		td_cmd |= CI_TX_DESC_CMD_ICRC;
> 
>  		/* Fill in tunneling parameters if necessary */
>  		cd_tunneling_params = 0;
>  		if (ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK) {
>  			td_offset |= (tx_offload.outer_l2_len >> 1)
> -					<< I40E_TX_DESC_LENGTH_MACLEN_SHIFT;
> +					<< CI_TX_DESC_LEN_MACLEN_S;
>  			i40e_parse_tunneling_params(ol_flags, tx_offload,
>  						    &cd_tunneling_params);
>  		}
> @@ -1229,16 +1226,16 @@ i40e_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
>  			buf_dma_addr = rte_mbuf_data_iova(m_seg);
> 
>  			while ((ol_flags & RTE_MBUF_F_TX_TCP_SEG) &&
> -				unlikely(slen > I40E_MAX_DATA_PER_TXD)) {
> +				unlikely(slen > CI_MAX_DATA_PER_TXD)) {
>  				txd->buffer_addr =
>  					rte_cpu_to_le_64(buf_dma_addr);
>  				txd->cmd_type_offset_bsz =
>  					i40e_build_ctob(td_cmd,
> -					td_offset, I40E_MAX_DATA_PER_TXD,
> +					td_offset, CI_MAX_DATA_PER_TXD,
>  					td_tag);
> 
> -				buf_dma_addr += I40E_MAX_DATA_PER_TXD;
> -				slen -= I40E_MAX_DATA_PER_TXD;
> +				buf_dma_addr += CI_MAX_DATA_PER_TXD;
> +				slen -= CI_MAX_DATA_PER_TXD;
> 
>  				txe->last_id = tx_last;
>  				tx_id = txe->next_id;
> @@ -1265,7 +1262,7 @@ i40e_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
>  		} while (m_seg != NULL);
> 
>  		/* The last packet data descriptor needs End Of Packet (EOP) */
> -		td_cmd |= I40E_TX_DESC_CMD_EOP;
> +		td_cmd |= CI_TX_DESC_CMD_EOP;
>  		txq->nb_tx_used = (uint16_t)(txq->nb_tx_used + nb_used);
>  		txq->nb_tx_free = (uint16_t)(txq->nb_tx_free - nb_used);
> 
> @@ -1275,15 +1272,14 @@ i40e_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
>  				   "%4u (port=%d queue=%d)",
>  				   tx_last, txq->port_id, txq->queue_id);
> 
> -			td_cmd |= I40E_TX_DESC_CMD_RS;
> +			td_cmd |= CI_TX_DESC_CMD_RS;
> 
>  			/* Update txq RS bit counters */
>  			txq->nb_tx_used = 0;
>  		}
> 
>  		txd->cmd_type_offset_bsz |=
> -			rte_cpu_to_le_64(((uint64_t)td_cmd) <<
> -					I40E_TXD_QW1_CMD_SHIFT);
> +			rte_cpu_to_le_64(((uint64_t)td_cmd) << CI_TXD_QW1_CMD_S);
>  	}
> 
>  end_of_tx:
> @@ -1309,8 +1305,8 @@ i40e_tx_free_bufs(struct ci_tx_queue *txq)
>  	const uint16_t m = tx_rs_thresh % I40E_TX_MAX_FREE_BUF_SZ;
> 
>  	if ((txq->ci_tx_ring[txq->tx_next_dd].cmd_type_offset_bsz &
> -			rte_cpu_to_le_64(I40E_TXD_QW1_DTYPE_MASK)) !=
> -			rte_cpu_to_le_64(I40E_TX_DESC_DTYPE_DESC_DONE))
> +			rte_cpu_to_le_64(CI_TXD_QW1_DTYPE_M)) !=
> +			rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE))
>  		return 0;
> 
>  	txep = &txq->sw_ring[txq->tx_next_dd - (tx_rs_thresh - 1)];
> @@ -1441,8 +1437,7 @@ tx_xmit_pkts(struct ci_tx_queue *txq,
>  		n = (uint16_t)(txq->nb_tx_desc - txq->tx_tail);
>  		i40e_tx_fill_hw_ring(txq, tx_pkts, n);
>  		txr[txq->tx_next_rs].cmd_type_offset_bsz |=
> -			rte_cpu_to_le_64(((uint64_t)I40E_TX_DESC_CMD_RS) <<
> -					I40E_TXD_QW1_CMD_SHIFT);
> +			rte_cpu_to_le_64(((uint64_t)CI_TX_DESC_CMD_RS) << CI_TXD_QW1_CMD_S);
>  		txq->tx_next_rs = (uint16_t)(txq->tx_rs_thresh - 1);
>  		txq->tx_tail = 0;
>  	}
> @@ -1454,8 +1449,7 @@ tx_xmit_pkts(struct ci_tx_queue *txq,
>  	/* Determine if RS bit needs to be set */
>  	if (txq->tx_tail > txq->tx_next_rs) {
>  		txr[txq->tx_next_rs].cmd_type_offset_bsz |=
> -			rte_cpu_to_le_64(((uint64_t)I40E_TX_DESC_CMD_RS) <<
> -					I40E_TXD_QW1_CMD_SHIFT);
> +			rte_cpu_to_le_64(((uint64_t)CI_TX_DESC_CMD_RS) << CI_TXD_QW1_CMD_S);
>  		txq->tx_next_rs =
>  			(uint16_t)(txq->tx_next_rs + txq->tx_rs_thresh);
>  		if (txq->tx_next_rs >= txq->nb_tx_desc)
> @@ -2383,9 +2377,9 @@ i40e_dev_tx_descriptor_status(void *tx_queue, uint16_t offset)
>  	}
> 
>  	status = &txq->ci_tx_ring[desc].cmd_type_offset_bsz;
> -	mask = rte_le_to_cpu_64(I40E_TXD_QW1_DTYPE_MASK);
> +	mask = rte_le_to_cpu_64(CI_TXD_QW1_DTYPE_M);
>  	expect = rte_cpu_to_le_64(
> -		I40E_TX_DESC_DTYPE_DESC_DONE << I40E_TXD_QW1_DTYPE_SHIFT);
> +		CI_TX_DESC_DTYPE_DESC_DONE << CI_TXD_QW1_DTYPE_S);
>  	if ((*status & mask) == expect)
>  		return RTE_ETH_TX_DESC_DONE;
> 
> @@ -2883,7 +2877,7 @@ i40e_reset_tx_queue(struct ci_tx_queue *txq)
>  		volatile struct ci_tx_desc *txd = &txq->ci_tx_ring[i];
> 
>  		txd->cmd_type_offset_bsz =
> -			rte_cpu_to_le_64(I40E_TX_DESC_DTYPE_DESC_DONE);
> +			rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE);
>  		txe[i].mbuf =  NULL;
>  		txe[i].last_id = i;
>  		txe[prev].next_id = i;
> diff --git a/drivers/net/intel/i40e/i40e_rxtx.h b/drivers/net/intel/i40e/i40e_rxtx.h
> index ed173d8f17..307ffa3049 100644
> --- a/drivers/net/intel/i40e/i40e_rxtx.h
> +++ b/drivers/net/intel/i40e/i40e_rxtx.h
> @@ -47,8 +47,8 @@
>  #define I40E_RX_DESC_EXT_STATUS_FLEXBL_MASK   0x03
>  #define I40E_RX_DESC_EXT_STATUS_FLEXBL_FLEX   0x01
> 
> -#define I40E_TD_CMD (I40E_TX_DESC_CMD_ICRC |\
> -		     I40E_TX_DESC_CMD_EOP)
> +#define I40E_TD_CMD (CI_TX_DESC_CMD_ICRC |\
> +		     CI_TX_DESC_CMD_EOP)
> 
>  enum i40e_header_split_mode {
>  	i40e_header_split_none = 0,
> @@ -110,19 +110,6 @@ enum i40e_header_split_mode {
> 
>  #define I40E_TX_VECTOR_OFFLOADS	RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE
> 
> -/** Offload features */
> -union i40e_tx_offload {
> -	uint64_t data;
> -	struct {
> -		uint64_t l2_len:7; /**< L2 (MAC) Header Length. */
> -		uint64_t l3_len:9; /**< L3 (IP) Header Length. */
> -		uint64_t l4_len:8; /**< L4 Header Length. */
> -		uint64_t tso_segsz:16; /**< TCP TSO segment size */
> -		uint64_t outer_l2_len:8; /**< outer L2 Header Length */
> -		uint64_t outer_l3_len:16; /**< outer L3 Header Length */
> -	};
> -};
> -
>  int i40e_dev_rx_queue_start(struct rte_eth_dev *dev, uint16_t rx_queue_id);
>  int i40e_dev_rx_queue_stop(struct rte_eth_dev *dev, uint16_t rx_queue_id);
>  int i40e_dev_tx_queue_start(struct rte_eth_dev *dev, uint16_t tx_queue_id);
> diff --git a/drivers/net/intel/i40e/i40e_rxtx_vec_altivec.c b/drivers/net/intel/i40e/i40e_rxtx_vec_altivec.c
> index 81e9e2bc0b..4c36748d94 100644
> --- a/drivers/net/intel/i40e/i40e_rxtx_vec_altivec.c
> +++ b/drivers/net/intel/i40e/i40e_rxtx_vec_altivec.c
> @@ -449,9 +449,9 @@ static inline void
>  vtx1(volatile struct ci_tx_desc *txdp,
>  	struct rte_mbuf *pkt, uint64_t flags)
>  {
> -	uint64_t high_qw = (I40E_TX_DESC_DTYPE_DATA |
> -		((uint64_t)flags  << I40E_TXD_QW1_CMD_SHIFT) |
> -		((uint64_t)pkt->data_len << I40E_TXD_QW1_TX_BUF_SZ_SHIFT));
> +	uint64_t high_qw = (CI_TX_DESC_DTYPE_DATA |
> +		((uint64_t)flags << CI_TXD_QW1_CMD_S) |
> +		((uint64_t)pkt->data_len << CI_TXD_QW1_TX_BUF_SZ_S));
> 
>  	__vector unsigned long descriptor = (__vector unsigned long){
>  		pkt->buf_iova + pkt->data_off, high_qw};
> @@ -477,7 +477,7 @@ i40e_xmit_fixed_burst_vec(void *tx_queue, struct rte_mbuf **tx_pkts,
>  	struct ci_tx_entry_vec *txep;
>  	uint16_t n, nb_commit, tx_id;
>  	uint64_t flags = I40E_TD_CMD;
> -	uint64_t rs = I40E_TX_DESC_CMD_RS | I40E_TD_CMD;
> +	uint64_t rs = CI_TX_DESC_CMD_RS | I40E_TD_CMD;
>  	int i;
> 
>  	if (txq->nb_tx_free < txq->tx_free_thresh)
> @@ -520,8 +520,7 @@ i40e_xmit_fixed_burst_vec(void *tx_queue, struct rte_mbuf **tx_pkts,
>  	tx_id = (uint16_t)(tx_id + nb_commit);
>  	if (tx_id > txq->tx_next_rs) {
>  		txq->ci_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
> -			rte_cpu_to_le_64(((uint64_t)I40E_TX_DESC_CMD_RS) <<
> -					I40E_TXD_QW1_CMD_SHIFT);
> +			rte_cpu_to_le_64(((uint64_t)CI_TX_DESC_CMD_RS) << CI_TXD_QW1_CMD_S);
>  		txq->tx_next_rs =
>  			(uint16_t)(txq->tx_next_rs + txq->tx_rs_thresh);
>  	}
> diff --git a/drivers/net/intel/i40e/i40e_rxtx_vec_avx2.c b/drivers/net/intel/i40e/i40e_rxtx_vec_avx2.c
> index f054bd41bf..502a1842c6 100644
> --- a/drivers/net/intel/i40e/i40e_rxtx_vec_avx2.c
> +++ b/drivers/net/intel/i40e/i40e_rxtx_vec_avx2.c
> @@ -684,9 +684,9 @@ static inline void
>  vtx1(volatile struct ci_tx_desc *txdp,
>  		struct rte_mbuf *pkt, uint64_t flags)
>  {
> -	uint64_t high_qw = (I40E_TX_DESC_DTYPE_DATA |
> -			((uint64_t)flags  << I40E_TXD_QW1_CMD_SHIFT) |
> -			((uint64_t)pkt->data_len << I40E_TXD_QW1_TX_BUF_SZ_SHIFT));
> +	uint64_t high_qw = (CI_TX_DESC_DTYPE_DATA |
> +			((uint64_t)flags << CI_TXD_QW1_CMD_S) |
> +			((uint64_t)pkt->data_len << CI_TXD_QW1_TX_BUF_SZ_S));
> 
>  	__m128i descriptor = _mm_set_epi64x(high_qw,
>  				pkt->buf_iova + pkt->data_off);
> @@ -697,8 +697,7 @@ static inline void
>  vtx(volatile struct ci_tx_desc *txdp,
>  		struct rte_mbuf **pkt, uint16_t nb_pkts,  uint64_t flags)
>  {
> -	const uint64_t hi_qw_tmpl = (I40E_TX_DESC_DTYPE_DATA |
> -			((uint64_t)flags  << I40E_TXD_QW1_CMD_SHIFT));
> +	const uint64_t hi_qw_tmpl = (CI_TX_DESC_DTYPE_DATA | (flags << CI_TXD_QW1_CMD_S));
> 
>  	/* if unaligned on 32-bit boundary, do one to align */
>  	if (((uintptr_t)txdp & 0x1F) != 0 && nb_pkts != 0) {
> @@ -709,13 +708,13 @@ vtx(volatile struct ci_tx_desc *txdp,
>  	/* do two at a time while possible, in bursts */
>  	for (; nb_pkts > 3; txdp += 4, pkt += 4, nb_pkts -= 4) {
>  		uint64_t hi_qw3 = hi_qw_tmpl |
> -				((uint64_t)pkt[3]->data_len << I40E_TXD_QW1_TX_BUF_SZ_SHIFT);
> +				((uint64_t)pkt[3]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
>  		uint64_t hi_qw2 = hi_qw_tmpl |
> -				((uint64_t)pkt[2]->data_len << I40E_TXD_QW1_TX_BUF_SZ_SHIFT);
> +				((uint64_t)pkt[2]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
>  		uint64_t hi_qw1 = hi_qw_tmpl |
> -				((uint64_t)pkt[1]->data_len << I40E_TXD_QW1_TX_BUF_SZ_SHIFT);
> +				((uint64_t)pkt[1]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
>  		uint64_t hi_qw0 = hi_qw_tmpl |
> -				((uint64_t)pkt[0]->data_len << I40E_TXD_QW1_TX_BUF_SZ_SHIFT);
> +				((uint64_t)pkt[0]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
> 
>  		__m256i desc2_3 = _mm256_set_epi64x(
>  				hi_qw3, pkt[3]->buf_iova + pkt[3]->data_off,
> @@ -743,7 +742,7 @@ i40e_xmit_fixed_burst_vec_avx2(void *tx_queue, struct rte_mbuf **tx_pkts,
>  	struct ci_tx_entry_vec *txep;
>  	uint16_t n, nb_commit, tx_id;
>  	uint64_t flags = I40E_TD_CMD;
> -	uint64_t rs = I40E_TX_DESC_CMD_RS | I40E_TD_CMD;
> +	uint64_t rs = CI_TX_DESC_CMD_RS | I40E_TD_CMD;
> 
>  	if (txq->nb_tx_free < txq->tx_free_thresh)
>  		ci_tx_free_bufs_vec(txq, i40e_tx_desc_done, false);
> @@ -785,8 +784,7 @@ i40e_xmit_fixed_burst_vec_avx2(void *tx_queue, struct rte_mbuf **tx_pkts,
>  	tx_id = (uint16_t)(tx_id + nb_commit);
>  	if (tx_id > txq->tx_next_rs) {
>  		txq->ci_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
> -			rte_cpu_to_le_64(((uint64_t)I40E_TX_DESC_CMD_RS) <<
> -					I40E_TXD_QW1_CMD_SHIFT);
> +			rte_cpu_to_le_64(((uint64_t)CI_TX_DESC_CMD_RS) << CI_TXD_QW1_CMD_S);
>  		txq->tx_next_rs =
>  			(uint16_t)(txq->tx_next_rs + txq->tx_rs_thresh);
>  	}
> diff --git a/drivers/net/intel/i40e/i40e_rxtx_vec_avx512.c b/drivers/net/intel/i40e/i40e_rxtx_vec_avx512.c
> index 9a967faeee..d48ff9f51e 100644
> --- a/drivers/net/intel/i40e/i40e_rxtx_vec_avx512.c
> +++ b/drivers/net/intel/i40e/i40e_rxtx_vec_avx512.c
> @@ -752,9 +752,9 @@ i40e_recv_scattered_pkts_vec_avx512(void *rx_queue,
>  static inline void
>  vtx1(volatile struct ci_tx_desc *txdp, struct rte_mbuf *pkt, uint64_t flags)
>  {
> -	uint64_t high_qw = (I40E_TX_DESC_DTYPE_DATA |
> -		((uint64_t)flags  << I40E_TXD_QW1_CMD_SHIFT) |
> -		((uint64_t)pkt->data_len << I40E_TXD_QW1_TX_BUF_SZ_SHIFT));
> +	uint64_t high_qw = (CI_TX_DESC_DTYPE_DATA |
> +		((uint64_t)flags << CI_TXD_QW1_CMD_S) |
> +		((uint64_t)pkt->data_len << CI_TXD_QW1_TX_BUF_SZ_S));
> 
>  	__m128i descriptor = _mm_set_epi64x(high_qw,
>  				pkt->buf_iova + pkt->data_off);
> @@ -765,26 +765,17 @@ static inline void
>  vtx(volatile struct ci_tx_desc *txdp,
>  	struct rte_mbuf **pkt, uint16_t nb_pkts,  uint64_t flags)
>  {
> -	const uint64_t hi_qw_tmpl = (I40E_TX_DESC_DTYPE_DATA |
> -			((uint64_t)flags  << I40E_TXD_QW1_CMD_SHIFT));
> +	const uint64_t hi_qw_tmpl = (CI_TX_DESC_DTYPE_DATA | (flags << CI_TXD_QW1_CMD_S));
> 
>  	for (; nb_pkts > 3; txdp += 4, pkt += 4, nb_pkts -= 4) {
> -		uint64_t hi_qw3 =
> -			hi_qw_tmpl |
> -			((uint64_t)pkt[3]->data_len <<
> -			 I40E_TXD_QW1_TX_BUF_SZ_SHIFT);
> -		uint64_t hi_qw2 =
> -			hi_qw_tmpl |
> -			((uint64_t)pkt[2]->data_len <<
> -			 I40E_TXD_QW1_TX_BUF_SZ_SHIFT);
> -		uint64_t hi_qw1 =
> -			hi_qw_tmpl |
> -			((uint64_t)pkt[1]->data_len <<
> -			 I40E_TXD_QW1_TX_BUF_SZ_SHIFT);
> -		uint64_t hi_qw0 =
> -			hi_qw_tmpl |
> -			((uint64_t)pkt[0]->data_len <<
> -			 I40E_TXD_QW1_TX_BUF_SZ_SHIFT);
> +		uint64_t hi_qw3 = hi_qw_tmpl |
> +			((uint64_t)pkt[3]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
> +		uint64_t hi_qw2 = hi_qw_tmpl |
> +			((uint64_t)pkt[2]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
> +		uint64_t hi_qw1 = hi_qw_tmpl |
> +			((uint64_t)pkt[1]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
> +		uint64_t hi_qw0 = hi_qw_tmpl |
> +			((uint64_t)pkt[0]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
> 
>  		__m512i desc0_3 =
>  			_mm512_set_epi64
> @@ -811,7 +802,7 @@ i40e_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pkts,
>  	struct ci_tx_entry_vec *txep;
>  	uint16_t n, nb_commit, tx_id;
>  	uint64_t flags = I40E_TD_CMD;
> -	uint64_t rs = I40E_TX_DESC_CMD_RS | I40E_TD_CMD;
> +	uint64_t rs = CI_TX_DESC_CMD_RS | I40E_TD_CMD;
> 
>  	if (txq->nb_tx_free < txq->tx_free_thresh)
>  		ci_tx_free_bufs_vec(txq, i40e_tx_desc_done, false);
> @@ -854,8 +845,7 @@ i40e_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pkts,
>  	tx_id = (uint16_t)(tx_id + nb_commit);
>  	if (tx_id > txq->tx_next_rs) {
>  		txq->ci_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
> -			rte_cpu_to_le_64(((uint64_t)I40E_TX_DESC_CMD_RS) <<
> -					I40E_TXD_QW1_CMD_SHIFT);
> +			rte_cpu_to_le_64(((uint64_t)CI_TX_DESC_CMD_RS) << CI_TXD_QW1_CMD_S);
>  		txq->tx_next_rs =
>  			(uint16_t)(txq->tx_next_rs + txq->tx_rs_thresh);
>  	}
> diff --git a/drivers/net/intel/i40e/i40e_rxtx_vec_common.h b/drivers/net/intel/i40e/i40e_rxtx_vec_common.h
> index 1fd7fc75bf..292a39501e 100644
> --- a/drivers/net/intel/i40e/i40e_rxtx_vec_common.h
> +++ b/drivers/net/intel/i40e/i40e_rxtx_vec_common.h
> @@ -16,8 +16,8 @@ static inline int
>  i40e_tx_desc_done(struct ci_tx_queue *txq, uint16_t idx)
>  {
>  	return (txq->ci_tx_ring[idx].cmd_type_offset_bsz &
> -			rte_cpu_to_le_64(I40E_TXD_QW1_DTYPE_MASK)) ==
> -			rte_cpu_to_le_64(I40E_TX_DESC_DTYPE_DESC_DONE);
> +			rte_cpu_to_le_64(CI_TXD_QW1_DTYPE_M)) ==
> +			rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE);
>  }
> 
>  static inline void
> diff --git a/drivers/net/intel/i40e/i40e_rxtx_vec_neon.c b/drivers/net/intel/i40e/i40e_rxtx_vec_neon.c
> index 0b95152232..be4c64942e 100644
> --- a/drivers/net/intel/i40e/i40e_rxtx_vec_neon.c
> +++ b/drivers/net/intel/i40e/i40e_rxtx_vec_neon.c
> @@ -600,9 +600,9 @@ static inline void
>  vtx1(volatile struct ci_tx_desc *txdp,
>  		struct rte_mbuf *pkt, uint64_t flags)
>  {
> -	uint64_t high_qw = (I40E_TX_DESC_DTYPE_DATA |
> -			((uint64_t)flags  << I40E_TXD_QW1_CMD_SHIFT) |
> -			((uint64_t)pkt->data_len << I40E_TXD_QW1_TX_BUF_SZ_SHIFT));
> +	uint64_t high_qw = (CI_TX_DESC_DTYPE_DATA |
> +			((uint64_t)flags << CI_TXD_QW1_CMD_S) |
> +			((uint64_t)pkt->data_len << CI_TXD_QW1_TX_BUF_SZ_S));
> 
>  	uint64x2_t descriptor = {pkt->buf_iova + pkt->data_off, high_qw};
>  	vst1q_u64(RTE_CAST_PTR(uint64_t *, txdp), descriptor);
> @@ -627,7 +627,7 @@ i40e_xmit_fixed_burst_vec(void *__rte_restrict tx_queue,
>  	struct ci_tx_entry_vec *txep;
>  	uint16_t n, nb_commit, tx_id;
>  	uint64_t flags = I40E_TD_CMD;
> -	uint64_t rs = I40E_TX_DESC_CMD_RS | I40E_TD_CMD;
> +	uint64_t rs = CI_TX_DESC_CMD_RS | I40E_TD_CMD;
>  	int i;
> 
>  	if (txq->nb_tx_free < txq->tx_free_thresh)
> @@ -669,8 +669,7 @@ i40e_xmit_fixed_burst_vec(void *__rte_restrict tx_queue,
>  	tx_id = (uint16_t)(tx_id + nb_commit);
>  	if (tx_id > txq->tx_next_rs) {
>  		txq->ci_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
> -			rte_cpu_to_le_64(((uint64_t)I40E_TX_DESC_CMD_RS) <<
> -					I40E_TXD_QW1_CMD_SHIFT);
> +			rte_cpu_to_le_64(((uint64_t)CI_TX_DESC_CMD_RS) << CI_TXD_QW1_CMD_S);
>  		txq->tx_next_rs =
>  			(uint16_t)(txq->tx_next_rs + txq->tx_rs_thresh);
>  	}
> diff --git a/drivers/net/intel/iavf/iavf_rxtx.c b/drivers/net/intel/iavf/iavf_rxtx.c
> index 560abfc1ef..947b6c24d2 100644
> --- a/drivers/net/intel/iavf/iavf_rxtx.c
> +++ b/drivers/net/intel/iavf/iavf_rxtx.c
> @@ -274,7 +274,7 @@ reset_tx_queue(struct ci_tx_queue *txq)
>  	prev = (uint16_t)(txq->nb_tx_desc - 1);
>  	for (i = 0; i < txq->nb_tx_desc; i++) {
>  		txq->ci_tx_ring[i].cmd_type_offset_bsz =
> -			rte_cpu_to_le_64(IAVF_TX_DESC_DTYPE_DESC_DONE);
> +			rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE);
>  		txe[i].mbuf =  NULL;
>  		txe[i].last_id = i;
>  		txe[prev].next_id = i;
> @@ -2351,12 +2351,12 @@ iavf_fill_ctx_desc_cmd_field(volatile uint64_t *field, struct rte_mbuf *m,
> 
>  	/* TSO enabled */
>  	if (m->ol_flags & (RTE_MBUF_F_TX_TCP_SEG | RTE_MBUF_F_TX_UDP_SEG))
> -		cmd = IAVF_TX_CTX_DESC_TSO << IAVF_TXD_CTX_QW1_CMD_SHIFT;
> +		cmd = CI_TX_CTX_DESC_TSO << IAVF_TXD_CTX_QW1_CMD_SHIFT;
> 
>  	if ((m->ol_flags & RTE_MBUF_F_TX_VLAN &&
>  			vlan_flag & IAVF_TX_FLAGS_VLAN_TAG_LOC_L2TAG2) ||
>  			m->ol_flags & RTE_MBUF_F_TX_QINQ) {
> -		cmd |= IAVF_TX_CTX_DESC_IL2TAG2
> +		cmd |= CI_TX_CTX_DESC_IL2TAG2
>  			<< IAVF_TXD_CTX_QW1_CMD_SHIFT;
>  	}
> 
> @@ -2577,20 +2577,20 @@ iavf_build_data_desc_cmd_offset_fields(volatile uint64_t *qw1,
>  	uint64_t offset = 0;
>  	uint64_t l2tag1 = 0;
> 
> -	*qw1 = IAVF_TX_DESC_DTYPE_DATA;
> +	*qw1 = CI_TX_DESC_DTYPE_DATA;
> 
> -	command = (uint64_t)IAVF_TX_DESC_CMD_ICRC;
> +	command = (uint64_t)CI_TX_DESC_CMD_ICRC;
> 
>  	/* Descriptor based VLAN insertion */
>  	if ((vlan_flag & IAVF_TX_FLAGS_VLAN_TAG_LOC_L2TAG1) &&
>  			m->ol_flags & RTE_MBUF_F_TX_VLAN) {
> -		command |= (uint64_t)IAVF_TX_DESC_CMD_IL2TAG1;
> +		command |= (uint64_t)CI_TX_DESC_CMD_IL2TAG1;
>  		l2tag1 |= m->vlan_tci;
>  	}
> 
>  	/* Descriptor based QinQ insertion. vlan_flag specifies outer tag location. */
>  	if (m->ol_flags & RTE_MBUF_F_TX_QINQ) {
> -		command |= (uint64_t)IAVF_TX_DESC_CMD_IL2TAG1;
> +		command |= (uint64_t)CI_TX_DESC_CMD_IL2TAG1;
>  		l2tag1 = vlan_flag & IAVF_TX_FLAGS_VLAN_TAG_LOC_L2TAG1 ? m->vlan_tci_outer :
>  									m->vlan_tci;
>  	}
> @@ -2603,32 +2603,32 @@ iavf_build_data_desc_cmd_offset_fields(volatile uint64_t *qw1,
>  	if (m->ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK &&
>  			!(m->ol_flags & RTE_MBUF_F_TX_SEC_OFFLOAD))
>  		offset |= (m->outer_l2_len >> 1)
> -			<< IAVF_TX_DESC_LENGTH_MACLEN_SHIFT;
> +			<< CI_TX_DESC_LEN_MACLEN_S;
>  	else
>  		offset |= (m->l2_len >> 1)
> -			<< IAVF_TX_DESC_LENGTH_MACLEN_SHIFT;
> +			<< CI_TX_DESC_LEN_MACLEN_S;
> 
>  	/* Enable L3 checksum offloading inner */
>  	if (m->ol_flags & RTE_MBUF_F_TX_IP_CKSUM) {
>  		if (m->ol_flags & RTE_MBUF_F_TX_IPV4) {
> -			command |= IAVF_TX_DESC_CMD_IIPT_IPV4_CSUM;
> -			offset |= (m->l3_len >> 2) << IAVF_TX_DESC_LENGTH_IPLEN_SHIFT;
> +			command |= CI_TX_DESC_CMD_IIPT_IPV4_CSUM;
> +			offset |= (m->l3_len >> 2) << CI_TX_DESC_LEN_IPLEN_S;
>  		}
>  	} else if (m->ol_flags & RTE_MBUF_F_TX_IPV4) {
> -		command |= IAVF_TX_DESC_CMD_IIPT_IPV4;
> -		offset |= (m->l3_len >> 2) << IAVF_TX_DESC_LENGTH_IPLEN_SHIFT;
> +		command |= CI_TX_DESC_CMD_IIPT_IPV4;
> +		offset |= (m->l3_len >> 2) << CI_TX_DESC_LEN_IPLEN_S;
>  	} else if (m->ol_flags & RTE_MBUF_F_TX_IPV6) {
> -		command |= IAVF_TX_DESC_CMD_IIPT_IPV6;
> -		offset |= (m->l3_len >> 2) << IAVF_TX_DESC_LENGTH_IPLEN_SHIFT;
> +		command |= CI_TX_DESC_CMD_IIPT_IPV6;
> +		offset |= (m->l3_len >> 2) << CI_TX_DESC_LEN_IPLEN_S;
>  	}
> 
>  	if (m->ol_flags & (RTE_MBUF_F_TX_TCP_SEG | RTE_MBUF_F_TX_UDP_SEG)) {
>  		if (m->ol_flags & RTE_MBUF_F_TX_TCP_SEG)
> -			command |= IAVF_TX_DESC_CMD_L4T_EOFT_TCP;
> +			command |= CI_TX_DESC_CMD_L4T_EOFT_TCP;
>  		else
> -			command |= IAVF_TX_DESC_CMD_L4T_EOFT_UDP;
> +			command |= CI_TX_DESC_CMD_L4T_EOFT_UDP;
>  		offset |= (m->l4_len >> 2) <<
> -			      IAVF_TX_DESC_LENGTH_L4_FC_LEN_SHIFT;
> +			      CI_TX_DESC_LEN_L4_LEN_S;
> 
>  		*qw1 = rte_cpu_to_le_64((((uint64_t)command <<
>  			IAVF_TXD_DATA_QW1_CMD_SHIFT) & IAVF_TXD_DATA_QW1_CMD_MASK) |
> @@ -2642,19 +2642,19 @@ iavf_build_data_desc_cmd_offset_fields(volatile uint64_t *qw1,
>  	/* Enable L4 checksum offloads */
>  	switch (m->ol_flags & RTE_MBUF_F_TX_L4_MASK) {
>  	case RTE_MBUF_F_TX_TCP_CKSUM:
> -		command |= IAVF_TX_DESC_CMD_L4T_EOFT_TCP;
> +		command |= CI_TX_DESC_CMD_L4T_EOFT_TCP;
>  		offset |= (sizeof(struct rte_tcp_hdr) >> 2) <<
> -				IAVF_TX_DESC_LENGTH_L4_FC_LEN_SHIFT;
> +				CI_TX_DESC_LEN_L4_LEN_S;
>  		break;
>  	case RTE_MBUF_F_TX_SCTP_CKSUM:
> -		command |= IAVF_TX_DESC_CMD_L4T_EOFT_SCTP;
> +		command |= CI_TX_DESC_CMD_L4T_EOFT_SCTP;
>  		offset |= (sizeof(struct rte_sctp_hdr) >> 2) <<
> -				IAVF_TX_DESC_LENGTH_L4_FC_LEN_SHIFT;
> +				CI_TX_DESC_LEN_L4_LEN_S;
>  		break;
>  	case RTE_MBUF_F_TX_UDP_CKSUM:
> -		command |= IAVF_TX_DESC_CMD_L4T_EOFT_UDP;
> +		command |= CI_TX_DESC_CMD_L4T_EOFT_UDP;
>  		offset |= (sizeof(struct rte_udp_hdr) >> 2) <<
> -				IAVF_TX_DESC_LENGTH_L4_FC_LEN_SHIFT;
> +				CI_TX_DESC_LEN_L4_LEN_S;
>  		break;
>  	}
> 
> @@ -2674,8 +2674,7 @@ iavf_calc_pkt_desc(struct rte_mbuf *tx_pkt)
>  	uint16_t count = 0;
> 
>  	while (txd != NULL) {
> -		count += (txd->data_len + IAVF_MAX_DATA_PER_TXD - 1) /
> -			IAVF_MAX_DATA_PER_TXD;
> +		count += (txd->data_len + CI_MAX_DATA_PER_TXD - 1) / CI_MAX_DATA_PER_TXD;
>  		txd = txd->next;
>  	}
> 
> @@ -2881,14 +2880,14 @@ iavf_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
>  			buf_dma_addr = rte_mbuf_data_iova(mb_seg);
>  			while ((mb_seg->ol_flags & (RTE_MBUF_F_TX_TCP_SEG |
>  					RTE_MBUF_F_TX_UDP_SEG)) &&
> -					unlikely(slen > IAVF_MAX_DATA_PER_TXD)) {
> +					unlikely(slen > CI_MAX_DATA_PER_TXD)) {
>  				iavf_fill_data_desc(ddesc, ddesc_template,
> -					IAVF_MAX_DATA_PER_TXD, buf_dma_addr);
> +					CI_MAX_DATA_PER_TXD, buf_dma_addr);
> 
>  				IAVF_DUMP_TX_DESC(txq, ddesc, desc_idx);
> 
> -				buf_dma_addr += IAVF_MAX_DATA_PER_TXD;
> -				slen -= IAVF_MAX_DATA_PER_TXD;
> +				buf_dma_addr += CI_MAX_DATA_PER_TXD;
> +				slen -= CI_MAX_DATA_PER_TXD;
> 
>  				txe->last_id = desc_idx_last;
>  				desc_idx = txe->next_id;
> @@ -2909,7 +2908,7 @@ iavf_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
>  		} while (mb_seg);
> 
>  		/* The last packet data descriptor needs End Of Packet (EOP) */
> -		ddesc_cmd = IAVF_TX_DESC_CMD_EOP;
> +		ddesc_cmd = CI_TX_DESC_CMD_EOP;
> 
>  		txq->nb_tx_used = (uint16_t)(txq->nb_tx_used + nb_desc_required);
>  		txq->nb_tx_free = (uint16_t)(txq->nb_tx_free - nb_desc_required);
> @@ -2919,7 +2918,7 @@ iavf_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
>  				   "%4u (port=%d queue=%d)",
>  				   desc_idx_last, txq->port_id, txq->queue_id);
> 
> -			ddesc_cmd |= IAVF_TX_DESC_CMD_RS;
> +			ddesc_cmd |= CI_TX_DESC_CMD_RS;
> 
>  			/* Update txq RS bit counters */
>  			txq->nb_tx_used = 0;
> @@ -4423,9 +4422,8 @@ iavf_dev_tx_desc_status(void *tx_queue, uint16_t offset)
>  	}
> 
>  	status = &txq->ci_tx_ring[desc].cmd_type_offset_bsz;
> -	mask = rte_le_to_cpu_64(IAVF_TXD_QW1_DTYPE_MASK);
> -	expect = rte_cpu_to_le_64(
> -		 IAVF_TX_DESC_DTYPE_DESC_DONE << IAVF_TXD_QW1_DTYPE_SHIFT);
> +	mask = rte_le_to_cpu_64(CI_TXD_QW1_DTYPE_M);
> +	expect = rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE << CI_TXD_QW1_DTYPE_S);
>  	if ((*status & mask) == expect)
>  		return RTE_ETH_TX_DESC_DONE;
> 
> diff --git a/drivers/net/intel/iavf/iavf_rxtx.h b/drivers/net/intel/iavf/iavf_rxtx.h
> index dd6d884fc1..395d97b4ee 100644
> --- a/drivers/net/intel/iavf/iavf_rxtx.h
> +++ b/drivers/net/intel/iavf/iavf_rxtx.h
> @@ -162,10 +162,6 @@
>  #define IAVF_TX_OFFLOAD_NOTSUP_MASK \
>  		(RTE_MBUF_F_TX_OFFLOAD_MASK ^ IAVF_TX_OFFLOAD_MASK)
> 
> -/* HW requires that TX buffer size ranges from 1B up to (16K-1)B. */
> -#define IAVF_MAX_DATA_PER_TXD \
> -	(IAVF_TXD_QW1_TX_BUF_SZ_MASK >> IAVF_TXD_QW1_TX_BUF_SZ_SHIFT)
> -
>  #define IAVF_TX_LLDP_DYNFIELD "intel_pmd_dynfield_tx_lldp"
>  #define IAVF_CHECK_TX_LLDP(m) \
>  	((rte_pmd_iavf_tx_lldp_dynfield_offset > 0) && \
> @@ -195,18 +191,6 @@ struct iavf_rx_queue_stats {
>  	struct iavf_ipsec_crypto_stats ipsec_crypto;
>  };
> 
> -/* Offload features */
> -union iavf_tx_offload {
> -	uint64_t data;
> -	struct {
> -		uint64_t l2_len:7; /* L2 (MAC) Header Length. */
> -		uint64_t l3_len:9; /* L3 (IP) Header Length. */
> -		uint64_t l4_len:8; /* L4 Header Length. */
> -		uint64_t tso_segsz:16; /* TCP TSO segment size */
> -		/* uint64_t unused : 24; */
> -	};
> -};
> -
>  /* Rx Flex Descriptor
>   * RxDID Profile ID 16-21
>   * Flex-field 0: RSS hash lower 16-bits
> @@ -409,7 +393,7 @@ enum iavf_rx_flex_desc_ipsec_crypto_status {
> 
> 
>  #define IAVF_TXD_DATA_QW1_DTYPE_SHIFT	(0)
> -#define IAVF_TXD_DATA_QW1_DTYPE_MASK	(0xFUL << IAVF_TXD_QW1_DTYPE_SHIFT)
> +#define IAVF_TXD_DATA_QW1_DTYPE_MASK	(0xFUL << CI_TXD_QW1_DTYPE_S)
> 
>  #define IAVF_TXD_DATA_QW1_CMD_SHIFT	(4)
>  #define IAVF_TXD_DATA_QW1_CMD_MASK	(0x3FFUL << IAVF_TXD_DATA_QW1_CMD_SHIFT)
> @@ -686,7 +670,7 @@ void iavf_dump_tx_descriptor(const struct ci_tx_queue *txq,
>  		rte_le_to_cpu_64(tx_desc->cmd_type_offset_bsz &
>  			rte_cpu_to_le_64(IAVF_TXD_DATA_QW1_DTYPE_MASK));
>  	switch (type) {
> -	case IAVF_TX_DESC_DTYPE_DATA:
> +	case CI_TX_DESC_DTYPE_DATA:
>  		name = "Tx_data_desc";
>  		break;
>  	case IAVF_TX_DESC_DTYPE_CONTEXT:
> diff --git a/drivers/net/intel/iavf/iavf_rxtx_vec_avx2.c b/drivers/net/intel/iavf/iavf_rxtx_vec_avx2.c
> index 89ce841b9e..cea4ee9863 100644
> --- a/drivers/net/intel/iavf/iavf_rxtx_vec_avx2.c
> +++ b/drivers/net/intel/iavf/iavf_rxtx_vec_avx2.c
> @@ -1633,10 +1633,9 @@ static __rte_always_inline void
>  iavf_vtx1(volatile struct ci_tx_desc *txdp,
>  	  struct rte_mbuf *pkt, uint64_t flags, bool offload, uint8_t vlan_flag)
>  {
> -	uint64_t high_qw =
> -		(IAVF_TX_DESC_DTYPE_DATA |
> -		 ((uint64_t)flags  << IAVF_TXD_QW1_CMD_SHIFT) |
> -		 ((uint64_t)pkt->data_len << IAVF_TXD_QW1_TX_BUF_SZ_SHIFT));
> +	uint64_t high_qw = (CI_TX_DESC_DTYPE_DATA |
> +		 ((uint64_t)flags << CI_TXD_QW1_CMD_S) |
> +		 ((uint64_t)pkt->data_len << CI_TXD_QW1_TX_BUF_SZ_S));
>  	if (offload)
>  		iavf_txd_enable_offload(pkt, &high_qw, vlan_flag);
> 
> @@ -1649,8 +1648,7 @@ static __rte_always_inline void
>  iavf_vtx(volatile struct ci_tx_desc *txdp,
>  	 struct rte_mbuf **pkt, uint16_t nb_pkts,  uint64_t flags, bool offload, uint8_t vlan_flag)
>  {
> -	const uint64_t hi_qw_tmpl = (IAVF_TX_DESC_DTYPE_DATA |
> -			((uint64_t)flags  << IAVF_TXD_QW1_CMD_SHIFT));
> +	const uint64_t hi_qw_tmpl = (CI_TX_DESC_DTYPE_DATA | (flags << CI_TXD_QW1_CMD_S));
> 
>  	/* if unaligned on 32-bit boundary, do one to align */
>  	if (((uintptr_t)txdp & 0x1F) != 0 && nb_pkts != 0) {
> @@ -1660,28 +1658,20 @@ iavf_vtx(volatile struct ci_tx_desc *txdp,
> 
>  	/* do two at a time while possible, in bursts */
>  	for (; nb_pkts > 3; txdp += 4, pkt += 4, nb_pkts -= 4) {
> -		uint64_t hi_qw3 =
> -			hi_qw_tmpl |
> -			((uint64_t)pkt[3]->data_len <<
> -			 IAVF_TXD_QW1_TX_BUF_SZ_SHIFT);
> +		uint64_t hi_qw3 = hi_qw_tmpl |
> +			((uint64_t)pkt[3]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
>  		if (offload)
>  			iavf_txd_enable_offload(pkt[3], &hi_qw3, vlan_flag);
> -		uint64_t hi_qw2 =
> -			hi_qw_tmpl |
> -			((uint64_t)pkt[2]->data_len <<
> -			 IAVF_TXD_QW1_TX_BUF_SZ_SHIFT);
> +		uint64_t hi_qw2 = hi_qw_tmpl |
> +			((uint64_t)pkt[2]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
>  		if (offload)
>  			iavf_txd_enable_offload(pkt[2], &hi_qw2, vlan_flag);
> -		uint64_t hi_qw1 =
> -			hi_qw_tmpl |
> -			((uint64_t)pkt[1]->data_len <<
> -			 IAVF_TXD_QW1_TX_BUF_SZ_SHIFT);
> +		uint64_t hi_qw1 = hi_qw_tmpl |
> +			((uint64_t)pkt[1]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
>  		if (offload)
>  			iavf_txd_enable_offload(pkt[1], &hi_qw1, vlan_flag);
> -		uint64_t hi_qw0 =
> -			hi_qw_tmpl |
> -			((uint64_t)pkt[0]->data_len <<
> -			 IAVF_TXD_QW1_TX_BUF_SZ_SHIFT);
> +		uint64_t hi_qw0 = hi_qw_tmpl |
> +			((uint64_t)pkt[0]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
>  		if (offload)
>  			iavf_txd_enable_offload(pkt[0], &hi_qw0, vlan_flag);
> 
> @@ -1717,8 +1707,8 @@ iavf_xmit_fixed_burst_vec_avx2(void *tx_queue, struct rte_mbuf **tx_pkts,
>  	struct ci_tx_entry_vec *txep;
>  	uint16_t n, nb_commit, tx_id;
>  	/* bit2 is reserved and must be set to 1 according to Spec */
> -	uint64_t flags = IAVF_TX_DESC_CMD_EOP | IAVF_TX_DESC_CMD_ICRC;
> -	uint64_t rs = IAVF_TX_DESC_CMD_RS | flags;
> +	uint64_t flags = CI_TX_DESC_CMD_EOP | CI_TX_DESC_CMD_ICRC;
> +	uint64_t rs = CI_TX_DESC_CMD_RS | flags;
> 
>  	if (txq->nb_tx_free < txq->tx_free_thresh)
>  		ci_tx_free_bufs_vec(txq, iavf_tx_desc_done, false);
> @@ -1761,8 +1751,7 @@ iavf_xmit_fixed_burst_vec_avx2(void *tx_queue, struct rte_mbuf **tx_pkts,
>  	tx_id = (uint16_t)(tx_id + nb_commit);
>  	if (tx_id > txq->tx_next_rs) {
>  		txq->ci_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
> -			rte_cpu_to_le_64(((uint64_t)IAVF_TX_DESC_CMD_RS) <<
> -					 IAVF_TXD_QW1_CMD_SHIFT);
> +			rte_cpu_to_le_64(((uint64_t)CI_TX_DESC_CMD_RS) << CI_TXD_QW1_CMD_S);
>  		txq->tx_next_rs =
>  			(uint16_t)(txq->tx_next_rs + txq->tx_rs_thresh);
>  	}
> diff --git a/drivers/net/intel/iavf/iavf_rxtx_vec_avx512.c b/drivers/net/intel/iavf/iavf_rxtx_vec_avx512.c
> index ad1b0b90cd..01477fd501 100644
> --- a/drivers/net/intel/iavf/iavf_rxtx_vec_avx512.c
> +++ b/drivers/net/intel/iavf/iavf_rxtx_vec_avx512.c
> @@ -1844,10 +1844,9 @@ iavf_vtx1(volatile struct ci_tx_desc *txdp,
>  	  struct rte_mbuf *pkt, uint64_t flags,
>  	  bool offload, uint8_t vlan_flag)
>  {
> -	uint64_t high_qw =
> -		(IAVF_TX_DESC_DTYPE_DATA |
> -		 ((uint64_t)flags  << IAVF_TXD_QW1_CMD_SHIFT) |
> -		 ((uint64_t)pkt->data_len << IAVF_TXD_QW1_TX_BUF_SZ_SHIFT));
> +	uint64_t high_qw = (CI_TX_DESC_DTYPE_DATA |
> +		 ((uint64_t)flags << CI_TXD_QW1_CMD_S) |
> +		 ((uint64_t)pkt->data_len << CI_TXD_QW1_TX_BUF_SZ_S));
>  	if (offload)
>  		iavf_txd_enable_offload(pkt, &high_qw, vlan_flag);
> 
> @@ -1863,8 +1862,7 @@ iavf_vtx(volatile struct ci_tx_desc *txdp,
>  		struct rte_mbuf **pkt, uint16_t nb_pkts,  uint64_t flags,
>  		bool offload, uint8_t vlan_flag)
>  {
> -	const uint64_t hi_qw_tmpl = (IAVF_TX_DESC_DTYPE_DATA |
> -			((uint64_t)flags  << IAVF_TXD_QW1_CMD_SHIFT));
> +	const uint64_t hi_qw_tmpl = (CI_TX_DESC_DTYPE_DATA | (flags << CI_TXD_QW1_CMD_S));
> 
>  	/* if unaligned on 32-bit boundary, do one to align */
>  	if (((uintptr_t)txdp & 0x1F) != 0 && nb_pkts != 0) {
> @@ -1874,22 +1872,14 @@ iavf_vtx(volatile struct ci_tx_desc *txdp,
> 
>  	/* do 4 at a time while possible, in bursts */
>  	for (; nb_pkts > 3; txdp += 4, pkt += 4, nb_pkts -= 4) {
> -		uint64_t hi_qw3 =
> -			hi_qw_tmpl |
> -			((uint64_t)pkt[3]->data_len <<
> -			 IAVF_TXD_QW1_TX_BUF_SZ_SHIFT);
> -		uint64_t hi_qw2 =
> -			hi_qw_tmpl |
> -			((uint64_t)pkt[2]->data_len <<
> -			 IAVF_TXD_QW1_TX_BUF_SZ_SHIFT);
> -		uint64_t hi_qw1 =
> -			hi_qw_tmpl |
> -			((uint64_t)pkt[1]->data_len <<
> -			 IAVF_TXD_QW1_TX_BUF_SZ_SHIFT);
> -		uint64_t hi_qw0 =
> -			hi_qw_tmpl |
> -			((uint64_t)pkt[0]->data_len <<
> -			 IAVF_TXD_QW1_TX_BUF_SZ_SHIFT);
> +		uint64_t hi_qw3 = hi_qw_tmpl |
> +			((uint64_t)pkt[3]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
> +		uint64_t hi_qw2 = hi_qw_tmpl |
> +			((uint64_t)pkt[2]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
> +		uint64_t hi_qw1 = hi_qw_tmpl |
> +			((uint64_t)pkt[1]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
> +		uint64_t hi_qw0 = hi_qw_tmpl |
> +			((uint64_t)pkt[0]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
>  		if (offload) {
>  			iavf_txd_enable_offload(pkt[3], &hi_qw3, vlan_flag);
>  			iavf_txd_enable_offload(pkt[2], &hi_qw2, vlan_flag);
> @@ -2093,9 +2083,9 @@ ctx_vtx1(volatile struct ci_tx_desc *txdp, struct rte_mbuf *pkt,
>  	if (IAVF_CHECK_TX_LLDP(pkt))
>  		high_ctx_qw |= IAVF_TX_CTX_DESC_SWTCH_UPLINK
>  			<< IAVF_TXD_CTX_QW1_CMD_SHIFT;
> -	uint64_t high_data_qw = (IAVF_TX_DESC_DTYPE_DATA |
> -				((uint64_t)flags  << IAVF_TXD_QW1_CMD_SHIFT) |
> -				((uint64_t)pkt->data_len << IAVF_TXD_QW1_TX_BUF_SZ_SHIFT));
> +	uint64_t high_data_qw = (CI_TX_DESC_DTYPE_DATA |
> +				((uint64_t)flags << CI_TXD_QW1_CMD_S) |
> +				((uint64_t)pkt->data_len << CI_TXD_QW1_TX_BUF_SZ_S));
>  	if (offload)
>  		iavf_txd_enable_offload(pkt, &high_data_qw, vlan_flag);
> 
> @@ -2110,8 +2100,7 @@ ctx_vtx(volatile struct ci_tx_desc *txdp,
>  		struct rte_mbuf **pkt, uint16_t nb_pkts,  uint64_t flags,
>  		bool offload, uint8_t vlan_flag)
>  {
> -	uint64_t hi_data_qw_tmpl = (IAVF_TX_DESC_DTYPE_DATA |
> -					((uint64_t)flags  << IAVF_TXD_QW1_CMD_SHIFT));
> +	uint64_t hi_data_qw_tmpl = (CI_TX_DESC_DTYPE_DATA | (flags << CI_TXD_QW1_CMD_S));
> 
>  	/* if unaligned on 32-bit boundary, do one to align */
>  	if (((uintptr_t)txdp & 0x1F) != 0 && nb_pkts != 0) {
> @@ -2128,11 +2117,9 @@ ctx_vtx(volatile struct ci_tx_desc *txdp,
>  		uint64_t hi_data_qw0 = 0;
> 
>  		hi_data_qw1 = hi_data_qw_tmpl |
> -				((uint64_t)pkt[1]->data_len <<
> -					IAVF_TXD_QW1_TX_BUF_SZ_SHIFT);
> +				((uint64_t)pkt[1]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
>  		hi_data_qw0 = hi_data_qw_tmpl |
> -				((uint64_t)pkt[0]->data_len <<
> -					IAVF_TXD_QW1_TX_BUF_SZ_SHIFT);
> +				((uint64_t)pkt[0]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
> 
>  #ifdef IAVF_TX_VLAN_QINQ_OFFLOAD
>  		if (offload) {
> @@ -2140,13 +2127,11 @@ ctx_vtx(volatile struct ci_tx_desc *txdp,
>  				uint64_t qinq_tag = vlan_flag & IAVF_TX_FLAGS_VLAN_TAG_LOC_L2TAG2 ?
>  					(uint64_t)pkt[1]->vlan_tci :
>  					(uint64_t)pkt[1]->vlan_tci_outer;
> -				hi_ctx_qw1 |= IAVF_TX_CTX_DESC_IL2TAG2 <<
> -					IAVF_TXD_CTX_QW1_CMD_SHIFT;
> +				hi_ctx_qw1 |= CI_TX_CTX_DESC_IL2TAG2 << CI_TXD_QW1_CMD_S;
>  				low_ctx_qw1 |= qinq_tag << IAVF_TXD_CTX_QW0_L2TAG2_PARAM;
>  			} else if (pkt[1]->ol_flags & RTE_MBUF_F_TX_VLAN &&
>  					vlan_flag & IAVF_TX_FLAGS_VLAN_TAG_LOC_L2TAG2) {
> -				hi_ctx_qw1 |=
> -					IAVF_TX_CTX_DESC_IL2TAG2 << IAVF_TXD_CTX_QW1_CMD_SHIFT;
> +				hi_ctx_qw1 |= IAVF_TX_CTX_DESC_IL2TAG2 << CI_TXD_QW1_CMD_S;
>  				low_ctx_qw1 |=
>  					(uint64_t)pkt[1]->vlan_tci << IAVF_TXD_CTX_QW0_L2TAG2_PARAM;
>  			}
> @@ -2154,7 +2139,7 @@ ctx_vtx(volatile struct ci_tx_desc *txdp,
>  #endif
>  		if (IAVF_CHECK_TX_LLDP(pkt[1]))
>  			hi_ctx_qw1 |= IAVF_TX_CTX_DESC_SWTCH_UPLINK
> -				<< IAVF_TXD_CTX_QW1_CMD_SHIFT;
> +				<< CI_TXD_QW1_CMD_S;
> 
>  #ifdef IAVF_TX_VLAN_QINQ_OFFLOAD
>  		if (offload) {
> @@ -2162,21 +2147,18 @@ ctx_vtx(volatile struct ci_tx_desc *txdp,
>  				uint64_t qinq_tag = vlan_flag & IAVF_TX_FLAGS_VLAN_TAG_LOC_L2TAG2 ?
>  					(uint64_t)pkt[0]->vlan_tci :
>  					(uint64_t)pkt[0]->vlan_tci_outer;
> -				hi_ctx_qw0 |= IAVF_TX_CTX_DESC_IL2TAG2 <<
> -					IAVF_TXD_CTX_QW1_CMD_SHIFT;
> +				hi_ctx_qw0 |= IAVF_TX_CTX_DESC_IL2TAG2 << CI_TXD_QW1_CMD_S;
>  				low_ctx_qw0 |= qinq_tag << IAVF_TXD_CTX_QW0_L2TAG2_PARAM;
>  			} else if (pkt[0]->ol_flags & RTE_MBUF_F_TX_VLAN &&
>  					vlan_flag & IAVF_TX_FLAGS_VLAN_TAG_LOC_L2TAG2) {
> -				hi_ctx_qw0 |=
> -					IAVF_TX_CTX_DESC_IL2TAG2 << IAVF_TXD_CTX_QW1_CMD_SHIFT;
> +				hi_ctx_qw0 |= IAVF_TX_CTX_DESC_IL2TAG2 << CI_TXD_QW1_CMD_S;
>  				low_ctx_qw0 |=
>  					(uint64_t)pkt[0]->vlan_tci << IAVF_TXD_CTX_QW0_L2TAG2_PARAM;
>  			}
>  		}
>  #endif
>  		if (IAVF_CHECK_TX_LLDP(pkt[0]))
> -			hi_ctx_qw0 |= IAVF_TX_CTX_DESC_SWTCH_UPLINK
> -				<< IAVF_TXD_CTX_QW1_CMD_SHIFT;
> +			hi_ctx_qw0 |= IAVF_TX_CTX_DESC_SWTCH_UPLINK << CI_TXD_QW1_CMD_S;
> 
>  		if (offload) {
>  			iavf_txd_enable_offload(pkt[1], &hi_data_qw1, vlan_flag);
> @@ -2207,8 +2189,8 @@ iavf_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pkts,
>  	struct ci_tx_entry_vec *txep;
>  	uint16_t n, nb_commit, tx_id;
>  	/* bit2 is reserved and must be set to 1 according to Spec */
> -	uint64_t flags = IAVF_TX_DESC_CMD_EOP | IAVF_TX_DESC_CMD_ICRC;
> -	uint64_t rs = IAVF_TX_DESC_CMD_RS | flags;
> +	uint64_t flags = CI_TX_DESC_CMD_EOP | CI_TX_DESC_CMD_ICRC;
> +	uint64_t rs = CI_TX_DESC_CMD_RS | flags;
> 
>  	if (txq->nb_tx_free < txq->tx_free_thresh)
>  		ci_tx_free_bufs_vec(txq, iavf_tx_desc_done, false);
> @@ -2253,8 +2235,7 @@ iavf_xmit_fixed_burst_vec_avx512(void
> *tx_queue, struct rte_mbuf **tx_pkts,
>  	tx_id = (uint16_t)(tx_id + nb_commit);
>  	if (tx_id > txq->tx_next_rs) {
>  		txq->ci_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
> -			rte_cpu_to_le_64(((uint64_t)IAVF_TX_DESC_CMD_RS) <<
> -					 IAVF_TXD_QW1_CMD_SHIFT);
> +			rte_cpu_to_le_64(((uint64_t)CI_TX_DESC_CMD_RS) << CI_TXD_QW1_CMD_S);
>  		txq->tx_next_rs =
>  			(uint16_t)(txq->tx_next_rs + txq->tx_rs_thresh);
>  	}
> @@ -2275,8 +2256,8 @@ iavf_xmit_fixed_burst_vec_avx512_ctx(void *tx_queue, struct rte_mbuf **tx_pkts,
>  	struct ci_tx_entry_vec *txep;
>  	uint16_t n, nb_commit, nb_mbuf, tx_id;
>  	/* bit2 is reserved and must be set to 1 according to Spec */
> -	uint64_t flags = IAVF_TX_DESC_CMD_EOP | IAVF_TX_DESC_CMD_ICRC;
> -	uint64_t rs = IAVF_TX_DESC_CMD_RS | flags;
> +	uint64_t flags = CI_TX_DESC_CMD_EOP | CI_TX_DESC_CMD_ICRC;
> +	uint64_t rs = CI_TX_DESC_CMD_RS | flags;
> 
>  	if (txq->nb_tx_free < txq->tx_free_thresh)
>  		ci_tx_free_bufs_vec(txq, iavf_tx_desc_done, true);
> @@ -2321,8 +2302,7 @@ iavf_xmit_fixed_burst_vec_avx512_ctx(void *tx_queue, struct rte_mbuf **tx_pkts,
> 
>  	if (tx_id > txq->tx_next_rs) {
>  		txq->ci_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
> -			rte_cpu_to_le_64(((uint64_t)IAVF_TX_DESC_CMD_RS) <<
> -					 IAVF_TXD_QW1_CMD_SHIFT);
> +			rte_cpu_to_le_64(((uint64_t)CI_TX_DESC_CMD_RS) << CI_TXD_QW1_CMD_S);
>  		txq->tx_next_rs =
>  			(uint16_t)(txq->tx_next_rs + txq->tx_rs_thresh);
>  	}
> diff --git a/drivers/net/intel/iavf/iavf_rxtx_vec_common.h b/drivers/net/intel/iavf/iavf_rxtx_vec_common.h
> index 1832b76f89..1538a44892 100644
> --- a/drivers/net/intel/iavf/iavf_rxtx_vec_common.h
> +++ b/drivers/net/intel/iavf/iavf_rxtx_vec_common.h
> @@ -15,8 +15,8 @@ static inline int
>  iavf_tx_desc_done(struct ci_tx_queue *txq, uint16_t idx)
>  {
>  	return (txq->ci_tx_ring[idx].cmd_type_offset_bsz &
> -			rte_cpu_to_le_64(IAVF_TXD_QW1_DTYPE_MASK)) ==
> -			rte_cpu_to_le_64(IAVF_TX_DESC_DTYPE_DESC_DONE);
> +			rte_cpu_to_le_64(CI_TXD_QW1_DTYPE_M)) ==
> +			rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE);
>  }
> 
>  static inline void
> @@ -147,26 +147,26 @@ iavf_txd_enable_offload(__rte_unused struct rte_mbuf *tx_pkt,
>  	/* Set MACLEN */
>  	if (ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK)
>  		td_offset |= (tx_pkt->outer_l2_len >> 1)
> -			<< IAVF_TX_DESC_LENGTH_MACLEN_SHIFT;
> +			<< CI_TX_DESC_LEN_MACLEN_S;
>  	else
>  		td_offset |= (tx_pkt->l2_len >> 1)
> -			<< IAVF_TX_DESC_LENGTH_MACLEN_SHIFT;
> +			<< CI_TX_DESC_LEN_MACLEN_S;
> 
>  	/* Enable L3 checksum offloads */
>  	if (ol_flags & RTE_MBUF_F_TX_IP_CKSUM) {
>  		if (ol_flags & RTE_MBUF_F_TX_IPV4) {
> -			td_cmd |= IAVF_TX_DESC_CMD_IIPT_IPV4_CSUM;
> +			td_cmd |= CI_TX_DESC_CMD_IIPT_IPV4_CSUM;
>  			td_offset |= (tx_pkt->l3_len >> 2) <<
> -				     IAVF_TX_DESC_LENGTH_IPLEN_SHIFT;
> +				     CI_TX_DESC_LEN_IPLEN_S;
>  		}
>  	} else if (ol_flags & RTE_MBUF_F_TX_IPV4) {
> -		td_cmd |= IAVF_TX_DESC_CMD_IIPT_IPV4;
> +		td_cmd |= CI_TX_DESC_CMD_IIPT_IPV4;
>  		td_offset |= (tx_pkt->l3_len >> 2) <<
> -			     IAVF_TX_DESC_LENGTH_IPLEN_SHIFT;
> +			     CI_TX_DESC_LEN_IPLEN_S;
>  	} else if (ol_flags & RTE_MBUF_F_TX_IPV6) {
> -		td_cmd |= IAVF_TX_DESC_CMD_IIPT_IPV6;
> +		td_cmd |= CI_TX_DESC_CMD_IIPT_IPV6;
>  		td_offset |= (tx_pkt->l3_len >> 2) <<
> -			     IAVF_TX_DESC_LENGTH_IPLEN_SHIFT;
> +			     CI_TX_DESC_LEN_IPLEN_S;
>  	}
> 
>  	/* Enable L4 checksum offloads */
> @@ -190,7 +190,7 @@ iavf_txd_enable_offload(__rte_unused struct rte_mbuf *tx_pkt,
>  		break;
>  	}
> 
> -	*txd_hi |= ((uint64_t)td_offset) << IAVF_TXD_QW1_OFFSET_SHIFT;
> +	*txd_hi |= ((uint64_t)td_offset) << CI_TXD_QW1_OFFSET_S;
>  #endif
> 
>  #ifdef IAVF_TX_VLAN_QINQ_OFFLOAD
> @@ -198,17 +198,15 @@ iavf_txd_enable_offload(__rte_unused struct rte_mbuf *tx_pkt,
>  		td_cmd |= IAVF_TX_DESC_CMD_IL2TAG1;
>  		/* vlan_flag specifies outer tag location for QinQ. */
>  		if (vlan_flag & IAVF_TX_FLAGS_VLAN_TAG_LOC_L2TAG1)
> -			*txd_hi |= ((uint64_t)tx_pkt->vlan_tci_outer <<
> -					IAVF_TXD_QW1_L2TAG1_SHIFT);
> +			*txd_hi |= ((uint64_t)tx_pkt->vlan_tci_outer << CI_TXD_QW1_L2TAG1_S);
>  		else
> -			*txd_hi |= ((uint64_t)tx_pkt->vlan_tci <<
> -					IAVF_TXD_QW1_L2TAG1_SHIFT);
> +			*txd_hi |= ((uint64_t)tx_pkt->vlan_tci << CI_TXD_QW1_L2TAG1_S);
>  	} else if (ol_flags & RTE_MBUF_F_TX_VLAN && vlan_flag & IAVF_TX_FLAGS_VLAN_TAG_LOC_L2TAG1) {
> -		td_cmd |= IAVF_TX_DESC_CMD_IL2TAG1;
> -		*txd_hi |= ((uint64_t)tx_pkt->vlan_tci << IAVF_TXD_QW1_L2TAG1_SHIFT);
> +		td_cmd |= CI_TX_DESC_CMD_IL2TAG1;
> +		*txd_hi |= ((uint64_t)tx_pkt->vlan_tci << CI_TXD_QW1_L2TAG1_S);
>  	}
>  #endif
> 
> -	*txd_hi |= ((uint64_t)td_cmd) << IAVF_TXD_QW1_CMD_SHIFT;
> +	*txd_hi |= ((uint64_t)td_cmd) << CI_TXD_QW1_CMD_S;
>  }
>  #endif
> diff --git a/drivers/net/intel/ice/ice_dcf_ethdev.c b/drivers/net/intel/ice/ice_dcf_ethdev.c
> index 5f537b4c12..4ceecc15c6 100644
> --- a/drivers/net/intel/ice/ice_dcf_ethdev.c
> +++ b/drivers/net/intel/ice/ice_dcf_ethdev.c
> @@ -406,7 +406,7 @@ reset_tx_queue(struct ci_tx_queue *txq)
>  	prev = (uint16_t)(txq->nb_tx_desc - 1);
>  	for (i = 0; i < txq->nb_tx_desc; i++) {
>  		txq->ci_tx_ring[i].cmd_type_offset_bsz =
> -			rte_cpu_to_le_64(IAVF_TX_DESC_DTYPE_DESC_DONE);
> +			rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE);
>  		txe[i].mbuf =  NULL;
>  		txe[i].last_id = i;
>  		txe[prev].next_id = i;
> diff --git a/drivers/net/intel/ice/ice_rxtx.c b/drivers/net/intel/ice/ice_rxtx.c
> index 7a33e1e980..52bbf95967 100644
> --- a/drivers/net/intel/ice/ice_rxtx.c
> +++ b/drivers/net/intel/ice/ice_rxtx.c
> @@ -1124,7 +1124,7 @@ ice_reset_tx_queue(struct ci_tx_queue *txq)
>  		volatile struct ci_tx_desc *txd = &txq->ci_tx_ring[i];
> 
>  		txd->cmd_type_offset_bsz =
> -			rte_cpu_to_le_64(ICE_TX_DESC_DTYPE_DESC_DONE);
> +			rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE);
>  		txe[i].mbuf =  NULL;
>  		txe[i].last_id = i;
>  		txe[prev].next_id = i;
> @@ -2556,9 +2556,8 @@ ice_tx_descriptor_status(void *tx_queue, uint16_t offset)
>  	}
> 
>  	status = &txq->ci_tx_ring[desc].cmd_type_offset_bsz;
> -	mask = rte_cpu_to_le_64(ICE_TXD_QW1_DTYPE_M);
> -	expect = rte_cpu_to_le_64(ICE_TX_DESC_DTYPE_DESC_DONE <<
> -				  ICE_TXD_QW1_DTYPE_S);
> +	mask = rte_cpu_to_le_64(CI_TXD_QW1_DTYPE_M);
> +	expect = rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE << CI_TXD_QW1_DTYPE_S);
>  	if ((*status & mask) == expect)
>  		return RTE_ETH_TX_DESC_DONE;
> 
> @@ -2904,7 +2903,7 @@ ice_recv_pkts(void *rx_queue,
> 
>  static inline void
>  ice_parse_tunneling_params(uint64_t ol_flags,
> -			    union ice_tx_offload tx_offload,
> +			    union ci_tx_offload tx_offload,
>  			    uint32_t *cd_tunneling)
>  {
>  	/* EIPT: External (outer) IP header type */
> @@ -2965,58 +2964,58 @@ static inline void
>  ice_txd_enable_checksum(uint64_t ol_flags,
>  			uint32_t *td_cmd,
>  			uint32_t *td_offset,
> -			union ice_tx_offload tx_offload)
> +			union ci_tx_offload tx_offload)
>  {
>  	/* Set MACLEN */
>  	if (!(ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK))
>  		*td_offset |= (tx_offload.l2_len >> 1)
> -			<< ICE_TX_DESC_LEN_MACLEN_S;
> +			<< CI_TX_DESC_LEN_MACLEN_S;
> 
>  	/* Enable L3 checksum offloads */
>  	if (ol_flags & RTE_MBUF_F_TX_IP_CKSUM) {
> -		*td_cmd |= ICE_TX_DESC_CMD_IIPT_IPV4_CSUM;
> +		*td_cmd |= CI_TX_DESC_CMD_IIPT_IPV4_CSUM;
>  		*td_offset |= (tx_offload.l3_len >> 2) <<
> -			ICE_TX_DESC_LEN_IPLEN_S;
> +			CI_TX_DESC_LEN_IPLEN_S;
>  	} else if (ol_flags & RTE_MBUF_F_TX_IPV4) {
> -		*td_cmd |= ICE_TX_DESC_CMD_IIPT_IPV4;
> +		*td_cmd |= CI_TX_DESC_CMD_IIPT_IPV4;
>  		*td_offset |= (tx_offload.l3_len >> 2) <<
> -			ICE_TX_DESC_LEN_IPLEN_S;
> +			CI_TX_DESC_LEN_IPLEN_S;
>  	} else if (ol_flags & RTE_MBUF_F_TX_IPV6) {
> -		*td_cmd |= ICE_TX_DESC_CMD_IIPT_IPV6;
> +		*td_cmd |= CI_TX_DESC_CMD_IIPT_IPV6;
>  		*td_offset |= (tx_offload.l3_len >> 2) <<
> -			ICE_TX_DESC_LEN_IPLEN_S;
> +			CI_TX_DESC_LEN_IPLEN_S;
>  	}
> 
>  	if (ol_flags & RTE_MBUF_F_TX_TCP_SEG) {
> -		*td_cmd |= ICE_TX_DESC_CMD_L4T_EOFT_TCP;
> +		*td_cmd |= CI_TX_DESC_CMD_L4T_EOFT_TCP;
>  		*td_offset |= (tx_offload.l4_len >> 2) <<
> -			      ICE_TX_DESC_LEN_L4_LEN_S;
> +			      CI_TX_DESC_LEN_L4_LEN_S;
>  		return;
>  	}
> 
>  	if (ol_flags & RTE_MBUF_F_TX_UDP_SEG) {
> -		*td_cmd |= ICE_TX_DESC_CMD_L4T_EOFT_UDP;
> +		*td_cmd |= CI_TX_DESC_CMD_L4T_EOFT_UDP;
>  		*td_offset |= (tx_offload.l4_len >> 2) <<
> -			      ICE_TX_DESC_LEN_L4_LEN_S;
> +			      CI_TX_DESC_LEN_L4_LEN_S;
>  		return;
>  	}
> 
>  	/* Enable L4 checksum offloads */
>  	switch (ol_flags & RTE_MBUF_F_TX_L4_MASK) {
>  	case RTE_MBUF_F_TX_TCP_CKSUM:
> -		*td_cmd |= ICE_TX_DESC_CMD_L4T_EOFT_TCP;
> +		*td_cmd |= CI_TX_DESC_CMD_L4T_EOFT_TCP;
>  		*td_offset |= (sizeof(struct rte_tcp_hdr) >> 2) <<
> -			      ICE_TX_DESC_LEN_L4_LEN_S;
> +			      CI_TX_DESC_LEN_L4_LEN_S;
>  		break;
>  	case RTE_MBUF_F_TX_SCTP_CKSUM:
> -		*td_cmd |= ICE_TX_DESC_CMD_L4T_EOFT_SCTP;
> +		*td_cmd |= CI_TX_DESC_CMD_L4T_EOFT_SCTP;
>  		*td_offset |= (sizeof(struct rte_sctp_hdr) >> 2) <<
> -			      ICE_TX_DESC_LEN_L4_LEN_S;
> +			      CI_TX_DESC_LEN_L4_LEN_S;
>  		break;
>  	case RTE_MBUF_F_TX_UDP_CKSUM:
> -		*td_cmd |= ICE_TX_DESC_CMD_L4T_EOFT_UDP;
> +		*td_cmd |= CI_TX_DESC_CMD_L4T_EOFT_UDP;
>  		*td_offset |= (sizeof(struct rte_udp_hdr) >> 2) <<
> -			      ICE_TX_DESC_LEN_L4_LEN_S;
> +			      CI_TX_DESC_LEN_L4_LEN_S;
>  		break;
>  	default:
>  		break;
> @@ -3030,11 +3029,11 @@ ice_build_ctob(uint32_t td_cmd,
>  	       uint16_t size,
>  	       uint32_t td_tag)
>  {
> -	return rte_cpu_to_le_64(ICE_TX_DESC_DTYPE_DATA |
> -				((uint64_t)td_cmd << ICE_TXD_QW1_CMD_S) |
> -				((uint64_t)td_offset << ICE_TXD_QW1_OFFSET_S) |
> -				((uint64_t)size << ICE_TXD_QW1_TX_BUF_SZ_S) |
> -				((uint64_t)td_tag << ICE_TXD_QW1_L2TAG1_S));
> +	return rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DATA |
> +				((uint64_t)td_cmd << CI_TXD_QW1_CMD_S) |
> +				((uint64_t)td_offset << CI_TXD_QW1_OFFSET_S) |
> +				((uint64_t)size << CI_TXD_QW1_TX_BUF_SZ_S) |
> +				((uint64_t)td_tag << CI_TXD_QW1_L2TAG1_S));
>  }
> 
>  /* Check if the context descriptor is needed for TX offloading */
> @@ -3053,7 +3052,7 @@ ice_calc_context_desc(uint64_t flags)
> 
>  /* set ice TSO context descriptor */
>  static inline uint64_t
> -ice_set_tso_ctx(struct rte_mbuf *mbuf, union ice_tx_offload tx_offload)
> +ice_set_tso_ctx(struct rte_mbuf *mbuf, union ci_tx_offload tx_offload)
>  {
>  	uint64_t ctx_desc = 0;
>  	uint32_t cd_cmd, hdr_len, cd_tso_len;
> @@ -3067,18 +3066,15 @@ ice_set_tso_ctx(struct rte_mbuf *mbuf, union ice_tx_offload tx_offload)
>  	hdr_len += (mbuf->ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK) ?
>  		   tx_offload.outer_l2_len + tx_offload.outer_l3_len : 0;
> 
> -	cd_cmd = ICE_TX_CTX_DESC_TSO;
> +	cd_cmd = CI_TX_CTX_DESC_TSO;
>  	cd_tso_len = mbuf->pkt_len - hdr_len;
> -	ctx_desc |= ((uint64_t)cd_cmd << ICE_TXD_CTX_QW1_CMD_S) |
> +	ctx_desc |= ((uint64_t)cd_cmd << CI_TXD_QW1_CMD_S) |
>  		    ((uint64_t)cd_tso_len << ICE_TXD_CTX_QW1_TSO_LEN_S) |
>  		    ((uint64_t)mbuf->tso_segsz << ICE_TXD_CTX_QW1_MSS_S);
> 
>  	return ctx_desc;
>  }
> 
> -/* HW requires that TX buffer size ranges from 1B up to (16K-1)B. */
> -#define ICE_MAX_DATA_PER_TXD \
> -	(ICE_TXD_QW1_TX_BUF_SZ_M >> ICE_TXD_QW1_TX_BUF_SZ_S)
>  /* Calculate the number of TX descriptors needed for each pkt */
>  static inline uint16_t
>  ice_calc_pkt_desc(struct rte_mbuf *tx_pkt)
> @@ -3087,7 +3083,7 @@ ice_calc_pkt_desc(struct rte_mbuf *tx_pkt)
>  	uint16_t count = 0;
> 
>  	while (txd != NULL) {
> -		count += DIV_ROUND_UP(txd->data_len, ICE_MAX_DATA_PER_TXD);
> +		count += DIV_ROUND_UP(txd->data_len, CI_MAX_DATA_PER_TXD);
>  		txd = txd->next;
>  	}
> 
> @@ -3117,7 +3113,7 @@ ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
>  	uint16_t slen;
>  	uint64_t buf_dma_addr;
>  	uint64_t ol_flags;
> -	union ice_tx_offload tx_offload = {0};
> +	union ci_tx_offload tx_offload = {0};
> 
>  	txq = tx_queue;
>  	sw_ring = txq->sw_ring;
> @@ -3185,7 +3181,7 @@ ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
> 
>  		/* Descriptor based VLAN insertion */
>  		if (ol_flags & (RTE_MBUF_F_TX_VLAN | RTE_MBUF_F_TX_QINQ)) {
> -			td_cmd |= ICE_TX_DESC_CMD_IL2TAG1;
> +			td_cmd |= CI_TX_DESC_CMD_IL2TAG1;
>  			td_tag = tx_pkt->vlan_tci;
>  		}
> 
> @@ -3193,7 +3189,7 @@ ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
>  		cd_tunneling_params = 0;
>  		if (ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK) {
>  			td_offset |= (tx_offload.outer_l2_len >> 1)
> -				<< ICE_TX_DESC_LEN_MACLEN_S;
> +				<< CI_TX_DESC_LEN_MACLEN_S;
>  			ice_parse_tunneling_params(ol_flags, tx_offload,
>  						   &cd_tunneling_params);
>  		}
> @@ -3223,8 +3219,8 @@ ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
>  					ice_set_tso_ctx(tx_pkt, tx_offload);
>  			else if (ol_flags & RTE_MBUF_F_TX_IEEE1588_TMST)
>  				cd_type_cmd_tso_mss |=
> -					((uint64_t)ICE_TX_CTX_DESC_TSYN <<
> -					ICE_TXD_CTX_QW1_CMD_S) |
> +					((uint64_t)CI_TX_CTX_DESC_TSYN <<
> +					CI_TXD_QW1_CMD_S) |
>  					 (((uint64_t)txq->ice_vsi->adapter->ptp_tx_index <<
>  					 ICE_TXD_CTX_QW1_TSYN_S) & ICE_TXD_CTX_QW1_TSYN_M);
> 
> @@ -3235,8 +3231,8 @@ ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
>  			if (ol_flags & RTE_MBUF_F_TX_QINQ) {
>  				cd_l2tag2 = tx_pkt->vlan_tci_outer;
>  				cd_type_cmd_tso_mss |=
> -					((uint64_t)ICE_TX_CTX_DESC_IL2TAG2 <<
> -					 ICE_TXD_CTX_QW1_CMD_S);
> +					((uint64_t)CI_TX_CTX_DESC_IL2TAG2 <<
> +					 CI_TXD_QW1_CMD_S);
>  			}
>  			ctx_txd->l2tag2 = rte_cpu_to_le_16(cd_l2tag2);
>  			ctx_txd->qw1 =
> @@ -3261,18 +3257,16 @@ ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
>  			buf_dma_addr = rte_mbuf_data_iova(m_seg);
> 
>  			while ((ol_flags & (RTE_MBUF_F_TX_TCP_SEG | RTE_MBUF_F_TX_UDP_SEG)) &&
> -				unlikely(slen > ICE_MAX_DATA_PER_TXD)) {
> +					unlikely(slen > CI_MAX_DATA_PER_TXD)) {
>  				txd->buffer_addr = rte_cpu_to_le_64(buf_dma_addr);
> -				txd->cmd_type_offset_bsz =
> -				rte_cpu_to_le_64(ICE_TX_DESC_DTYPE_DATA |
> -				((uint64_t)td_cmd << ICE_TXD_QW1_CMD_S) |
> -				((uint64_t)td_offset << ICE_TXD_QW1_OFFSET_S) |
> -				((uint64_t)ICE_MAX_DATA_PER_TXD <<
> -				 ICE_TXD_QW1_TX_BUF_SZ_S) |
> -				((uint64_t)td_tag << ICE_TXD_QW1_L2TAG1_S));
> +				txd->cmd_type_offset_bsz = rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DATA |
> +					((uint64_t)td_cmd << CI_TXD_QW1_CMD_S) |
> +					((uint64_t)td_offset << CI_TXD_QW1_OFFSET_S) |
> +					((uint64_t)CI_MAX_DATA_PER_TXD << CI_TXD_QW1_TX_BUF_SZ_S) |
> +					((uint64_t)td_tag << CI_TXD_QW1_L2TAG1_S));
> 
> -				buf_dma_addr += ICE_MAX_DATA_PER_TXD;
> -				slen -= ICE_MAX_DATA_PER_TXD;
> +				buf_dma_addr += CI_MAX_DATA_PER_TXD;
> +				slen -= CI_MAX_DATA_PER_TXD;
> 
>  				txe->last_id = tx_last;
>  				tx_id = txe->next_id;
> @@ -3282,12 +3276,11 @@ ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
>  			}
> 
>  			txd->buffer_addr = rte_cpu_to_le_64(buf_dma_addr);
> -			txd->cmd_type_offset_bsz =
> -				rte_cpu_to_le_64(ICE_TX_DESC_DTYPE_DATA |
> -				((uint64_t)td_cmd << ICE_TXD_QW1_CMD_S) |
> -				((uint64_t)td_offset << ICE_TXD_QW1_OFFSET_S) |
> -				((uint64_t)slen << ICE_TXD_QW1_TX_BUF_SZ_S) |
> -				((uint64_t)td_tag << ICE_TXD_QW1_L2TAG1_S));
> +			txd->cmd_type_offset_bsz = rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DATA |
> +				((uint64_t)td_cmd << CI_TXD_QW1_CMD_S) |
> +				((uint64_t)td_offset << CI_TXD_QW1_OFFSET_S) |
> +				((uint64_t)slen << CI_TXD_QW1_TX_BUF_SZ_S) |
> +				((uint64_t)td_tag << CI_TXD_QW1_L2TAG1_S));
> 
>  			txe->last_id = tx_last;
>  			tx_id = txe->next_id;
> @@ -3296,7 +3289,7 @@ ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
>  		} while (m_seg);
> 
>  		/* fill the last descriptor with End of Packet (EOP) bit */
> -		td_cmd |= ICE_TX_DESC_CMD_EOP;
> +		td_cmd |= CI_TX_DESC_CMD_EOP;
>  		txq->nb_tx_used = (uint16_t)(txq->nb_tx_used + nb_used);
>  		txq->nb_tx_free = (uint16_t)(txq->nb_tx_free - nb_used);
> 
> @@ -3307,14 +3300,13 @@ ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
>  				   "%4u (port=%d queue=%d)",
>  				   tx_last, txq->port_id, txq->queue_id);
> 
> -			td_cmd |= ICE_TX_DESC_CMD_RS;
> +			td_cmd |= CI_TX_DESC_CMD_RS;
> 
>  			/* Update txq RS bit counters */
>  			txq->nb_tx_used = 0;
>  		}
>  		txd->cmd_type_offset_bsz |=
> -			rte_cpu_to_le_64(((uint64_t)td_cmd) <<
> -					 ICE_TXD_QW1_CMD_S);
> +			rte_cpu_to_le_64(((uint64_t)td_cmd) << CI_TXD_QW1_CMD_S);
> 
>  		if (txq->tsq != NULL && txq->tsq->ts_flag > 0) {
>  			uint64_t txtime = *RTE_MBUF_DYNFIELD(tx_pkt,
> @@ -3361,8 +3353,8 @@ ice_tx_free_bufs(struct ci_tx_queue *txq)
>  	uint16_t i;
> 
>  	if ((txq->ci_tx_ring[txq->tx_next_dd].cmd_type_offset_bsz &
> -	     rte_cpu_to_le_64(ICE_TXD_QW1_DTYPE_M)) !=
> -	    rte_cpu_to_le_64(ICE_TX_DESC_DTYPE_DESC_DONE))
> +	     rte_cpu_to_le_64(CI_TXD_QW1_DTYPE_M)) !=
> +	    rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE))
>  		return 0;
> 
>  	txep = &txq->sw_ring[txq->tx_next_dd - (txq->tx_rs_thresh - 1)];
> @@ -3598,8 +3590,7 @@ tx_xmit_pkts(struct ci_tx_queue *txq,
>  		n = (uint16_t)(txq->nb_tx_desc - txq->tx_tail);
>  		ice_tx_fill_hw_ring(txq, tx_pkts, n);
>  		txr[txq->tx_next_rs].cmd_type_offset_bsz |=
> -			rte_cpu_to_le_64(((uint64_t)ICE_TX_DESC_CMD_RS) <<
> -					 ICE_TXD_QW1_CMD_S);
> +			rte_cpu_to_le_64(((uint64_t)CI_TX_DESC_CMD_RS) << CI_TXD_QW1_CMD_S);
>  		txq->tx_next_rs = (uint16_t)(txq->tx_rs_thresh - 1);
>  		txq->tx_tail = 0;
>  	}
> @@ -3611,8 +3602,7 @@ tx_xmit_pkts(struct ci_tx_queue *txq,
>  	/* Determine if RS bit needs to be set */
>  	if (txq->tx_tail > txq->tx_next_rs) {
>  		txr[txq->tx_next_rs].cmd_type_offset_bsz |=
> -			rte_cpu_to_le_64(((uint64_t)ICE_TX_DESC_CMD_RS) <<
> -					 ICE_TXD_QW1_CMD_S);
> +			rte_cpu_to_le_64(((uint64_t)CI_TX_DESC_CMD_RS) << CI_TXD_QW1_CMD_S);
>  		txq->tx_next_rs =
>  			(uint16_t)(txq->tx_next_rs + txq->tx_rs_thresh);
>  		if (txq->tx_next_rs >= txq->nb_tx_desc)
> @@ -4843,9 +4833,9 @@ ice_fdir_programming(struct ice_pf *pf, struct ice_fltr_desc *fdir_desc)
> 
>  	txdp = &txq->ci_tx_ring[txq->tx_tail + 1];
>  	txdp->buffer_addr = rte_cpu_to_le_64(pf->fdir.dma_addr);
> -	td_cmd = ICE_TX_DESC_CMD_EOP |
> -		ICE_TX_DESC_CMD_RS  |
> -		ICE_TX_DESC_CMD_DUMMY;
> +	td_cmd = CI_TX_DESC_CMD_EOP |
> +		CI_TX_DESC_CMD_RS  |
> +		CI_TX_DESC_CMD_DUMMY;
> 
>  	txdp->cmd_type_offset_bsz =
>  		ice_build_ctob(td_cmd, 0, ICE_FDIR_PKT_LEN, 0);
> @@ -4856,9 +4846,8 @@ ice_fdir_programming(struct ice_pf *pf, struct ice_fltr_desc *fdir_desc)
>  	/* Update the tx tail register */
>  	ICE_PCI_REG_WRITE(txq->qtx_tail, txq->tx_tail);
>  	for (i = 0; i < ICE_FDIR_MAX_WAIT_US; i++) {
> -		if ((txdp->cmd_type_offset_bsz &
> -		     rte_cpu_to_le_64(ICE_TXD_QW1_DTYPE_M)) ==
> -		    rte_cpu_to_le_64(ICE_TX_DESC_DTYPE_DESC_DONE))
> +		if ((txdp->cmd_type_offset_bsz & rte_cpu_to_le_64(CI_TXD_QW1_DTYPE_M)) ==
> +		    rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE))
>  			break;
>  		rte_delay_us(1);
>  	}
> diff --git a/drivers/net/intel/ice/ice_rxtx.h b/drivers/net/intel/ice/ice_rxtx.h
> index c524e9f756..cd5fa93d1c 100644
> --- a/drivers/net/intel/ice/ice_rxtx.h
> +++ b/drivers/net/intel/ice/ice_rxtx.h
> @@ -46,7 +46,7 @@
> 
>  #define ICE_SUPPORT_CHAIN_NUM 5
> 
> -#define ICE_TD_CMD                      ICE_TX_DESC_CMD_EOP
> +#define ICE_TD_CMD                      CI_TX_DESC_CMD_EOP
> 
>  #define ICE_VPMD_RX_BURST            CI_VPMD_RX_BURST
>  #define ICE_VPMD_TX_BURST            32
> @@ -169,19 +169,6 @@ struct ice_txtime {
>  	const struct rte_memzone *ts_mz;
>  };
> 
> -/* Offload features */
> -union ice_tx_offload {
> -	uint64_t data;
> -	struct {
> -		uint64_t l2_len:7; /* L2 (MAC) Header Length. */
> -		uint64_t l3_len:9; /* L3 (IP) Header Length. */
> -		uint64_t l4_len:8; /* L4 Header Length. */
> -		uint64_t tso_segsz:16; /* TCP TSO segment size */
> -		uint64_t outer_l2_len:8; /* outer L2 Header Length */
> -		uint64_t outer_l3_len:16; /* outer L3 Header Length */
> -	};
> -};
> -
>  /* Rx Flex Descriptor for Comms Package Profile
>   * RxDID Profile ID 22 (swap Hash and FlowID)
>   * Flex-field 0: Flow ID lower 16-bits
> diff --git a/drivers/net/intel/ice/ice_rxtx_vec_avx2.c b/drivers/net/intel/ice/ice_rxtx_vec_avx2.c
> index 0a1df0b2f6..2922671158 100644
> --- a/drivers/net/intel/ice/ice_rxtx_vec_avx2.c
> +++ b/drivers/net/intel/ice/ice_rxtx_vec_avx2.c
> @@ -777,10 +777,9 @@ static __rte_always_inline void
>  ice_vtx1(volatile struct ci_tx_desc *txdp,
>  	 struct rte_mbuf *pkt, uint64_t flags, bool offload)
>  {
> -	uint64_t high_qw =
> -		(ICE_TX_DESC_DTYPE_DATA |
> -		 ((uint64_t)flags  << ICE_TXD_QW1_CMD_S) |
> -		 ((uint64_t)pkt->data_len << ICE_TXD_QW1_TX_BUF_SZ_S));
> +	uint64_t high_qw = (CI_TX_DESC_DTYPE_DATA |
> +		 ((uint64_t)flags << CI_TXD_QW1_CMD_S) |
> +		 ((uint64_t)pkt->data_len << CI_TXD_QW1_TX_BUF_SZ_S));
>  	if (offload)
>  		ice_txd_enable_offload(pkt, &high_qw);
> 
> @@ -792,8 +791,7 @@ static __rte_always_inline void
>  ice_vtx(volatile struct ci_tx_desc *txdp,
>  	struct rte_mbuf **pkt, uint16_t nb_pkts,  uint64_t flags, bool offload)
>  {
> -	const uint64_t hi_qw_tmpl = (ICE_TX_DESC_DTYPE_DATA |
> -			((uint64_t)flags  << ICE_TXD_QW1_CMD_S));
> +	const uint64_t hi_qw_tmpl = (CI_TX_DESC_DTYPE_DATA | (flags << CI_TXD_QW1_CMD_S));
> 
>  	/* if unaligned on 32-bit boundary, do one to align */
>  	if (((uintptr_t)txdp & 0x1F) != 0 && nb_pkts != 0) {
> @@ -801,30 +799,22 @@ ice_vtx(volatile struct ci_tx_desc *txdp,
>  		nb_pkts--, txdp++, pkt++;
>  	}
> 
> -	/* do two at a time while possible, in bursts */
> +	/* do four at a time while possible, in bursts */
>  	for (; nb_pkts > 3; txdp += 4, pkt += 4, nb_pkts -= 4) {
> -		uint64_t hi_qw3 =
> -			hi_qw_tmpl |
> -			((uint64_t)pkt[3]->data_len <<
> -			 ICE_TXD_QW1_TX_BUF_SZ_S);
> +		uint64_t hi_qw3 = hi_qw_tmpl |
> +			((uint64_t)pkt[3]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
>  		if (offload)
>  			ice_txd_enable_offload(pkt[3], &hi_qw3);
> -		uint64_t hi_qw2 =
> -			hi_qw_tmpl |
> -			((uint64_t)pkt[2]->data_len <<
> -			 ICE_TXD_QW1_TX_BUF_SZ_S);
> +		uint64_t hi_qw2 = hi_qw_tmpl |
> +			((uint64_t)pkt[2]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
>  		if (offload)
>  			ice_txd_enable_offload(pkt[2], &hi_qw2);
> -		uint64_t hi_qw1 =
> -			hi_qw_tmpl |
> -			((uint64_t)pkt[1]->data_len <<
> -			 ICE_TXD_QW1_TX_BUF_SZ_S);
> +		uint64_t hi_qw1 = hi_qw_tmpl |
> +			((uint64_t)pkt[1]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
>  		if (offload)
>  			ice_txd_enable_offload(pkt[1], &hi_qw1);
> -		uint64_t hi_qw0 =
> -			hi_qw_tmpl |
> -			((uint64_t)pkt[0]->data_len <<
> -			 ICE_TXD_QW1_TX_BUF_SZ_S);
> +		uint64_t hi_qw0 = hi_qw_tmpl |
> +			((uint64_t)pkt[0]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
>  		if (offload)
>  			ice_txd_enable_offload(pkt[0], &hi_qw0);
> 
> @@ -856,7 +846,7 @@ ice_xmit_fixed_burst_vec_avx2(void *tx_queue, struct rte_mbuf **tx_pkts,
>  	struct ci_tx_entry_vec *txep;
>  	uint16_t n, nb_commit, tx_id;
>  	uint64_t flags = ICE_TD_CMD;
> -	uint64_t rs = ICE_TX_DESC_CMD_RS | ICE_TD_CMD;
> +	uint64_t rs = CI_TX_DESC_CMD_RS | ICE_TD_CMD;
> 
>  	/* cross rx_thresh boundary is not allowed */
>  	nb_pkts = RTE_MIN(nb_pkts, txq->tx_rs_thresh);
> @@ -901,8 +891,7 @@ ice_xmit_fixed_burst_vec_avx2(void *tx_queue, struct rte_mbuf **tx_pkts,
>  	tx_id = (uint16_t)(tx_id + nb_commit);
>  	if (tx_id > txq->tx_next_rs) {
>  		txq->ci_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
> -			rte_cpu_to_le_64(((uint64_t)ICE_TX_DESC_CMD_RS) <<
> -					 ICE_TXD_QW1_CMD_S);
> +			rte_cpu_to_le_64(((uint64_t)CI_TX_DESC_CMD_RS) << CI_TXD_QW1_CMD_S);
>  		txq->tx_next_rs =
>  			(uint16_t)(txq->tx_next_rs + txq->tx_rs_thresh);
>  	}
> diff --git a/drivers/net/intel/ice/ice_rxtx_vec_avx512.c b/drivers/net/intel/ice/ice_rxtx_vec_avx512.c
> index d42f41461f..e64b6e227b 100644
> --- a/drivers/net/intel/ice/ice_rxtx_vec_avx512.c
> +++ b/drivers/net/intel/ice/ice_rxtx_vec_avx512.c
> @@ -850,10 +850,9 @@ static __rte_always_inline void
>  ice_vtx1(volatile struct ci_tx_desc *txdp,
>  	 struct rte_mbuf *pkt, uint64_t flags, bool do_offload)
>  {
> -	uint64_t high_qw =
> -		(ICE_TX_DESC_DTYPE_DATA |
> -		 ((uint64_t)flags  << ICE_TXD_QW1_CMD_S) |
> -		 ((uint64_t)pkt->data_len << ICE_TXD_QW1_TX_BUF_SZ_S));
> +	uint64_t high_qw = (CI_TX_DESC_DTYPE_DATA |
> +		 ((uint64_t)flags << CI_TXD_QW1_CMD_S) |
> +		 ((uint64_t)pkt->data_len << CI_TXD_QW1_TX_BUF_SZ_S));
> 
>  	if (do_offload)
>  		ice_txd_enable_offload(pkt, &high_qw);
> @@ -866,32 +865,23 @@ static __rte_always_inline void
>  ice_vtx(volatile struct ci_tx_desc *txdp, struct rte_mbuf **pkt,
>  	uint16_t nb_pkts,  uint64_t flags, bool do_offload)
>  {
> -	const uint64_t hi_qw_tmpl = (ICE_TX_DESC_DTYPE_DATA |
> -			((uint64_t)flags  << ICE_TXD_QW1_CMD_S));
> +	const uint64_t hi_qw_tmpl = (CI_TX_DESC_DTYPE_DATA | (flags << CI_TXD_QW1_CMD_S));
> 
>  	for (; nb_pkts > 3; txdp += 4, pkt += 4, nb_pkts -= 4) {
> -		uint64_t hi_qw3 =
> -			hi_qw_tmpl |
> -			((uint64_t)pkt[3]->data_len <<
> -			 ICE_TXD_QW1_TX_BUF_SZ_S);
> +		uint64_t hi_qw3 = hi_qw_tmpl |
> +			((uint64_t)pkt[3]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
>  		if (do_offload)
>  			ice_txd_enable_offload(pkt[3], &hi_qw3);
> -		uint64_t hi_qw2 =
> -			hi_qw_tmpl |
> -			((uint64_t)pkt[2]->data_len <<
> -			 ICE_TXD_QW1_TX_BUF_SZ_S);
> +		uint64_t hi_qw2 = hi_qw_tmpl |
> +			((uint64_t)pkt[2]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
>  		if (do_offload)
>  			ice_txd_enable_offload(pkt[2], &hi_qw2);
> -		uint64_t hi_qw1 =
> -			hi_qw_tmpl |
> -			((uint64_t)pkt[1]->data_len <<
> -			 ICE_TXD_QW1_TX_BUF_SZ_S);
> +		uint64_t hi_qw1 = hi_qw_tmpl |
> +			((uint64_t)pkt[1]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
>  		if (do_offload)
>  			ice_txd_enable_offload(pkt[1], &hi_qw1);
> -		uint64_t hi_qw0 =
> -			hi_qw_tmpl |
> -			((uint64_t)pkt[0]->data_len <<
> -			 ICE_TXD_QW1_TX_BUF_SZ_S);
> +		uint64_t hi_qw0 = hi_qw_tmpl |
> +			((uint64_t)pkt[0]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
>  		if (do_offload)
>  			ice_txd_enable_offload(pkt[0], &hi_qw0);
> 
> @@ -920,7 +910,7 @@ ice_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pkts,
>  	struct ci_tx_entry_vec *txep;
>  	uint16_t n, nb_commit, tx_id;
>  	uint64_t flags = ICE_TD_CMD;
> -	uint64_t rs = ICE_TX_DESC_CMD_RS | ICE_TD_CMD;
> +	uint64_t rs = CI_TX_DESC_CMD_RS | ICE_TD_CMD;
> 
>  	/* cross rx_thresh boundary is not allowed */
>  	nb_pkts = RTE_MIN(nb_pkts, txq->tx_rs_thresh);
> @@ -966,8 +956,7 @@ ice_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pkts,
>  	tx_id = (uint16_t)(tx_id + nb_commit);
>  	if (tx_id > txq->tx_next_rs) {
>  		txq->ci_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
> -			rte_cpu_to_le_64(((uint64_t)ICE_TX_DESC_CMD_RS) <<
> -					 ICE_TXD_QW1_CMD_S);
> +			rte_cpu_to_le_64(((uint64_t)CI_TX_DESC_CMD_RS) << CI_TXD_QW1_CMD_S);
>  		txq->tx_next_rs =
>  			(uint16_t)(txq->tx_next_rs + txq->tx_rs_thresh);
>  	}
> diff --git a/drivers/net/intel/ice/ice_rxtx_vec_common.h b/drivers/net/intel/ice/ice_rxtx_vec_common.h
> index 8ba591e403..1d83a087cc 100644
> --- a/drivers/net/intel/ice/ice_rxtx_vec_common.h
> +++ b/drivers/net/intel/ice/ice_rxtx_vec_common.h
> @@ -12,8 +12,8 @@ static inline int
>  ice_tx_desc_done(struct ci_tx_queue *txq, uint16_t idx)
>  {
>  	return (txq->ci_tx_ring[idx].cmd_type_offset_bsz &
> -			rte_cpu_to_le_64(ICE_TXD_QW1_DTYPE_M)) ==
> -			rte_cpu_to_le_64(ICE_TX_DESC_DTYPE_DESC_DONE);
> +			rte_cpu_to_le_64(CI_TXD_QW1_DTYPE_M)) ==
> +			rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE);
>  }
> 
>  static inline void
> @@ -124,53 +124,52 @@ ice_txd_enable_offload(struct rte_mbuf *tx_pkt,
>  	/* Tx Checksum Offload */
>  	/* SET MACLEN */
>  	td_offset |= (tx_pkt->l2_len >> 1) <<
> -		ICE_TX_DESC_LEN_MACLEN_S;
> +		CI_TX_DESC_LEN_MACLEN_S;
> 
>  	/* Enable L3 checksum offload */
>  	if (ol_flags & RTE_MBUF_F_TX_IP_CKSUM) {
> -		td_cmd |= ICE_TX_DESC_CMD_IIPT_IPV4_CSUM;
> +		td_cmd |= CI_TX_DESC_CMD_IIPT_IPV4_CSUM;
>  		td_offset |= (tx_pkt->l3_len >> 2) <<
> -			ICE_TX_DESC_LEN_IPLEN_S;
> +			CI_TX_DESC_LEN_IPLEN_S;
>  	} else if (ol_flags & RTE_MBUF_F_TX_IPV4) {
> -		td_cmd |= ICE_TX_DESC_CMD_IIPT_IPV4;
> +		td_cmd |= CI_TX_DESC_CMD_IIPT_IPV4;
>  		td_offset |= (tx_pkt->l3_len >> 2) <<
> -			ICE_TX_DESC_LEN_IPLEN_S;
> +			CI_TX_DESC_LEN_IPLEN_S;
>  	} else if (ol_flags & RTE_MBUF_F_TX_IPV6) {
> -		td_cmd |= ICE_TX_DESC_CMD_IIPT_IPV6;
> +		td_cmd |= CI_TX_DESC_CMD_IIPT_IPV6;
>  		td_offset |= (tx_pkt->l3_len >> 2) <<
> -			ICE_TX_DESC_LEN_IPLEN_S;
> +			CI_TX_DESC_LEN_IPLEN_S;
>  	}
> 
>  	/* Enable L4 checksum offloads */
>  	switch (ol_flags & RTE_MBUF_F_TX_L4_MASK) {
>  	case RTE_MBUF_F_TX_TCP_CKSUM:
> -		td_cmd |= ICE_TX_DESC_CMD_L4T_EOFT_TCP;
> +		td_cmd |= CI_TX_DESC_CMD_L4T_EOFT_TCP;
>  		td_offset |= (sizeof(struct rte_tcp_hdr) >> 2) <<
> -			ICE_TX_DESC_LEN_L4_LEN_S;
> +			CI_TX_DESC_LEN_L4_LEN_S;
>  		break;
>  	case RTE_MBUF_F_TX_SCTP_CKSUM:
> -		td_cmd |= ICE_TX_DESC_CMD_L4T_EOFT_SCTP;
> +		td_cmd |= CI_TX_DESC_CMD_L4T_EOFT_SCTP;
>  		td_offset |= (sizeof(struct rte_sctp_hdr) >> 2) <<
> -			ICE_TX_DESC_LEN_L4_LEN_S;
> +			CI_TX_DESC_LEN_L4_LEN_S;
>  		break;
>  	case RTE_MBUF_F_TX_UDP_CKSUM:
> -		td_cmd |= ICE_TX_DESC_CMD_L4T_EOFT_UDP;
> +		td_cmd |= CI_TX_DESC_CMD_L4T_EOFT_UDP;
>  		td_offset |= (sizeof(struct rte_udp_hdr) >> 2) <<
> -			ICE_TX_DESC_LEN_L4_LEN_S;
> +			CI_TX_DESC_LEN_L4_LEN_S;
>  		break;
>  	default:
>  		break;
>  	}
> 
> -	*txd_hi |= ((uint64_t)td_offset) << ICE_TXD_QW1_OFFSET_S;
> +	*txd_hi |= ((uint64_t)td_offset) << CI_TXD_QW1_OFFSET_S;
> 
> -	/* Tx VLAN insertion Offload */
> +	/* Tx VLAN/QINQ insertion Offload */
>  	if (ol_flags & RTE_MBUF_F_TX_VLAN) {
> -		td_cmd |= ICE_TX_DESC_CMD_IL2TAG1;
> -		*txd_hi |= ((uint64_t)tx_pkt->vlan_tci <<
> -				ICE_TXD_QW1_L2TAG1_S);
> +		td_cmd |= CI_TX_DESC_CMD_IL2TAG1;
> +		*txd_hi |= ((uint64_t)tx_pkt->vlan_tci << CI_TXD_QW1_L2TAG1_S);
>  	}
> 
> -	*txd_hi |= ((uint64_t)td_cmd) << ICE_TXD_QW1_CMD_S;
> +	*txd_hi |= ((uint64_t)td_cmd) << CI_TXD_QW1_CMD_S;
>  }
>  #endif
> diff --git a/drivers/net/intel/idpf/idpf_common_rxtx.c b/drivers/net/intel/idpf/idpf_common_rxtx.c
> index 23666539ab..587871b54a 100644
> --- a/drivers/net/intel/idpf/idpf_common_rxtx.c
> +++ b/drivers/net/intel/idpf/idpf_common_rxtx.c
> @@ -271,7 +271,7 @@ idpf_qc_single_tx_queue_reset(struct ci_tx_queue *txq)
>  	prev = (uint16_t)(txq->nb_tx_desc - 1);
>  	for (i = 0; i < txq->nb_tx_desc; i++) {
>  		txq->ci_tx_ring[i].cmd_type_offset_bsz =
> -			rte_cpu_to_le_64(IDPF_TX_DESC_DTYPE_DESC_DONE);
> +			rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE);
>  		txe[i].mbuf =  NULL;
>  		txe[i].last_id = i;
>  		txe[prev].next_id = i;
> @@ -849,7 +849,7 @@ idpf_calc_context_desc(uint64_t flags)
>   */
>  static inline void
>  idpf_set_splitq_tso_ctx(struct rte_mbuf *mbuf,
> -			union idpf_tx_offload tx_offload,
> +			union ci_tx_offload tx_offload,
>  			volatile union idpf_flex_tx_ctx_desc *ctx_desc)
>  {
>  	uint16_t cmd_dtype;
> @@ -887,7 +887,7 @@ idpf_dp_splitq_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
>  	volatile struct idpf_flex_tx_sched_desc *txr;
>  	volatile struct idpf_flex_tx_sched_desc *txd;
>  	struct ci_tx_entry *sw_ring;
> -	union idpf_tx_offload tx_offload = {0};
> +	union ci_tx_offload tx_offload = {0};
>  	struct ci_tx_entry *txe, *txn;
>  	uint16_t nb_used, tx_id, sw_id;
>  	struct rte_mbuf *tx_pkt;
> @@ -1334,7 +1334,7 @@ idpf_dp_singleq_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
>  {
>  	volatile struct ci_tx_desc *txd;
>  	volatile struct ci_tx_desc *txr;
> -	union idpf_tx_offload tx_offload = {0};
> +	union ci_tx_offload tx_offload = {0};
>  	struct ci_tx_entry *txe, *txn;
>  	struct ci_tx_entry *sw_ring;
>  	struct ci_tx_queue *txq;
> @@ -1452,10 +1452,10 @@ idpf_dp_singleq_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
>  			slen = m_seg->data_len;
>  			buf_dma_addr = rte_mbuf_data_iova(m_seg);
>  			txd->buffer_addr = rte_cpu_to_le_64(buf_dma_addr);
> -			txd->cmd_type_offset_bsz = rte_cpu_to_le_64(IDPF_TX_DESC_DTYPE_DATA |
> -				((uint64_t)td_cmd  << IDPF_TXD_QW1_CMD_S) |
> -				((uint64_t)td_offset << IDPF_TXD_QW1_OFFSET_S) |
> -				((uint64_t)slen << IDPF_TXD_QW1_TX_BUF_SZ_S));
> +			txd->cmd_type_offset_bsz = rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DATA |
> +				((uint64_t)td_cmd  << CI_TXD_QW1_CMD_S) |
> +				((uint64_t)td_offset << CI_TXD_QW1_OFFSET_S) |
> +				((uint64_t)slen << CI_TXD_QW1_TX_BUF_SZ_S));
> 
>  			txe->last_id = tx_last;
>  			tx_id = txe->next_id;
> @@ -1464,7 +1464,7 @@ idpf_dp_singleq_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
>  		} while (m_seg);
> 
>  		/* The last packet data descriptor needs End Of Packet (EOP) */
> -		td_cmd |= IDPF_TX_DESC_CMD_EOP;
> +		td_cmd |= CI_TX_DESC_CMD_EOP;
>  		txq->nb_tx_used = (uint16_t)(txq->nb_tx_used + nb_used);
>  		txq->nb_tx_free = (uint16_t)(txq->nb_tx_free - nb_used);
> 
> @@ -1473,13 +1473,13 @@ idpf_dp_singleq_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
>  			       "%4u (port=%d queue=%d)",
>  			       tx_last, txq->port_id, txq->queue_id);
> 
> -			td_cmd |= IDPF_TX_DESC_CMD_RS;
> +			td_cmd |= CI_TX_DESC_CMD_RS;
> 
>  			/* Update txq RS bit counters */
>  			txq->nb_tx_used = 0;
>  		}
> 
> -		txd->cmd_type_offset_bsz |= rte_cpu_to_le_16(td_cmd << IDPF_TXD_QW1_CMD_S);
> +		txd->cmd_type_offset_bsz |= rte_cpu_to_le_16(td_cmd << CI_TXD_QW1_CMD_S);
>  	}
> 
>  end_of_tx:
> diff --git a/drivers/net/intel/idpf/idpf_common_rxtx.h b/drivers/net/intel/idpf/idpf_common_rxtx.h
> index 2f2fa153b2..b88a87402d 100644
> --- a/drivers/net/intel/idpf/idpf_common_rxtx.h
> +++ b/drivers/net/intel/idpf/idpf_common_rxtx.h
> @@ -169,18 +169,6 @@ struct idpf_rx_queue {
>  	uint32_t hw_register_set;
>  };
> 
> -/* Offload features */
> -union idpf_tx_offload {
> -	uint64_t data;
> -	struct {
> -		uint64_t l2_len:7; /* L2 (MAC) Header Length. */
> -		uint64_t l3_len:9; /* L3 (IP) Header Length. */
> -		uint64_t l4_len:8; /* L4 Header Length. */
> -		uint64_t tso_segsz:16; /* TCP TSO segment size */
> -		/* uint64_t unused : 24; */
> -	};
> -};
> -
>  union idpf_tx_desc {
>  	struct ci_tx_desc *tx_ring;
>  	struct idpf_flex_tx_sched_desc *desc_ring;
> diff --git a/drivers/net/intel/idpf/idpf_common_rxtx_avx2.c b/drivers/net/intel/idpf/idpf_common_rxtx_avx2.c
> index 04efee3722..411b171b97 100644
> --- a/drivers/net/intel/idpf/idpf_common_rxtx_avx2.c
> +++ b/drivers/net/intel/idpf/idpf_common_rxtx_avx2.c
> @@ -486,10 +486,9 @@ static inline void
>  idpf_singleq_vtx1(volatile struct ci_tx_desc *txdp,
>  		  struct rte_mbuf *pkt, uint64_t flags)
>  {
> -	uint64_t high_qw =
> -		(IDPF_TX_DESC_DTYPE_DATA |
> -		 ((uint64_t)flags  << IDPF_TXD_QW1_CMD_S) |
> -		 ((uint64_t)pkt->data_len << IDPF_TXD_QW1_TX_BUF_SZ_S));
> +	uint64_t high_qw = (CI_TX_DESC_DTYPE_DATA |
> +		 ((uint64_t)flags << CI_TXD_QW1_CMD_S) |
> +		 ((uint64_t)pkt->data_len << CI_TXD_QW1_TX_BUF_SZ_S));
> 
>  	__m128i descriptor = _mm_set_epi64x(high_qw,
>  				pkt->buf_iova + pkt->data_off);
> @@ -500,8 +499,7 @@ static inline void
>  idpf_singleq_vtx(volatile struct ci_tx_desc *txdp,
>  		 struct rte_mbuf **pkt, uint16_t nb_pkts,  uint64_t flags)
>  {
> -	const uint64_t hi_qw_tmpl = (IDPF_TX_DESC_DTYPE_DATA |
> -			((uint64_t)flags  << IDPF_TXD_QW1_CMD_S));
> +	const uint64_t hi_qw_tmpl = (CI_TX_DESC_DTYPE_DATA | (flags << CI_TXD_QW1_CMD_S));
> 
>  	/* if unaligned on 32-bit boundary, do one to align */
>  	if (((uintptr_t)txdp & 0x1F) != 0 && nb_pkts != 0) {
> @@ -511,22 +509,14 @@ idpf_singleq_vtx(volatile struct ci_tx_desc *txdp,
> 
>  	/* do two at a time while possible, in bursts */
>  	for (; nb_pkts > 3; txdp += 4, pkt += 4, nb_pkts -= 4) {
> -		uint64_t hi_qw3 =
> -			hi_qw_tmpl |
> -			((uint64_t)pkt[3]->data_len <<
> -			 IDPF_TXD_QW1_TX_BUF_SZ_S);
> -		uint64_t hi_qw2 =
> -			hi_qw_tmpl |
> -			((uint64_t)pkt[2]->data_len <<
> -			 IDPF_TXD_QW1_TX_BUF_SZ_S);
> -		uint64_t hi_qw1 =
> -			hi_qw_tmpl |
> -			((uint64_t)pkt[1]->data_len <<
> -			 IDPF_TXD_QW1_TX_BUF_SZ_S);
> -		uint64_t hi_qw0 =
> -			hi_qw_tmpl |
> -			((uint64_t)pkt[0]->data_len <<
> -			 IDPF_TXD_QW1_TX_BUF_SZ_S);
> +		uint64_t hi_qw3 = hi_qw_tmpl |
> +			((uint64_t)pkt[3]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
> +		uint64_t hi_qw2 = hi_qw_tmpl |
> +			((uint64_t)pkt[2]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
> +		uint64_t hi_qw1 = hi_qw_tmpl |
> +			((uint64_t)pkt[1]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
> +		uint64_t hi_qw0 = hi_qw_tmpl |
> +			((uint64_t)pkt[0]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
> 
>  		__m256i desc2_3 =
>  			_mm256_set_epi64x
> @@ -559,8 +549,8 @@ idpf_singleq_xmit_fixed_burst_vec_avx2(void *tx_queue, struct rte_mbuf **tx_pkts
>  	volatile struct ci_tx_desc *txdp;
>  	struct ci_tx_entry_vec *txep;
>  	uint16_t n, nb_commit, tx_id;
> -	uint64_t flags = IDPF_TX_DESC_CMD_EOP;
> -	uint64_t rs = IDPF_TX_DESC_CMD_RS | flags;
> +	uint64_t flags = CI_TX_DESC_CMD_EOP;
> +	uint64_t rs = CI_TX_DESC_CMD_RS | flags;
> 
>  	/* cross rx_thresh boundary is not allowed */
>  	nb_pkts = RTE_MIN(nb_pkts, txq->tx_rs_thresh);
> @@ -605,8 +595,7 @@ idpf_singleq_xmit_fixed_burst_vec_avx2(void *tx_queue, struct rte_mbuf **tx_pkts
>  	tx_id = (uint16_t)(tx_id + nb_commit);
>  	if (tx_id > txq->tx_next_rs) {
>  		txq->ci_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
> -			rte_cpu_to_le_64(((uint64_t)IDPF_TX_DESC_CMD_RS) <<
> -					 IDPF_TXD_QW1_CMD_S);
> +			rte_cpu_to_le_64(((uint64_t)CI_TX_DESC_CMD_RS) << CI_TXD_QW1_CMD_S);
>  		txq->tx_next_rs =
>  			(uint16_t)(txq->tx_next_rs + txq->tx_rs_thresh);
>  	}
> diff --git a/drivers/net/intel/idpf/idpf_common_rxtx_avx512.c b/drivers/net/intel/idpf/idpf_common_rxtx_avx512.c
> index d5e5a2ca5f..49ace35615 100644
> --- a/drivers/net/intel/idpf/idpf_common_rxtx_avx512.c
> +++ b/drivers/net/intel/idpf/idpf_common_rxtx_avx512.c
> @@ -1003,10 +1003,9 @@ static __rte_always_inline void
>  idpf_singleq_vtx1(volatile struct ci_tx_desc *txdp,
>  	  struct rte_mbuf *pkt, uint64_t flags)
>  {
> -	uint64_t high_qw =
> -		(IDPF_TX_DESC_DTYPE_DATA |
> -		 ((uint64_t)flags  << IDPF_TXD_QW1_CMD_S) |
> -		 ((uint64_t)pkt->data_len << IDPF_TXD_QW1_TX_BUF_SZ_S));
> +	uint64_t high_qw = (CI_TX_DESC_DTYPE_DATA |
> +		 ((uint64_t)flags << CI_TXD_QW1_CMD_S) |
> +		 ((uint64_t)pkt->data_len << CI_TXD_QW1_TX_BUF_SZ_S));
> 
>  	__m128i descriptor = _mm_set_epi64x(high_qw,
>  					    pkt->buf_iova + pkt->data_off);
> @@ -1019,8 +1018,7 @@ static __rte_always_inline void
>  idpf_singleq_vtx(volatile struct ci_tx_desc *txdp,
>  	 struct rte_mbuf **pkt, uint16_t nb_pkts,  uint64_t flags)
>  {
> -	const uint64_t hi_qw_tmpl = (IDPF_TX_DESC_DTYPE_DATA  |
> -			((uint64_t)flags  << IDPF_TXD_QW1_CMD_S));
> +	const uint64_t hi_qw_tmpl = (CI_TX_DESC_DTYPE_DATA  | (flags << CI_TXD_QW1_CMD_S));
> 
>  	/* if unaligned on 32-bit boundary, do one to align */
>  	if (((uintptr_t)txdp & 0x1F) != 0 && nb_pkts != 0) {
> @@ -1030,22 +1028,14 @@ idpf_singleq_vtx(volatile struct ci_tx_desc *txdp,
> 
>  	/* do 4 at a time while possible, in bursts */
>  	for (; nb_pkts > 3; txdp += 4, pkt += 4, nb_pkts -= 4) {
> -		uint64_t hi_qw3 =
> -			hi_qw_tmpl |
> -			((uint64_t)pkt[3]->data_len <<
> -			 IDPF_TXD_QW1_TX_BUF_SZ_S);
> -		uint64_t hi_qw2 =
> -			hi_qw_tmpl |
> -			((uint64_t)pkt[2]->data_len <<
> -			 IDPF_TXD_QW1_TX_BUF_SZ_S);
> -		uint64_t hi_qw1 =
> -			hi_qw_tmpl |
> -			((uint64_t)pkt[1]->data_len <<
> -			 IDPF_TXD_QW1_TX_BUF_SZ_S);
> -		uint64_t hi_qw0 =
> -			hi_qw_tmpl |
> -			((uint64_t)pkt[0]->data_len <<
> -			 IDPF_TXD_QW1_TX_BUF_SZ_S);
> +		uint64_t hi_qw3 = hi_qw_tmpl |
> +			((uint64_t)pkt[3]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
> +		uint64_t hi_qw2 = hi_qw_tmpl |
> +			((uint64_t)pkt[2]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
> +		uint64_t hi_qw1 = hi_qw_tmpl |
> +			((uint64_t)pkt[1]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
> +		uint64_t hi_qw0 = hi_qw_tmpl |
> +			((uint64_t)pkt[0]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
> 
>  		__m512i desc0_3 =
>  			_mm512_set_epi64
> @@ -1075,8 +1065,8 @@ idpf_singleq_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pk
>  	volatile struct ci_tx_desc *txdp;
>  	struct ci_tx_entry_vec *txep;
>  	uint16_t n, nb_commit, tx_id;
> -	uint64_t flags = IDPF_TX_DESC_CMD_EOP;
> -	uint64_t rs = IDPF_TX_DESC_CMD_RS | flags;
> +	uint64_t flags = CI_TX_DESC_CMD_EOP;
> +	uint64_t rs = CI_TX_DESC_CMD_RS | flags;
> 
>  	/* cross rx_thresh boundary is not allowed */
>  	nb_pkts = RTE_MIN(nb_pkts, txq->tx_rs_thresh);
> @@ -1124,8 +1114,7 @@ idpf_singleq_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pk
>  	tx_id = (uint16_t)(tx_id + nb_commit);
>  	if (tx_id > txq->tx_next_rs) {
>  		txq->ci_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
> -			rte_cpu_to_le_64(((uint64_t)IDPF_TX_DESC_CMD_RS) <<
> -					 IDPF_TXD_QW1_CMD_S);
> +			rte_cpu_to_le_64(((uint64_t)CI_TX_DESC_CMD_RS) << CI_TXD_QW1_CMD_S);
>  		txq->tx_next_rs =
>  			(uint16_t)(txq->tx_next_rs + txq->tx_rs_thresh);
>  	}
> diff --git a/drivers/net/intel/idpf/idpf_rxtx_vec_common.h b/drivers/net/intel/idpf/idpf_rxtx_vec_common.h
> index b5e8574667..a43d8f78e2 100644
> --- a/drivers/net/intel/idpf/idpf_rxtx_vec_common.h
> +++ b/drivers/net/intel/idpf/idpf_rxtx_vec_common.h
> @@ -32,8 +32,8 @@ idpf_tx_desc_done(struct ci_tx_queue *txq, uint16_t idx)
>  		return 1;
> 
>  	return (txq->ci_tx_ring[idx].cmd_type_offset_bsz &
> -			rte_cpu_to_le_64(IDPF_TXD_QW1_DTYPE_M)) ==
> -			rte_cpu_to_le_64(IDPF_TX_DESC_DTYPE_DESC_DONE);
> +			rte_cpu_to_le_64(CI_TXD_QW1_DTYPE_M)) ==
> +			rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE);
>  }
> 
>  static inline int
> --
> 2.51.0


^ permalink raw reply	[flat|nested] 274+ messages in thread

* RE: [PATCH v3 05/36] net/intel: create separate header for Tx scalar fns
  2026-01-30 11:41   ` [PATCH v3 05/36] net/intel: create separate header for Tx scalar fns Bruce Richardson
@ 2026-02-06 10:23     ` Loftus, Ciara
  2026-02-09 11:04       ` Bruce Richardson
  0 siblings, 1 reply; 274+ messages in thread
From: Loftus, Ciara @ 2026-02-06 10:23 UTC (permalink / raw)
  To: Richardson, Bruce, dev@dpdk.org; +Cc: Richardson, Bruce

> Subject: [PATCH v3 05/36] net/intel: create separate header for Tx scalar fns
> 
> Rather than having all Tx code in the one file, which could start
> getting rather long, move the scalar datapath functions to a new header
> file.
> 
> Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> ---
>  drivers/net/intel/common/tx.h            | 58 ++------------------
>  drivers/net/intel/common/tx_scalar_fns.h | 67 ++++++++++++++++++++++++
>  2 files changed, 72 insertions(+), 53 deletions(-)
>  create mode 100644 drivers/net/intel/common/tx_scalar_fns.h

Why not create the file when ci_tx_xmit_cleanup was first introduced?
I prefer the name tx_scalar.h, but keep tx_scalar_fns.h if you feel it's better.

> 
> diff --git a/drivers/net/intel/common/tx.h b/drivers/net/intel/common/tx.h
> index 03245d4fba..01e42303b4 100644
> --- a/drivers/net/intel/common/tx.h
> +++ b/drivers/net/intel/common/tx.h
> @@ -319,59 +319,6 @@ ci_tx_free_bufs_vec(struct ci_tx_queue *txq, ci_desc_done_fn desc_done, bool ctx
>  	return txq->tx_rs_thresh;
>  }
> 
> -/*
> - * Common transmit descriptor cleanup function for Intel drivers.
> - * Used by ice, i40e, iavf, and idpf drivers.
> - *
> - * Returns:
> - *   0 on success
> - *  -1 if cleanup cannot proceed (descriptors not yet processed by HW)
> - */
> -static __rte_always_inline int
> -ci_tx_xmit_cleanup(struct ci_tx_queue *txq)
> -{
> -	struct ci_tx_entry *sw_ring = txq->sw_ring;
> -	volatile struct ci_tx_desc *txd = txq->ci_tx_ring;
> -	uint16_t last_desc_cleaned = txq->last_desc_cleaned;
> -	uint16_t nb_tx_desc = txq->nb_tx_desc;
> -	uint16_t desc_to_clean_to;
> -	uint16_t nb_tx_to_clean;
> -
> -	/* Determine the last descriptor needing to be cleaned */
> -	desc_to_clean_to = (uint16_t)(last_desc_cleaned + txq->tx_rs_thresh);
> -	if (desc_to_clean_to >= nb_tx_desc)
> -		desc_to_clean_to = (uint16_t)(desc_to_clean_to - nb_tx_desc);
> -
> -	/* Check to make sure the last descriptor to clean is done */
> -	desc_to_clean_to = sw_ring[desc_to_clean_to].last_id;
> -
> -	/* Check if descriptor is done - all drivers use 0xF as done value in bits 3:0 */
> -	if ((txd[desc_to_clean_to].cmd_type_offset_bsz & rte_cpu_to_le_64(CI_TXD_QW1_DTYPE_M)) !=
> -			rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE)) {
> -		/* Descriptor not yet processed by hardware */
> -		return -1;
> -	}
> -
> -	/* Figure out how many descriptors will be cleaned */
> -	if (last_desc_cleaned > desc_to_clean_to)
> -		nb_tx_to_clean = (uint16_t)((nb_tx_desc - last_desc_cleaned) + desc_to_clean_to);
> -	else
> -		nb_tx_to_clean = (uint16_t)(desc_to_clean_to - last_desc_cleaned);
> -
> -	/* The last descriptor to clean is done, so that means all the
> -	 * descriptors from the last descriptor that was cleaned
> -	 * up to the last descriptor with the RS bit set
> -	 * are done. Only reset the threshold descriptor.
> -	 */
> -	txd[desc_to_clean_to].cmd_type_offset_bsz = 0;
> -
> -	/* Update the txq to reflect the last descriptor that was cleaned */
> -	txq->last_desc_cleaned = desc_to_clean_to;
> -	txq->nb_tx_free = (uint16_t)(txq->nb_tx_free + nb_tx_to_clean);
> -
> -	return 0;
> -}
> -
>  static inline void
>  ci_txq_release_all_mbufs(struct ci_tx_queue *txq, bool use_ctx)
>  {
> @@ -490,4 +437,9 @@ ci_tx_path_select(const struct ci_tx_path_features *req_features,
>  	return idx;
>  }
> 
> +/* include the scalar functions at the end, so they can use the common
> definitions.
> + * This is done so drivers can use all functions just by including tx.h
> + */
> +#include "tx_scalar_fns.h"
> +
>  #endif /* _COMMON_INTEL_TX_H_ */
> diff --git a/drivers/net/intel/common/tx_scalar_fns.h b/drivers/net/intel/common/tx_scalar_fns.h
> new file mode 100644
> index 0000000000..c79210d084
> --- /dev/null
> +++ b/drivers/net/intel/common/tx_scalar_fns.h
> @@ -0,0 +1,67 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(c) 2025 Intel Corporation

Should this be 2026?

> + */
> +
> +#ifndef _COMMON_INTEL_TX_SCALAR_FNS_H_
> +#define _COMMON_INTEL_TX_SCALAR_FNS_H_
> +
> +#include <stdint.h>
> +#include <rte_byteorder.h>
> +
> +/* depends on common Tx definitions. */
> +#include "tx.h"
> +
> +/*
> + * Common transmit descriptor cleanup function for Intel drivers.
> + * Used by ice, i40e, iavf, and idpf drivers.

Do we need to call out the driver names in the comment? If a new driver
were to adopt this function, it would need to patch this file too just to
update the comment.

> + *
> + * Returns:
> + *   0 on success
> + *  -1 if cleanup cannot proceed (descriptors not yet processed by HW)
> + */
> +static __rte_always_inline int
> +ci_tx_xmit_cleanup(struct ci_tx_queue *txq)
> +{
> +	struct ci_tx_entry *sw_ring = txq->sw_ring;
> +	volatile struct ci_tx_desc *txd = txq->ci_tx_ring;
> +	uint16_t last_desc_cleaned = txq->last_desc_cleaned;
> +	uint16_t nb_tx_desc = txq->nb_tx_desc;
> +	uint16_t desc_to_clean_to;
> +	uint16_t nb_tx_to_clean;
> +
> +	/* Determine the last descriptor needing to be cleaned */
> +	desc_to_clean_to = (uint16_t)(last_desc_cleaned + txq->tx_rs_thresh);
> +	if (desc_to_clean_to >= nb_tx_desc)
> +		desc_to_clean_to = (uint16_t)(desc_to_clean_to - nb_tx_desc);
> +
> +	/* Check to make sure the last descriptor to clean is done */
> +	desc_to_clean_to = sw_ring[desc_to_clean_to].last_id;
> +
> +	/* Check if descriptor is done - all drivers use 0xF as done value in bits 3:0 */
> +	if ((txd[desc_to_clean_to].cmd_type_offset_bsz & rte_cpu_to_le_64(CI_TXD_QW1_DTYPE_M)) !=
> +			rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE)) {
> +		/* Descriptor not yet processed by hardware */
> +		return -1;
> +	}
> +
> +	/* Figure out how many descriptors will be cleaned */
> +	if (last_desc_cleaned > desc_to_clean_to)
> +		nb_tx_to_clean = (uint16_t)((nb_tx_desc - last_desc_cleaned) + desc_to_clean_to);
> +	else
> +		nb_tx_to_clean = (uint16_t)(desc_to_clean_to - last_desc_cleaned);
> +
> +	/* The last descriptor to clean is done, so that means all the
> +	 * descriptors from the last descriptor that was cleaned
> +	 * up to the last descriptor with the RS bit set
> +	 * are done. Only reset the threshold descriptor.
> +	 */
> +	txd[desc_to_clean_to].cmd_type_offset_bsz = 0;
> +
> +	/* Update the txq to reflect the last descriptor that was cleaned */
> +	txq->last_desc_cleaned = desc_to_clean_to;
> +	txq->nb_tx_free = (uint16_t)(txq->nb_tx_free + nb_tx_to_clean);
> +
> +	return 0;
> +}
> +
> +#endif /* _COMMON_INTEL_TX_SCALAR_FNS_H_ */
> --
> 2.51.0


^ permalink raw reply	[flat|nested] 274+ messages in thread

* RE: [PATCH v3 06/36] net/intel: add common fn to calculate needed descriptors
  2026-01-30 11:41   ` [PATCH v3 06/36] net/intel: add common fn to calculate needed descriptors Bruce Richardson
@ 2026-02-06 10:25     ` Loftus, Ciara
  2026-02-09 11:15       ` Bruce Richardson
  0 siblings, 1 reply; 274+ messages in thread
From: Loftus, Ciara @ 2026-02-06 10:25 UTC (permalink / raw)
  To: Richardson, Bruce, dev@dpdk.org
  Cc: Richardson, Bruce, Medvedkin, Vladimir, Burakov, Anatoly,
	Wu, Jingjing, Shetty,  Praveen

> Subject: [PATCH v3 06/36] net/intel: add common fn to calculate needed descriptors
> 
> Multiple drivers used the same logic to calculate how many Tx data
> descriptors were needed. Move that calculation to common code. In the
> process of updating drivers, fix idpf driver calculation for the TSO
> case.

Should this fix be split out into a separate patch with a Fixes tag?

> 
> Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> ---
>  drivers/net/intel/common/tx_scalar_fns.h  | 21 +++++++++++++++++++++
>  drivers/net/intel/i40e/i40e_rxtx.c        | 18 +-----------------
>  drivers/net/intel/iavf/iavf_rxtx.c        | 17 +----------------
>  drivers/net/intel/ice/ice_rxtx.c          | 18 +-----------------
>  drivers/net/intel/idpf/idpf_common_rxtx.c | 21 +++++++++++++++++----
>  5 files changed, 41 insertions(+), 54 deletions(-)
> 
> diff --git a/drivers/net/intel/common/tx_scalar_fns.h b/drivers/net/intel/common/tx_scalar_fns.h
> index c79210d084..f894cea616 100644
> --- a/drivers/net/intel/common/tx_scalar_fns.h
> +++ b/drivers/net/intel/common/tx_scalar_fns.h
> @@ -64,4 +64,25 @@ ci_tx_xmit_cleanup(struct ci_tx_queue *txq)
>  	return 0;
>  }
> 
> +static inline uint16_t
> +ci_div_roundup16(uint16_t x, uint16_t y)
> +{
> +	return (uint16_t)((x + y - 1) / y);
> +}
> +
> +/* Calculate the number of TX descriptors needed for each pkt */
> +static inline uint16_t
> +ci_calc_pkt_desc(const struct rte_mbuf *tx_pkt)
> +{
> +	uint16_t count = 0;
> +
> +	while (tx_pkt != NULL) {
> +		count += ci_div_roundup16(tx_pkt->data_len, CI_MAX_DATA_PER_TXD);
> +		tx_pkt = tx_pkt->next;
> +	}
> +
> +	return count;
> +}
> +
> +
>  #endif /* _COMMON_INTEL_TX_SCALAR_FNS_H_ */
> diff --git a/drivers/net/intel/i40e/i40e_rxtx.c b/drivers/net/intel/i40e/i40e_rxtx.c
> index f96c5c7f1e..b75306931a 100644
> --- a/drivers/net/intel/i40e/i40e_rxtx.c
> +++ b/drivers/net/intel/i40e/i40e_rxtx.c
> @@ -1029,21 +1029,6 @@ i40e_set_tso_ctx(struct rte_mbuf *mbuf, union ci_tx_offload tx_offload)
>  	return ctx_desc;
>  }
> 
> -/* Calculate the number of TX descriptors needed for each pkt */
> -static inline uint16_t
> -i40e_calc_pkt_desc(struct rte_mbuf *tx_pkt)
> -{
> -	struct rte_mbuf *txd = tx_pkt;
> -	uint16_t count = 0;
> -
> -	while (txd != NULL) {
> -		count += DIV_ROUND_UP(txd->data_len, CI_MAX_DATA_PER_TXD);
> -		txd = txd->next;
> -	}
> -
> -	return count;
> -}
> -
>  uint16_t
>  i40e_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
>  {
> @@ -1106,8 +1091,7 @@ i40e_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
>  		 * per tx desc.
>  		 */
>  		if (ol_flags & RTE_MBUF_F_TX_TCP_SEG)
> -			nb_used = (uint16_t)(i40e_calc_pkt_desc(tx_pkt) +
> -					     nb_ctx);
> +			nb_used = (uint16_t)(ci_calc_pkt_desc(tx_pkt) + nb_ctx);
>  		else
>  			nb_used = (uint16_t)(tx_pkt->nb_segs + nb_ctx);
>  		tx_last = (uint16_t)(tx_id + nb_used - 1);
> diff --git a/drivers/net/intel/iavf/iavf_rxtx.c b/drivers/net/intel/iavf/iavf_rxtx.c
> index 947b6c24d2..885d9309cc 100644
> --- a/drivers/net/intel/iavf/iavf_rxtx.c
> +++ b/drivers/net/intel/iavf/iavf_rxtx.c
> @@ -2666,21 +2666,6 @@ iavf_build_data_desc_cmd_offset_fields(volatile uint64_t *qw1,
>  		((uint64_t)l2tag1 << IAVF_TXD_DATA_QW1_L2TAG1_SHIFT));
>  }
> 
> -/* Calculate the number of TX descriptors needed for each pkt */
> -static inline uint16_t
> -iavf_calc_pkt_desc(struct rte_mbuf *tx_pkt)
> -{
> -	struct rte_mbuf *txd = tx_pkt;
> -	uint16_t count = 0;
> -
> -	while (txd != NULL) {
> -		count += (txd->data_len + CI_MAX_DATA_PER_TXD - 1) / CI_MAX_DATA_PER_TXD;
> -		txd = txd->next;
> -	}
> -
> -	return count;
> -}
> -
>  static inline void
>  iavf_fill_data_desc(volatile struct ci_tx_desc *desc,
>  	uint64_t desc_template,	uint16_t buffsz,
> @@ -2766,7 +2751,7 @@ iavf_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
>  		 * per tx desc.
>  		 */
>  		if (mb->ol_flags & RTE_MBUF_F_TX_TCP_SEG)
> -			nb_desc_required = iavf_calc_pkt_desc(mb) + nb_desc_ctx + nb_desc_ipsec;
> +			nb_desc_required = ci_calc_pkt_desc(mb) + nb_desc_ctx + nb_desc_ipsec;
>  		else
>  			nb_desc_required = nb_desc_data + nb_desc_ctx + nb_desc_ipsec;
> 
> diff --git a/drivers/net/intel/ice/ice_rxtx.c b/drivers/net/intel/ice/ice_rxtx.c
> index 52bbf95967..2a53b614b2 100644
> --- a/drivers/net/intel/ice/ice_rxtx.c
> +++ b/drivers/net/intel/ice/ice_rxtx.c
> @@ -3075,21 +3075,6 @@ ice_set_tso_ctx(struct rte_mbuf *mbuf, union ci_tx_offload tx_offload)
>  	return ctx_desc;
>  }
> 
> -/* Calculate the number of TX descriptors needed for each pkt */
> -static inline uint16_t
> -ice_calc_pkt_desc(struct rte_mbuf *tx_pkt)
> -{
> -	struct rte_mbuf *txd = tx_pkt;
> -	uint16_t count = 0;
> -
> -	while (txd != NULL) {
> -		count += DIV_ROUND_UP(txd->data_len, CI_MAX_DATA_PER_TXD);
> -		txd = txd->next;
> -	}
> -
> -	return count;
> -}
> -
>  uint16_t
>  ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
>  {
> @@ -3152,8 +3137,7 @@ ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
>  		 * per tx desc.
>  		 */
>  		if (ol_flags & (RTE_MBUF_F_TX_TCP_SEG | RTE_MBUF_F_TX_UDP_SEG))
> -			nb_used = (uint16_t)(ice_calc_pkt_desc(tx_pkt) +
> -					     nb_ctx);
> +			nb_used = (uint16_t)(ci_calc_pkt_desc(tx_pkt) + nb_ctx);
>  		else
>  			nb_used = (uint16_t)(tx_pkt->nb_segs + nb_ctx);
>  		tx_last = (uint16_t)(tx_id + nb_used - 1);
> diff --git a/drivers/net/intel/idpf/idpf_common_rxtx.c b/drivers/net/intel/idpf/idpf_common_rxtx.c
> index 587871b54a..11d6848430 100644
> --- a/drivers/net/intel/idpf/idpf_common_rxtx.c
> +++ b/drivers/net/intel/idpf/idpf_common_rxtx.c
> @@ -934,7 +934,16 @@ idpf_dp_splitq_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
>  		tx_offload.tso_segsz = tx_pkt->tso_segsz;
>  		/* Calculate the number of context descriptors needed. */
>  		nb_ctx = idpf_calc_context_desc(ol_flags);
> -		nb_used = tx_pkt->nb_segs + nb_ctx;
> +
> +		/* Calculate the number of TX descriptors needed for
> +		 * each packet. For TSO packets, use ci_calc_pkt_desc as
> +		 * the mbuf data size might exceed max data size that hw allows
> +		 * per tx desc.
> +		 */
> +		if (ol_flags & RTE_MBUF_F_TX_TCP_SEG)
> +			nb_used = ci_calc_pkt_desc(tx_pkt) + nb_ctx;
> +		else
> +			nb_used = tx_pkt->nb_segs + nb_ctx;
> 
>  		if (ol_flags & IDPF_TX_CKSUM_OFFLOAD_MASK)
>  			cmd_dtype = IDPF_TXD_FLEX_FLOW_CMD_CS_EN;
> @@ -1382,10 +1391,14 @@ idpf_dp_singleq_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
>  		nb_ctx = idpf_calc_context_desc(ol_flags);
> 
>  		/* The number of descriptors that must be allocated for
> -		 * a packet equals to the number of the segments of that
> -		 * packet plus 1 context descriptor if needed.
> +		 * a packet. For TSO packets, use ci_calc_pkt_desc as
> +		 * the mbuf data size might exceed max data size that hw
> allows
> +		 * per tx desc.
>  		 */
> -		nb_used = (uint16_t)(tx_pkt->nb_segs + nb_ctx);
> +		if (ol_flags & RTE_MBUF_F_TX_TCP_SEG)
> +			nb_used = (uint16_t)(ci_calc_pkt_desc(tx_pkt) + nb_ctx);
> +		else
> +			nb_used = (uint16_t)(tx_pkt->nb_segs + nb_ctx);
>  		tx_last = (uint16_t)(tx_id + nb_used - 1);
> 
>  		/* Circular ring */
> --
> 2.51.0


^ permalink raw reply	[flat|nested] 274+ messages in thread

* RE: [PATCH v3 07/36] net/ice: refactor context descriptor handling
  2026-01-30 11:41   ` [PATCH v3 07/36] net/ice: refactor context descriptor handling Bruce Richardson
@ 2026-02-06 10:47     ` Loftus, Ciara
  2026-02-09 11:16       ` Bruce Richardson
  0 siblings, 1 reply; 274+ messages in thread
From: Loftus, Ciara @ 2026-02-06 10:47 UTC (permalink / raw)
  To: Richardson, Bruce, dev@dpdk.org; +Cc: Richardson, Bruce, Burakov, Anatoly

> Subject: [PATCH v3 07/36] net/ice: refactor context descriptor handling
> 
> Create a single function to manage all context descriptor handling,
> which returns either 0 or 1 depending on whether a descriptor is needed
> or not, as well as returning directly the descriptor contents if
> relevant.
> 
> Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> ---
>  drivers/net/intel/ice/ice_rxtx.c | 104 +++++++++++++++++--------------
>  1 file changed, 57 insertions(+), 47 deletions(-)
> 
> diff --git a/drivers/net/intel/ice/ice_rxtx.c b/drivers/net/intel/ice/ice_rxtx.c
> index 2a53b614b2..cc442fed75 100644
> --- a/drivers/net/intel/ice/ice_rxtx.c
> +++ b/drivers/net/intel/ice/ice_rxtx.c
> @@ -2966,10 +2966,6 @@ ice_txd_enable_checksum(uint64_t ol_flags,
>  			uint32_t *td_offset,
>  			union ci_tx_offload tx_offload)
>  {
> -	/* Set MACLEN */
> -	if (!(ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK))
> -		*td_offset |= (tx_offload.l2_len >> 1)
> -			<< CI_TX_DESC_LEN_MACLEN_S;
> 
>  	/* Enable L3 checksum offloads */
>  	if (ol_flags & RTE_MBUF_F_TX_IP_CKSUM) {
> @@ -3052,7 +3048,7 @@ ice_calc_context_desc(uint64_t flags)
> 
>  /* set ice TSO context descriptor */
>  static inline uint64_t
> -ice_set_tso_ctx(struct rte_mbuf *mbuf, union ci_tx_offload tx_offload)
> +ice_set_tso_ctx(uint64_t ol_flags, const struct rte_mbuf *mbuf, union ci_tx_offload tx_offload)
>  {
>  	uint64_t ctx_desc = 0;
>  	uint32_t cd_cmd, hdr_len, cd_tso_len;
> @@ -3063,7 +3059,7 @@ ice_set_tso_ctx(struct rte_mbuf *mbuf, union ci_tx_offload tx_offload)
>  	}
> 
>  	hdr_len = tx_offload.l2_len + tx_offload.l3_len + tx_offload.l4_len;
> -	hdr_len += (mbuf->ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK) ?
> +	hdr_len += (ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK) ?
>  		   tx_offload.outer_l2_len + tx_offload.outer_l3_len : 0;
> 
>  	cd_cmd = CI_TX_CTX_DESC_TSO;
> @@ -3075,6 +3071,49 @@ ice_set_tso_ctx(struct rte_mbuf *mbuf, union ci_tx_offload tx_offload)
>  	return ctx_desc;
>  }
> 
> +/* compute a context descriptor if one is necessary based on the ol_flags
> + *
> + * Returns 0 if no descriptor is necessary.
> + * Returns 1 if one is necessary and the contents of the descriptor are returned
> + *   in the values pointed to by qw0 and qw1. td_offset may also be modified.

Regarding the comment above: "td_offset" is not a variable in this
function, so I assume the comment is obsolete.

> + */
> +static __rte_always_inline uint16_t
> +get_context_desc(uint64_t ol_flags, const struct rte_mbuf *tx_pkt,
> +	const union ci_tx_offload *tx_offload, const struct ci_tx_queue *txq,
> +	uint64_t *qw0, uint64_t *qw1)
> +{
> +	uint16_t cd_l2tag2 = 0;
> +	uint64_t cd_type_cmd_tso_mss = ICE_TX_DESC_DTYPE_CTX;
> +	uint32_t cd_tunneling_params = 0;
> +	uint64_t ptp_tx_index = txq->ice_vsi->adapter->ptp_tx_index;
> +
> +	if (ice_calc_context_desc(ol_flags) == 0)
> +		return 0;
> +
> +	if (ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK)
> +		ice_parse_tunneling_params(ol_flags, *tx_offload, &cd_tunneling_params);
> +
> +	if (ol_flags & (RTE_MBUF_F_TX_TCP_SEG | RTE_MBUF_F_TX_UDP_SEG))
> +		cd_type_cmd_tso_mss |= ice_set_tso_ctx(ol_flags, tx_pkt, *tx_offload);
> +	else if (ol_flags & RTE_MBUF_F_TX_IEEE1588_TMST)
> +		cd_type_cmd_tso_mss |=
> +			((uint64_t)CI_TX_CTX_DESC_TSYN << CI_TXD_QW1_CMD_S) |
> +			((ptp_tx_index << ICE_TXD_CTX_QW1_TSYN_S) & ICE_TXD_CTX_QW1_TSYN_M);
> +
> +
> +	/* TX context descriptor based double VLAN insert */
> +	if (ol_flags & RTE_MBUF_F_TX_QINQ) {
> +		cd_l2tag2 = tx_pkt->vlan_tci_outer;
> +		cd_type_cmd_tso_mss |= ((uint64_t)CI_TX_CTX_DESC_IL2TAG2 << CI_TXD_QW1_CMD_S);
> +	}
> +
> +	*qw0 = rte_cpu_to_le_32(cd_tunneling_params) |
> +		((uint64_t)rte_cpu_to_le_16(cd_l2tag2) << 32);
> +	*qw1 = rte_cpu_to_le_64(cd_type_cmd_tso_mss);
> +
> +	return 1;
> +}
> +
>  uint16_t
>  ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
>  {
> @@ -3085,7 +3124,6 @@ ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
>  	struct ci_tx_entry *txe, *txn;
>  	struct rte_mbuf *tx_pkt;
>  	struct rte_mbuf *m_seg;
> -	uint32_t cd_tunneling_params;
>  	uint16_t tx_id;
>  	uint16_t ts_id = -1;
>  	uint16_t nb_tx;
> @@ -3096,6 +3134,7 @@ ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
>  	uint32_t td_tag = 0;
>  	uint16_t tx_last;
>  	uint16_t slen;
> +	uint16_t l2_len;
>  	uint64_t buf_dma_addr;
>  	uint64_t ol_flags;
>  	union ci_tx_offload tx_offload = {0};
> @@ -3114,20 +3153,25 @@ ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
>  		(void)ci_tx_xmit_cleanup(txq);
> 
>  	for (nb_tx = 0; nb_tx < nb_pkts; nb_tx++) {
> +		uint64_t cd_qw0, cd_qw1;
>  		tx_pkt = *tx_pkts++;
> 
> +		ol_flags = tx_pkt->ol_flags;
>  		td_cmd = 0;
>  		td_tag = 0;
> -		td_offset = 0;
> -		ol_flags = tx_pkt->ol_flags;
> +		l2_len = ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK ?
> +				tx_pkt->outer_l2_len : tx_pkt->l2_len;
> +		td_offset = (l2_len >> 1) << CI_TX_DESC_LEN_MACLEN_S;
> +
>  		tx_offload.l2_len = tx_pkt->l2_len;
>  		tx_offload.l3_len = tx_pkt->l3_len;
>  		tx_offload.outer_l2_len = tx_pkt->outer_l2_len;
>  		tx_offload.outer_l3_len = tx_pkt->outer_l3_len;
>  		tx_offload.l4_len = tx_pkt->l4_len;
>  		tx_offload.tso_segsz = tx_pkt->tso_segsz;
> +
>  		/* Calculate the number of context descriptors needed. */
> -		nb_ctx = ice_calc_context_desc(ol_flags);
> +		nb_ctx = get_context_desc(ol_flags, tx_pkt, &tx_offload, txq, &cd_qw0, &cd_qw1);
> 
>  		/* The number of descriptors that must be allocated for
>  		 * a packet equals to the number of the segments of that
> @@ -3169,15 +3213,6 @@ ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
>  			td_tag = tx_pkt->vlan_tci;
>  		}
> 
> -		/* Fill in tunneling parameters if necessary */
> -		cd_tunneling_params = 0;
> -		if (ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK) {
> -			td_offset |= (tx_offload.outer_l2_len >> 1)
> -				<< CI_TX_DESC_LEN_MACLEN_S;
> -			ice_parse_tunneling_params(ol_flags, tx_offload,
> -						   &cd_tunneling_params);
> -		}
> -
>  		/* Enable checksum offloading */
>  		if (ol_flags & ICE_TX_CKSUM_OFFLOAD_MASK)
>  			ice_txd_enable_checksum(ol_flags, &td_cmd,
> @@ -3185,11 +3220,7 @@ ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
> 
>  		if (nb_ctx) {
>  			/* Setup TX context descriptor if required */
> -			volatile struct ice_tx_ctx_desc *ctx_txd =
> -				(volatile struct ice_tx_ctx_desc *)
> -					&ci_tx_ring[tx_id];
> -			uint16_t cd_l2tag2 = 0;
> -			uint64_t cd_type_cmd_tso_mss = ICE_TX_DESC_DTYPE_CTX;
> +			uint64_t *ctx_txd = RTE_CAST_PTR(uint64_t *, &ci_tx_ring[tx_id]);
> 
>  			txn = &sw_ring[txe->next_id];
>  			RTE_MBUF_PREFETCH_TO_FREE(txn->mbuf);
> @@ -3198,29 +3229,8 @@ ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
>  				txe->mbuf = NULL;
>  			}
> 
> -			if (ol_flags & (RTE_MBUF_F_TX_TCP_SEG | RTE_MBUF_F_TX_UDP_SEG))
> -				cd_type_cmd_tso_mss |=
> -					ice_set_tso_ctx(tx_pkt, tx_offload);
> -			else if (ol_flags & RTE_MBUF_F_TX_IEEE1588_TMST)
> -				cd_type_cmd_tso_mss |=
> -					((uint64_t)CI_TX_CTX_DESC_TSYN <<
> -					CI_TXD_QW1_CMD_S) |
> -					 (((uint64_t)txq->ice_vsi->adapter->ptp_tx_index <<
> -					 ICE_TXD_CTX_QW1_TSYN_S) & ICE_TXD_CTX_QW1_TSYN_M);
> -
> -			ctx_txd->tunneling_params =
> -				rte_cpu_to_le_32(cd_tunneling_params);
> -
> -			/* TX context descriptor based double VLAN insert */
> -			if (ol_flags & RTE_MBUF_F_TX_QINQ) {
> -				cd_l2tag2 = tx_pkt->vlan_tci_outer;
> -				cd_type_cmd_tso_mss |=
> -					((uint64_t)CI_TX_CTX_DESC_IL2TAG2 <<
> -					 CI_TXD_QW1_CMD_S);
> -			}
> -			ctx_txd->l2tag2 = rte_cpu_to_le_16(cd_l2tag2);
> -			ctx_txd->qw1 =
> -				rte_cpu_to_le_64(cd_type_cmd_tso_mss);
> +			ctx_txd[0] = cd_qw0;
> +			ctx_txd[1] = cd_qw1;
> 
>  			txe->last_id = tx_last;
>  			tx_id = txe->next_id;
> --
> 2.51.0


^ permalink raw reply	[flat|nested] 274+ messages in thread

* RE: [PATCH v3 08/36] net/i40e: refactor context descriptor handling
  2026-01-30 11:41   ` [PATCH v3 08/36] net/i40e: " Bruce Richardson
@ 2026-02-06 10:54     ` Loftus, Ciara
  0 siblings, 0 replies; 274+ messages in thread
From: Loftus, Ciara @ 2026-02-06 10:54 UTC (permalink / raw)
  To: Richardson, Bruce, dev@dpdk.org; +Cc: Richardson, Bruce

> Subject: [PATCH v3 08/36] net/i40e: refactor context descriptor handling
> 
> move all context descriptor handling to a single function, as with the

Nit: capitalise

> ice driver, and use the same function signature as that driver.
> 
> Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> ---
>  drivers/net/intel/i40e/i40e_rxtx.c | 123 +++++++++++++++--------------
>  1 file changed, 63 insertions(+), 60 deletions(-)
> 
> diff --git a/drivers/net/intel/i40e/i40e_rxtx.c b/drivers/net/intel/i40e/i40e_rxtx.c
> index b75306931a..183b70c63f 100644
> --- a/drivers/net/intel/i40e/i40e_rxtx.c
> +++ b/drivers/net/intel/i40e/i40e_rxtx.c
> @@ -321,11 +321,6 @@ i40e_txd_enable_checksum(uint64_t ol_flags,
>  			uint32_t *td_offset,
>  			union ci_tx_offload tx_offload)
>  {
> -	/* Set MACLEN */
> -	if (!(ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK))
> -		*td_offset |= (tx_offload.l2_len >> 1)
> -			<< CI_TX_DESC_LEN_MACLEN_S;
> -
>  	/* Enable L3 checksum offloads */
>  	if (ol_flags & RTE_MBUF_F_TX_IP_CKSUM) {
>  		*td_cmd |= CI_TX_DESC_CMD_IIPT_IPV4_CSUM;
> @@ -1004,7 +999,7 @@ i40e_calc_context_desc(uint64_t flags)
> 
>  /* set i40e TSO context descriptor */
>  static inline uint64_t
> -i40e_set_tso_ctx(struct rte_mbuf *mbuf, union ci_tx_offload tx_offload)
> +i40e_set_tso_ctx(uint64_t ol_flags, const struct rte_mbuf *mbuf, union ci_tx_offload tx_offload)
>  {
>  	uint64_t ctx_desc = 0;
>  	uint32_t cd_cmd, hdr_len, cd_tso_len;
> @@ -1015,7 +1010,7 @@ i40e_set_tso_ctx(struct rte_mbuf *mbuf, union ci_tx_offload tx_offload)
>  	}
> 
>  	hdr_len = tx_offload.l2_len + tx_offload.l3_len + tx_offload.l4_len;
> -	hdr_len += (mbuf->ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK) ?
> +	hdr_len += (ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK) ?
>  		   tx_offload.outer_l2_len + tx_offload.outer_l3_len : 0;
> 
>  	cd_cmd = I40E_TX_CTX_DESC_TSO;
> @@ -1029,6 +1024,52 @@ i40e_set_tso_ctx(struct rte_mbuf *mbuf, union ci_tx_offload tx_offload)
>  	return ctx_desc;
>  }
> 
> +/* compute a context descriptor if one is necessary based on the ol_flags
> + *
> + * Returns 0 if no descriptor is necessary.
> + * Returns 1 if one is necessary and the contents of the descriptor are returned
> + *   in the values pointed to by qw0 and qw1. td_offset may also be modified.

Same comment as previous patch re td_offset comment

> + */
> +static __rte_always_inline uint16_t
> +get_context_desc(uint64_t ol_flags, const struct rte_mbuf *tx_pkt,
> +		 const union ci_tx_offload *tx_offload,
> +		 const struct ci_tx_queue *txq __rte_unused,
> +		 uint64_t *qw0, uint64_t *qw1)
> +{
> +	uint16_t cd_l2tag2 = 0;
> +	uint64_t cd_type_cmd_tso_mss = I40E_TX_DESC_DTYPE_CONTEXT;

CI_TX_DESC_DTYPE_CTX could now be used

> +	uint32_t cd_tunneling_params = 0;
> +



^ permalink raw reply	[flat|nested] 274+ messages in thread

* RE: [PATCH v3 09/36] net/idpf: refactor context descriptor handling
  2026-01-30 11:41   ` [PATCH v3 09/36] net/idpf: " Bruce Richardson
@ 2026-02-06 10:59     ` Loftus, Ciara
  0 siblings, 0 replies; 274+ messages in thread
From: Loftus, Ciara @ 2026-02-06 10:59 UTC (permalink / raw)
  To: Richardson, Bruce, dev@dpdk.org
  Cc: Richardson, Bruce, Wu, Jingjing, Shetty, Praveen

> Subject: [PATCH v3 09/36] net/idpf: refactor context descriptor handling
> 
> move all context descriptor handling to a single function, as with the

Nit: capitalise

> ice driver.
> 
> Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>

Acked-by: Ciara Loftus <ciara.loftus@intel.com>



^ permalink raw reply	[flat|nested] 274+ messages in thread

* RE: [PATCH v3 10/36] net/intel: consolidate checksum mask definition
  2026-01-30 11:41   ` [PATCH v3 10/36] net/intel: consolidate checksum mask definition Bruce Richardson
@ 2026-02-06 11:25     ` Loftus, Ciara
  2026-02-09 11:40       ` Bruce Richardson
  0 siblings, 1 reply; 274+ messages in thread
From: Loftus, Ciara @ 2026-02-06 11:25 UTC (permalink / raw)
  To: Richardson, Bruce, dev@dpdk.org
  Cc: Richardson, Bruce, Medvedkin, Vladimir, Burakov, Anatoly,
	Wu, Jingjing, Shetty,  Praveen

> Subject: [PATCH v3 10/36] net/intel: consolidate checksum mask definition
> 
> Create a common definition for checksum masks across iavf, idpf, i40e
> and ice drivers.
> 
> Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> ---
>  drivers/net/intel/common/tx.h             | 8 ++++++++
>  drivers/net/intel/i40e/i40e_rxtx.c        | 7 +------
>  drivers/net/intel/iavf/iavf_rxtx.c        | 2 +-
>  drivers/net/intel/iavf/iavf_rxtx.h        | 8 --------
>  drivers/net/intel/ice/ice_rxtx.c          | 8 +-------
>  drivers/net/intel/idpf/idpf_common_rxtx.c | 4 ++--
>  drivers/net/intel/idpf/idpf_common_rxtx.h | 7 +------
>  7 files changed, 14 insertions(+), 30 deletions(-)
> 
> diff --git a/drivers/net/intel/common/tx.h b/drivers/net/intel/common/tx.h
> index 01e42303b4..928fad1df5 100644
> --- a/drivers/net/intel/common/tx.h
> +++ b/drivers/net/intel/common/tx.h
> @@ -53,6 +53,14 @@
>  /* Common maximum data per TX descriptor */
>  #define CI_MAX_DATA_PER_TXD     (CI_TXD_QW1_TX_BUF_SZ_M >> CI_TXD_QW1_TX_BUF_SZ_S)
> 
> +/* Checksum offload mask to identify packets requesting offload */
> +#define CI_TX_CKSUM_OFFLOAD_MASK (RTE_MBUF_F_TX_IP_CKSUM |		 \
> +				   RTE_MBUF_F_TX_L4_MASK |		 \
> +				   RTE_MBUF_F_TX_TCP_SEG |		 \
> +				   RTE_MBUF_F_TX_UDP_SEG |		 \
> +				   RTE_MBUF_F_TX_OUTER_IP_CKSUM |	 \
> +				   RTE_MBUF_F_TX_OUTER_UDP_CKSUM)
> +
>  /**


<snip>

> diff --git a/drivers/net/intel/idpf/idpf_common_rxtx.h b/drivers/net/intel/idpf/idpf_common_rxtx.h
> index b88a87402d..fe7094d434 100644
> --- a/drivers/net/intel/idpf/idpf_common_rxtx.h
> +++ b/drivers/net/intel/idpf/idpf_common_rxtx.h
> @@ -39,13 +39,8 @@
>  #define IDPF_RLAN_CTX_DBUF_S	7
>  #define IDPF_RX_MAX_DATA_BUF_SIZE	(16 * 1024 - 128)
> 
> -#define IDPF_TX_CKSUM_OFFLOAD_MASK (		\
> -		RTE_MBUF_F_TX_IP_CKSUM |	\
> -		RTE_MBUF_F_TX_L4_MASK |		\
> -		RTE_MBUF_F_TX_TCP_SEG)
> -
>  #define IDPF_TX_OFFLOAD_MASK (			\
> -		IDPF_TX_CKSUM_OFFLOAD_MASK |	\
> +		CI_TX_CKSUM_OFFLOAD_MASK |	\

With this change, should the features in idpf.ini be updated to include
Inner L3/L4 checksum? And should IDPF_TX_SCALAR_OFFLOADS be updated to
include RTE_ETH_TX_OFFLOAD_OUTER_IPV4_CKSUM and
RTE_ETH_TX_OFFLOAD_OUTER_UDP_CKSUM?

>  		RTE_MBUF_F_TX_IPV4 |		\
>  		RTE_MBUF_F_TX_IPV6)
> 
> --
> 2.51.0


^ permalink raw reply	[flat|nested] 274+ messages in thread

* RE: [PATCH v3 11/36] net/intel: create common checksum Tx offload function
  2026-01-30 11:41   ` [PATCH v3 11/36] net/intel: create common checksum Tx offload function Bruce Richardson
@ 2026-02-06 11:37     ` Loftus, Ciara
  2026-02-09 11:41       ` Bruce Richardson
  0 siblings, 1 reply; 274+ messages in thread
From: Loftus, Ciara @ 2026-02-06 11:37 UTC (permalink / raw)
  To: Richardson, Bruce, dev@dpdk.org; +Cc: Richardson, Bruce, Burakov, Anatoly

> Subject: [PATCH v3 11/36] net/intel: create common checksum Tx offload
> function
> 
> Since i40e and ice have the same checksum offload logic, merge their
> functions into one. Future rework should enable this to be used by more
> drivers also.
> 
> Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> ---
>  drivers/net/intel/common/tx_scalar_fns.h | 58 +++++++++++++++++++++++
>  drivers/net/intel/i40e/i40e_rxtx.c       | 52 +-------------------
>  drivers/net/intel/ice/ice_rxtx.c         | 60 +-----------------------
>  3 files changed, 60 insertions(+), 110 deletions(-)
> 

<snip> 

> -
>  /* Construct the tx flags */
>  static inline uint64_t
>  i40e_build_ctob(uint32_t td_cmd,
> @@ -1167,7 +1117,7 @@ i40e_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
> 
>  		/* Enable checksum offloading */
>  		if (ol_flags & CI_TX_CKSUM_OFFLOAD_MASK)
> -			i40e_txd_enable_checksum(ol_flags, &td_cmd,
> +			ci_txd_enable_checksum(ol_flags, &td_cmd,
>  						 &td_offset, tx_offload);

Now that this uses the common function, which handles
RTE_MBUF_F_TX_UDP_SEG, the scalar path supports the
RTE_ETH_TX_OFFLOAD_UDP_TSO offload, so I think that offload should be
added to I40E_TX_SCALAR_OFFLOADS. It seems to be missing from the device
capabilities too. Same for ice.



^ permalink raw reply	[flat|nested] 274+ messages in thread

* RE: [PATCH v3 12/36] net/intel: create a common scalar Tx function
  2026-01-30 11:41   ` [PATCH v3 12/36] net/intel: create a common scalar Tx function Bruce Richardson
@ 2026-02-06 12:01     ` Loftus, Ciara
  2026-02-06 12:13       ` Bruce Richardson
  0 siblings, 1 reply; 274+ messages in thread
From: Loftus, Ciara @ 2026-02-06 12:01 UTC (permalink / raw)
  To: Richardson, Bruce, dev@dpdk.org; +Cc: Richardson, Bruce, Burakov, Anatoly

> Subject: [PATCH v3 12/36] net/intel: create a common scalar Tx function
> 
> Given the similarities between the transmit functions across various
> Intel drivers, make a start on consolidating them by moving the ice Tx
> function into common, for reuse by other drivers.
> 
> Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> ---
>  drivers/net/intel/common/tx_scalar_fns.h | 218 ++++++++++++++++++
>  drivers/net/intel/ice/ice_rxtx.c         | 270 +++++------------------
>  2 files changed, 270 insertions(+), 218 deletions(-)
> 
> diff --git a/drivers/net/intel/common/tx_scalar_fns.h b/drivers/net/intel/common/tx_scalar_fns.h
> index f88ca7f25a..6d01c14283 100644
> --- a/drivers/net/intel/common/tx_scalar_fns.h
> +++ b/drivers/net/intel/common/tx_scalar_fns.h

<snip>

> +typedef void (*write_ts_tail_t)(struct ci_tx_queue *txq, uint16_t ts_id);
> +
> +struct ci_timesstamp_queue_fns {

typo: timesstamp

> +	get_ts_tail_t get_ts_tail;
> +	write_ts_desc_t write_ts_desc;
> +	write_ts_tail_t write_ts_tail;
> +};
> +
> +static inline uint16_t
> +ci_xmit_pkts(struct ci_tx_queue *txq,
> +	     struct rte_mbuf **tx_pkts,
> +	     uint16_t nb_pkts,
> +	     ci_get_ctx_desc_fn get_ctx_desc,
> +	     const struct ci_timesstamp_queue_fns *ts_fns)
> +{
> +	volatile struct ci_tx_desc *ci_tx_ring;
> +	volatile struct ci_tx_desc *txd;
> +	struct ci_tx_entry *sw_ring;
> +	struct ci_tx_entry *txe, *txn;
> +	struct rte_mbuf *tx_pkt;
> +	struct rte_mbuf *m_seg;
> +	uint16_t tx_id;
> +	uint16_t ts_id = -1;
> +	uint16_t nb_tx;
> +	uint16_t nb_used;
> +	uint16_t nb_ctx;
> +	uint32_t td_cmd = 0;
> +	uint32_t td_offset = 0;
> +	uint32_t td_tag = 0;
> +	uint16_t tx_last;
> +	uint16_t slen;
> +	uint16_t l2_len;
> +	uint64_t buf_dma_addr;
> +	uint64_t ol_flags;
> +	union ci_tx_offload tx_offload = {0};
> +
> +	sw_ring = txq->sw_ring;
> +	ci_tx_ring = txq->ci_tx_ring;
> +	tx_id = txq->tx_tail;
> +	txe = &sw_ring[tx_id];
> +
> +	if (ts_fns != NULL)
> +		ts_id = ts_fns->get_ts_tail(txq);
> +
> +	/* Check if the descriptor ring needs to be cleaned. */
> +	if (txq->nb_tx_free < txq->tx_free_thresh)
> +		(void)ci_tx_xmit_cleanup(txq);
> +
> +	for (nb_tx = 0; nb_tx < nb_pkts; nb_tx++) {
> +		uint64_t cd_qw0, cd_qw1;
> +		tx_pkt = *tx_pkts++;
> +
> +		ol_flags = tx_pkt->ol_flags;
> +		td_cmd = CI_TX_DESC_CMD_ICRC;

Why change this initialisation from 0 in the ice code to this value in the common code?

> +		td_tag = 0;
> +		l2_len = ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK ?
> +				tx_pkt->outer_l2_len : tx_pkt->l2_len;
> +		td_offset = (l2_len >> 1) << CI_TX_DESC_LEN_MACLEN_S;
> +
> +
> +		tx_offload.l2_len = tx_pkt->l2_len;

^ permalink raw reply	[flat|nested] 274+ messages in thread

* Re: [PATCH v3 12/36] net/intel: create a common scalar Tx function
  2026-02-06 12:01     ` Loftus, Ciara
@ 2026-02-06 12:13       ` Bruce Richardson
  0 siblings, 0 replies; 274+ messages in thread
From: Bruce Richardson @ 2026-02-06 12:13 UTC (permalink / raw)
  To: Loftus, Ciara; +Cc: dev@dpdk.org, Burakov, Anatoly

On Fri, Feb 06, 2026 at 12:01:38PM +0000, Loftus, Ciara wrote:
> > Subject: [PATCH v3 12/36] net/intel: create a common scalar Tx function
> > 
> > Given the similarities between the transmit functions across various
> > Intel drivers, make a start on consolidating them by moving the ice Tx
> > function into common, for reuse by other drivers.
> > 
> > Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> > ---
> >  drivers/net/intel/common/tx_scalar_fns.h | 218 ++++++++++++++++++
> >  drivers/net/intel/ice/ice_rxtx.c         | 270 +++++------------------
> >  2 files changed, 270 insertions(+), 218 deletions(-)
> > 
> > diff --git a/drivers/net/intel/common/tx_scalar_fns.h b/drivers/net/intel/common/tx_scalar_fns.h
> > index f88ca7f25a..6d01c14283 100644
> > --- a/drivers/net/intel/common/tx_scalar_fns.h
> > +++ b/drivers/net/intel/common/tx_scalar_fns.h
> 
> <snip>
> 
> > +typedef void (*write_ts_tail_t)(struct ci_tx_queue *txq, uint16_t ts_id);
> > +
> > +struct ci_timesstamp_queue_fns {
> 
> typo: timesstamp
> 
> > +	get_ts_tail_t get_ts_tail;
> > +	write_ts_desc_t write_ts_desc;
> > +	write_ts_tail_t write_ts_tail;
> > +};
> > +
> > +static inline uint16_t
> > +ci_xmit_pkts(struct ci_tx_queue *txq,
> > +	     struct rte_mbuf **tx_pkts,
> > +	     uint16_t nb_pkts,
> > +	     ci_get_ctx_desc_fn get_ctx_desc,
> > +	     const struct ci_timesstamp_queue_fns *ts_fns)
> > +{
> > +	volatile struct ci_tx_desc *ci_tx_ring;
> > +	volatile struct ci_tx_desc *txd;
> > +	struct ci_tx_entry *sw_ring;
> > +	struct ci_tx_entry *txe, *txn;
> > +	struct rte_mbuf *tx_pkt;
> > +	struct rte_mbuf *m_seg;
> > +	uint16_t tx_id;
> > +	uint16_t ts_id = -1;
> > +	uint16_t nb_tx;
> > +	uint16_t nb_used;
> > +	uint16_t nb_ctx;
> > +	uint32_t td_cmd = 0;
> > +	uint32_t td_offset = 0;
> > +	uint32_t td_tag = 0;
> > +	uint16_t tx_last;
> > +	uint16_t slen;
> > +	uint16_t l2_len;
> > +	uint64_t buf_dma_addr;
> > +	uint64_t ol_flags;
> > +	union ci_tx_offload tx_offload = {0};
> > +
> > +	sw_ring = txq->sw_ring;
> > +	ci_tx_ring = txq->ci_tx_ring;
> > +	tx_id = txq->tx_tail;
> > +	txe = &sw_ring[tx_id];
> > +
> > +	if (ts_fns != NULL)
> > +		ts_id = ts_fns->get_ts_tail(txq);
> > +
> > +	/* Check if the descriptor ring needs to be cleaned. */
> > +	if (txq->nb_tx_free < txq->tx_free_thresh)
> > +		(void)ci_tx_xmit_cleanup(txq);
> > +
> > +	for (nb_tx = 0; nb_tx < nb_pkts; nb_tx++) {
> > +		uint64_t cd_qw0, cd_qw1;
> > +		tx_pkt = *tx_pkts++;
> > +
> > +		ol_flags = tx_pkt->ol_flags;
> > +		td_cmd = CI_TX_DESC_CMD_ICRC;
> 
> Why change this initialisation from 0 in the ice code to this value in the common code?
> 
The spec specifies that this bit is reserved and should be set to 1.

^ permalink raw reply	[flat|nested] 274+ messages in thread

* Re: [PATCH v3 03/36] net/intel: create common post-Tx cleanup function
  2026-02-06 10:07     ` Loftus, Ciara
@ 2026-02-09 10:41       ` Bruce Richardson
  0 siblings, 0 replies; 274+ messages in thread
From: Bruce Richardson @ 2026-02-09 10:41 UTC (permalink / raw)
  To: Loftus, Ciara
  Cc: dev@dpdk.org, Medvedkin, Vladimir, Burakov, Anatoly, Wu, Jingjing,
	Shetty,  Praveen

On Fri, Feb 06, 2026 at 10:07:32AM +0000, Loftus, Ciara wrote:
> 
> 
> > -----Original Message-----
> > From: Bruce Richardson <bruce.richardson@intel.com>
> > Sent: Friday 30 January 2026 11:42
> > To: dev@dpdk.org
> > Cc: Richardson, Bruce <bruce.richardson@intel.com>; Medvedkin, Vladimir
> > <vladimir.medvedkin@intel.com>; Burakov, Anatoly
> > <anatoly.burakov@intel.com>; Wu, Jingjing <jingjing.wu@intel.com>; Shetty,
> > Praveen <praveen.shetty@intel.com>
> > Subject: [PATCH v3 03/36] net/intel: create common post-Tx cleanup function
> > 
> > The code used in ice, iavf, idpf and i40e for doing cleanup of mbufs
> > after they had been transmitted was identical. Therefore deduplicate it
> > by moving to common and remove the driver-specific versions.
> > 
> > Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> > ---
> >  drivers/net/intel/common/tx.h             | 53 ++++++++++++++++++++
> >  drivers/net/intel/i40e/i40e_rxtx.c        | 49 ++----------------
> >  drivers/net/intel/iavf/iavf_rxtx.c        | 50 ++-----------------
> >  drivers/net/intel/ice/ice_rxtx.c          | 60 ++---------------------
> >  drivers/net/intel/idpf/idpf_common_rxtx.c | 46 ++---------------
> >  5 files changed, 71 insertions(+), 187 deletions(-)
> > 
> > diff --git a/drivers/net/intel/common/tx.h b/drivers/net/intel/common/tx.h
> > index 8cf63e59ab..a89412c195 100644
> > --- a/drivers/net/intel/common/tx.h
> > +++ b/drivers/net/intel/common/tx.h
> > @@ -259,6 +259,59 @@ ci_tx_free_bufs_vec(struct ci_tx_queue *txq, ci_desc_done_fn desc_done, bool ctx
> >  	return txq->tx_rs_thresh;
> >  }
> > 
> > +/*
> > + * Common transmit descriptor cleanup function for Intel drivers.
> > + * Used by ice, i40e, iavf, and idpf drivers.
> > + *
> > + * Returns:
> > + *   0 on success
> > + *  -1 if cleanup cannot proceed (descriptors not yet processed by HW)
> > + */
> > +static __rte_always_inline int
> > +ci_tx_xmit_cleanup(struct ci_tx_queue *txq)
> > +{
> > +	struct ci_tx_entry *sw_ring = txq->sw_ring;
> > +	volatile struct ci_tx_desc *txd = txq->ci_tx_ring;
> > +	uint16_t last_desc_cleaned = txq->last_desc_cleaned;
> > +	uint16_t nb_tx_desc = txq->nb_tx_desc;
> > +	uint16_t desc_to_clean_to;
> > +	uint16_t nb_tx_to_clean;
> > +
> > +	/* Determine the last descriptor needing to be cleaned */
> > +	desc_to_clean_to = (uint16_t)(last_desc_cleaned + txq->tx_rs_thresh);
> > +	if (desc_to_clean_to >= nb_tx_desc)
> > +		desc_to_clean_to = (uint16_t)(desc_to_clean_to - nb_tx_desc);
> > +
> > +	/* Check to make sure the last descriptor to clean is done */
> 
> This comment is similar to the next one. Maybe merge them?
> 
Ack

> > +	desc_to_clean_to = sw_ring[desc_to_clean_to].last_id;
> > +
> > +	/* Check if descriptor is done - all drivers use 0xF as done value in bits 3:0 */
> > +	if ((txd[desc_to_clean_to].cmd_type_offset_bsz & rte_cpu_to_le_64(0xFUL)) !=
> > +			rte_cpu_to_le_64(0xFUL)) {
> > +		/* Descriptor not yet processed by hardware */
> > +		return -1;
> > +	}
> > +
> > +	/* Figure out how many descriptors will be cleaned */
> > +	if (last_desc_cleaned > desc_to_clean_to)
> > +		nb_tx_to_clean = (uint16_t)((nb_tx_desc - last_desc_cleaned) + desc_to_clean_to);
> > +	else
> > +		nb_tx_to_clean = (uint16_t)(desc_to_clean_to - last_desc_cleaned);
> > +
> > +	/* The last descriptor to clean is done, so that means all the
> > +	 * descriptors from the last descriptor that was cleaned
> > +	 * up to the last descriptor with the RS bit set
> > +	 * are done. Only reset the threshold descriptor.
> > +	 */
> > +	txd[desc_to_clean_to].cmd_type_offset_bsz = 0;
> > +
> > +	/* Update the txq to reflect the last descriptor that was cleaned */
> > +	txq->last_desc_cleaned = desc_to_clean_to;
> > +	txq->nb_tx_free = (uint16_t)(txq->nb_tx_free + nb_tx_to_clean);
> > +
> > +	return 0;
> > +}
> > +
> >  static inline void
> >  ci_txq_release_all_mbufs(struct ci_tx_queue *txq, bool use_ctx)
> >  {
> > diff --git a/drivers/net/intel/i40e/i40e_rxtx.c b/drivers/net/intel/i40e/i40e_rxtx.c
> > index 210fc0201e..2760e76e99 100644
> > --- a/drivers/net/intel/i40e/i40e_rxtx.c
> > +++ b/drivers/net/intel/i40e/i40e_rxtx.c
> > @@ -384,45 +384,6 @@ i40e_build_ctob(uint32_t td_cmd,
> >  			((uint64_t)td_tag  << I40E_TXD_QW1_L2TAG1_SHIFT));
> >  }
> > 
> > -static inline int
> > -i40e_xmit_cleanup(struct ci_tx_queue *txq)
> > -{
> > -	struct ci_tx_entry *sw_ring = txq->sw_ring;
> > -	volatile struct ci_tx_desc *txd = txq->ci_tx_ring;
> > -	uint16_t last_desc_cleaned = txq->last_desc_cleaned;
> > -	uint16_t nb_tx_desc = txq->nb_tx_desc;
> > -	uint16_t desc_to_clean_to;
> > -	uint16_t nb_tx_to_clean;
> > -
> > -	desc_to_clean_to = (uint16_t)(last_desc_cleaned + txq->tx_rs_thresh);
> > -	if (desc_to_clean_to >= nb_tx_desc)
> > -		desc_to_clean_to = (uint16_t)(desc_to_clean_to - nb_tx_desc);
> > -
> > -	desc_to_clean_to = sw_ring[desc_to_clean_to].last_id;
> > -	if ((txd[desc_to_clean_to].cmd_type_offset_bsz &
> > -			rte_cpu_to_le_64(I40E_TXD_QW1_DTYPE_MASK)) !=
> > -			rte_cpu_to_le_64(I40E_TX_DESC_DTYPE_DESC_DONE)) {
> > -		PMD_TX_LOG(DEBUG, "TX descriptor %4u is not done "
> > -			   "(port=%d queue=%d)", desc_to_clean_to,
> > -			   txq->port_id, txq->queue_id);
> 
> These logs are lost in each of the drivers. I'm not sure if they're terribly
> helpful though so I think it's fine that they're dropped.
> 
Yes, I don't think they are that helpful either. However, I was in
discussion with Anatoly about how we might look to do logging from the
common Intel code, so we may look to add them back in future if we see the
need.

/Bruce

^ permalink raw reply	[flat|nested] 274+ messages in thread

* Re: [PATCH v3 04/36] net/intel: consolidate definitions for Tx desc fields
  2026-02-06 10:14     ` Loftus, Ciara
@ 2026-02-09 10:43       ` Bruce Richardson
  0 siblings, 0 replies; 274+ messages in thread
From: Bruce Richardson @ 2026-02-09 10:43 UTC (permalink / raw)
  To: Loftus, Ciara
  Cc: dev@dpdk.org, Medvedkin, Vladimir, Burakov, Anatoly, Wu, Jingjing,
	Shetty,  Praveen

On Fri, Feb 06, 2026 at 10:14:15AM +0000, Loftus, Ciara wrote:
> 
> 
> > -----Original Message-----
> > From: Bruce Richardson <bruce.richardson@intel.com>
> > Sent: Friday 30 January 2026 11:42
> > To: dev@dpdk.org
> > Cc: Richardson, Bruce <bruce.richardson@intel.com>; Medvedkin, Vladimir
> > <vladimir.medvedkin@intel.com>; Burakov, Anatoly
> > <anatoly.burakov@intel.com>; Wu, Jingjing <jingjing.wu@intel.com>; Shetty,
> > Praveen <praveen.shetty@intel.com>
> > Subject: [PATCH v3 04/36] net/intel: consolidate definitions for Tx desc fields
> > 
> > The offsets of the various fields within the Tx descriptors are common
> > for i40e, iavf, ice and idpf, so put a single set of defines in tx.h and
> > use those throughout all drivers. (NOTE: there was a small difference in
> > mask of CMD field between drivers depending on whether reserved fields
> > or not were included. Those can be ignored as those bits are unused in
> > the drivers for which they are reserved). Similarly, the various flag
> > fields, such as End-of-packet (EOP) and Report-status (RS) are the same,
> > as are offload definitions so consolidate them.
> > 
> > Original definitions are in base code, and are left in place because of
> > that, but are unused.
> > 
> > Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> > ---
> >  drivers/net/intel/common/tx.h                 |  64 +++++++-
> >  drivers/net/intel/i40e/i40e_fdir.c            |  24 +--
> >  drivers/net/intel/i40e/i40e_rxtx.c            |  92 ++++++------
> >  drivers/net/intel/i40e/i40e_rxtx.h            |  17 +--
> >  .../net/intel/i40e/i40e_rxtx_vec_altivec.c    |  11 +-
> >  drivers/net/intel/i40e/i40e_rxtx_vec_avx2.c   |  22 ++-
> >  drivers/net/intel/i40e/i40e_rxtx_vec_avx512.c |  38 ++---
> >  drivers/net/intel/i40e/i40e_rxtx_vec_common.h |   4 +-
> >  drivers/net/intel/i40e/i40e_rxtx_vec_neon.c   |  11 +-
> >  drivers/net/intel/iavf/iavf_rxtx.c            |  68 +++++----
> >  drivers/net/intel/iavf/iavf_rxtx.h            |  20 +--
> >  drivers/net/intel/iavf/iavf_rxtx_vec_avx2.c   |  41 ++----
> >  drivers/net/intel/iavf/iavf_rxtx_vec_avx512.c |  80 ++++------
> >  drivers/net/intel/iavf/iavf_rxtx_vec_common.h |  34 ++---
> >  drivers/net/intel/ice/ice_dcf_ethdev.c        |   2 +-
> >  drivers/net/intel/ice/ice_rxtx.c              | 137 ++++++++----------
> >  drivers/net/intel/ice/ice_rxtx.h              |  15 +-
> >  drivers/net/intel/ice/ice_rxtx_vec_avx2.c     |  41 ++----
> >  drivers/net/intel/ice/ice_rxtx_vec_avx512.c   |  39 ++---
> >  drivers/net/intel/ice/ice_rxtx_vec_common.h   |  41 +++---
> >  drivers/net/intel/idpf/idpf_common_rxtx.c     |  22 +--
> >  drivers/net/intel/idpf/idpf_common_rxtx.h     |  12 --
> >  .../net/intel/idpf/idpf_common_rxtx_avx2.c    |  41 ++----
> >  .../net/intel/idpf/idpf_common_rxtx_avx512.c  |  41 ++----
> >  drivers/net/intel/idpf/idpf_rxtx_vec_common.h |   4 +-
> >  25 files changed, 408 insertions(+), 513 deletions(-)
> > 
> > diff --git a/drivers/net/intel/common/tx.h b/drivers/net/intel/common/tx.h
> > index a89412c195..03245d4fba 100644
> > --- a/drivers/net/intel/common/tx.h
> > +++ b/drivers/net/intel/common/tx.h
> > @@ -10,6 +10,66 @@
> >  #include <rte_ethdev.h>
> >  #include <rte_vect.h>
> > 
> > +/* Common TX Descriptor QW1 Field Definitions */
> > +#define CI_TXD_QW1_DTYPE_S      0
> > +#define CI_TXD_QW1_DTYPE_M      (0xFUL << CI_TXD_QW1_DTYPE_S)
> > +#define CI_TXD_QW1_CMD_S        4
> > +#define CI_TXD_QW1_CMD_M        (0xFFFUL << CI_TXD_QW1_CMD_S)
> 
> This define is unused in the series.
> 

I think I'll keep it in for completeness. The definitions all seem to go in
shift-value and mask-value pairs.

> > +#define CI_TXD_QW1_OFFSET_S     16
> > +#define CI_TXD_QW1_OFFSET_M     (0x3FFFFULL << CI_TXD_QW1_OFFSET_S)
> > +#define CI_TXD_QW1_TX_BUF_SZ_S  34
> > +#define CI_TXD_QW1_TX_BUF_SZ_M  (0x3FFFULL << CI_TXD_QW1_TX_BUF_SZ_S)
> > +#define CI_TXD_QW1_L2TAG1_S     48
> > +#define CI_TXD_QW1_L2TAG1_M     (0xFFFFULL << CI_TXD_QW1_L2TAG1_S)
> > +
> > +/* Common Descriptor Types */
> > +#define CI_TX_DESC_DTYPE_DATA           0x0
> > +#define CI_TX_DESC_DTYPE_CTX            0x1
> 
> This define is also unused, although there is scope to use it in
> patch 7 net/ice: refactor context descriptor handling
> 

Will investigate. Again, even if unused, I think it is good to keep for
completeness.

> > +#define CI_TX_DESC_DTYPE_DESC_DONE      0xF
> > +
> > +/* Common TX Descriptor Command Flags */
> > +#define CI_TX_DESC_CMD_EOP              0x0001
> > +#define CI_TX_DESC_CMD_RS               0x0002
> > +#define CI_TX_DESC_CMD_ICRC             0x0004
> > +#define CI_TX_DESC_CMD_IL2TAG1          0x0008
> > +#define CI_TX_DESC_CMD_DUMMY            0x0010
> > +#define CI_TX_DESC_CMD_IIPT_IPV6        0x0020
> > +#define CI_TX_DESC_CMD_IIPT_IPV4        0x0040
> > +#define CI_TX_DESC_CMD_IIPT_IPV4_CSUM   0x0060
> > +#define CI_TX_DESC_CMD_L4T_EOFT_TCP     0x0100
> > +#define CI_TX_DESC_CMD_L4T_EOFT_SCTP    0x0200
> > +#define CI_TX_DESC_CMD_L4T_EOFT_UDP     0x0300
> > +
> > +/* Common TX Context Descriptor Commands */
> > +#define CI_TX_CTX_DESC_TSO              0x01
> > +#define CI_TX_CTX_DESC_TSYN             0x02
> > +#define CI_TX_CTX_DESC_IL2TAG2          0x04
> > +
> > +/* Common TX Descriptor Length Field Shifts */
> > +#define CI_TX_DESC_LEN_MACLEN_S         0  /* 7 BITS */
> > +#define CI_TX_DESC_LEN_IPLEN_S          7  /* 7 BITS */
> > +#define CI_TX_DESC_LEN_L4_LEN_S         14 /* 4 BITS */
> > +
> > +/* Common maximum data per TX descriptor */
> > +#define CI_MAX_DATA_PER_TXD     (CI_TXD_QW1_TX_BUF_SZ_M >> CI_TXD_QW1_TX_BUF_SZ_S)
> > +
> > +/**
> > + * Common TX offload union for Intel drivers.
> > + * Supports both basic offloads (l2_len, l3_len, l4_len, tso_segsz) and
> > + * extended offloads (outer_l2_len, outer_l3_len) for tunneling support.
> > + */
> > +union ci_tx_offload {
> > +	uint64_t data;
> > +	struct {
> > +		uint64_t l2_len:7;        /**< L2 (MAC) Header Length. */
> > +		uint64_t l3_len:9;        /**< L3 (IP) Header Length. */
> > +		uint64_t l4_len:8;        /**< L4 Header Length. */
> > +		uint64_t tso_segsz:16;    /**< TCP TSO segment size */
> > +		uint64_t outer_l2_len:8;  /**< outer L2 Header Length */
> > +		uint64_t outer_l3_len:16; /**< outer L3 Header Length */
> > +	};
> > +};
> > +
> >  /*
> >   * Structure of a 16-byte Tx descriptor common across i40e, ice, iavf and idpf drivers
> >   */
> > @@ -286,8 +346,8 @@ ci_tx_xmit_cleanup(struct ci_tx_queue *txq)
> >  	desc_to_clean_to = sw_ring[desc_to_clean_to].last_id;
> > 
> >  	/* Check if descriptor is done - all drivers use 0xF as done value in bits 3:0 */
> 
> I think this comment referencing 0xF is out of place now that we're not using 0xF
> rather CI_TX_DESC_DTYPE_DESC_DONE in the code below.
> 

Ack. Will look to update.

/Bruce

^ permalink raw reply	[flat|nested] 274+ messages in thread

* Re: [PATCH v3 05/36] net/intel: create separate header for Tx scalar fns
  2026-02-06 10:23     ` Loftus, Ciara
@ 2026-02-09 11:04       ` Bruce Richardson
  0 siblings, 0 replies; 274+ messages in thread
From: Bruce Richardson @ 2026-02-09 11:04 UTC (permalink / raw)
  To: Loftus, Ciara; +Cc: dev@dpdk.org

On Fri, Feb 06, 2026 at 10:23:43AM +0000, Loftus, Ciara wrote:
> > Subject: [PATCH v3 05/36] net/intel: create separate header for Tx scalar fns
> > 
> > Rather than having all Tx code in the one file, which could start
> > getting rather long, move the scalar datapath functions to a new header
> > file.
> > 
> > Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> > ---
> >  drivers/net/intel/common/tx.h            | 58 ++------------------
> >  drivers/net/intel/common/tx_scalar_fns.h | 67 ++++++++++++++++++++++++
> >  2 files changed, 72 insertions(+), 53 deletions(-)
> >  create mode 100644 drivers/net/intel/common/tx_scalar_fns.h
> 
> Why not create the file when ci_tx_xmit_cleanup was first introduced?
> I prefer the naming tx_scalar.h but keep tx_scalar_fns.h if you feel it's better.
> 

Not pushed about the name, so happy with your suggestion. I'll try and
create this file back in patch 3 - some heavy rebasing will be necessary!

> > 
> > diff --git a/drivers/net/intel/common/tx.h b/drivers/net/intel/common/tx.h
> > index 03245d4fba..01e42303b4 100644
> > --- a/drivers/net/intel/common/tx.h
> > +++ b/drivers/net/intel/common/tx.h
> > @@ -319,59 +319,6 @@ ci_tx_free_bufs_vec(struct ci_tx_queue *txq, ci_desc_done_fn desc_done, bool ctx
> >  	return txq->tx_rs_thresh;
> >  }
> > 
> > -/*
> > - * Common transmit descriptor cleanup function for Intel drivers.
> > - * Used by ice, i40e, iavf, and idpf drivers.
> > - *
> > - * Returns:
> > - *   0 on success
> > - *  -1 if cleanup cannot proceed (descriptors not yet processed by HW)
> > - */
> > -static __rte_always_inline int
> > -ci_tx_xmit_cleanup(struct ci_tx_queue *txq)
> > -{
> > -	struct ci_tx_entry *sw_ring = txq->sw_ring;
> > -	volatile struct ci_tx_desc *txd = txq->ci_tx_ring;
> > -	uint16_t last_desc_cleaned = txq->last_desc_cleaned;
> > -	uint16_t nb_tx_desc = txq->nb_tx_desc;
> > -	uint16_t desc_to_clean_to;
> > -	uint16_t nb_tx_to_clean;
> > -
> > -	/* Determine the last descriptor needing to be cleaned */
> > -	desc_to_clean_to = (uint16_t)(last_desc_cleaned + txq->tx_rs_thresh);
> > -	if (desc_to_clean_to >= nb_tx_desc)
> > -		desc_to_clean_to = (uint16_t)(desc_to_clean_to - nb_tx_desc);
> > -
> > -	/* Check to make sure the last descriptor to clean is done */
> > -	desc_to_clean_to = sw_ring[desc_to_clean_to].last_id;
> > -
> > -	/* Check if descriptor is done - all drivers use 0xF as done value in bits 3:0 */
> > -	if ((txd[desc_to_clean_to].cmd_type_offset_bsz & rte_cpu_to_le_64(CI_TXD_QW1_DTYPE_M)) !=
> > -			rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE)) {
> > -		/* Descriptor not yet processed by hardware */
> > -		return -1;
> > -	}
> > -
> > -	/* Figure out how many descriptors will be cleaned */
> > -	if (last_desc_cleaned > desc_to_clean_to)
> > -		nb_tx_to_clean = (uint16_t)((nb_tx_desc - last_desc_cleaned) + desc_to_clean_to);
> > -	else
> > -		nb_tx_to_clean = (uint16_t)(desc_to_clean_to - last_desc_cleaned);
> > -
> > -	/* The last descriptor to clean is done, so that means all the
> > -	 * descriptors from the last descriptor that was cleaned
> > -	 * up to the last descriptor with the RS bit set
> > -	 * are done. Only reset the threshold descriptor.
> > -	 */
> > -	txd[desc_to_clean_to].cmd_type_offset_bsz = 0;
> > -
> > -	/* Update the txq to reflect the last descriptor that was cleaned */
> > -	txq->last_desc_cleaned = desc_to_clean_to;
> > -	txq->nb_tx_free = (uint16_t)(txq->nb_tx_free + nb_tx_to_clean);
> > -
> > -	return 0;
> > -}
> > -
> >  static inline void
> >  ci_txq_release_all_mbufs(struct ci_tx_queue *txq, bool use_ctx)
> >  {
> > @@ -490,4 +437,9 @@ ci_tx_path_select(const struct ci_tx_path_features *req_features,
> >  	return idx;
> >  }
> > 
> > +/* include the scalar functions at the end, so they can use the common definitions.
> > + * This is done so drivers can use all functions just by including tx.h
> > + */
> > +#include "tx_scalar_fns.h"
> > +
> >  #endif /* _COMMON_INTEL_TX_H_ */
> > diff --git a/drivers/net/intel/common/tx_scalar_fns.h b/drivers/net/intel/common/tx_scalar_fns.h
> > new file mode 100644
> > index 0000000000..c79210d084
> > --- /dev/null
> > +++ b/drivers/net/intel/common/tx_scalar_fns.h
> > @@ -0,0 +1,67 @@
> > +/* SPDX-License-Identifier: BSD-3-Clause
> > + * Copyright(c) 2025 Intel Corporation
> 
> Should this be 2026?
> 

Code was written in 2025 and actually submitted upstream for the first time
in December 2025, so I think this is correct as-is. [Open to correction]

> > + */
> > +
> > +#ifndef _COMMON_INTEL_TX_SCALAR_FNS_H_
> > +#define _COMMON_INTEL_TX_SCALAR_FNS_H_
> > +
> > +#include <stdint.h>
> > +#include <rte_byteorder.h>
> > +
> > +/* depends on common Tx definitions. */
> > +#include "tx.h"
> > +
> > +/*
> > + * Common transmit descriptor cleanup function for Intel drivers.
> > + * Used by ice, i40e, iavf, and idpf drivers.
> 
> Do we need to call out the driver names in the comment? If a new driver
> were to adopt this function they would need to patch this file too to
> update the comment.
> 

Good point, I'll try and generalize

/Bruce

^ permalink raw reply	[flat|nested] 274+ messages in thread

* Re: [PATCH v3 06/36] net/intel: add common fn to calculate needed descriptors
  2026-02-06 10:25     ` Loftus, Ciara
@ 2026-02-09 11:15       ` Bruce Richardson
  0 siblings, 0 replies; 274+ messages in thread
From: Bruce Richardson @ 2026-02-09 11:15 UTC (permalink / raw)
  To: Loftus, Ciara
  Cc: dev@dpdk.org, Medvedkin, Vladimir, Burakov, Anatoly, Wu, Jingjing,
	Shetty,  Praveen

On Fri, Feb 06, 2026 at 10:25:36AM +0000, Loftus, Ciara wrote:
> > Subject: [PATCH v3 06/36] net/intel: add common fn to calculate needed
> > descriptors
> > 
> > Multiple drivers used the same logic to calculate how many Tx data
> > descriptors were needed. Move that calculation to common code. In the
> > process of updating drivers, fix idpf driver calculation for the TSO
> > case.
> 
> Should this fix be split out into a separate patch with a Fixes tag?
> 
I think it would need a separate independent fix for backporting.

/Bruce

^ permalink raw reply	[flat|nested] 274+ messages in thread

* Re: [PATCH v3 07/36] net/ice: refactor context descriptor handling
  2026-02-06 10:47     ` Loftus, Ciara
@ 2026-02-09 11:16       ` Bruce Richardson
  0 siblings, 0 replies; 274+ messages in thread
From: Bruce Richardson @ 2026-02-09 11:16 UTC (permalink / raw)
  To: Loftus, Ciara; +Cc: dev@dpdk.org, Burakov, Anatoly

On Fri, Feb 06, 2026 at 10:47:11AM +0000, Loftus, Ciara wrote:
> > Subject: [PATCH v3 07/36] net/ice: refactor context descriptor handling
> > 
> > Create a single function to manage all context descriptor handling,
> > which returns either 0 or 1 depending on whether a descriptor is needed
> > or not, as well as returning directly the descriptor contents if
> > relevant.
> > 
> > Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> > ---
> >  drivers/net/intel/ice/ice_rxtx.c | 104 +++++++++++++++++--------------
> >  1 file changed, 57 insertions(+), 47 deletions(-)
> > 
> > diff --git a/drivers/net/intel/ice/ice_rxtx.c b/drivers/net/intel/ice/ice_rxtx.c
> > index 2a53b614b2..cc442fed75 100644
> > --- a/drivers/net/intel/ice/ice_rxtx.c
> > +++ b/drivers/net/intel/ice/ice_rxtx.c
> > @@ -2966,10 +2966,6 @@ ice_txd_enable_checksum(uint64_t ol_flags,
> >  			uint32_t *td_offset,
> >  			union ci_tx_offload tx_offload)
> >  {
> > -	/* Set MACLEN */
> > -	if (!(ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK))
> > -		*td_offset |= (tx_offload.l2_len >> 1)
> > -			<< CI_TX_DESC_LEN_MACLEN_S;
> > 
> >  	/* Enable L3 checksum offloads */
> >  	if (ol_flags & RTE_MBUF_F_TX_IP_CKSUM) {
> > @@ -3052,7 +3048,7 @@ ice_calc_context_desc(uint64_t flags)
> > 
> >  /* set ice TSO context descriptor */
> >  static inline uint64_t
> > -ice_set_tso_ctx(struct rte_mbuf *mbuf, union ci_tx_offload tx_offload)
> > +ice_set_tso_ctx(uint64_t ol_flags, const struct rte_mbuf *mbuf, union ci_tx_offload tx_offload)
> >  {
> >  	uint64_t ctx_desc = 0;
> >  	uint32_t cd_cmd, hdr_len, cd_tso_len;
> > @@ -3063,7 +3059,7 @@ ice_set_tso_ctx(struct rte_mbuf *mbuf, union ci_tx_offload tx_offload)
> >  	}
> > 
> >  	hdr_len = tx_offload.l2_len + tx_offload.l3_len + tx_offload.l4_len;
> > -	hdr_len += (mbuf->ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK) ?
> > +	hdr_len += (ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK) ?
> >  		   tx_offload.outer_l2_len + tx_offload.outer_l3_len : 0;
> > 
> >  	cd_cmd = CI_TX_CTX_DESC_TSO;
> > @@ -3075,6 +3071,49 @@ ice_set_tso_ctx(struct rte_mbuf *mbuf, union ci_tx_offload tx_offload)
> >  	return ctx_desc;
> >  }
> > 
> > +/* compute a context descriptor if one is necessary based on the ol_flags
> > + *
> > + * Returns 0 if no descriptor is necessary.
> > + * Returns 1 if one is necessary and the contents of the descriptor are returned
> > + *   in the values pointed to by qw0 and qw1. td_offset may also be modified.
> 
> Regarding the comment above, "td_offset" is not a variable in this function, I
> assume the comment is obsolete.
> 

Yes, it was a parameter previously. Will fix

/Bruce


^ permalink raw reply	[flat|nested] 274+ messages in thread

* Re: [PATCH v3 10/36] net/intel: consolidate checksum mask definition
  2026-02-06 11:25     ` Loftus, Ciara
@ 2026-02-09 11:40       ` Bruce Richardson
  2026-02-09 15:00         ` Bruce Richardson
  0 siblings, 1 reply; 274+ messages in thread
From: Bruce Richardson @ 2026-02-09 11:40 UTC (permalink / raw)
  To: Loftus, Ciara
  Cc: dev@dpdk.org, Medvedkin, Vladimir, Burakov, Anatoly, Wu, Jingjing,
	Shetty,  Praveen

On Fri, Feb 06, 2026 at 11:25:36AM +0000, Loftus, Ciara wrote:
> > Subject: [PATCH v3 10/36] net/intel: consolidate checksum mask definition
> > 
> > Create a common definition for checksum masks across iavf, idpf, i40e
> > and ice drivers.
> > 
> > Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> > ---
> >  drivers/net/intel/common/tx.h             | 8 ++++++++
> >  drivers/net/intel/i40e/i40e_rxtx.c        | 7 +------
> >  drivers/net/intel/iavf/iavf_rxtx.c        | 2 +-
> >  drivers/net/intel/iavf/iavf_rxtx.h        | 8 --------
> >  drivers/net/intel/ice/ice_rxtx.c          | 8 +-------
> >  drivers/net/intel/idpf/idpf_common_rxtx.c | 4 ++--
> >  drivers/net/intel/idpf/idpf_common_rxtx.h | 7 +------
> >  7 files changed, 14 insertions(+), 30 deletions(-)
> > 
> > diff --git a/drivers/net/intel/common/tx.h b/drivers/net/intel/common/tx.h
> > index 01e42303b4..928fad1df5 100644
> > --- a/drivers/net/intel/common/tx.h
> > +++ b/drivers/net/intel/common/tx.h
> > @@ -53,6 +53,14 @@
> >  /* Common maximum data per TX descriptor */
> >  #define CI_MAX_DATA_PER_TXD     (CI_TXD_QW1_TX_BUF_SZ_M >>
> > CI_TXD_QW1_TX_BUF_SZ_S)
> > 
> > +/* Checksum offload mask to identify packets requesting offload */
> > +#define CI_TX_CKSUM_OFFLOAD_MASK (RTE_MBUF_F_TX_IP_CKSUM |
> > 		 \
> > +				   RTE_MBUF_F_TX_L4_MASK |		 \
> > +				   RTE_MBUF_F_TX_TCP_SEG |		 \
> > +				   RTE_MBUF_F_TX_UDP_SEG |		 \
> > +				   RTE_MBUF_F_TX_OUTER_IP_CKSUM |
> > 	 \
> > +				   RTE_MBUF_F_TX_OUTER_UDP_CKSUM)
> > +
> >  /**
> 
> 
> <snip>
> 
> > diff --git a/drivers/net/intel/idpf/idpf_common_rxtx.h
> > b/drivers/net/intel/idpf/idpf_common_rxtx.h
> > index b88a87402d..fe7094d434 100644
> > --- a/drivers/net/intel/idpf/idpf_common_rxtx.h
> > +++ b/drivers/net/intel/idpf/idpf_common_rxtx.h
> > @@ -39,13 +39,8 @@
> >  #define IDPF_RLAN_CTX_DBUF_S	7
> >  #define IDPF_RX_MAX_DATA_BUF_SIZE	(16 * 1024 - 128)
> > 
> > -#define IDPF_TX_CKSUM_OFFLOAD_MASK (		\
> > -		RTE_MBUF_F_TX_IP_CKSUM |	\
> > -		RTE_MBUF_F_TX_L4_MASK |		\
> > -		RTE_MBUF_F_TX_TCP_SEG)
> > -
> >  #define IDPF_TX_OFFLOAD_MASK (			\
> > -		IDPF_TX_CKSUM_OFFLOAD_MASK |	\
> > +		CI_TX_CKSUM_OFFLOAD_MASK |	\
> 
> With this change should the features in idpf.ini be updated to include
> Inner L3/L4 checksum?
> And IDPF_TX_SCALAR_OFFLOADS update to include
> RTE_ETH_TX_OFFLOAD_OUTER_IPV4_CKSUM
> RTE_ETH_TX_OFFLOAD_OUTER_UDP_CKSUM
>

I didn't add it because I hadn't tested it, but since it's using the same
code as other drivers it's probably safe enough to add, I suppose.

/Bruce

^ permalink raw reply	[flat|nested] 274+ messages in thread

* Re: [PATCH v3 11/36] net/intel: create common checksum Tx offload function
  2026-02-06 11:37     ` Loftus, Ciara
@ 2026-02-09 11:41       ` Bruce Richardson
  0 siblings, 0 replies; 274+ messages in thread
From: Bruce Richardson @ 2026-02-09 11:41 UTC (permalink / raw)
  To: Loftus, Ciara; +Cc: dev@dpdk.org, Burakov, Anatoly

On Fri, Feb 06, 2026 at 11:37:07AM +0000, Loftus, Ciara wrote:
> > Subject: [PATCH v3 11/36] net/intel: create common checksum Tx offload
> > function
> > 
> > Since i40e and ice have the same checksum offload logic, merge their
> > functions into one. Future rework should enable this to be used by more
> > drivers also.
> > 
> > Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> > ---
> >  drivers/net/intel/common/tx_scalar_fns.h | 58 +++++++++++++++++++++++
> >  drivers/net/intel/i40e/i40e_rxtx.c       | 52 +-------------------
> >  drivers/net/intel/ice/ice_rxtx.c         | 60 +-----------------------
> >  3 files changed, 60 insertions(+), 110 deletions(-)
> > 
> 
> <snip> 
> 
> > -
> >  /* Construct the tx flags */
> >  static inline uint64_t
> >  i40e_build_ctob(uint32_t td_cmd,
> > @@ -1167,7 +1117,7 @@ i40e_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
> > 
> >  		/* Enable checksum offloading */
> >  		if (ol_flags & CI_TX_CKSUM_OFFLOAD_MASK)
> > -			i40e_txd_enable_checksum(ol_flags, &td_cmd,
> > +			ci_txd_enable_checksum(ol_flags, &td_cmd,
> >  						 &td_offset, tx_offload);
> 
> Now that it uses the common function which handles
> RTE_MBUF_F_TX_UDP_SEG this means the scalar path now supports the
> offload RTE_ETH_TX_OFFLOAD_UDP_TSO so I think it should be added to
> the I40E_TX_SCALAR_OFFLOADS. It seems to be missing from the device
> capabilities too. Same for ice.
> 
Ack.

^ permalink raw reply	[flat|nested] 274+ messages in thread

* Re: [PATCH v3 10/36] net/intel: consolidate checksum mask definition
  2026-02-09 11:40       ` Bruce Richardson
@ 2026-02-09 15:00         ` Bruce Richardson
  0 siblings, 0 replies; 274+ messages in thread
From: Bruce Richardson @ 2026-02-09 15:00 UTC (permalink / raw)
  To: Loftus, Ciara
  Cc: dev@dpdk.org, Medvedkin, Vladimir, Burakov, Anatoly, Wu, Jingjing,
	Shetty,  Praveen

On Mon, Feb 09, 2026 at 11:40:45AM +0000, Bruce Richardson wrote:
> On Fri, Feb 06, 2026 at 11:25:36AM +0000, Loftus, Ciara wrote:
> > > Subject: [PATCH v3 10/36] net/intel: consolidate checksum mask definition
> > > 
> > > Create a common definition for checksum masks across iavf, idpf, i40e
> > > and ice drivers.
> > > 
> > > Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> > > ---
> > >  drivers/net/intel/common/tx.h             | 8 ++++++++
> > >  drivers/net/intel/i40e/i40e_rxtx.c        | 7 +------
> > >  drivers/net/intel/iavf/iavf_rxtx.c        | 2 +-
> > >  drivers/net/intel/iavf/iavf_rxtx.h        | 8 --------
> > >  drivers/net/intel/ice/ice_rxtx.c          | 8 +-------
> > >  drivers/net/intel/idpf/idpf_common_rxtx.c | 4 ++--
> > >  drivers/net/intel/idpf/idpf_common_rxtx.h | 7 +------
> > >  7 files changed, 14 insertions(+), 30 deletions(-)
> > > 
> > > diff --git a/drivers/net/intel/common/tx.h b/drivers/net/intel/common/tx.h
> > > index 01e42303b4..928fad1df5 100644
> > > --- a/drivers/net/intel/common/tx.h
> > > +++ b/drivers/net/intel/common/tx.h
> > > @@ -53,6 +53,14 @@
> > >  /* Common maximum data per TX descriptor */
> > >  #define CI_MAX_DATA_PER_TXD     (CI_TXD_QW1_TX_BUF_SZ_M >>
> > > CI_TXD_QW1_TX_BUF_SZ_S)
> > > 
> > > +/* Checksum offload mask to identify packets requesting offload */
> > > +#define CI_TX_CKSUM_OFFLOAD_MASK (RTE_MBUF_F_TX_IP_CKSUM |
> > > 		 \
> > > +				   RTE_MBUF_F_TX_L4_MASK |		 \
> > > +				   RTE_MBUF_F_TX_TCP_SEG |		 \
> > > +				   RTE_MBUF_F_TX_UDP_SEG |		 \
> > > +				   RTE_MBUF_F_TX_OUTER_IP_CKSUM |
> > > 	 \
> > > +				   RTE_MBUF_F_TX_OUTER_UDP_CKSUM)
> > > +
> > >  /**
> > 
> > 
> > <snip>
> > 
> > > diff --git a/drivers/net/intel/idpf/idpf_common_rxtx.h
> > > b/drivers/net/intel/idpf/idpf_common_rxtx.h
> > > index b88a87402d..fe7094d434 100644
> > > --- a/drivers/net/intel/idpf/idpf_common_rxtx.h
> > > +++ b/drivers/net/intel/idpf/idpf_common_rxtx.h
> > > @@ -39,13 +39,8 @@
> > >  #define IDPF_RLAN_CTX_DBUF_S	7
> > >  #define IDPF_RX_MAX_DATA_BUF_SIZE	(16 * 1024 - 128)
> > > 
> > > -#define IDPF_TX_CKSUM_OFFLOAD_MASK (		\
> > > -		RTE_MBUF_F_TX_IP_CKSUM |	\
> > > -		RTE_MBUF_F_TX_L4_MASK |		\
> > > -		RTE_MBUF_F_TX_TCP_SEG)
> > > -
> > >  #define IDPF_TX_OFFLOAD_MASK (			\
> > > -		IDPF_TX_CKSUM_OFFLOAD_MASK |	\
> > > +		CI_TX_CKSUM_OFFLOAD_MASK |	\
> > 
> > With this change should the features in idpf.ini be updated to include
> > Inner L3/L4 checksum?
> > And IDPF_TX_SCALAR_OFFLOADS update to include
> > RTE_ETH_TX_OFFLOAD_OUTER_IPV4_CKSUM
> > RTE_ETH_TX_OFFLOAD_OUTER_UDP_CKSUM
> >
> 
> I didn't add it because I hadn't tested it, but since it's using the same
> code as other drivers it's probably safe enough to add, I suppose.
>
Changed my mind here, as I am doing the rework on this set. Leaving this
as-is for now until I get to validate it.

/Bruce 

^ permalink raw reply	[flat|nested] 274+ messages in thread

* [PATCH v4 00/35]  combine multiple Intel scalar Tx paths
  2025-12-19 17:25 [RFC PATCH 00/27] combine multiple Intel scalar Tx paths Bruce Richardson
                   ` (28 preceding siblings ...)
  2026-01-30 11:41 ` [PATCH v3 " Bruce Richardson
@ 2026-02-09 16:44 ` Bruce Richardson
  2026-02-09 16:44   ` [PATCH v4 01/35] net/intel: create common Tx descriptor structure Bruce Richardson
                     ` (34 more replies)
  2026-02-11 18:12 ` [PATCH v5 00/35] combine multiple Intel scalar Tx paths Bruce Richardson
  30 siblings, 35 replies; 274+ messages in thread
From: Bruce Richardson @ 2026-02-09 16:44 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson

The scalar Tx paths, with support for offloads and multiple mbufs
per packet, are almost identical across drivers ice, i40e, iavf and
the single-queue mode of idpf. Therefore, we can do some rework to
combine these code paths into a single function which is parameterized
by compile-time constants, allowing code saving to give us a single
path to optimize and maintain - apart from edge cases like IPSec
support in iavf.

The ixgbe driver has a number of similarities too, which we take
advantage of where we can, but the overall descriptor format is
sufficiently different that its main scalar code path is kept
separate.

Once merged, we can then optimize the drivers a bit to improve
performance, and also easily extend some drivers to use additional
paths for better performance, e.g. add the "simple scalar" path
to IDPF driver for better performance on platforms without AVX.

V4:
- Updates following review:
  - merged patches 3 and 5
  - renamed new file to tx_scalar.h
  - added UDP_TSO flag to drivers
  - other minor fixups

V3:
- rebase on top of latest next-net-intel tree
- fix issues with iavf and cpfl drivers seen in some testing

V2:
 - reworked the simple-scalar path as well as full scalar one
 - added simple scalar path support to idpf driver
 - small cleanups, e.g. issues flagged by checkpatch


Bruce Richardson (35):
  net/intel: create common Tx descriptor structure
  net/intel: use common Tx ring structure
  net/intel: create common post-Tx cleanup function
  net/intel: consolidate definitions for Tx desc fields
  net/intel: add common fn to calculate needed descriptors
  net/ice: refactor context descriptor handling
  net/i40e: refactor context descriptor handling
  net/idpf: refactor context descriptor handling
  net/intel: consolidate checksum mask definition
  net/intel: create common checksum Tx offload function
  net/intel: create a common scalar Tx function
  net/i40e: use common scalar Tx function
  net/intel: add IPsec hooks to common Tx function
  net/intel: support configurable VLAN tag insertion on Tx
  net/iavf: use common scalar Tx function
  net/i40e: document requirement for QinQ support
  net/idpf: use common scalar Tx function
  net/intel: avoid writing the final pkt descriptor twice
  eal: add macro for marking assumed alignment
  net/intel: write descriptors using non-volatile pointers
  net/intel: remove unnecessary flag clearing
  net/intel: mark mid-burst ring cleanup as unlikely
  net/intel: add special handling for single desc packets
  net/intel: use separate array for desc status tracking
  net/ixgbe: use separate array for desc status tracking
  net/intel: drop unused Tx queue used count
  net/intel: remove index for tracking end of packet
  net/intel: merge ring writes in simple Tx for ice and i40e
  net/intel: consolidate ice and i40e buffer free function
  net/intel: complete merging simple Tx paths
  net/intel: use non-volatile stores in simple Tx function
  net/intel: align scalar simple Tx path with vector logic
  net/intel: use vector SW ring entry for simple path
  net/intel: use vector mbuf cleanup from simple scalar path
  net/idpf: enable simple Tx function

 doc/guides/nics/i40e.rst                      |  18 +
 drivers/net/intel/common/tx.h                 | 117 ++-
 drivers/net/intel/common/tx_scalar.h          | 593 ++++++++++++++
 drivers/net/intel/cpfl/cpfl_rxtx.c            |  25 +-
 drivers/net/intel/i40e/i40e_fdir.c            |  34 +-
 drivers/net/intel/i40e/i40e_rxtx.c            | 673 +++-------------
 drivers/net/intel/i40e/i40e_rxtx.h            |  17 +-
 .../net/intel/i40e/i40e_rxtx_vec_altivec.c    |  25 +-
 drivers/net/intel/i40e/i40e_rxtx_vec_avx2.c   |  36 +-
 drivers/net/intel/i40e/i40e_rxtx_vec_avx512.c |  52 +-
 drivers/net/intel/i40e/i40e_rxtx_vec_common.h |   6 +-
 drivers/net/intel/i40e/i40e_rxtx_vec_neon.c   |  25 +-
 drivers/net/intel/iavf/iavf_rxtx.c            | 642 ++++-----------
 drivers/net/intel/iavf/iavf_rxtx.h            |  31 +-
 drivers/net/intel/iavf/iavf_rxtx_vec_avx2.c   |  55 +-
 drivers/net/intel/iavf/iavf_rxtx_vec_avx512.c | 104 +--
 drivers/net/intel/iavf/iavf_rxtx_vec_common.h |  36 +-
 drivers/net/intel/ice/ice_dcf_ethdev.c        |  10 +-
 drivers/net/intel/ice/ice_rxtx.c              | 740 ++++--------------
 drivers/net/intel/ice/ice_rxtx.h              |  16 +-
 drivers/net/intel/ice/ice_rxtx_vec_avx2.c     |  55 +-
 drivers/net/intel/ice/ice_rxtx_vec_avx512.c   |  53 +-
 drivers/net/intel/ice/ice_rxtx_vec_common.h   |  43 +-
 drivers/net/intel/idpf/idpf_common_device.h   |   2 +
 drivers/net/intel/idpf/idpf_common_rxtx.c     | 314 ++------
 drivers/net/intel/idpf/idpf_common_rxtx.h     |  24 +-
 .../net/intel/idpf/idpf_common_rxtx_avx2.c    |  53 +-
 .../net/intel/idpf/idpf_common_rxtx_avx512.c  |  55 +-
 drivers/net/intel/idpf/idpf_rxtx.c            |  43 +-
 drivers/net/intel/idpf/idpf_rxtx_vec_common.h |   6 +-
 drivers/net/intel/ixgbe/ixgbe_rxtx.c          | 103 ++-
 .../net/intel/ixgbe/ixgbe_rxtx_vec_common.c   |   3 +-
 lib/eal/include/rte_common.h                  |   6 +
 33 files changed, 1579 insertions(+), 2436 deletions(-)
 create mode 100644 drivers/net/intel/common/tx_scalar.h

--
2.51.0


^ permalink raw reply	[flat|nested] 274+ messages in thread

* [PATCH v4 01/35] net/intel: create common Tx descriptor structure
  2026-02-09 16:44 ` [PATCH v4 00/35] combine multiple Intel scalar Tx paths Bruce Richardson
@ 2026-02-09 16:44   ` Bruce Richardson
  2026-02-09 16:45   ` [PATCH v4 02/35] net/intel: use common Tx ring structure Bruce Richardson
                     ` (33 subsequent siblings)
  34 siblings, 0 replies; 274+ messages in thread
From: Bruce Richardson @ 2026-02-09 16:44 UTC (permalink / raw)
  To: dev
  Cc: Bruce Richardson, Ciara Loftus, Praveen Shetty,
	Vladimir Medvedkin, Anatoly Burakov, Jingjing Wu

The Tx descriptors used by the i40e, iavf, ice and idpf drivers are all
identical 16-byte descriptors, so define a common struct for them. Since
original struct definitions are in base code, leave them in place, but
only use the new structs in DPDK code.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Ciara Loftus <ciara.loftus@intel.com>
---
 drivers/net/intel/common/tx.h                 | 16 ++++++---
 drivers/net/intel/cpfl/cpfl_rxtx.c            |  2 +-
 drivers/net/intel/i40e/i40e_fdir.c            |  4 +--
 drivers/net/intel/i40e/i40e_rxtx.c            | 26 +++++++-------
 .../net/intel/i40e/i40e_rxtx_vec_altivec.c    |  6 ++--
 drivers/net/intel/i40e/i40e_rxtx_vec_avx2.c   |  6 ++--
 drivers/net/intel/i40e/i40e_rxtx_vec_avx512.c |  6 ++--
 drivers/net/intel/i40e/i40e_rxtx_vec_neon.c   |  6 ++--
 drivers/net/intel/iavf/iavf_rxtx.c            | 16 ++++-----
 drivers/net/intel/iavf/iavf_rxtx.h            |  2 +-
 drivers/net/intel/iavf/iavf_rxtx_vec_avx2.c   |  6 ++--
 drivers/net/intel/iavf/iavf_rxtx_vec_avx512.c | 12 +++----
 drivers/net/intel/ice/ice_dcf_ethdev.c        |  2 +-
 drivers/net/intel/ice/ice_rxtx.c              | 36 +++++++++----------
 drivers/net/intel/ice/ice_rxtx_vec_avx2.c     |  6 ++--
 drivers/net/intel/ice/ice_rxtx_vec_avx512.c   |  6 ++--
 drivers/net/intel/idpf/idpf_common_rxtx.c     | 20 +++++------
 drivers/net/intel/idpf/idpf_common_rxtx.h     |  2 +-
 .../net/intel/idpf/idpf_common_rxtx_avx2.c    |  8 ++---
 .../net/intel/idpf/idpf_common_rxtx_avx512.c  |  8 ++---
 drivers/net/intel/idpf/idpf_rxtx.c            |  2 +-
 drivers/net/intel/idpf/idpf_rxtx_vec_common.h |  2 +-
 22 files changed, 104 insertions(+), 96 deletions(-)

diff --git a/drivers/net/intel/common/tx.h b/drivers/net/intel/common/tx.h
index e295d83e3a..d7561a2bbb 100644
--- a/drivers/net/intel/common/tx.h
+++ b/drivers/net/intel/common/tx.h
@@ -10,6 +10,14 @@
 #include <rte_ethdev.h>
 #include <rte_vect.h>
 
+/*
+ * Structure of a 16-byte Tx descriptor common across i40e, ice, iavf and idpf drivers
+ */
+struct ci_tx_desc {
+	uint64_t buffer_addr; /* Address of descriptor's data buf */
+	uint64_t cmd_type_offset_bsz;
+};
+
 /* forward declaration of the common intel (ci) queue structure */
 struct ci_tx_queue;
 
@@ -33,10 +41,10 @@ typedef void (*ice_tx_release_mbufs_t)(struct ci_tx_queue *txq);
 
 struct ci_tx_queue {
 	union { /* TX ring virtual address */
-		volatile struct i40e_tx_desc *i40e_tx_ring;
-		volatile struct iavf_tx_desc *iavf_tx_ring;
-		volatile struct ice_tx_desc *ice_tx_ring;
-		volatile struct idpf_base_tx_desc *idpf_tx_ring;
+		volatile struct ci_tx_desc *i40e_tx_ring;
+		volatile struct ci_tx_desc *iavf_tx_ring;
+		volatile struct ci_tx_desc *ice_tx_ring;
+		volatile struct ci_tx_desc *idpf_tx_ring;
 		volatile union ixgbe_adv_tx_desc *ixgbe_tx_ring;
 	};
 	volatile uint8_t *qtx_tail;               /* register address of tail */
diff --git a/drivers/net/intel/cpfl/cpfl_rxtx.c b/drivers/net/intel/cpfl/cpfl_rxtx.c
index d0438b5da0..78bc3e9b49 100644
--- a/drivers/net/intel/cpfl/cpfl_rxtx.c
+++ b/drivers/net/intel/cpfl/cpfl_rxtx.c
@@ -131,7 +131,7 @@ cpfl_dma_zone_reserve(struct rte_eth_dev *dev, uint16_t queue_idx,
 			ring_size = RTE_ALIGN(len * sizeof(struct idpf_flex_tx_sched_desc),
 					      CPFL_DMA_MEM_ALIGN);
 		else
-			ring_size = RTE_ALIGN(len * sizeof(struct idpf_base_tx_desc),
+			ring_size = RTE_ALIGN(len * sizeof(struct ci_tx_desc),
 					      CPFL_DMA_MEM_ALIGN);
 		memcpy(ring_name, "cpfl Tx ring", sizeof("cpfl Tx ring"));
 		break;
diff --git a/drivers/net/intel/i40e/i40e_fdir.c b/drivers/net/intel/i40e/i40e_fdir.c
index 55d18c5d4a..605df73c9e 100644
--- a/drivers/net/intel/i40e/i40e_fdir.c
+++ b/drivers/net/intel/i40e/i40e_fdir.c
@@ -1377,7 +1377,7 @@ i40e_find_available_buffer(struct rte_eth_dev *dev)
 	 */
 	if (fdir_info->txq_available_buf_count <= 0) {
 		uint16_t tmp_tail;
-		volatile struct i40e_tx_desc *tmp_txdp;
+		volatile struct ci_tx_desc *tmp_txdp;
 
 		tmp_tail = txq->tx_tail;
 		tmp_txdp = &txq->i40e_tx_ring[tmp_tail + 1];
@@ -1628,7 +1628,7 @@ i40e_flow_fdir_filter_programming(struct i40e_pf *pf,
 	struct ci_tx_queue *txq = pf->fdir.txq;
 	struct ci_rx_queue *rxq = pf->fdir.rxq;
 	const struct i40e_fdir_action *fdir_action = &filter->action;
-	volatile struct i40e_tx_desc *txdp;
+	volatile struct ci_tx_desc *txdp;
 	volatile struct i40e_filter_program_desc *fdirdp;
 	uint32_t td_cmd;
 	uint16_t vsi_id;
diff --git a/drivers/net/intel/i40e/i40e_rxtx.c b/drivers/net/intel/i40e/i40e_rxtx.c
index 1c3586778c..92d49ccb79 100644
--- a/drivers/net/intel/i40e/i40e_rxtx.c
+++ b/drivers/net/intel/i40e/i40e_rxtx.c
@@ -388,7 +388,7 @@ static inline int
 i40e_xmit_cleanup(struct ci_tx_queue *txq)
 {
 	struct ci_tx_entry *sw_ring = txq->sw_ring;
-	volatile struct i40e_tx_desc *txd = txq->i40e_tx_ring;
+	volatile struct ci_tx_desc *txd = txq->i40e_tx_ring;
 	uint16_t last_desc_cleaned = txq->last_desc_cleaned;
 	uint16_t nb_tx_desc = txq->nb_tx_desc;
 	uint16_t desc_to_clean_to;
@@ -1092,8 +1092,8 @@ i40e_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 	struct ci_tx_queue *txq;
 	struct ci_tx_entry *sw_ring;
 	struct ci_tx_entry *txe, *txn;
-	volatile struct i40e_tx_desc *txd;
-	volatile struct i40e_tx_desc *txr;
+	volatile struct ci_tx_desc *txd;
+	volatile struct ci_tx_desc *txr;
 	struct rte_mbuf *tx_pkt;
 	struct rte_mbuf *m_seg;
 	uint32_t cd_tunneling_params;
@@ -1398,7 +1398,7 @@ i40e_tx_free_bufs(struct ci_tx_queue *txq)
 
 /* Populate 4 descriptors with data from 4 mbufs */
 static inline void
-tx4(volatile struct i40e_tx_desc *txdp, struct rte_mbuf **pkts)
+tx4(volatile struct ci_tx_desc *txdp, struct rte_mbuf **pkts)
 {
 	uint64_t dma_addr;
 	uint32_t i;
@@ -1414,7 +1414,7 @@ tx4(volatile struct i40e_tx_desc *txdp, struct rte_mbuf **pkts)
 
 /* Populate 1 descriptor with data from 1 mbuf */
 static inline void
-tx1(volatile struct i40e_tx_desc *txdp, struct rte_mbuf **pkts)
+tx1(volatile struct ci_tx_desc *txdp, struct rte_mbuf **pkts)
 {
 	uint64_t dma_addr;
 
@@ -1431,7 +1431,7 @@ i40e_tx_fill_hw_ring(struct ci_tx_queue *txq,
 		     struct rte_mbuf **pkts,
 		     uint16_t nb_pkts)
 {
-	volatile struct i40e_tx_desc *txdp = &txq->i40e_tx_ring[txq->tx_tail];
+	volatile struct ci_tx_desc *txdp = &txq->i40e_tx_ring[txq->tx_tail];
 	struct ci_tx_entry *txep = &txq->sw_ring[txq->tx_tail];
 	const int N_PER_LOOP = 4;
 	const int N_PER_LOOP_MASK = N_PER_LOOP - 1;
@@ -1459,7 +1459,7 @@ tx_xmit_pkts(struct ci_tx_queue *txq,
 	     struct rte_mbuf **tx_pkts,
 	     uint16_t nb_pkts)
 {
-	volatile struct i40e_tx_desc *txr = txq->i40e_tx_ring;
+	volatile struct ci_tx_desc *txr = txq->i40e_tx_ring;
 	uint16_t n = 0;
 
 	/**
@@ -2616,7 +2616,7 @@ i40e_dev_tx_queue_setup(struct rte_eth_dev *dev,
 	}
 
 	/* Allocate TX hardware ring descriptors. */
-	ring_size = sizeof(struct i40e_tx_desc) * I40E_MAX_RING_DESC;
+	ring_size = sizeof(struct ci_tx_desc) * I40E_MAX_RING_DESC;
 	ring_size = RTE_ALIGN(ring_size, I40E_DMA_MEM_ALIGN);
 	tz = rte_eth_dma_zone_reserve(dev, "i40e_tx_ring", queue_idx,
 			      ring_size, I40E_RING_BASE_ALIGN, socket_id);
@@ -2640,7 +2640,7 @@ i40e_dev_tx_queue_setup(struct rte_eth_dev *dev,
 	txq->tx_deferred_start = tx_conf->tx_deferred_start;
 
 	txq->tx_ring_dma = tz->iova;
-	txq->i40e_tx_ring = (struct i40e_tx_desc *)tz->addr;
+	txq->i40e_tx_ring = (struct ci_tx_desc *)tz->addr;
 
 	/* Allocate software ring */
 	txq->sw_ring =
@@ -2913,13 +2913,13 @@ i40e_reset_tx_queue(struct ci_tx_queue *txq)
 	}
 
 	txe = txq->sw_ring;
-	size = sizeof(struct i40e_tx_desc) * txq->nb_tx_desc;
+	size = sizeof(struct ci_tx_desc) * txq->nb_tx_desc;
 	for (i = 0; i < size; i++)
 		((volatile char *)txq->i40e_tx_ring)[i] = 0;
 
 	prev = (uint16_t)(txq->nb_tx_desc - 1);
 	for (i = 0; i < txq->nb_tx_desc; i++) {
-		volatile struct i40e_tx_desc *txd = &txq->i40e_tx_ring[i];
+		volatile struct ci_tx_desc *txd = &txq->i40e_tx_ring[i];
 
 		txd->cmd_type_offset_bsz =
 			rte_cpu_to_le_64(I40E_TX_DESC_DTYPE_DESC_DONE);
@@ -3221,7 +3221,7 @@ i40e_fdir_setup_tx_resources(struct i40e_pf *pf)
 	}
 
 	/* Allocate TX hardware ring descriptors. */
-	ring_size = sizeof(struct i40e_tx_desc) * I40E_FDIR_NUM_TX_DESC;
+	ring_size = sizeof(struct ci_tx_desc) * I40E_FDIR_NUM_TX_DESC;
 	ring_size = RTE_ALIGN(ring_size, I40E_DMA_MEM_ALIGN);
 
 	tz = rte_eth_dma_zone_reserve(dev, "fdir_tx_ring",
@@ -3240,7 +3240,7 @@ i40e_fdir_setup_tx_resources(struct i40e_pf *pf)
 	txq->i40e_vsi = pf->fdir.fdir_vsi;
 
 	txq->tx_ring_dma = tz->iova;
-	txq->i40e_tx_ring = (struct i40e_tx_desc *)tz->addr;
+	txq->i40e_tx_ring = (struct ci_tx_desc *)tz->addr;
 
 	/*
 	 * don't need to allocate software ring and reset for the fdir
diff --git a/drivers/net/intel/i40e/i40e_rxtx_vec_altivec.c b/drivers/net/intel/i40e/i40e_rxtx_vec_altivec.c
index bbb6d907cf..ef5b252898 100644
--- a/drivers/net/intel/i40e/i40e_rxtx_vec_altivec.c
+++ b/drivers/net/intel/i40e/i40e_rxtx_vec_altivec.c
@@ -446,7 +446,7 @@ i40e_recv_scattered_pkts_vec(void *rx_queue, struct rte_mbuf **rx_pkts,
 }
 
 static inline void
-vtx1(volatile struct i40e_tx_desc *txdp,
+vtx1(volatile struct ci_tx_desc *txdp,
 	struct rte_mbuf *pkt, uint64_t flags)
 {
 	uint64_t high_qw = (I40E_TX_DESC_DTYPE_DATA |
@@ -459,7 +459,7 @@ vtx1(volatile struct i40e_tx_desc *txdp,
 }
 
 static inline void
-vtx(volatile struct i40e_tx_desc *txdp,
+vtx(volatile struct ci_tx_desc *txdp,
 	struct rte_mbuf **pkt, uint16_t nb_pkts,  uint64_t flags)
 {
 	int i;
@@ -473,7 +473,7 @@ i40e_xmit_fixed_burst_vec(void *tx_queue, struct rte_mbuf **tx_pkts,
 			  uint16_t nb_pkts)
 {
 	struct ci_tx_queue *txq = (struct ci_tx_queue *)tx_queue;
-	volatile struct i40e_tx_desc *txdp;
+	volatile struct ci_tx_desc *txdp;
 	struct ci_tx_entry_vec *txep;
 	uint16_t n, nb_commit, tx_id;
 	uint64_t flags = I40E_TD_CMD;
diff --git a/drivers/net/intel/i40e/i40e_rxtx_vec_avx2.c b/drivers/net/intel/i40e/i40e_rxtx_vec_avx2.c
index 4e398b3140..137c1f9765 100644
--- a/drivers/net/intel/i40e/i40e_rxtx_vec_avx2.c
+++ b/drivers/net/intel/i40e/i40e_rxtx_vec_avx2.c
@@ -681,7 +681,7 @@ i40e_recv_scattered_pkts_vec_avx2(void *rx_queue, struct rte_mbuf **rx_pkts,
 
 
 static inline void
-vtx1(volatile struct i40e_tx_desc *txdp,
+vtx1(volatile struct ci_tx_desc *txdp,
 		struct rte_mbuf *pkt, uint64_t flags)
 {
 	uint64_t high_qw = (I40E_TX_DESC_DTYPE_DATA |
@@ -694,7 +694,7 @@ vtx1(volatile struct i40e_tx_desc *txdp,
 }
 
 static inline void
-vtx(volatile struct i40e_tx_desc *txdp,
+vtx(volatile struct ci_tx_desc *txdp,
 		struct rte_mbuf **pkt, uint16_t nb_pkts,  uint64_t flags)
 {
 	const uint64_t hi_qw_tmpl = (I40E_TX_DESC_DTYPE_DATA |
@@ -739,7 +739,7 @@ i40e_xmit_fixed_burst_vec_avx2(void *tx_queue, struct rte_mbuf **tx_pkts,
 			  uint16_t nb_pkts)
 {
 	struct ci_tx_queue *txq = (struct ci_tx_queue *)tx_queue;
-	volatile struct i40e_tx_desc *txdp;
+	volatile struct ci_tx_desc *txdp;
 	struct ci_tx_entry_vec *txep;
 	uint16_t n, nb_commit, tx_id;
 	uint64_t flags = I40E_TD_CMD;
diff --git a/drivers/net/intel/i40e/i40e_rxtx_vec_avx512.c b/drivers/net/intel/i40e/i40e_rxtx_vec_avx512.c
index 571987d27a..6971488750 100644
--- a/drivers/net/intel/i40e/i40e_rxtx_vec_avx512.c
+++ b/drivers/net/intel/i40e/i40e_rxtx_vec_avx512.c
@@ -750,7 +750,7 @@ i40e_recv_scattered_pkts_vec_avx512(void *rx_queue,
 }
 
 static inline void
-vtx1(volatile struct i40e_tx_desc *txdp, struct rte_mbuf *pkt, uint64_t flags)
+vtx1(volatile struct ci_tx_desc *txdp, struct rte_mbuf *pkt, uint64_t flags)
 {
 	uint64_t high_qw = (I40E_TX_DESC_DTYPE_DATA |
 		((uint64_t)flags  << I40E_TXD_QW1_CMD_SHIFT) |
@@ -762,7 +762,7 @@ vtx1(volatile struct i40e_tx_desc *txdp, struct rte_mbuf *pkt, uint64_t flags)
 }
 
 static inline void
-vtx(volatile struct i40e_tx_desc *txdp,
+vtx(volatile struct ci_tx_desc *txdp,
 	struct rte_mbuf **pkt, uint16_t nb_pkts,  uint64_t flags)
 {
 	const uint64_t hi_qw_tmpl = (I40E_TX_DESC_DTYPE_DATA |
@@ -807,7 +807,7 @@ i40e_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pkts,
 				 uint16_t nb_pkts)
 {
 	struct ci_tx_queue *txq = (struct ci_tx_queue *)tx_queue;
-	volatile struct i40e_tx_desc *txdp;
+	volatile struct ci_tx_desc *txdp;
 	struct ci_tx_entry_vec *txep;
 	uint16_t n, nb_commit, tx_id;
 	uint64_t flags = I40E_TD_CMD;
diff --git a/drivers/net/intel/i40e/i40e_rxtx_vec_neon.c b/drivers/net/intel/i40e/i40e_rxtx_vec_neon.c
index b5be0c1b59..6404b70c56 100644
--- a/drivers/net/intel/i40e/i40e_rxtx_vec_neon.c
+++ b/drivers/net/intel/i40e/i40e_rxtx_vec_neon.c
@@ -597,7 +597,7 @@ i40e_recv_scattered_pkts_vec(void *rx_queue, struct rte_mbuf **rx_pkts,
 }
 
 static inline void
-vtx1(volatile struct i40e_tx_desc *txdp,
+vtx1(volatile struct ci_tx_desc *txdp,
 		struct rte_mbuf *pkt, uint64_t flags)
 {
 	uint64_t high_qw = (I40E_TX_DESC_DTYPE_DATA |
@@ -609,7 +609,7 @@ vtx1(volatile struct i40e_tx_desc *txdp,
 }
 
 static inline void
-vtx(volatile struct i40e_tx_desc *txdp, struct rte_mbuf **pkt,
+vtx(volatile struct ci_tx_desc *txdp, struct rte_mbuf **pkt,
 		uint16_t nb_pkts,  uint64_t flags)
 {
 	int i;
@@ -623,7 +623,7 @@ i40e_xmit_fixed_burst_vec(void *__rte_restrict tx_queue,
 	struct rte_mbuf **__rte_restrict tx_pkts, uint16_t nb_pkts)
 {
 	struct ci_tx_queue *txq = (struct ci_tx_queue *)tx_queue;
-	volatile struct i40e_tx_desc *txdp;
+	volatile struct ci_tx_desc *txdp;
 	struct ci_tx_entry_vec *txep;
 	uint16_t n, nb_commit, tx_id;
 	uint64_t flags = I40E_TD_CMD;
diff --git a/drivers/net/intel/iavf/iavf_rxtx.c b/drivers/net/intel/iavf/iavf_rxtx.c
index 4b763627bc..e4421a9932 100644
--- a/drivers/net/intel/iavf/iavf_rxtx.c
+++ b/drivers/net/intel/iavf/iavf_rxtx.c
@@ -267,7 +267,7 @@ reset_tx_queue(struct ci_tx_queue *txq)
 	}
 
 	txe = txq->sw_ring;
-	size = sizeof(struct iavf_tx_desc) * txq->nb_tx_desc;
+	size = sizeof(struct ci_tx_desc) * txq->nb_tx_desc;
 	for (i = 0; i < size; i++)
 		((volatile char *)txq->iavf_tx_ring)[i] = 0;
 
@@ -827,7 +827,7 @@ iavf_dev_tx_queue_setup(struct rte_eth_dev *dev,
 	}
 
 	/* Allocate TX hardware ring descriptors. */
-	ring_size = sizeof(struct iavf_tx_desc) * IAVF_MAX_RING_DESC;
+	ring_size = sizeof(struct ci_tx_desc) * IAVF_MAX_RING_DESC;
 	ring_size = RTE_ALIGN(ring_size, IAVF_DMA_MEM_ALIGN);
 	mz = rte_eth_dma_zone_reserve(dev, "iavf_tx_ring", queue_idx,
 				      ring_size, IAVF_RING_BASE_ALIGN,
@@ -839,7 +839,7 @@ iavf_dev_tx_queue_setup(struct rte_eth_dev *dev,
 		return -ENOMEM;
 	}
 	txq->tx_ring_dma = mz->iova;
-	txq->iavf_tx_ring = (struct iavf_tx_desc *)mz->addr;
+	txq->iavf_tx_ring = (struct ci_tx_desc *)mz->addr;
 
 	txq->mz = mz;
 	reset_tx_queue(txq);
@@ -2333,7 +2333,7 @@ iavf_xmit_cleanup(struct ci_tx_queue *txq)
 	uint16_t desc_to_clean_to;
 	uint16_t nb_tx_to_clean;
 
-	volatile struct iavf_tx_desc *txd = txq->iavf_tx_ring;
+	volatile struct ci_tx_desc *txd = txq->iavf_tx_ring;
 
 	desc_to_clean_to = (uint16_t)(last_desc_cleaned + txq->tx_rs_thresh);
 	if (desc_to_clean_to >= nb_tx_desc)
@@ -2723,7 +2723,7 @@ iavf_calc_pkt_desc(struct rte_mbuf *tx_pkt)
 }
 
 static inline void
-iavf_fill_data_desc(volatile struct iavf_tx_desc *desc,
+iavf_fill_data_desc(volatile struct ci_tx_desc *desc,
 	uint64_t desc_template,	uint16_t buffsz,
 	uint64_t buffer_addr)
 {
@@ -2756,7 +2756,7 @@ uint16_t
 iavf_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 {
 	struct ci_tx_queue *txq = tx_queue;
-	volatile struct iavf_tx_desc *txr = txq->iavf_tx_ring;
+	volatile struct ci_tx_desc *txr = txq->iavf_tx_ring;
 	struct ci_tx_entry *txe_ring = txq->sw_ring;
 	struct ci_tx_entry *txe, *txn;
 	struct rte_mbuf *mb, *mb_seg;
@@ -2774,7 +2774,7 @@ iavf_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 	txe = &txe_ring[desc_idx];
 
 	for (idx = 0; idx < nb_pkts; idx++) {
-		volatile struct iavf_tx_desc *ddesc;
+		volatile struct ci_tx_desc *ddesc;
 		struct iavf_ipsec_crypto_pkt_metadata *ipsec_md;
 
 		uint16_t nb_desc_ctx, nb_desc_ipsec;
@@ -2895,7 +2895,7 @@ iavf_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 		mb_seg = mb;
 
 		do {
-			ddesc = (volatile struct iavf_tx_desc *)
+			ddesc = (volatile struct ci_tx_desc *)
 					&txr[desc_idx];
 
 			txn = &txe_ring[txe->next_id];
diff --git a/drivers/net/intel/iavf/iavf_rxtx.h b/drivers/net/intel/iavf/iavf_rxtx.h
index e1f78dcde0..dd6d884fc1 100644
--- a/drivers/net/intel/iavf/iavf_rxtx.h
+++ b/drivers/net/intel/iavf/iavf_rxtx.h
@@ -678,7 +678,7 @@ void iavf_dump_tx_descriptor(const struct ci_tx_queue *txq,
 			    const volatile void *desc, uint16_t tx_id)
 {
 	const char *name;
-	const volatile struct iavf_tx_desc *tx_desc = desc;
+	const volatile struct ci_tx_desc *tx_desc = desc;
 	enum iavf_tx_desc_dtype_value type;
 
 
diff --git a/drivers/net/intel/iavf/iavf_rxtx_vec_avx2.c b/drivers/net/intel/iavf/iavf_rxtx_vec_avx2.c
index e29958e0bc..5b62d51cf7 100644
--- a/drivers/net/intel/iavf/iavf_rxtx_vec_avx2.c
+++ b/drivers/net/intel/iavf/iavf_rxtx_vec_avx2.c
@@ -1630,7 +1630,7 @@ iavf_recv_scattered_pkts_vec_avx2_flex_rxd_offload(void *rx_queue,
 
 
 static __rte_always_inline void
-iavf_vtx1(volatile struct iavf_tx_desc *txdp,
+iavf_vtx1(volatile struct ci_tx_desc *txdp,
 	  struct rte_mbuf *pkt, uint64_t flags, bool offload, uint8_t vlan_flag)
 {
 	uint64_t high_qw =
@@ -1646,7 +1646,7 @@ iavf_vtx1(volatile struct iavf_tx_desc *txdp,
 }
 
 static __rte_always_inline void
-iavf_vtx(volatile struct iavf_tx_desc *txdp,
+iavf_vtx(volatile struct ci_tx_desc *txdp,
 	 struct rte_mbuf **pkt, uint16_t nb_pkts,  uint64_t flags, bool offload, uint8_t vlan_flag)
 {
 	const uint64_t hi_qw_tmpl = (IAVF_TX_DESC_DTYPE_DATA |
@@ -1713,7 +1713,7 @@ iavf_xmit_fixed_burst_vec_avx2(void *tx_queue, struct rte_mbuf **tx_pkts,
 			       uint16_t nb_pkts, bool offload)
 {
 	struct ci_tx_queue *txq = (struct ci_tx_queue *)tx_queue;
-	volatile struct iavf_tx_desc *txdp;
+	volatile struct ci_tx_desc *txdp;
 	struct ci_tx_entry_vec *txep;
 	uint16_t n, nb_commit, tx_id;
 	/* bit2 is reserved and must be set to 1 according to Spec */
diff --git a/drivers/net/intel/iavf/iavf_rxtx_vec_avx512.c b/drivers/net/intel/iavf/iavf_rxtx_vec_avx512.c
index 7c0907b7cf..d79d96c7b7 100644
--- a/drivers/net/intel/iavf/iavf_rxtx_vec_avx512.c
+++ b/drivers/net/intel/iavf/iavf_rxtx_vec_avx512.c
@@ -1840,7 +1840,7 @@ tx_backlog_entry_avx512(struct ci_tx_entry_vec *txep,
 }
 
 static __rte_always_inline void
-iavf_vtx1(volatile struct iavf_tx_desc *txdp,
+iavf_vtx1(volatile struct ci_tx_desc *txdp,
 	  struct rte_mbuf *pkt, uint64_t flags,
 	  bool offload, uint8_t vlan_flag)
 {
@@ -1859,7 +1859,7 @@ iavf_vtx1(volatile struct iavf_tx_desc *txdp,
 #define IAVF_TX_LEN_MASK 0xAA
 #define IAVF_TX_OFF_MASK 0x55
 static __rte_always_inline void
-iavf_vtx(volatile struct iavf_tx_desc *txdp,
+iavf_vtx(volatile struct ci_tx_desc *txdp,
 		struct rte_mbuf **pkt, uint16_t nb_pkts,  uint64_t flags,
 		bool offload, uint8_t vlan_flag)
 {
@@ -2068,7 +2068,7 @@ iavf_fill_ctx_desc_tunnelling_field(volatile uint64_t *qw0,
 }
 
 static __rte_always_inline void
-ctx_vtx1(volatile struct iavf_tx_desc *txdp, struct rte_mbuf *pkt,
+ctx_vtx1(volatile struct ci_tx_desc *txdp, struct rte_mbuf *pkt,
 		uint64_t flags, bool offload, uint8_t vlan_flag)
 {
 	uint64_t high_ctx_qw = IAVF_TX_DESC_DTYPE_CONTEXT;
@@ -2106,7 +2106,7 @@ ctx_vtx1(volatile struct iavf_tx_desc *txdp, struct rte_mbuf *pkt,
 }
 
 static __rte_always_inline void
-ctx_vtx(volatile struct iavf_tx_desc *txdp,
+ctx_vtx(volatile struct ci_tx_desc *txdp,
 		struct rte_mbuf **pkt, uint16_t nb_pkts,  uint64_t flags,
 		bool offload, uint8_t vlan_flag)
 {
@@ -2203,7 +2203,7 @@ iavf_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pkts,
 				 uint16_t nb_pkts, bool offload)
 {
 	struct ci_tx_queue *txq = (struct ci_tx_queue *)tx_queue;
-	volatile struct iavf_tx_desc *txdp;
+	volatile struct ci_tx_desc *txdp;
 	struct ci_tx_entry_vec *txep;
 	uint16_t n, nb_commit, tx_id;
 	/* bit2 is reserved and must be set to 1 according to Spec */
@@ -2271,7 +2271,7 @@ iavf_xmit_fixed_burst_vec_avx512_ctx(void *tx_queue, struct rte_mbuf **tx_pkts,
 				 uint16_t nb_pkts, bool offload)
 {
 	struct ci_tx_queue *txq = (struct ci_tx_queue *)tx_queue;
-	volatile struct iavf_tx_desc *txdp;
+	volatile struct ci_tx_desc *txdp;
 	struct ci_tx_entry_vec *txep;
 	uint16_t n, nb_commit, nb_mbuf, tx_id;
 	/* bit2 is reserved and must be set to 1 according to Spec */
diff --git a/drivers/net/intel/ice/ice_dcf_ethdev.c b/drivers/net/intel/ice/ice_dcf_ethdev.c
index 81da5a4656..ab1d499cef 100644
--- a/drivers/net/intel/ice/ice_dcf_ethdev.c
+++ b/drivers/net/intel/ice/ice_dcf_ethdev.c
@@ -399,7 +399,7 @@ reset_tx_queue(struct ci_tx_queue *txq)
 	}
 
 	txe = txq->sw_ring;
-	size = sizeof(struct ice_tx_desc) * txq->nb_tx_desc;
+	size = sizeof(struct ci_tx_desc) * txq->nb_tx_desc;
 	for (i = 0; i < size; i++)
 		((volatile char *)txq->ice_tx_ring)[i] = 0;
 
diff --git a/drivers/net/intel/ice/ice_rxtx.c b/drivers/net/intel/ice/ice_rxtx.c
index f3bc79423d..74b80e7df3 100644
--- a/drivers/net/intel/ice/ice_rxtx.c
+++ b/drivers/net/intel/ice/ice_rxtx.c
@@ -1115,13 +1115,13 @@ ice_reset_tx_queue(struct ci_tx_queue *txq)
 	}
 
 	txe = txq->sw_ring;
-	size = sizeof(struct ice_tx_desc) * txq->nb_tx_desc;
+	size = sizeof(struct ci_tx_desc) * txq->nb_tx_desc;
 	for (i = 0; i < size; i++)
 		((volatile char *)txq->ice_tx_ring)[i] = 0;
 
 	prev = (uint16_t)(txq->nb_tx_desc - 1);
 	for (i = 0; i < txq->nb_tx_desc; i++) {
-		volatile struct ice_tx_desc *txd = &txq->ice_tx_ring[i];
+		volatile struct ci_tx_desc *txd = &txq->ice_tx_ring[i];
 
 		txd->cmd_type_offset_bsz =
 			rte_cpu_to_le_64(ICE_TX_DESC_DTYPE_DESC_DONE);
@@ -1623,7 +1623,7 @@ ice_tx_queue_setup(struct rte_eth_dev *dev,
 	}
 
 	/* Allocate TX hardware ring descriptors. */
-	ring_size = sizeof(struct ice_tx_desc) * ICE_MAX_NUM_DESC_BY_MAC(hw);
+	ring_size = sizeof(struct ci_tx_desc) * ICE_MAX_NUM_DESC_BY_MAC(hw);
 	ring_size = RTE_ALIGN(ring_size, ICE_DMA_MEM_ALIGN);
 	tz = rte_eth_dma_zone_reserve(dev, "ice_tx_ring", queue_idx,
 				      ring_size, ICE_RING_BASE_ALIGN,
@@ -2619,7 +2619,7 @@ ice_fdir_setup_tx_resources(struct ice_pf *pf)
 	}
 
 	/* Allocate TX hardware ring descriptors. */
-	ring_size = sizeof(struct ice_tx_desc) * ICE_FDIR_NUM_TX_DESC;
+	ring_size = sizeof(struct ci_tx_desc) * ICE_FDIR_NUM_TX_DESC;
 	ring_size = RTE_ALIGN(ring_size, ICE_DMA_MEM_ALIGN);
 
 	tz = rte_eth_dma_zone_reserve(dev, "fdir_tx_ring",
@@ -2638,7 +2638,7 @@ ice_fdir_setup_tx_resources(struct ice_pf *pf)
 	txq->ice_vsi = pf->fdir.fdir_vsi;
 
 	txq->tx_ring_dma = tz->iova;
-	txq->ice_tx_ring = (struct ice_tx_desc *)tz->addr;
+	txq->ice_tx_ring = (struct ci_tx_desc *)tz->addr;
 	/*
 	 * don't need to allocate software ring and reset for the fdir
 	 * program queue just set the queue has been configured.
@@ -3027,7 +3027,7 @@ static inline int
 ice_xmit_cleanup(struct ci_tx_queue *txq)
 {
 	struct ci_tx_entry *sw_ring = txq->sw_ring;
-	volatile struct ice_tx_desc *txd = txq->ice_tx_ring;
+	volatile struct ci_tx_desc *txd = txq->ice_tx_ring;
 	uint16_t last_desc_cleaned = txq->last_desc_cleaned;
 	uint16_t nb_tx_desc = txq->nb_tx_desc;
 	uint16_t desc_to_clean_to;
@@ -3148,8 +3148,8 @@ uint16_t
 ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 {
 	struct ci_tx_queue *txq;
-	volatile struct ice_tx_desc *ice_tx_ring;
-	volatile struct ice_tx_desc *txd;
+	volatile struct ci_tx_desc *ice_tx_ring;
+	volatile struct ci_tx_desc *txd;
 	struct ci_tx_entry *sw_ring;
 	struct ci_tx_entry *txe, *txn;
 	struct rte_mbuf *tx_pkt;
@@ -3312,7 +3312,7 @@ ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 
 			while ((ol_flags & (RTE_MBUF_F_TX_TCP_SEG | RTE_MBUF_F_TX_UDP_SEG)) &&
 				unlikely(slen > ICE_MAX_DATA_PER_TXD)) {
-				txd->buf_addr = rte_cpu_to_le_64(buf_dma_addr);
+				txd->buffer_addr = rte_cpu_to_le_64(buf_dma_addr);
 				txd->cmd_type_offset_bsz =
 				rte_cpu_to_le_64(ICE_TX_DESC_DTYPE_DATA |
 				((uint64_t)td_cmd << ICE_TXD_QW1_CMD_S) |
@@ -3331,7 +3331,7 @@ ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 				txn = &sw_ring[txe->next_id];
 			}
 
-			txd->buf_addr = rte_cpu_to_le_64(buf_dma_addr);
+			txd->buffer_addr = rte_cpu_to_le_64(buf_dma_addr);
 			txd->cmd_type_offset_bsz =
 				rte_cpu_to_le_64(ICE_TX_DESC_DTYPE_DATA |
 				((uint64_t)td_cmd << ICE_TXD_QW1_CMD_S) |
@@ -3563,14 +3563,14 @@ ice_tx_done_cleanup(void *txq, uint32_t free_cnt)
 
 /* Populate 4 descriptors with data from 4 mbufs */
 static inline void
-tx4(volatile struct ice_tx_desc *txdp, struct rte_mbuf **pkts)
+tx4(volatile struct ci_tx_desc *txdp, struct rte_mbuf **pkts)
 {
 	uint64_t dma_addr;
 	uint32_t i;
 
 	for (i = 0; i < 4; i++, txdp++, pkts++) {
 		dma_addr = rte_mbuf_data_iova(*pkts);
-		txdp->buf_addr = rte_cpu_to_le_64(dma_addr);
+		txdp->buffer_addr = rte_cpu_to_le_64(dma_addr);
 		txdp->cmd_type_offset_bsz =
 			ice_build_ctob((uint32_t)ICE_TD_CMD, 0,
 				       (*pkts)->data_len, 0);
@@ -3579,12 +3579,12 @@ tx4(volatile struct ice_tx_desc *txdp, struct rte_mbuf **pkts)
 
 /* Populate 1 descriptor with data from 1 mbuf */
 static inline void
-tx1(volatile struct ice_tx_desc *txdp, struct rte_mbuf **pkts)
+tx1(volatile struct ci_tx_desc *txdp, struct rte_mbuf **pkts)
 {
 	uint64_t dma_addr;
 
 	dma_addr = rte_mbuf_data_iova(*pkts);
-	txdp->buf_addr = rte_cpu_to_le_64(dma_addr);
+	txdp->buffer_addr = rte_cpu_to_le_64(dma_addr);
 	txdp->cmd_type_offset_bsz =
 		ice_build_ctob((uint32_t)ICE_TD_CMD, 0,
 			       (*pkts)->data_len, 0);
@@ -3594,7 +3594,7 @@ static inline void
 ice_tx_fill_hw_ring(struct ci_tx_queue *txq, struct rte_mbuf **pkts,
 		    uint16_t nb_pkts)
 {
-	volatile struct ice_tx_desc *txdp = &txq->ice_tx_ring[txq->tx_tail];
+	volatile struct ci_tx_desc *txdp = &txq->ice_tx_ring[txq->tx_tail];
 	struct ci_tx_entry *txep = &txq->sw_ring[txq->tx_tail];
 	const int N_PER_LOOP = 4;
 	const int N_PER_LOOP_MASK = N_PER_LOOP - 1;
@@ -3627,7 +3627,7 @@ tx_xmit_pkts(struct ci_tx_queue *txq,
 	     struct rte_mbuf **tx_pkts,
 	     uint16_t nb_pkts)
 {
-	volatile struct ice_tx_desc *txr = txq->ice_tx_ring;
+	volatile struct ci_tx_desc *txr = txq->ice_tx_ring;
 	uint16_t n = 0;
 
 	/**
@@ -4882,7 +4882,7 @@ ice_fdir_programming(struct ice_pf *pf, struct ice_fltr_desc *fdir_desc)
 	struct ci_tx_queue *txq = pf->fdir.txq;
 	struct ci_rx_queue *rxq = pf->fdir.rxq;
 	volatile struct ice_fltr_desc *fdirdp;
-	volatile struct ice_tx_desc *txdp;
+	volatile struct ci_tx_desc *txdp;
 	uint32_t td_cmd;
 	uint16_t i;
 
@@ -4892,7 +4892,7 @@ ice_fdir_programming(struct ice_pf *pf, struct ice_fltr_desc *fdir_desc)
 	fdirdp->dtype_cmd_vsi_fdid = fdir_desc->dtype_cmd_vsi_fdid;
 
 	txdp = &txq->ice_tx_ring[txq->tx_tail + 1];
-	txdp->buf_addr = rte_cpu_to_le_64(pf->fdir.dma_addr);
+	txdp->buffer_addr = rte_cpu_to_le_64(pf->fdir.dma_addr);
 	td_cmd = ICE_TX_DESC_CMD_EOP |
 		ICE_TX_DESC_CMD_RS  |
 		ICE_TX_DESC_CMD_DUMMY;
diff --git a/drivers/net/intel/ice/ice_rxtx_vec_avx2.c b/drivers/net/intel/ice/ice_rxtx_vec_avx2.c
index 0ba1d557ca..bef7bb00ba 100644
--- a/drivers/net/intel/ice/ice_rxtx_vec_avx2.c
+++ b/drivers/net/intel/ice/ice_rxtx_vec_avx2.c
@@ -774,7 +774,7 @@ ice_recv_scattered_pkts_vec_avx2_offload(void *rx_queue,
 }
 
 static __rte_always_inline void
-ice_vtx1(volatile struct ice_tx_desc *txdp,
+ice_vtx1(volatile struct ci_tx_desc *txdp,
 	 struct rte_mbuf *pkt, uint64_t flags, bool offload)
 {
 	uint64_t high_qw =
@@ -789,7 +789,7 @@ ice_vtx1(volatile struct ice_tx_desc *txdp,
 }
 
 static __rte_always_inline void
-ice_vtx(volatile struct ice_tx_desc *txdp,
+ice_vtx(volatile struct ci_tx_desc *txdp,
 	struct rte_mbuf **pkt, uint16_t nb_pkts,  uint64_t flags, bool offload)
 {
 	const uint64_t hi_qw_tmpl = (ICE_TX_DESC_DTYPE_DATA |
@@ -852,7 +852,7 @@ ice_xmit_fixed_burst_vec_avx2(void *tx_queue, struct rte_mbuf **tx_pkts,
 			      uint16_t nb_pkts, bool offload)
 {
 	struct ci_tx_queue *txq = (struct ci_tx_queue *)tx_queue;
-	volatile struct ice_tx_desc *txdp;
+	volatile struct ci_tx_desc *txdp;
 	struct ci_tx_entry_vec *txep;
 	uint16_t n, nb_commit, tx_id;
 	uint64_t flags = ICE_TD_CMD;
diff --git a/drivers/net/intel/ice/ice_rxtx_vec_avx512.c b/drivers/net/intel/ice/ice_rxtx_vec_avx512.c
index 7c6fe82072..1f6bf5fc8e 100644
--- a/drivers/net/intel/ice/ice_rxtx_vec_avx512.c
+++ b/drivers/net/intel/ice/ice_rxtx_vec_avx512.c
@@ -847,7 +847,7 @@ ice_recv_scattered_pkts_vec_avx512_offload(void *rx_queue,
 }
 
 static __rte_always_inline void
-ice_vtx1(volatile struct ice_tx_desc *txdp,
+ice_vtx1(volatile struct ci_tx_desc *txdp,
 	 struct rte_mbuf *pkt, uint64_t flags, bool do_offload)
 {
 	uint64_t high_qw =
@@ -863,7 +863,7 @@ ice_vtx1(volatile struct ice_tx_desc *txdp,
 }
 
 static __rte_always_inline void
-ice_vtx(volatile struct ice_tx_desc *txdp, struct rte_mbuf **pkt,
+ice_vtx(volatile struct ci_tx_desc *txdp, struct rte_mbuf **pkt,
 	uint16_t nb_pkts,  uint64_t flags, bool do_offload)
 {
 	const uint64_t hi_qw_tmpl = (ICE_TX_DESC_DTYPE_DATA |
@@ -916,7 +916,7 @@ ice_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pkts,
 				uint16_t nb_pkts, bool do_offload)
 {
 	struct ci_tx_queue *txq = (struct ci_tx_queue *)tx_queue;
-	volatile struct ice_tx_desc *txdp;
+	volatile struct ci_tx_desc *txdp;
 	struct ci_tx_entry_vec *txep;
 	uint16_t n, nb_commit, tx_id;
 	uint64_t flags = ICE_TD_CMD;
diff --git a/drivers/net/intel/idpf/idpf_common_rxtx.c b/drivers/net/intel/idpf/idpf_common_rxtx.c
index 797ee515dd..be3c1ef216 100644
--- a/drivers/net/intel/idpf/idpf_common_rxtx.c
+++ b/drivers/net/intel/idpf/idpf_common_rxtx.c
@@ -264,13 +264,13 @@ idpf_qc_single_tx_queue_reset(struct ci_tx_queue *txq)
 	}
 
 	txe = txq->sw_ring;
-	size = sizeof(struct idpf_base_tx_desc) * txq->nb_tx_desc;
+	size = sizeof(struct ci_tx_desc) * txq->nb_tx_desc;
 	for (i = 0; i < size; i++)
 		((volatile char *)txq->idpf_tx_ring)[i] = 0;
 
 	prev = (uint16_t)(txq->nb_tx_desc - 1);
 	for (i = 0; i < txq->nb_tx_desc; i++) {
-		txq->idpf_tx_ring[i].qw1 =
+		txq->idpf_tx_ring[i].cmd_type_offset_bsz =
 			rte_cpu_to_le_64(IDPF_TX_DESC_DTYPE_DESC_DONE);
 		txe[i].mbuf =  NULL;
 		txe[i].last_id = i;
@@ -1335,14 +1335,14 @@ idpf_xmit_cleanup(struct ci_tx_queue *txq)
 	uint16_t desc_to_clean_to;
 	uint16_t nb_tx_to_clean;
 
-	volatile struct idpf_base_tx_desc *txd = txq->idpf_tx_ring;
+	volatile struct ci_tx_desc *txd = txq->idpf_tx_ring;
 
 	desc_to_clean_to = (uint16_t)(last_desc_cleaned + txq->tx_rs_thresh);
 	if (desc_to_clean_to >= nb_tx_desc)
 		desc_to_clean_to = (uint16_t)(desc_to_clean_to - nb_tx_desc);
 
 	desc_to_clean_to = sw_ring[desc_to_clean_to].last_id;
-	if ((txd[desc_to_clean_to].qw1 &
+	if ((txd[desc_to_clean_to].cmd_type_offset_bsz &
 	     rte_cpu_to_le_64(IDPF_TXD_QW1_DTYPE_M)) !=
 	    rte_cpu_to_le_64(IDPF_TX_DESC_DTYPE_DESC_DONE)) {
 		TX_LOG(DEBUG, "TX descriptor %4u is not done "
@@ -1358,7 +1358,7 @@ idpf_xmit_cleanup(struct ci_tx_queue *txq)
 		nb_tx_to_clean = (uint16_t)(desc_to_clean_to -
 					    last_desc_cleaned);
 
-	txd[desc_to_clean_to].qw1 = 0;
+	txd[desc_to_clean_to].cmd_type_offset_bsz = 0;
 
 	txq->last_desc_cleaned = desc_to_clean_to;
 	txq->nb_tx_free = (uint16_t)(txq->nb_tx_free + nb_tx_to_clean);
@@ -1372,8 +1372,8 @@ uint16_t
 idpf_dp_singleq_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 			  uint16_t nb_pkts)
 {
-	volatile struct idpf_base_tx_desc *txd;
-	volatile struct idpf_base_tx_desc *txr;
+	volatile struct ci_tx_desc *txd;
+	volatile struct ci_tx_desc *txr;
 	union idpf_tx_offload tx_offload = {0};
 	struct ci_tx_entry *txe, *txn;
 	struct ci_tx_entry *sw_ring;
@@ -1491,8 +1491,8 @@ idpf_dp_singleq_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 			/* Setup TX Descriptor */
 			slen = m_seg->data_len;
 			buf_dma_addr = rte_mbuf_data_iova(m_seg);
-			txd->buf_addr = rte_cpu_to_le_64(buf_dma_addr);
-			txd->qw1 = rte_cpu_to_le_64(IDPF_TX_DESC_DTYPE_DATA |
+			txd->buffer_addr = rte_cpu_to_le_64(buf_dma_addr);
+			txd->cmd_type_offset_bsz = rte_cpu_to_le_64(IDPF_TX_DESC_DTYPE_DATA |
 				((uint64_t)td_cmd  << IDPF_TXD_QW1_CMD_S) |
 				((uint64_t)td_offset << IDPF_TXD_QW1_OFFSET_S) |
 				((uint64_t)slen << IDPF_TXD_QW1_TX_BUF_SZ_S));
@@ -1519,7 +1519,7 @@ idpf_dp_singleq_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 			txq->nb_tx_used = 0;
 		}
 
-		txd->qw1 |= rte_cpu_to_le_16(td_cmd << IDPF_TXD_QW1_CMD_S);
+		txd->cmd_type_offset_bsz |= rte_cpu_to_le_16(td_cmd << IDPF_TXD_QW1_CMD_S);
 	}
 
 end_of_tx:
diff --git a/drivers/net/intel/idpf/idpf_common_rxtx.h b/drivers/net/intel/idpf/idpf_common_rxtx.h
index 7c6ff5d047..2f2fa153b2 100644
--- a/drivers/net/intel/idpf/idpf_common_rxtx.h
+++ b/drivers/net/intel/idpf/idpf_common_rxtx.h
@@ -182,7 +182,7 @@ union idpf_tx_offload {
 };
 
 union idpf_tx_desc {
-	struct idpf_base_tx_desc *tx_ring;
+	struct ci_tx_desc *tx_ring;
 	struct idpf_flex_tx_sched_desc *desc_ring;
 	struct idpf_splitq_tx_compl_desc *compl_ring;
 };
diff --git a/drivers/net/intel/idpf/idpf_common_rxtx_avx2.c b/drivers/net/intel/idpf/idpf_common_rxtx_avx2.c
index 21c8f79254..5f5d538dcb 100644
--- a/drivers/net/intel/idpf/idpf_common_rxtx_avx2.c
+++ b/drivers/net/intel/idpf/idpf_common_rxtx_avx2.c
@@ -483,7 +483,7 @@ idpf_dp_singleq_recv_pkts_avx2(void *rx_queue, struct rte_mbuf **rx_pkts, uint16
 }
 
 static inline void
-idpf_singleq_vtx1(volatile struct idpf_base_tx_desc *txdp,
+idpf_singleq_vtx1(volatile struct ci_tx_desc *txdp,
 		  struct rte_mbuf *pkt, uint64_t flags)
 {
 	uint64_t high_qw =
@@ -497,7 +497,7 @@ idpf_singleq_vtx1(volatile struct idpf_base_tx_desc *txdp,
 }
 
 static inline void
-idpf_singleq_vtx(volatile struct idpf_base_tx_desc *txdp,
+idpf_singleq_vtx(volatile struct ci_tx_desc *txdp,
 		 struct rte_mbuf **pkt, uint16_t nb_pkts,  uint64_t flags)
 {
 	const uint64_t hi_qw_tmpl = (IDPF_TX_DESC_DTYPE_DATA |
@@ -556,7 +556,7 @@ idpf_singleq_xmit_fixed_burst_vec_avx2(void *tx_queue, struct rte_mbuf **tx_pkts
 				       uint16_t nb_pkts)
 {
 	struct ci_tx_queue *txq = (struct ci_tx_queue *)tx_queue;
-	volatile struct idpf_base_tx_desc *txdp;
+	volatile struct ci_tx_desc *txdp;
 	struct ci_tx_entry_vec *txep;
 	uint16_t n, nb_commit, tx_id;
 	uint64_t flags = IDPF_TX_DESC_CMD_EOP;
@@ -604,7 +604,7 @@ idpf_singleq_xmit_fixed_burst_vec_avx2(void *tx_queue, struct rte_mbuf **tx_pkts
 
 	tx_id = (uint16_t)(tx_id + nb_commit);
 	if (tx_id > txq->tx_next_rs) {
-		txq->idpf_tx_ring[txq->tx_next_rs].qw1 |=
+		txq->idpf_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
 			rte_cpu_to_le_64(((uint64_t)IDPF_TX_DESC_CMD_RS) <<
 					 IDPF_TXD_QW1_CMD_S);
 		txq->tx_next_rs =
diff --git a/drivers/net/intel/idpf/idpf_common_rxtx_avx512.c b/drivers/net/intel/idpf/idpf_common_rxtx_avx512.c
index bc2cadd738..c1ec3d1222 100644
--- a/drivers/net/intel/idpf/idpf_common_rxtx_avx512.c
+++ b/drivers/net/intel/idpf/idpf_common_rxtx_avx512.c
@@ -1000,7 +1000,7 @@ idpf_dp_splitq_recv_pkts_avx512(void *rx_queue, struct rte_mbuf **rx_pkts,
 }
 
 static __rte_always_inline void
-idpf_singleq_vtx1(volatile struct idpf_base_tx_desc *txdp,
+idpf_singleq_vtx1(volatile struct ci_tx_desc *txdp,
 	  struct rte_mbuf *pkt, uint64_t flags)
 {
 	uint64_t high_qw =
@@ -1016,7 +1016,7 @@ idpf_singleq_vtx1(volatile struct idpf_base_tx_desc *txdp,
 #define IDPF_TX_LEN_MASK 0xAA
 #define IDPF_TX_OFF_MASK 0x55
 static __rte_always_inline void
-idpf_singleq_vtx(volatile struct idpf_base_tx_desc *txdp,
+idpf_singleq_vtx(volatile struct ci_tx_desc *txdp,
 	 struct rte_mbuf **pkt, uint16_t nb_pkts,  uint64_t flags)
 {
 	const uint64_t hi_qw_tmpl = (IDPF_TX_DESC_DTYPE_DATA  |
@@ -1072,7 +1072,7 @@ idpf_singleq_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pk
 					 uint16_t nb_pkts)
 {
 	struct ci_tx_queue *txq = tx_queue;
-	volatile struct idpf_base_tx_desc *txdp;
+	volatile struct ci_tx_desc *txdp;
 	struct ci_tx_entry_vec *txep;
 	uint16_t n, nb_commit, tx_id;
 	uint64_t flags = IDPF_TX_DESC_CMD_EOP;
@@ -1123,7 +1123,7 @@ idpf_singleq_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pk
 
 	tx_id = (uint16_t)(tx_id + nb_commit);
 	if (tx_id > txq->tx_next_rs) {
-		txq->idpf_tx_ring[txq->tx_next_rs].qw1 |=
+		txq->idpf_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
 			rte_cpu_to_le_64(((uint64_t)IDPF_TX_DESC_CMD_RS) <<
 					 IDPF_TXD_QW1_CMD_S);
 		txq->tx_next_rs =
diff --git a/drivers/net/intel/idpf/idpf_rxtx.c b/drivers/net/intel/idpf/idpf_rxtx.c
index cee454244f..8aa44585fe 100644
--- a/drivers/net/intel/idpf/idpf_rxtx.c
+++ b/drivers/net/intel/idpf/idpf_rxtx.c
@@ -72,7 +72,7 @@ idpf_dma_zone_reserve(struct rte_eth_dev *dev, uint16_t queue_idx,
 			ring_size = RTE_ALIGN(len * sizeof(struct idpf_flex_tx_sched_desc),
 					      IDPF_DMA_MEM_ALIGN);
 		else
-			ring_size = RTE_ALIGN(len * sizeof(struct idpf_base_tx_desc),
+			ring_size = RTE_ALIGN(len * sizeof(struct ci_tx_desc),
 					      IDPF_DMA_MEM_ALIGN);
 		rte_memcpy(ring_name, "idpf Tx ring", sizeof("idpf Tx ring"));
 		break;
diff --git a/drivers/net/intel/idpf/idpf_rxtx_vec_common.h b/drivers/net/intel/idpf/idpf_rxtx_vec_common.h
index 425f0792a1..4702061484 100644
--- a/drivers/net/intel/idpf/idpf_rxtx_vec_common.h
+++ b/drivers/net/intel/idpf/idpf_rxtx_vec_common.h
@@ -31,7 +31,7 @@ idpf_tx_desc_done(struct ci_tx_queue *txq, uint16_t idx)
 	if (txq->complq != NULL)
 		return 1;
 
-	return (txq->idpf_tx_ring[idx].qw1 &
+	return (txq->idpf_tx_ring[idx].cmd_type_offset_bsz &
 			rte_cpu_to_le_64(IDPF_TXD_QW1_DTYPE_M)) ==
 				rte_cpu_to_le_64(IDPF_TX_DESC_DTYPE_DESC_DONE);
 }
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [PATCH v4 02/35] net/intel: use common Tx ring structure
  2026-02-09 16:44 ` [PATCH v4 00/35] combine multiple Intel scalar Tx paths Bruce Richardson
  2026-02-09 16:44   ` [PATCH v4 01/35] net/intel: create common Tx descriptor structure Bruce Richardson
@ 2026-02-09 16:45   ` Bruce Richardson
  2026-02-09 16:45   ` [PATCH v4 03/35] net/intel: create common post-Tx cleanup function Bruce Richardson
                     ` (32 subsequent siblings)
  34 siblings, 0 replies; 274+ messages in thread
From: Bruce Richardson @ 2026-02-09 16:45 UTC (permalink / raw)
  To: dev
  Cc: Bruce Richardson, Ciara Loftus, Praveen Shetty,
	Vladimir Medvedkin, Anatoly Burakov, Jingjing Wu

Now that we have a common descriptor type, the separate per-driver ring
pointers in the union are no longer needed, so merge all but the ixgbe
pointer into a single field.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Ciara Loftus <ciara.loftus@intel.com>
---
 drivers/net/intel/common/tx.h                 |  5 +--
 drivers/net/intel/cpfl/cpfl_rxtx.c            |  2 +-
 drivers/net/intel/i40e/i40e_fdir.c            |  6 ++--
 drivers/net/intel/i40e/i40e_rxtx.c            | 22 ++++++------
 .../net/intel/i40e/i40e_rxtx_vec_altivec.c    |  6 ++--
 drivers/net/intel/i40e/i40e_rxtx_vec_avx2.c   |  6 ++--
 drivers/net/intel/i40e/i40e_rxtx_vec_avx512.c |  6 ++--
 drivers/net/intel/i40e/i40e_rxtx_vec_common.h |  2 +-
 drivers/net/intel/i40e/i40e_rxtx_vec_neon.c   |  6 ++--
 drivers/net/intel/iavf/iavf_rxtx.c            | 14 ++++----
 drivers/net/intel/iavf/iavf_rxtx_vec_avx2.c   |  6 ++--
 drivers/net/intel/iavf/iavf_rxtx_vec_avx512.c | 12 +++----
 drivers/net/intel/iavf/iavf_rxtx_vec_common.h |  2 +-
 drivers/net/intel/ice/ice_dcf_ethdev.c        |  4 +--
 drivers/net/intel/ice/ice_rxtx.c              | 34 +++++++++----------
 drivers/net/intel/ice/ice_rxtx_vec_avx2.c     |  6 ++--
 drivers/net/intel/ice/ice_rxtx_vec_avx512.c   |  6 ++--
 drivers/net/intel/ice/ice_rxtx_vec_common.h   |  2 +-
 drivers/net/intel/idpf/idpf_common_rxtx.c     |  8 ++---
 .../net/intel/idpf/idpf_common_rxtx_avx2.c    |  6 ++--
 .../net/intel/idpf/idpf_common_rxtx_avx512.c  |  6 ++--
 drivers/net/intel/idpf/idpf_rxtx.c            |  2 +-
 drivers/net/intel/idpf/idpf_rxtx_vec_common.h |  2 +-
 23 files changed, 84 insertions(+), 87 deletions(-)

diff --git a/drivers/net/intel/common/tx.h b/drivers/net/intel/common/tx.h
index d7561a2bbb..8cf63e59ab 100644
--- a/drivers/net/intel/common/tx.h
+++ b/drivers/net/intel/common/tx.h
@@ -41,10 +41,7 @@ typedef void (*ice_tx_release_mbufs_t)(struct ci_tx_queue *txq);
 
 struct ci_tx_queue {
 	union { /* TX ring virtual address */
-		volatile struct ci_tx_desc *i40e_tx_ring;
-		volatile struct ci_tx_desc *iavf_tx_ring;
-		volatile struct ci_tx_desc *ice_tx_ring;
-		volatile struct ci_tx_desc *idpf_tx_ring;
+		volatile struct ci_tx_desc *ci_tx_ring;
 		volatile union ixgbe_adv_tx_desc *ixgbe_tx_ring;
 	};
 	volatile uint8_t *qtx_tail;               /* register address of tail */
diff --git a/drivers/net/intel/cpfl/cpfl_rxtx.c b/drivers/net/intel/cpfl/cpfl_rxtx.c
index 78bc3e9b49..bc5bec65f0 100644
--- a/drivers/net/intel/cpfl/cpfl_rxtx.c
+++ b/drivers/net/intel/cpfl/cpfl_rxtx.c
@@ -606,7 +606,7 @@ cpfl_tx_queue_setup(struct rte_eth_dev *dev, uint16_t queue_idx,
 	}
 
 	if (!is_splitq) {
-		txq->idpf_tx_ring = mz->addr;
+		txq->ci_tx_ring = mz->addr;
 		idpf_qc_single_tx_queue_reset(txq);
 	} else {
 		txq->desc_ring = mz->addr;
diff --git a/drivers/net/intel/i40e/i40e_fdir.c b/drivers/net/intel/i40e/i40e_fdir.c
index 605df73c9e..8a01aec0e2 100644
--- a/drivers/net/intel/i40e/i40e_fdir.c
+++ b/drivers/net/intel/i40e/i40e_fdir.c
@@ -1380,7 +1380,7 @@ i40e_find_available_buffer(struct rte_eth_dev *dev)
 		volatile struct ci_tx_desc *tmp_txdp;
 
 		tmp_tail = txq->tx_tail;
-		tmp_txdp = &txq->i40e_tx_ring[tmp_tail + 1];
+		tmp_txdp = &txq->ci_tx_ring[tmp_tail + 1];
 
 		do {
 			if ((tmp_txdp->cmd_type_offset_bsz &
@@ -1637,7 +1637,7 @@ i40e_flow_fdir_filter_programming(struct i40e_pf *pf,
 
 	PMD_DRV_LOG(INFO, "filling filter programming descriptor.");
 	fdirdp = (volatile struct i40e_filter_program_desc *)
-				(&txq->i40e_tx_ring[txq->tx_tail]);
+				(&txq->ci_tx_ring[txq->tx_tail]);
 
 	fdirdp->qindex_flex_ptype_vsi =
 			rte_cpu_to_le_32((fdir_action->rx_queue <<
@@ -1707,7 +1707,7 @@ i40e_flow_fdir_filter_programming(struct i40e_pf *pf,
 	fdirdp->fd_id = rte_cpu_to_le_32(filter->soft_id);
 
 	PMD_DRV_LOG(INFO, "filling transmit descriptor.");
-	txdp = &txq->i40e_tx_ring[txq->tx_tail + 1];
+	txdp = &txq->ci_tx_ring[txq->tx_tail + 1];
 	txdp->buffer_addr = rte_cpu_to_le_64(pf->fdir.dma_addr[txq->tx_tail >> 1]);
 
 	td_cmd = I40E_TX_DESC_CMD_EOP |
diff --git a/drivers/net/intel/i40e/i40e_rxtx.c b/drivers/net/intel/i40e/i40e_rxtx.c
index 92d49ccb79..210fc0201e 100644
--- a/drivers/net/intel/i40e/i40e_rxtx.c
+++ b/drivers/net/intel/i40e/i40e_rxtx.c
@@ -388,7 +388,7 @@ static inline int
 i40e_xmit_cleanup(struct ci_tx_queue *txq)
 {
 	struct ci_tx_entry *sw_ring = txq->sw_ring;
-	volatile struct ci_tx_desc *txd = txq->i40e_tx_ring;
+	volatile struct ci_tx_desc *txd = txq->ci_tx_ring;
 	uint16_t last_desc_cleaned = txq->last_desc_cleaned;
 	uint16_t nb_tx_desc = txq->nb_tx_desc;
 	uint16_t desc_to_clean_to;
@@ -1112,7 +1112,7 @@ i40e_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 
 	txq = tx_queue;
 	sw_ring = txq->sw_ring;
-	txr = txq->i40e_tx_ring;
+	txr = txq->ci_tx_ring;
 	tx_id = txq->tx_tail;
 	txe = &sw_ring[tx_id];
 
@@ -1347,7 +1347,7 @@ i40e_tx_free_bufs(struct ci_tx_queue *txq)
 	const uint16_t k = RTE_ALIGN_FLOOR(tx_rs_thresh, I40E_TX_MAX_FREE_BUF_SZ);
 	const uint16_t m = tx_rs_thresh % I40E_TX_MAX_FREE_BUF_SZ;
 
-	if ((txq->i40e_tx_ring[txq->tx_next_dd].cmd_type_offset_bsz &
+	if ((txq->ci_tx_ring[txq->tx_next_dd].cmd_type_offset_bsz &
 			rte_cpu_to_le_64(I40E_TXD_QW1_DTYPE_MASK)) !=
 			rte_cpu_to_le_64(I40E_TX_DESC_DTYPE_DESC_DONE))
 		return 0;
@@ -1431,7 +1431,7 @@ i40e_tx_fill_hw_ring(struct ci_tx_queue *txq,
 		     struct rte_mbuf **pkts,
 		     uint16_t nb_pkts)
 {
-	volatile struct ci_tx_desc *txdp = &txq->i40e_tx_ring[txq->tx_tail];
+	volatile struct ci_tx_desc *txdp = &txq->ci_tx_ring[txq->tx_tail];
 	struct ci_tx_entry *txep = &txq->sw_ring[txq->tx_tail];
 	const int N_PER_LOOP = 4;
 	const int N_PER_LOOP_MASK = N_PER_LOOP - 1;
@@ -1459,7 +1459,7 @@ tx_xmit_pkts(struct ci_tx_queue *txq,
 	     struct rte_mbuf **tx_pkts,
 	     uint16_t nb_pkts)
 {
-	volatile struct ci_tx_desc *txr = txq->i40e_tx_ring;
+	volatile struct ci_tx_desc *txr = txq->ci_tx_ring;
 	uint16_t n = 0;
 
 	/**
@@ -2421,7 +2421,7 @@ i40e_dev_tx_descriptor_status(void *tx_queue, uint16_t offset)
 			desc -= txq->nb_tx_desc;
 	}
 
-	status = &txq->i40e_tx_ring[desc].cmd_type_offset_bsz;
+	status = &txq->ci_tx_ring[desc].cmd_type_offset_bsz;
 	mask = rte_le_to_cpu_64(I40E_TXD_QW1_DTYPE_MASK);
 	expect = rte_cpu_to_le_64(
 		I40E_TX_DESC_DTYPE_DESC_DONE << I40E_TXD_QW1_DTYPE_SHIFT);
@@ -2618,7 +2618,7 @@ i40e_dev_tx_queue_setup(struct rte_eth_dev *dev,
 	/* Allocate TX hardware ring descriptors. */
 	ring_size = sizeof(struct ci_tx_desc) * I40E_MAX_RING_DESC;
 	ring_size = RTE_ALIGN(ring_size, I40E_DMA_MEM_ALIGN);
-	tz = rte_eth_dma_zone_reserve(dev, "i40e_tx_ring", queue_idx,
+	tz = rte_eth_dma_zone_reserve(dev, "ci_tx_ring", queue_idx,
 			      ring_size, I40E_RING_BASE_ALIGN, socket_id);
 	if (!tz) {
 		i40e_tx_queue_release(txq);
@@ -2640,7 +2640,7 @@ i40e_dev_tx_queue_setup(struct rte_eth_dev *dev,
 	txq->tx_deferred_start = tx_conf->tx_deferred_start;
 
 	txq->tx_ring_dma = tz->iova;
-	txq->i40e_tx_ring = (struct ci_tx_desc *)tz->addr;
+	txq->ci_tx_ring = (struct ci_tx_desc *)tz->addr;
 
 	/* Allocate software ring */
 	txq->sw_ring =
@@ -2915,11 +2915,11 @@ i40e_reset_tx_queue(struct ci_tx_queue *txq)
 	txe = txq->sw_ring;
 	size = sizeof(struct ci_tx_desc) * txq->nb_tx_desc;
 	for (i = 0; i < size; i++)
-		((volatile char *)txq->i40e_tx_ring)[i] = 0;
+		((volatile char *)txq->ci_tx_ring)[i] = 0;
 
 	prev = (uint16_t)(txq->nb_tx_desc - 1);
 	for (i = 0; i < txq->nb_tx_desc; i++) {
-		volatile struct ci_tx_desc *txd = &txq->i40e_tx_ring[i];
+		volatile struct ci_tx_desc *txd = &txq->ci_tx_ring[i];
 
 		txd->cmd_type_offset_bsz =
 			rte_cpu_to_le_64(I40E_TX_DESC_DTYPE_DESC_DONE);
@@ -3240,7 +3240,7 @@ i40e_fdir_setup_tx_resources(struct i40e_pf *pf)
 	txq->i40e_vsi = pf->fdir.fdir_vsi;
 
 	txq->tx_ring_dma = tz->iova;
-	txq->i40e_tx_ring = (struct ci_tx_desc *)tz->addr;
+	txq->ci_tx_ring = (struct ci_tx_desc *)tz->addr;
 
 	/*
 	 * don't need to allocate software ring and reset for the fdir
diff --git a/drivers/net/intel/i40e/i40e_rxtx_vec_altivec.c b/drivers/net/intel/i40e/i40e_rxtx_vec_altivec.c
index ef5b252898..81e9e2bc0b 100644
--- a/drivers/net/intel/i40e/i40e_rxtx_vec_altivec.c
+++ b/drivers/net/intel/i40e/i40e_rxtx_vec_altivec.c
@@ -489,7 +489,7 @@ i40e_xmit_fixed_burst_vec(void *tx_queue, struct rte_mbuf **tx_pkts,
 		return 0;
 
 	tx_id = txq->tx_tail;
-	txdp = &txq->i40e_tx_ring[tx_id];
+	txdp = &txq->ci_tx_ring[tx_id];
 	txep = &txq->sw_ring_vec[tx_id];
 
 	txq->nb_tx_free = (uint16_t)(txq->nb_tx_free - nb_pkts);
@@ -509,7 +509,7 @@ i40e_xmit_fixed_burst_vec(void *tx_queue, struct rte_mbuf **tx_pkts,
 		txq->tx_next_rs = (uint16_t)(txq->tx_rs_thresh - 1);
 
 		/* avoid reach the end of ring */
-		txdp = &txq->i40e_tx_ring[tx_id];
+		txdp = &txq->ci_tx_ring[tx_id];
 		txep = &txq->sw_ring_vec[tx_id];
 	}
 
@@ -519,7 +519,7 @@ i40e_xmit_fixed_burst_vec(void *tx_queue, struct rte_mbuf **tx_pkts,
 
 	tx_id = (uint16_t)(tx_id + nb_commit);
 	if (tx_id > txq->tx_next_rs) {
-		txq->i40e_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
+		txq->ci_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
 			rte_cpu_to_le_64(((uint64_t)I40E_TX_DESC_CMD_RS) <<
 						I40E_TXD_QW1_CMD_SHIFT);
 		txq->tx_next_rs =
diff --git a/drivers/net/intel/i40e/i40e_rxtx_vec_avx2.c b/drivers/net/intel/i40e/i40e_rxtx_vec_avx2.c
index 137c1f9765..f054bd41bf 100644
--- a/drivers/net/intel/i40e/i40e_rxtx_vec_avx2.c
+++ b/drivers/net/intel/i40e/i40e_rxtx_vec_avx2.c
@@ -753,7 +753,7 @@ i40e_xmit_fixed_burst_vec_avx2(void *tx_queue, struct rte_mbuf **tx_pkts,
 		return 0;
 
 	tx_id = txq->tx_tail;
-	txdp = &txq->i40e_tx_ring[tx_id];
+	txdp = &txq->ci_tx_ring[tx_id];
 	txep = &txq->sw_ring_vec[tx_id];
 
 	txq->nb_tx_free = (uint16_t)(txq->nb_tx_free - nb_pkts);
@@ -774,7 +774,7 @@ i40e_xmit_fixed_burst_vec_avx2(void *tx_queue, struct rte_mbuf **tx_pkts,
 		txq->tx_next_rs = (uint16_t)(txq->tx_rs_thresh - 1);
 
 		/* avoid reach the end of ring */
-		txdp = &txq->i40e_tx_ring[tx_id];
+		txdp = &txq->ci_tx_ring[tx_id];
 		txep = &txq->sw_ring_vec[tx_id];
 	}
 
@@ -784,7 +784,7 @@ i40e_xmit_fixed_burst_vec_avx2(void *tx_queue, struct rte_mbuf **tx_pkts,
 
 	tx_id = (uint16_t)(tx_id + nb_commit);
 	if (tx_id > txq->tx_next_rs) {
-		txq->i40e_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
+		txq->ci_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
 			rte_cpu_to_le_64(((uint64_t)I40E_TX_DESC_CMD_RS) <<
 						I40E_TXD_QW1_CMD_SHIFT);
 		txq->tx_next_rs =
diff --git a/drivers/net/intel/i40e/i40e_rxtx_vec_avx512.c b/drivers/net/intel/i40e/i40e_rxtx_vec_avx512.c
index 6971488750..9a967faeee 100644
--- a/drivers/net/intel/i40e/i40e_rxtx_vec_avx512.c
+++ b/drivers/net/intel/i40e/i40e_rxtx_vec_avx512.c
@@ -821,7 +821,7 @@ i40e_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pkts,
 		return 0;
 
 	tx_id = txq->tx_tail;
-	txdp = &txq->i40e_tx_ring[tx_id];
+	txdp = &txq->ci_tx_ring[tx_id];
 	txep = (void *)txq->sw_ring;
 	txep += tx_id;
 
@@ -843,7 +843,7 @@ i40e_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pkts,
 		txq->tx_next_rs = (uint16_t)(txq->tx_rs_thresh - 1);
 
 		/* avoid reach the end of ring */
-		txdp = txq->i40e_tx_ring;
+		txdp = txq->ci_tx_ring;
 		txep = (void *)txq->sw_ring;
 	}
 
@@ -853,7 +853,7 @@ i40e_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pkts,
 
 	tx_id = (uint16_t)(tx_id + nb_commit);
 	if (tx_id > txq->tx_next_rs) {
-		txq->i40e_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
+		txq->ci_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
 			rte_cpu_to_le_64(((uint64_t)I40E_TX_DESC_CMD_RS) <<
 						I40E_TXD_QW1_CMD_SHIFT);
 		txq->tx_next_rs =
diff --git a/drivers/net/intel/i40e/i40e_rxtx_vec_common.h b/drivers/net/intel/i40e/i40e_rxtx_vec_common.h
index 14651f2f06..1fd7fc75bf 100644
--- a/drivers/net/intel/i40e/i40e_rxtx_vec_common.h
+++ b/drivers/net/intel/i40e/i40e_rxtx_vec_common.h
@@ -15,7 +15,7 @@
 static inline int
 i40e_tx_desc_done(struct ci_tx_queue *txq, uint16_t idx)
 {
-	return (txq->i40e_tx_ring[idx].cmd_type_offset_bsz &
+	return (txq->ci_tx_ring[idx].cmd_type_offset_bsz &
 			rte_cpu_to_le_64(I40E_TXD_QW1_DTYPE_MASK)) ==
 				rte_cpu_to_le_64(I40E_TX_DESC_DTYPE_DESC_DONE);
 }
diff --git a/drivers/net/intel/i40e/i40e_rxtx_vec_neon.c b/drivers/net/intel/i40e/i40e_rxtx_vec_neon.c
index 6404b70c56..0b95152232 100644
--- a/drivers/net/intel/i40e/i40e_rxtx_vec_neon.c
+++ b/drivers/net/intel/i40e/i40e_rxtx_vec_neon.c
@@ -638,7 +638,7 @@ i40e_xmit_fixed_burst_vec(void *__rte_restrict tx_queue,
 		return 0;
 
 	tx_id = txq->tx_tail;
-	txdp = &txq->i40e_tx_ring[tx_id];
+	txdp = &txq->ci_tx_ring[tx_id];
 	txep = &txq->sw_ring_vec[tx_id];
 
 	txq->nb_tx_free = (uint16_t)(txq->nb_tx_free - nb_pkts);
@@ -658,7 +658,7 @@ i40e_xmit_fixed_burst_vec(void *__rte_restrict tx_queue,
 		txq->tx_next_rs = (uint16_t)(txq->tx_rs_thresh - 1);
 
 		/* avoid reach the end of ring */
-		txdp = &txq->i40e_tx_ring[tx_id];
+		txdp = &txq->ci_tx_ring[tx_id];
 		txep = &txq->sw_ring_vec[tx_id];
 	}
 
@@ -668,7 +668,7 @@ i40e_xmit_fixed_burst_vec(void *__rte_restrict tx_queue,
 
 	tx_id = (uint16_t)(tx_id + nb_commit);
 	if (tx_id > txq->tx_next_rs) {
-		txq->i40e_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
+		txq->ci_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
 			rte_cpu_to_le_64(((uint64_t)I40E_TX_DESC_CMD_RS) <<
 						I40E_TXD_QW1_CMD_SHIFT);
 		txq->tx_next_rs =
diff --git a/drivers/net/intel/iavf/iavf_rxtx.c b/drivers/net/intel/iavf/iavf_rxtx.c
index e4421a9932..807bc92a45 100644
--- a/drivers/net/intel/iavf/iavf_rxtx.c
+++ b/drivers/net/intel/iavf/iavf_rxtx.c
@@ -269,11 +269,11 @@ reset_tx_queue(struct ci_tx_queue *txq)
 	txe = txq->sw_ring;
 	size = sizeof(struct ci_tx_desc) * txq->nb_tx_desc;
 	for (i = 0; i < size; i++)
-		((volatile char *)txq->iavf_tx_ring)[i] = 0;
+		((volatile char *)txq->ci_tx_ring)[i] = 0;
 
 	prev = (uint16_t)(txq->nb_tx_desc - 1);
 	for (i = 0; i < txq->nb_tx_desc; i++) {
-		txq->iavf_tx_ring[i].cmd_type_offset_bsz =
+		txq->ci_tx_ring[i].cmd_type_offset_bsz =
 			rte_cpu_to_le_64(IAVF_TX_DESC_DTYPE_DESC_DONE);
 		txe[i].mbuf =  NULL;
 		txe[i].last_id = i;
@@ -829,7 +829,7 @@ iavf_dev_tx_queue_setup(struct rte_eth_dev *dev,
 	/* Allocate TX hardware ring descriptors. */
 	ring_size = sizeof(struct ci_tx_desc) * IAVF_MAX_RING_DESC;
 	ring_size = RTE_ALIGN(ring_size, IAVF_DMA_MEM_ALIGN);
-	mz = rte_eth_dma_zone_reserve(dev, "iavf_tx_ring", queue_idx,
+	mz = rte_eth_dma_zone_reserve(dev, "ci_tx_ring", queue_idx,
 				      ring_size, IAVF_RING_BASE_ALIGN,
 				      socket_id);
 	if (!mz) {
@@ -839,7 +839,7 @@ iavf_dev_tx_queue_setup(struct rte_eth_dev *dev,
 		return -ENOMEM;
 	}
 	txq->tx_ring_dma = mz->iova;
-	txq->iavf_tx_ring = (struct ci_tx_desc *)mz->addr;
+	txq->ci_tx_ring = (struct ci_tx_desc *)mz->addr;
 
 	txq->mz = mz;
 	reset_tx_queue(txq);
@@ -2333,7 +2333,7 @@ iavf_xmit_cleanup(struct ci_tx_queue *txq)
 	uint16_t desc_to_clean_to;
 	uint16_t nb_tx_to_clean;
 
-	volatile struct ci_tx_desc *txd = txq->iavf_tx_ring;
+	volatile struct ci_tx_desc *txd = txq->ci_tx_ring;
 
 	desc_to_clean_to = (uint16_t)(last_desc_cleaned + txq->tx_rs_thresh);
 	if (desc_to_clean_to >= nb_tx_desc)
@@ -2756,7 +2756,7 @@ uint16_t
 iavf_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 {
 	struct ci_tx_queue *txq = tx_queue;
-	volatile struct ci_tx_desc *txr = txq->iavf_tx_ring;
+	volatile struct ci_tx_desc *txr = txq->ci_tx_ring;
 	struct ci_tx_entry *txe_ring = txq->sw_ring;
 	struct ci_tx_entry *txe, *txn;
 	struct rte_mbuf *mb, *mb_seg;
@@ -4462,7 +4462,7 @@ iavf_dev_tx_desc_status(void *tx_queue, uint16_t offset)
 			desc -= txq->nb_tx_desc;
 	}
 
-	status = &txq->iavf_tx_ring[desc].cmd_type_offset_bsz;
+	status = &txq->ci_tx_ring[desc].cmd_type_offset_bsz;
 	mask = rte_le_to_cpu_64(IAVF_TXD_QW1_DTYPE_MASK);
 	expect = rte_cpu_to_le_64(
 		 IAVF_TX_DESC_DTYPE_DESC_DONE << IAVF_TXD_QW1_DTYPE_SHIFT);
diff --git a/drivers/net/intel/iavf/iavf_rxtx_vec_avx2.c b/drivers/net/intel/iavf/iavf_rxtx_vec_avx2.c
index 5b62d51cf7..89ce841b9e 100644
--- a/drivers/net/intel/iavf/iavf_rxtx_vec_avx2.c
+++ b/drivers/net/intel/iavf/iavf_rxtx_vec_avx2.c
@@ -1729,7 +1729,7 @@ iavf_xmit_fixed_burst_vec_avx2(void *tx_queue, struct rte_mbuf **tx_pkts,
 	nb_commit = nb_pkts;
 
 	tx_id = txq->tx_tail;
-	txdp = &txq->iavf_tx_ring[tx_id];
+	txdp = &txq->ci_tx_ring[tx_id];
 	txep = &txq->sw_ring_vec[tx_id];
 
 	txq->nb_tx_free = (uint16_t)(txq->nb_tx_free - nb_pkts);
@@ -1750,7 +1750,7 @@ iavf_xmit_fixed_burst_vec_avx2(void *tx_queue, struct rte_mbuf **tx_pkts,
 		txq->tx_next_rs = (uint16_t)(txq->tx_rs_thresh - 1);
 
 		/* avoid reach the end of ring */
-		txdp = &txq->iavf_tx_ring[tx_id];
+		txdp = &txq->ci_tx_ring[tx_id];
 		txep = &txq->sw_ring_vec[tx_id];
 	}
 
@@ -1760,7 +1760,7 @@ iavf_xmit_fixed_burst_vec_avx2(void *tx_queue, struct rte_mbuf **tx_pkts,
 
 	tx_id = (uint16_t)(tx_id + nb_commit);
 	if (tx_id > txq->tx_next_rs) {
-		txq->iavf_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
+		txq->ci_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
 			rte_cpu_to_le_64(((uint64_t)IAVF_TX_DESC_CMD_RS) <<
 					 IAVF_TXD_QW1_CMD_SHIFT);
 		txq->tx_next_rs =
diff --git a/drivers/net/intel/iavf/iavf_rxtx_vec_avx512.c b/drivers/net/intel/iavf/iavf_rxtx_vec_avx512.c
index d79d96c7b7..ad1b0b90cd 100644
--- a/drivers/net/intel/iavf/iavf_rxtx_vec_avx512.c
+++ b/drivers/net/intel/iavf/iavf_rxtx_vec_avx512.c
@@ -2219,7 +2219,7 @@ iavf_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pkts,
 	nb_commit = nb_pkts;
 
 	tx_id = txq->tx_tail;
-	txdp = &txq->iavf_tx_ring[tx_id];
+	txdp = &txq->ci_tx_ring[tx_id];
 	txep = (void *)txq->sw_ring;
 	txep += tx_id;
 
@@ -2241,7 +2241,7 @@ iavf_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pkts,
 		txq->tx_next_rs = (uint16_t)(txq->tx_rs_thresh - 1);
 
 		/* avoid reach the end of ring */
-		txdp = &txq->iavf_tx_ring[tx_id];
+		txdp = &txq->ci_tx_ring[tx_id];
 		txep = (void *)txq->sw_ring;
 		txep += tx_id;
 	}
@@ -2252,7 +2252,7 @@ iavf_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pkts,
 
 	tx_id = (uint16_t)(tx_id + nb_commit);
 	if (tx_id > txq->tx_next_rs) {
-		txq->iavf_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
+		txq->ci_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
 			rte_cpu_to_le_64(((uint64_t)IAVF_TX_DESC_CMD_RS) <<
 					 IAVF_TXD_QW1_CMD_SHIFT);
 		txq->tx_next_rs =
@@ -2288,7 +2288,7 @@ iavf_xmit_fixed_burst_vec_avx512_ctx(void *tx_queue, struct rte_mbuf **tx_pkts,
 
 	nb_pkts = nb_commit >> 1;
 	tx_id = txq->tx_tail;
-	txdp = &txq->iavf_tx_ring[tx_id];
+	txdp = &txq->ci_tx_ring[tx_id];
 	txep = (void *)txq->sw_ring;
 	txep += (tx_id >> 1);
 
@@ -2309,7 +2309,7 @@ iavf_xmit_fixed_burst_vec_avx512_ctx(void *tx_queue, struct rte_mbuf **tx_pkts,
 		txq->tx_next_rs = (uint16_t)(txq->tx_rs_thresh - 1);
 		tx_id = 0;
 		/* avoid reach the end of ring */
-		txdp = txq->iavf_tx_ring;
+		txdp = txq->ci_tx_ring;
 		txep = (void *)txq->sw_ring;
 	}
 
@@ -2320,7 +2320,7 @@ iavf_xmit_fixed_burst_vec_avx512_ctx(void *tx_queue, struct rte_mbuf **tx_pkts,
 	tx_id = (uint16_t)(tx_id + nb_commit);
 
 	if (tx_id > txq->tx_next_rs) {
-		txq->iavf_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
+		txq->ci_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
 			rte_cpu_to_le_64(((uint64_t)IAVF_TX_DESC_CMD_RS) <<
 					 IAVF_TXD_QW1_CMD_SHIFT);
 		txq->tx_next_rs =
diff --git a/drivers/net/intel/iavf/iavf_rxtx_vec_common.h b/drivers/net/intel/iavf/iavf_rxtx_vec_common.h
index f1ea57034f..1832b76f89 100644
--- a/drivers/net/intel/iavf/iavf_rxtx_vec_common.h
+++ b/drivers/net/intel/iavf/iavf_rxtx_vec_common.h
@@ -14,7 +14,7 @@
 static inline int
 iavf_tx_desc_done(struct ci_tx_queue *txq, uint16_t idx)
 {
-	return (txq->iavf_tx_ring[idx].cmd_type_offset_bsz &
+	return (txq->ci_tx_ring[idx].cmd_type_offset_bsz &
 			rte_cpu_to_le_64(IAVF_TXD_QW1_DTYPE_MASK)) ==
 				rte_cpu_to_le_64(IAVF_TX_DESC_DTYPE_DESC_DONE);
 }
diff --git a/drivers/net/intel/ice/ice_dcf_ethdev.c b/drivers/net/intel/ice/ice_dcf_ethdev.c
index ab1d499cef..5f537b4c12 100644
--- a/drivers/net/intel/ice/ice_dcf_ethdev.c
+++ b/drivers/net/intel/ice/ice_dcf_ethdev.c
@@ -401,11 +401,11 @@ reset_tx_queue(struct ci_tx_queue *txq)
 	txe = txq->sw_ring;
 	size = sizeof(struct ci_tx_desc) * txq->nb_tx_desc;
 	for (i = 0; i < size; i++)
-		((volatile char *)txq->ice_tx_ring)[i] = 0;
+		((volatile char *)txq->ci_tx_ring)[i] = 0;
 
 	prev = (uint16_t)(txq->nb_tx_desc - 1);
 	for (i = 0; i < txq->nb_tx_desc; i++) {
-		txq->ice_tx_ring[i].cmd_type_offset_bsz =
+		txq->ci_tx_ring[i].cmd_type_offset_bsz =
 			rte_cpu_to_le_64(IAVF_TX_DESC_DTYPE_DESC_DONE);
 		txe[i].mbuf =  NULL;
 		txe[i].last_id = i;
diff --git a/drivers/net/intel/ice/ice_rxtx.c b/drivers/net/intel/ice/ice_rxtx.c
index 74b80e7df3..e3ffbdb587 100644
--- a/drivers/net/intel/ice/ice_rxtx.c
+++ b/drivers/net/intel/ice/ice_rxtx.c
@@ -1117,11 +1117,11 @@ ice_reset_tx_queue(struct ci_tx_queue *txq)
 	txe = txq->sw_ring;
 	size = sizeof(struct ci_tx_desc) * txq->nb_tx_desc;
 	for (i = 0; i < size; i++)
-		((volatile char *)txq->ice_tx_ring)[i] = 0;
+		((volatile char *)txq->ci_tx_ring)[i] = 0;
 
 	prev = (uint16_t)(txq->nb_tx_desc - 1);
 	for (i = 0; i < txq->nb_tx_desc; i++) {
-		volatile struct ci_tx_desc *txd = &txq->ice_tx_ring[i];
+		volatile struct ci_tx_desc *txd = &txq->ci_tx_ring[i];
 
 		txd->cmd_type_offset_bsz =
 			rte_cpu_to_le_64(ICE_TX_DESC_DTYPE_DESC_DONE);
@@ -1625,7 +1625,7 @@ ice_tx_queue_setup(struct rte_eth_dev *dev,
 	/* Allocate TX hardware ring descriptors. */
 	ring_size = sizeof(struct ci_tx_desc) * ICE_MAX_NUM_DESC_BY_MAC(hw);
 	ring_size = RTE_ALIGN(ring_size, ICE_DMA_MEM_ALIGN);
-	tz = rte_eth_dma_zone_reserve(dev, "ice_tx_ring", queue_idx,
+	tz = rte_eth_dma_zone_reserve(dev, "ci_tx_ring", queue_idx,
 				      ring_size, ICE_RING_BASE_ALIGN,
 				      socket_id);
 	if (!tz) {
@@ -1649,7 +1649,7 @@ ice_tx_queue_setup(struct rte_eth_dev *dev,
 	txq->tx_deferred_start = tx_conf->tx_deferred_start;
 
 	txq->tx_ring_dma = tz->iova;
-	txq->ice_tx_ring = tz->addr;
+	txq->ci_tx_ring = tz->addr;
 
 	/* Allocate software ring */
 	txq->sw_ring =
@@ -2555,7 +2555,7 @@ ice_tx_descriptor_status(void *tx_queue, uint16_t offset)
 			desc -= txq->nb_tx_desc;
 	}
 
-	status = &txq->ice_tx_ring[desc].cmd_type_offset_bsz;
+	status = &txq->ci_tx_ring[desc].cmd_type_offset_bsz;
 	mask = rte_cpu_to_le_64(ICE_TXD_QW1_DTYPE_M);
 	expect = rte_cpu_to_le_64(ICE_TX_DESC_DTYPE_DESC_DONE <<
 				  ICE_TXD_QW1_DTYPE_S);
@@ -2638,7 +2638,7 @@ ice_fdir_setup_tx_resources(struct ice_pf *pf)
 	txq->ice_vsi = pf->fdir.fdir_vsi;
 
 	txq->tx_ring_dma = tz->iova;
-	txq->ice_tx_ring = (struct ci_tx_desc *)tz->addr;
+	txq->ci_tx_ring = (struct ci_tx_desc *)tz->addr;
 	/*
 	 * don't need to allocate software ring and reset for the fdir
 	 * program queue just set the queue has been configured.
@@ -3027,7 +3027,7 @@ static inline int
 ice_xmit_cleanup(struct ci_tx_queue *txq)
 {
 	struct ci_tx_entry *sw_ring = txq->sw_ring;
-	volatile struct ci_tx_desc *txd = txq->ice_tx_ring;
+	volatile struct ci_tx_desc *txd = txq->ci_tx_ring;
 	uint16_t last_desc_cleaned = txq->last_desc_cleaned;
 	uint16_t nb_tx_desc = txq->nb_tx_desc;
 	uint16_t desc_to_clean_to;
@@ -3148,7 +3148,7 @@ uint16_t
 ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 {
 	struct ci_tx_queue *txq;
-	volatile struct ci_tx_desc *ice_tx_ring;
+	volatile struct ci_tx_desc *ci_tx_ring;
 	volatile struct ci_tx_desc *txd;
 	struct ci_tx_entry *sw_ring;
 	struct ci_tx_entry *txe, *txn;
@@ -3171,7 +3171,7 @@ ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 
 	txq = tx_queue;
 	sw_ring = txq->sw_ring;
-	ice_tx_ring = txq->ice_tx_ring;
+	ci_tx_ring = txq->ci_tx_ring;
 	tx_id = txq->tx_tail;
 	txe = &sw_ring[tx_id];
 
@@ -3257,7 +3257,7 @@ ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 			/* Setup TX context descriptor if required */
 			volatile struct ice_tx_ctx_desc *ctx_txd =
 				(volatile struct ice_tx_ctx_desc *)
-					&ice_tx_ring[tx_id];
+					&ci_tx_ring[tx_id];
 			uint16_t cd_l2tag2 = 0;
 			uint64_t cd_type_cmd_tso_mss = ICE_TX_DESC_DTYPE_CTX;
 
@@ -3299,7 +3299,7 @@ ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 		m_seg = tx_pkt;
 
 		do {
-			txd = &ice_tx_ring[tx_id];
+			txd = &ci_tx_ring[tx_id];
 			txn = &sw_ring[txe->next_id];
 
 			if (txe->mbuf)
@@ -3327,7 +3327,7 @@ ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 				txe->last_id = tx_last;
 				tx_id = txe->next_id;
 				txe = txn;
-				txd = &ice_tx_ring[tx_id];
+				txd = &ci_tx_ring[tx_id];
 				txn = &sw_ring[txe->next_id];
 			}
 
@@ -3410,7 +3410,7 @@ ice_tx_free_bufs(struct ci_tx_queue *txq)
 	struct ci_tx_entry *txep;
 	uint16_t i;
 
-	if ((txq->ice_tx_ring[txq->tx_next_dd].cmd_type_offset_bsz &
+	if ((txq->ci_tx_ring[txq->tx_next_dd].cmd_type_offset_bsz &
 	     rte_cpu_to_le_64(ICE_TXD_QW1_DTYPE_M)) !=
 	    rte_cpu_to_le_64(ICE_TX_DESC_DTYPE_DESC_DONE))
 		return 0;
@@ -3594,7 +3594,7 @@ static inline void
 ice_tx_fill_hw_ring(struct ci_tx_queue *txq, struct rte_mbuf **pkts,
 		    uint16_t nb_pkts)
 {
-	volatile struct ci_tx_desc *txdp = &txq->ice_tx_ring[txq->tx_tail];
+	volatile struct ci_tx_desc *txdp = &txq->ci_tx_ring[txq->tx_tail];
 	struct ci_tx_entry *txep = &txq->sw_ring[txq->tx_tail];
 	const int N_PER_LOOP = 4;
 	const int N_PER_LOOP_MASK = N_PER_LOOP - 1;
@@ -3627,7 +3627,7 @@ tx_xmit_pkts(struct ci_tx_queue *txq,
 	     struct rte_mbuf **tx_pkts,
 	     uint16_t nb_pkts)
 {
-	volatile struct ci_tx_desc *txr = txq->ice_tx_ring;
+	volatile struct ci_tx_desc *txr = txq->ci_tx_ring;
 	uint16_t n = 0;
 
 	/**
@@ -4887,11 +4887,11 @@ ice_fdir_programming(struct ice_pf *pf, struct ice_fltr_desc *fdir_desc)
 	uint16_t i;
 
 	fdirdp = (volatile struct ice_fltr_desc *)
-		(&txq->ice_tx_ring[txq->tx_tail]);
+		(&txq->ci_tx_ring[txq->tx_tail]);
 	fdirdp->qidx_compq_space_stat = fdir_desc->qidx_compq_space_stat;
 	fdirdp->dtype_cmd_vsi_fdid = fdir_desc->dtype_cmd_vsi_fdid;
 
-	txdp = &txq->ice_tx_ring[txq->tx_tail + 1];
+	txdp = &txq->ci_tx_ring[txq->tx_tail + 1];
 	txdp->buffer_addr = rte_cpu_to_le_64(pf->fdir.dma_addr);
 	td_cmd = ICE_TX_DESC_CMD_EOP |
 		ICE_TX_DESC_CMD_RS  |
diff --git a/drivers/net/intel/ice/ice_rxtx_vec_avx2.c b/drivers/net/intel/ice/ice_rxtx_vec_avx2.c
index bef7bb00ba..0a1df0b2f6 100644
--- a/drivers/net/intel/ice/ice_rxtx_vec_avx2.c
+++ b/drivers/net/intel/ice/ice_rxtx_vec_avx2.c
@@ -869,7 +869,7 @@ ice_xmit_fixed_burst_vec_avx2(void *tx_queue, struct rte_mbuf **tx_pkts,
 		return 0;
 
 	tx_id = txq->tx_tail;
-	txdp = &txq->ice_tx_ring[tx_id];
+	txdp = &txq->ci_tx_ring[tx_id];
 	txep = &txq->sw_ring_vec[tx_id];
 
 	txq->nb_tx_free = (uint16_t)(txq->nb_tx_free - nb_pkts);
@@ -890,7 +890,7 @@ ice_xmit_fixed_burst_vec_avx2(void *tx_queue, struct rte_mbuf **tx_pkts,
 		txq->tx_next_rs = (uint16_t)(txq->tx_rs_thresh - 1);
 
 		/* avoid reach the end of ring */
-		txdp = &txq->ice_tx_ring[tx_id];
+		txdp = &txq->ci_tx_ring[tx_id];
 		txep = &txq->sw_ring_vec[tx_id];
 	}
 
@@ -900,7 +900,7 @@ ice_xmit_fixed_burst_vec_avx2(void *tx_queue, struct rte_mbuf **tx_pkts,
 
 	tx_id = (uint16_t)(tx_id + nb_commit);
 	if (tx_id > txq->tx_next_rs) {
-		txq->ice_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
+		txq->ci_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
 			rte_cpu_to_le_64(((uint64_t)ICE_TX_DESC_CMD_RS) <<
 					 ICE_TXD_QW1_CMD_S);
 		txq->tx_next_rs =
diff --git a/drivers/net/intel/ice/ice_rxtx_vec_avx512.c b/drivers/net/intel/ice/ice_rxtx_vec_avx512.c
index 1f6bf5fc8e..d42f41461f 100644
--- a/drivers/net/intel/ice/ice_rxtx_vec_avx512.c
+++ b/drivers/net/intel/ice/ice_rxtx_vec_avx512.c
@@ -933,7 +933,7 @@ ice_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pkts,
 		return 0;
 
 	tx_id = txq->tx_tail;
-	txdp = &txq->ice_tx_ring[tx_id];
+	txdp = &txq->ci_tx_ring[tx_id];
 	txep = (void *)txq->sw_ring;
 	txep += tx_id;
 
@@ -955,7 +955,7 @@ ice_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pkts,
 		txq->tx_next_rs = (uint16_t)(txq->tx_rs_thresh - 1);
 
 		/* avoid reach the end of ring */
-		txdp = txq->ice_tx_ring;
+		txdp = txq->ci_tx_ring;
 		txep = (void *)txq->sw_ring;
 	}
 
@@ -965,7 +965,7 @@ ice_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pkts,
 
 	tx_id = (uint16_t)(tx_id + nb_commit);
 	if (tx_id > txq->tx_next_rs) {
-		txq->ice_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
+		txq->ci_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
 			rte_cpu_to_le_64(((uint64_t)ICE_TX_DESC_CMD_RS) <<
 					 ICE_TXD_QW1_CMD_S);
 		txq->tx_next_rs =
diff --git a/drivers/net/intel/ice/ice_rxtx_vec_common.h b/drivers/net/intel/ice/ice_rxtx_vec_common.h
index ff46a8fb49..8ba591e403 100644
--- a/drivers/net/intel/ice/ice_rxtx_vec_common.h
+++ b/drivers/net/intel/ice/ice_rxtx_vec_common.h
@@ -11,7 +11,7 @@
 static inline int
 ice_tx_desc_done(struct ci_tx_queue *txq, uint16_t idx)
 {
-	return (txq->ice_tx_ring[idx].cmd_type_offset_bsz &
+	return (txq->ci_tx_ring[idx].cmd_type_offset_bsz &
 			rte_cpu_to_le_64(ICE_TXD_QW1_DTYPE_M)) ==
 				rte_cpu_to_le_64(ICE_TX_DESC_DTYPE_DESC_DONE);
 }
diff --git a/drivers/net/intel/idpf/idpf_common_rxtx.c b/drivers/net/intel/idpf/idpf_common_rxtx.c
index be3c1ef216..51074bda3a 100644
--- a/drivers/net/intel/idpf/idpf_common_rxtx.c
+++ b/drivers/net/intel/idpf/idpf_common_rxtx.c
@@ -266,11 +266,11 @@ idpf_qc_single_tx_queue_reset(struct ci_tx_queue *txq)
 	txe = txq->sw_ring;
 	size = sizeof(struct ci_tx_desc) * txq->nb_tx_desc;
 	for (i = 0; i < size; i++)
-		((volatile char *)txq->idpf_tx_ring)[i] = 0;
+		((volatile char *)txq->ci_tx_ring)[i] = 0;
 
 	prev = (uint16_t)(txq->nb_tx_desc - 1);
 	for (i = 0; i < txq->nb_tx_desc; i++) {
-		txq->idpf_tx_ring[i].cmd_type_offset_bsz =
+		txq->ci_tx_ring[i].cmd_type_offset_bsz =
 			rte_cpu_to_le_64(IDPF_TX_DESC_DTYPE_DESC_DONE);
 		txe[i].mbuf =  NULL;
 		txe[i].last_id = i;
@@ -1335,7 +1335,7 @@ idpf_xmit_cleanup(struct ci_tx_queue *txq)
 	uint16_t desc_to_clean_to;
 	uint16_t nb_tx_to_clean;
 
-	volatile struct ci_tx_desc *txd = txq->idpf_tx_ring;
+	volatile struct ci_tx_desc *txd = txq->ci_tx_ring;
 
 	desc_to_clean_to = (uint16_t)(last_desc_cleaned + txq->tx_rs_thresh);
 	if (desc_to_clean_to >= nb_tx_desc)
@@ -1398,7 +1398,7 @@ idpf_dp_singleq_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 		return nb_tx;
 
 	sw_ring = txq->sw_ring;
-	txr = txq->idpf_tx_ring;
+	txr = txq->ci_tx_ring;
 	tx_id = txq->tx_tail;
 	txe = &sw_ring[tx_id];
 
diff --git a/drivers/net/intel/idpf/idpf_common_rxtx_avx2.c b/drivers/net/intel/idpf/idpf_common_rxtx_avx2.c
index 5f5d538dcb..04efee3722 100644
--- a/drivers/net/intel/idpf/idpf_common_rxtx_avx2.c
+++ b/drivers/net/intel/idpf/idpf_common_rxtx_avx2.c
@@ -573,7 +573,7 @@ idpf_singleq_xmit_fixed_burst_vec_avx2(void *tx_queue, struct rte_mbuf **tx_pkts
 		return 0;
 
 	tx_id = txq->tx_tail;
-	txdp = &txq->idpf_tx_ring[tx_id];
+	txdp = &txq->ci_tx_ring[tx_id];
 	txep = &txq->sw_ring_vec[tx_id];
 
 	txq->nb_tx_free = (uint16_t)(txq->nb_tx_free - nb_pkts);
@@ -594,7 +594,7 @@ idpf_singleq_xmit_fixed_burst_vec_avx2(void *tx_queue, struct rte_mbuf **tx_pkts
 		txq->tx_next_rs = (uint16_t)(txq->tx_rs_thresh - 1);
 
 		/* avoid reach the end of ring */
-		txdp = &txq->idpf_tx_ring[tx_id];
+		txdp = &txq->ci_tx_ring[tx_id];
 		txep = &txq->sw_ring_vec[tx_id];
 	}
 
@@ -604,7 +604,7 @@ idpf_singleq_xmit_fixed_burst_vec_avx2(void *tx_queue, struct rte_mbuf **tx_pkts
 
 	tx_id = (uint16_t)(tx_id + nb_commit);
 	if (tx_id > txq->tx_next_rs) {
-		txq->idpf_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
+		txq->ci_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
 			rte_cpu_to_le_64(((uint64_t)IDPF_TX_DESC_CMD_RS) <<
 					 IDPF_TXD_QW1_CMD_S);
 		txq->tx_next_rs =
diff --git a/drivers/net/intel/idpf/idpf_common_rxtx_avx512.c b/drivers/net/intel/idpf/idpf_common_rxtx_avx512.c
index c1ec3d1222..d5e5a2ca5f 100644
--- a/drivers/net/intel/idpf/idpf_common_rxtx_avx512.c
+++ b/drivers/net/intel/idpf/idpf_common_rxtx_avx512.c
@@ -1090,7 +1090,7 @@ idpf_singleq_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pk
 		return 0;
 
 	tx_id = txq->tx_tail;
-	txdp = &txq->idpf_tx_ring[tx_id];
+	txdp = &txq->ci_tx_ring[tx_id];
 	txep = (void *)txq->sw_ring;
 	txep += tx_id;
 
@@ -1112,7 +1112,7 @@ idpf_singleq_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pk
 		txq->tx_next_rs = (uint16_t)(txq->tx_rs_thresh - 1);
 
 		/* avoid reach the end of ring */
-		txdp = &txq->idpf_tx_ring[tx_id];
+		txdp = &txq->ci_tx_ring[tx_id];
 		txep = (void *)txq->sw_ring;
 		txep += tx_id;
 	}
@@ -1123,7 +1123,7 @@ idpf_singleq_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pk
 
 	tx_id = (uint16_t)(tx_id + nb_commit);
 	if (tx_id > txq->tx_next_rs) {
-		txq->idpf_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
+		txq->ci_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
 			rte_cpu_to_le_64(((uint64_t)IDPF_TX_DESC_CMD_RS) <<
 					 IDPF_TXD_QW1_CMD_S);
 		txq->tx_next_rs =
diff --git a/drivers/net/intel/idpf/idpf_rxtx.c b/drivers/net/intel/idpf/idpf_rxtx.c
index 8aa44585fe..0de54d9305 100644
--- a/drivers/net/intel/idpf/idpf_rxtx.c
+++ b/drivers/net/intel/idpf/idpf_rxtx.c
@@ -481,7 +481,7 @@ idpf_tx_queue_setup(struct rte_eth_dev *dev, uint16_t queue_idx,
 	}
 
 	if (!is_splitq) {
-		txq->idpf_tx_ring = mz->addr;
+		txq->ci_tx_ring = mz->addr;
 		idpf_qc_single_tx_queue_reset(txq);
 	} else {
 		txq->desc_ring = mz->addr;
diff --git a/drivers/net/intel/idpf/idpf_rxtx_vec_common.h b/drivers/net/intel/idpf/idpf_rxtx_vec_common.h
index 4702061484..b5e8574667 100644
--- a/drivers/net/intel/idpf/idpf_rxtx_vec_common.h
+++ b/drivers/net/intel/idpf/idpf_rxtx_vec_common.h
@@ -31,7 +31,7 @@ idpf_tx_desc_done(struct ci_tx_queue *txq, uint16_t idx)
 	if (txq->complq != NULL)
 		return 1;
 
-	return (txq->idpf_tx_ring[idx].cmd_type_offset_bsz &
+	return (txq->ci_tx_ring[idx].cmd_type_offset_bsz &
 			rte_cpu_to_le_64(IDPF_TXD_QW1_DTYPE_M)) ==
 				rte_cpu_to_le_64(IDPF_TX_DESC_DTYPE_DESC_DONE);
 }
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [PATCH v4 03/35] net/intel: create common post-Tx cleanup function
  2026-02-09 16:44 ` [PATCH v4 00/35] combine multiple Intel scalar Tx paths Bruce Richardson
  2026-02-09 16:44   ` [PATCH v4 01/35] net/intel: create common Tx descriptor structure Bruce Richardson
  2026-02-09 16:45   ` [PATCH v4 02/35] net/intel: use common Tx ring structure Bruce Richardson
@ 2026-02-09 16:45   ` Bruce Richardson
  2026-02-10 12:18     ` Burakov, Anatoly
  2026-02-09 16:45   ` [PATCH v4 04/35] net/intel: consolidate definitions for Tx desc fields Bruce Richardson
                     ` (31 subsequent siblings)
  34 siblings, 1 reply; 274+ messages in thread
From: Bruce Richardson @ 2026-02-09 16:45 UTC (permalink / raw)
  To: dev
  Cc: Bruce Richardson, Vladimir Medvedkin, Anatoly Burakov,
	Jingjing Wu, Praveen Shetty

The code used in ice, iavf, idpf and i40e to clean up mbufs after they
have been transmitted was identical. Therefore, deduplicate it by moving
it to the common directory and removing the driver-specific versions.

Rather than putting all Tx code in a single file, which could grow
rather long, create a new header file for the scalar datapath
functions.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/net/intel/common/tx.h             |  5 ++
 drivers/net/intel/common/tx_scalar.h      | 62 +++++++++++++++++++++++
 drivers/net/intel/i40e/i40e_rxtx.c        | 49 ++----------------
 drivers/net/intel/iavf/iavf_rxtx.c        | 50 ++----------------
 drivers/net/intel/ice/ice_rxtx.c          | 60 ++--------------------
 drivers/net/intel/idpf/idpf_common_rxtx.c | 46 ++---------------
 6 files changed, 85 insertions(+), 187 deletions(-)
 create mode 100644 drivers/net/intel/common/tx_scalar.h

diff --git a/drivers/net/intel/common/tx.h b/drivers/net/intel/common/tx.h
index 8cf63e59ab..558c861df0 100644
--- a/drivers/net/intel/common/tx.h
+++ b/drivers/net/intel/common/tx.h
@@ -377,4 +377,9 @@ ci_tx_path_select(const struct ci_tx_path_features *req_features,
 	return idx;
 }
 
+/* include the scalar functions at the end, so they can use the common definitions.
+ * This is done so drivers can use all functions just by including tx.h
+ */
+#include "tx_scalar.h"
+
 #endif /* _COMMON_INTEL_TX_H_ */
diff --git a/drivers/net/intel/common/tx_scalar.h b/drivers/net/intel/common/tx_scalar.h
new file mode 100644
index 0000000000..181629d856
--- /dev/null
+++ b/drivers/net/intel/common/tx_scalar.h
@@ -0,0 +1,62 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2025 Intel Corporation
+ */
+
+#ifndef _COMMON_INTEL_TX_SCALAR_H_
+#define _COMMON_INTEL_TX_SCALAR_H_
+
+#include <stdint.h>
+#include <rte_byteorder.h>
+
+/* depends on common Tx definitions. */
+#include "tx.h"
+
+/*
+ * Common transmit descriptor cleanup function for Intel drivers.
+ *
+ * Returns:
+ *   0 on success
+ *  -1 if cleanup cannot proceed (descriptors not yet processed by HW)
+ */
+static __rte_always_inline int
+ci_tx_xmit_cleanup(struct ci_tx_queue *txq)
+{
+	struct ci_tx_entry *sw_ring = txq->sw_ring;
+	volatile struct ci_tx_desc *txd = txq->ci_tx_ring;
+	uint16_t last_desc_cleaned = txq->last_desc_cleaned;
+	uint16_t nb_tx_desc = txq->nb_tx_desc;
+	uint16_t desc_to_clean_to;
+	uint16_t nb_tx_to_clean;
+
+	/* Determine the last descriptor needing to be cleaned */
+	desc_to_clean_to = (uint16_t)(last_desc_cleaned + txq->tx_rs_thresh);
+	if (desc_to_clean_to >= nb_tx_desc)
+		desc_to_clean_to = (uint16_t)(desc_to_clean_to - nb_tx_desc);
+
+	/* Check if descriptor is done - all drivers use 0xF as done value in bits 3:0 */
+	desc_to_clean_to = sw_ring[desc_to_clean_to].last_id;
+	if ((txd[desc_to_clean_to].cmd_type_offset_bsz & rte_cpu_to_le_64(0x0FUL)) !=
+			rte_cpu_to_le_64(0x0FUL))
+		return -1;
+
+	/* Figure out how many descriptors will be cleaned */
+	if (last_desc_cleaned > desc_to_clean_to)
+		nb_tx_to_clean = (uint16_t)((nb_tx_desc - last_desc_cleaned) + desc_to_clean_to);
+	else
+		nb_tx_to_clean = (uint16_t)(desc_to_clean_to - last_desc_cleaned);
+
+	/* The last descriptor to clean is done, so that means all the
+	 * descriptors from the last descriptor that was cleaned
+	 * up to the last descriptor with the RS bit set
+	 * are done. Only reset the threshold descriptor.
+	 */
+	txd[desc_to_clean_to].cmd_type_offset_bsz = 0;
+
+	/* Update the txq to reflect the last descriptor that was cleaned */
+	txq->last_desc_cleaned = desc_to_clean_to;
+	txq->nb_tx_free = (uint16_t)(txq->nb_tx_free + nb_tx_to_clean);
+
+	return 0;
+}
+
+#endif /* _COMMON_INTEL_TX_SCALAR_H_ */
diff --git a/drivers/net/intel/i40e/i40e_rxtx.c b/drivers/net/intel/i40e/i40e_rxtx.c
index 210fc0201e..2760e76e99 100644
--- a/drivers/net/intel/i40e/i40e_rxtx.c
+++ b/drivers/net/intel/i40e/i40e_rxtx.c
@@ -384,45 +384,6 @@ i40e_build_ctob(uint32_t td_cmd,
 			((uint64_t)td_tag  << I40E_TXD_QW1_L2TAG1_SHIFT));
 }
 
-static inline int
-i40e_xmit_cleanup(struct ci_tx_queue *txq)
-{
-	struct ci_tx_entry *sw_ring = txq->sw_ring;
-	volatile struct ci_tx_desc *txd = txq->ci_tx_ring;
-	uint16_t last_desc_cleaned = txq->last_desc_cleaned;
-	uint16_t nb_tx_desc = txq->nb_tx_desc;
-	uint16_t desc_to_clean_to;
-	uint16_t nb_tx_to_clean;
-
-	desc_to_clean_to = (uint16_t)(last_desc_cleaned + txq->tx_rs_thresh);
-	if (desc_to_clean_to >= nb_tx_desc)
-		desc_to_clean_to = (uint16_t)(desc_to_clean_to - nb_tx_desc);
-
-	desc_to_clean_to = sw_ring[desc_to_clean_to].last_id;
-	if ((txd[desc_to_clean_to].cmd_type_offset_bsz &
-			rte_cpu_to_le_64(I40E_TXD_QW1_DTYPE_MASK)) !=
-			rte_cpu_to_le_64(I40E_TX_DESC_DTYPE_DESC_DONE)) {
-		PMD_TX_LOG(DEBUG, "TX descriptor %4u is not done "
-			   "(port=%d queue=%d)", desc_to_clean_to,
-			   txq->port_id, txq->queue_id);
-		return -1;
-	}
-
-	if (last_desc_cleaned > desc_to_clean_to)
-		nb_tx_to_clean = (uint16_t)((nb_tx_desc - last_desc_cleaned) +
-							desc_to_clean_to);
-	else
-		nb_tx_to_clean = (uint16_t)(desc_to_clean_to -
-					last_desc_cleaned);
-
-	txd[desc_to_clean_to].cmd_type_offset_bsz = 0;
-
-	txq->last_desc_cleaned = desc_to_clean_to;
-	txq->nb_tx_free = (uint16_t)(txq->nb_tx_free + nb_tx_to_clean);
-
-	return 0;
-}
-
 static inline int
 #ifdef RTE_LIBRTE_I40E_RX_ALLOW_BULK_ALLOC
 check_rx_burst_bulk_alloc_preconditions(struct ci_rx_queue *rxq)
@@ -1118,7 +1079,7 @@ i40e_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 
 	/* Check if the descriptor ring needs to be cleaned. */
 	if (txq->nb_tx_free < txq->tx_free_thresh)
-		(void)i40e_xmit_cleanup(txq);
+		(void)ci_tx_xmit_cleanup(txq);
 
 	for (nb_tx = 0; nb_tx < nb_pkts; nb_tx++) {
 		td_cmd = 0;
@@ -1159,14 +1120,14 @@ i40e_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 			tx_last = (uint16_t)(tx_last - txq->nb_tx_desc);
 
 		if (nb_used > txq->nb_tx_free) {
-			if (i40e_xmit_cleanup(txq) != 0) {
+			if (ci_tx_xmit_cleanup(txq) != 0) {
 				if (nb_tx == 0)
 					return 0;
 				goto end_of_tx;
 			}
 			if (unlikely(nb_used > txq->tx_rs_thresh)) {
 				while (nb_used > txq->nb_tx_free) {
-					if (i40e_xmit_cleanup(txq) != 0) {
+					if (ci_tx_xmit_cleanup(txq) != 0) {
 						if (nb_tx == 0)
 							return 0;
 						goto end_of_tx;
@@ -2808,7 +2769,7 @@ i40e_tx_done_cleanup_full(struct ci_tx_queue *txq,
 	tx_last = txq->tx_tail;
 	tx_id  = swr_ring[tx_last].next_id;
 
-	if (txq->nb_tx_free == 0 && i40e_xmit_cleanup(txq))
+	if (txq->nb_tx_free == 0 && ci_tx_xmit_cleanup(txq))
 		return 0;
 
 	nb_tx_to_clean = txq->nb_tx_free;
@@ -2842,7 +2803,7 @@ i40e_tx_done_cleanup_full(struct ci_tx_queue *txq,
 			break;
 
 		if (pkt_cnt < free_cnt) {
-			if (i40e_xmit_cleanup(txq))
+			if (ci_tx_xmit_cleanup(txq))
 				break;
 
 			nb_tx_to_clean = txq->nb_tx_free - nb_tx_free_last;
diff --git a/drivers/net/intel/iavf/iavf_rxtx.c b/drivers/net/intel/iavf/iavf_rxtx.c
index 807bc92a45..560abfc1ef 100644
--- a/drivers/net/intel/iavf/iavf_rxtx.c
+++ b/drivers/net/intel/iavf/iavf_rxtx.c
@@ -2324,46 +2324,6 @@ iavf_recv_pkts_bulk_alloc(void *rx_queue,
 	return nb_rx;
 }
 
-static inline int
-iavf_xmit_cleanup(struct ci_tx_queue *txq)
-{
-	struct ci_tx_entry *sw_ring = txq->sw_ring;
-	uint16_t last_desc_cleaned = txq->last_desc_cleaned;
-	uint16_t nb_tx_desc = txq->nb_tx_desc;
-	uint16_t desc_to_clean_to;
-	uint16_t nb_tx_to_clean;
-
-	volatile struct ci_tx_desc *txd = txq->ci_tx_ring;
-
-	desc_to_clean_to = (uint16_t)(last_desc_cleaned + txq->tx_rs_thresh);
-	if (desc_to_clean_to >= nb_tx_desc)
-		desc_to_clean_to = (uint16_t)(desc_to_clean_to - nb_tx_desc);
-
-	desc_to_clean_to = sw_ring[desc_to_clean_to].last_id;
-	if ((txd[desc_to_clean_to].cmd_type_offset_bsz &
-			rte_cpu_to_le_64(IAVF_TXD_QW1_DTYPE_MASK)) !=
-			rte_cpu_to_le_64(IAVF_TX_DESC_DTYPE_DESC_DONE)) {
-		PMD_TX_LOG(DEBUG, "TX descriptor %4u is not done "
-			   "(port=%d queue=%d)", desc_to_clean_to,
-			   txq->port_id, txq->queue_id);
-		return -1;
-	}
-
-	if (last_desc_cleaned > desc_to_clean_to)
-		nb_tx_to_clean = (uint16_t)((nb_tx_desc - last_desc_cleaned) +
-							desc_to_clean_to);
-	else
-		nb_tx_to_clean = (uint16_t)(desc_to_clean_to -
-					last_desc_cleaned);
-
-	txd[desc_to_clean_to].cmd_type_offset_bsz = 0;
-
-	txq->last_desc_cleaned = desc_to_clean_to;
-	txq->nb_tx_free = (uint16_t)(txq->nb_tx_free + nb_tx_to_clean);
-
-	return 0;
-}
-
 /* Check if the context descriptor is needed for TX offloading */
 static inline uint16_t
 iavf_calc_context_desc(struct rte_mbuf *mb, uint8_t vlan_flag)
@@ -2768,7 +2728,7 @@ iavf_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 
 	/* Check if the descriptor ring needs to be cleaned. */
 	if (txq->nb_tx_free < txq->tx_free_thresh)
-		iavf_xmit_cleanup(txq);
+		ci_tx_xmit_cleanup(txq);
 
 	desc_idx = txq->tx_tail;
 	txe = &txe_ring[desc_idx];
@@ -2823,14 +2783,14 @@ iavf_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 			txq->port_id, txq->queue_id, desc_idx, desc_idx_last);
 
 		if (nb_desc_required > txq->nb_tx_free) {
-			if (iavf_xmit_cleanup(txq)) {
+			if (ci_tx_xmit_cleanup(txq)) {
 				if (idx == 0)
 					return 0;
 				goto end_of_tx;
 			}
 			if (unlikely(nb_desc_required > txq->tx_rs_thresh)) {
 				while (nb_desc_required > txq->nb_tx_free) {
-					if (iavf_xmit_cleanup(txq)) {
+					if (ci_tx_xmit_cleanup(txq)) {
 						if (idx == 0)
 							return 0;
 						goto end_of_tx;
@@ -4300,7 +4260,7 @@ iavf_tx_done_cleanup_full(struct ci_tx_queue *txq,
 	tx_id = txq->tx_tail;
 	tx_last = tx_id;
 
-	if (txq->nb_tx_free == 0 && iavf_xmit_cleanup(txq))
+	if (txq->nb_tx_free == 0 && ci_tx_xmit_cleanup(txq))
 		return 0;
 
 	nb_tx_to_clean = txq->nb_tx_free;
@@ -4332,7 +4292,7 @@ iavf_tx_done_cleanup_full(struct ci_tx_queue *txq,
 			break;
 
 		if (pkt_cnt < free_cnt) {
-			if (iavf_xmit_cleanup(txq))
+			if (ci_tx_xmit_cleanup(txq))
 				break;
 
 			nb_tx_to_clean = txq->nb_tx_free - nb_tx_free_last;
diff --git a/drivers/net/intel/ice/ice_rxtx.c b/drivers/net/intel/ice/ice_rxtx.c
index e3ffbdb587..7a33e1e980 100644
--- a/drivers/net/intel/ice/ice_rxtx.c
+++ b/drivers/net/intel/ice/ice_rxtx.c
@@ -3023,56 +3023,6 @@ ice_txd_enable_checksum(uint64_t ol_flags,
 	}
 }
 
-static inline int
-ice_xmit_cleanup(struct ci_tx_queue *txq)
-{
-	struct ci_tx_entry *sw_ring = txq->sw_ring;
-	volatile struct ci_tx_desc *txd = txq->ci_tx_ring;
-	uint16_t last_desc_cleaned = txq->last_desc_cleaned;
-	uint16_t nb_tx_desc = txq->nb_tx_desc;
-	uint16_t desc_to_clean_to;
-	uint16_t nb_tx_to_clean;
-
-	/* Determine the last descriptor needing to be cleaned */
-	desc_to_clean_to = (uint16_t)(last_desc_cleaned + txq->tx_rs_thresh);
-	if (desc_to_clean_to >= nb_tx_desc)
-		desc_to_clean_to = (uint16_t)(desc_to_clean_to - nb_tx_desc);
-
-	/* Check to make sure the last descriptor to clean is done */
-	desc_to_clean_to = sw_ring[desc_to_clean_to].last_id;
-	if (!(txd[desc_to_clean_to].cmd_type_offset_bsz &
-	    rte_cpu_to_le_64(ICE_TX_DESC_DTYPE_DESC_DONE))) {
-		PMD_TX_LOG(DEBUG, "TX descriptor %4u is not done "
-			   "(port=%d queue=%d) value=0x%"PRIx64,
-			   desc_to_clean_to,
-			   txq->port_id, txq->queue_id,
-			   txd[desc_to_clean_to].cmd_type_offset_bsz);
-		/* Failed to clean any descriptors */
-		return -1;
-	}
-
-	/* Figure out how many descriptors will be cleaned */
-	if (last_desc_cleaned > desc_to_clean_to)
-		nb_tx_to_clean = (uint16_t)((nb_tx_desc - last_desc_cleaned) +
-					    desc_to_clean_to);
-	else
-		nb_tx_to_clean = (uint16_t)(desc_to_clean_to -
-					    last_desc_cleaned);
-
-	/* The last descriptor to clean is done, so that means all the
-	 * descriptors from the last descriptor that was cleaned
-	 * up to the last descriptor with the RS bit set
-	 * are done. Only reset the threshold descriptor.
-	 */
-	txd[desc_to_clean_to].cmd_type_offset_bsz = 0;
-
-	/* Update the txq to reflect the last descriptor that was cleaned */
-	txq->last_desc_cleaned = desc_to_clean_to;
-	txq->nb_tx_free = (uint16_t)(txq->nb_tx_free + nb_tx_to_clean);
-
-	return 0;
-}
-
 /* Construct the tx flags */
 static inline uint64_t
 ice_build_ctob(uint32_t td_cmd,
@@ -3180,7 +3130,7 @@ ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 
 	/* Check if the descriptor ring needs to be cleaned. */
 	if (txq->nb_tx_free < txq->tx_free_thresh)
-		(void)ice_xmit_cleanup(txq);
+		(void)ci_tx_xmit_cleanup(txq);
 
 	for (nb_tx = 0; nb_tx < nb_pkts; nb_tx++) {
 		tx_pkt = *tx_pkts++;
@@ -3217,14 +3167,14 @@ ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 			tx_last = (uint16_t)(tx_last - txq->nb_tx_desc);
 
 		if (nb_used > txq->nb_tx_free) {
-			if (ice_xmit_cleanup(txq) != 0) {
+			if (ci_tx_xmit_cleanup(txq) != 0) {
 				if (nb_tx == 0)
 					return 0;
 				goto end_of_tx;
 			}
 			if (unlikely(nb_used > txq->tx_rs_thresh)) {
 				while (nb_used > txq->nb_tx_free) {
-					if (ice_xmit_cleanup(txq) != 0) {
+					if (ci_tx_xmit_cleanup(txq) != 0) {
 						if (nb_tx == 0)
 							return 0;
 						goto end_of_tx;
@@ -3459,7 +3409,7 @@ ice_tx_done_cleanup_full(struct ci_tx_queue *txq,
 	tx_last = txq->tx_tail;
 	tx_id  = swr_ring[tx_last].next_id;
 
-	if (txq->nb_tx_free == 0 && ice_xmit_cleanup(txq))
+	if (txq->nb_tx_free == 0 && ci_tx_xmit_cleanup(txq))
 		return 0;
 
 	nb_tx_to_clean = txq->nb_tx_free;
@@ -3493,7 +3443,7 @@ ice_tx_done_cleanup_full(struct ci_tx_queue *txq,
 			break;
 
 		if (pkt_cnt < free_cnt) {
-			if (ice_xmit_cleanup(txq))
+			if (ci_tx_xmit_cleanup(txq))
 				break;
 
 			nb_tx_to_clean = txq->nb_tx_free - nb_tx_free_last;
diff --git a/drivers/net/intel/idpf/idpf_common_rxtx.c b/drivers/net/intel/idpf/idpf_common_rxtx.c
index 51074bda3a..23666539ab 100644
--- a/drivers/net/intel/idpf/idpf_common_rxtx.c
+++ b/drivers/net/intel/idpf/idpf_common_rxtx.c
@@ -1326,46 +1326,6 @@ idpf_dp_singleq_recv_scatter_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
 	return nb_rx;
 }
 
-static inline int
-idpf_xmit_cleanup(struct ci_tx_queue *txq)
-{
-	uint16_t last_desc_cleaned = txq->last_desc_cleaned;
-	struct ci_tx_entry *sw_ring = txq->sw_ring;
-	uint16_t nb_tx_desc = txq->nb_tx_desc;
-	uint16_t desc_to_clean_to;
-	uint16_t nb_tx_to_clean;
-
-	volatile struct ci_tx_desc *txd = txq->ci_tx_ring;
-
-	desc_to_clean_to = (uint16_t)(last_desc_cleaned + txq->tx_rs_thresh);
-	if (desc_to_clean_to >= nb_tx_desc)
-		desc_to_clean_to = (uint16_t)(desc_to_clean_to - nb_tx_desc);
-
-	desc_to_clean_to = sw_ring[desc_to_clean_to].last_id;
-	if ((txd[desc_to_clean_to].cmd_type_offset_bsz &
-	     rte_cpu_to_le_64(IDPF_TXD_QW1_DTYPE_M)) !=
-	    rte_cpu_to_le_64(IDPF_TX_DESC_DTYPE_DESC_DONE)) {
-		TX_LOG(DEBUG, "TX descriptor %4u is not done "
-		       "(port=%d queue=%d)", desc_to_clean_to,
-		       txq->port_id, txq->queue_id);
-		return -1;
-	}
-
-	if (last_desc_cleaned > desc_to_clean_to)
-		nb_tx_to_clean = (uint16_t)((nb_tx_desc - last_desc_cleaned) +
-					    desc_to_clean_to);
-	else
-		nb_tx_to_clean = (uint16_t)(desc_to_clean_to -
-					    last_desc_cleaned);
-
-	txd[desc_to_clean_to].cmd_type_offset_bsz = 0;
-
-	txq->last_desc_cleaned = desc_to_clean_to;
-	txq->nb_tx_free = (uint16_t)(txq->nb_tx_free + nb_tx_to_clean);
-
-	return 0;
-}
-
 /* TX function */
 RTE_EXPORT_INTERNAL_SYMBOL(idpf_dp_singleq_xmit_pkts)
 uint16_t
@@ -1404,7 +1364,7 @@ idpf_dp_singleq_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 
 	/* Check if the descriptor ring needs to be cleaned. */
 	if (txq->nb_tx_free < txq->tx_free_thresh)
-		(void)idpf_xmit_cleanup(txq);
+		(void)ci_tx_xmit_cleanup(txq);
 
 	for (nb_tx = 0; nb_tx < nb_pkts; nb_tx++) {
 		td_cmd = 0;
@@ -1437,14 +1397,14 @@ idpf_dp_singleq_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 		       txq->port_id, txq->queue_id, tx_id, tx_last);
 
 		if (nb_used > txq->nb_tx_free) {
-			if (idpf_xmit_cleanup(txq) != 0) {
+			if (ci_tx_xmit_cleanup(txq) != 0) {
 				if (nb_tx == 0)
 					return 0;
 				goto end_of_tx;
 			}
 			if (unlikely(nb_used > txq->tx_rs_thresh)) {
 				while (nb_used > txq->nb_tx_free) {
-					if (idpf_xmit_cleanup(txq) != 0) {
+					if (ci_tx_xmit_cleanup(txq) != 0) {
 						if (nb_tx == 0)
 							return 0;
 						goto end_of_tx;
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [PATCH v4 04/35] net/intel: consolidate definitions for Tx desc fields
  2026-02-09 16:44 ` [PATCH v4 00/35] combine multiple Intel scalar Tx paths Bruce Richardson
                     ` (2 preceding siblings ...)
  2026-02-09 16:45   ` [PATCH v4 03/35] net/intel: create common post-Tx cleanup function Bruce Richardson
@ 2026-02-09 16:45   ` Bruce Richardson
  2026-02-10 12:26     ` Burakov, Anatoly
  2026-02-09 16:45   ` [PATCH v4 05/35] net/intel: add common fn to calculate needed descriptors Bruce Richardson
                     ` (30 subsequent siblings)
  34 siblings, 1 reply; 274+ messages in thread
From: Bruce Richardson @ 2026-02-09 16:45 UTC (permalink / raw)
  To: dev
  Cc: Bruce Richardson, Vladimir Medvedkin, Anatoly Burakov,
	Jingjing Wu, Praveen Shetty

The offsets of the various fields within the Tx descriptors are common
to i40e, iavf, ice and idpf, so put a single set of defines in tx.h and
use those throughout all drivers. (NOTE: there was a small difference in
the mask of the CMD field between drivers, depending on whether or not
reserved fields were included. This can be ignored, as those bits are
unused in the drivers for which they are reserved.) Similarly, the
various flag fields, such as end-of-packet (EOP) and report-status (RS),
are the same, as are the offload definitions, so consolidate them as
well.

The original definitions are part of the base code and are left in
place for that reason, but they are now unused.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/net/intel/common/tx.h                 |  60 ++++++++
 drivers/net/intel/common/tx_scalar.h          |   6 +-
 drivers/net/intel/i40e/i40e_fdir.c            |  24 +--
 drivers/net/intel/i40e/i40e_rxtx.c            |  92 ++++++------
 drivers/net/intel/i40e/i40e_rxtx.h            |  17 +--
 .../net/intel/i40e/i40e_rxtx_vec_altivec.c    |  11 +-
 drivers/net/intel/i40e/i40e_rxtx_vec_avx2.c   |  22 ++-
 drivers/net/intel/i40e/i40e_rxtx_vec_avx512.c |  38 ++---
 drivers/net/intel/i40e/i40e_rxtx_vec_common.h |   4 +-
 drivers/net/intel/i40e/i40e_rxtx_vec_neon.c   |  11 +-
 drivers/net/intel/iavf/iavf_rxtx.c            |  68 +++++----
 drivers/net/intel/iavf/iavf_rxtx.h            |  20 +--
 drivers/net/intel/iavf/iavf_rxtx_vec_avx2.c   |  41 ++----
 drivers/net/intel/iavf/iavf_rxtx_vec_avx512.c |  80 ++++------
 drivers/net/intel/iavf/iavf_rxtx_vec_common.h |  34 ++---
 drivers/net/intel/ice/ice_dcf_ethdev.c        |   2 +-
 drivers/net/intel/ice/ice_rxtx.c              | 137 ++++++++----------
 drivers/net/intel/ice/ice_rxtx.h              |  15 +-
 drivers/net/intel/ice/ice_rxtx_vec_avx2.c     |  41 ++----
 drivers/net/intel/ice/ice_rxtx_vec_avx512.c   |  39 ++---
 drivers/net/intel/ice/ice_rxtx_vec_common.h   |  41 +++---
 drivers/net/intel/idpf/idpf_common_rxtx.c     |  22 +--
 drivers/net/intel/idpf/idpf_common_rxtx.h     |  12 --
 .../net/intel/idpf/idpf_common_rxtx_avx2.c    |  41 ++----
 .../net/intel/idpf/idpf_common_rxtx_avx512.c  |  41 ++----
 drivers/net/intel/idpf/idpf_rxtx_vec_common.h |   4 +-
 26 files changed, 409 insertions(+), 514 deletions(-)

diff --git a/drivers/net/intel/common/tx.h b/drivers/net/intel/common/tx.h
index 558c861df0..091f220f1c 100644
--- a/drivers/net/intel/common/tx.h
+++ b/drivers/net/intel/common/tx.h
@@ -10,6 +10,66 @@
 #include <rte_ethdev.h>
 #include <rte_vect.h>
 
+/* Common TX Descriptor QW1 Field Definitions */
+#define CI_TXD_QW1_DTYPE_S      0
+#define CI_TXD_QW1_DTYPE_M      (0xFUL << CI_TXD_QW1_DTYPE_S)
+#define CI_TXD_QW1_CMD_S        4
+#define CI_TXD_QW1_CMD_M        (0xFFFUL << CI_TXD_QW1_CMD_S)
+#define CI_TXD_QW1_OFFSET_S     16
+#define CI_TXD_QW1_OFFSET_M     (0x3FFFFULL << CI_TXD_QW1_OFFSET_S)
+#define CI_TXD_QW1_TX_BUF_SZ_S  34
+#define CI_TXD_QW1_TX_BUF_SZ_M  (0x3FFFULL << CI_TXD_QW1_TX_BUF_SZ_S)
+#define CI_TXD_QW1_L2TAG1_S     48
+#define CI_TXD_QW1_L2TAG1_M     (0xFFFFULL << CI_TXD_QW1_L2TAG1_S)
+
+/* Common Descriptor Types */
+#define CI_TX_DESC_DTYPE_DATA           0x0
+#define CI_TX_DESC_DTYPE_CTX            0x1
+#define CI_TX_DESC_DTYPE_DESC_DONE      0xF
+
+/* Common TX Descriptor Command Flags */
+#define CI_TX_DESC_CMD_EOP              0x0001
+#define CI_TX_DESC_CMD_RS               0x0002
+#define CI_TX_DESC_CMD_ICRC             0x0004
+#define CI_TX_DESC_CMD_IL2TAG1          0x0008
+#define CI_TX_DESC_CMD_DUMMY            0x0010
+#define CI_TX_DESC_CMD_IIPT_IPV6        0x0020
+#define CI_TX_DESC_CMD_IIPT_IPV4        0x0040
+#define CI_TX_DESC_CMD_IIPT_IPV4_CSUM   0x0060
+#define CI_TX_DESC_CMD_L4T_EOFT_TCP     0x0100
+#define CI_TX_DESC_CMD_L4T_EOFT_SCTP    0x0200
+#define CI_TX_DESC_CMD_L4T_EOFT_UDP     0x0300
+
+/* Common TX Context Descriptor Commands */
+#define CI_TX_CTX_DESC_TSO              0x01
+#define CI_TX_CTX_DESC_TSYN             0x02
+#define CI_TX_CTX_DESC_IL2TAG2          0x04
+
+/* Common TX Descriptor Length Field Shifts */
+#define CI_TX_DESC_LEN_MACLEN_S         0  /* 7 BITS */
+#define CI_TX_DESC_LEN_IPLEN_S          7  /* 7 BITS */
+#define CI_TX_DESC_LEN_L4_LEN_S         14 /* 4 BITS */
+
+/* Common maximum data per TX descriptor */
+#define CI_MAX_DATA_PER_TXD     (CI_TXD_QW1_TX_BUF_SZ_M >> CI_TXD_QW1_TX_BUF_SZ_S)
+
+/**
+ * Common TX offload union for Intel drivers.
+ * Supports both basic offloads (l2_len, l3_len, l4_len, tso_segsz) and
+ * extended offloads (outer_l2_len, outer_l3_len) for tunneling support.
+ */
+union ci_tx_offload {
+	uint64_t data;
+	struct {
+		uint64_t l2_len:7;        /**< L2 (MAC) Header Length. */
+		uint64_t l3_len:9;        /**< L3 (IP) Header Length. */
+		uint64_t l4_len:8;        /**< L4 Header Length. */
+		uint64_t tso_segsz:16;    /**< TCP TSO segment size */
+		uint64_t outer_l2_len:8;  /**< outer L2 Header Length */
+		uint64_t outer_l3_len:16; /**< outer L3 Header Length */
+	};
+};
+
 /*
  * Structure of a 16-byte Tx descriptor common across i40e, ice, iavf and idpf drivers
  */
diff --git a/drivers/net/intel/common/tx_scalar.h b/drivers/net/intel/common/tx_scalar.h
index 181629d856..6f2024273b 100644
--- a/drivers/net/intel/common/tx_scalar.h
+++ b/drivers/net/intel/common/tx_scalar.h
@@ -33,10 +33,10 @@ ci_tx_xmit_cleanup(struct ci_tx_queue *txq)
 	if (desc_to_clean_to >= nb_tx_desc)
 		desc_to_clean_to = (uint16_t)(desc_to_clean_to - nb_tx_desc);
 
-	/* Check if descriptor is done - all drivers use 0xF as done value in bits 3:0 */
+	/* Check if descriptor is done */
 	desc_to_clean_to = sw_ring[desc_to_clean_to].last_id;
-	if ((txd[desc_to_clean_to].cmd_type_offset_bsz & rte_cpu_to_le_64(0x0FUL)) !=
-			rte_cpu_to_le_64(0x0FUL))
+	if ((txd[desc_to_clean_to].cmd_type_offset_bsz & rte_cpu_to_le_64(CI_TXD_QW1_DTYPE_M)) !=
+			rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE))
 		return -1;
 
 	/* Figure out how many descriptors will be cleaned */
diff --git a/drivers/net/intel/i40e/i40e_fdir.c b/drivers/net/intel/i40e/i40e_fdir.c
index 8a01aec0e2..3b099d5a9e 100644
--- a/drivers/net/intel/i40e/i40e_fdir.c
+++ b/drivers/net/intel/i40e/i40e_fdir.c
@@ -916,11 +916,11 @@ i40e_build_ctob(uint32_t td_cmd,
 		unsigned int size,
 		uint32_t td_tag)
 {
-	return rte_cpu_to_le_64(I40E_TX_DESC_DTYPE_DATA |
-			((uint64_t)td_cmd  << I40E_TXD_QW1_CMD_SHIFT) |
-			((uint64_t)td_offset << I40E_TXD_QW1_OFFSET_SHIFT) |
-			((uint64_t)size  << I40E_TXD_QW1_TX_BUF_SZ_SHIFT) |
-			((uint64_t)td_tag  << I40E_TXD_QW1_L2TAG1_SHIFT));
+	return rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DATA |
+			((uint64_t)td_cmd  << CI_TXD_QW1_CMD_S) |
+			((uint64_t)td_offset << CI_TXD_QW1_OFFSET_S) |
+			((uint64_t)size  << CI_TXD_QW1_TX_BUF_SZ_S) |
+			((uint64_t)td_tag  << CI_TXD_QW1_L2TAG1_S));
 }
 
 /*
@@ -1384,8 +1384,8 @@ i40e_find_available_buffer(struct rte_eth_dev *dev)
 
 		do {
 			if ((tmp_txdp->cmd_type_offset_bsz &
-					rte_cpu_to_le_64(I40E_TXD_QW1_DTYPE_MASK)) ==
-					rte_cpu_to_le_64(I40E_TX_DESC_DTYPE_DESC_DONE))
+					rte_cpu_to_le_64(CI_TXD_QW1_DTYPE_M)) ==
+					rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE))
 				fdir_info->txq_available_buf_count++;
 			else
 				break;
@@ -1710,9 +1710,9 @@ i40e_flow_fdir_filter_programming(struct i40e_pf *pf,
 	txdp = &txq->ci_tx_ring[txq->tx_tail + 1];
 	txdp->buffer_addr = rte_cpu_to_le_64(pf->fdir.dma_addr[txq->tx_tail >> 1]);
 
-	td_cmd = I40E_TX_DESC_CMD_EOP |
-		 I40E_TX_DESC_CMD_RS  |
-		 I40E_TX_DESC_CMD_DUMMY;
+	td_cmd = CI_TX_DESC_CMD_EOP |
+		 CI_TX_DESC_CMD_RS  |
+		 CI_TX_DESC_CMD_DUMMY;
 
 	txdp->cmd_type_offset_bsz =
 		i40e_build_ctob(td_cmd, 0, I40E_FDIR_PKT_LEN, 0);
@@ -1731,8 +1731,8 @@ i40e_flow_fdir_filter_programming(struct i40e_pf *pf,
 	if (wait_status) {
 		for (i = 0; i < I40E_FDIR_MAX_WAIT_US; i++) {
 			if ((txdp->cmd_type_offset_bsz &
-					rte_cpu_to_le_64(I40E_TXD_QW1_DTYPE_MASK)) ==
-					rte_cpu_to_le_64(I40E_TX_DESC_DTYPE_DESC_DONE))
+					rte_cpu_to_le_64(CI_TXD_QW1_DTYPE_M)) ==
+					rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE))
 				break;
 			rte_delay_us(1);
 		}
diff --git a/drivers/net/intel/i40e/i40e_rxtx.c b/drivers/net/intel/i40e/i40e_rxtx.c
index 2760e76e99..f96c5c7f1e 100644
--- a/drivers/net/intel/i40e/i40e_rxtx.c
+++ b/drivers/net/intel/i40e/i40e_rxtx.c
@@ -45,7 +45,7 @@
 /* Base address of the HW descriptor ring should be 128B aligned. */
 #define I40E_RING_BASE_ALIGN	128
 
-#define I40E_TXD_CMD (I40E_TX_DESC_CMD_EOP | I40E_TX_DESC_CMD_RS)
+#define I40E_TXD_CMD (CI_TX_DESC_CMD_EOP | CI_TX_DESC_CMD_RS)
 
 #ifdef RTE_LIBRTE_IEEE1588
 #define I40E_TX_IEEE1588_TMST RTE_MBUF_F_TX_IEEE1588_TMST
@@ -260,7 +260,7 @@ i40e_rxd_build_fdir(volatile union ci_rx_desc *rxdp, struct rte_mbuf *mb)
 
 static inline void
 i40e_parse_tunneling_params(uint64_t ol_flags,
-			    union i40e_tx_offload tx_offload,
+			    union ci_tx_offload tx_offload,
 			    uint32_t *cd_tunneling)
 {
 	/* EIPT: External (outer) IP header type */
@@ -319,51 +319,51 @@ static inline void
 i40e_txd_enable_checksum(uint64_t ol_flags,
 			uint32_t *td_cmd,
 			uint32_t *td_offset,
-			union i40e_tx_offload tx_offload)
+			union ci_tx_offload tx_offload)
 {
 	/* Set MACLEN */
 	if (!(ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK))
 		*td_offset |= (tx_offload.l2_len >> 1)
-			<< I40E_TX_DESC_LENGTH_MACLEN_SHIFT;
+			<< CI_TX_DESC_LEN_MACLEN_S;
 
 	/* Enable L3 checksum offloads */
 	if (ol_flags & RTE_MBUF_F_TX_IP_CKSUM) {
-		*td_cmd |= I40E_TX_DESC_CMD_IIPT_IPV4_CSUM;
+		*td_cmd |= CI_TX_DESC_CMD_IIPT_IPV4_CSUM;
 		*td_offset |= (tx_offload.l3_len >> 2)
-				<< I40E_TX_DESC_LENGTH_IPLEN_SHIFT;
+				<< CI_TX_DESC_LEN_IPLEN_S;
 	} else if (ol_flags & RTE_MBUF_F_TX_IPV4) {
-		*td_cmd |= I40E_TX_DESC_CMD_IIPT_IPV4;
+		*td_cmd |= CI_TX_DESC_CMD_IIPT_IPV4;
 		*td_offset |= (tx_offload.l3_len >> 2)
-				<< I40E_TX_DESC_LENGTH_IPLEN_SHIFT;
+				<< CI_TX_DESC_LEN_IPLEN_S;
 	} else if (ol_flags & RTE_MBUF_F_TX_IPV6) {
-		*td_cmd |= I40E_TX_DESC_CMD_IIPT_IPV6;
+		*td_cmd |= CI_TX_DESC_CMD_IIPT_IPV6;
 		*td_offset |= (tx_offload.l3_len >> 2)
-				<< I40E_TX_DESC_LENGTH_IPLEN_SHIFT;
+				<< CI_TX_DESC_LEN_IPLEN_S;
 	}
 
 	if (ol_flags & RTE_MBUF_F_TX_TCP_SEG) {
-		*td_cmd |= I40E_TX_DESC_CMD_L4T_EOFT_TCP;
+		*td_cmd |= CI_TX_DESC_CMD_L4T_EOFT_TCP;
 		*td_offset |= (tx_offload.l4_len >> 2)
-			<< I40E_TX_DESC_LENGTH_L4_FC_LEN_SHIFT;
+			<< CI_TX_DESC_LEN_L4_LEN_S;
 		return;
 	}
 
 	/* Enable L4 checksum offloads */
 	switch (ol_flags & RTE_MBUF_F_TX_L4_MASK) {
 	case RTE_MBUF_F_TX_TCP_CKSUM:
-		*td_cmd |= I40E_TX_DESC_CMD_L4T_EOFT_TCP;
+		*td_cmd |= CI_TX_DESC_CMD_L4T_EOFT_TCP;
 		*td_offset |= (sizeof(struct rte_tcp_hdr) >> 2) <<
-				I40E_TX_DESC_LENGTH_L4_FC_LEN_SHIFT;
+				CI_TX_DESC_LEN_L4_LEN_S;
 		break;
 	case RTE_MBUF_F_TX_SCTP_CKSUM:
-		*td_cmd |= I40E_TX_DESC_CMD_L4T_EOFT_SCTP;
+		*td_cmd |= CI_TX_DESC_CMD_L4T_EOFT_SCTP;
 		*td_offset |= (sizeof(struct rte_sctp_hdr) >> 2) <<
-				I40E_TX_DESC_LENGTH_L4_FC_LEN_SHIFT;
+				CI_TX_DESC_LEN_L4_LEN_S;
 		break;
 	case RTE_MBUF_F_TX_UDP_CKSUM:
-		*td_cmd |= I40E_TX_DESC_CMD_L4T_EOFT_UDP;
+		*td_cmd |= CI_TX_DESC_CMD_L4T_EOFT_UDP;
 		*td_offset |= (sizeof(struct rte_udp_hdr) >> 2) <<
-				I40E_TX_DESC_LENGTH_L4_FC_LEN_SHIFT;
+				CI_TX_DESC_LEN_L4_LEN_S;
 		break;
 	default:
 		break;
@@ -377,11 +377,11 @@ i40e_build_ctob(uint32_t td_cmd,
 		unsigned int size,
 		uint32_t td_tag)
 {
-	return rte_cpu_to_le_64(I40E_TX_DESC_DTYPE_DATA |
-			((uint64_t)td_cmd  << I40E_TXD_QW1_CMD_SHIFT) |
-			((uint64_t)td_offset << I40E_TXD_QW1_OFFSET_SHIFT) |
-			((uint64_t)size  << I40E_TXD_QW1_TX_BUF_SZ_SHIFT) |
-			((uint64_t)td_tag  << I40E_TXD_QW1_L2TAG1_SHIFT));
+	return rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DATA |
+			((uint64_t)td_cmd  << CI_TXD_QW1_CMD_S) |
+			((uint64_t)td_offset << CI_TXD_QW1_OFFSET_S) |
+			((uint64_t)size  << CI_TXD_QW1_TX_BUF_SZ_S) |
+			((uint64_t)td_tag  << CI_TXD_QW1_L2TAG1_S));
 }
 
 static inline int
@@ -1004,7 +1004,7 @@ i40e_calc_context_desc(uint64_t flags)
 
 /* set i40e TSO context descriptor */
 static inline uint64_t
-i40e_set_tso_ctx(struct rte_mbuf *mbuf, union i40e_tx_offload tx_offload)
+i40e_set_tso_ctx(struct rte_mbuf *mbuf, union ci_tx_offload tx_offload)
 {
 	uint64_t ctx_desc = 0;
 	uint32_t cd_cmd, hdr_len, cd_tso_len;
@@ -1029,9 +1029,6 @@ i40e_set_tso_ctx(struct rte_mbuf *mbuf, union i40e_tx_offload tx_offload)
 	return ctx_desc;
 }
 
-/* HW requires that Tx buffer size ranges from 1B up to (16K-1)B. */
-#define I40E_MAX_DATA_PER_TXD \
-	(I40E_TXD_QW1_TX_BUF_SZ_MASK >> I40E_TXD_QW1_TX_BUF_SZ_SHIFT)
 /* Calculate the number of TX descriptors needed for each pkt */
 static inline uint16_t
 i40e_calc_pkt_desc(struct rte_mbuf *tx_pkt)
@@ -1040,7 +1037,7 @@ i40e_calc_pkt_desc(struct rte_mbuf *tx_pkt)
 	uint16_t count = 0;
 
 	while (txd != NULL) {
-		count += DIV_ROUND_UP(txd->data_len, I40E_MAX_DATA_PER_TXD);
+		count += DIV_ROUND_UP(txd->data_len, CI_MAX_DATA_PER_TXD);
 		txd = txd->next;
 	}
 
@@ -1069,7 +1066,7 @@ i40e_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 	uint16_t tx_last;
 	uint16_t slen;
 	uint64_t buf_dma_addr;
-	union i40e_tx_offload tx_offload = {0};
+	union ci_tx_offload tx_offload = {0};
 
 	txq = tx_queue;
 	sw_ring = txq->sw_ring;
@@ -1138,18 +1135,18 @@ i40e_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 
 		/* Descriptor based VLAN insertion */
 		if (ol_flags & (RTE_MBUF_F_TX_VLAN | RTE_MBUF_F_TX_QINQ)) {
-			td_cmd |= I40E_TX_DESC_CMD_IL2TAG1;
+			td_cmd |= CI_TX_DESC_CMD_IL2TAG1;
 			td_tag = tx_pkt->vlan_tci;
 		}
 
 		/* Always enable CRC offload insertion */
-		td_cmd |= I40E_TX_DESC_CMD_ICRC;
+		td_cmd |= CI_TX_DESC_CMD_ICRC;
 
 		/* Fill in tunneling parameters if necessary */
 		cd_tunneling_params = 0;
 		if (ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK) {
 			td_offset |= (tx_offload.outer_l2_len >> 1)
-					<< I40E_TX_DESC_LENGTH_MACLEN_SHIFT;
+					<< CI_TX_DESC_LEN_MACLEN_S;
 			i40e_parse_tunneling_params(ol_flags, tx_offload,
 						    &cd_tunneling_params);
 		}
@@ -1229,16 +1226,16 @@ i40e_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 			buf_dma_addr = rte_mbuf_data_iova(m_seg);
 
 			while ((ol_flags & RTE_MBUF_F_TX_TCP_SEG) &&
-				unlikely(slen > I40E_MAX_DATA_PER_TXD)) {
+				unlikely(slen > CI_MAX_DATA_PER_TXD)) {
 				txd->buffer_addr =
 					rte_cpu_to_le_64(buf_dma_addr);
 				txd->cmd_type_offset_bsz =
 					i40e_build_ctob(td_cmd,
-					td_offset, I40E_MAX_DATA_PER_TXD,
+					td_offset, CI_MAX_DATA_PER_TXD,
 					td_tag);
 
-				buf_dma_addr += I40E_MAX_DATA_PER_TXD;
-				slen -= I40E_MAX_DATA_PER_TXD;
+				buf_dma_addr += CI_MAX_DATA_PER_TXD;
+				slen -= CI_MAX_DATA_PER_TXD;
 
 				txe->last_id = tx_last;
 				tx_id = txe->next_id;
@@ -1265,7 +1262,7 @@ i40e_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 		} while (m_seg != NULL);
 
 		/* The last packet data descriptor needs End Of Packet (EOP) */
-		td_cmd |= I40E_TX_DESC_CMD_EOP;
+		td_cmd |= CI_TX_DESC_CMD_EOP;
 		txq->nb_tx_used = (uint16_t)(txq->nb_tx_used + nb_used);
 		txq->nb_tx_free = (uint16_t)(txq->nb_tx_free - nb_used);
 
@@ -1275,15 +1272,14 @@ i40e_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 				   "%4u (port=%d queue=%d)",
 				   tx_last, txq->port_id, txq->queue_id);
 
-			td_cmd |= I40E_TX_DESC_CMD_RS;
+			td_cmd |= CI_TX_DESC_CMD_RS;
 
 			/* Update txq RS bit counters */
 			txq->nb_tx_used = 0;
 		}
 
 		txd->cmd_type_offset_bsz |=
-			rte_cpu_to_le_64(((uint64_t)td_cmd) <<
-					I40E_TXD_QW1_CMD_SHIFT);
+			rte_cpu_to_le_64(((uint64_t)td_cmd) << CI_TXD_QW1_CMD_S);
 	}
 
 end_of_tx:
@@ -1309,8 +1305,8 @@ i40e_tx_free_bufs(struct ci_tx_queue *txq)
 	const uint16_t m = tx_rs_thresh % I40E_TX_MAX_FREE_BUF_SZ;
 
 	if ((txq->ci_tx_ring[txq->tx_next_dd].cmd_type_offset_bsz &
-			rte_cpu_to_le_64(I40E_TXD_QW1_DTYPE_MASK)) !=
-			rte_cpu_to_le_64(I40E_TX_DESC_DTYPE_DESC_DONE))
+			rte_cpu_to_le_64(CI_TXD_QW1_DTYPE_M)) !=
+			rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE))
 		return 0;
 
 	txep = &txq->sw_ring[txq->tx_next_dd - (tx_rs_thresh - 1)];
@@ -1441,8 +1437,7 @@ tx_xmit_pkts(struct ci_tx_queue *txq,
 		n = (uint16_t)(txq->nb_tx_desc - txq->tx_tail);
 		i40e_tx_fill_hw_ring(txq, tx_pkts, n);
 		txr[txq->tx_next_rs].cmd_type_offset_bsz |=
-			rte_cpu_to_le_64(((uint64_t)I40E_TX_DESC_CMD_RS) <<
-						I40E_TXD_QW1_CMD_SHIFT);
+			rte_cpu_to_le_64(((uint64_t)CI_TX_DESC_CMD_RS) << CI_TXD_QW1_CMD_S);
 		txq->tx_next_rs = (uint16_t)(txq->tx_rs_thresh - 1);
 		txq->tx_tail = 0;
 	}
@@ -1454,8 +1449,7 @@ tx_xmit_pkts(struct ci_tx_queue *txq,
 	/* Determine if RS bit needs to be set */
 	if (txq->tx_tail > txq->tx_next_rs) {
 		txr[txq->tx_next_rs].cmd_type_offset_bsz |=
-			rte_cpu_to_le_64(((uint64_t)I40E_TX_DESC_CMD_RS) <<
-						I40E_TXD_QW1_CMD_SHIFT);
+			rte_cpu_to_le_64(((uint64_t)CI_TX_DESC_CMD_RS) << CI_TXD_QW1_CMD_S);
 		txq->tx_next_rs =
 			(uint16_t)(txq->tx_next_rs + txq->tx_rs_thresh);
 		if (txq->tx_next_rs >= txq->nb_tx_desc)
@@ -2383,9 +2377,9 @@ i40e_dev_tx_descriptor_status(void *tx_queue, uint16_t offset)
 	}
 
 	status = &txq->ci_tx_ring[desc].cmd_type_offset_bsz;
-	mask = rte_le_to_cpu_64(I40E_TXD_QW1_DTYPE_MASK);
+	mask = rte_le_to_cpu_64(CI_TXD_QW1_DTYPE_M);
 	expect = rte_cpu_to_le_64(
-		I40E_TX_DESC_DTYPE_DESC_DONE << I40E_TXD_QW1_DTYPE_SHIFT);
+		CI_TX_DESC_DTYPE_DESC_DONE << CI_TXD_QW1_DTYPE_S);
 	if ((*status & mask) == expect)
 		return RTE_ETH_TX_DESC_DONE;
 
@@ -2883,7 +2877,7 @@ i40e_reset_tx_queue(struct ci_tx_queue *txq)
 		volatile struct ci_tx_desc *txd = &txq->ci_tx_ring[i];
 
 		txd->cmd_type_offset_bsz =
-			rte_cpu_to_le_64(I40E_TX_DESC_DTYPE_DESC_DONE);
+			rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE);
 		txe[i].mbuf =  NULL;
 		txe[i].last_id = i;
 		txe[prev].next_id = i;
diff --git a/drivers/net/intel/i40e/i40e_rxtx.h b/drivers/net/intel/i40e/i40e_rxtx.h
index ed173d8f17..307ffa3049 100644
--- a/drivers/net/intel/i40e/i40e_rxtx.h
+++ b/drivers/net/intel/i40e/i40e_rxtx.h
@@ -47,8 +47,8 @@
 #define I40E_RX_DESC_EXT_STATUS_FLEXBL_MASK   0x03
 #define I40E_RX_DESC_EXT_STATUS_FLEXBL_FLEX   0x01
 
-#define I40E_TD_CMD (I40E_TX_DESC_CMD_ICRC |\
-		     I40E_TX_DESC_CMD_EOP)
+#define I40E_TD_CMD (CI_TX_DESC_CMD_ICRC |\
+		     CI_TX_DESC_CMD_EOP)
 
 enum i40e_header_split_mode {
 	i40e_header_split_none = 0,
@@ -110,19 +110,6 @@ enum i40e_header_split_mode {
 
 #define I40E_TX_VECTOR_OFFLOADS RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE
 
-/** Offload features */
-union i40e_tx_offload {
-	uint64_t data;
-	struct {
-		uint64_t l2_len:7; /**< L2 (MAC) Header Length. */
-		uint64_t l3_len:9; /**< L3 (IP) Header Length. */
-		uint64_t l4_len:8; /**< L4 Header Length. */
-		uint64_t tso_segsz:16; /**< TCP TSO segment size */
-		uint64_t outer_l2_len:8; /**< outer L2 Header Length */
-		uint64_t outer_l3_len:16; /**< outer L3 Header Length */
-	};
-};
-
 int i40e_dev_rx_queue_start(struct rte_eth_dev *dev, uint16_t rx_queue_id);
 int i40e_dev_rx_queue_stop(struct rte_eth_dev *dev, uint16_t rx_queue_id);
 int i40e_dev_tx_queue_start(struct rte_eth_dev *dev, uint16_t tx_queue_id);
diff --git a/drivers/net/intel/i40e/i40e_rxtx_vec_altivec.c b/drivers/net/intel/i40e/i40e_rxtx_vec_altivec.c
index 81e9e2bc0b..4c36748d94 100644
--- a/drivers/net/intel/i40e/i40e_rxtx_vec_altivec.c
+++ b/drivers/net/intel/i40e/i40e_rxtx_vec_altivec.c
@@ -449,9 +449,9 @@ static inline void
 vtx1(volatile struct ci_tx_desc *txdp,
 	struct rte_mbuf *pkt, uint64_t flags)
 {
-	uint64_t high_qw = (I40E_TX_DESC_DTYPE_DATA |
-		((uint64_t)flags  << I40E_TXD_QW1_CMD_SHIFT) |
-		((uint64_t)pkt->data_len << I40E_TXD_QW1_TX_BUF_SZ_SHIFT));
+	uint64_t high_qw = (CI_TX_DESC_DTYPE_DATA |
+		((uint64_t)flags << CI_TXD_QW1_CMD_S) |
+		((uint64_t)pkt->data_len << CI_TXD_QW1_TX_BUF_SZ_S));
 
 	__vector unsigned long descriptor = (__vector unsigned long){
 		pkt->buf_iova + pkt->data_off, high_qw};
@@ -477,7 +477,7 @@ i40e_xmit_fixed_burst_vec(void *tx_queue, struct rte_mbuf **tx_pkts,
 	struct ci_tx_entry_vec *txep;
 	uint16_t n, nb_commit, tx_id;
 	uint64_t flags = I40E_TD_CMD;
-	uint64_t rs = I40E_TX_DESC_CMD_RS | I40E_TD_CMD;
+	uint64_t rs = CI_TX_DESC_CMD_RS | I40E_TD_CMD;
 	int i;
 
 	if (txq->nb_tx_free < txq->tx_free_thresh)
@@ -520,8 +520,7 @@ i40e_xmit_fixed_burst_vec(void *tx_queue, struct rte_mbuf **tx_pkts,
 	tx_id = (uint16_t)(tx_id + nb_commit);
 	if (tx_id > txq->tx_next_rs) {
 		txq->ci_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
-			rte_cpu_to_le_64(((uint64_t)I40E_TX_DESC_CMD_RS) <<
-						I40E_TXD_QW1_CMD_SHIFT);
+			rte_cpu_to_le_64(((uint64_t)CI_TX_DESC_CMD_RS) << CI_TXD_QW1_CMD_S);
 		txq->tx_next_rs =
 			(uint16_t)(txq->tx_next_rs + txq->tx_rs_thresh);
 	}
diff --git a/drivers/net/intel/i40e/i40e_rxtx_vec_avx2.c b/drivers/net/intel/i40e/i40e_rxtx_vec_avx2.c
index f054bd41bf..502a1842c6 100644
--- a/drivers/net/intel/i40e/i40e_rxtx_vec_avx2.c
+++ b/drivers/net/intel/i40e/i40e_rxtx_vec_avx2.c
@@ -684,9 +684,9 @@ static inline void
 vtx1(volatile struct ci_tx_desc *txdp,
 		struct rte_mbuf *pkt, uint64_t flags)
 {
-	uint64_t high_qw = (I40E_TX_DESC_DTYPE_DATA |
-			((uint64_t)flags  << I40E_TXD_QW1_CMD_SHIFT) |
-			((uint64_t)pkt->data_len << I40E_TXD_QW1_TX_BUF_SZ_SHIFT));
+	uint64_t high_qw = (CI_TX_DESC_DTYPE_DATA |
+			((uint64_t)flags << CI_TXD_QW1_CMD_S) |
+			((uint64_t)pkt->data_len << CI_TXD_QW1_TX_BUF_SZ_S));
 
 	__m128i descriptor = _mm_set_epi64x(high_qw,
 				pkt->buf_iova + pkt->data_off);
@@ -697,8 +697,7 @@ static inline void
 vtx(volatile struct ci_tx_desc *txdp,
 		struct rte_mbuf **pkt, uint16_t nb_pkts,  uint64_t flags)
 {
-	const uint64_t hi_qw_tmpl = (I40E_TX_DESC_DTYPE_DATA |
-			((uint64_t)flags  << I40E_TXD_QW1_CMD_SHIFT));
+	const uint64_t hi_qw_tmpl = (CI_TX_DESC_DTYPE_DATA | (flags << CI_TXD_QW1_CMD_S));
 
 	/* if unaligned on 32-bit boundary, do one to align */
 	if (((uintptr_t)txdp & 0x1F) != 0 && nb_pkts != 0) {
@@ -709,13 +708,13 @@ vtx(volatile struct ci_tx_desc *txdp,
 	/* do two at a time while possible, in bursts */
 	for (; nb_pkts > 3; txdp += 4, pkt += 4, nb_pkts -= 4) {
 		uint64_t hi_qw3 = hi_qw_tmpl |
-				((uint64_t)pkt[3]->data_len << I40E_TXD_QW1_TX_BUF_SZ_SHIFT);
+				((uint64_t)pkt[3]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
 		uint64_t hi_qw2 = hi_qw_tmpl |
-				((uint64_t)pkt[2]->data_len << I40E_TXD_QW1_TX_BUF_SZ_SHIFT);
+				((uint64_t)pkt[2]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
 		uint64_t hi_qw1 = hi_qw_tmpl |
-				((uint64_t)pkt[1]->data_len << I40E_TXD_QW1_TX_BUF_SZ_SHIFT);
+				((uint64_t)pkt[1]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
 		uint64_t hi_qw0 = hi_qw_tmpl |
-				((uint64_t)pkt[0]->data_len << I40E_TXD_QW1_TX_BUF_SZ_SHIFT);
+				((uint64_t)pkt[0]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
 
 		__m256i desc2_3 = _mm256_set_epi64x(
 				hi_qw3, pkt[3]->buf_iova + pkt[3]->data_off,
@@ -743,7 +742,7 @@ i40e_xmit_fixed_burst_vec_avx2(void *tx_queue, struct rte_mbuf **tx_pkts,
 	struct ci_tx_entry_vec *txep;
 	uint16_t n, nb_commit, tx_id;
 	uint64_t flags = I40E_TD_CMD;
-	uint64_t rs = I40E_TX_DESC_CMD_RS | I40E_TD_CMD;
+	uint64_t rs = CI_TX_DESC_CMD_RS | I40E_TD_CMD;
 
 	if (txq->nb_tx_free < txq->tx_free_thresh)
 		ci_tx_free_bufs_vec(txq, i40e_tx_desc_done, false);
@@ -785,8 +784,7 @@ i40e_xmit_fixed_burst_vec_avx2(void *tx_queue, struct rte_mbuf **tx_pkts,
 	tx_id = (uint16_t)(tx_id + nb_commit);
 	if (tx_id > txq->tx_next_rs) {
 		txq->ci_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
-			rte_cpu_to_le_64(((uint64_t)I40E_TX_DESC_CMD_RS) <<
-						I40E_TXD_QW1_CMD_SHIFT);
+			rte_cpu_to_le_64(((uint64_t)CI_TX_DESC_CMD_RS) << CI_TXD_QW1_CMD_S);
 		txq->tx_next_rs =
 			(uint16_t)(txq->tx_next_rs + txq->tx_rs_thresh);
 	}
diff --git a/drivers/net/intel/i40e/i40e_rxtx_vec_avx512.c b/drivers/net/intel/i40e/i40e_rxtx_vec_avx512.c
index 9a967faeee..d48ff9f51e 100644
--- a/drivers/net/intel/i40e/i40e_rxtx_vec_avx512.c
+++ b/drivers/net/intel/i40e/i40e_rxtx_vec_avx512.c
@@ -752,9 +752,9 @@ i40e_recv_scattered_pkts_vec_avx512(void *rx_queue,
 static inline void
 vtx1(volatile struct ci_tx_desc *txdp, struct rte_mbuf *pkt, uint64_t flags)
 {
-	uint64_t high_qw = (I40E_TX_DESC_DTYPE_DATA |
-		((uint64_t)flags  << I40E_TXD_QW1_CMD_SHIFT) |
-		((uint64_t)pkt->data_len << I40E_TXD_QW1_TX_BUF_SZ_SHIFT));
+	uint64_t high_qw = (CI_TX_DESC_DTYPE_DATA |
+		((uint64_t)flags << CI_TXD_QW1_CMD_S) |
+		((uint64_t)pkt->data_len << CI_TXD_QW1_TX_BUF_SZ_S));
 
 	__m128i descriptor = _mm_set_epi64x(high_qw,
 				pkt->buf_iova + pkt->data_off);
@@ -765,26 +765,17 @@ static inline void
 vtx(volatile struct ci_tx_desc *txdp,
 	struct rte_mbuf **pkt, uint16_t nb_pkts,  uint64_t flags)
 {
-	const uint64_t hi_qw_tmpl = (I40E_TX_DESC_DTYPE_DATA |
-			((uint64_t)flags  << I40E_TXD_QW1_CMD_SHIFT));
+	const uint64_t hi_qw_tmpl = (CI_TX_DESC_DTYPE_DATA | (flags << CI_TXD_QW1_CMD_S));
 
 	for (; nb_pkts > 3; txdp += 4, pkt += 4, nb_pkts -= 4) {
-		uint64_t hi_qw3 =
-			hi_qw_tmpl |
-			((uint64_t)pkt[3]->data_len <<
-			 I40E_TXD_QW1_TX_BUF_SZ_SHIFT);
-		uint64_t hi_qw2 =
-			hi_qw_tmpl |
-			((uint64_t)pkt[2]->data_len <<
-			 I40E_TXD_QW1_TX_BUF_SZ_SHIFT);
-		uint64_t hi_qw1 =
-			hi_qw_tmpl |
-			((uint64_t)pkt[1]->data_len <<
-			 I40E_TXD_QW1_TX_BUF_SZ_SHIFT);
-		uint64_t hi_qw0 =
-			hi_qw_tmpl |
-			((uint64_t)pkt[0]->data_len <<
-			 I40E_TXD_QW1_TX_BUF_SZ_SHIFT);
+		uint64_t hi_qw3 = hi_qw_tmpl |
+			((uint64_t)pkt[3]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
+		uint64_t hi_qw2 = hi_qw_tmpl |
+			((uint64_t)pkt[2]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
+		uint64_t hi_qw1 = hi_qw_tmpl |
+			((uint64_t)pkt[1]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
+		uint64_t hi_qw0 = hi_qw_tmpl |
+			((uint64_t)pkt[0]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
 
 		__m512i desc0_3 =
 			_mm512_set_epi64
@@ -811,7 +802,7 @@ i40e_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pkts,
 	struct ci_tx_entry_vec *txep;
 	uint16_t n, nb_commit, tx_id;
 	uint64_t flags = I40E_TD_CMD;
-	uint64_t rs = I40E_TX_DESC_CMD_RS | I40E_TD_CMD;
+	uint64_t rs = CI_TX_DESC_CMD_RS | I40E_TD_CMD;
 
 	if (txq->nb_tx_free < txq->tx_free_thresh)
 		ci_tx_free_bufs_vec(txq, i40e_tx_desc_done, false);
@@ -854,8 +845,7 @@ i40e_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pkts,
 	tx_id = (uint16_t)(tx_id + nb_commit);
 	if (tx_id > txq->tx_next_rs) {
 		txq->ci_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
-			rte_cpu_to_le_64(((uint64_t)I40E_TX_DESC_CMD_RS) <<
-						I40E_TXD_QW1_CMD_SHIFT);
+			rte_cpu_to_le_64(((uint64_t)CI_TX_DESC_CMD_RS) << CI_TXD_QW1_CMD_S);
 		txq->tx_next_rs =
 			(uint16_t)(txq->tx_next_rs + txq->tx_rs_thresh);
 	}
diff --git a/drivers/net/intel/i40e/i40e_rxtx_vec_common.h b/drivers/net/intel/i40e/i40e_rxtx_vec_common.h
index 1fd7fc75bf..292a39501e 100644
--- a/drivers/net/intel/i40e/i40e_rxtx_vec_common.h
+++ b/drivers/net/intel/i40e/i40e_rxtx_vec_common.h
@@ -16,8 +16,8 @@ static inline int
 i40e_tx_desc_done(struct ci_tx_queue *txq, uint16_t idx)
 {
 	return (txq->ci_tx_ring[idx].cmd_type_offset_bsz &
-			rte_cpu_to_le_64(I40E_TXD_QW1_DTYPE_MASK)) ==
-				rte_cpu_to_le_64(I40E_TX_DESC_DTYPE_DESC_DONE);
+			rte_cpu_to_le_64(CI_TXD_QW1_DTYPE_M)) ==
+				rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE);
 }
 
 static inline void
diff --git a/drivers/net/intel/i40e/i40e_rxtx_vec_neon.c b/drivers/net/intel/i40e/i40e_rxtx_vec_neon.c
index 0b95152232..be4c64942e 100644
--- a/drivers/net/intel/i40e/i40e_rxtx_vec_neon.c
+++ b/drivers/net/intel/i40e/i40e_rxtx_vec_neon.c
@@ -600,9 +600,9 @@ static inline void
 vtx1(volatile struct ci_tx_desc *txdp,
 		struct rte_mbuf *pkt, uint64_t flags)
 {
-	uint64_t high_qw = (I40E_TX_DESC_DTYPE_DATA |
-			((uint64_t)flags  << I40E_TXD_QW1_CMD_SHIFT) |
-			((uint64_t)pkt->data_len << I40E_TXD_QW1_TX_BUF_SZ_SHIFT));
+	uint64_t high_qw = (CI_TX_DESC_DTYPE_DATA |
+			((uint64_t)flags << CI_TXD_QW1_CMD_S) |
+			((uint64_t)pkt->data_len << CI_TXD_QW1_TX_BUF_SZ_S));
 
 	uint64x2_t descriptor = {pkt->buf_iova + pkt->data_off, high_qw};
 	vst1q_u64(RTE_CAST_PTR(uint64_t *, txdp), descriptor);
@@ -627,7 +627,7 @@ i40e_xmit_fixed_burst_vec(void *__rte_restrict tx_queue,
 	struct ci_tx_entry_vec *txep;
 	uint16_t n, nb_commit, tx_id;
 	uint64_t flags = I40E_TD_CMD;
-	uint64_t rs = I40E_TX_DESC_CMD_RS | I40E_TD_CMD;
+	uint64_t rs = CI_TX_DESC_CMD_RS | I40E_TD_CMD;
 	int i;
 
 	if (txq->nb_tx_free < txq->tx_free_thresh)
@@ -669,8 +669,7 @@ i40e_xmit_fixed_burst_vec(void *__rte_restrict tx_queue,
 	tx_id = (uint16_t)(tx_id + nb_commit);
 	if (tx_id > txq->tx_next_rs) {
 		txq->ci_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
-			rte_cpu_to_le_64(((uint64_t)I40E_TX_DESC_CMD_RS) <<
-						I40E_TXD_QW1_CMD_SHIFT);
+			rte_cpu_to_le_64(((uint64_t)CI_TX_DESC_CMD_RS) << CI_TXD_QW1_CMD_S);
 		txq->tx_next_rs =
 			(uint16_t)(txq->tx_next_rs + txq->tx_rs_thresh);
 	}
diff --git a/drivers/net/intel/iavf/iavf_rxtx.c b/drivers/net/intel/iavf/iavf_rxtx.c
index 560abfc1ef..947b6c24d2 100644
--- a/drivers/net/intel/iavf/iavf_rxtx.c
+++ b/drivers/net/intel/iavf/iavf_rxtx.c
@@ -274,7 +274,7 @@ reset_tx_queue(struct ci_tx_queue *txq)
 	prev = (uint16_t)(txq->nb_tx_desc - 1);
 	for (i = 0; i < txq->nb_tx_desc; i++) {
 		txq->ci_tx_ring[i].cmd_type_offset_bsz =
-			rte_cpu_to_le_64(IAVF_TX_DESC_DTYPE_DESC_DONE);
+			rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE);
 		txe[i].mbuf =  NULL;
 		txe[i].last_id = i;
 		txe[prev].next_id = i;
@@ -2351,12 +2351,12 @@ iavf_fill_ctx_desc_cmd_field(volatile uint64_t *field, struct rte_mbuf *m,
 
 	/* TSO enabled */
 	if (m->ol_flags & (RTE_MBUF_F_TX_TCP_SEG | RTE_MBUF_F_TX_UDP_SEG))
-		cmd = IAVF_TX_CTX_DESC_TSO << IAVF_TXD_CTX_QW1_CMD_SHIFT;
+		cmd = CI_TX_CTX_DESC_TSO << IAVF_TXD_CTX_QW1_CMD_SHIFT;
 
 	if ((m->ol_flags & RTE_MBUF_F_TX_VLAN &&
 			vlan_flag & IAVF_TX_FLAGS_VLAN_TAG_LOC_L2TAG2) ||
 			m->ol_flags & RTE_MBUF_F_TX_QINQ) {
-		cmd |= IAVF_TX_CTX_DESC_IL2TAG2
+		cmd |= CI_TX_CTX_DESC_IL2TAG2
 			<< IAVF_TXD_CTX_QW1_CMD_SHIFT;
 	}
 
@@ -2577,20 +2577,20 @@ iavf_build_data_desc_cmd_offset_fields(volatile uint64_t *qw1,
 	uint64_t offset = 0;
 	uint64_t l2tag1 = 0;
 
-	*qw1 = IAVF_TX_DESC_DTYPE_DATA;
+	*qw1 = CI_TX_DESC_DTYPE_DATA;
 
-	command = (uint64_t)IAVF_TX_DESC_CMD_ICRC;
+	command = (uint64_t)CI_TX_DESC_CMD_ICRC;
 
 	/* Descriptor based VLAN insertion */
 	if ((vlan_flag & IAVF_TX_FLAGS_VLAN_TAG_LOC_L2TAG1) &&
 			m->ol_flags & RTE_MBUF_F_TX_VLAN) {
-		command |= (uint64_t)IAVF_TX_DESC_CMD_IL2TAG1;
+		command |= (uint64_t)CI_TX_DESC_CMD_IL2TAG1;
 		l2tag1 |= m->vlan_tci;
 	}
 
 	/* Descriptor based QinQ insertion. vlan_flag specifies outer tag location. */
 	if (m->ol_flags & RTE_MBUF_F_TX_QINQ) {
-		command |= (uint64_t)IAVF_TX_DESC_CMD_IL2TAG1;
+		command |= (uint64_t)CI_TX_DESC_CMD_IL2TAG1;
 		l2tag1 = vlan_flag & IAVF_TX_FLAGS_VLAN_TAG_LOC_L2TAG1 ? m->vlan_tci_outer :
 									m->vlan_tci;
 	}
@@ -2603,32 +2603,32 @@ iavf_build_data_desc_cmd_offset_fields(volatile uint64_t *qw1,
 	if (m->ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK &&
 			!(m->ol_flags & RTE_MBUF_F_TX_SEC_OFFLOAD))
 		offset |= (m->outer_l2_len >> 1)
-			<< IAVF_TX_DESC_LENGTH_MACLEN_SHIFT;
+			<< CI_TX_DESC_LEN_MACLEN_S;
 	else
 		offset |= (m->l2_len >> 1)
-			<< IAVF_TX_DESC_LENGTH_MACLEN_SHIFT;
+			<< CI_TX_DESC_LEN_MACLEN_S;
 
 	/* Enable L3 checksum offloading inner */
 	if (m->ol_flags & RTE_MBUF_F_TX_IP_CKSUM) {
 		if (m->ol_flags & RTE_MBUF_F_TX_IPV4) {
-			command |= IAVF_TX_DESC_CMD_IIPT_IPV4_CSUM;
-			offset |= (m->l3_len >> 2) << IAVF_TX_DESC_LENGTH_IPLEN_SHIFT;
+			command |= CI_TX_DESC_CMD_IIPT_IPV4_CSUM;
+			offset |= (m->l3_len >> 2) << CI_TX_DESC_LEN_IPLEN_S;
 		}
 	} else if (m->ol_flags & RTE_MBUF_F_TX_IPV4) {
-		command |= IAVF_TX_DESC_CMD_IIPT_IPV4;
-		offset |= (m->l3_len >> 2) << IAVF_TX_DESC_LENGTH_IPLEN_SHIFT;
+		command |= CI_TX_DESC_CMD_IIPT_IPV4;
+		offset |= (m->l3_len >> 2) << CI_TX_DESC_LEN_IPLEN_S;
 	} else if (m->ol_flags & RTE_MBUF_F_TX_IPV6) {
-		command |= IAVF_TX_DESC_CMD_IIPT_IPV6;
-		offset |= (m->l3_len >> 2) << IAVF_TX_DESC_LENGTH_IPLEN_SHIFT;
+		command |= CI_TX_DESC_CMD_IIPT_IPV6;
+		offset |= (m->l3_len >> 2) << CI_TX_DESC_LEN_IPLEN_S;
 	}
 
 	if (m->ol_flags & (RTE_MBUF_F_TX_TCP_SEG | RTE_MBUF_F_TX_UDP_SEG)) {
 		if (m->ol_flags & RTE_MBUF_F_TX_TCP_SEG)
-			command |= IAVF_TX_DESC_CMD_L4T_EOFT_TCP;
+			command |= CI_TX_DESC_CMD_L4T_EOFT_TCP;
 		else
-			command |= IAVF_TX_DESC_CMD_L4T_EOFT_UDP;
+			command |= CI_TX_DESC_CMD_L4T_EOFT_UDP;
 		offset |= (m->l4_len >> 2) <<
-			      IAVF_TX_DESC_LENGTH_L4_FC_LEN_SHIFT;
+			      CI_TX_DESC_LEN_L4_LEN_S;
 
 		*qw1 = rte_cpu_to_le_64((((uint64_t)command <<
 			IAVF_TXD_DATA_QW1_CMD_SHIFT) & IAVF_TXD_DATA_QW1_CMD_MASK) |
@@ -2642,19 +2642,19 @@ iavf_build_data_desc_cmd_offset_fields(volatile uint64_t *qw1,
 	/* Enable L4 checksum offloads */
 	switch (m->ol_flags & RTE_MBUF_F_TX_L4_MASK) {
 	case RTE_MBUF_F_TX_TCP_CKSUM:
-		command |= IAVF_TX_DESC_CMD_L4T_EOFT_TCP;
+		command |= CI_TX_DESC_CMD_L4T_EOFT_TCP;
 		offset |= (sizeof(struct rte_tcp_hdr) >> 2) <<
-				IAVF_TX_DESC_LENGTH_L4_FC_LEN_SHIFT;
+				CI_TX_DESC_LEN_L4_LEN_S;
 		break;
 	case RTE_MBUF_F_TX_SCTP_CKSUM:
-		command |= IAVF_TX_DESC_CMD_L4T_EOFT_SCTP;
+		command |= CI_TX_DESC_CMD_L4T_EOFT_SCTP;
 		offset |= (sizeof(struct rte_sctp_hdr) >> 2) <<
-				IAVF_TX_DESC_LENGTH_L4_FC_LEN_SHIFT;
+				CI_TX_DESC_LEN_L4_LEN_S;
 		break;
 	case RTE_MBUF_F_TX_UDP_CKSUM:
-		command |= IAVF_TX_DESC_CMD_L4T_EOFT_UDP;
+		command |= CI_TX_DESC_CMD_L4T_EOFT_UDP;
 		offset |= (sizeof(struct rte_udp_hdr) >> 2) <<
-				IAVF_TX_DESC_LENGTH_L4_FC_LEN_SHIFT;
+				CI_TX_DESC_LEN_L4_LEN_S;
 		break;
 	}
 
@@ -2674,8 +2674,7 @@ iavf_calc_pkt_desc(struct rte_mbuf *tx_pkt)
 	uint16_t count = 0;
 
 	while (txd != NULL) {
-		count += (txd->data_len + IAVF_MAX_DATA_PER_TXD - 1) /
-			IAVF_MAX_DATA_PER_TXD;
+		count += (txd->data_len + CI_MAX_DATA_PER_TXD - 1) / CI_MAX_DATA_PER_TXD;
 		txd = txd->next;
 	}
 
@@ -2881,14 +2880,14 @@ iavf_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 			buf_dma_addr = rte_mbuf_data_iova(mb_seg);
 			while ((mb_seg->ol_flags & (RTE_MBUF_F_TX_TCP_SEG |
 					RTE_MBUF_F_TX_UDP_SEG)) &&
-					unlikely(slen > IAVF_MAX_DATA_PER_TXD)) {
+					unlikely(slen > CI_MAX_DATA_PER_TXD)) {
 				iavf_fill_data_desc(ddesc, ddesc_template,
-					IAVF_MAX_DATA_PER_TXD, buf_dma_addr);
+					CI_MAX_DATA_PER_TXD, buf_dma_addr);
 
 				IAVF_DUMP_TX_DESC(txq, ddesc, desc_idx);
 
-				buf_dma_addr += IAVF_MAX_DATA_PER_TXD;
-				slen -= IAVF_MAX_DATA_PER_TXD;
+				buf_dma_addr += CI_MAX_DATA_PER_TXD;
+				slen -= CI_MAX_DATA_PER_TXD;
 
 				txe->last_id = desc_idx_last;
 				desc_idx = txe->next_id;
@@ -2909,7 +2908,7 @@ iavf_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 		} while (mb_seg);
 
 		/* The last packet data descriptor needs End Of Packet (EOP) */
-		ddesc_cmd = IAVF_TX_DESC_CMD_EOP;
+		ddesc_cmd = CI_TX_DESC_CMD_EOP;
 
 		txq->nb_tx_used = (uint16_t)(txq->nb_tx_used + nb_desc_required);
 		txq->nb_tx_free = (uint16_t)(txq->nb_tx_free - nb_desc_required);
@@ -2919,7 +2918,7 @@ iavf_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 				   "%4u (port=%d queue=%d)",
 				   desc_idx_last, txq->port_id, txq->queue_id);
 
-			ddesc_cmd |= IAVF_TX_DESC_CMD_RS;
+			ddesc_cmd |= CI_TX_DESC_CMD_RS;
 
 			/* Update txq RS bit counters */
 			txq->nb_tx_used = 0;
@@ -4423,9 +4422,8 @@ iavf_dev_tx_desc_status(void *tx_queue, uint16_t offset)
 	}
 
 	status = &txq->ci_tx_ring[desc].cmd_type_offset_bsz;
-	mask = rte_le_to_cpu_64(IAVF_TXD_QW1_DTYPE_MASK);
-	expect = rte_cpu_to_le_64(
-		 IAVF_TX_DESC_DTYPE_DESC_DONE << IAVF_TXD_QW1_DTYPE_SHIFT);
+	mask = rte_le_to_cpu_64(CI_TXD_QW1_DTYPE_M);
+	expect = rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE << CI_TXD_QW1_DTYPE_S);
 	if ((*status & mask) == expect)
 		return RTE_ETH_TX_DESC_DONE;
 
diff --git a/drivers/net/intel/iavf/iavf_rxtx.h b/drivers/net/intel/iavf/iavf_rxtx.h
index dd6d884fc1..395d97b4ee 100644
--- a/drivers/net/intel/iavf/iavf_rxtx.h
+++ b/drivers/net/intel/iavf/iavf_rxtx.h
@@ -162,10 +162,6 @@
 #define IAVF_TX_OFFLOAD_NOTSUP_MASK \
 		(RTE_MBUF_F_TX_OFFLOAD_MASK ^ IAVF_TX_OFFLOAD_MASK)
 
-/* HW requires that TX buffer size ranges from 1B up to (16K-1)B. */
-#define IAVF_MAX_DATA_PER_TXD \
-	(IAVF_TXD_QW1_TX_BUF_SZ_MASK >> IAVF_TXD_QW1_TX_BUF_SZ_SHIFT)
-
 #define IAVF_TX_LLDP_DYNFIELD "intel_pmd_dynfield_tx_lldp"
 #define IAVF_CHECK_TX_LLDP(m) \
 	((rte_pmd_iavf_tx_lldp_dynfield_offset > 0) && \
@@ -195,18 +191,6 @@ struct iavf_rx_queue_stats {
 	struct iavf_ipsec_crypto_stats ipsec_crypto;
 };
 
-/* Offload features */
-union iavf_tx_offload {
-	uint64_t data;
-	struct {
-		uint64_t l2_len:7; /* L2 (MAC) Header Length. */
-		uint64_t l3_len:9; /* L3 (IP) Header Length. */
-		uint64_t l4_len:8; /* L4 Header Length. */
-		uint64_t tso_segsz:16; /* TCP TSO segment size */
-		/* uint64_t unused : 24; */
-	};
-};
-
 /* Rx Flex Descriptor
  * RxDID Profile ID 16-21
  * Flex-field 0: RSS hash lower 16-bits
@@ -409,7 +393,7 @@ enum iavf_rx_flex_desc_ipsec_crypto_status {
 
 
 #define IAVF_TXD_DATA_QW1_DTYPE_SHIFT	(0)
-#define IAVF_TXD_DATA_QW1_DTYPE_MASK	(0xFUL << IAVF_TXD_QW1_DTYPE_SHIFT)
+#define IAVF_TXD_DATA_QW1_DTYPE_MASK	(0xFUL << CI_TXD_QW1_DTYPE_S)
 
 #define IAVF_TXD_DATA_QW1_CMD_SHIFT	(4)
 #define IAVF_TXD_DATA_QW1_CMD_MASK	(0x3FFUL << IAVF_TXD_DATA_QW1_CMD_SHIFT)
@@ -686,7 +670,7 @@ void iavf_dump_tx_descriptor(const struct ci_tx_queue *txq,
 		rte_le_to_cpu_64(tx_desc->cmd_type_offset_bsz &
 			rte_cpu_to_le_64(IAVF_TXD_DATA_QW1_DTYPE_MASK));
 	switch (type) {
-	case IAVF_TX_DESC_DTYPE_DATA:
+	case CI_TX_DESC_DTYPE_DATA:
 		name = "Tx_data_desc";
 		break;
 	case IAVF_TX_DESC_DTYPE_CONTEXT:
diff --git a/drivers/net/intel/iavf/iavf_rxtx_vec_avx2.c b/drivers/net/intel/iavf/iavf_rxtx_vec_avx2.c
index 89ce841b9e..cea4ee9863 100644
--- a/drivers/net/intel/iavf/iavf_rxtx_vec_avx2.c
+++ b/drivers/net/intel/iavf/iavf_rxtx_vec_avx2.c
@@ -1633,10 +1633,9 @@ static __rte_always_inline void
 iavf_vtx1(volatile struct ci_tx_desc *txdp,
 	  struct rte_mbuf *pkt, uint64_t flags, bool offload, uint8_t vlan_flag)
 {
-	uint64_t high_qw =
-		(IAVF_TX_DESC_DTYPE_DATA |
-		 ((uint64_t)flags  << IAVF_TXD_QW1_CMD_SHIFT) |
-		 ((uint64_t)pkt->data_len << IAVF_TXD_QW1_TX_BUF_SZ_SHIFT));
+	uint64_t high_qw = (CI_TX_DESC_DTYPE_DATA |
+		 ((uint64_t)flags << CI_TXD_QW1_CMD_S) |
+		 ((uint64_t)pkt->data_len << CI_TXD_QW1_TX_BUF_SZ_S));
 	if (offload)
 		iavf_txd_enable_offload(pkt, &high_qw, vlan_flag);
 
@@ -1649,8 +1648,7 @@ static __rte_always_inline void
 iavf_vtx(volatile struct ci_tx_desc *txdp,
 	 struct rte_mbuf **pkt, uint16_t nb_pkts,  uint64_t flags, bool offload, uint8_t vlan_flag)
 {
-	const uint64_t hi_qw_tmpl = (IAVF_TX_DESC_DTYPE_DATA |
-			((uint64_t)flags  << IAVF_TXD_QW1_CMD_SHIFT));
+	const uint64_t hi_qw_tmpl = (CI_TX_DESC_DTYPE_DATA | (flags << CI_TXD_QW1_CMD_S));
 
 	/* if unaligned on 32-bit boundary, do one to align */
 	if (((uintptr_t)txdp & 0x1F) != 0 && nb_pkts != 0) {
@@ -1660,28 +1658,20 @@ iavf_vtx(volatile struct ci_tx_desc *txdp,
 
 	/* do two at a time while possible, in bursts */
 	for (; nb_pkts > 3; txdp += 4, pkt += 4, nb_pkts -= 4) {
-		uint64_t hi_qw3 =
-			hi_qw_tmpl |
-			((uint64_t)pkt[3]->data_len <<
-			 IAVF_TXD_QW1_TX_BUF_SZ_SHIFT);
+		uint64_t hi_qw3 = hi_qw_tmpl |
+			((uint64_t)pkt[3]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
 		if (offload)
 			iavf_txd_enable_offload(pkt[3], &hi_qw3, vlan_flag);
-		uint64_t hi_qw2 =
-			hi_qw_tmpl |
-			((uint64_t)pkt[2]->data_len <<
-			 IAVF_TXD_QW1_TX_BUF_SZ_SHIFT);
+		uint64_t hi_qw2 = hi_qw_tmpl |
+			((uint64_t)pkt[2]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
 		if (offload)
 			iavf_txd_enable_offload(pkt[2], &hi_qw2, vlan_flag);
-		uint64_t hi_qw1 =
-			hi_qw_tmpl |
-			((uint64_t)pkt[1]->data_len <<
-			 IAVF_TXD_QW1_TX_BUF_SZ_SHIFT);
+		uint64_t hi_qw1 = hi_qw_tmpl |
+			((uint64_t)pkt[1]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
 		if (offload)
 			iavf_txd_enable_offload(pkt[1], &hi_qw1, vlan_flag);
-		uint64_t hi_qw0 =
-			hi_qw_tmpl |
-			((uint64_t)pkt[0]->data_len <<
-			 IAVF_TXD_QW1_TX_BUF_SZ_SHIFT);
+		uint64_t hi_qw0 = hi_qw_tmpl |
+			((uint64_t)pkt[0]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
 		if (offload)
 			iavf_txd_enable_offload(pkt[0], &hi_qw0, vlan_flag);
 
@@ -1717,8 +1707,8 @@ iavf_xmit_fixed_burst_vec_avx2(void *tx_queue, struct rte_mbuf **tx_pkts,
 	struct ci_tx_entry_vec *txep;
 	uint16_t n, nb_commit, tx_id;
 	/* bit2 is reserved and must be set to 1 according to Spec */
-	uint64_t flags = IAVF_TX_DESC_CMD_EOP | IAVF_TX_DESC_CMD_ICRC;
-	uint64_t rs = IAVF_TX_DESC_CMD_RS | flags;
+	uint64_t flags = CI_TX_DESC_CMD_EOP | CI_TX_DESC_CMD_ICRC;
+	uint64_t rs = CI_TX_DESC_CMD_RS | flags;
 
 	if (txq->nb_tx_free < txq->tx_free_thresh)
 		ci_tx_free_bufs_vec(txq, iavf_tx_desc_done, false);
@@ -1761,8 +1751,7 @@ iavf_xmit_fixed_burst_vec_avx2(void *tx_queue, struct rte_mbuf **tx_pkts,
 	tx_id = (uint16_t)(tx_id + nb_commit);
 	if (tx_id > txq->tx_next_rs) {
 		txq->ci_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
-			rte_cpu_to_le_64(((uint64_t)IAVF_TX_DESC_CMD_RS) <<
-					 IAVF_TXD_QW1_CMD_SHIFT);
+			rte_cpu_to_le_64(((uint64_t)CI_TX_DESC_CMD_RS) << CI_TXD_QW1_CMD_S);
 		txq->tx_next_rs =
 			(uint16_t)(txq->tx_next_rs + txq->tx_rs_thresh);
 	}
diff --git a/drivers/net/intel/iavf/iavf_rxtx_vec_avx512.c b/drivers/net/intel/iavf/iavf_rxtx_vec_avx512.c
index ad1b0b90cd..01477fd501 100644
--- a/drivers/net/intel/iavf/iavf_rxtx_vec_avx512.c
+++ b/drivers/net/intel/iavf/iavf_rxtx_vec_avx512.c
@@ -1844,10 +1844,9 @@ iavf_vtx1(volatile struct ci_tx_desc *txdp,
 	  struct rte_mbuf *pkt, uint64_t flags,
 	  bool offload, uint8_t vlan_flag)
 {
-	uint64_t high_qw =
-		(IAVF_TX_DESC_DTYPE_DATA |
-		 ((uint64_t)flags  << IAVF_TXD_QW1_CMD_SHIFT) |
-		 ((uint64_t)pkt->data_len << IAVF_TXD_QW1_TX_BUF_SZ_SHIFT));
+	uint64_t high_qw = (CI_TX_DESC_DTYPE_DATA |
+		 ((uint64_t)flags << CI_TXD_QW1_CMD_S) |
+		 ((uint64_t)pkt->data_len << CI_TXD_QW1_TX_BUF_SZ_S));
 	if (offload)
 		iavf_txd_enable_offload(pkt, &high_qw, vlan_flag);
 
@@ -1863,8 +1862,7 @@ iavf_vtx(volatile struct ci_tx_desc *txdp,
 		struct rte_mbuf **pkt, uint16_t nb_pkts,  uint64_t flags,
 		bool offload, uint8_t vlan_flag)
 {
-	const uint64_t hi_qw_tmpl = (IAVF_TX_DESC_DTYPE_DATA |
-			((uint64_t)flags  << IAVF_TXD_QW1_CMD_SHIFT));
+	const uint64_t hi_qw_tmpl = (CI_TX_DESC_DTYPE_DATA | (flags << CI_TXD_QW1_CMD_S));
 
 	/* if unaligned on 32-bit boundary, do one to align */
 	if (((uintptr_t)txdp & 0x1F) != 0 && nb_pkts != 0) {
@@ -1874,22 +1872,14 @@ iavf_vtx(volatile struct ci_tx_desc *txdp,
 
 	/* do 4 at a time while possible, in bursts */
 	for (; nb_pkts > 3; txdp += 4, pkt += 4, nb_pkts -= 4) {
-		uint64_t hi_qw3 =
-			hi_qw_tmpl |
-			((uint64_t)pkt[3]->data_len <<
-			 IAVF_TXD_QW1_TX_BUF_SZ_SHIFT);
-		uint64_t hi_qw2 =
-			hi_qw_tmpl |
-			((uint64_t)pkt[2]->data_len <<
-			 IAVF_TXD_QW1_TX_BUF_SZ_SHIFT);
-		uint64_t hi_qw1 =
-			hi_qw_tmpl |
-			((uint64_t)pkt[1]->data_len <<
-			 IAVF_TXD_QW1_TX_BUF_SZ_SHIFT);
-		uint64_t hi_qw0 =
-			hi_qw_tmpl |
-			((uint64_t)pkt[0]->data_len <<
-			 IAVF_TXD_QW1_TX_BUF_SZ_SHIFT);
+		uint64_t hi_qw3 = hi_qw_tmpl |
+			((uint64_t)pkt[3]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
+		uint64_t hi_qw2 = hi_qw_tmpl |
+			((uint64_t)pkt[2]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
+		uint64_t hi_qw1 = hi_qw_tmpl |
+			((uint64_t)pkt[1]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
+		uint64_t hi_qw0 = hi_qw_tmpl |
+			((uint64_t)pkt[0]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
 		if (offload) {
 			iavf_txd_enable_offload(pkt[3], &hi_qw3, vlan_flag);
 			iavf_txd_enable_offload(pkt[2], &hi_qw2, vlan_flag);
@@ -2093,9 +2083,9 @@ ctx_vtx1(volatile struct ci_tx_desc *txdp, struct rte_mbuf *pkt,
 	if (IAVF_CHECK_TX_LLDP(pkt))
 		high_ctx_qw |= IAVF_TX_CTX_DESC_SWTCH_UPLINK
 			<< IAVF_TXD_CTX_QW1_CMD_SHIFT;
-	uint64_t high_data_qw = (IAVF_TX_DESC_DTYPE_DATA |
-				((uint64_t)flags  << IAVF_TXD_QW1_CMD_SHIFT) |
-				((uint64_t)pkt->data_len << IAVF_TXD_QW1_TX_BUF_SZ_SHIFT));
+	uint64_t high_data_qw = (CI_TX_DESC_DTYPE_DATA |
+				((uint64_t)flags << CI_TXD_QW1_CMD_S) |
+				((uint64_t)pkt->data_len << CI_TXD_QW1_TX_BUF_SZ_S));
 	if (offload)
 		iavf_txd_enable_offload(pkt, &high_data_qw, vlan_flag);
 
@@ -2110,8 +2100,7 @@ ctx_vtx(volatile struct ci_tx_desc *txdp,
 		struct rte_mbuf **pkt, uint16_t nb_pkts,  uint64_t flags,
 		bool offload, uint8_t vlan_flag)
 {
-	uint64_t hi_data_qw_tmpl = (IAVF_TX_DESC_DTYPE_DATA |
-					((uint64_t)flags  << IAVF_TXD_QW1_CMD_SHIFT));
+	uint64_t hi_data_qw_tmpl = (CI_TX_DESC_DTYPE_DATA | (flags << CI_TXD_QW1_CMD_S));
 
 	/* if unaligned on 32-bit boundary, do one to align */
 	if (((uintptr_t)txdp & 0x1F) != 0 && nb_pkts != 0) {
@@ -2128,11 +2117,9 @@ ctx_vtx(volatile struct ci_tx_desc *txdp,
 		uint64_t hi_data_qw0 = 0;
 
 		hi_data_qw1 = hi_data_qw_tmpl |
-				((uint64_t)pkt[1]->data_len <<
-					IAVF_TXD_QW1_TX_BUF_SZ_SHIFT);
+				((uint64_t)pkt[1]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
 		hi_data_qw0 = hi_data_qw_tmpl |
-				((uint64_t)pkt[0]->data_len <<
-					IAVF_TXD_QW1_TX_BUF_SZ_SHIFT);
+				((uint64_t)pkt[0]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
 
 #ifdef IAVF_TX_VLAN_QINQ_OFFLOAD
 		if (offload) {
@@ -2140,13 +2127,11 @@ ctx_vtx(volatile struct ci_tx_desc *txdp,
 				uint64_t qinq_tag = vlan_flag & IAVF_TX_FLAGS_VLAN_TAG_LOC_L2TAG2 ?
 					(uint64_t)pkt[1]->vlan_tci :
 					(uint64_t)pkt[1]->vlan_tci_outer;
-				hi_ctx_qw1 |= IAVF_TX_CTX_DESC_IL2TAG2 <<
-						IAVF_TXD_CTX_QW1_CMD_SHIFT;
+				hi_ctx_qw1 |= CI_TX_CTX_DESC_IL2TAG2 << CI_TXD_QW1_CMD_S;
 				low_ctx_qw1 |= qinq_tag << IAVF_TXD_CTX_QW0_L2TAG2_PARAM;
 			} else if (pkt[1]->ol_flags & RTE_MBUF_F_TX_VLAN &&
 					vlan_flag & IAVF_TX_FLAGS_VLAN_TAG_LOC_L2TAG2) {
-				hi_ctx_qw1 |=
-					IAVF_TX_CTX_DESC_IL2TAG2 << IAVF_TXD_CTX_QW1_CMD_SHIFT;
+				hi_ctx_qw1 |= CI_TX_CTX_DESC_IL2TAG2 << CI_TXD_QW1_CMD_S;
 				low_ctx_qw1 |=
 					(uint64_t)pkt[1]->vlan_tci << IAVF_TXD_CTX_QW0_L2TAG2_PARAM;
 			}
@@ -2154,7 +2139,7 @@ ctx_vtx(volatile struct ci_tx_desc *txdp,
 #endif
 		if (IAVF_CHECK_TX_LLDP(pkt[1]))
 			hi_ctx_qw1 |= IAVF_TX_CTX_DESC_SWTCH_UPLINK
-				<< IAVF_TXD_CTX_QW1_CMD_SHIFT;
+				<< CI_TXD_QW1_CMD_S;
 
 #ifdef IAVF_TX_VLAN_QINQ_OFFLOAD
 		if (offload) {
@@ -2162,21 +2147,18 @@ ctx_vtx(volatile struct ci_tx_desc *txdp,
 				uint64_t qinq_tag = vlan_flag & IAVF_TX_FLAGS_VLAN_TAG_LOC_L2TAG2 ?
 					(uint64_t)pkt[0]->vlan_tci :
 					(uint64_t)pkt[0]->vlan_tci_outer;
-				hi_ctx_qw0 |= IAVF_TX_CTX_DESC_IL2TAG2 <<
-						IAVF_TXD_CTX_QW1_CMD_SHIFT;
+				hi_ctx_qw0 |= CI_TX_CTX_DESC_IL2TAG2 << CI_TXD_QW1_CMD_S;
 				low_ctx_qw0 |= qinq_tag << IAVF_TXD_CTX_QW0_L2TAG2_PARAM;
 			} else if (pkt[0]->ol_flags & RTE_MBUF_F_TX_VLAN &&
 					vlan_flag & IAVF_TX_FLAGS_VLAN_TAG_LOC_L2TAG2) {
-				hi_ctx_qw0 |=
-					IAVF_TX_CTX_DESC_IL2TAG2 << IAVF_TXD_CTX_QW1_CMD_SHIFT;
+				hi_ctx_qw0 |= CI_TX_CTX_DESC_IL2TAG2 << CI_TXD_QW1_CMD_S;
 				low_ctx_qw0 |=
 					(uint64_t)pkt[0]->vlan_tci << IAVF_TXD_CTX_QW0_L2TAG2_PARAM;
 			}
 		}
 #endif
 		if (IAVF_CHECK_TX_LLDP(pkt[0]))
-			hi_ctx_qw0 |= IAVF_TX_CTX_DESC_SWTCH_UPLINK
-				<< IAVF_TXD_CTX_QW1_CMD_SHIFT;
+			hi_ctx_qw0 |= IAVF_TX_CTX_DESC_SWTCH_UPLINK << CI_TXD_QW1_CMD_S;
 
 		if (offload) {
 			iavf_txd_enable_offload(pkt[1], &hi_data_qw1, vlan_flag);
@@ -2207,8 +2189,8 @@ iavf_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pkts,
 	struct ci_tx_entry_vec *txep;
 	uint16_t n, nb_commit, tx_id;
 	/* bit2 is reserved and must be set to 1 according to Spec */
-	uint64_t flags = IAVF_TX_DESC_CMD_EOP | IAVF_TX_DESC_CMD_ICRC;
-	uint64_t rs = IAVF_TX_DESC_CMD_RS | flags;
+	uint64_t flags = CI_TX_DESC_CMD_EOP | CI_TX_DESC_CMD_ICRC;
+	uint64_t rs = CI_TX_DESC_CMD_RS | flags;
 
 	if (txq->nb_tx_free < txq->tx_free_thresh)
 		ci_tx_free_bufs_vec(txq, iavf_tx_desc_done, false);
@@ -2253,8 +2235,7 @@ iavf_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pkts,
 	tx_id = (uint16_t)(tx_id + nb_commit);
 	if (tx_id > txq->tx_next_rs) {
 		txq->ci_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
-			rte_cpu_to_le_64(((uint64_t)IAVF_TX_DESC_CMD_RS) <<
-					 IAVF_TXD_QW1_CMD_SHIFT);
+			rte_cpu_to_le_64(((uint64_t)CI_TX_DESC_CMD_RS) << CI_TXD_QW1_CMD_S);
 		txq->tx_next_rs =
 			(uint16_t)(txq->tx_next_rs + txq->tx_rs_thresh);
 	}
@@ -2275,8 +2256,8 @@ iavf_xmit_fixed_burst_vec_avx512_ctx(void *tx_queue, struct rte_mbuf **tx_pkts,
 	struct ci_tx_entry_vec *txep;
 	uint16_t n, nb_commit, nb_mbuf, tx_id;
 	/* bit2 is reserved and must be set to 1 according to Spec */
-	uint64_t flags = IAVF_TX_DESC_CMD_EOP | IAVF_TX_DESC_CMD_ICRC;
-	uint64_t rs = IAVF_TX_DESC_CMD_RS | flags;
+	uint64_t flags = CI_TX_DESC_CMD_EOP | CI_TX_DESC_CMD_ICRC;
+	uint64_t rs = CI_TX_DESC_CMD_RS | flags;
 
 	if (txq->nb_tx_free < txq->tx_free_thresh)
 		ci_tx_free_bufs_vec(txq, iavf_tx_desc_done, true);
@@ -2321,8 +2302,7 @@ iavf_xmit_fixed_burst_vec_avx512_ctx(void *tx_queue, struct rte_mbuf **tx_pkts,
 
 	if (tx_id > txq->tx_next_rs) {
 		txq->ci_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
-			rte_cpu_to_le_64(((uint64_t)IAVF_TX_DESC_CMD_RS) <<
-					 IAVF_TXD_QW1_CMD_SHIFT);
+			rte_cpu_to_le_64(((uint64_t)CI_TX_DESC_CMD_RS) << CI_TXD_QW1_CMD_S);
 		txq->tx_next_rs =
 			(uint16_t)(txq->tx_next_rs + txq->tx_rs_thresh);
 	}
diff --git a/drivers/net/intel/iavf/iavf_rxtx_vec_common.h b/drivers/net/intel/iavf/iavf_rxtx_vec_common.h
index 1832b76f89..1538a44892 100644
--- a/drivers/net/intel/iavf/iavf_rxtx_vec_common.h
+++ b/drivers/net/intel/iavf/iavf_rxtx_vec_common.h
@@ -15,8 +15,8 @@ static inline int
 iavf_tx_desc_done(struct ci_tx_queue *txq, uint16_t idx)
 {
 	return (txq->ci_tx_ring[idx].cmd_type_offset_bsz &
-			rte_cpu_to_le_64(IAVF_TXD_QW1_DTYPE_MASK)) ==
-				rte_cpu_to_le_64(IAVF_TX_DESC_DTYPE_DESC_DONE);
+			rte_cpu_to_le_64(CI_TXD_QW1_DTYPE_M)) ==
+				rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE);
 }
 
 static inline void
@@ -147,26 +147,26 @@ iavf_txd_enable_offload(__rte_unused struct rte_mbuf *tx_pkt,
 	/* Set MACLEN */
 	if (ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK)
 		td_offset |= (tx_pkt->outer_l2_len >> 1)
-			<< IAVF_TX_DESC_LENGTH_MACLEN_SHIFT;
+			<< CI_TX_DESC_LEN_MACLEN_S;
 	else
 		td_offset |= (tx_pkt->l2_len >> 1)
-			<< IAVF_TX_DESC_LENGTH_MACLEN_SHIFT;
+			<< CI_TX_DESC_LEN_MACLEN_S;
 
 	/* Enable L3 checksum offloads */
 	if (ol_flags & RTE_MBUF_F_TX_IP_CKSUM) {
 		if (ol_flags & RTE_MBUF_F_TX_IPV4) {
-			td_cmd |= IAVF_TX_DESC_CMD_IIPT_IPV4_CSUM;
+			td_cmd |= CI_TX_DESC_CMD_IIPT_IPV4_CSUM;
 			td_offset |= (tx_pkt->l3_len >> 2) <<
-				     IAVF_TX_DESC_LENGTH_IPLEN_SHIFT;
+				     CI_TX_DESC_LEN_IPLEN_S;
 		}
 	} else if (ol_flags & RTE_MBUF_F_TX_IPV4) {
-		td_cmd |= IAVF_TX_DESC_CMD_IIPT_IPV4;
+		td_cmd |= CI_TX_DESC_CMD_IIPT_IPV4;
 		td_offset |= (tx_pkt->l3_len >> 2) <<
-			     IAVF_TX_DESC_LENGTH_IPLEN_SHIFT;
+			     CI_TX_DESC_LEN_IPLEN_S;
 	} else if (ol_flags & RTE_MBUF_F_TX_IPV6) {
-		td_cmd |= IAVF_TX_DESC_CMD_IIPT_IPV6;
+		td_cmd |= CI_TX_DESC_CMD_IIPT_IPV6;
 		td_offset |= (tx_pkt->l3_len >> 2) <<
-			     IAVF_TX_DESC_LENGTH_IPLEN_SHIFT;
+			     CI_TX_DESC_LEN_IPLEN_S;
 	}
 
 	/* Enable L4 checksum offloads */
@@ -190,7 +190,7 @@ iavf_txd_enable_offload(__rte_unused struct rte_mbuf *tx_pkt,
 		break;
 	}
 
-	*txd_hi |= ((uint64_t)td_offset) << IAVF_TXD_QW1_OFFSET_SHIFT;
+	*txd_hi |= ((uint64_t)td_offset) << CI_TXD_QW1_OFFSET_S;
 #endif
 
 #ifdef IAVF_TX_VLAN_QINQ_OFFLOAD
@@ -198,17 +198,15 @@ iavf_txd_enable_offload(__rte_unused struct rte_mbuf *tx_pkt,
 		td_cmd |= IAVF_TX_DESC_CMD_IL2TAG1;
 		/* vlan_flag specifies outer tag location for QinQ. */
 		if (vlan_flag & IAVF_TX_FLAGS_VLAN_TAG_LOC_L2TAG1)
-			*txd_hi |= ((uint64_t)tx_pkt->vlan_tci_outer <<
-					IAVF_TXD_QW1_L2TAG1_SHIFT);
+			*txd_hi |= ((uint64_t)tx_pkt->vlan_tci_outer << CI_TXD_QW1_L2TAG1_S);
 		else
-			*txd_hi |= ((uint64_t)tx_pkt->vlan_tci <<
-					IAVF_TXD_QW1_L2TAG1_SHIFT);
+			*txd_hi |= ((uint64_t)tx_pkt->vlan_tci << CI_TXD_QW1_L2TAG1_S);
 	} else if (ol_flags & RTE_MBUF_F_TX_VLAN && vlan_flag & IAVF_TX_FLAGS_VLAN_TAG_LOC_L2TAG1) {
-		td_cmd |= IAVF_TX_DESC_CMD_IL2TAG1;
-		*txd_hi |= ((uint64_t)tx_pkt->vlan_tci << IAVF_TXD_QW1_L2TAG1_SHIFT);
+		td_cmd |= CI_TX_DESC_CMD_IL2TAG1;
+		*txd_hi |= ((uint64_t)tx_pkt->vlan_tci << CI_TXD_QW1_L2TAG1_S);
 	}
 #endif
 
-	*txd_hi |= ((uint64_t)td_cmd) << IAVF_TXD_QW1_CMD_SHIFT;
+	*txd_hi |= ((uint64_t)td_cmd) << CI_TXD_QW1_CMD_S;
 }
 #endif
diff --git a/drivers/net/intel/ice/ice_dcf_ethdev.c b/drivers/net/intel/ice/ice_dcf_ethdev.c
index 5f537b4c12..4ceecc15c6 100644
--- a/drivers/net/intel/ice/ice_dcf_ethdev.c
+++ b/drivers/net/intel/ice/ice_dcf_ethdev.c
@@ -406,7 +406,7 @@ reset_tx_queue(struct ci_tx_queue *txq)
 	prev = (uint16_t)(txq->nb_tx_desc - 1);
 	for (i = 0; i < txq->nb_tx_desc; i++) {
 		txq->ci_tx_ring[i].cmd_type_offset_bsz =
-			rte_cpu_to_le_64(IAVF_TX_DESC_DTYPE_DESC_DONE);
+			rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE);
 		txe[i].mbuf =  NULL;
 		txe[i].last_id = i;
 		txe[prev].next_id = i;
diff --git a/drivers/net/intel/ice/ice_rxtx.c b/drivers/net/intel/ice/ice_rxtx.c
index 7a33e1e980..52bbf95967 100644
--- a/drivers/net/intel/ice/ice_rxtx.c
+++ b/drivers/net/intel/ice/ice_rxtx.c
@@ -1124,7 +1124,7 @@ ice_reset_tx_queue(struct ci_tx_queue *txq)
 		volatile struct ci_tx_desc *txd = &txq->ci_tx_ring[i];
 
 		txd->cmd_type_offset_bsz =
-			rte_cpu_to_le_64(ICE_TX_DESC_DTYPE_DESC_DONE);
+			rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE);
 		txe[i].mbuf =  NULL;
 		txe[i].last_id = i;
 		txe[prev].next_id = i;
@@ -2556,9 +2556,8 @@ ice_tx_descriptor_status(void *tx_queue, uint16_t offset)
 	}
 
 	status = &txq->ci_tx_ring[desc].cmd_type_offset_bsz;
-	mask = rte_cpu_to_le_64(ICE_TXD_QW1_DTYPE_M);
-	expect = rte_cpu_to_le_64(ICE_TX_DESC_DTYPE_DESC_DONE <<
-				  ICE_TXD_QW1_DTYPE_S);
+	mask = rte_cpu_to_le_64(CI_TXD_QW1_DTYPE_M);
+	expect = rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE << CI_TXD_QW1_DTYPE_S);
 	if ((*status & mask) == expect)
 		return RTE_ETH_TX_DESC_DONE;
 
@@ -2904,7 +2903,7 @@ ice_recv_pkts(void *rx_queue,
 
 static inline void
 ice_parse_tunneling_params(uint64_t ol_flags,
-			    union ice_tx_offload tx_offload,
+			    union ci_tx_offload tx_offload,
 			    uint32_t *cd_tunneling)
 {
 	/* EIPT: External (outer) IP header type */
@@ -2965,58 +2964,58 @@ static inline void
 ice_txd_enable_checksum(uint64_t ol_flags,
 			uint32_t *td_cmd,
 			uint32_t *td_offset,
-			union ice_tx_offload tx_offload)
+			union ci_tx_offload tx_offload)
 {
 	/* Set MACLEN */
 	if (!(ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK))
 		*td_offset |= (tx_offload.l2_len >> 1)
-			<< ICE_TX_DESC_LEN_MACLEN_S;
+			<< CI_TX_DESC_LEN_MACLEN_S;
 
 	/* Enable L3 checksum offloads */
 	if (ol_flags & RTE_MBUF_F_TX_IP_CKSUM) {
-		*td_cmd |= ICE_TX_DESC_CMD_IIPT_IPV4_CSUM;
+		*td_cmd |= CI_TX_DESC_CMD_IIPT_IPV4_CSUM;
 		*td_offset |= (tx_offload.l3_len >> 2) <<
-			ICE_TX_DESC_LEN_IPLEN_S;
+			CI_TX_DESC_LEN_IPLEN_S;
 	} else if (ol_flags & RTE_MBUF_F_TX_IPV4) {
-		*td_cmd |= ICE_TX_DESC_CMD_IIPT_IPV4;
+		*td_cmd |= CI_TX_DESC_CMD_IIPT_IPV4;
 		*td_offset |= (tx_offload.l3_len >> 2) <<
-			ICE_TX_DESC_LEN_IPLEN_S;
+			CI_TX_DESC_LEN_IPLEN_S;
 	} else if (ol_flags & RTE_MBUF_F_TX_IPV6) {
-		*td_cmd |= ICE_TX_DESC_CMD_IIPT_IPV6;
+		*td_cmd |= CI_TX_DESC_CMD_IIPT_IPV6;
 		*td_offset |= (tx_offload.l3_len >> 2) <<
-			ICE_TX_DESC_LEN_IPLEN_S;
+			CI_TX_DESC_LEN_IPLEN_S;
 	}
 
 	if (ol_flags & RTE_MBUF_F_TX_TCP_SEG) {
-		*td_cmd |= ICE_TX_DESC_CMD_L4T_EOFT_TCP;
+		*td_cmd |= CI_TX_DESC_CMD_L4T_EOFT_TCP;
 		*td_offset |= (tx_offload.l4_len >> 2) <<
-			      ICE_TX_DESC_LEN_L4_LEN_S;
+			      CI_TX_DESC_LEN_L4_LEN_S;
 		return;
 	}
 
 	if (ol_flags & RTE_MBUF_F_TX_UDP_SEG) {
-		*td_cmd |= ICE_TX_DESC_CMD_L4T_EOFT_UDP;
+		*td_cmd |= CI_TX_DESC_CMD_L4T_EOFT_UDP;
 		*td_offset |= (tx_offload.l4_len >> 2) <<
-			      ICE_TX_DESC_LEN_L4_LEN_S;
+			      CI_TX_DESC_LEN_L4_LEN_S;
 		return;
 	}
 
 	/* Enable L4 checksum offloads */
 	switch (ol_flags & RTE_MBUF_F_TX_L4_MASK) {
 	case RTE_MBUF_F_TX_TCP_CKSUM:
-		*td_cmd |= ICE_TX_DESC_CMD_L4T_EOFT_TCP;
+		*td_cmd |= CI_TX_DESC_CMD_L4T_EOFT_TCP;
 		*td_offset |= (sizeof(struct rte_tcp_hdr) >> 2) <<
-			      ICE_TX_DESC_LEN_L4_LEN_S;
+			      CI_TX_DESC_LEN_L4_LEN_S;
 		break;
 	case RTE_MBUF_F_TX_SCTP_CKSUM:
-		*td_cmd |= ICE_TX_DESC_CMD_L4T_EOFT_SCTP;
+		*td_cmd |= CI_TX_DESC_CMD_L4T_EOFT_SCTP;
 		*td_offset |= (sizeof(struct rte_sctp_hdr) >> 2) <<
-			      ICE_TX_DESC_LEN_L4_LEN_S;
+			      CI_TX_DESC_LEN_L4_LEN_S;
 		break;
 	case RTE_MBUF_F_TX_UDP_CKSUM:
-		*td_cmd |= ICE_TX_DESC_CMD_L4T_EOFT_UDP;
+		*td_cmd |= CI_TX_DESC_CMD_L4T_EOFT_UDP;
 		*td_offset |= (sizeof(struct rte_udp_hdr) >> 2) <<
-			      ICE_TX_DESC_LEN_L4_LEN_S;
+			      CI_TX_DESC_LEN_L4_LEN_S;
 		break;
 	default:
 		break;
@@ -3030,11 +3029,11 @@ ice_build_ctob(uint32_t td_cmd,
 	       uint16_t size,
 	       uint32_t td_tag)
 {
-	return rte_cpu_to_le_64(ICE_TX_DESC_DTYPE_DATA |
-				((uint64_t)td_cmd << ICE_TXD_QW1_CMD_S) |
-				((uint64_t)td_offset << ICE_TXD_QW1_OFFSET_S) |
-				((uint64_t)size << ICE_TXD_QW1_TX_BUF_SZ_S) |
-				((uint64_t)td_tag << ICE_TXD_QW1_L2TAG1_S));
+	return rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DATA |
+				((uint64_t)td_cmd << CI_TXD_QW1_CMD_S) |
+				((uint64_t)td_offset << CI_TXD_QW1_OFFSET_S) |
+				((uint64_t)size << CI_TXD_QW1_TX_BUF_SZ_S) |
+				((uint64_t)td_tag << CI_TXD_QW1_L2TAG1_S));
 }
 
 /* Check if the context descriptor is needed for TX offloading */
@@ -3053,7 +3052,7 @@ ice_calc_context_desc(uint64_t flags)
 
 /* set ice TSO context descriptor */
 static inline uint64_t
-ice_set_tso_ctx(struct rte_mbuf *mbuf, union ice_tx_offload tx_offload)
+ice_set_tso_ctx(struct rte_mbuf *mbuf, union ci_tx_offload tx_offload)
 {
 	uint64_t ctx_desc = 0;
 	uint32_t cd_cmd, hdr_len, cd_tso_len;
@@ -3067,18 +3066,15 @@ ice_set_tso_ctx(struct rte_mbuf *mbuf, union ice_tx_offload tx_offload)
 	hdr_len += (mbuf->ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK) ?
 		   tx_offload.outer_l2_len + tx_offload.outer_l3_len : 0;
 
-	cd_cmd = ICE_TX_CTX_DESC_TSO;
+	cd_cmd = CI_TX_CTX_DESC_TSO;
 	cd_tso_len = mbuf->pkt_len - hdr_len;
-	ctx_desc |= ((uint64_t)cd_cmd << ICE_TXD_CTX_QW1_CMD_S) |
+	ctx_desc |= ((uint64_t)cd_cmd << CI_TXD_QW1_CMD_S) |
 		    ((uint64_t)cd_tso_len << ICE_TXD_CTX_QW1_TSO_LEN_S) |
 		    ((uint64_t)mbuf->tso_segsz << ICE_TXD_CTX_QW1_MSS_S);
 
 	return ctx_desc;
 }
 
-/* HW requires that TX buffer size ranges from 1B up to (16K-1)B. */
-#define ICE_MAX_DATA_PER_TXD \
-	(ICE_TXD_QW1_TX_BUF_SZ_M >> ICE_TXD_QW1_TX_BUF_SZ_S)
 /* Calculate the number of TX descriptors needed for each pkt */
 static inline uint16_t
 ice_calc_pkt_desc(struct rte_mbuf *tx_pkt)
@@ -3087,7 +3083,7 @@ ice_calc_pkt_desc(struct rte_mbuf *tx_pkt)
 	uint16_t count = 0;
 
 	while (txd != NULL) {
-		count += DIV_ROUND_UP(txd->data_len, ICE_MAX_DATA_PER_TXD);
+		count += DIV_ROUND_UP(txd->data_len, CI_MAX_DATA_PER_TXD);
 		txd = txd->next;
 	}
 
@@ -3117,7 +3113,7 @@ ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 	uint16_t slen;
 	uint64_t buf_dma_addr;
 	uint64_t ol_flags;
-	union ice_tx_offload tx_offload = {0};
+	union ci_tx_offload tx_offload = {0};
 
 	txq = tx_queue;
 	sw_ring = txq->sw_ring;
@@ -3185,7 +3181,7 @@ ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 
 		/* Descriptor based VLAN insertion */
 		if (ol_flags & (RTE_MBUF_F_TX_VLAN | RTE_MBUF_F_TX_QINQ)) {
-			td_cmd |= ICE_TX_DESC_CMD_IL2TAG1;
+			td_cmd |= CI_TX_DESC_CMD_IL2TAG1;
 			td_tag = tx_pkt->vlan_tci;
 		}
 
@@ -3193,7 +3189,7 @@ ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 		cd_tunneling_params = 0;
 		if (ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK) {
 			td_offset |= (tx_offload.outer_l2_len >> 1)
-				<< ICE_TX_DESC_LEN_MACLEN_S;
+				<< CI_TX_DESC_LEN_MACLEN_S;
 			ice_parse_tunneling_params(ol_flags, tx_offload,
 						   &cd_tunneling_params);
 		}
@@ -3223,8 +3219,8 @@ ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 					ice_set_tso_ctx(tx_pkt, tx_offload);
 			else if (ol_flags & RTE_MBUF_F_TX_IEEE1588_TMST)
 				cd_type_cmd_tso_mss |=
-					((uint64_t)ICE_TX_CTX_DESC_TSYN <<
-					ICE_TXD_CTX_QW1_CMD_S) |
+					((uint64_t)CI_TX_CTX_DESC_TSYN <<
+					CI_TXD_QW1_CMD_S) |
 					 (((uint64_t)txq->ice_vsi->adapter->ptp_tx_index <<
 					 ICE_TXD_CTX_QW1_TSYN_S) & ICE_TXD_CTX_QW1_TSYN_M);
 
@@ -3235,8 +3231,8 @@ ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 			if (ol_flags & RTE_MBUF_F_TX_QINQ) {
 				cd_l2tag2 = tx_pkt->vlan_tci_outer;
 				cd_type_cmd_tso_mss |=
-					((uint64_t)ICE_TX_CTX_DESC_IL2TAG2 <<
-					 ICE_TXD_CTX_QW1_CMD_S);
+					((uint64_t)CI_TX_CTX_DESC_IL2TAG2 <<
+					 CI_TXD_QW1_CMD_S);
 			}
 			ctx_txd->l2tag2 = rte_cpu_to_le_16(cd_l2tag2);
 			ctx_txd->qw1 =
@@ -3261,18 +3257,16 @@ ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 			buf_dma_addr = rte_mbuf_data_iova(m_seg);
 
 			while ((ol_flags & (RTE_MBUF_F_TX_TCP_SEG | RTE_MBUF_F_TX_UDP_SEG)) &&
-				unlikely(slen > ICE_MAX_DATA_PER_TXD)) {
+					unlikely(slen > CI_MAX_DATA_PER_TXD)) {
 				txd->buffer_addr = rte_cpu_to_le_64(buf_dma_addr);
-				txd->cmd_type_offset_bsz =
-				rte_cpu_to_le_64(ICE_TX_DESC_DTYPE_DATA |
-				((uint64_t)td_cmd << ICE_TXD_QW1_CMD_S) |
-				((uint64_t)td_offset << ICE_TXD_QW1_OFFSET_S) |
-				((uint64_t)ICE_MAX_DATA_PER_TXD <<
-				 ICE_TXD_QW1_TX_BUF_SZ_S) |
-				((uint64_t)td_tag << ICE_TXD_QW1_L2TAG1_S));
+				txd->cmd_type_offset_bsz = rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DATA |
+					((uint64_t)td_cmd << CI_TXD_QW1_CMD_S) |
+					((uint64_t)td_offset << CI_TXD_QW1_OFFSET_S) |
+					((uint64_t)CI_MAX_DATA_PER_TXD << CI_TXD_QW1_TX_BUF_SZ_S) |
+					((uint64_t)td_tag << CI_TXD_QW1_L2TAG1_S));
 
-				buf_dma_addr += ICE_MAX_DATA_PER_TXD;
-				slen -= ICE_MAX_DATA_PER_TXD;
+				buf_dma_addr += CI_MAX_DATA_PER_TXD;
+				slen -= CI_MAX_DATA_PER_TXD;
 
 				txe->last_id = tx_last;
 				tx_id = txe->next_id;
@@ -3282,12 +3276,11 @@ ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 			}
 
 			txd->buffer_addr = rte_cpu_to_le_64(buf_dma_addr);
-			txd->cmd_type_offset_bsz =
-				rte_cpu_to_le_64(ICE_TX_DESC_DTYPE_DATA |
-				((uint64_t)td_cmd << ICE_TXD_QW1_CMD_S) |
-				((uint64_t)td_offset << ICE_TXD_QW1_OFFSET_S) |
-				((uint64_t)slen << ICE_TXD_QW1_TX_BUF_SZ_S) |
-				((uint64_t)td_tag << ICE_TXD_QW1_L2TAG1_S));
+			txd->cmd_type_offset_bsz = rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DATA |
+				((uint64_t)td_cmd << CI_TXD_QW1_CMD_S) |
+				((uint64_t)td_offset << CI_TXD_QW1_OFFSET_S) |
+				((uint64_t)slen << CI_TXD_QW1_TX_BUF_SZ_S) |
+				((uint64_t)td_tag << CI_TXD_QW1_L2TAG1_S));
 
 			txe->last_id = tx_last;
 			tx_id = txe->next_id;
@@ -3296,7 +3289,7 @@ ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 		} while (m_seg);
 
 		/* fill the last descriptor with End of Packet (EOP) bit */
-		td_cmd |= ICE_TX_DESC_CMD_EOP;
+		td_cmd |= CI_TX_DESC_CMD_EOP;
 		txq->nb_tx_used = (uint16_t)(txq->nb_tx_used + nb_used);
 		txq->nb_tx_free = (uint16_t)(txq->nb_tx_free - nb_used);
 
@@ -3307,14 +3300,13 @@ ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 				   "%4u (port=%d queue=%d)",
 				   tx_last, txq->port_id, txq->queue_id);
 
-			td_cmd |= ICE_TX_DESC_CMD_RS;
+			td_cmd |= CI_TX_DESC_CMD_RS;
 
 			/* Update txq RS bit counters */
 			txq->nb_tx_used = 0;
 		}
 		txd->cmd_type_offset_bsz |=
-			rte_cpu_to_le_64(((uint64_t)td_cmd) <<
-					 ICE_TXD_QW1_CMD_S);
+			rte_cpu_to_le_64(((uint64_t)td_cmd) << CI_TXD_QW1_CMD_S);
 
 		if (txq->tsq != NULL && txq->tsq->ts_flag > 0) {
 			uint64_t txtime = *RTE_MBUF_DYNFIELD(tx_pkt,
@@ -3361,8 +3353,8 @@ ice_tx_free_bufs(struct ci_tx_queue *txq)
 	uint16_t i;
 
 	if ((txq->ci_tx_ring[txq->tx_next_dd].cmd_type_offset_bsz &
-	     rte_cpu_to_le_64(ICE_TXD_QW1_DTYPE_M)) !=
-	    rte_cpu_to_le_64(ICE_TX_DESC_DTYPE_DESC_DONE))
+	     rte_cpu_to_le_64(CI_TXD_QW1_DTYPE_M)) !=
+	    rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE))
 		return 0;
 
 	txep = &txq->sw_ring[txq->tx_next_dd - (txq->tx_rs_thresh - 1)];
@@ -3598,8 +3590,7 @@ tx_xmit_pkts(struct ci_tx_queue *txq,
 		n = (uint16_t)(txq->nb_tx_desc - txq->tx_tail);
 		ice_tx_fill_hw_ring(txq, tx_pkts, n);
 		txr[txq->tx_next_rs].cmd_type_offset_bsz |=
-			rte_cpu_to_le_64(((uint64_t)ICE_TX_DESC_CMD_RS) <<
-					 ICE_TXD_QW1_CMD_S);
+			rte_cpu_to_le_64(((uint64_t)CI_TX_DESC_CMD_RS) << CI_TXD_QW1_CMD_S);
 		txq->tx_next_rs = (uint16_t)(txq->tx_rs_thresh - 1);
 		txq->tx_tail = 0;
 	}
@@ -3611,8 +3602,7 @@ tx_xmit_pkts(struct ci_tx_queue *txq,
 	/* Determine if RS bit needs to be set */
 	if (txq->tx_tail > txq->tx_next_rs) {
 		txr[txq->tx_next_rs].cmd_type_offset_bsz |=
-			rte_cpu_to_le_64(((uint64_t)ICE_TX_DESC_CMD_RS) <<
-					 ICE_TXD_QW1_CMD_S);
+			rte_cpu_to_le_64(((uint64_t)CI_TX_DESC_CMD_RS) << CI_TXD_QW1_CMD_S);
 		txq->tx_next_rs =
 			(uint16_t)(txq->tx_next_rs + txq->tx_rs_thresh);
 		if (txq->tx_next_rs >= txq->nb_tx_desc)
@@ -4843,9 +4833,9 @@ ice_fdir_programming(struct ice_pf *pf, struct ice_fltr_desc *fdir_desc)
 
 	txdp = &txq->ci_tx_ring[txq->tx_tail + 1];
 	txdp->buffer_addr = rte_cpu_to_le_64(pf->fdir.dma_addr);
-	td_cmd = ICE_TX_DESC_CMD_EOP |
-		ICE_TX_DESC_CMD_RS  |
-		ICE_TX_DESC_CMD_DUMMY;
+	td_cmd = CI_TX_DESC_CMD_EOP |
+		CI_TX_DESC_CMD_RS  |
+		CI_TX_DESC_CMD_DUMMY;
 
 	txdp->cmd_type_offset_bsz =
 		ice_build_ctob(td_cmd, 0, ICE_FDIR_PKT_LEN, 0);
@@ -4856,9 +4846,8 @@ ice_fdir_programming(struct ice_pf *pf, struct ice_fltr_desc *fdir_desc)
 	/* Update the tx tail register */
 	ICE_PCI_REG_WRITE(txq->qtx_tail, txq->tx_tail);
 	for (i = 0; i < ICE_FDIR_MAX_WAIT_US; i++) {
-		if ((txdp->cmd_type_offset_bsz &
-		     rte_cpu_to_le_64(ICE_TXD_QW1_DTYPE_M)) ==
-		    rte_cpu_to_le_64(ICE_TX_DESC_DTYPE_DESC_DONE))
+		if ((txdp->cmd_type_offset_bsz & rte_cpu_to_le_64(CI_TXD_QW1_DTYPE_M)) ==
+		    rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE))
 			break;
 		rte_delay_us(1);
 	}
diff --git a/drivers/net/intel/ice/ice_rxtx.h b/drivers/net/intel/ice/ice_rxtx.h
index c524e9f756..cd5fa93d1c 100644
--- a/drivers/net/intel/ice/ice_rxtx.h
+++ b/drivers/net/intel/ice/ice_rxtx.h
@@ -46,7 +46,7 @@
 
 #define ICE_SUPPORT_CHAIN_NUM 5
 
-#define ICE_TD_CMD                      ICE_TX_DESC_CMD_EOP
+#define ICE_TD_CMD                      CI_TX_DESC_CMD_EOP
 
 #define ICE_VPMD_RX_BURST            CI_VPMD_RX_BURST
 #define ICE_VPMD_TX_BURST            32
@@ -169,19 +169,6 @@ struct ice_txtime {
 	const struct rte_memzone *ts_mz;
 };
 
-/* Offload features */
-union ice_tx_offload {
-	uint64_t data;
-	struct {
-		uint64_t l2_len:7; /* L2 (MAC) Header Length. */
-		uint64_t l3_len:9; /* L3 (IP) Header Length. */
-		uint64_t l4_len:8; /* L4 Header Length. */
-		uint64_t tso_segsz:16; /* TCP TSO segment size */
-		uint64_t outer_l2_len:8; /* outer L2 Header Length */
-		uint64_t outer_l3_len:16; /* outer L3 Header Length */
-	};
-};
-
 /* Rx Flex Descriptor for Comms Package Profile
  * RxDID Profile ID 22 (swap Hash and FlowID)
  * Flex-field 0: Flow ID lower 16-bits
diff --git a/drivers/net/intel/ice/ice_rxtx_vec_avx2.c b/drivers/net/intel/ice/ice_rxtx_vec_avx2.c
index 0a1df0b2f6..2922671158 100644
--- a/drivers/net/intel/ice/ice_rxtx_vec_avx2.c
+++ b/drivers/net/intel/ice/ice_rxtx_vec_avx2.c
@@ -777,10 +777,9 @@ static __rte_always_inline void
 ice_vtx1(volatile struct ci_tx_desc *txdp,
 	 struct rte_mbuf *pkt, uint64_t flags, bool offload)
 {
-	uint64_t high_qw =
-		(ICE_TX_DESC_DTYPE_DATA |
-		 ((uint64_t)flags  << ICE_TXD_QW1_CMD_S) |
-		 ((uint64_t)pkt->data_len << ICE_TXD_QW1_TX_BUF_SZ_S));
+	uint64_t high_qw = (CI_TX_DESC_DTYPE_DATA |
+		 ((uint64_t)flags << CI_TXD_QW1_CMD_S) |
+		 ((uint64_t)pkt->data_len << CI_TXD_QW1_TX_BUF_SZ_S));
 	if (offload)
 		ice_txd_enable_offload(pkt, &high_qw);
 
@@ -792,8 +791,7 @@ static __rte_always_inline void
 ice_vtx(volatile struct ci_tx_desc *txdp,
 	struct rte_mbuf **pkt, uint16_t nb_pkts,  uint64_t flags, bool offload)
 {
-	const uint64_t hi_qw_tmpl = (ICE_TX_DESC_DTYPE_DATA |
-			((uint64_t)flags  << ICE_TXD_QW1_CMD_S));
+	const uint64_t hi_qw_tmpl = (CI_TX_DESC_DTYPE_DATA | (flags << CI_TXD_QW1_CMD_S));
 
 	/* if unaligned on 32-bit boundary, do one to align */
 	if (((uintptr_t)txdp & 0x1F) != 0 && nb_pkts != 0) {
@@ -801,30 +799,22 @@ ice_vtx(volatile struct ci_tx_desc *txdp,
 		nb_pkts--, txdp++, pkt++;
 	}
 
-	/* do two at a time while possible, in bursts */
+	/* do four at a time while possible, in bursts */
 	for (; nb_pkts > 3; txdp += 4, pkt += 4, nb_pkts -= 4) {
-		uint64_t hi_qw3 =
-			hi_qw_tmpl |
-			((uint64_t)pkt[3]->data_len <<
-			 ICE_TXD_QW1_TX_BUF_SZ_S);
+		uint64_t hi_qw3 = hi_qw_tmpl |
+			((uint64_t)pkt[3]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
 		if (offload)
 			ice_txd_enable_offload(pkt[3], &hi_qw3);
-		uint64_t hi_qw2 =
-			hi_qw_tmpl |
-			((uint64_t)pkt[2]->data_len <<
-			 ICE_TXD_QW1_TX_BUF_SZ_S);
+		uint64_t hi_qw2 = hi_qw_tmpl |
+			((uint64_t)pkt[2]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
 		if (offload)
 			ice_txd_enable_offload(pkt[2], &hi_qw2);
-		uint64_t hi_qw1 =
-			hi_qw_tmpl |
-			((uint64_t)pkt[1]->data_len <<
-			 ICE_TXD_QW1_TX_BUF_SZ_S);
+		uint64_t hi_qw1 = hi_qw_tmpl |
+			((uint64_t)pkt[1]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
 		if (offload)
 			ice_txd_enable_offload(pkt[1], &hi_qw1);
-		uint64_t hi_qw0 =
-			hi_qw_tmpl |
-			((uint64_t)pkt[0]->data_len <<
-			 ICE_TXD_QW1_TX_BUF_SZ_S);
+		uint64_t hi_qw0 = hi_qw_tmpl |
+			((uint64_t)pkt[0]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
 		if (offload)
 			ice_txd_enable_offload(pkt[0], &hi_qw0);
 
@@ -856,7 +846,7 @@ ice_xmit_fixed_burst_vec_avx2(void *tx_queue, struct rte_mbuf **tx_pkts,
 	struct ci_tx_entry_vec *txep;
 	uint16_t n, nb_commit, tx_id;
 	uint64_t flags = ICE_TD_CMD;
-	uint64_t rs = ICE_TX_DESC_CMD_RS | ICE_TD_CMD;
+	uint64_t rs = CI_TX_DESC_CMD_RS | ICE_TD_CMD;
 
 	/* cross rx_thresh boundary is not allowed */
 	nb_pkts = RTE_MIN(nb_pkts, txq->tx_rs_thresh);
@@ -901,8 +891,7 @@ ice_xmit_fixed_burst_vec_avx2(void *tx_queue, struct rte_mbuf **tx_pkts,
 	tx_id = (uint16_t)(tx_id + nb_commit);
 	if (tx_id > txq->tx_next_rs) {
 		txq->ci_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
-			rte_cpu_to_le_64(((uint64_t)ICE_TX_DESC_CMD_RS) <<
-					 ICE_TXD_QW1_CMD_S);
+			rte_cpu_to_le_64(((uint64_t)CI_TX_DESC_CMD_RS) << CI_TXD_QW1_CMD_S);
 		txq->tx_next_rs =
 			(uint16_t)(txq->tx_next_rs + txq->tx_rs_thresh);
 	}
diff --git a/drivers/net/intel/ice/ice_rxtx_vec_avx512.c b/drivers/net/intel/ice/ice_rxtx_vec_avx512.c
index d42f41461f..e64b6e227b 100644
--- a/drivers/net/intel/ice/ice_rxtx_vec_avx512.c
+++ b/drivers/net/intel/ice/ice_rxtx_vec_avx512.c
@@ -850,10 +850,9 @@ static __rte_always_inline void
 ice_vtx1(volatile struct ci_tx_desc *txdp,
 	 struct rte_mbuf *pkt, uint64_t flags, bool do_offload)
 {
-	uint64_t high_qw =
-		(ICE_TX_DESC_DTYPE_DATA |
-		 ((uint64_t)flags  << ICE_TXD_QW1_CMD_S) |
-		 ((uint64_t)pkt->data_len << ICE_TXD_QW1_TX_BUF_SZ_S));
+	uint64_t high_qw = (CI_TX_DESC_DTYPE_DATA |
+		 ((uint64_t)flags << CI_TXD_QW1_CMD_S) |
+		 ((uint64_t)pkt->data_len << CI_TXD_QW1_TX_BUF_SZ_S));
 
 	if (do_offload)
 		ice_txd_enable_offload(pkt, &high_qw);
@@ -866,32 +865,23 @@ static __rte_always_inline void
 ice_vtx(volatile struct ci_tx_desc *txdp, struct rte_mbuf **pkt,
 	uint16_t nb_pkts,  uint64_t flags, bool do_offload)
 {
-	const uint64_t hi_qw_tmpl = (ICE_TX_DESC_DTYPE_DATA |
-			((uint64_t)flags  << ICE_TXD_QW1_CMD_S));
+	const uint64_t hi_qw_tmpl = (CI_TX_DESC_DTYPE_DATA | (flags << CI_TXD_QW1_CMD_S));
 
 	for (; nb_pkts > 3; txdp += 4, pkt += 4, nb_pkts -= 4) {
-		uint64_t hi_qw3 =
-			hi_qw_tmpl |
-			((uint64_t)pkt[3]->data_len <<
-			 ICE_TXD_QW1_TX_BUF_SZ_S);
+		uint64_t hi_qw3 = hi_qw_tmpl |
+			((uint64_t)pkt[3]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
 		if (do_offload)
 			ice_txd_enable_offload(pkt[3], &hi_qw3);
-		uint64_t hi_qw2 =
-			hi_qw_tmpl |
-			((uint64_t)pkt[2]->data_len <<
-			 ICE_TXD_QW1_TX_BUF_SZ_S);
+		uint64_t hi_qw2 = hi_qw_tmpl |
+			((uint64_t)pkt[2]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
 		if (do_offload)
 			ice_txd_enable_offload(pkt[2], &hi_qw2);
-		uint64_t hi_qw1 =
-			hi_qw_tmpl |
-			((uint64_t)pkt[1]->data_len <<
-			 ICE_TXD_QW1_TX_BUF_SZ_S);
+		uint64_t hi_qw1 = hi_qw_tmpl |
+			((uint64_t)pkt[1]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
 		if (do_offload)
 			ice_txd_enable_offload(pkt[1], &hi_qw1);
-		uint64_t hi_qw0 =
-			hi_qw_tmpl |
-			((uint64_t)pkt[0]->data_len <<
-			 ICE_TXD_QW1_TX_BUF_SZ_S);
+		uint64_t hi_qw0 = hi_qw_tmpl |
+			((uint64_t)pkt[0]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
 		if (do_offload)
 			ice_txd_enable_offload(pkt[0], &hi_qw0);
 
@@ -920,7 +910,7 @@ ice_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pkts,
 	struct ci_tx_entry_vec *txep;
 	uint16_t n, nb_commit, tx_id;
 	uint64_t flags = ICE_TD_CMD;
-	uint64_t rs = ICE_TX_DESC_CMD_RS | ICE_TD_CMD;
+	uint64_t rs = CI_TX_DESC_CMD_RS | ICE_TD_CMD;
 
 	/* cross rx_thresh boundary is not allowed */
 	nb_pkts = RTE_MIN(nb_pkts, txq->tx_rs_thresh);
@@ -966,8 +956,7 @@ ice_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pkts,
 	tx_id = (uint16_t)(tx_id + nb_commit);
 	if (tx_id > txq->tx_next_rs) {
 		txq->ci_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
-			rte_cpu_to_le_64(((uint64_t)ICE_TX_DESC_CMD_RS) <<
-					 ICE_TXD_QW1_CMD_S);
+			rte_cpu_to_le_64(((uint64_t)CI_TX_DESC_CMD_RS) << CI_TXD_QW1_CMD_S);
 		txq->tx_next_rs =
 			(uint16_t)(txq->tx_next_rs + txq->tx_rs_thresh);
 	}
diff --git a/drivers/net/intel/ice/ice_rxtx_vec_common.h b/drivers/net/intel/ice/ice_rxtx_vec_common.h
index 8ba591e403..1d83a087cc 100644
--- a/drivers/net/intel/ice/ice_rxtx_vec_common.h
+++ b/drivers/net/intel/ice/ice_rxtx_vec_common.h
@@ -12,8 +12,8 @@ static inline int
 ice_tx_desc_done(struct ci_tx_queue *txq, uint16_t idx)
 {
 	return (txq->ci_tx_ring[idx].cmd_type_offset_bsz &
-			rte_cpu_to_le_64(ICE_TXD_QW1_DTYPE_M)) ==
-				rte_cpu_to_le_64(ICE_TX_DESC_DTYPE_DESC_DONE);
+			rte_cpu_to_le_64(CI_TXD_QW1_DTYPE_M)) ==
+				rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE);
 }
 
 static inline void
@@ -124,53 +124,52 @@ ice_txd_enable_offload(struct rte_mbuf *tx_pkt,
 	/* Tx Checksum Offload */
 	/* SET MACLEN */
 	td_offset |= (tx_pkt->l2_len >> 1) <<
-		ICE_TX_DESC_LEN_MACLEN_S;
+		CI_TX_DESC_LEN_MACLEN_S;
 
 	/* Enable L3 checksum offload */
 	if (ol_flags & RTE_MBUF_F_TX_IP_CKSUM) {
-		td_cmd |= ICE_TX_DESC_CMD_IIPT_IPV4_CSUM;
+		td_cmd |= CI_TX_DESC_CMD_IIPT_IPV4_CSUM;
 		td_offset |= (tx_pkt->l3_len >> 2) <<
-			ICE_TX_DESC_LEN_IPLEN_S;
+			CI_TX_DESC_LEN_IPLEN_S;
 	} else if (ol_flags & RTE_MBUF_F_TX_IPV4) {
-		td_cmd |= ICE_TX_DESC_CMD_IIPT_IPV4;
+		td_cmd |= CI_TX_DESC_CMD_IIPT_IPV4;
 		td_offset |= (tx_pkt->l3_len >> 2) <<
-			ICE_TX_DESC_LEN_IPLEN_S;
+			CI_TX_DESC_LEN_IPLEN_S;
 	} else if (ol_flags & RTE_MBUF_F_TX_IPV6) {
-		td_cmd |= ICE_TX_DESC_CMD_IIPT_IPV6;
+		td_cmd |= CI_TX_DESC_CMD_IIPT_IPV6;
 		td_offset |= (tx_pkt->l3_len >> 2) <<
-			ICE_TX_DESC_LEN_IPLEN_S;
+			CI_TX_DESC_LEN_IPLEN_S;
 	}
 
 	/* Enable L4 checksum offloads */
 	switch (ol_flags & RTE_MBUF_F_TX_L4_MASK) {
 	case RTE_MBUF_F_TX_TCP_CKSUM:
-		td_cmd |= ICE_TX_DESC_CMD_L4T_EOFT_TCP;
+		td_cmd |= CI_TX_DESC_CMD_L4T_EOFT_TCP;
 		td_offset |= (sizeof(struct rte_tcp_hdr) >> 2) <<
-			ICE_TX_DESC_LEN_L4_LEN_S;
+			CI_TX_DESC_LEN_L4_LEN_S;
 		break;
 	case RTE_MBUF_F_TX_SCTP_CKSUM:
-		td_cmd |= ICE_TX_DESC_CMD_L4T_EOFT_SCTP;
+		td_cmd |= CI_TX_DESC_CMD_L4T_EOFT_SCTP;
 		td_offset |= (sizeof(struct rte_sctp_hdr) >> 2) <<
-			ICE_TX_DESC_LEN_L4_LEN_S;
+			CI_TX_DESC_LEN_L4_LEN_S;
 		break;
 	case RTE_MBUF_F_TX_UDP_CKSUM:
-		td_cmd |= ICE_TX_DESC_CMD_L4T_EOFT_UDP;
+		td_cmd |= CI_TX_DESC_CMD_L4T_EOFT_UDP;
 		td_offset |= (sizeof(struct rte_udp_hdr) >> 2) <<
-			ICE_TX_DESC_LEN_L4_LEN_S;
+			CI_TX_DESC_LEN_L4_LEN_S;
 		break;
 	default:
 		break;
 	}
 
-	*txd_hi |= ((uint64_t)td_offset) << ICE_TXD_QW1_OFFSET_S;
+	*txd_hi |= ((uint64_t)td_offset) << CI_TXD_QW1_OFFSET_S;
 
-	/* Tx VLAN insertion Offload */
+	/* Tx VLAN/QINQ insertion Offload */
 	if (ol_flags & RTE_MBUF_F_TX_VLAN) {
-		td_cmd |= ICE_TX_DESC_CMD_IL2TAG1;
-		*txd_hi |= ((uint64_t)tx_pkt->vlan_tci <<
-				ICE_TXD_QW1_L2TAG1_S);
+		td_cmd |= CI_TX_DESC_CMD_IL2TAG1;
+		*txd_hi |= ((uint64_t)tx_pkt->vlan_tci << CI_TXD_QW1_L2TAG1_S);
 	}
 
-	*txd_hi |= ((uint64_t)td_cmd) << ICE_TXD_QW1_CMD_S;
+	*txd_hi |= ((uint64_t)td_cmd) << CI_TXD_QW1_CMD_S;
 }
 #endif
diff --git a/drivers/net/intel/idpf/idpf_common_rxtx.c b/drivers/net/intel/idpf/idpf_common_rxtx.c
index 23666539ab..587871b54a 100644
--- a/drivers/net/intel/idpf/idpf_common_rxtx.c
+++ b/drivers/net/intel/idpf/idpf_common_rxtx.c
@@ -271,7 +271,7 @@ idpf_qc_single_tx_queue_reset(struct ci_tx_queue *txq)
 	prev = (uint16_t)(txq->nb_tx_desc - 1);
 	for (i = 0; i < txq->nb_tx_desc; i++) {
 		txq->ci_tx_ring[i].cmd_type_offset_bsz =
-			rte_cpu_to_le_64(IDPF_TX_DESC_DTYPE_DESC_DONE);
+			rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE);
 		txe[i].mbuf =  NULL;
 		txe[i].last_id = i;
 		txe[prev].next_id = i;
@@ -849,7 +849,7 @@ idpf_calc_context_desc(uint64_t flags)
  */
 static inline void
 idpf_set_splitq_tso_ctx(struct rte_mbuf *mbuf,
-			union idpf_tx_offload tx_offload,
+			union ci_tx_offload tx_offload,
 			volatile union idpf_flex_tx_ctx_desc *ctx_desc)
 {
 	uint16_t cmd_dtype;
@@ -887,7 +887,7 @@ idpf_dp_splitq_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 	volatile struct idpf_flex_tx_sched_desc *txr;
 	volatile struct idpf_flex_tx_sched_desc *txd;
 	struct ci_tx_entry *sw_ring;
-	union idpf_tx_offload tx_offload = {0};
+	union ci_tx_offload tx_offload = {0};
 	struct ci_tx_entry *txe, *txn;
 	uint16_t nb_used, tx_id, sw_id;
 	struct rte_mbuf *tx_pkt;
@@ -1334,7 +1334,7 @@ idpf_dp_singleq_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 {
 	volatile struct ci_tx_desc *txd;
 	volatile struct ci_tx_desc *txr;
-	union idpf_tx_offload tx_offload = {0};
+	union ci_tx_offload tx_offload = {0};
 	struct ci_tx_entry *txe, *txn;
 	struct ci_tx_entry *sw_ring;
 	struct ci_tx_queue *txq;
@@ -1452,10 +1452,10 @@ idpf_dp_singleq_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 			slen = m_seg->data_len;
 			buf_dma_addr = rte_mbuf_data_iova(m_seg);
 			txd->buffer_addr = rte_cpu_to_le_64(buf_dma_addr);
-			txd->cmd_type_offset_bsz = rte_cpu_to_le_64(IDPF_TX_DESC_DTYPE_DATA |
-				((uint64_t)td_cmd  << IDPF_TXD_QW1_CMD_S) |
-				((uint64_t)td_offset << IDPF_TXD_QW1_OFFSET_S) |
-				((uint64_t)slen << IDPF_TXD_QW1_TX_BUF_SZ_S));
+			txd->cmd_type_offset_bsz = rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DATA |
+				((uint64_t)td_cmd  << CI_TXD_QW1_CMD_S) |
+				((uint64_t)td_offset << CI_TXD_QW1_OFFSET_S) |
+				((uint64_t)slen << CI_TXD_QW1_TX_BUF_SZ_S));
 
 			txe->last_id = tx_last;
 			tx_id = txe->next_id;
@@ -1464,7 +1464,7 @@ idpf_dp_singleq_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 		} while (m_seg);
 
 		/* The last packet data descriptor needs End Of Packet (EOP) */
-		td_cmd |= IDPF_TX_DESC_CMD_EOP;
+		td_cmd |= CI_TX_DESC_CMD_EOP;
 		txq->nb_tx_used = (uint16_t)(txq->nb_tx_used + nb_used);
 		txq->nb_tx_free = (uint16_t)(txq->nb_tx_free - nb_used);
 
@@ -1473,13 +1473,13 @@ idpf_dp_singleq_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 			       "%4u (port=%d queue=%d)",
 			       tx_last, txq->port_id, txq->queue_id);
 
-			td_cmd |= IDPF_TX_DESC_CMD_RS;
+			td_cmd |= CI_TX_DESC_CMD_RS;
 
 			/* Update txq RS bit counters */
 			txq->nb_tx_used = 0;
 		}
 
-		txd->cmd_type_offset_bsz |= rte_cpu_to_le_16(td_cmd << IDPF_TXD_QW1_CMD_S);
+		txd->cmd_type_offset_bsz |= rte_cpu_to_le_16(td_cmd << CI_TXD_QW1_CMD_S);
 	}
 
 end_of_tx:
diff --git a/drivers/net/intel/idpf/idpf_common_rxtx.h b/drivers/net/intel/idpf/idpf_common_rxtx.h
index 2f2fa153b2..b88a87402d 100644
--- a/drivers/net/intel/idpf/idpf_common_rxtx.h
+++ b/drivers/net/intel/idpf/idpf_common_rxtx.h
@@ -169,18 +169,6 @@ struct idpf_rx_queue {
 	uint32_t hw_register_set;
 };
 
-/* Offload features */
-union idpf_tx_offload {
-	uint64_t data;
-	struct {
-		uint64_t l2_len:7; /* L2 (MAC) Header Length. */
-		uint64_t l3_len:9; /* L3 (IP) Header Length. */
-		uint64_t l4_len:8; /* L4 Header Length. */
-		uint64_t tso_segsz:16; /* TCP TSO segment size */
-		/* uint64_t unused : 24; */
-	};
-};
-
 union idpf_tx_desc {
 	struct ci_tx_desc *tx_ring;
 	struct idpf_flex_tx_sched_desc *desc_ring;
diff --git a/drivers/net/intel/idpf/idpf_common_rxtx_avx2.c b/drivers/net/intel/idpf/idpf_common_rxtx_avx2.c
index 04efee3722..411b171b97 100644
--- a/drivers/net/intel/idpf/idpf_common_rxtx_avx2.c
+++ b/drivers/net/intel/idpf/idpf_common_rxtx_avx2.c
@@ -486,10 +486,9 @@ static inline void
 idpf_singleq_vtx1(volatile struct ci_tx_desc *txdp,
 		  struct rte_mbuf *pkt, uint64_t flags)
 {
-	uint64_t high_qw =
-		(IDPF_TX_DESC_DTYPE_DATA |
-		 ((uint64_t)flags  << IDPF_TXD_QW1_CMD_S) |
-		 ((uint64_t)pkt->data_len << IDPF_TXD_QW1_TX_BUF_SZ_S));
+	uint64_t high_qw = (CI_TX_DESC_DTYPE_DATA |
+		 ((uint64_t)flags << CI_TXD_QW1_CMD_S) |
+		 ((uint64_t)pkt->data_len << CI_TXD_QW1_TX_BUF_SZ_S));
 
 	__m128i descriptor = _mm_set_epi64x(high_qw,
 				pkt->buf_iova + pkt->data_off);
@@ -500,8 +499,7 @@ static inline void
 idpf_singleq_vtx(volatile struct ci_tx_desc *txdp,
 		 struct rte_mbuf **pkt, uint16_t nb_pkts,  uint64_t flags)
 {
-	const uint64_t hi_qw_tmpl = (IDPF_TX_DESC_DTYPE_DATA |
-			((uint64_t)flags  << IDPF_TXD_QW1_CMD_S));
+	const uint64_t hi_qw_tmpl = (CI_TX_DESC_DTYPE_DATA | (flags << CI_TXD_QW1_CMD_S));
 
 	/* if unaligned on 32-bit boundary, do one to align */
 	if (((uintptr_t)txdp & 0x1F) != 0 && nb_pkts != 0) {
@@ -511,22 +509,14 @@ idpf_singleq_vtx(volatile struct ci_tx_desc *txdp,
 
 	/* do two at a time while possible, in bursts */
 	for (; nb_pkts > 3; txdp += 4, pkt += 4, nb_pkts -= 4) {
-		uint64_t hi_qw3 =
-			hi_qw_tmpl |
-			((uint64_t)pkt[3]->data_len <<
-			 IDPF_TXD_QW1_TX_BUF_SZ_S);
-		uint64_t hi_qw2 =
-			hi_qw_tmpl |
-			((uint64_t)pkt[2]->data_len <<
-			 IDPF_TXD_QW1_TX_BUF_SZ_S);
-		uint64_t hi_qw1 =
-			hi_qw_tmpl |
-			((uint64_t)pkt[1]->data_len <<
-			 IDPF_TXD_QW1_TX_BUF_SZ_S);
-		uint64_t hi_qw0 =
-			hi_qw_tmpl |
-			((uint64_t)pkt[0]->data_len <<
-			 IDPF_TXD_QW1_TX_BUF_SZ_S);
+		uint64_t hi_qw3 = hi_qw_tmpl |
+			((uint64_t)pkt[3]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
+		uint64_t hi_qw2 = hi_qw_tmpl |
+			((uint64_t)pkt[2]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
+		uint64_t hi_qw1 = hi_qw_tmpl |
+			((uint64_t)pkt[1]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
+		uint64_t hi_qw0 = hi_qw_tmpl |
+			((uint64_t)pkt[0]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
 
 		__m256i desc2_3 =
 			_mm256_set_epi64x
@@ -559,8 +549,8 @@ idpf_singleq_xmit_fixed_burst_vec_avx2(void *tx_queue, struct rte_mbuf **tx_pkts
 	volatile struct ci_tx_desc *txdp;
 	struct ci_tx_entry_vec *txep;
 	uint16_t n, nb_commit, tx_id;
-	uint64_t flags = IDPF_TX_DESC_CMD_EOP;
-	uint64_t rs = IDPF_TX_DESC_CMD_RS | flags;
+	uint64_t flags = CI_TX_DESC_CMD_EOP;
+	uint64_t rs = CI_TX_DESC_CMD_RS | flags;
 
 	/* cross rx_thresh boundary is not allowed */
 	nb_pkts = RTE_MIN(nb_pkts, txq->tx_rs_thresh);
@@ -605,8 +595,7 @@ idpf_singleq_xmit_fixed_burst_vec_avx2(void *tx_queue, struct rte_mbuf **tx_pkts
 	tx_id = (uint16_t)(tx_id + nb_commit);
 	if (tx_id > txq->tx_next_rs) {
 		txq->ci_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
-			rte_cpu_to_le_64(((uint64_t)IDPF_TX_DESC_CMD_RS) <<
-					 IDPF_TXD_QW1_CMD_S);
+			rte_cpu_to_le_64(((uint64_t)CI_TX_DESC_CMD_RS) << CI_TXD_QW1_CMD_S);
 		txq->tx_next_rs =
 			(uint16_t)(txq->tx_next_rs + txq->tx_rs_thresh);
 	}
diff --git a/drivers/net/intel/idpf/idpf_common_rxtx_avx512.c b/drivers/net/intel/idpf/idpf_common_rxtx_avx512.c
index d5e5a2ca5f..49ace35615 100644
--- a/drivers/net/intel/idpf/idpf_common_rxtx_avx512.c
+++ b/drivers/net/intel/idpf/idpf_common_rxtx_avx512.c
@@ -1003,10 +1003,9 @@ static __rte_always_inline void
 idpf_singleq_vtx1(volatile struct ci_tx_desc *txdp,
 	  struct rte_mbuf *pkt, uint64_t flags)
 {
-	uint64_t high_qw =
-		(IDPF_TX_DESC_DTYPE_DATA |
-		 ((uint64_t)flags  << IDPF_TXD_QW1_CMD_S) |
-		 ((uint64_t)pkt->data_len << IDPF_TXD_QW1_TX_BUF_SZ_S));
+	uint64_t high_qw = (CI_TX_DESC_DTYPE_DATA |
+		 ((uint64_t)flags << CI_TXD_QW1_CMD_S) |
+		 ((uint64_t)pkt->data_len << CI_TXD_QW1_TX_BUF_SZ_S));
 
 	__m128i descriptor = _mm_set_epi64x(high_qw,
 					    pkt->buf_iova + pkt->data_off);
@@ -1019,8 +1018,7 @@ static __rte_always_inline void
 idpf_singleq_vtx(volatile struct ci_tx_desc *txdp,
 	 struct rte_mbuf **pkt, uint16_t nb_pkts,  uint64_t flags)
 {
-	const uint64_t hi_qw_tmpl = (IDPF_TX_DESC_DTYPE_DATA  |
-			((uint64_t)flags  << IDPF_TXD_QW1_CMD_S));
+	const uint64_t hi_qw_tmpl = (CI_TX_DESC_DTYPE_DATA  | (flags << CI_TXD_QW1_CMD_S));
 
 	/* if unaligned on 32-bit boundary, do one to align */
 	if (((uintptr_t)txdp & 0x1F) != 0 && nb_pkts != 0) {
@@ -1030,22 +1028,14 @@ idpf_singleq_vtx(volatile struct ci_tx_desc *txdp,
 
 	/* do 4 at a time while possible, in bursts */
 	for (; nb_pkts > 3; txdp += 4, pkt += 4, nb_pkts -= 4) {
-		uint64_t hi_qw3 =
-			hi_qw_tmpl |
-			((uint64_t)pkt[3]->data_len <<
-			 IDPF_TXD_QW1_TX_BUF_SZ_S);
-		uint64_t hi_qw2 =
-			hi_qw_tmpl |
-			((uint64_t)pkt[2]->data_len <<
-			 IDPF_TXD_QW1_TX_BUF_SZ_S);
-		uint64_t hi_qw1 =
-			hi_qw_tmpl |
-			((uint64_t)pkt[1]->data_len <<
-			 IDPF_TXD_QW1_TX_BUF_SZ_S);
-		uint64_t hi_qw0 =
-			hi_qw_tmpl |
-			((uint64_t)pkt[0]->data_len <<
-			 IDPF_TXD_QW1_TX_BUF_SZ_S);
+		uint64_t hi_qw3 = hi_qw_tmpl |
+			((uint64_t)pkt[3]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
+		uint64_t hi_qw2 = hi_qw_tmpl |
+			((uint64_t)pkt[2]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
+		uint64_t hi_qw1 = hi_qw_tmpl |
+			((uint64_t)pkt[1]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
+		uint64_t hi_qw0 = hi_qw_tmpl |
+			((uint64_t)pkt[0]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
 
 		__m512i desc0_3 =
 			_mm512_set_epi64
@@ -1075,8 +1065,8 @@ idpf_singleq_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pk
 	volatile struct ci_tx_desc *txdp;
 	struct ci_tx_entry_vec *txep;
 	uint16_t n, nb_commit, tx_id;
-	uint64_t flags = IDPF_TX_DESC_CMD_EOP;
-	uint64_t rs = IDPF_TX_DESC_CMD_RS | flags;
+	uint64_t flags = CI_TX_DESC_CMD_EOP;
+	uint64_t rs = CI_TX_DESC_CMD_RS | flags;
 
 	/* cross rx_thresh boundary is not allowed */
 	nb_pkts = RTE_MIN(nb_pkts, txq->tx_rs_thresh);
@@ -1124,8 +1114,7 @@ idpf_singleq_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pk
 	tx_id = (uint16_t)(tx_id + nb_commit);
 	if (tx_id > txq->tx_next_rs) {
 		txq->ci_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
-			rte_cpu_to_le_64(((uint64_t)IDPF_TX_DESC_CMD_RS) <<
-					 IDPF_TXD_QW1_CMD_S);
+			rte_cpu_to_le_64(((uint64_t)CI_TX_DESC_CMD_RS) << CI_TXD_QW1_CMD_S);
 		txq->tx_next_rs =
 			(uint16_t)(txq->tx_next_rs + txq->tx_rs_thresh);
 	}
diff --git a/drivers/net/intel/idpf/idpf_rxtx_vec_common.h b/drivers/net/intel/idpf/idpf_rxtx_vec_common.h
index b5e8574667..a43d8f78e2 100644
--- a/drivers/net/intel/idpf/idpf_rxtx_vec_common.h
+++ b/drivers/net/intel/idpf/idpf_rxtx_vec_common.h
@@ -32,8 +32,8 @@ idpf_tx_desc_done(struct ci_tx_queue *txq, uint16_t idx)
 		return 1;
 
 	return (txq->ci_tx_ring[idx].cmd_type_offset_bsz &
-			rte_cpu_to_le_64(IDPF_TXD_QW1_DTYPE_M)) ==
-				rte_cpu_to_le_64(IDPF_TX_DESC_DTYPE_DESC_DONE);
+			rte_cpu_to_le_64(CI_TXD_QW1_DTYPE_M)) ==
+				rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE);
 }
 
 static inline int
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [PATCH v4 05/35] net/intel: add common fn to calculate needed descriptors
  2026-02-09 16:44 ` [PATCH v4 00/35] combine multiple Intel scalar Tx paths Bruce Richardson
                     ` (3 preceding siblings ...)
  2026-02-09 16:45   ` [PATCH v4 04/35] net/intel: consolidate definitions for Tx desc fields Bruce Richardson
@ 2026-02-09 16:45   ` Bruce Richardson
  2026-02-10 12:29     ` Burakov, Anatoly
  2026-02-09 16:45   ` [PATCH v4 06/35] net/ice: refactor context descriptor handling Bruce Richardson
                     ` (29 subsequent siblings)
  34 siblings, 1 reply; 274+ messages in thread
From: Bruce Richardson @ 2026-02-09 16:45 UTC (permalink / raw)
  To: dev
  Cc: Bruce Richardson, Vladimir Medvedkin, Anatoly Burakov,
	Jingjing Wu, Praveen Shetty

Multiple drivers used the same logic to calculate how many Tx data
descriptors a packet needs. Move that calculation into common code. While
updating the drivers, also fix the idpf driver's calculation for the TSO
case.
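
The consolidated helper can be illustrated with a standalone sketch (not
the DPDK code itself: the mock_mbuf struct and the 16383-byte descriptor
limit are assumptions standing in for rte_mbuf and CI_MAX_DATA_PER_TXD):

```c
#include <stdint.h>
#include <stddef.h>

/* assumed per-descriptor data limit (14-bit buffer-size field) */
#define MAX_DATA_PER_TXD 16383

/* minimal stand-in for an rte_mbuf segment chain */
struct mock_mbuf {
	uint16_t data_len;
	struct mock_mbuf *next;
};

static inline uint16_t
div_roundup16(uint16_t x, uint16_t y)
{
	return (uint16_t)((x + y - 1) / y);
}

/* one data descriptor per MAX_DATA_PER_TXD bytes of each segment */
static inline uint16_t
calc_pkt_desc(const struct mock_mbuf *pkt)
{
	uint16_t count = 0;

	while (pkt != NULL) {
		count += div_roundup16(pkt->data_len, MAX_DATA_PER_TXD);
		pkt = pkt->next;
	}
	return count;
}
```

A two-segment TSO chain of 16000 and 20000 bytes needs 1 + 2 = 3 data
descriptors, whereas counting nb_segs alone would give 2 - the kind of
undercount this patch fixes for idpf.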

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/net/intel/common/tx_scalar.h      | 21 +++++++++++++++++++++
 drivers/net/intel/i40e/i40e_rxtx.c        | 18 +-----------------
 drivers/net/intel/iavf/iavf_rxtx.c        | 17 +----------------
 drivers/net/intel/ice/ice_rxtx.c          | 18 +-----------------
 drivers/net/intel/idpf/idpf_common_rxtx.c | 21 +++++++++++++++++----
 5 files changed, 41 insertions(+), 54 deletions(-)

diff --git a/drivers/net/intel/common/tx_scalar.h b/drivers/net/intel/common/tx_scalar.h
index 6f2024273b..573f5136a9 100644
--- a/drivers/net/intel/common/tx_scalar.h
+++ b/drivers/net/intel/common/tx_scalar.h
@@ -59,4 +59,25 @@ ci_tx_xmit_cleanup(struct ci_tx_queue *txq)
 	return 0;
 }
 
+static inline uint16_t
+ci_div_roundup16(uint16_t x, uint16_t y)
+{
+	return (uint16_t)((x + y - 1) / y);
+}
+
+/* Calculate the number of TX descriptors needed for each pkt */
+static inline uint16_t
+ci_calc_pkt_desc(const struct rte_mbuf *tx_pkt)
+{
+	uint16_t count = 0;
+
+	while (tx_pkt != NULL) {
+		count += ci_div_roundup16(tx_pkt->data_len, CI_MAX_DATA_PER_TXD);
+		tx_pkt = tx_pkt->next;
+	}
+
+	return count;
+}
+
+
 #endif /* _COMMON_INTEL_TX_SCALAR_H_ */
diff --git a/drivers/net/intel/i40e/i40e_rxtx.c b/drivers/net/intel/i40e/i40e_rxtx.c
index f96c5c7f1e..b75306931a 100644
--- a/drivers/net/intel/i40e/i40e_rxtx.c
+++ b/drivers/net/intel/i40e/i40e_rxtx.c
@@ -1029,21 +1029,6 @@ i40e_set_tso_ctx(struct rte_mbuf *mbuf, union ci_tx_offload tx_offload)
 	return ctx_desc;
 }
 
-/* Calculate the number of TX descriptors needed for each pkt */
-static inline uint16_t
-i40e_calc_pkt_desc(struct rte_mbuf *tx_pkt)
-{
-	struct rte_mbuf *txd = tx_pkt;
-	uint16_t count = 0;
-
-	while (txd != NULL) {
-		count += DIV_ROUND_UP(txd->data_len, CI_MAX_DATA_PER_TXD);
-		txd = txd->next;
-	}
-
-	return count;
-}
-
 uint16_t
 i40e_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 {
@@ -1106,8 +1091,7 @@ i40e_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 		 * per tx desc.
 		 */
 		if (ol_flags & RTE_MBUF_F_TX_TCP_SEG)
-			nb_used = (uint16_t)(i40e_calc_pkt_desc(tx_pkt) +
-					     nb_ctx);
+			nb_used = (uint16_t)(ci_calc_pkt_desc(tx_pkt) + nb_ctx);
 		else
 			nb_used = (uint16_t)(tx_pkt->nb_segs + nb_ctx);
 		tx_last = (uint16_t)(tx_id + nb_used - 1);
diff --git a/drivers/net/intel/iavf/iavf_rxtx.c b/drivers/net/intel/iavf/iavf_rxtx.c
index 947b6c24d2..885d9309cc 100644
--- a/drivers/net/intel/iavf/iavf_rxtx.c
+++ b/drivers/net/intel/iavf/iavf_rxtx.c
@@ -2666,21 +2666,6 @@ iavf_build_data_desc_cmd_offset_fields(volatile uint64_t *qw1,
 		((uint64_t)l2tag1 << IAVF_TXD_DATA_QW1_L2TAG1_SHIFT));
 }
 
-/* Calculate the number of TX descriptors needed for each pkt */
-static inline uint16_t
-iavf_calc_pkt_desc(struct rte_mbuf *tx_pkt)
-{
-	struct rte_mbuf *txd = tx_pkt;
-	uint16_t count = 0;
-
-	while (txd != NULL) {
-		count += (txd->data_len + CI_MAX_DATA_PER_TXD - 1) / CI_MAX_DATA_PER_TXD;
-		txd = txd->next;
-	}
-
-	return count;
-}
-
 static inline void
 iavf_fill_data_desc(volatile struct ci_tx_desc *desc,
 	uint64_t desc_template,	uint16_t buffsz,
@@ -2766,7 +2751,7 @@ iavf_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 		 * per tx desc.
 		 */
 		if (mb->ol_flags & RTE_MBUF_F_TX_TCP_SEG)
-			nb_desc_required = iavf_calc_pkt_desc(mb) + nb_desc_ctx + nb_desc_ipsec;
+			nb_desc_required = ci_calc_pkt_desc(mb) + nb_desc_ctx + nb_desc_ipsec;
 		else
 			nb_desc_required = nb_desc_data + nb_desc_ctx + nb_desc_ipsec;
 
diff --git a/drivers/net/intel/ice/ice_rxtx.c b/drivers/net/intel/ice/ice_rxtx.c
index 52bbf95967..2a53b614b2 100644
--- a/drivers/net/intel/ice/ice_rxtx.c
+++ b/drivers/net/intel/ice/ice_rxtx.c
@@ -3075,21 +3075,6 @@ ice_set_tso_ctx(struct rte_mbuf *mbuf, union ci_tx_offload tx_offload)
 	return ctx_desc;
 }
 
-/* Calculate the number of TX descriptors needed for each pkt */
-static inline uint16_t
-ice_calc_pkt_desc(struct rte_mbuf *tx_pkt)
-{
-	struct rte_mbuf *txd = tx_pkt;
-	uint16_t count = 0;
-
-	while (txd != NULL) {
-		count += DIV_ROUND_UP(txd->data_len, CI_MAX_DATA_PER_TXD);
-		txd = txd->next;
-	}
-
-	return count;
-}
-
 uint16_t
 ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 {
@@ -3152,8 +3137,7 @@ ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 		 * per tx desc.
 		 */
 		if (ol_flags & (RTE_MBUF_F_TX_TCP_SEG | RTE_MBUF_F_TX_UDP_SEG))
-			nb_used = (uint16_t)(ice_calc_pkt_desc(tx_pkt) +
-					     nb_ctx);
+			nb_used = (uint16_t)(ci_calc_pkt_desc(tx_pkt) + nb_ctx);
 		else
 			nb_used = (uint16_t)(tx_pkt->nb_segs + nb_ctx);
 		tx_last = (uint16_t)(tx_id + nb_used - 1);
diff --git a/drivers/net/intel/idpf/idpf_common_rxtx.c b/drivers/net/intel/idpf/idpf_common_rxtx.c
index 587871b54a..11d6848430 100644
--- a/drivers/net/intel/idpf/idpf_common_rxtx.c
+++ b/drivers/net/intel/idpf/idpf_common_rxtx.c
@@ -934,7 +934,16 @@ idpf_dp_splitq_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 		tx_offload.tso_segsz = tx_pkt->tso_segsz;
 		/* Calculate the number of context descriptors needed. */
 		nb_ctx = idpf_calc_context_desc(ol_flags);
-		nb_used = tx_pkt->nb_segs + nb_ctx;
+
+		/* Calculate the number of TX descriptors needed for
+		 * each packet. For TSO packets, use ci_calc_pkt_desc, as
+		 * the mbuf data size may exceed the max data size that hw
+		 * allows per tx desc.
+		 */
+		if (ol_flags & RTE_MBUF_F_TX_TCP_SEG)
+			nb_used = ci_calc_pkt_desc(tx_pkt) + nb_ctx;
+		else
+			nb_used = tx_pkt->nb_segs + nb_ctx;
 
 		if (ol_flags & IDPF_TX_CKSUM_OFFLOAD_MASK)
 			cmd_dtype = IDPF_TXD_FLEX_FLOW_CMD_CS_EN;
@@ -1382,10 +1391,14 @@ idpf_dp_singleq_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 		nb_ctx = idpf_calc_context_desc(ol_flags);
 
 		/* The number of descriptors that must be allocated for
-		 * a packet equals to the number of the segments of that
-		 * packet plus 1 context descriptor if needed.
+		 * a packet. For TSO packets, use ci_calc_pkt_desc, as
+		 * the mbuf data size may exceed the max data size that hw
+		 * allows per tx desc.
 		 */
-		nb_used = (uint16_t)(tx_pkt->nb_segs + nb_ctx);
+		if (ol_flags & RTE_MBUF_F_TX_TCP_SEG)
+			nb_used = (uint16_t)(ci_calc_pkt_desc(tx_pkt) + nb_ctx);
+		else
+			nb_used = (uint16_t)(tx_pkt->nb_segs + nb_ctx);
 		tx_last = (uint16_t)(tx_id + nb_used - 1);
 
 		/* Circular ring */
-- 
2.51.0



* [PATCH v4 06/35] net/ice: refactor context descriptor handling
  2026-02-09 16:44 ` [PATCH v4 00/35] combine multiple Intel scalar Tx paths Bruce Richardson
                     ` (4 preceding siblings ...)
  2026-02-09 16:45   ` [PATCH v4 05/35] net/intel: add common fn to calculate needed descriptors Bruce Richardson
@ 2026-02-09 16:45   ` Bruce Richardson
  2026-02-10 12:42     ` Burakov, Anatoly
  2026-02-09 16:45   ` [PATCH v4 07/35] net/i40e: " Bruce Richardson
                     ` (28 subsequent siblings)
  34 siblings, 1 reply; 274+ messages in thread
From: Bruce Richardson @ 2026-02-09 16:45 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson, Anatoly Burakov

Create a single function to manage all context descriptor handling. It
returns 1 or 0 depending on whether a context descriptor is needed, and
when one is needed, it also returns the descriptor contents directly
through output parameters.
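
The calling convention can be sketched as follows (a simplified model,
not the driver code: the flag bits, command-bit positions and two-quadword
layout are placeholders for the real RTE_MBUF_F_TX_* flags and the
ICE_TX_CTX_* / CI_TXD_QW1_* definitions):

```c
#include <stdint.h>

/* placeholder offload flag bits (assumptions, not the real RTE_MBUF_F_TX_*) */
#define F_TSO  (1ULL << 0)
#define F_QINQ (1ULL << 1)

/* Returns 0 if no context descriptor is needed; otherwise returns 1 and
 * fills the two descriptor quadwords via qw0/qw1. Bit positions here are
 * illustrative only.
 */
static inline uint16_t
get_ctx_desc(uint64_t ol_flags, uint16_t vlan_tci_outer,
	     uint64_t *qw0, uint64_t *qw1)
{
	uint64_t cmd = 0;
	uint16_t l2tag2 = 0;

	if ((ol_flags & (F_TSO | F_QINQ)) == 0)
		return 0;	/* common case: data descriptors only */

	if (ol_flags & F_TSO)
		cmd |= 1ULL << 4;	/* placeholder TSO command bit */
	if (ol_flags & F_QINQ) {
		l2tag2 = vlan_tci_outer;	/* outer tag lands in qw0 */
		cmd |= 1ULL << 5;	/* placeholder IL2TAG2 command bit */
	}

	*qw0 = (uint64_t)l2tag2 << 32;
	*qw1 = cmd;
	return 1;
}
```

The caller simply adds the return value into its descriptor count
(nb_ctx) and, when it is 1, writes qw0/qw1 straight into the ring slot,
as the patch does with ctx_txd[0]/ctx_txd[1].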

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/net/intel/ice/ice_rxtx.c | 104 +++++++++++++++++--------------
 1 file changed, 57 insertions(+), 47 deletions(-)

diff --git a/drivers/net/intel/ice/ice_rxtx.c b/drivers/net/intel/ice/ice_rxtx.c
index 2a53b614b2..1c789d45da 100644
--- a/drivers/net/intel/ice/ice_rxtx.c
+++ b/drivers/net/intel/ice/ice_rxtx.c
@@ -2966,10 +2966,6 @@ ice_txd_enable_checksum(uint64_t ol_flags,
 			uint32_t *td_offset,
 			union ci_tx_offload tx_offload)
 {
-	/* Set MACLEN */
-	if (!(ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK))
-		*td_offset |= (tx_offload.l2_len >> 1)
-			<< CI_TX_DESC_LEN_MACLEN_S;
 
 	/* Enable L3 checksum offloads */
 	if (ol_flags & RTE_MBUF_F_TX_IP_CKSUM) {
@@ -3052,7 +3048,7 @@ ice_calc_context_desc(uint64_t flags)
 
 /* set ice TSO context descriptor */
 static inline uint64_t
-ice_set_tso_ctx(struct rte_mbuf *mbuf, union ci_tx_offload tx_offload)
+ice_set_tso_ctx(uint64_t ol_flags, const struct rte_mbuf *mbuf, union ci_tx_offload tx_offload)
 {
 	uint64_t ctx_desc = 0;
 	uint32_t cd_cmd, hdr_len, cd_tso_len;
@@ -3063,7 +3059,7 @@ ice_set_tso_ctx(struct rte_mbuf *mbuf, union ci_tx_offload tx_offload)
 	}
 
 	hdr_len = tx_offload.l2_len + tx_offload.l3_len + tx_offload.l4_len;
-	hdr_len += (mbuf->ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK) ?
+	hdr_len += (ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK) ?
 		   tx_offload.outer_l2_len + tx_offload.outer_l3_len : 0;
 
 	cd_cmd = CI_TX_CTX_DESC_TSO;
@@ -3075,6 +3071,49 @@ ice_set_tso_ctx(struct rte_mbuf *mbuf, union ci_tx_offload tx_offload)
 	return ctx_desc;
 }
 
+/* compute a context descriptor if one is necessary based on the ol_flags
+ *
+ * Returns 0 if no descriptor is necessary.
+ * Returns 1 if one is necessary and the contents of the descriptor are returned
+ *   in the values pointed to by qw0 and qw1.
+ */
+static __rte_always_inline uint16_t
+get_context_desc(uint64_t ol_flags, const struct rte_mbuf *tx_pkt,
+	const union ci_tx_offload *tx_offload, const struct ci_tx_queue *txq,
+	uint64_t *qw0, uint64_t *qw1)
+{
+	uint16_t cd_l2tag2 = 0;
+	uint64_t cd_type_cmd_tso_mss = ICE_TX_DESC_DTYPE_CTX;
+	uint32_t cd_tunneling_params = 0;
+	uint64_t ptp_tx_index = txq->ice_vsi->adapter->ptp_tx_index;
+
+	if (ice_calc_context_desc(ol_flags) == 0)
+		return 0;
+
+	if (ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK)
+		ice_parse_tunneling_params(ol_flags, *tx_offload, &cd_tunneling_params);
+
+	if (ol_flags & (RTE_MBUF_F_TX_TCP_SEG | RTE_MBUF_F_TX_UDP_SEG))
+		cd_type_cmd_tso_mss |= ice_set_tso_ctx(ol_flags, tx_pkt, *tx_offload);
+	else if (ol_flags & RTE_MBUF_F_TX_IEEE1588_TMST)
+		cd_type_cmd_tso_mss |=
+			((uint64_t)CI_TX_CTX_DESC_TSYN << CI_TXD_QW1_CMD_S) |
+			((ptp_tx_index << ICE_TXD_CTX_QW1_TSYN_S) & ICE_TXD_CTX_QW1_TSYN_M);
+
+
+	/* TX context descriptor based double VLAN insert */
+	if (ol_flags & RTE_MBUF_F_TX_QINQ) {
+		cd_l2tag2 = tx_pkt->vlan_tci_outer;
+		cd_type_cmd_tso_mss |= ((uint64_t)CI_TX_CTX_DESC_IL2TAG2 << CI_TXD_QW1_CMD_S);
+	}
+
+	*qw0 = rte_cpu_to_le_32(cd_tunneling_params) |
+		((uint64_t)rte_cpu_to_le_16(cd_l2tag2) << 32);
+	*qw1 = rte_cpu_to_le_64(cd_type_cmd_tso_mss);
+
+	return 1;
+}
+
 uint16_t
 ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 {
@@ -3085,7 +3124,6 @@ ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 	struct ci_tx_entry *txe, *txn;
 	struct rte_mbuf *tx_pkt;
 	struct rte_mbuf *m_seg;
-	uint32_t cd_tunneling_params;
 	uint16_t tx_id;
 	uint16_t ts_id = -1;
 	uint16_t nb_tx;
@@ -3096,6 +3134,7 @@ ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 	uint32_t td_tag = 0;
 	uint16_t tx_last;
 	uint16_t slen;
+	uint16_t l2_len;
 	uint64_t buf_dma_addr;
 	uint64_t ol_flags;
 	union ci_tx_offload tx_offload = {0};
@@ -3114,20 +3153,25 @@ ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 		(void)ci_tx_xmit_cleanup(txq);
 
 	for (nb_tx = 0; nb_tx < nb_pkts; nb_tx++) {
+		uint64_t cd_qw0, cd_qw1;
 		tx_pkt = *tx_pkts++;
 
+		ol_flags = tx_pkt->ol_flags;
 		td_cmd = 0;
 		td_tag = 0;
-		td_offset = 0;
-		ol_flags = tx_pkt->ol_flags;
+		l2_len = ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK ?
+				tx_pkt->outer_l2_len : tx_pkt->l2_len;
+		td_offset = (l2_len >> 1) << CI_TX_DESC_LEN_MACLEN_S;
+
 		tx_offload.l2_len = tx_pkt->l2_len;
 		tx_offload.l3_len = tx_pkt->l3_len;
 		tx_offload.outer_l2_len = tx_pkt->outer_l2_len;
 		tx_offload.outer_l3_len = tx_pkt->outer_l3_len;
 		tx_offload.l4_len = tx_pkt->l4_len;
 		tx_offload.tso_segsz = tx_pkt->tso_segsz;
+
 		/* Calculate the number of context descriptors needed. */
-		nb_ctx = ice_calc_context_desc(ol_flags);
+		nb_ctx = get_context_desc(ol_flags, tx_pkt, &tx_offload, txq, &cd_qw0, &cd_qw1);
 
 		/* The number of descriptors that must be allocated for
 		 * a packet equals to the number of the segments of that
@@ -3169,15 +3213,6 @@ ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 			td_tag = tx_pkt->vlan_tci;
 		}
 
-		/* Fill in tunneling parameters if necessary */
-		cd_tunneling_params = 0;
-		if (ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK) {
-			td_offset |= (tx_offload.outer_l2_len >> 1)
-				<< CI_TX_DESC_LEN_MACLEN_S;
-			ice_parse_tunneling_params(ol_flags, tx_offload,
-						   &cd_tunneling_params);
-		}
-
 		/* Enable checksum offloading */
 		if (ol_flags & ICE_TX_CKSUM_OFFLOAD_MASK)
 			ice_txd_enable_checksum(ol_flags, &td_cmd,
@@ -3185,11 +3220,7 @@ ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 
 		if (nb_ctx) {
 			/* Setup TX context descriptor if required */
-			volatile struct ice_tx_ctx_desc *ctx_txd =
-				(volatile struct ice_tx_ctx_desc *)
-					&ci_tx_ring[tx_id];
-			uint16_t cd_l2tag2 = 0;
-			uint64_t cd_type_cmd_tso_mss = ICE_TX_DESC_DTYPE_CTX;
+			uint64_t *ctx_txd = RTE_CAST_PTR(uint64_t *, &ci_tx_ring[tx_id]);
 
 			txn = &sw_ring[txe->next_id];
 			RTE_MBUF_PREFETCH_TO_FREE(txn->mbuf);
@@ -3198,29 +3229,8 @@ ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 				txe->mbuf = NULL;
 			}
 
-			if (ol_flags & (RTE_MBUF_F_TX_TCP_SEG | RTE_MBUF_F_TX_UDP_SEG))
-				cd_type_cmd_tso_mss |=
-					ice_set_tso_ctx(tx_pkt, tx_offload);
-			else if (ol_flags & RTE_MBUF_F_TX_IEEE1588_TMST)
-				cd_type_cmd_tso_mss |=
-					((uint64_t)CI_TX_CTX_DESC_TSYN <<
-					CI_TXD_QW1_CMD_S) |
-					 (((uint64_t)txq->ice_vsi->adapter->ptp_tx_index <<
-					 ICE_TXD_CTX_QW1_TSYN_S) & ICE_TXD_CTX_QW1_TSYN_M);
-
-			ctx_txd->tunneling_params =
-				rte_cpu_to_le_32(cd_tunneling_params);
-
-			/* TX context descriptor based double VLAN insert */
-			if (ol_flags & RTE_MBUF_F_TX_QINQ) {
-				cd_l2tag2 = tx_pkt->vlan_tci_outer;
-				cd_type_cmd_tso_mss |=
-					((uint64_t)CI_TX_CTX_DESC_IL2TAG2 <<
-					 CI_TXD_QW1_CMD_S);
-			}
-			ctx_txd->l2tag2 = rte_cpu_to_le_16(cd_l2tag2);
-			ctx_txd->qw1 =
-				rte_cpu_to_le_64(cd_type_cmd_tso_mss);
+			ctx_txd[0] = cd_qw0;
+			ctx_txd[1] = cd_qw1;
 
 			txe->last_id = tx_last;
 			tx_id = txe->next_id;
-- 
2.51.0



* [PATCH v4 07/35] net/i40e: refactor context descriptor handling
  2026-02-09 16:44 ` [PATCH v4 00/35] combine multiple Intel scalar Tx paths Bruce Richardson
                     ` (5 preceding siblings ...)
  2026-02-09 16:45   ` [PATCH v4 06/35] net/ice: refactor context descriptor handling Bruce Richardson
@ 2026-02-09 16:45   ` Bruce Richardson
  2026-02-10 12:48     ` Burakov, Anatoly
  2026-02-09 16:45   ` [PATCH v4 08/35] net/idpf: " Bruce Richardson
                     ` (27 subsequent siblings)
  34 siblings, 1 reply; 274+ messages in thread
From: Bruce Richardson @ 2026-02-09 16:45 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson

Move all context descriptor handling into a single function, as was done
for the ice driver, keeping the same function signature as that driver.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/net/intel/i40e/i40e_rxtx.c | 123 +++++++++++++++--------------
 1 file changed, 63 insertions(+), 60 deletions(-)

diff --git a/drivers/net/intel/i40e/i40e_rxtx.c b/drivers/net/intel/i40e/i40e_rxtx.c
index b75306931a..601d4b98f2 100644
--- a/drivers/net/intel/i40e/i40e_rxtx.c
+++ b/drivers/net/intel/i40e/i40e_rxtx.c
@@ -321,11 +321,6 @@ i40e_txd_enable_checksum(uint64_t ol_flags,
 			uint32_t *td_offset,
 			union ci_tx_offload tx_offload)
 {
-	/* Set MACLEN */
-	if (!(ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK))
-		*td_offset |= (tx_offload.l2_len >> 1)
-			<< CI_TX_DESC_LEN_MACLEN_S;
-
 	/* Enable L3 checksum offloads */
 	if (ol_flags & RTE_MBUF_F_TX_IP_CKSUM) {
 		*td_cmd |= CI_TX_DESC_CMD_IIPT_IPV4_CSUM;
@@ -1004,7 +999,7 @@ i40e_calc_context_desc(uint64_t flags)
 
 /* set i40e TSO context descriptor */
 static inline uint64_t
-i40e_set_tso_ctx(struct rte_mbuf *mbuf, union ci_tx_offload tx_offload)
+i40e_set_tso_ctx(uint64_t ol_flags, const struct rte_mbuf *mbuf, union ci_tx_offload tx_offload)
 {
 	uint64_t ctx_desc = 0;
 	uint32_t cd_cmd, hdr_len, cd_tso_len;
@@ -1015,7 +1010,7 @@ i40e_set_tso_ctx(struct rte_mbuf *mbuf, union ci_tx_offload tx_offload)
 	}
 
 	hdr_len = tx_offload.l2_len + tx_offload.l3_len + tx_offload.l4_len;
-	hdr_len += (mbuf->ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK) ?
+	hdr_len += (ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK) ?
 		   tx_offload.outer_l2_len + tx_offload.outer_l3_len : 0;
 
 	cd_cmd = I40E_TX_CTX_DESC_TSO;
@@ -1029,6 +1024,52 @@ i40e_set_tso_ctx(struct rte_mbuf *mbuf, union ci_tx_offload tx_offload)
 	return ctx_desc;
 }
 
+/* compute a context descriptor if one is necessary based on the ol_flags
+ *
+ * Returns 0 if no descriptor is necessary.
+ * Returns 1 if one is necessary and the contents of the descriptor are returned
+ *   in the values pointed to by qw0 and qw1.
+ */
+static __rte_always_inline uint16_t
+get_context_desc(uint64_t ol_flags, const struct rte_mbuf *tx_pkt,
+		 const union ci_tx_offload *tx_offload,
+		 const struct ci_tx_queue *txq __rte_unused,
+		 uint64_t *qw0, uint64_t *qw1)
+{
+	uint16_t cd_l2tag2 = 0;
+	uint64_t cd_type_cmd_tso_mss = I40E_TX_DESC_DTYPE_CONTEXT;
+	uint32_t cd_tunneling_params = 0;
+
+	if (i40e_calc_context_desc(ol_flags) == 0)
+		return 0;
+
+	if (ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK)
+		i40e_parse_tunneling_params(ol_flags, *tx_offload, &cd_tunneling_params);
+
+	if (ol_flags & RTE_MBUF_F_TX_TCP_SEG) {
+		cd_type_cmd_tso_mss |= i40e_set_tso_ctx(ol_flags, tx_pkt, *tx_offload);
+	} else {
+#ifdef RTE_LIBRTE_IEEE1588
+		if (ol_flags & RTE_MBUF_F_TX_IEEE1588_TMST)
+			cd_type_cmd_tso_mss |=
+				((uint64_t)I40E_TX_CTX_DESC_TSYN << I40E_TXD_CTX_QW1_CMD_SHIFT);
+#endif
+	}
+
+	/* TX context descriptor based double VLAN insert */
+	if (ol_flags & RTE_MBUF_F_TX_QINQ) {
+		cd_l2tag2 = tx_pkt->vlan_tci_outer;
+		cd_type_cmd_tso_mss |=
+				((uint64_t)I40E_TX_CTX_DESC_IL2TAG2 << I40E_TXD_CTX_QW1_CMD_SHIFT);
+	}
+
+	*qw0 = rte_cpu_to_le_32(cd_tunneling_params) |
+		((uint64_t)rte_cpu_to_le_16(cd_l2tag2) << 32);
+	*qw1 = rte_cpu_to_le_64(cd_type_cmd_tso_mss);
+
+	return 1;
+}
+
 uint16_t
 i40e_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 {
@@ -1039,7 +1080,6 @@ i40e_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 	volatile struct ci_tx_desc *txr;
 	struct rte_mbuf *tx_pkt;
 	struct rte_mbuf *m_seg;
-	uint32_t cd_tunneling_params;
 	uint16_t tx_id;
 	uint16_t nb_tx;
 	uint32_t td_cmd;
@@ -1050,6 +1090,7 @@ i40e_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 	uint16_t nb_ctx;
 	uint16_t tx_last;
 	uint16_t slen;
+	uint16_t l2_len;
 	uint64_t buf_dma_addr;
 	union ci_tx_offload tx_offload = {0};
 
@@ -1064,14 +1105,15 @@ i40e_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 		(void)ci_tx_xmit_cleanup(txq);
 
 	for (nb_tx = 0; nb_tx < nb_pkts; nb_tx++) {
-		td_cmd = 0;
-		td_tag = 0;
-		td_offset = 0;
-
 		tx_pkt = *tx_pkts++;
 		RTE_MBUF_PREFETCH_TO_FREE(txe->mbuf);
 
 		ol_flags = tx_pkt->ol_flags;
+		td_cmd = 0;
+		td_tag = 0;
+		l2_len = ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK ?
+				tx_pkt->outer_l2_len : tx_pkt->l2_len;
+		td_offset = (l2_len >> 1) << CI_TX_DESC_LEN_MACLEN_S;
 		tx_offload.l2_len = tx_pkt->l2_len;
 		tx_offload.l3_len = tx_pkt->l3_len;
 		tx_offload.outer_l2_len = tx_pkt->outer_l2_len;
@@ -1080,7 +1122,9 @@ i40e_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 		tx_offload.tso_segsz = tx_pkt->tso_segsz;
 
 		/* Calculate the number of context descriptors needed. */
-		nb_ctx = i40e_calc_context_desc(ol_flags);
+		uint64_t cd_qw0 = 0, cd_qw1 = 0;
+		nb_ctx = get_context_desc(ol_flags, tx_pkt, &tx_offload, txq,
+				&cd_qw0, &cd_qw1);
 
 		/**
 		 * The number of descriptors that must be allocated for
@@ -1126,14 +1170,6 @@ i40e_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 		/* Always enable CRC offload insertion */
 		td_cmd |= CI_TX_DESC_CMD_ICRC;
 
-		/* Fill in tunneling parameters if necessary */
-		cd_tunneling_params = 0;
-		if (ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK) {
-			td_offset |= (tx_offload.outer_l2_len >> 1)
-					<< CI_TX_DESC_LEN_MACLEN_S;
-			i40e_parse_tunneling_params(ol_flags, tx_offload,
-						    &cd_tunneling_params);
-		}
 		/* Enable checksum offloading */
 		if (ol_flags & I40E_TX_CKSUM_OFFLOAD_MASK)
 			i40e_txd_enable_checksum(ol_flags, &td_cmd,
@@ -1141,12 +1177,7 @@ i40e_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 
 		if (nb_ctx) {
 			/* Setup TX context descriptor if required */
-			volatile struct i40e_tx_context_desc *ctx_txd =
-				(volatile struct i40e_tx_context_desc *)\
-							&txr[tx_id];
-			uint16_t cd_l2tag2 = 0;
-			uint64_t cd_type_cmd_tso_mss =
-				I40E_TX_DESC_DTYPE_CONTEXT;
+			uint64_t *desc = RTE_CAST_PTR(uint64_t *, &txr[tx_id]);
 
 			txn = &sw_ring[txe->next_id];
 			RTE_MBUF_PREFETCH_TO_FREE(txn->mbuf);
@@ -1155,41 +1186,13 @@ i40e_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 				txe->mbuf = NULL;
 			}
 
-			/* TSO enabled means no timestamp */
-			if (ol_flags & RTE_MBUF_F_TX_TCP_SEG)
-				cd_type_cmd_tso_mss |=
-					i40e_set_tso_ctx(tx_pkt, tx_offload);
-			else {
-#ifdef RTE_LIBRTE_IEEE1588
-				if (ol_flags & RTE_MBUF_F_TX_IEEE1588_TMST)
-					cd_type_cmd_tso_mss |=
-						((uint64_t)I40E_TX_CTX_DESC_TSYN <<
-						 I40E_TXD_CTX_QW1_CMD_SHIFT);
-#endif
-			}
-
-			ctx_txd->tunneling_params =
-				rte_cpu_to_le_32(cd_tunneling_params);
-			if (ol_flags & RTE_MBUF_F_TX_QINQ) {
-				cd_l2tag2 = tx_pkt->vlan_tci_outer;
-				cd_type_cmd_tso_mss |=
-					((uint64_t)I40E_TX_CTX_DESC_IL2TAG2 <<
-						I40E_TXD_CTX_QW1_CMD_SHIFT);
-			}
-			ctx_txd->l2tag2 = rte_cpu_to_le_16(cd_l2tag2);
-			ctx_txd->type_cmd_tso_mss =
-				rte_cpu_to_le_64(cd_type_cmd_tso_mss);
+			desc[0] = cd_qw0;
+			desc[1] = cd_qw1;
 
 			PMD_TX_LOG(DEBUG, "mbuf: %p, TCD[%u]: "
-				"tunneling_params: %#x; "
-				"l2tag2: %#hx; "
-				"rsvd: %#hx; "
-				"type_cmd_tso_mss: %#"PRIx64";",
-				tx_pkt, tx_id,
-				ctx_txd->tunneling_params,
-				ctx_txd->l2tag2,
-				ctx_txd->rsvd,
-				ctx_txd->type_cmd_tso_mss);
+				"qw0: %#"PRIx64"; "
+				"qw1: %#"PRIx64";",
+				tx_pkt, tx_id, cd_qw0, cd_qw1);
 
 			txe->last_id = tx_last;
 			tx_id = txe->next_id;
-- 
2.51.0
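[Archive note] The two-quadword packing that the new get_context_desc() in the hunk above produces can be sketched as follows. This is a minimal illustration, not the driver code: it assumes a little-endian host (so the rte_cpu_to_le_* conversions are no-ops) and the helper name is ours.

```c
#include <assert.h>
#include <stdint.h>

/* Build qw0 of an i40e-style context descriptor the way
 * get_context_desc() does: the 32-bit tunneling parameters occupy the
 * low half, and the outer VLAN tag (l2tag2) sits in bits 32-47. */
static inline uint64_t
ctx_desc_qw0(uint32_t tunneling_params, uint16_t l2tag2)
{
	return (uint64_t)tunneling_params | ((uint64_t)l2tag2 << 32);
}
```

Writing the descriptor then reduces to two plain 64-bit stores into the ring slot, which is what lets the common code treat every context descriptor as an opaque (qw0, qw1) pair.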


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [PATCH v4 08/35] net/idpf: refactor context descriptor handling
  2026-02-09 16:44 ` [PATCH v4 00/35] combine multiple Intel scalar Tx paths Bruce Richardson
                     ` (6 preceding siblings ...)
  2026-02-09 16:45   ` [PATCH v4 07/35] net/i40e: " Bruce Richardson
@ 2026-02-09 16:45   ` Bruce Richardson
  2026-02-10 12:52     ` Burakov, Anatoly
  2026-02-09 16:45   ` [PATCH v4 09/35] net/intel: consolidate checksum mask definition Bruce Richardson
                     ` (26 subsequent siblings)
  34 siblings, 1 reply; 274+ messages in thread
From: Bruce Richardson @ 2026-02-09 16:45 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson, Ciara Loftus, Jingjing Wu, Praveen Shetty

Move all context descriptor handling into a single function, as was
already done for the ice and i40e drivers.
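As a sketch of what the refactored helper computes, the TSO context descriptor can be assembled as two quadwords on a little-endian host. The macro names and values below are stand-ins for illustration (the real ones are IDPF_TX_DESC_DTYPE_FLEX_TSO_CTX, IDPF_TX_FLEX_CTX_DESC_CMD_TSO and IDPF_TXD_FLEX_CTX_MSS_RT_M in the idpf base code):

```c
#include <assert.h>
#include <stdint.h>

#define TSO_CTX_DTYPE   0x05u   /* stand-in for the FLEX_TSO_CTX dtype */
#define TSO_CTX_CMD_TSO 0x20u   /* stand-in for the TSO command bit */
#define MSS_RT_M        0x3FFFu /* stand-in 14-bit field mask */

/* Mirror the qw0/qw1 layout used by idpf_set_tso_ctx(): flex_tlen in
 * bits 0-31 of qw0, mss in bits 32-47, hdr_len in bits 48-55; the
 * dtype and command bits live in qw1. */
static inline void
build_tso_ctx(uint32_t tso_len, uint16_t mss, uint8_t hdr_len,
	      uint64_t *qw0, uint64_t *qw1)
{
	*qw0 = (uint64_t)(tso_len & MSS_RT_M) |
	       ((uint64_t)(mss & MSS_RT_M) << 32) |
	       ((uint64_t)hdr_len << 48);
	*qw1 = TSO_CTX_DTYPE | TSO_CTX_CMD_TSO;
}
```

Returning the pair to the caller, rather than writing through a volatile descriptor pointer inside the helper, is what allows both the split-queue and single-queue paths to share one context-descriptor function.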

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Ciara Loftus <ciara.loftus@intel.com>
---
 drivers/net/intel/idpf/idpf_common_rxtx.c | 61 +++++++++++------------
 1 file changed, 28 insertions(+), 33 deletions(-)

diff --git a/drivers/net/intel/idpf/idpf_common_rxtx.c b/drivers/net/intel/idpf/idpf_common_rxtx.c
index 11d6848430..9219ad9047 100644
--- a/drivers/net/intel/idpf/idpf_common_rxtx.c
+++ b/drivers/net/intel/idpf/idpf_common_rxtx.c
@@ -845,37 +845,36 @@ idpf_calc_context_desc(uint64_t flags)
 	return 0;
 }
 
-/* set TSO context descriptor
+/* set TSO context descriptor, returns 0 if no context needed, 1 if context set
  */
-static inline void
-idpf_set_splitq_tso_ctx(struct rte_mbuf *mbuf,
+static inline uint16_t
+idpf_set_tso_ctx(uint64_t ol_flags, struct rte_mbuf *mbuf,
 			union ci_tx_offload tx_offload,
-			volatile union idpf_flex_tx_ctx_desc *ctx_desc)
+			uint64_t *qw0, uint64_t *qw1)
 {
-	uint16_t cmd_dtype;
+	uint16_t cmd_dtype = IDPF_TX_DESC_DTYPE_FLEX_TSO_CTX | IDPF_TX_FLEX_CTX_DESC_CMD_TSO;
+	uint16_t tso_segsz = mbuf->tso_segsz;
 	uint32_t tso_len;
 	uint8_t hdr_len;
 
+	if (idpf_calc_context_desc(ol_flags) == 0)
+		return 0;
+
+	/* TSO context descriptor setup */
 	if (tx_offload.l4_len == 0) {
 		TX_LOG(DEBUG, "L4 length set to 0");
-		return;
+		return 0;
 	}
 
-	hdr_len = tx_offload.l2_len +
-		tx_offload.l3_len +
-		tx_offload.l4_len;
-	cmd_dtype = IDPF_TX_DESC_DTYPE_FLEX_TSO_CTX |
-		IDPF_TX_FLEX_CTX_DESC_CMD_TSO;
+	hdr_len = tx_offload.l2_len + tx_offload.l3_len + tx_offload.l4_len;
 	tso_len = mbuf->pkt_len - hdr_len;
 
-	ctx_desc->tso.qw1.cmd_dtype = rte_cpu_to_le_16(cmd_dtype);
-	ctx_desc->tso.qw0.hdr_len = hdr_len;
-	ctx_desc->tso.qw0.mss_rt =
-		rte_cpu_to_le_16((uint16_t)mbuf->tso_segsz &
-				 IDPF_TXD_FLEX_CTX_MSS_RT_M);
-	ctx_desc->tso.qw0.flex_tlen =
-		rte_cpu_to_le_32(tso_len &
-				 IDPF_TXD_FLEX_CTX_MSS_RT_M);
+	*qw0 = rte_cpu_to_le_32(tso_len & IDPF_TXD_FLEX_CTX_MSS_RT_M) |
+	       ((uint64_t)rte_cpu_to_le_16(tso_segsz & IDPF_TXD_FLEX_CTX_MSS_RT_M) << 32) |
+	       ((uint64_t)hdr_len << 48);
+	*qw1 = rte_cpu_to_le_16(cmd_dtype);
+
+	return 1;
 }
 
 RTE_EXPORT_INTERNAL_SYMBOL(idpf_dp_splitq_xmit_pkts)
@@ -933,7 +932,8 @@ idpf_dp_splitq_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 		tx_offload.l4_len = tx_pkt->l4_len;
 		tx_offload.tso_segsz = tx_pkt->tso_segsz;
 		/* Calculate the number of context descriptors needed. */
-		nb_ctx = idpf_calc_context_desc(ol_flags);
+		uint64_t cd_qw0, cd_qw1;
+		nb_ctx = idpf_set_tso_ctx(ol_flags, tx_pkt, tx_offload, &cd_qw0, &cd_qw1);
 
 		/* Calculate the number of TX descriptors needed for
 		 * each packet. For TSO packets, use ci_calc_pkt_desc as
@@ -950,12 +950,10 @@ idpf_dp_splitq_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 
 		/* context descriptor */
 		if (nb_ctx != 0) {
-			volatile union idpf_flex_tx_ctx_desc *ctx_desc =
-				(volatile union idpf_flex_tx_ctx_desc *)&txr[tx_id];
+			uint64_t *ctx_desc = RTE_CAST_PTR(uint64_t *, &txr[tx_id]);
 
-			if ((ol_flags & RTE_MBUF_F_TX_TCP_SEG) != 0)
-				idpf_set_splitq_tso_ctx(tx_pkt, tx_offload,
-							ctx_desc);
+			ctx_desc[0] = cd_qw0;
+			ctx_desc[1] = cd_qw1;
 
 			tx_id++;
 			if (tx_id == txq->nb_tx_desc)
@@ -1388,7 +1386,8 @@ idpf_dp_singleq_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 		tx_offload.l4_len = tx_pkt->l4_len;
 		tx_offload.tso_segsz = tx_pkt->tso_segsz;
 		/* Calculate the number of context descriptors needed. */
-		nb_ctx = idpf_calc_context_desc(ol_flags);
+		uint64_t cd_qw0, cd_qw1;
+		nb_ctx = idpf_set_tso_ctx(ol_flags, tx_pkt, tx_offload, &cd_qw0, &cd_qw1);
 
 		/* The number of descriptors that must be allocated for
 		 * a packet. For TSO packets, use ci_calc_pkt_desc as
@@ -1431,9 +1430,7 @@ idpf_dp_singleq_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 
 		if (nb_ctx != 0) {
 			/* Setup TX context descriptor if required */
-			volatile union idpf_flex_tx_ctx_desc *ctx_txd =
-				(volatile union idpf_flex_tx_ctx_desc *)
-				&txr[tx_id];
+			uint64_t *ctx_txd = RTE_CAST_PTR(uint64_t *, &txr[tx_id]);
 
 			txn = &sw_ring[txe->next_id];
 			RTE_MBUF_PREFETCH_TO_FREE(txn->mbuf);
@@ -1442,10 +1439,8 @@ idpf_dp_singleq_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 				txe->mbuf = NULL;
 			}
 
-			/* TSO enabled */
-			if ((ol_flags & RTE_MBUF_F_TX_TCP_SEG) != 0)
-				idpf_set_splitq_tso_ctx(tx_pkt, tx_offload,
-							ctx_txd);
+			ctx_txd[0] = cd_qw0;
+			ctx_txd[1] = cd_qw1;
 
 			txe->last_id = tx_last;
 			tx_id = txe->next_id;
-- 
2.51.0



* [PATCH v4 09/35] net/intel: consolidate checksum mask definition
  2026-02-09 16:44 ` [PATCH v4 00/35] combine multiple Intel scalar Tx paths Bruce Richardson
                     ` (7 preceding siblings ...)
  2026-02-09 16:45   ` [PATCH v4 08/35] net/idpf: " Bruce Richardson
@ 2026-02-09 16:45   ` Bruce Richardson
  2026-02-10 13:00     ` Burakov, Anatoly
  2026-02-09 16:45   ` [PATCH v4 10/35] net/intel: create common checksum Tx offload function Bruce Richardson
                     ` (25 subsequent siblings)
  34 siblings, 1 reply; 274+ messages in thread
From: Bruce Richardson @ 2026-02-09 16:45 UTC (permalink / raw)
  To: dev
  Cc: Bruce Richardson, Vladimir Medvedkin, Anatoly Burakov,
	Jingjing Wu, Praveen Shetty

Create a common definition of the Tx checksum offload mask, shared
across the iavf, idpf, i40e and ice drivers.
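The consolidation is safe because the common mask is simply the union of the per-driver masks, so each driver's old trigger set remains covered. A sketch of that property (the flag bit values here are made up for illustration, not the real RTE_MBUF_F_TX_* values):

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in single-bit flags modelling a few RTE_MBUF_F_TX_* bits. */
#define F_IP_CKSUM        (1ULL << 0)
#define F_L4_MASK         (3ULL << 1)   /* 2-bit L4 checksum field */
#define F_TCP_SEG         (1ULL << 3)
#define F_UDP_SEG         (1ULL << 4)
#define F_OUTER_IP_CKSUM  (1ULL << 5)
#define F_OUTER_UDP_CKSUM (1ULL << 6)

/* Union of the per-driver masks, as in CI_TX_CKSUM_OFFLOAD_MASK. */
#define CKSUM_OFFLOAD_MASK (F_IP_CKSUM | F_L4_MASK | F_TCP_SEG | \
			    F_UDP_SEG | F_OUTER_IP_CKSUM | F_OUTER_UDP_CKSUM)

/* idpf previously checked only this subset. */
#define IDPF_OLD_MASK (F_IP_CKSUM | F_L4_MASK | F_TCP_SEG)

static inline int
needs_cksum(uint64_t ol_flags)
{
	return (ol_flags & CKSUM_OFFLOAD_MASK) != 0;
}
```

For drivers whose old mask was narrower (e.g. idpf), the shared mask may enable the checksum command bit for a few extra flag combinations, which is harmless since the hardware offload is requested per field anyway.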

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/net/intel/common/tx.h             | 8 ++++++++
 drivers/net/intel/i40e/i40e_rxtx.c        | 7 +------
 drivers/net/intel/iavf/iavf_rxtx.c        | 2 +-
 drivers/net/intel/iavf/iavf_rxtx.h        | 8 --------
 drivers/net/intel/ice/ice_rxtx.c          | 8 +-------
 drivers/net/intel/idpf/idpf_common_rxtx.c | 4 ++--
 drivers/net/intel/idpf/idpf_common_rxtx.h | 7 +------
 7 files changed, 14 insertions(+), 30 deletions(-)

diff --git a/drivers/net/intel/common/tx.h b/drivers/net/intel/common/tx.h
index 091f220f1c..23deabc5d1 100644
--- a/drivers/net/intel/common/tx.h
+++ b/drivers/net/intel/common/tx.h
@@ -53,6 +53,14 @@
 /* Common maximum data per TX descriptor */
 #define CI_MAX_DATA_PER_TXD     (CI_TXD_QW1_TX_BUF_SZ_M >> CI_TXD_QW1_TX_BUF_SZ_S)
 
+/* Checksum offload mask to identify packets requesting offload */
+#define CI_TX_CKSUM_OFFLOAD_MASK (RTE_MBUF_F_TX_IP_CKSUM |		 \
+				   RTE_MBUF_F_TX_L4_MASK |		 \
+				   RTE_MBUF_F_TX_TCP_SEG |		 \
+				   RTE_MBUF_F_TX_UDP_SEG |		 \
+				   RTE_MBUF_F_TX_OUTER_IP_CKSUM |	 \
+				   RTE_MBUF_F_TX_OUTER_UDP_CKSUM)
+
 /**
  * Common TX offload union for Intel drivers.
  * Supports both basic offloads (l2_len, l3_len, l4_len, tso_segsz) and
diff --git a/drivers/net/intel/i40e/i40e_rxtx.c b/drivers/net/intel/i40e/i40e_rxtx.c
index 601d4b98f2..12a21407c5 100644
--- a/drivers/net/intel/i40e/i40e_rxtx.c
+++ b/drivers/net/intel/i40e/i40e_rxtx.c
@@ -53,11 +53,6 @@
 #define I40E_TX_IEEE1588_TMST 0
 #endif
 
-#define I40E_TX_CKSUM_OFFLOAD_MASK (RTE_MBUF_F_TX_IP_CKSUM |		 \
-		RTE_MBUF_F_TX_L4_MASK |		 \
-		RTE_MBUF_F_TX_TCP_SEG |		 \
-		RTE_MBUF_F_TX_OUTER_IP_CKSUM)
-
 #define I40E_TX_OFFLOAD_MASK (RTE_MBUF_F_TX_OUTER_IPV4 |	\
 		RTE_MBUF_F_TX_OUTER_IPV6 |	\
 		RTE_MBUF_F_TX_IPV4 |		\
@@ -1171,7 +1166,7 @@ i40e_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 		td_cmd |= CI_TX_DESC_CMD_ICRC;
 
 		/* Enable checksum offloading */
-		if (ol_flags & I40E_TX_CKSUM_OFFLOAD_MASK)
+		if (ol_flags & CI_TX_CKSUM_OFFLOAD_MASK)
 			i40e_txd_enable_checksum(ol_flags, &td_cmd,
 						 &td_offset, tx_offload);
 
diff --git a/drivers/net/intel/iavf/iavf_rxtx.c b/drivers/net/intel/iavf/iavf_rxtx.c
index 885d9309cc..3dbcfd5355 100644
--- a/drivers/net/intel/iavf/iavf_rxtx.c
+++ b/drivers/net/intel/iavf/iavf_rxtx.c
@@ -2596,7 +2596,7 @@ iavf_build_data_desc_cmd_offset_fields(volatile uint64_t *qw1,
 	}
 
 	if ((m->ol_flags &
-	    (IAVF_TX_CKSUM_OFFLOAD_MASK | RTE_MBUF_F_TX_SEC_OFFLOAD)) == 0)
+	    (CI_TX_CKSUM_OFFLOAD_MASK | RTE_MBUF_F_TX_SEC_OFFLOAD)) == 0)
 		goto skip_cksum;
 
 	/* Set MACLEN */
diff --git a/drivers/net/intel/iavf/iavf_rxtx.h b/drivers/net/intel/iavf/iavf_rxtx.h
index 395d97b4ee..cca5c25119 100644
--- a/drivers/net/intel/iavf/iavf_rxtx.h
+++ b/drivers/net/intel/iavf/iavf_rxtx.h
@@ -136,14 +136,6 @@
 
 #define IAVF_TX_MIN_PKT_LEN 17
 
-#define IAVF_TX_CKSUM_OFFLOAD_MASK (		 \
-		RTE_MBUF_F_TX_IP_CKSUM |		 \
-		RTE_MBUF_F_TX_L4_MASK |		 \
-		RTE_MBUF_F_TX_TCP_SEG |          \
-		RTE_MBUF_F_TX_UDP_SEG |          \
-		RTE_MBUF_F_TX_OUTER_IP_CKSUM |   \
-		RTE_MBUF_F_TX_OUTER_UDP_CKSUM)
-
 #define IAVF_TX_OFFLOAD_MASK (  \
 		RTE_MBUF_F_TX_OUTER_IPV6 |		 \
 		RTE_MBUF_F_TX_OUTER_IPV4 |		 \
diff --git a/drivers/net/intel/ice/ice_rxtx.c b/drivers/net/intel/ice/ice_rxtx.c
index 1c789d45da..63bce7bd9e 100644
--- a/drivers/net/intel/ice/ice_rxtx.c
+++ b/drivers/net/intel/ice/ice_rxtx.c
@@ -13,12 +13,6 @@
 #include "../common/rx_vec_x86.h"
 #endif
 
-#define ICE_TX_CKSUM_OFFLOAD_MASK (RTE_MBUF_F_TX_IP_CKSUM |		 \
-		RTE_MBUF_F_TX_L4_MASK |		 \
-		RTE_MBUF_F_TX_TCP_SEG |		 \
-		RTE_MBUF_F_TX_UDP_SEG |		 \
-		RTE_MBUF_F_TX_OUTER_IP_CKSUM)
-
 /**
  * The mbuf dynamic field pointer for protocol extraction metadata.
  */
@@ -3214,7 +3208,7 @@ ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 		}
 
 		/* Enable checksum offloading */
-		if (ol_flags & ICE_TX_CKSUM_OFFLOAD_MASK)
+		if (ol_flags & CI_TX_CKSUM_OFFLOAD_MASK)
 			ice_txd_enable_checksum(ol_flags, &td_cmd,
 						&td_offset, tx_offload);
 
diff --git a/drivers/net/intel/idpf/idpf_common_rxtx.c b/drivers/net/intel/idpf/idpf_common_rxtx.c
index 9219ad9047..b34d545a0a 100644
--- a/drivers/net/intel/idpf/idpf_common_rxtx.c
+++ b/drivers/net/intel/idpf/idpf_common_rxtx.c
@@ -945,7 +945,7 @@ idpf_dp_splitq_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 		else
 			nb_used = tx_pkt->nb_segs + nb_ctx;
 
-		if (ol_flags & IDPF_TX_CKSUM_OFFLOAD_MASK)
+		if (ol_flags & CI_TX_CKSUM_OFFLOAD_MASK)
 			cmd_dtype = IDPF_TXD_FLEX_FLOW_CMD_CS_EN;
 
 		/* context descriptor */
@@ -1425,7 +1425,7 @@ idpf_dp_singleq_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 			}
 		}
 
-		if (ol_flags & IDPF_TX_CKSUM_OFFLOAD_MASK)
+		if (ol_flags & CI_TX_CKSUM_OFFLOAD_MASK)
 			td_cmd |= IDPF_TX_FLEX_DESC_CMD_CS_EN;
 
 		if (nb_ctx != 0) {
diff --git a/drivers/net/intel/idpf/idpf_common_rxtx.h b/drivers/net/intel/idpf/idpf_common_rxtx.h
index b88a87402d..fe7094d434 100644
--- a/drivers/net/intel/idpf/idpf_common_rxtx.h
+++ b/drivers/net/intel/idpf/idpf_common_rxtx.h
@@ -39,13 +39,8 @@
 #define IDPF_RLAN_CTX_DBUF_S	7
 #define IDPF_RX_MAX_DATA_BUF_SIZE	(16 * 1024 - 128)
 
-#define IDPF_TX_CKSUM_OFFLOAD_MASK (		\
-		RTE_MBUF_F_TX_IP_CKSUM |	\
-		RTE_MBUF_F_TX_L4_MASK |		\
-		RTE_MBUF_F_TX_TCP_SEG)
-
 #define IDPF_TX_OFFLOAD_MASK (			\
-		IDPF_TX_CKSUM_OFFLOAD_MASK |	\
+		CI_TX_CKSUM_OFFLOAD_MASK |	\
 		RTE_MBUF_F_TX_IPV4 |		\
 		RTE_MBUF_F_TX_IPV6)
 
-- 
2.51.0



* [PATCH v4 10/35] net/intel: create common checksum Tx offload function
  2026-02-09 16:44 ` [PATCH v4 00/35] combine multiple Intel scalar Tx paths Bruce Richardson
                     ` (8 preceding siblings ...)
  2026-02-09 16:45   ` [PATCH v4 09/35] net/intel: consolidate checksum mask definition Bruce Richardson
@ 2026-02-09 16:45   ` Bruce Richardson
  2026-02-10 13:04     ` Burakov, Anatoly
  2026-02-09 16:45   ` [PATCH v4 11/35] net/intel: create a common scalar Tx function Bruce Richardson
                     ` (24 subsequent siblings)
  34 siblings, 1 reply; 274+ messages in thread
From: Bruce Richardson @ 2026-02-09 16:45 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson, Anatoly Burakov

Since i40e and ice use the same checksum offload logic, merge their
functions into one. Future rework should enable more drivers to use it
as well.
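The core of the merged function is packing the three header lengths into the descriptor's offset field, each scaled to its hardware unit before shifting. A sketch of that packing (the shift values here are stand-ins for the CI_TX_DESC_LEN_* macros, chosen only for illustration):

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in field positions within the descriptor offset field. */
#define LEN_MACLEN_S 0  /* MAC header length, in 2-byte words */
#define LEN_IPLEN_S  7  /* IP header length, in 4-byte dwords */
#define LEN_L4_LEN_S 14 /* L4 header length, in 4-byte dwords */

/* Pack header lengths the way ci_txd_enable_checksum() builds
 * td_offset: l2_len is halved (word units), l3_len and l4_len are
 * divided by four (dword units), then each lands in its bit field. */
static inline uint32_t
pack_td_offset(uint16_t l2_len, uint16_t l3_len, uint16_t l4_len)
{
	return ((uint32_t)(l2_len >> 1) << LEN_MACLEN_S) |
	       ((uint32_t)(l3_len >> 2) << LEN_IPLEN_S) |
	       ((uint32_t)(l4_len >> 2) << LEN_L4_LEN_S);
}
```

Because i40e and ice share the same unit scaling and field layout at this level, a single function body suffices for both.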

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/net/intel/common/tx_scalar.h | 58 +++++++++++++++++++++++++++
 drivers/net/intel/i40e/i40e_rxtx.c   | 52 +-----------------------
 drivers/net/intel/i40e/i40e_rxtx.h   |  1 +
 drivers/net/intel/ice/ice_rxtx.c     | 60 +---------------------------
 drivers/net/intel/ice/ice_rxtx.h     |  1 +
 5 files changed, 62 insertions(+), 110 deletions(-)

diff --git a/drivers/net/intel/common/tx_scalar.h b/drivers/net/intel/common/tx_scalar.h
index 573f5136a9..cf0dcb4b2c 100644
--- a/drivers/net/intel/common/tx_scalar.h
+++ b/drivers/net/intel/common/tx_scalar.h
@@ -59,6 +59,64 @@ ci_tx_xmit_cleanup(struct ci_tx_queue *txq)
 	return 0;
 }
 
+/* Common checksum enable function for Intel drivers (ice, i40e, etc.) */
+static inline void
+ci_txd_enable_checksum(uint64_t ol_flags,
+		       uint32_t *td_cmd,
+		       uint32_t *td_offset,
+		       union ci_tx_offload tx_offload)
+{
+	/* Enable L3 checksum offloads */
+	if (ol_flags & RTE_MBUF_F_TX_IP_CKSUM) {
+		*td_cmd |= CI_TX_DESC_CMD_IIPT_IPV4_CSUM;
+		*td_offset |= (tx_offload.l3_len >> 2) <<
+			CI_TX_DESC_LEN_IPLEN_S;
+	} else if (ol_flags & RTE_MBUF_F_TX_IPV4) {
+		*td_cmd |= CI_TX_DESC_CMD_IIPT_IPV4;
+		*td_offset |= (tx_offload.l3_len >> 2) <<
+			CI_TX_DESC_LEN_IPLEN_S;
+	} else if (ol_flags & RTE_MBUF_F_TX_IPV6) {
+		*td_cmd |= CI_TX_DESC_CMD_IIPT_IPV6;
+		*td_offset |= (tx_offload.l3_len >> 2) <<
+			CI_TX_DESC_LEN_IPLEN_S;
+	}
+
+	if (ol_flags & RTE_MBUF_F_TX_TCP_SEG) {
+		*td_cmd |= CI_TX_DESC_CMD_L4T_EOFT_TCP;
+		*td_offset |= (tx_offload.l4_len >> 2) <<
+			      CI_TX_DESC_LEN_L4_LEN_S;
+		return;
+	}
+
+	if (ol_flags & RTE_MBUF_F_TX_UDP_SEG) {
+		*td_cmd |= CI_TX_DESC_CMD_L4T_EOFT_UDP;
+		*td_offset |= (tx_offload.l4_len >> 2) <<
+			      CI_TX_DESC_LEN_L4_LEN_S;
+		return;
+	}
+
+	/* Enable L4 checksum offloads */
+	switch (ol_flags & RTE_MBUF_F_TX_L4_MASK) {
+	case RTE_MBUF_F_TX_TCP_CKSUM:
+		*td_cmd |= CI_TX_DESC_CMD_L4T_EOFT_TCP;
+		*td_offset |= (sizeof(struct rte_tcp_hdr) >> 2) <<
+			      CI_TX_DESC_LEN_L4_LEN_S;
+		break;
+	case RTE_MBUF_F_TX_SCTP_CKSUM:
+		*td_cmd |= CI_TX_DESC_CMD_L4T_EOFT_SCTP;
+		*td_offset |= (sizeof(struct rte_sctp_hdr) >> 2) <<
+			      CI_TX_DESC_LEN_L4_LEN_S;
+		break;
+	case RTE_MBUF_F_TX_UDP_CKSUM:
+		*td_cmd |= CI_TX_DESC_CMD_L4T_EOFT_UDP;
+		*td_offset |= (sizeof(struct rte_udp_hdr) >> 2) <<
+			      CI_TX_DESC_LEN_L4_LEN_S;
+		break;
+	default:
+		break;
+	}
+}
+
 static inline uint16_t
 ci_div_roundup16(uint16_t x, uint16_t y)
 {
diff --git a/drivers/net/intel/i40e/i40e_rxtx.c b/drivers/net/intel/i40e/i40e_rxtx.c
index 12a21407c5..c318b4c84e 100644
--- a/drivers/net/intel/i40e/i40e_rxtx.c
+++ b/drivers/net/intel/i40e/i40e_rxtx.c
@@ -310,56 +310,6 @@ i40e_parse_tunneling_params(uint64_t ol_flags,
 		*cd_tunneling |= I40E_TXD_CTX_QW0_L4T_CS_MASK;
 }
 
-static inline void
-i40e_txd_enable_checksum(uint64_t ol_flags,
-			uint32_t *td_cmd,
-			uint32_t *td_offset,
-			union ci_tx_offload tx_offload)
-{
-	/* Enable L3 checksum offloads */
-	if (ol_flags & RTE_MBUF_F_TX_IP_CKSUM) {
-		*td_cmd |= CI_TX_DESC_CMD_IIPT_IPV4_CSUM;
-		*td_offset |= (tx_offload.l3_len >> 2)
-				<< CI_TX_DESC_LEN_IPLEN_S;
-	} else if (ol_flags & RTE_MBUF_F_TX_IPV4) {
-		*td_cmd |= CI_TX_DESC_CMD_IIPT_IPV4;
-		*td_offset |= (tx_offload.l3_len >> 2)
-				<< CI_TX_DESC_LEN_IPLEN_S;
-	} else if (ol_flags & RTE_MBUF_F_TX_IPV6) {
-		*td_cmd |= CI_TX_DESC_CMD_IIPT_IPV6;
-		*td_offset |= (tx_offload.l3_len >> 2)
-				<< CI_TX_DESC_LEN_IPLEN_S;
-	}
-
-	if (ol_flags & RTE_MBUF_F_TX_TCP_SEG) {
-		*td_cmd |= CI_TX_DESC_CMD_L4T_EOFT_TCP;
-		*td_offset |= (tx_offload.l4_len >> 2)
-			<< CI_TX_DESC_LEN_L4_LEN_S;
-		return;
-	}
-
-	/* Enable L4 checksum offloads */
-	switch (ol_flags & RTE_MBUF_F_TX_L4_MASK) {
-	case RTE_MBUF_F_TX_TCP_CKSUM:
-		*td_cmd |= CI_TX_DESC_CMD_L4T_EOFT_TCP;
-		*td_offset |= (sizeof(struct rte_tcp_hdr) >> 2) <<
-				CI_TX_DESC_LEN_L4_LEN_S;
-		break;
-	case RTE_MBUF_F_TX_SCTP_CKSUM:
-		*td_cmd |= CI_TX_DESC_CMD_L4T_EOFT_SCTP;
-		*td_offset |= (sizeof(struct rte_sctp_hdr) >> 2) <<
-				CI_TX_DESC_LEN_L4_LEN_S;
-		break;
-	case RTE_MBUF_F_TX_UDP_CKSUM:
-		*td_cmd |= CI_TX_DESC_CMD_L4T_EOFT_UDP;
-		*td_offset |= (sizeof(struct rte_udp_hdr) >> 2) <<
-				CI_TX_DESC_LEN_L4_LEN_S;
-		break;
-	default:
-		break;
-	}
-}
-
 /* Construct the tx flags */
 static inline uint64_t
 i40e_build_ctob(uint32_t td_cmd,
@@ -1167,7 +1117,7 @@ i40e_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 
 		/* Enable checksum offloading */
 		if (ol_flags & CI_TX_CKSUM_OFFLOAD_MASK)
-			i40e_txd_enable_checksum(ol_flags, &td_cmd,
+			ci_txd_enable_checksum(ol_flags, &td_cmd,
 						 &td_offset, tx_offload);
 
 		if (nb_ctx) {
diff --git a/drivers/net/intel/i40e/i40e_rxtx.h b/drivers/net/intel/i40e/i40e_rxtx.h
index 307ffa3049..db8525d52d 100644
--- a/drivers/net/intel/i40e/i40e_rxtx.h
+++ b/drivers/net/intel/i40e/i40e_rxtx.h
@@ -100,6 +100,7 @@ enum i40e_header_split_mode {
 		RTE_ETH_TX_OFFLOAD_SCTP_CKSUM |		\
 		RTE_ETH_TX_OFFLOAD_OUTER_IPV4_CKSUM |	\
 		RTE_ETH_TX_OFFLOAD_TCP_TSO |		\
+		RTE_ETH_TX_OFFLOAD_UDP_TSO |		\
 		RTE_ETH_TX_OFFLOAD_VXLAN_TNL_TSO |	\
 		RTE_ETH_TX_OFFLOAD_GRE_TNL_TSO |	\
 		RTE_ETH_TX_OFFLOAD_IPIP_TNL_TSO |	\
diff --git a/drivers/net/intel/ice/ice_rxtx.c b/drivers/net/intel/ice/ice_rxtx.c
index 63bce7bd9e..4792aa9a8b 100644
--- a/drivers/net/intel/ice/ice_rxtx.c
+++ b/drivers/net/intel/ice/ice_rxtx.c
@@ -2954,64 +2954,6 @@ ice_parse_tunneling_params(uint64_t ol_flags,
 		*cd_tunneling |= ICE_TXD_CTX_QW0_L4T_CS_M;
 }
 
-static inline void
-ice_txd_enable_checksum(uint64_t ol_flags,
-			uint32_t *td_cmd,
-			uint32_t *td_offset,
-			union ci_tx_offload tx_offload)
-{
-
-	/* Enable L3 checksum offloads */
-	if (ol_flags & RTE_MBUF_F_TX_IP_CKSUM) {
-		*td_cmd |= CI_TX_DESC_CMD_IIPT_IPV4_CSUM;
-		*td_offset |= (tx_offload.l3_len >> 2) <<
-			CI_TX_DESC_LEN_IPLEN_S;
-	} else if (ol_flags & RTE_MBUF_F_TX_IPV4) {
-		*td_cmd |= CI_TX_DESC_CMD_IIPT_IPV4;
-		*td_offset |= (tx_offload.l3_len >> 2) <<
-			CI_TX_DESC_LEN_IPLEN_S;
-	} else if (ol_flags & RTE_MBUF_F_TX_IPV6) {
-		*td_cmd |= CI_TX_DESC_CMD_IIPT_IPV6;
-		*td_offset |= (tx_offload.l3_len >> 2) <<
-			CI_TX_DESC_LEN_IPLEN_S;
-	}
-
-	if (ol_flags & RTE_MBUF_F_TX_TCP_SEG) {
-		*td_cmd |= CI_TX_DESC_CMD_L4T_EOFT_TCP;
-		*td_offset |= (tx_offload.l4_len >> 2) <<
-			      CI_TX_DESC_LEN_L4_LEN_S;
-		return;
-	}
-
-	if (ol_flags & RTE_MBUF_F_TX_UDP_SEG) {
-		*td_cmd |= CI_TX_DESC_CMD_L4T_EOFT_UDP;
-		*td_offset |= (tx_offload.l4_len >> 2) <<
-			      CI_TX_DESC_LEN_L4_LEN_S;
-		return;
-	}
-
-	/* Enable L4 checksum offloads */
-	switch (ol_flags & RTE_MBUF_F_TX_L4_MASK) {
-	case RTE_MBUF_F_TX_TCP_CKSUM:
-		*td_cmd |= CI_TX_DESC_CMD_L4T_EOFT_TCP;
-		*td_offset |= (sizeof(struct rte_tcp_hdr) >> 2) <<
-			      CI_TX_DESC_LEN_L4_LEN_S;
-		break;
-	case RTE_MBUF_F_TX_SCTP_CKSUM:
-		*td_cmd |= CI_TX_DESC_CMD_L4T_EOFT_SCTP;
-		*td_offset |= (sizeof(struct rte_sctp_hdr) >> 2) <<
-			      CI_TX_DESC_LEN_L4_LEN_S;
-		break;
-	case RTE_MBUF_F_TX_UDP_CKSUM:
-		*td_cmd |= CI_TX_DESC_CMD_L4T_EOFT_UDP;
-		*td_offset |= (sizeof(struct rte_udp_hdr) >> 2) <<
-			      CI_TX_DESC_LEN_L4_LEN_S;
-		break;
-	default:
-		break;
-	}
-}
-
 /* Construct the tx flags */
 static inline uint64_t
 ice_build_ctob(uint32_t td_cmd,
@@ -3209,7 +3151,7 @@ ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 
 		/* Enable checksum offloading */
 		if (ol_flags & CI_TX_CKSUM_OFFLOAD_MASK)
-			ice_txd_enable_checksum(ol_flags, &td_cmd,
+			ci_txd_enable_checksum(ol_flags, &td_cmd,
 						&td_offset, tx_offload);
 
 		if (nb_ctx) {
diff --git a/drivers/net/intel/ice/ice_rxtx.h b/drivers/net/intel/ice/ice_rxtx.h
index cd5fa93d1c..7d6480b410 100644
--- a/drivers/net/intel/ice/ice_rxtx.h
+++ b/drivers/net/intel/ice/ice_rxtx.h
@@ -112,6 +112,7 @@
 #define ICE_TX_SCALAR_OFFLOADS (		\
 	RTE_ETH_TX_OFFLOAD_VLAN_INSERT |	\
 	RTE_ETH_TX_OFFLOAD_TCP_TSO |		\
+	RTE_ETH_TX_OFFLOAD_UDP_TSO |		\
 	RTE_ETH_TX_OFFLOAD_MULTI_SEGS |		\
 	RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE |	\
 	RTE_ETH_TX_OFFLOAD_QINQ_INSERT |	\
-- 
2.51.0



* [PATCH v4 11/35] net/intel: create a common scalar Tx function
  2026-02-09 16:44 ` [PATCH v4 00/35] combine multiple Intel scalar Tx paths Bruce Richardson
                     ` (9 preceding siblings ...)
  2026-02-09 16:45   ` [PATCH v4 10/35] net/intel: create common checksum Tx offload function Bruce Richardson
@ 2026-02-09 16:45   ` Bruce Richardson
  2026-02-10 13:14     ` Burakov, Anatoly
  2026-02-09 16:45   ` [PATCH v4 12/35] net/i40e: use " Bruce Richardson
                     ` (23 subsequent siblings)
  34 siblings, 1 reply; 274+ messages in thread
From: Bruce Richardson @ 2026-02-09 16:45 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson, Anatoly Burakov

Given the similarities between the transmit functions across the
various Intel drivers, make a start on consolidating them by moving the
ice Tx function into the common code, for reuse by other drivers.
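The parameterization approach can be sketched as follows; the names below (the flag, the demo hook, the counting helper) are hypothetical and only illustrate how a per-driver context-descriptor hook plugs into a shared path:

```c
#include <assert.h>
#include <stdint.h>

/* Callback type mirroring ci_get_ctx_desc_fn: returns the number of
 * context descriptors needed (0 or 1) and fills the quadwords. */
typedef uint16_t (*get_ctx_desc_fn)(uint64_t ol_flags,
				    uint64_t *qw0, uint64_t *qw1);

#define F_TSO (1ULL << 3) /* made-up TSO flag bit */

/* A hypothetical driver hook: one context descriptor for TSO only. */
static uint16_t
demo_get_ctx_desc(uint64_t ol_flags, uint64_t *qw0, uint64_t *qw1)
{
	if (!(ol_flags & F_TSO))
		return 0;
	*qw0 = 0x1234;
	*qw1 = 0x5678;
	return 1;
}

/* Skeleton of the common path: the hook's result decides how many
 * ring slots the packet consumes beyond its data descriptors. When a
 * per-driver wrapper passes a compile-time-constant function into an
 * always-inline common body, the compiler can inline the hook and
 * eliminate the indirect call. */
static inline uint16_t
count_descs_for_pkt(uint64_t ol_flags, uint16_t nb_segs,
		    get_ctx_desc_fn get_ctx)
{
	uint64_t qw0 = 0, qw1 = 0;
	uint16_t nb_ctx = get_ctx(ol_flags, &qw0, &qw1);
	return nb_segs + nb_ctx;
}
```

This is the same mechanism ci_xmit_pkts() uses for both its get_ctx_desc and timestamp-queue hooks: driver-specific behaviour is injected once, at a point where constant propagation keeps the fast path free of indirection.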

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/net/intel/common/tx_scalar.h | 218 +++++++++++++++++++++
 drivers/net/intel/ice/ice_rxtx.c     | 270 ++++++---------------------
 2 files changed, 270 insertions(+), 218 deletions(-)

diff --git a/drivers/net/intel/common/tx_scalar.h b/drivers/net/intel/common/tx_scalar.h
index cf0dcb4b2c..4d041fac99 100644
--- a/drivers/net/intel/common/tx_scalar.h
+++ b/drivers/net/intel/common/tx_scalar.h
@@ -6,6 +6,7 @@
 #define _COMMON_INTEL_TX_SCALAR_H_
 
 #include <stdint.h>
+#include <rte_io.h>
 #include <rte_byteorder.h>
 
 /* depends on common Tx definitions. */
@@ -137,5 +138,222 @@ ci_calc_pkt_desc(const struct rte_mbuf *tx_pkt)
 	return count;
 }
 
+typedef uint16_t (*ci_get_ctx_desc_fn)(uint64_t ol_flags, const struct rte_mbuf *mbuf,
+		const union ci_tx_offload *tx_offload, const struct ci_tx_queue *txq,
+		uint64_t *qw0, uint64_t *qw1);
+
+/* gets current timestamp tail index */
+typedef uint16_t (*get_ts_tail_t)(struct ci_tx_queue *txq);
+/* writes a timestamp descriptor and returns new tail index */
+typedef uint16_t (*write_ts_desc_t)(struct ci_tx_queue *txq, struct rte_mbuf *mbuf,
+		uint16_t tx_id, uint16_t ts_id);
+/* writes a timestamp tail index - doorbell */
+typedef void (*write_ts_tail_t)(struct ci_tx_queue *txq, uint16_t ts_id);
+
+struct ci_timestamp_queue_fns {
+	get_ts_tail_t get_ts_tail;
+	write_ts_desc_t write_ts_desc;
+	write_ts_tail_t write_ts_tail;
+};
+
+static inline uint16_t
+ci_xmit_pkts(struct ci_tx_queue *txq,
+	     struct rte_mbuf **tx_pkts,
+	     uint16_t nb_pkts,
+	     ci_get_ctx_desc_fn get_ctx_desc,
+	     const struct ci_timestamp_queue_fns *ts_fns)
+{
+	volatile struct ci_tx_desc *ci_tx_ring;
+	volatile struct ci_tx_desc *txd;
+	struct ci_tx_entry *sw_ring;
+	struct ci_tx_entry *txe, *txn;
+	struct rte_mbuf *tx_pkt;
+	struct rte_mbuf *m_seg;
+	uint16_t tx_id;
+	uint16_t ts_id = -1;
+	uint16_t nb_tx;
+	uint16_t nb_used;
+	uint16_t nb_ctx;
+	uint32_t td_cmd = 0;
+	uint32_t td_offset = 0;
+	uint32_t td_tag = 0;
+	uint16_t tx_last;
+	uint16_t slen;
+	uint16_t l2_len;
+	uint64_t buf_dma_addr;
+	uint64_t ol_flags;
+	union ci_tx_offload tx_offload = {0};
+
+	sw_ring = txq->sw_ring;
+	ci_tx_ring = txq->ci_tx_ring;
+	tx_id = txq->tx_tail;
+	txe = &sw_ring[tx_id];
+
+	if (ts_fns != NULL)
+		ts_id = ts_fns->get_ts_tail(txq);
+
+	/* Check if the descriptor ring needs to be cleaned. */
+	if (txq->nb_tx_free < txq->tx_free_thresh)
+		(void)ci_tx_xmit_cleanup(txq);
+
+	for (nb_tx = 0; nb_tx < nb_pkts; nb_tx++) {
+		uint64_t cd_qw0, cd_qw1;
+		tx_pkt = *tx_pkts++;
+
+		ol_flags = tx_pkt->ol_flags;
+		td_cmd = CI_TX_DESC_CMD_ICRC;
+		td_tag = 0;
+		l2_len = ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK ?
+				tx_pkt->outer_l2_len : tx_pkt->l2_len;
+		td_offset = (l2_len >> 1) << CI_TX_DESC_LEN_MACLEN_S;
+
+
+		tx_offload.l2_len = tx_pkt->l2_len;
+		tx_offload.l3_len = tx_pkt->l3_len;
+		tx_offload.outer_l2_len = tx_pkt->outer_l2_len;
+		tx_offload.outer_l3_len = tx_pkt->outer_l3_len;
+		tx_offload.l4_len = tx_pkt->l4_len;
+		tx_offload.tso_segsz = tx_pkt->tso_segsz;
+
+		/* Calculate the number of context descriptors needed. */
+		nb_ctx = get_ctx_desc(ol_flags, tx_pkt, &tx_offload, txq, &cd_qw0, &cd_qw1);
+
+		/* The number of descriptors that must be allocated for
+		 * a packet equals to the number of the segments of that
+		 * packet plus the number of context descriptor if needed.
+		 * Recalculate the needed tx descs when TSO enabled in case
+		 * the mbuf data size exceeds max data size that hw allows
+		 * per tx desc.
+		 */
+		if (ol_flags & (RTE_MBUF_F_TX_TCP_SEG | RTE_MBUF_F_TX_UDP_SEG))
+			nb_used = (uint16_t)(ci_calc_pkt_desc(tx_pkt) + nb_ctx);
+		else
+			nb_used = (uint16_t)(tx_pkt->nb_segs + nb_ctx);
+		tx_last = (uint16_t)(tx_id + nb_used - 1);
+
+		/* Circular ring */
+		if (tx_last >= txq->nb_tx_desc)
+			tx_last = (uint16_t)(tx_last - txq->nb_tx_desc);
+
+		if (nb_used > txq->nb_tx_free) {
+			if (ci_tx_xmit_cleanup(txq) != 0) {
+				if (nb_tx == 0)
+					return 0;
+				goto end_of_tx;
+			}
+			if (unlikely(nb_used > txq->tx_rs_thresh)) {
+				while (nb_used > txq->nb_tx_free) {
+					if (ci_tx_xmit_cleanup(txq) != 0) {
+						if (nb_tx == 0)
+							return 0;
+						goto end_of_tx;
+					}
+				}
+			}
+		}
+
+		/* Descriptor based VLAN insertion */
+		if (ol_flags & (RTE_MBUF_F_TX_VLAN | RTE_MBUF_F_TX_QINQ)) {
+			td_cmd |= CI_TX_DESC_CMD_IL2TAG1;
+			td_tag = tx_pkt->vlan_tci;
+		}
+
+		/* Enable checksum offloading */
+		if (ol_flags & CI_TX_CKSUM_OFFLOAD_MASK)
+			ci_txd_enable_checksum(ol_flags, &td_cmd,
+						&td_offset, tx_offload);
+
+		if (nb_ctx) {
+			/* Setup TX context descriptor if required */
+			uint64_t *ctx_txd = RTE_CAST_PTR(uint64_t *, &ci_tx_ring[tx_id]);
+
+			txn = &sw_ring[txe->next_id];
+			RTE_MBUF_PREFETCH_TO_FREE(txn->mbuf);
+			if (txe->mbuf) {
+				rte_pktmbuf_free_seg(txe->mbuf);
+				txe->mbuf = NULL;
+			}
+
+			ctx_txd[0] = cd_qw0;
+			ctx_txd[1] = cd_qw1;
+
+			txe->last_id = tx_last;
+			tx_id = txe->next_id;
+			txe = txn;
+		}
+		m_seg = tx_pkt;
+
+		do {
+			txd = &ci_tx_ring[tx_id];
+			txn = &sw_ring[txe->next_id];
+
+			if (txe->mbuf)
+				rte_pktmbuf_free_seg(txe->mbuf);
+			txe->mbuf = m_seg;
+
+			/* Setup TX Descriptor */
+			slen = m_seg->data_len;
+			buf_dma_addr = rte_mbuf_data_iova(m_seg);
+
+			while ((ol_flags & (RTE_MBUF_F_TX_TCP_SEG | RTE_MBUF_F_TX_UDP_SEG)) &&
+					unlikely(slen > CI_MAX_DATA_PER_TXD)) {
+				txd->buffer_addr = rte_cpu_to_le_64(buf_dma_addr);
+				txd->cmd_type_offset_bsz = rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DATA |
+					((uint64_t)td_cmd << CI_TXD_QW1_CMD_S) |
+					((uint64_t)td_offset << CI_TXD_QW1_OFFSET_S) |
+					((uint64_t)CI_MAX_DATA_PER_TXD << CI_TXD_QW1_TX_BUF_SZ_S) |
+					((uint64_t)td_tag << CI_TXD_QW1_L2TAG1_S));
+
+				buf_dma_addr += CI_MAX_DATA_PER_TXD;
+				slen -= CI_MAX_DATA_PER_TXD;
+
+				txe->last_id = tx_last;
+				tx_id = txe->next_id;
+				txe = txn;
+				txd = &ci_tx_ring[tx_id];
+				txn = &sw_ring[txe->next_id];
+			}
+
+			txd->buffer_addr = rte_cpu_to_le_64(buf_dma_addr);
+			txd->cmd_type_offset_bsz = rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DATA |
+				((uint64_t)td_cmd << CI_TXD_QW1_CMD_S) |
+				((uint64_t)td_offset << CI_TXD_QW1_OFFSET_S) |
+				((uint64_t)slen << CI_TXD_QW1_TX_BUF_SZ_S) |
+				((uint64_t)td_tag << CI_TXD_QW1_L2TAG1_S));
+
+			txe->last_id = tx_last;
+			tx_id = txe->next_id;
+			txe = txn;
+			m_seg = m_seg->next;
+		} while (m_seg);
+
+		/* fill the last descriptor with End of Packet (EOP) bit */
+		td_cmd |= CI_TX_DESC_CMD_EOP;
+		txq->nb_tx_used = (uint16_t)(txq->nb_tx_used + nb_used);
+		txq->nb_tx_free = (uint16_t)(txq->nb_tx_free - nb_used);
+
+		/* set RS bit on the last descriptor of one packet */
+		if (txq->nb_tx_used >= txq->tx_rs_thresh) {
+			td_cmd |= CI_TX_DESC_CMD_RS;
+
+			/* Update txq RS bit counters */
+			txq->nb_tx_used = 0;
+		}
+		txd->cmd_type_offset_bsz |=
+				rte_cpu_to_le_64(((uint64_t)td_cmd) << CI_TXD_QW1_CMD_S);
+
+		if (ts_fns != NULL)
+			ts_id = ts_fns->write_ts_desc(txq, tx_pkt, tx_id, ts_id);
+	}
+end_of_tx:
+	/* update Tail register */
+	if (ts_fns != NULL)
+		ts_fns->write_ts_tail(txq, ts_id);
+	else
+		rte_write32_wc(tx_id, txq->qtx_tail);
+	txq->tx_tail = tx_id;
+
+	return nb_tx;
+}
 
 #endif /* _COMMON_INTEL_TX_SCALAR_H_ */
diff --git a/drivers/net/intel/ice/ice_rxtx.c b/drivers/net/intel/ice/ice_rxtx.c
index 4792aa9a8b..561a6617a6 100644
--- a/drivers/net/intel/ice/ice_rxtx.c
+++ b/drivers/net/intel/ice/ice_rxtx.c
@@ -3050,230 +3050,64 @@ get_context_desc(uint64_t ol_flags, const struct rte_mbuf *tx_pkt,
 	return 1;
 }
 
-uint16_t
-ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
+static uint16_t
+ice_get_ts_tail(struct ci_tx_queue *txq)
 {
-	struct ci_tx_queue *txq;
-	volatile struct ci_tx_desc *ci_tx_ring;
-	volatile struct ci_tx_desc *txd;
-	struct ci_tx_entry *sw_ring;
-	struct ci_tx_entry *txe, *txn;
-	struct rte_mbuf *tx_pkt;
-	struct rte_mbuf *m_seg;
-	uint16_t tx_id;
-	uint16_t ts_id = -1;
-	uint16_t nb_tx;
-	uint16_t nb_used;
-	uint16_t nb_ctx;
-	uint32_t td_cmd = 0;
-	uint32_t td_offset = 0;
-	uint32_t td_tag = 0;
-	uint16_t tx_last;
-	uint16_t slen;
-	uint16_t l2_len;
-	uint64_t buf_dma_addr;
-	uint64_t ol_flags;
-	union ci_tx_offload tx_offload = {0};
-
-	txq = tx_queue;
-	sw_ring = txq->sw_ring;
-	ci_tx_ring = txq->ci_tx_ring;
-	tx_id = txq->tx_tail;
-	txe = &sw_ring[tx_id];
-
-	if (txq->tsq != NULL && txq->tsq->ts_flag > 0)
-		ts_id = txq->tsq->ts_tail;
-
-	/* Check if the descriptor ring needs to be cleaned. */
-	if (txq->nb_tx_free < txq->tx_free_thresh)
-		(void)ci_tx_xmit_cleanup(txq);
-
-	for (nb_tx = 0; nb_tx < nb_pkts; nb_tx++) {
-		uint64_t cd_qw0, cd_qw1;
-		tx_pkt = *tx_pkts++;
-
-		ol_flags = tx_pkt->ol_flags;
-		td_cmd = 0;
-		td_tag = 0;
-		l2_len = ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK ?
-				tx_pkt->outer_l2_len : tx_pkt->l2_len;
-		td_offset = (l2_len >> 1) << CI_TX_DESC_LEN_MACLEN_S;
-
-		tx_offload.l2_len = tx_pkt->l2_len;
-		tx_offload.l3_len = tx_pkt->l3_len;
-		tx_offload.outer_l2_len = tx_pkt->outer_l2_len;
-		tx_offload.outer_l3_len = tx_pkt->outer_l3_len;
-		tx_offload.l4_len = tx_pkt->l4_len;
-		tx_offload.tso_segsz = tx_pkt->tso_segsz;
-
-		/* Calculate the number of context descriptors needed. */
-		nb_ctx = get_context_desc(ol_flags, tx_pkt, &tx_offload, txq, &cd_qw0, &cd_qw1);
-
-		/* The number of descriptors that must be allocated for
-		 * a packet equals to the number of the segments of that
-		 * packet plus the number of context descriptor if needed.
-		 * Recalculate the needed tx descs when TSO enabled in case
-		 * the mbuf data size exceeds max data size that hw allows
-		 * per tx desc.
-		 */
-		if (ol_flags & (RTE_MBUF_F_TX_TCP_SEG | RTE_MBUF_F_TX_UDP_SEG))
-			nb_used = (uint16_t)(ci_calc_pkt_desc(tx_pkt) + nb_ctx);
-		else
-			nb_used = (uint16_t)(tx_pkt->nb_segs + nb_ctx);
-		tx_last = (uint16_t)(tx_id + nb_used - 1);
-
-		/* Circular ring */
-		if (tx_last >= txq->nb_tx_desc)
-			tx_last = (uint16_t)(tx_last - txq->nb_tx_desc);
-
-		if (nb_used > txq->nb_tx_free) {
-			if (ci_tx_xmit_cleanup(txq) != 0) {
-				if (nb_tx == 0)
-					return 0;
-				goto end_of_tx;
-			}
-			if (unlikely(nb_used > txq->tx_rs_thresh)) {
-				while (nb_used > txq->nb_tx_free) {
-					if (ci_tx_xmit_cleanup(txq) != 0) {
-						if (nb_tx == 0)
-							return 0;
-						goto end_of_tx;
-					}
-				}
-			}
-		}
-
-		/* Descriptor based VLAN insertion */
-		if (ol_flags & (RTE_MBUF_F_TX_VLAN | RTE_MBUF_F_TX_QINQ)) {
-			td_cmd |= CI_TX_DESC_CMD_IL2TAG1;
-			td_tag = tx_pkt->vlan_tci;
-		}
-
-		/* Enable checksum offloading */
-		if (ol_flags & CI_TX_CKSUM_OFFLOAD_MASK)
-			ci_txd_enable_checksum(ol_flags, &td_cmd,
-						&td_offset, tx_offload);
-
-		if (nb_ctx) {
-			/* Setup TX context descriptor if required */
-			uint64_t *ctx_txd = RTE_CAST_PTR(uint64_t *, &ci_tx_ring[tx_id]);
-
-			txn = &sw_ring[txe->next_id];
-			RTE_MBUF_PREFETCH_TO_FREE(txn->mbuf);
-			if (txe->mbuf) {
-				rte_pktmbuf_free_seg(txe->mbuf);
-				txe->mbuf = NULL;
-			}
-
-			ctx_txd[0] = cd_qw0;
-			ctx_txd[1] = cd_qw1;
-
-			txe->last_id = tx_last;
-			tx_id = txe->next_id;
-			txe = txn;
-		}
-		m_seg = tx_pkt;
-
-		do {
-			txd = &ci_tx_ring[tx_id];
-			txn = &sw_ring[txe->next_id];
-
-			if (txe->mbuf)
-				rte_pktmbuf_free_seg(txe->mbuf);
-			txe->mbuf = m_seg;
-
-			/* Setup TX Descriptor */
-			slen = m_seg->data_len;
-			buf_dma_addr = rte_mbuf_data_iova(m_seg);
-
-			while ((ol_flags & (RTE_MBUF_F_TX_TCP_SEG | RTE_MBUF_F_TX_UDP_SEG)) &&
-					unlikely(slen > CI_MAX_DATA_PER_TXD)) {
-				txd->buffer_addr = rte_cpu_to_le_64(buf_dma_addr);
-				txd->cmd_type_offset_bsz = rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DATA |
-					((uint64_t)td_cmd << CI_TXD_QW1_CMD_S) |
-					((uint64_t)td_offset << CI_TXD_QW1_OFFSET_S) |
-					((uint64_t)CI_MAX_DATA_PER_TXD << CI_TXD_QW1_TX_BUF_SZ_S) |
-					((uint64_t)td_tag << CI_TXD_QW1_L2TAG1_S));
-
-				buf_dma_addr += CI_MAX_DATA_PER_TXD;
-				slen -= CI_MAX_DATA_PER_TXD;
-
-				txe->last_id = tx_last;
-				tx_id = txe->next_id;
-				txe = txn;
-				txd = &ci_tx_ring[tx_id];
-				txn = &sw_ring[txe->next_id];
-			}
-
-			txd->buffer_addr = rte_cpu_to_le_64(buf_dma_addr);
-			txd->cmd_type_offset_bsz = rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DATA |
-				((uint64_t)td_cmd << CI_TXD_QW1_CMD_S) |
-				((uint64_t)td_offset << CI_TXD_QW1_OFFSET_S) |
-				((uint64_t)slen << CI_TXD_QW1_TX_BUF_SZ_S) |
-				((uint64_t)td_tag << CI_TXD_QW1_L2TAG1_S));
-
-			txe->last_id = tx_last;
-			tx_id = txe->next_id;
-			txe = txn;
-			m_seg = m_seg->next;
-		} while (m_seg);
+	return txq->tsq->ts_tail;
+}
 
-		/* fill the last descriptor with End of Packet (EOP) bit */
-		td_cmd |= CI_TX_DESC_CMD_EOP;
-		txq->nb_tx_used = (uint16_t)(txq->nb_tx_used + nb_used);
-		txq->nb_tx_free = (uint16_t)(txq->nb_tx_free - nb_used);
+static uint16_t
+ice_write_ts_desc(struct ci_tx_queue *txq,
+		  struct rte_mbuf *tx_pkt,
+		  uint16_t tx_id,
+		  uint16_t ts_id)
+{
+	uint64_t txtime = *RTE_MBUF_DYNFIELD(tx_pkt, txq->tsq->ts_offset, uint64_t *);
+	uint32_t tstamp = (uint32_t)(txtime % NS_PER_S) >> ICE_TXTIME_CTX_RESOLUTION_128NS;
+	const uint32_t desc_tx_id = (tx_id == 0) ? txq->nb_tx_desc : tx_id;
+	__le32 ts_desc = rte_cpu_to_le_32(FIELD_PREP(ICE_TXTIME_TX_DESC_IDX_M, desc_tx_id) |
+			FIELD_PREP(ICE_TXTIME_STAMP_M, tstamp));
+
+	txq->tsq->ice_ts_ring[ts_id].tx_desc_idx_tstamp = ts_desc;
+	ts_id++;
+
+	/* To prevent an MDD when wrapping the tstamp
+	 * ring, create additional TS descriptors,
+	 * equal in number to the fetch TS descriptors
+	 * value. HW will merge TS descriptors with
+	 * the same timestamp value into a single
+	 * descriptor.
+	 */
+	if (ts_id == txq->tsq->nb_ts_desc) {
+		uint16_t fetch = txq->tsq->nb_ts_desc - txq->nb_tx_desc;
+		ts_id = 0;
+		for (; ts_id < fetch; ts_id++)
+			txq->tsq->ice_ts_ring[ts_id].tx_desc_idx_tstamp = ts_desc;
+	}
+	return ts_id;
+}
 
-		/* set RS bit on the last descriptor of one packet */
-		if (txq->nb_tx_used >= txq->tx_rs_thresh) {
-			PMD_TX_LOG(DEBUG,
-				   "Setting RS bit on TXD id="
-				   "%4u (port=%d queue=%d)",
-				   tx_last, txq->port_id, txq->queue_id);
+static void
+ice_write_ts_tail(struct ci_tx_queue *txq, uint16_t ts_tail)
+{
+	ICE_PCI_REG_WRITE(txq->qtx_tail, ts_tail);
+	txq->tsq->ts_tail = ts_tail;
+}
 
-			td_cmd |= CI_TX_DESC_CMD_RS;
+uint16_t
+ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
+{
+	const struct ci_timestamp_queue_fns ts_fns = {
+		.get_ts_tail = ice_get_ts_tail,
+		.write_ts_desc = ice_write_ts_desc,
+		.write_ts_tail = ice_write_ts_tail,
+	};
+	struct ci_tx_queue *txq = (struct ci_tx_queue *)tx_queue;
 
-			/* Update txq RS bit counters */
-			txq->nb_tx_used = 0;
-		}
-		txd->cmd_type_offset_bsz |=
-			rte_cpu_to_le_64(((uint64_t)td_cmd) << CI_TXD_QW1_CMD_S);
-
-		if (txq->tsq != NULL && txq->tsq->ts_flag > 0) {
-			uint64_t txtime = *RTE_MBUF_DYNFIELD(tx_pkt,
-					txq->tsq->ts_offset, uint64_t *);
-			uint32_t tstamp = (uint32_t)(txtime % NS_PER_S) >>
-						ICE_TXTIME_CTX_RESOLUTION_128NS;
-			const uint32_t desc_tx_id = (tx_id == 0) ? txq->nb_tx_desc : tx_id;
-			__le32 ts_desc = rte_cpu_to_le_32(FIELD_PREP(ICE_TXTIME_TX_DESC_IDX_M,
-					desc_tx_id) | FIELD_PREP(ICE_TXTIME_STAMP_M, tstamp));
-			txq->tsq->ice_ts_ring[ts_id].tx_desc_idx_tstamp = ts_desc;
-			ts_id++;
-			/* To prevent an MDD, when wrapping the tstamp
-			 * ring create additional TS descriptors equal
-			 * to the number of the fetch TS descriptors
-			 * value. HW will merge the TS descriptors with
-			 * the same timestamp value into a single
-			 * descriptor.
-			 */
-			if (ts_id == txq->tsq->nb_ts_desc) {
-				uint16_t fetch = txq->tsq->nb_ts_desc - txq->nb_tx_desc;
-				ts_id = 0;
-				for (; ts_id < fetch; ts_id++)
-					txq->tsq->ice_ts_ring[ts_id].tx_desc_idx_tstamp = ts_desc;
-			}
-		}
-	}
-end_of_tx:
-	/* update Tail register */
-	if (txq->tsq != NULL && txq->tsq->ts_flag > 0) {
-		ICE_PCI_REG_WRITE(txq->qtx_tail, ts_id);
-		txq->tsq->ts_tail = ts_id;
-	} else {
-		ICE_PCI_REG_WRITE(txq->qtx_tail, tx_id);
-	}
-	txq->tx_tail = tx_id;
+	if (txq->tsq != NULL && txq->tsq->ts_flag > 0)
+		return ci_xmit_pkts(txq, tx_pkts, nb_pkts, get_context_desc, &ts_fns);
 
-	return nb_tx;
+	return ci_xmit_pkts(txq, tx_pkts, nb_pkts, get_context_desc, NULL);
 }
 
 static __rte_always_inline int
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [PATCH v4 12/35] net/i40e: use common scalar Tx function
  2026-02-09 16:44 ` [PATCH v4 00/35] combine multiple Intel scalar Tx paths Bruce Richardson
                     ` (10 preceding siblings ...)
  2026-02-09 16:45   ` [PATCH v4 11/35] net/intel: create a common scalar Tx function Bruce Richardson
@ 2026-02-09 16:45   ` Bruce Richardson
  2026-02-10 13:14     ` Burakov, Anatoly
  2026-02-09 16:45   ` [PATCH v4 13/35] net/intel: add IPsec hooks to common " Bruce Richardson
                     ` (22 subsequent siblings)
  34 siblings, 1 reply; 274+ messages in thread
From: Bruce Richardson @ 2026-02-09 16:45 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson

Following earlier rework, the scalar transmit function for i40e can use
the common function previously moved over from the ice driver. This saves
hundreds of duplicated lines of code.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/net/intel/i40e/i40e_rxtx.c | 208 +----------------------------
 1 file changed, 2 insertions(+), 206 deletions(-)

diff --git a/drivers/net/intel/i40e/i40e_rxtx.c b/drivers/net/intel/i40e/i40e_rxtx.c
index c318b4c84e..41310b4c6c 100644
--- a/drivers/net/intel/i40e/i40e_rxtx.c
+++ b/drivers/net/intel/i40e/i40e_rxtx.c
@@ -1018,212 +1018,8 @@ get_context_desc(uint64_t ol_flags, const struct rte_mbuf *tx_pkt,
 uint16_t
 i40e_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 {
-	struct ci_tx_queue *txq;
-	struct ci_tx_entry *sw_ring;
-	struct ci_tx_entry *txe, *txn;
-	volatile struct ci_tx_desc *txd;
-	volatile struct ci_tx_desc *txr;
-	struct rte_mbuf *tx_pkt;
-	struct rte_mbuf *m_seg;
-	uint16_t tx_id;
-	uint16_t nb_tx;
-	uint32_t td_cmd;
-	uint32_t td_offset;
-	uint32_t td_tag;
-	uint64_t ol_flags;
-	uint16_t nb_used;
-	uint16_t nb_ctx;
-	uint16_t tx_last;
-	uint16_t slen;
-	uint16_t l2_len;
-	uint64_t buf_dma_addr;
-	union ci_tx_offload tx_offload = {0};
-
-	txq = tx_queue;
-	sw_ring = txq->sw_ring;
-	txr = txq->ci_tx_ring;
-	tx_id = txq->tx_tail;
-	txe = &sw_ring[tx_id];
-
-	/* Check if the descriptor ring needs to be cleaned. */
-	if (txq->nb_tx_free < txq->tx_free_thresh)
-		(void)ci_tx_xmit_cleanup(txq);
-
-	for (nb_tx = 0; nb_tx < nb_pkts; nb_tx++) {
-		tx_pkt = *tx_pkts++;
-		RTE_MBUF_PREFETCH_TO_FREE(txe->mbuf);
-
-		ol_flags = tx_pkt->ol_flags;
-		td_cmd = 0;
-		td_tag = 0;
-		l2_len = ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK ?
-				tx_pkt->outer_l2_len : tx_pkt->l2_len;
-		td_offset = (l2_len >> 1) << CI_TX_DESC_LEN_MACLEN_S;
-		tx_offload.l2_len = tx_pkt->l2_len;
-		tx_offload.l3_len = tx_pkt->l3_len;
-		tx_offload.outer_l2_len = tx_pkt->outer_l2_len;
-		tx_offload.outer_l3_len = tx_pkt->outer_l3_len;
-		tx_offload.l4_len = tx_pkt->l4_len;
-		tx_offload.tso_segsz = tx_pkt->tso_segsz;
-
-		/* Calculate the number of context descriptors needed. */
-		uint64_t cd_qw0 = 0, cd_qw1 = 0;
-		nb_ctx = get_context_desc(ol_flags, tx_pkt, &tx_offload, txq,
-				&cd_qw0, &cd_qw1);
-
-		/**
-		 * The number of descriptors that must be allocated for
-		 * a packet equals to the number of the segments of that
-		 * packet plus 1 context descriptor if needed.
-		 * Recalculate the needed tx descs when TSO enabled in case
-		 * the mbuf data size exceeds max data size that hw allows
-		 * per tx desc.
-		 */
-		if (ol_flags & RTE_MBUF_F_TX_TCP_SEG)
-			nb_used = (uint16_t)(ci_calc_pkt_desc(tx_pkt) + nb_ctx);
-		else
-			nb_used = (uint16_t)(tx_pkt->nb_segs + nb_ctx);
-		tx_last = (uint16_t)(tx_id + nb_used - 1);
-
-		/* Circular ring */
-		if (tx_last >= txq->nb_tx_desc)
-			tx_last = (uint16_t)(tx_last - txq->nb_tx_desc);
-
-		if (nb_used > txq->nb_tx_free) {
-			if (ci_tx_xmit_cleanup(txq) != 0) {
-				if (nb_tx == 0)
-					return 0;
-				goto end_of_tx;
-			}
-			if (unlikely(nb_used > txq->tx_rs_thresh)) {
-				while (nb_used > txq->nb_tx_free) {
-					if (ci_tx_xmit_cleanup(txq) != 0) {
-						if (nb_tx == 0)
-							return 0;
-						goto end_of_tx;
-					}
-				}
-			}
-		}
-
-		/* Descriptor based VLAN insertion */
-		if (ol_flags & (RTE_MBUF_F_TX_VLAN | RTE_MBUF_F_TX_QINQ)) {
-			td_cmd |= CI_TX_DESC_CMD_IL2TAG1;
-			td_tag = tx_pkt->vlan_tci;
-		}
-
-		/* Always enable CRC offload insertion */
-		td_cmd |= CI_TX_DESC_CMD_ICRC;
-
-		/* Enable checksum offloading */
-		if (ol_flags & CI_TX_CKSUM_OFFLOAD_MASK)
-			ci_txd_enable_checksum(ol_flags, &td_cmd,
-						 &td_offset, tx_offload);
-
-		if (nb_ctx) {
-			/* Setup TX context descriptor if required */
-			uint64_t *desc = RTE_CAST_PTR(uint64_t *, &txr[tx_id]);
-
-			txn = &sw_ring[txe->next_id];
-			RTE_MBUF_PREFETCH_TO_FREE(txn->mbuf);
-			if (txe->mbuf != NULL) {
-				rte_pktmbuf_free_seg(txe->mbuf);
-				txe->mbuf = NULL;
-			}
-
-			desc[0] = cd_qw0;
-			desc[1] = cd_qw1;
-
-			PMD_TX_LOG(DEBUG, "mbuf: %p, TCD[%u]: "
-				"qw0: %#"PRIx64"; "
-				"qw1: %#"PRIx64";",
-				tx_pkt, tx_id, cd_qw0, cd_qw1);
-
-			txe->last_id = tx_last;
-			tx_id = txe->next_id;
-			txe = txn;
-		}
-
-		m_seg = tx_pkt;
-		do {
-			txd = &txr[tx_id];
-			txn = &sw_ring[txe->next_id];
-
-			if (txe->mbuf)
-				rte_pktmbuf_free_seg(txe->mbuf);
-			txe->mbuf = m_seg;
-
-			/* Setup TX Descriptor */
-			slen = m_seg->data_len;
-			buf_dma_addr = rte_mbuf_data_iova(m_seg);
-
-			while ((ol_flags & RTE_MBUF_F_TX_TCP_SEG) &&
-				unlikely(slen > CI_MAX_DATA_PER_TXD)) {
-				txd->buffer_addr =
-					rte_cpu_to_le_64(buf_dma_addr);
-				txd->cmd_type_offset_bsz =
-					i40e_build_ctob(td_cmd,
-					td_offset, CI_MAX_DATA_PER_TXD,
-					td_tag);
-
-				buf_dma_addr += CI_MAX_DATA_PER_TXD;
-				slen -= CI_MAX_DATA_PER_TXD;
-
-				txe->last_id = tx_last;
-				tx_id = txe->next_id;
-				txe = txn;
-				txd = &txr[tx_id];
-				txn = &sw_ring[txe->next_id];
-			}
-			PMD_TX_LOG(DEBUG, "mbuf: %p, TDD[%u]: "
-				"buf_dma_addr: %#"PRIx64"; "
-				"td_cmd: %#x; "
-				"td_offset: %#x; "
-				"td_len: %u; "
-				"td_tag: %#x;",
-				tx_pkt, tx_id, buf_dma_addr,
-				td_cmd, td_offset, slen, td_tag);
-
-			txd->buffer_addr = rte_cpu_to_le_64(buf_dma_addr);
-			txd->cmd_type_offset_bsz = i40e_build_ctob(td_cmd,
-						td_offset, slen, td_tag);
-			txe->last_id = tx_last;
-			tx_id = txe->next_id;
-			txe = txn;
-			m_seg = m_seg->next;
-		} while (m_seg != NULL);
-
-		/* The last packet data descriptor needs End Of Packet (EOP) */
-		td_cmd |= CI_TX_DESC_CMD_EOP;
-		txq->nb_tx_used = (uint16_t)(txq->nb_tx_used + nb_used);
-		txq->nb_tx_free = (uint16_t)(txq->nb_tx_free - nb_used);
-
-		if (txq->nb_tx_used >= txq->tx_rs_thresh) {
-			PMD_TX_LOG(DEBUG,
-				   "Setting RS bit on TXD id="
-				   "%4u (port=%d queue=%d)",
-				   tx_last, txq->port_id, txq->queue_id);
-
-			td_cmd |= CI_TX_DESC_CMD_RS;
-
-			/* Update txq RS bit counters */
-			txq->nb_tx_used = 0;
-		}
-
-		txd->cmd_type_offset_bsz |=
-			rte_cpu_to_le_64(((uint64_t)td_cmd) << CI_TXD_QW1_CMD_S);
-	}
-
-end_of_tx:
-	PMD_TX_LOG(DEBUG, "port_id=%u queue_id=%u tx_tail=%u nb_tx=%u",
-		   (unsigned) txq->port_id, (unsigned) txq->queue_id,
-		   (unsigned) tx_id, (unsigned) nb_tx);
-
-	rte_io_wmb();
-	I40E_PCI_REG_WC_WRITE_RELAXED(txq->qtx_tail, tx_id);
-	txq->tx_tail = tx_id;
-
-	return nb_tx;
+	/* i40e does not support timestamp queues, so pass NULL for ts_fns */
+	return ci_xmit_pkts(tx_queue, tx_pkts, nb_pkts, get_context_desc, NULL);
 }
 
 static __rte_always_inline int
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [PATCH v4 13/35] net/intel: add IPsec hooks to common Tx function
  2026-02-09 16:44 ` [PATCH v4 00/35] combine multiple Intel scalar Tx paths Bruce Richardson
                     ` (11 preceding siblings ...)
  2026-02-09 16:45   ` [PATCH v4 12/35] net/i40e: use " Bruce Richardson
@ 2026-02-09 16:45   ` Bruce Richardson
  2026-02-10 13:16     ` Burakov, Anatoly
  2026-02-09 16:45   ` [PATCH v4 14/35] net/intel: support configurable VLAN tag insertion on Tx Bruce Richardson
                     ` (21 subsequent siblings)
  34 siblings, 1 reply; 274+ messages in thread
From: Bruce Richardson @ 2026-02-09 16:45 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson, Anatoly Burakov

The iavf driver has IPsec offload support on Tx, so add hooks to the
common Tx function to support that. Do so in a way that has zero
performance impact for drivers which do not have IPsec support, by
passing in compile-time NULL constants for the function pointers, which
can be optimized away by the compiler.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/net/intel/common/tx_scalar.h | 60 ++++++++++++++++++++++++++--
 drivers/net/intel/i40e/i40e_rxtx.c   |  4 +-
 drivers/net/intel/ice/ice_rxtx.c     |  4 +-
 3 files changed, 60 insertions(+), 8 deletions(-)

diff --git a/drivers/net/intel/common/tx_scalar.h b/drivers/net/intel/common/tx_scalar.h
index 4d041fac99..af50bda820 100644
--- a/drivers/net/intel/common/tx_scalar.h
+++ b/drivers/net/intel/common/tx_scalar.h
@@ -142,6 +142,24 @@ typedef uint16_t (*ci_get_ctx_desc_fn)(uint64_t ol_flags, const struct rte_mbuf
 		const union ci_tx_offload *tx_offload, const struct ci_tx_queue *txq,
 		uint64_t *qw0, uint64_t *qw1);
 
+/* gets IPsec descriptor information and returns number of descriptors needed (0 or 1) */
+typedef uint16_t (*get_ipsec_desc_t)(const struct rte_mbuf *mbuf,
+		const struct ci_tx_queue *txq,
+		void **ipsec_metadata,
+		uint64_t *qw0,
+		uint64_t *qw1);
+/* calculates segment length for IPsec + TSO combinations */
+typedef uint16_t (*calc_ipsec_segment_len_t)(const struct rte_mbuf *mb_seg,
+		uint64_t ol_flags,
+		const void *ipsec_metadata,
+		uint16_t tlen);
+
+/** IPsec descriptor operations for drivers that support inline IPsec crypto. */
+struct ci_ipsec_ops {
+	get_ipsec_desc_t get_ipsec_desc;
+	calc_ipsec_segment_len_t calc_segment_len;
+};
+
 /* gets current timestamp tail index */
 typedef uint16_t (*get_ts_tail_t)(struct ci_tx_queue *txq);
 /* writes a timestamp descriptor and returns new tail index */
@@ -161,6 +179,7 @@ ci_xmit_pkts(struct ci_tx_queue *txq,
 	     struct rte_mbuf **tx_pkts,
 	     uint16_t nb_pkts,
 	     ci_get_ctx_desc_fn get_ctx_desc,
+	     const struct ci_ipsec_ops *ipsec_ops,
 	     const struct ci_timestamp_queue_fns *ts_fns)
 {
 	volatile struct ci_tx_desc *ci_tx_ring;
@@ -197,6 +216,9 @@ ci_xmit_pkts(struct ci_tx_queue *txq,
 		(void)ci_tx_xmit_cleanup(txq);
 
 	for (nb_tx = 0; nb_tx < nb_pkts; nb_tx++) {
+		void *ipsec_md = NULL;
+		uint16_t nb_ipsec = 0;
+		uint64_t ipsec_qw0 = 0, ipsec_qw1 = 0;
 		uint64_t cd_qw0, cd_qw1;
 		tx_pkt = *tx_pkts++;
 
@@ -218,17 +240,22 @@ ci_xmit_pkts(struct ci_tx_queue *txq,
 		/* Calculate the number of context descriptors needed. */
 		nb_ctx = get_ctx_desc(ol_flags, tx_pkt, &tx_offload, txq, &cd_qw0, &cd_qw1);
 
+		/* Get IPsec descriptor information if IPsec ops provided */
+		if (ipsec_ops != NULL)
+			nb_ipsec = ipsec_ops->get_ipsec_desc(tx_pkt, txq, &ipsec_md,
+					&ipsec_qw0, &ipsec_qw1);
+
 		/* The number of descriptors that must be allocated for
 		 * a packet equals to the number of the segments of that
-		 * packet plus the number of context descriptor if needed.
+		 * packet plus the number of context and IPsec descriptors if needed.
 		 * Recalculate the needed tx descs when TSO enabled in case
 		 * the mbuf data size exceeds max data size that hw allows
 		 * per tx desc.
 		 */
 		if (ol_flags & (RTE_MBUF_F_TX_TCP_SEG | RTE_MBUF_F_TX_UDP_SEG))
-			nb_used = (uint16_t)(ci_calc_pkt_desc(tx_pkt) + nb_ctx);
+			nb_used = (uint16_t)(ci_calc_pkt_desc(tx_pkt) + nb_ctx + nb_ipsec);
 		else
-			nb_used = (uint16_t)(tx_pkt->nb_segs + nb_ctx);
+			nb_used = (uint16_t)(tx_pkt->nb_segs + nb_ctx + nb_ipsec);
 		tx_last = (uint16_t)(tx_id + nb_used - 1);
 
 		/* Circular ring */
@@ -281,6 +308,26 @@ ci_xmit_pkts(struct ci_tx_queue *txq,
 			tx_id = txe->next_id;
 			txe = txn;
 		}
+
+		if (ipsec_ops != NULL && nb_ipsec > 0) {
+			/* Setup TX IPsec descriptor if required */
+			uint64_t *ipsec_txd = RTE_CAST_PTR(uint64_t *, &ci_tx_ring[tx_id]);
+
+			txn = &sw_ring[txe->next_id];
+			RTE_MBUF_PREFETCH_TO_FREE(txn->mbuf);
+			if (txe->mbuf) {
+				rte_pktmbuf_free_seg(txe->mbuf);
+				txe->mbuf = NULL;
+			}
+
+			ipsec_txd[0] = ipsec_qw0;
+			ipsec_txd[1] = ipsec_qw1;
+
+			txe->last_id = tx_last;
+			tx_id = txe->next_id;
+			txe = txn;
+		}
+
 		m_seg = tx_pkt;
 
 		do {
@@ -292,7 +339,12 @@ ci_xmit_pkts(struct ci_tx_queue *txq,
 			txe->mbuf = m_seg;
 
 			/* Setup TX Descriptor */
-			slen = m_seg->data_len;
+			/* Calculate segment length, using IPsec callback if provided */
+			if (ipsec_ops != NULL)
+				slen = ipsec_ops->calc_segment_len(m_seg, ol_flags, ipsec_md, 0);
+			else
+				slen = m_seg->data_len;
+
 			buf_dma_addr = rte_mbuf_data_iova(m_seg);
 
 			while ((ol_flags & (RTE_MBUF_F_TX_TCP_SEG | RTE_MBUF_F_TX_UDP_SEG)) &&
diff --git a/drivers/net/intel/i40e/i40e_rxtx.c b/drivers/net/intel/i40e/i40e_rxtx.c
index 41310b4c6c..4e362e737e 100644
--- a/drivers/net/intel/i40e/i40e_rxtx.c
+++ b/drivers/net/intel/i40e/i40e_rxtx.c
@@ -1018,8 +1018,8 @@ get_context_desc(uint64_t ol_flags, const struct rte_mbuf *tx_pkt,
 uint16_t
 i40e_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 {
-	/* i40e does not support timestamp queues, so pass NULL for ts_fns */
-	return ci_xmit_pkts(tx_queue, tx_pkts, nb_pkts, get_context_desc, NULL);
+	/* i40e does not support IPsec or timestamp queues, so pass NULL for both */
+	return ci_xmit_pkts(tx_queue, tx_pkts, nb_pkts, get_context_desc, NULL, NULL);
 }
 
 static __rte_always_inline int
diff --git a/drivers/net/intel/ice/ice_rxtx.c b/drivers/net/intel/ice/ice_rxtx.c
index 561a6617a6..9643dd3817 100644
--- a/drivers/net/intel/ice/ice_rxtx.c
+++ b/drivers/net/intel/ice/ice_rxtx.c
@@ -3105,9 +3105,9 @@ ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 	struct ci_tx_queue *txq = (struct ci_tx_queue *)tx_queue;
 
 	if (txq->tsq != NULL && txq->tsq->ts_flag > 0)
-		return ci_xmit_pkts(txq, tx_pkts, nb_pkts, get_context_desc, &ts_fns);
+		return ci_xmit_pkts(txq, tx_pkts, nb_pkts, get_context_desc, NULL, &ts_fns);
 
-	return ci_xmit_pkts(txq, tx_pkts, nb_pkts, get_context_desc, NULL);
+	return ci_xmit_pkts(txq, tx_pkts, nb_pkts, get_context_desc, NULL, NULL);
 }
 
 static __rte_always_inline int
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [PATCH v4 14/35] net/intel: support configurable VLAN tag insertion on Tx
  2026-02-09 16:44 ` [PATCH v4 00/35] combine multiple Intel scalar Tx paths Bruce Richardson
                     ` (12 preceding siblings ...)
  2026-02-09 16:45   ` [PATCH v4 13/35] net/intel: add IPsec hooks to common " Bruce Richardson
@ 2026-02-09 16:45   ` Bruce Richardson
  2026-02-10 13:21     ` Burakov, Anatoly
  2026-02-09 16:45   ` [PATCH v4 15/35] net/iavf: use common scalar Tx function Bruce Richardson
                     ` (20 subsequent siblings)
  34 siblings, 1 reply; 274+ messages in thread
From: Bruce Richardson @ 2026-02-09 16:45 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson, Anatoly Burakov

Make the VLAN tag insertion logic in the common code configurable, so that
each driver can choose where inner and outer tags get placed.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/net/intel/common/tx.h        | 10 ++++++++++
 drivers/net/intel/common/tx_scalar.h |  9 +++++++--
 drivers/net/intel/i40e/i40e_rxtx.c   |  6 +++---
 drivers/net/intel/ice/ice_rxtx.c     |  5 +++--
 4 files changed, 23 insertions(+), 7 deletions(-)

diff --git a/drivers/net/intel/common/tx.h b/drivers/net/intel/common/tx.h
index 23deabc5d1..f0229314a0 100644
--- a/drivers/net/intel/common/tx.h
+++ b/drivers/net/intel/common/tx.h
@@ -45,6 +45,16 @@
 #define CI_TX_CTX_DESC_TSYN             0x02
 #define CI_TX_CTX_DESC_IL2TAG2          0x04
 
+/**
+ * L2TAG1 Field Source Selection
+ * Specifies which mbuf VLAN field to use for the L2TAG1 field in data descriptors.
+ * Context descriptor VLAN handling (L2TAG2) is managed by driver-specific callbacks.
+ */
+enum ci_tx_l2tag1_field {
+	CI_VLAN_IN_L2TAG1,       /**< For VLAN (not QinQ), use L2Tag1 field in data desc */
+	CI_VLAN_IN_L2TAG2,       /**< For VLAN (not QinQ), use L2Tag2 field in ctx desc */
+};
+
 /* Common TX Descriptor Length Field Shifts */
 #define CI_TX_DESC_LEN_MACLEN_S         0  /* 7 BITS */
 #define CI_TX_DESC_LEN_IPLEN_S          7  /* 7 BITS */
diff --git a/drivers/net/intel/common/tx_scalar.h b/drivers/net/intel/common/tx_scalar.h
index af50bda820..3248f9423d 100644
--- a/drivers/net/intel/common/tx_scalar.h
+++ b/drivers/net/intel/common/tx_scalar.h
@@ -178,6 +178,7 @@ static inline uint16_t
 ci_xmit_pkts(struct ci_tx_queue *txq,
 	     struct rte_mbuf **tx_pkts,
 	     uint16_t nb_pkts,
+	     enum ci_tx_l2tag1_field l2tag1_field,
 	     ci_get_ctx_desc_fn get_ctx_desc,
 	     const struct ci_ipsec_ops *ipsec_ops,
 	     const struct ci_timestamp_queue_fns *ts_fns)
@@ -279,8 +280,12 @@ ci_xmit_pkts(struct ci_tx_queue *txq,
 			}
 		}
 
-		/* Descriptor based VLAN insertion */
-		if (ol_flags & (RTE_MBUF_F_TX_VLAN | RTE_MBUF_F_TX_QINQ)) {
+		/* Descriptor based VLAN/QinQ insertion */
+		/* For single VLAN offload, insert in the data desc only when
+		 * CI_VLAN_IN_L2TAG1 is selected; for QinQ, the inner tag always goes in L2TAG1.
+		 */
+		if (((ol_flags & RTE_MBUF_F_TX_VLAN) && l2tag1_field == CI_VLAN_IN_L2TAG1) ||
+				(ol_flags & RTE_MBUF_F_TX_QINQ)) {
 			td_cmd |= CI_TX_DESC_CMD_IL2TAG1;
 			td_tag = tx_pkt->vlan_tci;
 		}
diff --git a/drivers/net/intel/i40e/i40e_rxtx.c b/drivers/net/intel/i40e/i40e_rxtx.c
index 4e362e737e..35c1b53c1e 100644
--- a/drivers/net/intel/i40e/i40e_rxtx.c
+++ b/drivers/net/intel/i40e/i40e_rxtx.c
@@ -1004,8 +1004,7 @@ get_context_desc(uint64_t ol_flags, const struct rte_mbuf *tx_pkt,
 	/* TX context descriptor based double VLAN insert */
 	if (ol_flags & RTE_MBUF_F_TX_QINQ) {
 		cd_l2tag2 = tx_pkt->vlan_tci_outer;
-		cd_type_cmd_tso_mss |=
-				((uint64_t)I40E_TX_CTX_DESC_IL2TAG2 << I40E_TXD_CTX_QW1_CMD_SHIFT);
+		cd_type_cmd_tso_mss |= (I40E_TX_CTX_DESC_IL2TAG2 << I40E_TXD_CTX_QW1_CMD_SHIFT);
 	}
 
 	*qw0 = rte_cpu_to_le_32(cd_tunneling_params) |
@@ -1019,7 +1018,8 @@ uint16_t
 i40e_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 {
 	/* i40e does not support IPsec or timestamp queues, so pass NULL for both */
-	return ci_xmit_pkts(tx_queue, tx_pkts, nb_pkts, get_context_desc, NULL, NULL);
+	return ci_xmit_pkts(tx_queue, tx_pkts, nb_pkts, CI_VLAN_IN_L2TAG1,
+			get_context_desc, NULL, NULL);
 }
 
 static __rte_always_inline int
diff --git a/drivers/net/intel/ice/ice_rxtx.c b/drivers/net/intel/ice/ice_rxtx.c
index 9643dd3817..111cb5e37f 100644
--- a/drivers/net/intel/ice/ice_rxtx.c
+++ b/drivers/net/intel/ice/ice_rxtx.c
@@ -3105,9 +3105,10 @@ ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 	struct ci_tx_queue *txq = (struct ci_tx_queue *)tx_queue;
 
 	if (txq->tsq != NULL && txq->tsq->ts_flag > 0)
-		return ci_xmit_pkts(txq, tx_pkts, nb_pkts, get_context_desc, NULL, &ts_fns);
+		return ci_xmit_pkts(txq, tx_pkts, nb_pkts, CI_VLAN_IN_L2TAG1,
+				get_context_desc, NULL, &ts_fns);
 
-	return ci_xmit_pkts(txq, tx_pkts, nb_pkts, get_context_desc, NULL, NULL);
+	return ci_xmit_pkts(txq, tx_pkts, nb_pkts, CI_VLAN_IN_L2TAG1, get_context_desc, NULL, NULL);
 }
 
 static __rte_always_inline int
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [PATCH v4 15/35] net/iavf: use common scalar Tx function
  2026-02-09 16:44 ` [PATCH v4 00/35] combine multiple Intel scalar Tx paths Bruce Richardson
                     ` (13 preceding siblings ...)
  2026-02-09 16:45   ` [PATCH v4 14/35] net/intel: support configurable VLAN tag insertion on Tx Bruce Richardson
@ 2026-02-09 16:45   ` Bruce Richardson
  2026-02-10 13:27     ` Burakov, Anatoly
  2026-02-09 16:45   ` [PATCH v4 16/35] net/i40e: document requirement for QinQ support Bruce Richardson
                     ` (19 subsequent siblings)
  34 siblings, 1 reply; 274+ messages in thread
From: Bruce Richardson @ 2026-02-09 16:45 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson, Vladimir Medvedkin

Now that the common scalar Tx function has all necessary hooks for the
features supported by the iavf driver, use the common function to avoid
duplicated code.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/net/intel/common/tx_scalar.h |   3 +-
 drivers/net/intel/iavf/iavf_rxtx.c   | 534 ++++++---------------------
 drivers/net/intel/iavf/iavf_rxtx.h   |   1 +
 3 files changed, 112 insertions(+), 426 deletions(-)

diff --git a/drivers/net/intel/common/tx_scalar.h b/drivers/net/intel/common/tx_scalar.h
index 3248f9423d..8b503dd4de 100644
--- a/drivers/net/intel/common/tx_scalar.h
+++ b/drivers/net/intel/common/tx_scalar.h
@@ -226,7 +226,8 @@ ci_xmit_pkts(struct ci_tx_queue *txq,
 		ol_flags = tx_pkt->ol_flags;
 		td_cmd = CI_TX_DESC_CMD_ICRC;
 		td_tag = 0;
-		l2_len = ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK ?
+		l2_len = (ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK &&
+					!(ol_flags & RTE_MBUF_F_TX_SEC_OFFLOAD)) ?
 				tx_pkt->outer_l2_len : tx_pkt->l2_len;
 		td_offset = (l2_len >> 1) << CI_TX_DESC_LEN_MACLEN_S;
 
diff --git a/drivers/net/intel/iavf/iavf_rxtx.c b/drivers/net/intel/iavf/iavf_rxtx.c
index 3dbcfd5355..2ea00e1975 100644
--- a/drivers/net/intel/iavf/iavf_rxtx.c
+++ b/drivers/net/intel/iavf/iavf_rxtx.c
@@ -2326,7 +2326,7 @@ iavf_recv_pkts_bulk_alloc(void *rx_queue,
 
 /* Check if the context descriptor is needed for TX offloading */
 static inline uint16_t
-iavf_calc_context_desc(struct rte_mbuf *mb, uint8_t vlan_flag)
+iavf_calc_context_desc(const struct rte_mbuf *mb, uint8_t vlan_flag)
 {
 	uint64_t flags = mb->ol_flags;
 	if (flags & (RTE_MBUF_F_TX_TCP_SEG | RTE_MBUF_F_TX_UDP_SEG |
@@ -2344,44 +2344,7 @@ iavf_calc_context_desc(struct rte_mbuf *mb, uint8_t vlan_flag)
 }
 
 static inline void
-iavf_fill_ctx_desc_cmd_field(volatile uint64_t *field, struct rte_mbuf *m,
-		uint8_t vlan_flag)
-{
-	uint64_t cmd = 0;
-
-	/* TSO enabled */
-	if (m->ol_flags & (RTE_MBUF_F_TX_TCP_SEG | RTE_MBUF_F_TX_UDP_SEG))
-		cmd = CI_TX_CTX_DESC_TSO << IAVF_TXD_CTX_QW1_CMD_SHIFT;
-
-	if ((m->ol_flags & RTE_MBUF_F_TX_VLAN &&
-			vlan_flag & IAVF_TX_FLAGS_VLAN_TAG_LOC_L2TAG2) ||
-			m->ol_flags & RTE_MBUF_F_TX_QINQ) {
-		cmd |= CI_TX_CTX_DESC_IL2TAG2
-			<< IAVF_TXD_CTX_QW1_CMD_SHIFT;
-	}
-
-	if (IAVF_CHECK_TX_LLDP(m))
-		cmd |= IAVF_TX_CTX_DESC_SWTCH_UPLINK
-			<< IAVF_TXD_CTX_QW1_CMD_SHIFT;
-
-	*field |= cmd;
-}
-
-static inline void
-iavf_fill_ctx_desc_ipsec_field(volatile uint64_t *field,
-	struct iavf_ipsec_crypto_pkt_metadata *ipsec_md)
-{
-	uint64_t ipsec_field =
-		(uint64_t)ipsec_md->ctx_desc_ipsec_params <<
-			IAVF_TXD_CTX_QW1_IPSEC_PARAMS_CIPHERBLK_SHIFT;
-
-	*field |= ipsec_field;
-}
-
-
-static inline void
-iavf_fill_ctx_desc_tunnelling_field(volatile uint64_t *qw0,
-		const struct rte_mbuf *m)
+iavf_fill_ctx_desc_tunnelling_field(uint64_t *qw0, const struct rte_mbuf *m)
 {
 	uint64_t eip_typ = IAVF_TX_CTX_DESC_EIPT_NONE;
 	uint64_t eip_len = 0;
@@ -2456,7 +2419,7 @@ iavf_fill_ctx_desc_tunnelling_field(volatile uint64_t *qw0,
 
 static inline uint16_t
 iavf_fill_ctx_desc_segmentation_field(volatile uint64_t *field,
-	struct rte_mbuf *m, struct iavf_ipsec_crypto_pkt_metadata *ipsec_md)
+	const struct rte_mbuf *m, struct iavf_ipsec_crypto_pkt_metadata *ipsec_md)
 {
 	uint64_t segmentation_field = 0;
 	uint64_t total_length = 0;
@@ -2495,59 +2458,31 @@ struct iavf_tx_context_desc_qws {
 	__le64 qw1;
 };
 
-static inline void
-iavf_fill_context_desc(volatile struct iavf_tx_context_desc *desc,
-	struct rte_mbuf *m, struct iavf_ipsec_crypto_pkt_metadata *ipsec_md,
-	uint16_t *tlen, uint8_t vlan_flag)
+/* IPsec callback for ci_xmit_pkts - gets IPsec descriptor information */
+static uint16_t
+iavf_get_ipsec_desc(const struct rte_mbuf *mbuf, const struct ci_tx_queue *txq,
+		    void **ipsec_metadata, uint64_t *qw0, uint64_t *qw1)
 {
-	volatile struct iavf_tx_context_desc_qws *desc_qws =
-			(volatile struct iavf_tx_context_desc_qws *)desc;
-	/* fill descriptor type field */
-	desc_qws->qw1 = IAVF_TX_DESC_DTYPE_CONTEXT;
-
-	/* fill command field */
-	iavf_fill_ctx_desc_cmd_field(&desc_qws->qw1, m, vlan_flag);
-
-	/* fill segmentation field */
-	if (m->ol_flags & (RTE_MBUF_F_TX_TCP_SEG | RTE_MBUF_F_TX_UDP_SEG)) {
-		/* fill IPsec field */
-		if (m->ol_flags & RTE_MBUF_F_TX_SEC_OFFLOAD)
-			iavf_fill_ctx_desc_ipsec_field(&desc_qws->qw1,
-				ipsec_md);
-
-		*tlen = iavf_fill_ctx_desc_segmentation_field(&desc_qws->qw1,
-				m, ipsec_md);
-	}
-
-	/* fill tunnelling field */
-	if (m->ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK)
-		iavf_fill_ctx_desc_tunnelling_field(&desc_qws->qw0, m);
-	else
-		desc_qws->qw0 = 0;
+	struct iavf_ipsec_crypto_pkt_metadata *md;
 
-	desc_qws->qw0 = rte_cpu_to_le_64(desc_qws->qw0);
-	desc_qws->qw1 = rte_cpu_to_le_64(desc_qws->qw1);
+	if (!(mbuf->ol_flags & RTE_MBUF_F_TX_SEC_OFFLOAD))
+		return 0;
 
-	/* vlan_flag specifies VLAN tag location for VLAN, and outer tag location for QinQ. */
-	if (m->ol_flags & RTE_MBUF_F_TX_QINQ)
-		desc->l2tag2 = vlan_flag & IAVF_TX_FLAGS_VLAN_TAG_LOC_L2TAG2 ? m->vlan_tci_outer :
-						m->vlan_tci;
-	else if (m->ol_flags & RTE_MBUF_F_TX_VLAN && vlan_flag & IAVF_TX_FLAGS_VLAN_TAG_LOC_L2TAG2)
-		desc->l2tag2 = m->vlan_tci;
-}
+	md = RTE_MBUF_DYNFIELD(mbuf, txq->ipsec_crypto_pkt_md_offset,
+				     struct iavf_ipsec_crypto_pkt_metadata *);
+	if (!md)
+		return 0;
 
+	*ipsec_metadata = md;
 
-static inline void
-iavf_fill_ipsec_desc(volatile struct iavf_tx_ipsec_desc *desc,
-	const struct iavf_ipsec_crypto_pkt_metadata *md, uint16_t *ipsec_len)
-{
-	desc->qw0 = rte_cpu_to_le_64(((uint64_t)md->l4_payload_len <<
+	/* Fill IPsec descriptor using existing logic */
+	*qw0 = rte_cpu_to_le_64(((uint64_t)md->l4_payload_len <<
 		IAVF_IPSEC_TX_DESC_QW0_L4PAYLEN_SHIFT) |
 		((uint64_t)md->esn << IAVF_IPSEC_TX_DESC_QW0_IPSECESN_SHIFT) |
 		((uint64_t)md->esp_trailer_len <<
 				IAVF_IPSEC_TX_DESC_QW0_TRAILERLEN_SHIFT));
 
-	desc->qw1 = rte_cpu_to_le_64(((uint64_t)md->sa_idx <<
+	*qw1 = rte_cpu_to_le_64(((uint64_t)md->sa_idx <<
 		IAVF_IPSEC_TX_DESC_QW1_IPSECSA_SHIFT) |
 		((uint64_t)md->next_proto <<
 				IAVF_IPSEC_TX_DESC_QW1_IPSECNH_SHIFT) |
@@ -2556,143 +2491,106 @@ iavf_fill_ipsec_desc(volatile struct iavf_tx_ipsec_desc *desc,
 		((uint64_t)(md->ol_flags & IAVF_IPSEC_CRYPTO_OL_FLAGS_NATT ?
 				1ULL : 0ULL) <<
 				IAVF_IPSEC_TX_DESC_QW1_UDP_SHIFT) |
-		(uint64_t)IAVF_TX_DESC_DTYPE_IPSEC);
+		((uint64_t)IAVF_TX_DESC_DTYPE_IPSEC <<
+				CI_TXD_QW1_DTYPE_S));
 
-	/**
-	 * TODO: Pre-calculate this in the Session initialization
-	 *
-	 * Calculate IPsec length required in data descriptor func when TSO
-	 * offload is enabled
-	 */
-	*ipsec_len = sizeof(struct rte_esp_hdr) + (md->len_iv >> 2) +
-			(md->ol_flags & IAVF_IPSEC_CRYPTO_OL_FLAGS_NATT ?
-			sizeof(struct rte_udp_hdr) : 0);
+	return 1; /* One IPsec descriptor needed */
 }
 
-static inline void
-iavf_build_data_desc_cmd_offset_fields(volatile uint64_t *qw1,
-		struct rte_mbuf *m, uint8_t vlan_flag)
+/* IPsec callback for ci_xmit_pkts - calculates segment length for IPsec+TSO */
+static uint16_t
+iavf_calc_ipsec_segment_len(const struct rte_mbuf *mb_seg, uint64_t ol_flags,
+			    const void *ipsec_metadata, uint16_t tlen)
 {
-	uint64_t command = 0;
-	uint64_t offset = 0;
-	uint64_t l2tag1 = 0;
-
-	*qw1 = CI_TX_DESC_DTYPE_DATA;
-
-	command = (uint64_t)CI_TX_DESC_CMD_ICRC;
-
-	/* Descriptor based VLAN insertion */
-	if ((vlan_flag & IAVF_TX_FLAGS_VLAN_TAG_LOC_L2TAG1) &&
-			m->ol_flags & RTE_MBUF_F_TX_VLAN) {
-		command |= (uint64_t)CI_TX_DESC_CMD_IL2TAG1;
-		l2tag1 |= m->vlan_tci;
-	}
-
-	/* Descriptor based QinQ insertion. vlan_flag specifies outer tag location. */
-	if (m->ol_flags & RTE_MBUF_F_TX_QINQ) {
-		command |= (uint64_t)CI_TX_DESC_CMD_IL2TAG1;
-		l2tag1 = vlan_flag & IAVF_TX_FLAGS_VLAN_TAG_LOC_L2TAG1 ? m->vlan_tci_outer :
-									m->vlan_tci;
+	const struct iavf_ipsec_crypto_pkt_metadata *ipsec_md = ipsec_metadata;
+
+	if ((ol_flags & RTE_MBUF_F_TX_SEC_OFFLOAD) &&
+	    (ol_flags & (RTE_MBUF_F_TX_TCP_SEG | RTE_MBUF_F_TX_UDP_SEG))) {
+		uint16_t ipseclen = ipsec_md ? (ipsec_md->esp_trailer_len +
+						ipsec_md->len_iv) : 0;
+		uint16_t slen = tlen + mb_seg->l2_len + mb_seg->l3_len +
+				mb_seg->outer_l3_len + ipseclen;
+		if (ol_flags & RTE_MBUF_F_TX_L4_MASK)
+			slen += mb_seg->l4_len;
+		return slen;
 	}
 
-	if ((m->ol_flags &
-	    (CI_TX_CKSUM_OFFLOAD_MASK | RTE_MBUF_F_TX_SEC_OFFLOAD)) == 0)
-		goto skip_cksum;
+	return mb_seg->data_len;
+}
 
-	/* Set MACLEN */
-	if (m->ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK &&
-			!(m->ol_flags & RTE_MBUF_F_TX_SEC_OFFLOAD))
-		offset |= (m->outer_l2_len >> 1)
-			<< CI_TX_DESC_LEN_MACLEN_S;
-	else
-		offset |= (m->l2_len >> 1)
-			<< CI_TX_DESC_LEN_MACLEN_S;
+/* Context descriptor callback for ci_xmit_pkts */
+static uint16_t
+iavf_get_context_desc(uint64_t ol_flags, const struct rte_mbuf *mbuf,
+		      const union ci_tx_offload *tx_offload __rte_unused,
+		      const struct ci_tx_queue *txq,
+		      uint64_t *qw0, uint64_t *qw1)
+{
+	uint8_t iavf_vlan_flag;
+	uint16_t cd_l2tag2 = 0;
+	uint64_t cd_type_cmd = IAVF_TX_DESC_DTYPE_CONTEXT;
+	uint64_t cd_tunneling_params = 0;
+	uint16_t tlen = 0;
+	struct iavf_ipsec_crypto_pkt_metadata *ipsec_md = NULL;
+
+	/* Use IAVF-specific vlan_flag from txq */
+	iavf_vlan_flag = txq->vlan_flag;
+
+	/* Check if context descriptor is needed using existing IAVF logic */
+	if (!iavf_calc_context_desc(mbuf, iavf_vlan_flag))
+		return 0;
 
-	/* Enable L3 checksum offloading inner */
-	if (m->ol_flags & RTE_MBUF_F_TX_IP_CKSUM) {
-		if (m->ol_flags & RTE_MBUF_F_TX_IPV4) {
-			command |= CI_TX_DESC_CMD_IIPT_IPV4_CSUM;
-			offset |= (m->l3_len >> 2) << CI_TX_DESC_LEN_IPLEN_S;
-		}
-	} else if (m->ol_flags & RTE_MBUF_F_TX_IPV4) {
-		command |= CI_TX_DESC_CMD_IIPT_IPV4;
-		offset |= (m->l3_len >> 2) << CI_TX_DESC_LEN_IPLEN_S;
-	} else if (m->ol_flags & RTE_MBUF_F_TX_IPV6) {
-		command |= CI_TX_DESC_CMD_IIPT_IPV6;
-		offset |= (m->l3_len >> 2) << CI_TX_DESC_LEN_IPLEN_S;
+	/* Get IPsec metadata if needed */
+	if (ol_flags & RTE_MBUF_F_TX_SEC_OFFLOAD) {
+		ipsec_md = RTE_MBUF_DYNFIELD(mbuf, txq->ipsec_crypto_pkt_md_offset,
+					     struct iavf_ipsec_crypto_pkt_metadata *);
 	}
 
-	if (m->ol_flags & (RTE_MBUF_F_TX_TCP_SEG | RTE_MBUF_F_TX_UDP_SEG)) {
-		if (m->ol_flags & RTE_MBUF_F_TX_TCP_SEG)
-			command |= CI_TX_DESC_CMD_L4T_EOFT_TCP;
-		else
-			command |= CI_TX_DESC_CMD_L4T_EOFT_UDP;
-		offset |= (m->l4_len >> 2) <<
-			      CI_TX_DESC_LEN_L4_LEN_S;
+	/* TSO command field */
+	if (ol_flags & (RTE_MBUF_F_TX_TCP_SEG | RTE_MBUF_F_TX_UDP_SEG)) {
+		cd_type_cmd |= (uint64_t)CI_TX_CTX_DESC_TSO << IAVF_TXD_CTX_QW1_CMD_SHIFT;
 
-		*qw1 = rte_cpu_to_le_64((((uint64_t)command <<
-			IAVF_TXD_DATA_QW1_CMD_SHIFT) & IAVF_TXD_DATA_QW1_CMD_MASK) |
-			(((uint64_t)offset << IAVF_TXD_DATA_QW1_OFFSET_SHIFT) &
-			IAVF_TXD_DATA_QW1_OFFSET_MASK) |
-			((uint64_t)l2tag1 << IAVF_TXD_DATA_QW1_L2TAG1_SHIFT));
+		/* IPsec field for TSO */
+		if (ol_flags & RTE_MBUF_F_TX_SEC_OFFLOAD && ipsec_md) {
+			uint64_t ipsec_field = (uint64_t)ipsec_md->ctx_desc_ipsec_params <<
+				IAVF_TXD_CTX_QW1_IPSEC_PARAMS_CIPHERBLK_SHIFT;
+			cd_type_cmd |= ipsec_field;
+		}
 
-		return;
+		/* TSO segmentation field */
+		tlen = iavf_fill_ctx_desc_segmentation_field(&cd_type_cmd,
+							     mbuf, ipsec_md);
+		(void)tlen; /* Suppress unused variable warning */
 	}
 
-	/* Enable L4 checksum offloads */
-	switch (m->ol_flags & RTE_MBUF_F_TX_L4_MASK) {
-	case RTE_MBUF_F_TX_TCP_CKSUM:
-		command |= CI_TX_DESC_CMD_L4T_EOFT_TCP;
-		offset |= (sizeof(struct rte_tcp_hdr) >> 2) <<
-				CI_TX_DESC_LEN_L4_LEN_S;
-		break;
-	case RTE_MBUF_F_TX_SCTP_CKSUM:
-		command |= CI_TX_DESC_CMD_L4T_EOFT_SCTP;
-		offset |= (sizeof(struct rte_sctp_hdr) >> 2) <<
-				CI_TX_DESC_LEN_L4_LEN_S;
-		break;
-	case RTE_MBUF_F_TX_UDP_CKSUM:
-		command |= CI_TX_DESC_CMD_L4T_EOFT_UDP;
-		offset |= (sizeof(struct rte_udp_hdr) >> 2) <<
-				CI_TX_DESC_LEN_L4_LEN_S;
-		break;
+	/* VLAN field for L2TAG2 */
+	if ((ol_flags & RTE_MBUF_F_TX_VLAN &&
+	     iavf_vlan_flag & IAVF_TX_FLAGS_VLAN_TAG_LOC_L2TAG2) ||
+	    ol_flags & RTE_MBUF_F_TX_QINQ) {
+		cd_type_cmd |= (uint64_t)CI_TX_CTX_DESC_IL2TAG2 << IAVF_TXD_CTX_QW1_CMD_SHIFT;
 	}
 
-skip_cksum:
-	*qw1 = rte_cpu_to_le_64((((uint64_t)command <<
-		IAVF_TXD_DATA_QW1_CMD_SHIFT) & IAVF_TXD_DATA_QW1_CMD_MASK) |
-		(((uint64_t)offset << IAVF_TXD_DATA_QW1_OFFSET_SHIFT) &
-		IAVF_TXD_DATA_QW1_OFFSET_MASK) |
-		((uint64_t)l2tag1 << IAVF_TXD_DATA_QW1_L2TAG1_SHIFT));
-}
-
-static inline void
-iavf_fill_data_desc(volatile struct ci_tx_desc *desc,
-	uint64_t desc_template,	uint16_t buffsz,
-	uint64_t buffer_addr)
-{
-	/* fill data descriptor qw1 from template */
-	desc->cmd_type_offset_bsz = desc_template;
-
-	/* set data buffer size */
-	desc->cmd_type_offset_bsz |=
-		(((uint64_t)buffsz << IAVF_TXD_DATA_QW1_TX_BUF_SZ_SHIFT) &
-		IAVF_TXD_DATA_QW1_TX_BUF_SZ_MASK);
-
-	desc->buffer_addr = rte_cpu_to_le_64(buffer_addr);
-	desc->cmd_type_offset_bsz = rte_cpu_to_le_64(desc->cmd_type_offset_bsz);
-}
-
+	/* LLDP switching field */
+	if (IAVF_CHECK_TX_LLDP(mbuf))
+		cd_type_cmd |= IAVF_TX_CTX_DESC_SWTCH_UPLINK << IAVF_TXD_CTX_QW1_CMD_SHIFT;
+
+	/* Tunneling field */
+	if (ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK)
+		iavf_fill_ctx_desc_tunnelling_field((uint64_t *)&cd_tunneling_params, mbuf);
+
+	/* L2TAG2 field (VLAN) */
+	if (ol_flags & RTE_MBUF_F_TX_QINQ) {
+		cd_l2tag2 = iavf_vlan_flag & IAVF_TX_FLAGS_VLAN_TAG_LOC_L2TAG2 ?
+			    mbuf->vlan_tci_outer : mbuf->vlan_tci;
+	} else if (ol_flags & RTE_MBUF_F_TX_VLAN &&
+		   iavf_vlan_flag & IAVF_TX_FLAGS_VLAN_TAG_LOC_L2TAG2) {
+		cd_l2tag2 = mbuf->vlan_tci;
+	}
 
-static struct iavf_ipsec_crypto_pkt_metadata *
-iavf_ipsec_crypto_get_pkt_metadata(const struct ci_tx_queue *txq,
-		struct rte_mbuf *m)
-{
-	if (m->ol_flags & RTE_MBUF_F_TX_SEC_OFFLOAD)
-		return RTE_MBUF_DYNFIELD(m, txq->ipsec_crypto_pkt_md_offset,
-				struct iavf_ipsec_crypto_pkt_metadata *);
+	/* Set outputs */
+	*qw0 = rte_cpu_to_le_64(cd_tunneling_params | ((uint64_t)cd_l2tag2 << 32));
+	*qw1 = rte_cpu_to_le_64(cd_type_cmd);
 
-	return NULL;
+	return 1; /* One context descriptor needed */
 }
 
 /* TX function */
@@ -2700,231 +2598,17 @@ uint16_t
 iavf_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 {
 	struct ci_tx_queue *txq = tx_queue;
-	volatile struct ci_tx_desc *txr = txq->ci_tx_ring;
-	struct ci_tx_entry *txe_ring = txq->sw_ring;
-	struct ci_tx_entry *txe, *txn;
-	struct rte_mbuf *mb, *mb_seg;
-	uint64_t buf_dma_addr;
-	uint16_t desc_idx, desc_idx_last;
-	uint16_t idx;
-	uint16_t slen;
-
-
-	/* Check if the descriptor ring needs to be cleaned. */
-	if (txq->nb_tx_free < txq->tx_free_thresh)
-		ci_tx_xmit_cleanup(txq);
-
-	desc_idx = txq->tx_tail;
-	txe = &txe_ring[desc_idx];
-
-	for (idx = 0; idx < nb_pkts; idx++) {
-		volatile struct ci_tx_desc *ddesc;
-		struct iavf_ipsec_crypto_pkt_metadata *ipsec_md;
-
-		uint16_t nb_desc_ctx, nb_desc_ipsec;
-		uint16_t nb_desc_data, nb_desc_required;
-		uint16_t tlen = 0, ipseclen = 0;
-		uint64_t ddesc_template = 0;
-		uint64_t ddesc_cmd = 0;
-
-		mb = tx_pkts[idx];
 
-		RTE_MBUF_PREFETCH_TO_FREE(txe->mbuf);
-
-		/**
-		 * Get metadata for ipsec crypto from mbuf dynamic fields if
-		 * security offload is specified.
-		 */
-		ipsec_md = iavf_ipsec_crypto_get_pkt_metadata(txq, mb);
-
-		nb_desc_data = mb->nb_segs;
-		nb_desc_ctx =
-			iavf_calc_context_desc(mb, txq->vlan_flag);
-		nb_desc_ipsec = !!(mb->ol_flags & RTE_MBUF_F_TX_SEC_OFFLOAD);
-
-		/**
-		 * The number of descriptors that must be allocated for
-		 * a packet equals to the number of the segments of that
-		 * packet plus the context and ipsec descriptors if needed.
-		 * Recalculate the needed tx descs when TSO enabled in case
-		 * the mbuf data size exceeds max data size that hw allows
-		 * per tx desc.
-		 */
-		if (mb->ol_flags & RTE_MBUF_F_TX_TCP_SEG)
-			nb_desc_required = ci_calc_pkt_desc(mb) + nb_desc_ctx + nb_desc_ipsec;
-		else
-			nb_desc_required = nb_desc_data + nb_desc_ctx + nb_desc_ipsec;
-
-		desc_idx_last = (uint16_t)(desc_idx + nb_desc_required - 1);
-
-		/* wrap descriptor ring */
-		if (desc_idx_last >= txq->nb_tx_desc)
-			desc_idx_last =
-				(uint16_t)(desc_idx_last - txq->nb_tx_desc);
-
-		PMD_TX_LOG(DEBUG,
-			"port_id=%u queue_id=%u tx_first=%u tx_last=%u",
-			txq->port_id, txq->queue_id, desc_idx, desc_idx_last);
-
-		if (nb_desc_required > txq->nb_tx_free) {
-			if (ci_tx_xmit_cleanup(txq)) {
-				if (idx == 0)
-					return 0;
-				goto end_of_tx;
-			}
-			if (unlikely(nb_desc_required > txq->tx_rs_thresh)) {
-				while (nb_desc_required > txq->nb_tx_free) {
-					if (ci_tx_xmit_cleanup(txq)) {
-						if (idx == 0)
-							return 0;
-						goto end_of_tx;
-					}
-				}
-			}
-		}
-
-		iavf_build_data_desc_cmd_offset_fields(&ddesc_template, mb,
-			txq->vlan_flag);
-
-			/* Setup TX context descriptor if required */
-		if (nb_desc_ctx) {
-			volatile struct iavf_tx_context_desc *ctx_desc =
-				(volatile struct iavf_tx_context_desc *)
-					&txr[desc_idx];
-
-			/* clear QW0 or the previous writeback value
-			 * may impact next write
-			 */
-			*(volatile uint64_t *)ctx_desc = 0;
-
-			txn = &txe_ring[txe->next_id];
-			RTE_MBUF_PREFETCH_TO_FREE(txn->mbuf);
-
-			if (txe->mbuf) {
-				rte_pktmbuf_free_seg(txe->mbuf);
-				txe->mbuf = NULL;
-			}
-
-			iavf_fill_context_desc(ctx_desc, mb, ipsec_md, &tlen,
-				txq->vlan_flag);
-			IAVF_DUMP_TX_DESC(txq, ctx_desc, desc_idx);
-
-			txe->last_id = desc_idx_last;
-			desc_idx = txe->next_id;
-			txe = txn;
-		}
-
-		if (nb_desc_ipsec) {
-			volatile struct iavf_tx_ipsec_desc *ipsec_desc =
-				(volatile struct iavf_tx_ipsec_desc *)
-					&txr[desc_idx];
-
-			txn = &txe_ring[txe->next_id];
-			RTE_MBUF_PREFETCH_TO_FREE(txn->mbuf);
-
-			if (txe->mbuf) {
-				rte_pktmbuf_free_seg(txe->mbuf);
-				txe->mbuf = NULL;
-			}
-
-			iavf_fill_ipsec_desc(ipsec_desc, ipsec_md, &ipseclen);
-
-			IAVF_DUMP_TX_DESC(txq, ipsec_desc, desc_idx);
-
-			txe->last_id = desc_idx_last;
-			desc_idx = txe->next_id;
-			txe = txn;
-		}
-
-		mb_seg = mb;
-
-		do {
-			ddesc = (volatile struct ci_tx_desc *)
-					&txr[desc_idx];
-
-			txn = &txe_ring[txe->next_id];
-			RTE_MBUF_PREFETCH_TO_FREE(txn->mbuf);
-
-			if (txe->mbuf)
-				rte_pktmbuf_free_seg(txe->mbuf);
-
-			txe->mbuf = mb_seg;
-
-			if ((mb_seg->ol_flags & RTE_MBUF_F_TX_SEC_OFFLOAD) &&
-					(mb_seg->ol_flags &
-						(RTE_MBUF_F_TX_TCP_SEG |
-						RTE_MBUF_F_TX_UDP_SEG))) {
-				slen = tlen + mb_seg->l2_len + mb_seg->l3_len +
-						mb_seg->outer_l3_len + ipseclen;
-				if (mb_seg->ol_flags & RTE_MBUF_F_TX_L4_MASK)
-					slen += mb_seg->l4_len;
-			} else {
-				slen = mb_seg->data_len;
-			}
-
-			buf_dma_addr = rte_mbuf_data_iova(mb_seg);
-			while ((mb_seg->ol_flags & (RTE_MBUF_F_TX_TCP_SEG |
-					RTE_MBUF_F_TX_UDP_SEG)) &&
-					unlikely(slen > CI_MAX_DATA_PER_TXD)) {
-				iavf_fill_data_desc(ddesc, ddesc_template,
-					CI_MAX_DATA_PER_TXD, buf_dma_addr);
-
-				IAVF_DUMP_TX_DESC(txq, ddesc, desc_idx);
-
-				buf_dma_addr += CI_MAX_DATA_PER_TXD;
-				slen -= CI_MAX_DATA_PER_TXD;
-
-				txe->last_id = desc_idx_last;
-				desc_idx = txe->next_id;
-				txe = txn;
-				ddesc = &txr[desc_idx];
-				txn = &txe_ring[txe->next_id];
-			}
-
-			iavf_fill_data_desc(ddesc, ddesc_template,
-					slen, buf_dma_addr);
-
-			IAVF_DUMP_TX_DESC(txq, ddesc, desc_idx);
-
-			txe->last_id = desc_idx_last;
-			desc_idx = txe->next_id;
-			txe = txn;
-			mb_seg = mb_seg->next;
-		} while (mb_seg);
-
-		/* The last packet data descriptor needs End Of Packet (EOP) */
-		ddesc_cmd = CI_TX_DESC_CMD_EOP;
-
-		txq->nb_tx_used = (uint16_t)(txq->nb_tx_used + nb_desc_required);
-		txq->nb_tx_free = (uint16_t)(txq->nb_tx_free - nb_desc_required);
-
-		if (txq->nb_tx_used >= txq->tx_rs_thresh) {
-			PMD_TX_LOG(DEBUG, "Setting RS bit on TXD id="
-				   "%4u (port=%d queue=%d)",
-				   desc_idx_last, txq->port_id, txq->queue_id);
-
-			ddesc_cmd |= CI_TX_DESC_CMD_RS;
-
-			/* Update txq RS bit counters */
-			txq->nb_tx_used = 0;
-		}
-
-		ddesc->cmd_type_offset_bsz |= rte_cpu_to_le_64(ddesc_cmd <<
-				IAVF_TXD_DATA_QW1_CMD_SHIFT);
-
-		IAVF_DUMP_TX_DESC(txq, ddesc, desc_idx - 1);
-	}
-
-end_of_tx:
-	rte_wmb();
-
-	PMD_TX_LOG(DEBUG, "port_id=%u queue_id=%u tx_tail=%u nb_tx=%u",
-		   txq->port_id, txq->queue_id, desc_idx, idx);
-
-	IAVF_PCI_REG_WRITE_RELAXED(txq->qtx_tail, desc_idx);
-	txq->tx_tail = desc_idx;
+	const struct ci_ipsec_ops ipsec_ops = {
+		.get_ipsec_desc = iavf_get_ipsec_desc,
+		.calc_segment_len = iavf_calc_ipsec_segment_len,
+	};
 
-	return idx;
+	/* IAVF does not support timestamp queues, so pass NULL for ts_fns */
+	return ci_xmit_pkts(txq, tx_pkts, nb_pkts,
+			    (txq->vlan_flag & IAVF_TX_FLAGS_VLAN_TAG_LOC_L2TAG1) ?
+				CI_VLAN_IN_L2TAG1 : CI_VLAN_IN_L2TAG2,
+			    iavf_get_context_desc, &ipsec_ops, NULL);
 }
 
 /* Check if the packet with vlan user priority is transmitted in the
diff --git a/drivers/net/intel/iavf/iavf_rxtx.h b/drivers/net/intel/iavf/iavf_rxtx.h
index cca5c25119..fe3385dcf6 100644
--- a/drivers/net/intel/iavf/iavf_rxtx.h
+++ b/drivers/net/intel/iavf/iavf_rxtx.h
@@ -43,6 +43,7 @@
 		RTE_ETH_TX_OFFLOAD_TCP_CKSUM |		\
 		RTE_ETH_TX_OFFLOAD_SCTP_CKSUM |		\
 		RTE_ETH_TX_OFFLOAD_TCP_TSO |		\
+		RTE_ETH_TX_OFFLOAD_UDP_TSO |		\
 		RTE_ETH_TX_OFFLOAD_OUTER_IPV4_CKSUM |	\
 		RTE_ETH_TX_OFFLOAD_VXLAN_TNL_TSO |	\
 		RTE_ETH_TX_OFFLOAD_QINQ_INSERT |	\
-- 
2.51.0



* [PATCH v4 16/35] net/i40e: document requirement for QinQ support
  2026-02-09 16:44 ` [PATCH v4 00/35] combine multiple Intel scalar Tx paths Bruce Richardson
                     ` (14 preceding siblings ...)
  2026-02-09 16:45   ` [PATCH v4 15/35] net/iavf: use common scalar Tx function Bruce Richardson
@ 2026-02-09 16:45   ` Bruce Richardson
  2026-02-10 13:27     ` Burakov, Anatoly
  2026-02-09 16:45   ` [PATCH v4 17/35] net/idpf: use common scalar Tx function Bruce Richardson
                     ` (18 subsequent siblings)
  34 siblings, 1 reply; 274+ messages in thread
From: Bruce Richardson @ 2026-02-09 16:45 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson

In order to get multiple VLAN tags inserted in an outgoing packet with
the QinQ offload, the i40e driver needs to be set to double VLAN mode.
This is done using the VLAN_EXTEND Rx config flag. Add a code check for
this dependency and update the docs about it.
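The dependency being documented can be sketched as a simple predicate.
The flag values and the `qinq_config_ok()` helper below are illustrative
stand-ins for `RTE_ETH_TX_OFFLOAD_QINQ_INSERT` and
`RTE_ETH_RX_OFFLOAD_VLAN_EXTEND`, not the i40e driver's actual code:

```c
#include <assert.h>
#include <stdint.h>
#include <stdbool.h>

#define TX_OFFLOAD_QINQ_INSERT (1ULL << 0) /* stand-in Tx offload bit */
#define RX_OFFLOAD_VLAN_EXTEND (1ULL << 1) /* stand-in Rx offload bit */

/* returns true when the configuration can insert both VLAN tags */
static bool
qinq_config_ok(uint64_t tx_offloads, uint64_t rx_offloads)
{
	if (!(tx_offloads & TX_OFFLOAD_QINQ_INSERT))
		return true; /* QinQ not requested: nothing to check */
	/* QinQ insertion needs double VLAN mode on the Rx side too */
	return (rx_offloads & RX_OFFLOAD_VLAN_EXTEND) != 0;
}
```

The patch below only warns rather than failing queue setup, since a
missing VLAN_EXTEND flag degrades QinQ to single-tag insertion instead
of breaking transmission outright.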

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 doc/guides/nics/i40e.rst           | 18 ++++++++++++++++++
 drivers/net/intel/i40e/i40e_rxtx.c |  9 +++++++++
 2 files changed, 27 insertions(+)

diff --git a/doc/guides/nics/i40e.rst b/doc/guides/nics/i40e.rst
index 40be9aa755..750af3c8b3 100644
--- a/doc/guides/nics/i40e.rst
+++ b/doc/guides/nics/i40e.rst
@@ -245,6 +245,24 @@ Runtime Configuration
   * ``segment``: Check number of mbuf segments not exceed hw limitation.
   * ``offload``: Check any unsupported offload flag.
 
+QinQ Configuration
+~~~~~~~~~~~~~~~~~~
+
+When using QinQ TX offload (``RTE_ETH_TX_OFFLOAD_QINQ_INSERT``), you must also
+enable ``RTE_ETH_RX_OFFLOAD_VLAN_EXTEND`` to configure the hardware for double
+VLAN mode. Without this, only the inner VLAN tag will be inserted.
+
+Example::
+
+  struct rte_eth_conf port_conf = {
+      .rxmode = {
+          .offloads = RTE_ETH_RX_OFFLOAD_VLAN_EXTEND,
+      },
+      .txmode = {
+          .offloads = RTE_ETH_TX_OFFLOAD_QINQ_INSERT,
+      },
+  };
+
 Vector RX Pre-conditions
 ~~~~~~~~~~~~~~~~~~~~~~~~
 For Vector RX it is assumed that the number of descriptor rings will be a power
diff --git a/drivers/net/intel/i40e/i40e_rxtx.c b/drivers/net/intel/i40e/i40e_rxtx.c
index 35c1b53c1e..dfd2213020 100644
--- a/drivers/net/intel/i40e/i40e_rxtx.c
+++ b/drivers/net/intel/i40e/i40e_rxtx.c
@@ -2182,6 +2182,15 @@ i40e_dev_tx_queue_setup(struct rte_eth_dev *dev,
 	vsi = i40e_pf_get_vsi_by_qindex(pf, queue_idx);
 	if (!vsi)
 		return -EINVAL;
+
+	/* Check if QinQ TX offload requires VLAN extend mode */
+	if ((offloads & RTE_ETH_TX_OFFLOAD_QINQ_INSERT) &&
+			!(dev->data->dev_conf.rxmode.offloads & RTE_ETH_RX_OFFLOAD_VLAN_EXTEND)) {
+		PMD_DRV_LOG(WARNING, "Port %u: QinQ TX offload is enabled but VLAN extend mode is not set. ",
+				dev->data->port_id);
+		PMD_DRV_LOG(WARNING, "Double VLAN insertion may not work correctly without RTE_ETH_RX_OFFLOAD_VLAN_EXTEND set in Rx configuration.");
+	}
+
 	q_offset = i40e_get_queue_offset_by_qindex(pf, queue_idx);
 	if (q_offset < 0)
 		return -EINVAL;
-- 
2.51.0



* [PATCH v4 17/35] net/idpf: use common scalar Tx function
  2026-02-09 16:44 ` [PATCH v4 00/35] combine multiple Intel scalar Tx paths Bruce Richardson
                     ` (15 preceding siblings ...)
  2026-02-09 16:45   ` [PATCH v4 16/35] net/i40e: document requirement for QinQ support Bruce Richardson
@ 2026-02-09 16:45   ` Bruce Richardson
  2026-02-10 13:30     ` Burakov, Anatoly
  2026-02-09 16:45   ` [PATCH v4 18/35] net/intel: avoid writing the final pkt descriptor twice Bruce Richardson
                     ` (17 subsequent siblings)
  34 siblings, 1 reply; 274+ messages in thread
From: Bruce Richardson @ 2026-02-09 16:45 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson, Jingjing Wu, Praveen Shetty

Update the idpf driver to use the common scalar Tx function in the
single-queue configuration.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/net/intel/idpf/idpf_common_rxtx.c | 178 ++--------------------
 1 file changed, 10 insertions(+), 168 deletions(-)

diff --git a/drivers/net/intel/idpf/idpf_common_rxtx.c b/drivers/net/intel/idpf/idpf_common_rxtx.c
index b34d545a0a..bca5f13c8e 100644
--- a/drivers/net/intel/idpf/idpf_common_rxtx.c
+++ b/drivers/net/intel/idpf/idpf_common_rxtx.c
@@ -8,7 +8,6 @@
 
 #include "idpf_common_rxtx.h"
 #include "idpf_common_device.h"
-#include "../common/rx.h"
 
 int idpf_timestamp_dynfield_offset = -1;
 uint64_t idpf_timestamp_dynflag;
@@ -848,9 +847,10 @@ idpf_calc_context_desc(uint64_t flags)
 /* set TSO context descriptor, returns 0 if no context needed, 1 if context set
  */
 static inline uint16_t
-idpf_set_tso_ctx(uint64_t ol_flags, struct rte_mbuf *mbuf,
-			union ci_tx_offload tx_offload,
-			uint64_t *qw0, uint64_t *qw1)
+idpf_set_tso_ctx(uint64_t ol_flags, const struct rte_mbuf *mbuf,
+		 const union ci_tx_offload *tx_offload,
+		 const struct ci_tx_queue *txq __rte_unused,
+		 uint64_t *qw0, uint64_t *qw1)
 {
 	uint16_t cmd_dtype = IDPF_TX_DESC_DTYPE_FLEX_TSO_CTX | IDPF_TX_FLEX_CTX_DESC_CMD_TSO;
 	uint16_t tso_segsz = mbuf->tso_segsz;
@@ -861,12 +861,12 @@ idpf_set_tso_ctx(uint64_t ol_flags, struct rte_mbuf *mbuf,
 		return 0;
 
 	/* TSO context descriptor setup */
-	if (tx_offload.l4_len == 0) {
+	if (tx_offload->l4_len == 0) {
 		TX_LOG(DEBUG, "L4 length set to 0");
 		return 0;
 	}
 
-	hdr_len = tx_offload.l2_len + tx_offload.l3_len + tx_offload.l4_len;
+	hdr_len = tx_offload->l2_len + tx_offload->l3_len + tx_offload->l4_len;
 	tso_len = mbuf->pkt_len - hdr_len;
 
 	*qw0 = rte_cpu_to_le_32(tso_len & IDPF_TXD_FLEX_CTX_MSS_RT_M) |
@@ -933,7 +933,8 @@ idpf_dp_splitq_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 		tx_offload.tso_segsz = tx_pkt->tso_segsz;
 		/* Calculate the number of context descriptors needed. */
 		uint64_t cd_qw0, cd_qw1;
-		nb_ctx = idpf_set_tso_ctx(ol_flags, tx_pkt, tx_offload, &cd_qw0, &cd_qw1);
+		nb_ctx = idpf_set_tso_ctx(ol_flags, tx_pkt, &tx_offload, txq,
+					  &cd_qw0, &cd_qw1);
 
 		/* Calculate the number of TX descriptors needed for
 		 * each packet. For TSO packets, use ci_calc_pkt_desc as
@@ -1339,167 +1340,8 @@ uint16_t
 idpf_dp_singleq_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 			  uint16_t nb_pkts)
 {
-	volatile struct ci_tx_desc *txd;
-	volatile struct ci_tx_desc *txr;
-	union ci_tx_offload tx_offload = {0};
-	struct ci_tx_entry *txe, *txn;
-	struct ci_tx_entry *sw_ring;
-	struct ci_tx_queue *txq;
-	struct rte_mbuf *tx_pkt;
-	struct rte_mbuf *m_seg;
-	uint64_t buf_dma_addr;
-	uint32_t td_offset;
-	uint64_t ol_flags;
-	uint16_t tx_last;
-	uint16_t nb_used;
-	uint16_t nb_ctx;
-	uint16_t td_cmd;
-	uint16_t tx_id;
-	uint16_t nb_tx;
-	uint16_t slen;
-
-	nb_tx = 0;
-	txq = tx_queue;
-
-	if (unlikely(txq == NULL))
-		return nb_tx;
-
-	sw_ring = txq->sw_ring;
-	txr = txq->ci_tx_ring;
-	tx_id = txq->tx_tail;
-	txe = &sw_ring[tx_id];
-
-	/* Check if the descriptor ring needs to be cleaned. */
-	if (txq->nb_tx_free < txq->tx_free_thresh)
-		(void)ci_tx_xmit_cleanup(txq);
-
-	for (nb_tx = 0; nb_tx < nb_pkts; nb_tx++) {
-		td_cmd = 0;
-		td_offset = 0;
-
-		tx_pkt = *tx_pkts++;
-		RTE_MBUF_PREFETCH_TO_FREE(txe->mbuf);
-
-		ol_flags = tx_pkt->ol_flags;
-		tx_offload.l2_len = tx_pkt->l2_len;
-		tx_offload.l3_len = tx_pkt->l3_len;
-		tx_offload.l4_len = tx_pkt->l4_len;
-		tx_offload.tso_segsz = tx_pkt->tso_segsz;
-		/* Calculate the number of context descriptors needed. */
-		uint64_t cd_qw0, cd_qw1;
-		nb_ctx = idpf_set_tso_ctx(ol_flags, tx_pkt, tx_offload, &cd_qw0, &cd_qw1);
-
-		/* The number of descriptors that must be allocated for
-		 * a packet. For TSO packets, use ci_calc_pkt_desc as
-		 * the mbuf data size might exceed max data size that hw allows
-		 * per tx desc.
-		 */
-		if (ol_flags & RTE_MBUF_F_TX_TCP_SEG)
-			nb_used = (uint16_t)(ci_calc_pkt_desc(tx_pkt) + nb_ctx);
-		else
-			nb_used = (uint16_t)(tx_pkt->nb_segs + nb_ctx);
-		tx_last = (uint16_t)(tx_id + nb_used - 1);
-
-		/* Circular ring */
-		if (tx_last >= txq->nb_tx_desc)
-			tx_last = (uint16_t)(tx_last - txq->nb_tx_desc);
-
-		TX_LOG(DEBUG, "port_id=%u queue_id=%u"
-		       " tx_first=%u tx_last=%u",
-		       txq->port_id, txq->queue_id, tx_id, tx_last);
-
-		if (nb_used > txq->nb_tx_free) {
-			if (ci_tx_xmit_cleanup(txq) != 0) {
-				if (nb_tx == 0)
-					return 0;
-				goto end_of_tx;
-			}
-			if (unlikely(nb_used > txq->tx_rs_thresh)) {
-				while (nb_used > txq->nb_tx_free) {
-					if (ci_tx_xmit_cleanup(txq) != 0) {
-						if (nb_tx == 0)
-							return 0;
-						goto end_of_tx;
-					}
-				}
-			}
-		}
-
-		if (ol_flags & CI_TX_CKSUM_OFFLOAD_MASK)
-			td_cmd |= IDPF_TX_FLEX_DESC_CMD_CS_EN;
-
-		if (nb_ctx != 0) {
-			/* Setup TX context descriptor if required */
-			uint64_t *ctx_txd = RTE_CAST_PTR(uint64_t *, &txr[tx_id]);
-
-			txn = &sw_ring[txe->next_id];
-			RTE_MBUF_PREFETCH_TO_FREE(txn->mbuf);
-			if (txe->mbuf != NULL) {
-				rte_pktmbuf_free_seg(txe->mbuf);
-				txe->mbuf = NULL;
-			}
-
-			ctx_txd[0] = cd_qw0;
-			ctx_txd[1] = cd_qw1;
-
-			txe->last_id = tx_last;
-			tx_id = txe->next_id;
-			txe = txn;
-		}
-
-		m_seg = tx_pkt;
-		do {
-			txd = &txr[tx_id];
-			txn = &sw_ring[txe->next_id];
-
-			if (txe->mbuf != NULL)
-				rte_pktmbuf_free_seg(txe->mbuf);
-			txe->mbuf = m_seg;
-
-			/* Setup TX Descriptor */
-			slen = m_seg->data_len;
-			buf_dma_addr = rte_mbuf_data_iova(m_seg);
-			txd->buffer_addr = rte_cpu_to_le_64(buf_dma_addr);
-			txd->cmd_type_offset_bsz = rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DATA |
-				((uint64_t)td_cmd  << CI_TXD_QW1_CMD_S) |
-				((uint64_t)td_offset << CI_TXD_QW1_OFFSET_S) |
-				((uint64_t)slen << CI_TXD_QW1_TX_BUF_SZ_S));
-
-			txe->last_id = tx_last;
-			tx_id = txe->next_id;
-			txe = txn;
-			m_seg = m_seg->next;
-		} while (m_seg);
-
-		/* The last packet data descriptor needs End Of Packet (EOP) */
-		td_cmd |= CI_TX_DESC_CMD_EOP;
-		txq->nb_tx_used = (uint16_t)(txq->nb_tx_used + nb_used);
-		txq->nb_tx_free = (uint16_t)(txq->nb_tx_free - nb_used);
-
-		if (txq->nb_tx_used >= txq->tx_rs_thresh) {
-			TX_LOG(DEBUG, "Setting RS bit on TXD id="
-			       "%4u (port=%d queue=%d)",
-			       tx_last, txq->port_id, txq->queue_id);
-
-			td_cmd |= CI_TX_DESC_CMD_RS;
-
-			/* Update txq RS bit counters */
-			txq->nb_tx_used = 0;
-		}
-
-		txd->cmd_type_offset_bsz |= rte_cpu_to_le_16(td_cmd << CI_TXD_QW1_CMD_S);
-	}
-
-end_of_tx:
-	rte_wmb();
-
-	TX_LOG(DEBUG, "port_id=%u queue_id=%u tx_tail=%u nb_tx=%u",
-	       txq->port_id, txq->queue_id, tx_id, nb_tx);
-
-	IDPF_PCI_REG_WRITE(txq->qtx_tail, tx_id);
-	txq->tx_tail = tx_id;
-
-	return nb_tx;
+	return ci_xmit_pkts(tx_queue, tx_pkts, nb_pkts, CI_VLAN_IN_L2TAG1,
+			idpf_set_tso_ctx, NULL, NULL);
 }
 
 /* TX prep functions */
-- 
2.51.0



* [PATCH v4 18/35] net/intel: avoid writing the final pkt descriptor twice
  2026-02-09 16:44 ` [PATCH v4 00/35] combine multiple Intel scalar Tx paths Bruce Richardson
                     ` (16 preceding siblings ...)
  2026-02-09 16:45   ` [PATCH v4 17/35] net/idpf: use common scalar Tx function Bruce Richardson
@ 2026-02-09 16:45   ` Bruce Richardson
  2026-02-10 13:31     ` Burakov, Anatoly
  2026-02-09 16:45   ` [PATCH v4 19/35] eal: add macro for marking assumed alignment Bruce Richardson
                     ` (16 subsequent siblings)
  34 siblings, 1 reply; 274+ messages in thread
From: Bruce Richardson @ 2026-02-09 16:45 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson

In the scalar datapath, there is a loop to handle multi-segment and
multi-descriptor packets on Tx. After that loop, the end-of-packet bit
was written to the descriptor separately, meaning that for each
single-descriptor packet there were two writes to the second quad-word -
basically 3 x 64-bit writes rather than just 2. Adjusting the code to
compute the EOP bit inside the loop saves that extra write per packet
and so improves performance.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/net/intel/common/tx_scalar.h | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/net/intel/common/tx_scalar.h b/drivers/net/intel/common/tx_scalar.h
index 8b503dd4de..e9e9a826fa 100644
--- a/drivers/net/intel/common/tx_scalar.h
+++ b/drivers/net/intel/common/tx_scalar.h
@@ -372,6 +372,10 @@ ci_xmit_pkts(struct ci_tx_queue *txq,
 				txn = &sw_ring[txe->next_id];
 			}
 
+			/* fill the last descriptor with End of Packet (EOP) bit */
+			if (m_seg->next == NULL)
+				td_cmd |= CI_TX_DESC_CMD_EOP;
+
 			txd->buffer_addr = rte_cpu_to_le_64(buf_dma_addr);
 			txd->cmd_type_offset_bsz = rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DATA |
 				((uint64_t)td_cmd << CI_TXD_QW1_CMD_S) |
@@ -384,21 +388,17 @@ ci_xmit_pkts(struct ci_tx_queue *txq,
 			txe = txn;
 			m_seg = m_seg->next;
 		} while (m_seg);
-
-		/* fill the last descriptor with End of Packet (EOP) bit */
-		td_cmd |= CI_TX_DESC_CMD_EOP;
 		txq->nb_tx_used = (uint16_t)(txq->nb_tx_used + nb_used);
 		txq->nb_tx_free = (uint16_t)(txq->nb_tx_free - nb_used);
 
 		/* set RS bit on the last descriptor of one packet */
 		if (txq->nb_tx_used >= txq->tx_rs_thresh) {
-			td_cmd |= CI_TX_DESC_CMD_RS;
+			txd->cmd_type_offset_bsz |=
+					rte_cpu_to_le_64(CI_TX_DESC_CMD_RS << CI_TXD_QW1_CMD_S);
 
 			/* Update txq RS bit counters */
 			txq->nb_tx_used = 0;
 		}
-		txd->cmd_type_offset_bsz |=
-				rte_cpu_to_le_64(((uint64_t)td_cmd) << CI_TXD_QW1_CMD_S);
 
 		if (ts_fns != NULL)
 			ts_id = ts_fns->write_ts_desc(txq, tx_pkt, tx_id, ts_id);
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [PATCH v4 19/35] eal: add macro for marking assumed alignment
  2026-02-09 16:44 ` [PATCH v4 00/35] combine multiple Intel scalar Tx paths Bruce Richardson
                     ` (17 preceding siblings ...)
  2026-02-09 16:45   ` [PATCH v4 18/35] net/intel: avoid writing the final pkt descriptor twice Bruce Richardson
@ 2026-02-09 16:45   ` Bruce Richardson
  2026-02-09 22:35     ` Morten Brørup
  2026-02-09 16:45   ` [PATCH v4 20/35] net/intel: write descriptors using non-volatile pointers Bruce Richardson
                     ` (15 subsequent siblings)
  34 siblings, 1 reply; 274+ messages in thread
From: Bruce Richardson @ 2026-02-09 16:45 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson

Provide a common DPDK macro, __rte_assume_aligned, wrapping the gcc/clang
builtin __builtin_assume_aligned, to mark pointers as pointing to something
with a known minimum alignment.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 lib/eal/include/rte_common.h | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/lib/eal/include/rte_common.h b/lib/eal/include/rte_common.h
index 573bf4f2ce..51a2eaf8b4 100644
--- a/lib/eal/include/rte_common.h
+++ b/lib/eal/include/rte_common.h
@@ -121,6 +121,12 @@ extern "C" {
 #define __rte_aligned(a) __attribute__((__aligned__(a)))
 #endif
 
+#ifdef RTE_TOOLCHAIN_MSVC
+#define __rte_assume_aligned(ptr, align) (ptr)
+#else
+#define __rte_assume_aligned __builtin_assume_aligned
+#endif
+
 #ifdef RTE_ARCH_STRICT_ALIGN
 typedef uint64_t unaligned_uint64_t __rte_aligned(1);
 typedef uint32_t unaligned_uint32_t __rte_aligned(1);
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [PATCH v4 20/35] net/intel: write descriptors using non-volatile pointers
  2026-02-09 16:44 ` [PATCH v4 00/35] combine multiple Intel scalar Tx paths Bruce Richardson
                     ` (18 preceding siblings ...)
  2026-02-09 16:45   ` [PATCH v4 19/35] eal: add macro for marking assumed alignment Bruce Richardson
@ 2026-02-09 16:45   ` Bruce Richardson
  2026-02-09 23:08     ` Morten Brørup
  2026-02-09 16:45   ` [PATCH v4 21/35] net/intel: remove unnecessary flag clearing Bruce Richardson
                     ` (14 subsequent siblings)
  34 siblings, 1 reply; 274+ messages in thread
From: Bruce Richardson @ 2026-02-09 16:45 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson

Use a non-volatile uint64_t pointer to store to the descriptor ring.
This allows the compiler to merge the stores into a single wider store
where it sees fit.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/net/intel/common/tx_scalar.h | 24 ++++++++++++++++--------
 1 file changed, 16 insertions(+), 8 deletions(-)

diff --git a/drivers/net/intel/common/tx_scalar.h b/drivers/net/intel/common/tx_scalar.h
index e9e9a826fa..71f96349c3 100644
--- a/drivers/net/intel/common/tx_scalar.h
+++ b/drivers/net/intel/common/tx_scalar.h
@@ -174,6 +174,15 @@ struct ci_timestamp_queue_fns {
 	write_ts_tail_t write_ts_tail;
 };
 
+static inline void
+write_txd(volatile void *txd, uint64_t qw0, uint64_t qw1)
+{
+	uint64_t *txd_qw = __rte_assume_aligned(RTE_CAST_PTR(void *, txd), 16);
+
+	txd_qw[0] = rte_cpu_to_le_64(qw0);
+	txd_qw[1] = rte_cpu_to_le_64(qw1);
+}
+
 static inline uint16_t
 ci_xmit_pkts(struct ci_tx_queue *txq,
 	     struct rte_mbuf **tx_pkts,
@@ -307,8 +316,7 @@ ci_xmit_pkts(struct ci_tx_queue *txq,
 				txe->mbuf = NULL;
 			}
 
-			ctx_txd[0] = cd_qw0;
-			ctx_txd[1] = cd_qw1;
+			write_txd(ctx_txd, cd_qw0, cd_qw1);
 
 			txe->last_id = tx_last;
 			tx_id = txe->next_id;
@@ -355,12 +363,12 @@ ci_xmit_pkts(struct ci_tx_queue *txq,
 
 			while ((ol_flags & (RTE_MBUF_F_TX_TCP_SEG | RTE_MBUF_F_TX_UDP_SEG)) &&
 					unlikely(slen > CI_MAX_DATA_PER_TXD)) {
-				txd->buffer_addr = rte_cpu_to_le_64(buf_dma_addr);
-				txd->cmd_type_offset_bsz = rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DATA |
+				const uint64_t cmd_type_offset_bsz = CI_TX_DESC_DTYPE_DATA |
 					((uint64_t)td_cmd << CI_TXD_QW1_CMD_S) |
 					((uint64_t)td_offset << CI_TXD_QW1_OFFSET_S) |
 					((uint64_t)CI_MAX_DATA_PER_TXD << CI_TXD_QW1_TX_BUF_SZ_S) |
-					((uint64_t)td_tag << CI_TXD_QW1_L2TAG1_S));
+					((uint64_t)td_tag << CI_TXD_QW1_L2TAG1_S);
+				write_txd(txd, buf_dma_addr, cmd_type_offset_bsz);
 
 				buf_dma_addr += CI_MAX_DATA_PER_TXD;
 				slen -= CI_MAX_DATA_PER_TXD;
@@ -376,12 +384,12 @@ ci_xmit_pkts(struct ci_tx_queue *txq,
 			if (m_seg->next == NULL)
 				td_cmd |= CI_TX_DESC_CMD_EOP;
 
-			txd->buffer_addr = rte_cpu_to_le_64(buf_dma_addr);
-			txd->cmd_type_offset_bsz = rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DATA |
+			const uint64_t cmd_type_offset_bsz = CI_TX_DESC_DTYPE_DATA |
 				((uint64_t)td_cmd << CI_TXD_QW1_CMD_S) |
 				((uint64_t)td_offset << CI_TXD_QW1_OFFSET_S) |
 				((uint64_t)slen << CI_TXD_QW1_TX_BUF_SZ_S) |
-				((uint64_t)td_tag << CI_TXD_QW1_L2TAG1_S));
+				((uint64_t)td_tag << CI_TXD_QW1_L2TAG1_S);
+			write_txd(txd, buf_dma_addr, cmd_type_offset_bsz);
 
 			txe->last_id = tx_last;
 			tx_id = txe->next_id;
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [PATCH v4 21/35] net/intel: remove unnecessary flag clearing
  2026-02-09 16:44 ` [PATCH v4 00/35] combine multiple Intel scalar Tx paths Bruce Richardson
                     ` (19 preceding siblings ...)
  2026-02-09 16:45   ` [PATCH v4 20/35] net/intel: write descriptors using non-volatile pointers Bruce Richardson
@ 2026-02-09 16:45   ` Bruce Richardson
  2026-02-10 13:33     ` Burakov, Anatoly
  2026-02-09 16:45   ` [PATCH v4 22/35] net/intel: mark mid-burst ring cleanup as unlikely Bruce Richardson
                     ` (13 subsequent siblings)
  34 siblings, 1 reply; 274+ messages in thread
From: Bruce Richardson @ 2026-02-09 16:45 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson

When cleaning the Tx ring, there is no need to zero out the done flag
from the completed entry. That flag will be automatically cleared when
the descriptor is next written. This gives a small performance benefit.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/net/intel/common/tx_scalar.h | 7 -------
 1 file changed, 7 deletions(-)

diff --git a/drivers/net/intel/common/tx_scalar.h b/drivers/net/intel/common/tx_scalar.h
index 71f96349c3..3e54cd7607 100644
--- a/drivers/net/intel/common/tx_scalar.h
+++ b/drivers/net/intel/common/tx_scalar.h
@@ -46,13 +46,6 @@ ci_tx_xmit_cleanup(struct ci_tx_queue *txq)
 	else
 		nb_tx_to_clean = (uint16_t)(desc_to_clean_to - last_desc_cleaned);
 
-	/* The last descriptor to clean is done, so that means all the
-	 * descriptors from the last descriptor that was cleaned
-	 * up to the last descriptor with the RS bit set
-	 * are done. Only reset the threshold descriptor.
-	 */
-	txd[desc_to_clean_to].cmd_type_offset_bsz = 0;
-
 	/* Update the txq to reflect the last descriptor that was cleaned */
 	txq->last_desc_cleaned = desc_to_clean_to;
 	txq->nb_tx_free = (uint16_t)(txq->nb_tx_free + nb_tx_to_clean);
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [PATCH v4 22/35] net/intel: mark mid-burst ring cleanup as unlikely
  2026-02-09 16:44 ` [PATCH v4 00/35] combine multiple Intel scalar Tx paths Bruce Richardson
                     ` (20 preceding siblings ...)
  2026-02-09 16:45   ` [PATCH v4 21/35] net/intel: remove unnecessary flag clearing Bruce Richardson
@ 2026-02-09 16:45   ` Bruce Richardson
  2026-02-10 13:36     ` Burakov, Anatoly
  2026-02-09 16:45   ` [PATCH v4 23/35] net/intel: add special handling for single desc packets Bruce Richardson
                     ` (12 subsequent siblings)
  34 siblings, 1 reply; 274+ messages in thread
From: Bruce Richardson @ 2026-02-09 16:45 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson

It should rarely be the case that we need to clean up the descriptor ring
mid-burst, so mark that branch as unlikely to help performance.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/net/intel/common/tx_scalar.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/intel/common/tx_scalar.h b/drivers/net/intel/common/tx_scalar.h
index 3e54cd7607..cee8f11c7f 100644
--- a/drivers/net/intel/common/tx_scalar.h
+++ b/drivers/net/intel/common/tx_scalar.h
@@ -266,7 +266,7 @@ ci_xmit_pkts(struct ci_tx_queue *txq,
 		if (tx_last >= txq->nb_tx_desc)
 			tx_last = (uint16_t)(tx_last - txq->nb_tx_desc);
 
-		if (nb_used > txq->nb_tx_free) {
+		if (unlikely(nb_used > txq->nb_tx_free)) {
 			if (ci_tx_xmit_cleanup(txq) != 0) {
 				if (nb_tx == 0)
 					return 0;
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [PATCH v4 23/35] net/intel: add special handling for single desc packets
  2026-02-09 16:44 ` [PATCH v4 00/35] combine multiple Intel scalar Tx paths Bruce Richardson
                     ` (21 preceding siblings ...)
  2026-02-09 16:45   ` [PATCH v4 22/35] net/intel: mark mid-burst ring cleanup as unlikely Bruce Richardson
@ 2026-02-09 16:45   ` Bruce Richardson
  2026-02-10 13:57     ` Burakov, Anatoly
  2026-02-09 16:45   ` [PATCH v4 24/35] net/intel: use separate array for desc status tracking Bruce Richardson
                     ` (11 subsequent siblings)
  34 siblings, 1 reply; 274+ messages in thread
From: Bruce Richardson @ 2026-02-09 16:45 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson

Within the scalar Tx path, add a shortcut path for packets that don't
use TSO and have only a single data descriptor.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/net/intel/common/tx_scalar.h | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

diff --git a/drivers/net/intel/common/tx_scalar.h b/drivers/net/intel/common/tx_scalar.h
index cee8f11c7f..342e271c5f 100644
--- a/drivers/net/intel/common/tx_scalar.h
+++ b/drivers/net/intel/common/tx_scalar.h
@@ -298,6 +298,31 @@ ci_xmit_pkts(struct ci_tx_queue *txq,
 			ci_txd_enable_checksum(ol_flags, &td_cmd,
 						&td_offset, tx_offload);
 
+		/* special case for single descriptor packet, without TSO offload */
+		if (nb_used == 1 &&
+				(ol_flags & (RTE_MBUF_F_TX_TCP_SEG | RTE_MBUF_F_TX_UDP_SEG)) == 0) {
+			txd = &ci_tx_ring[tx_id];
+			tx_id = txe->next_id;
+
+			if (txe->mbuf)
+				rte_pktmbuf_free_seg(txe->mbuf);
+			*txe = (struct ci_tx_entry){
+				.mbuf = tx_pkt, .last_id = tx_last, .next_id = tx_id
+			};
+
+			/* Setup TX Descriptor */
+			td_cmd |= CI_TX_DESC_CMD_EOP;
+			const uint64_t cmd_type_offset_bsz = CI_TX_DESC_DTYPE_DATA |
+				((uint64_t)td_cmd << CI_TXD_QW1_CMD_S) |
+				((uint64_t)td_offset << CI_TXD_QW1_OFFSET_S) |
+				((uint64_t)tx_pkt->data_len << CI_TXD_QW1_TX_BUF_SZ_S) |
+				((uint64_t)td_tag << CI_TXD_QW1_L2TAG1_S);
+			write_txd(txd, rte_mbuf_data_iova(tx_pkt), cmd_type_offset_bsz);
+
+			txe = &sw_ring[tx_id];
+			goto end_pkt;
+		}
+
 		if (nb_ctx) {
 			/* Setup TX context descriptor if required */
 			uint64_t *ctx_txd = RTE_CAST_PTR(uint64_t *, &ci_tx_ring[tx_id]);
@@ -389,6 +414,7 @@ ci_xmit_pkts(struct ci_tx_queue *txq,
 			txe = txn;
 			m_seg = m_seg->next;
 		} while (m_seg);
+end_pkt:
 		txq->nb_tx_used = (uint16_t)(txq->nb_tx_used + nb_used);
 		txq->nb_tx_free = (uint16_t)(txq->nb_tx_free - nb_used);
 
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [PATCH v4 24/35] net/intel: use separate array for desc status tracking
  2026-02-09 16:44 ` [PATCH v4 00/35] combine multiple Intel scalar Tx paths Bruce Richardson
                     ` (22 preceding siblings ...)
  2026-02-09 16:45   ` [PATCH v4 23/35] net/intel: add special handling for single desc packets Bruce Richardson
@ 2026-02-09 16:45   ` Bruce Richardson
  2026-02-10 14:11     ` Burakov, Anatoly
  2026-02-09 16:45   ` [PATCH v4 25/35] net/ixgbe: " Bruce Richardson
                     ` (10 subsequent siblings)
  34 siblings, 1 reply; 274+ messages in thread
From: Bruce Richardson @ 2026-02-09 16:45 UTC (permalink / raw)
  To: dev
  Cc: Bruce Richardson, Praveen Shetty, Vladimir Medvedkin,
	Anatoly Burakov, Jingjing Wu

Rather than writing a last_id for each individual descriptor, we can
write one only for places where the "report status" (RS) bit is set,
i.e. the descriptors which will be written back when done. The method
used for marking which descriptors are free also changes in the
process: even if the last descriptor with the "done" bits set is past
the expected point, we only track up to the expected point, and leave
the rest to be counted as freed next time. This means that we always
have the RS/DD bits set at fixed intervals, and we always track free
slots in units of the same tx_rs_thresh intervals.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/net/intel/common/tx.h             |  4 ++
 drivers/net/intel/common/tx_scalar.h      | 66 +++++++++++------------
 drivers/net/intel/cpfl/cpfl_rxtx.c        | 17 ++++++
 drivers/net/intel/i40e/i40e_rxtx.c        | 20 +++++++
 drivers/net/intel/iavf/iavf_rxtx.c        | 19 +++++++
 drivers/net/intel/ice/ice_rxtx.c          | 20 +++++++
 drivers/net/intel/idpf/idpf_common_rxtx.c |  7 +++
 drivers/net/intel/idpf/idpf_rxtx.c        | 13 +++++
 8 files changed, 132 insertions(+), 34 deletions(-)

diff --git a/drivers/net/intel/common/tx.h b/drivers/net/intel/common/tx.h
index f0229314a0..e7d79eb7d0 100644
--- a/drivers/net/intel/common/tx.h
+++ b/drivers/net/intel/common/tx.h
@@ -127,6 +127,8 @@ struct ci_tx_queue {
 		struct ci_tx_entry *sw_ring; /* virtual address of SW ring */
 		struct ci_tx_entry_vec *sw_ring_vec;
 	};
+	/* Scalar TX path: Array tracking last_id at each RS threshold boundary */
+	uint16_t *rs_last_id;
 	uint16_t nb_tx_desc;           /* number of TX descriptors */
 	uint16_t tx_tail; /* current value of tail register */
 	uint16_t nb_tx_used; /* number of TX desc used since RS bit set */
@@ -140,6 +142,8 @@ struct ci_tx_queue {
 	uint16_t tx_free_thresh;
 	/* Number of TX descriptors to use before RS bit is set. */
 	uint16_t tx_rs_thresh;
+	/* Scalar TX path: log2 of tx_rs_thresh for efficient bit operations */
+	uint8_t log2_rs_thresh;
 	uint16_t port_id;  /* Device port identifier. */
 	uint16_t queue_id; /* TX queue index. */
 	uint16_t reg_idx;
diff --git a/drivers/net/intel/common/tx_scalar.h b/drivers/net/intel/common/tx_scalar.h
index 342e271c5f..acda2f0478 100644
--- a/drivers/net/intel/common/tx_scalar.h
+++ b/drivers/net/intel/common/tx_scalar.h
@@ -22,33 +22,25 @@
 static __rte_always_inline int
 ci_tx_xmit_cleanup(struct ci_tx_queue *txq)
 {
-	struct ci_tx_entry *sw_ring = txq->sw_ring;
 	volatile struct ci_tx_desc *txd = txq->ci_tx_ring;
-	uint16_t last_desc_cleaned = txq->last_desc_cleaned;
-	uint16_t nb_tx_desc = txq->nb_tx_desc;
-	uint16_t desc_to_clean_to;
-	uint16_t nb_tx_to_clean;
-
-	/* Determine the last descriptor needing to be cleaned */
-	desc_to_clean_to = (uint16_t)(last_desc_cleaned + txq->tx_rs_thresh);
-	if (desc_to_clean_to >= nb_tx_desc)
-		desc_to_clean_to = (uint16_t)(desc_to_clean_to - nb_tx_desc);
-
-	/* Check if descriptor is done */
-	desc_to_clean_to = sw_ring[desc_to_clean_to].last_id;
-	if ((txd[desc_to_clean_to].cmd_type_offset_bsz & rte_cpu_to_le_64(CI_TXD_QW1_DTYPE_M)) !=
-			rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE))
+	const uint16_t last_desc_cleaned = txq->last_desc_cleaned;
+	const uint16_t nb_tx_desc = txq->nb_tx_desc;
+
+	/* Calculate where the next descriptor write-back will occur */
+	const uint16_t rs_idx = (last_desc_cleaned == nb_tx_desc - 1) ?
+			0 :
+			(last_desc_cleaned + 1) >> txq->log2_rs_thresh;
+	uint16_t desc_to_clean_to = (rs_idx << txq->log2_rs_thresh) + (txq->tx_rs_thresh - 1);
+
+	/* Check if descriptor is done  */
+	if ((txd[txq->rs_last_id[rs_idx]].cmd_type_offset_bsz &
+			rte_cpu_to_le_64(CI_TXD_QW1_DTYPE_M)) !=
+				rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE))
 		return -1;
 
-	/* Figure out how many descriptors will be cleaned */
-	if (last_desc_cleaned > desc_to_clean_to)
-		nb_tx_to_clean = (uint16_t)((nb_tx_desc - last_desc_cleaned) + desc_to_clean_to);
-	else
-		nb_tx_to_clean = (uint16_t)(desc_to_clean_to - last_desc_cleaned);
-
 	/* Update the txq to reflect the last descriptor that was cleaned */
 	txq->last_desc_cleaned = desc_to_clean_to;
-	txq->nb_tx_free = (uint16_t)(txq->nb_tx_free + nb_tx_to_clean);
+	txq->nb_tx_free += txq->tx_rs_thresh;
 
 	return 0;
 }
@@ -223,6 +215,7 @@ ci_xmit_pkts(struct ci_tx_queue *txq,
 		uint16_t nb_ipsec = 0;
 		uint64_t ipsec_qw0 = 0, ipsec_qw1 = 0;
 		uint64_t cd_qw0, cd_qw1;
+		uint16_t pkt_rs_idx;
 		tx_pkt = *tx_pkts++;
 
 		ol_flags = tx_pkt->ol_flags;
@@ -266,6 +259,9 @@ ci_xmit_pkts(struct ci_tx_queue *txq,
 		if (tx_last >= txq->nb_tx_desc)
 			tx_last = (uint16_t)(tx_last - txq->nb_tx_desc);
 
+		/* Track the RS threshold bucket at packet start */
+		pkt_rs_idx = (uint16_t)(tx_id >> txq->log2_rs_thresh);
+
 		if (unlikely(nb_used > txq->nb_tx_free)) {
 			if (ci_tx_xmit_cleanup(txq) != 0) {
 				if (nb_tx == 0)
@@ -306,10 +302,7 @@ ci_xmit_pkts(struct ci_tx_queue *txq,
 
 			if (txe->mbuf)
 				rte_pktmbuf_free_seg(txe->mbuf);
-			*txe = (struct ci_tx_entry){
-				.mbuf = tx_pkt, .last_id = tx_last, .next_id = tx_id
-			};
-
+			txe->mbuf = tx_pkt;
 			/* Setup TX Descriptor */
 			td_cmd |= CI_TX_DESC_CMD_EOP;
 			const uint64_t cmd_type_offset_bsz = CI_TX_DESC_DTYPE_DATA |
@@ -336,7 +329,6 @@ ci_xmit_pkts(struct ci_tx_queue *txq,
 
 			write_txd(ctx_txd, cd_qw0, cd_qw1);
 
-			txe->last_id = tx_last;
 			tx_id = txe->next_id;
 			txe = txn;
 		}
@@ -355,7 +347,6 @@ ci_xmit_pkts(struct ci_tx_queue *txq,
 			ipsec_txd[0] = ipsec_qw0;
 			ipsec_txd[1] = ipsec_qw1;
 
-			txe->last_id = tx_last;
 			tx_id = txe->next_id;
 			txe = txn;
 		}
@@ -391,7 +382,6 @@ ci_xmit_pkts(struct ci_tx_queue *txq,
 				buf_dma_addr += CI_MAX_DATA_PER_TXD;
 				slen -= CI_MAX_DATA_PER_TXD;
 
-				txe->last_id = tx_last;
 				tx_id = txe->next_id;
 				txe = txn;
 				txd = &ci_tx_ring[tx_id];
@@ -409,7 +399,6 @@ ci_xmit_pkts(struct ci_tx_queue *txq,
 				((uint64_t)td_tag << CI_TXD_QW1_L2TAG1_S);
 			write_txd(txd, buf_dma_addr, cmd_type_offset_bsz);
 
-			txe->last_id = tx_last;
 			tx_id = txe->next_id;
 			txe = txn;
 			m_seg = m_seg->next;
@@ -418,13 +407,22 @@ ci_xmit_pkts(struct ci_tx_queue *txq,
 		txq->nb_tx_used = (uint16_t)(txq->nb_tx_used + nb_used);
 		txq->nb_tx_free = (uint16_t)(txq->nb_tx_free - nb_used);
 
-		/* set RS bit on the last descriptor of one packet */
-		if (txq->nb_tx_used >= txq->tx_rs_thresh) {
+		/* Check if packet crosses into a new RS threshold bucket.
+		 * The RS bit is set on the last descriptor when we move from one bucket to another.
+		 * For example, with tx_rs_thresh=32 and a 5-descriptor packet using slots 30-34:
+		 *   - pkt_rs_idx = 30 >> 5 = 0 (started in bucket 0)
+		 *   - tx_last = 34, so 35 >> 5 = 1 (next packet is in bucket 1)
+		 *   - Since 0 != 1, set RS bit on descriptor 34, and record rs_last_id[0] = 34
+		 */
+		uint16_t next_rs_idx = ((tx_last + 1) >> txq->log2_rs_thresh);
+
+		if (next_rs_idx != pkt_rs_idx) {
+			/* Packet crossed into a new bucket - set RS bit on last descriptor */
 			txd->cmd_type_offset_bsz |=
 					rte_cpu_to_le_64(CI_TX_DESC_CMD_RS << CI_TXD_QW1_CMD_S);
 
-			/* Update txq RS bit counters */
-			txq->nb_tx_used = 0;
+			/* Record the last descriptor ID for the bucket we're leaving */
+			txq->rs_last_id[pkt_rs_idx] = tx_last;
 		}
 
 		if (ts_fns != NULL)
diff --git a/drivers/net/intel/cpfl/cpfl_rxtx.c b/drivers/net/intel/cpfl/cpfl_rxtx.c
index bc5bec65f0..e7a98ed4f6 100644
--- a/drivers/net/intel/cpfl/cpfl_rxtx.c
+++ b/drivers/net/intel/cpfl/cpfl_rxtx.c
@@ -5,6 +5,7 @@
 #include <ethdev_driver.h>
 #include <rte_net.h>
 #include <rte_vect.h>
+#include <rte_bitops.h>
 
 #include "cpfl_ethdev.h"
 #include "cpfl_rxtx.h"
@@ -330,6 +331,7 @@ cpfl_tx_queue_release(void *txq)
 
 	ci_txq_release_all_mbufs(q, q->vector_tx);
 	rte_free(q->sw_ring);
+	rte_free(q->rs_last_id);
 	rte_memzone_free(q->mz);
 	rte_free(cpfl_txq);
 }
@@ -572,6 +574,7 @@ cpfl_tx_queue_setup(struct rte_eth_dev *dev, uint16_t queue_idx,
 
 	txq->nb_tx_desc = nb_desc;
 	txq->tx_rs_thresh = tx_rs_thresh;
+	txq->log2_rs_thresh = rte_log2_u32(tx_rs_thresh);
 	txq->tx_free_thresh = tx_free_thresh;
 	txq->queue_id = vport->chunks_info.tx_start_qid + queue_idx;
 	txq->port_id = dev->data->port_id;
@@ -605,6 +608,17 @@ cpfl_tx_queue_setup(struct rte_eth_dev *dev, uint16_t queue_idx,
 		goto err_sw_ring_alloc;
 	}
 
+	/* Allocate RS last_id tracking array */
+	uint16_t num_rs_buckets = nb_desc / tx_rs_thresh;
+	txq->rs_last_id = rte_zmalloc_socket("cpfl tx rs_last_id",
+			sizeof(txq->rs_last_id[0]) * num_rs_buckets,
+			RTE_CACHE_LINE_SIZE, socket_id);
+	if (txq->rs_last_id == NULL) {
+		PMD_INIT_LOG(ERR, "Failed to allocate memory for RS last_id array");
+		ret = -ENOMEM;
+		goto err_rs_last_id_alloc;
+	}
+
 	if (!is_splitq) {
 		txq->ci_tx_ring = mz->addr;
 		idpf_qc_single_tx_queue_reset(txq);
@@ -628,6 +642,9 @@ cpfl_tx_queue_setup(struct rte_eth_dev *dev, uint16_t queue_idx,
 	return 0;
 
 err_complq_setup:
+	rte_free(txq->rs_last_id);
+err_rs_last_id_alloc:
+	rte_free(txq->sw_ring);
 err_sw_ring_alloc:
 	cpfl_dma_zone_release(mz);
 err_mz_reserve:
diff --git a/drivers/net/intel/i40e/i40e_rxtx.c b/drivers/net/intel/i40e/i40e_rxtx.c
index dfd2213020..b554bc6c31 100644
--- a/drivers/net/intel/i40e/i40e_rxtx.c
+++ b/drivers/net/intel/i40e/i40e_rxtx.c
@@ -24,6 +24,7 @@
 #include <rte_ip.h>
 #include <rte_net.h>
 #include <rte_vect.h>
+#include <rte_bitops.h>
 
 #include "i40e_logs.h"
 #include "base/i40e_prototype.h"
@@ -2280,6 +2281,13 @@ i40e_dev_tx_queue_setup(struct rte_eth_dev *dev,
 			     (int)queue_idx);
 		return I40E_ERR_PARAM;
 	}
+	if (!rte_is_power_of_2(tx_rs_thresh)) {
+		PMD_INIT_LOG(ERR, "tx_rs_thresh must be a power of 2. (tx_rs_thresh=%u port=%d queue=%d)",
+			     (unsigned int)tx_rs_thresh,
+			     (int)dev->data->port_id,
+			     (int)queue_idx);
+		return I40E_ERR_PARAM;
+	}
 	if ((tx_rs_thresh > 1) && (tx_conf->tx_thresh.wthresh != 0)) {
 		PMD_INIT_LOG(ERR, "TX WTHRESH must be set to 0 if "
 			     "tx_rs_thresh is greater than 1. "
@@ -2321,6 +2329,7 @@ i40e_dev_tx_queue_setup(struct rte_eth_dev *dev,
 	txq->mz = tz;
 	txq->nb_tx_desc = nb_desc;
 	txq->tx_rs_thresh = tx_rs_thresh;
+	txq->log2_rs_thresh = rte_log2_u32(tx_rs_thresh);
 	txq->tx_free_thresh = tx_free_thresh;
 	txq->queue_id = queue_idx;
 	txq->reg_idx = reg_idx;
@@ -2346,6 +2355,16 @@ i40e_dev_tx_queue_setup(struct rte_eth_dev *dev,
 		return -ENOMEM;
 	}
 
+	/* Allocate RS last_id tracking array */
+	uint16_t num_rs_buckets = nb_desc / tx_rs_thresh;
+	txq->rs_last_id = rte_zmalloc_socket(NULL, sizeof(txq->rs_last_id[0]) * num_rs_buckets,
+			RTE_CACHE_LINE_SIZE, socket_id);
+	if (txq->rs_last_id == NULL) {
+		i40e_tx_queue_release(txq);
+		PMD_DRV_LOG(ERR, "Failed to allocate memory for RS last_id array");
+		return -ENOMEM;
+	}
+
 	i40e_reset_tx_queue(txq);
 	txq->q_set = TRUE;
 
@@ -2391,6 +2410,7 @@ i40e_tx_queue_release(void *txq)
 
 	ci_txq_release_all_mbufs(q, false);
 	rte_free(q->sw_ring);
+	rte_free(q->rs_last_id);
 	rte_memzone_free(q->mz);
 	rte_free(q);
 }
diff --git a/drivers/net/intel/iavf/iavf_rxtx.c b/drivers/net/intel/iavf/iavf_rxtx.c
index 2ea00e1975..e7187f713d 100644
--- a/drivers/net/intel/iavf/iavf_rxtx.c
+++ b/drivers/net/intel/iavf/iavf_rxtx.c
@@ -25,6 +25,7 @@
 #include <rte_ip.h>
 #include <rte_net.h>
 #include <rte_vect.h>
+#include <rte_bitops.h>
 #include <rte_vxlan.h>
 #include <rte_gtp.h>
 #include <rte_geneve.h>
@@ -194,6 +195,11 @@ check_tx_thresh(uint16_t nb_desc, uint16_t tx_rs_thresh,
 			     tx_rs_thresh, nb_desc);
 		return -EINVAL;
 	}
+	if (!rte_is_power_of_2(tx_rs_thresh)) {
+		PMD_INIT_LOG(ERR, "tx_rs_thresh must be a power of 2. (tx_rs_thresh=%u)",
+			     tx_rs_thresh);
+		return -EINVAL;
+	}
 
 	return 0;
 }
@@ -801,6 +807,7 @@ iavf_dev_tx_queue_setup(struct rte_eth_dev *dev,
 
 	txq->nb_tx_desc = nb_desc;
 	txq->tx_rs_thresh = tx_rs_thresh;
+	txq->log2_rs_thresh = rte_log2_u32(tx_rs_thresh);
 	txq->tx_free_thresh = tx_free_thresh;
 	txq->queue_id = queue_idx;
 	txq->port_id = dev->data->port_id;
@@ -826,6 +833,17 @@ iavf_dev_tx_queue_setup(struct rte_eth_dev *dev,
 		return -ENOMEM;
 	}
 
+	/* Allocate RS last_id tracking array */
+	uint16_t num_rs_buckets = nb_desc / tx_rs_thresh;
+	txq->rs_last_id = rte_zmalloc_socket(NULL, sizeof(txq->rs_last_id[0]) * num_rs_buckets,
+			RTE_CACHE_LINE_SIZE, socket_id);
+	if (txq->rs_last_id == NULL) {
+		PMD_INIT_LOG(ERR, "Failed to allocate memory for RS last_id array");
+		rte_free(txq->sw_ring);
+		rte_free(txq);
+		return -ENOMEM;
+	}
+
 	/* Allocate TX hardware ring descriptors. */
 	ring_size = sizeof(struct ci_tx_desc) * IAVF_MAX_RING_DESC;
 	ring_size = RTE_ALIGN(ring_size, IAVF_DMA_MEM_ALIGN);
@@ -1050,6 +1068,7 @@ iavf_dev_tx_queue_release(struct rte_eth_dev *dev, uint16_t qid)
 
 	ci_txq_release_all_mbufs(q, q->use_ctx);
 	rte_free(q->sw_ring);
+	rte_free(q->rs_last_id);
 	rte_memzone_free(q->mz);
 	rte_free(q);
 }
diff --git a/drivers/net/intel/ice/ice_rxtx.c b/drivers/net/intel/ice/ice_rxtx.c
index 111cb5e37f..2915223397 100644
--- a/drivers/net/intel/ice/ice_rxtx.c
+++ b/drivers/net/intel/ice/ice_rxtx.c
@@ -5,6 +5,7 @@
 #include <ethdev_driver.h>
 #include <rte_net.h>
 #include <rte_vect.h>
+#include <rte_bitops.h>
 
 #include "ice_rxtx.h"
 #include "ice_rxtx_vec_common.h"
@@ -1589,6 +1590,13 @@ ice_tx_queue_setup(struct rte_eth_dev *dev,
 			     (int)queue_idx);
 		return -EINVAL;
 	}
+	if (!rte_is_power_of_2(tx_rs_thresh)) {
+		PMD_INIT_LOG(ERR, "tx_rs_thresh must be a power of 2. (tx_rs_thresh=%u port=%d queue=%d)",
+			     (unsigned int)tx_rs_thresh,
+			     (int)dev->data->port_id,
+			     (int)queue_idx);
+		return -EINVAL;
+	}
 	if (tx_rs_thresh > 1 && tx_conf->tx_thresh.wthresh != 0) {
 		PMD_INIT_LOG(ERR, "TX WTHRESH must be set to 0 if "
 			     "tx_rs_thresh is greater than 1. "
@@ -1631,6 +1639,7 @@ ice_tx_queue_setup(struct rte_eth_dev *dev,
 	txq->mz = tz;
 	txq->nb_tx_desc = nb_desc;
 	txq->tx_rs_thresh = tx_rs_thresh;
+	txq->log2_rs_thresh = rte_log2_u32(tx_rs_thresh);
 	txq->tx_free_thresh = tx_free_thresh;
 	txq->queue_id = queue_idx;
 
@@ -1657,6 +1666,16 @@ ice_tx_queue_setup(struct rte_eth_dev *dev,
 		return -ENOMEM;
 	}
 
+	/* Allocate RS last_id tracking array */
+	uint16_t num_rs_buckets = nb_desc / tx_rs_thresh;
+	txq->rs_last_id = rte_zmalloc_socket(NULL, sizeof(txq->rs_last_id[0]) * num_rs_buckets,
+			RTE_CACHE_LINE_SIZE, socket_id);
+	if (txq->rs_last_id == NULL) {
+		ice_tx_queue_release(txq);
+		PMD_INIT_LOG(ERR, "Failed to allocate memory for RS last_id array");
+		return -ENOMEM;
+	}
+
 	if (vsi->type == ICE_VSI_PF && (offloads & RTE_ETH_TX_OFFLOAD_SEND_ON_TIMESTAMP)) {
 		if (hw->phy_model != ICE_PHY_E830) {
 			ice_tx_queue_release(txq);
@@ -1729,6 +1748,7 @@ ice_tx_queue_release(void *txq)
 
 	ci_txq_release_all_mbufs(q, false);
 	rte_free(q->sw_ring);
+	rte_free(q->rs_last_id);
 	if (q->tsq) {
 		rte_memzone_free(q->tsq->ts_mz);
 		rte_free(q->tsq);
diff --git a/drivers/net/intel/idpf/idpf_common_rxtx.c b/drivers/net/intel/idpf/idpf_common_rxtx.c
index bca5f13c8e..8859bcca86 100644
--- a/drivers/net/intel/idpf/idpf_common_rxtx.c
+++ b/drivers/net/intel/idpf/idpf_common_rxtx.c
@@ -5,6 +5,7 @@
 #include <eal_export.h>
 #include <rte_mbuf_dyn.h>
 #include <rte_errno.h>
+#include <rte_bitops.h>
 
 #include "idpf_common_rxtx.h"
 #include "idpf_common_device.h"
@@ -73,6 +74,11 @@ idpf_qc_tx_thresh_check(uint16_t nb_desc, uint16_t tx_rs_thresh,
 			tx_rs_thresh, nb_desc);
 		return -EINVAL;
 	}
+	if (!rte_is_power_of_2(tx_rs_thresh)) {
+		DRV_LOG(ERR, "tx_rs_thresh must be a power of 2. (tx_rs_thresh=%u)",
+			tx_rs_thresh);
+		return -EINVAL;
+	}
 
 	return 0;
 }
@@ -333,6 +339,7 @@ idpf_qc_tx_queue_release(void *txq)
 	}
 
 	ci_txq_release_all_mbufs(q, false);
+	rte_free(q->rs_last_id);
 	rte_free(q->sw_ring);
 	rte_memzone_free(q->mz);
 	rte_free(q);
diff --git a/drivers/net/intel/idpf/idpf_rxtx.c b/drivers/net/intel/idpf/idpf_rxtx.c
index 0de54d9305..9420200f6d 100644
--- a/drivers/net/intel/idpf/idpf_rxtx.c
+++ b/drivers/net/intel/idpf/idpf_rxtx.c
@@ -447,6 +447,7 @@ idpf_tx_queue_setup(struct rte_eth_dev *dev, uint16_t queue_idx,
 
 	txq->nb_tx_desc = nb_desc;
 	txq->tx_rs_thresh = tx_rs_thresh;
+	txq->log2_rs_thresh = rte_log2_u32(tx_rs_thresh);
 	txq->tx_free_thresh = tx_free_thresh;
 	txq->queue_id = vport->chunks_info.tx_start_qid + queue_idx;
 	txq->port_id = dev->data->port_id;
@@ -480,6 +481,15 @@ idpf_tx_queue_setup(struct rte_eth_dev *dev, uint16_t queue_idx,
 		goto err_sw_ring_alloc;
 	}
 
+	txq->rs_last_id = rte_zmalloc_socket("idpf tx rs_last_id",
+			sizeof(txq->rs_last_id[0]) * (nb_desc >> txq->log2_rs_thresh),
+			RTE_CACHE_LINE_SIZE, socket_id);
+	if (txq->rs_last_id == NULL) {
+		PMD_INIT_LOG(ERR, "Failed to allocate memory for TX RS tracking");
+		ret = -ENOMEM;
+		goto err_rs_last_id_alloc;
+	}
+
 	if (!is_splitq) {
 		txq->ci_tx_ring = mz->addr;
 		idpf_qc_single_tx_queue_reset(txq);
@@ -502,6 +512,9 @@ idpf_tx_queue_setup(struct rte_eth_dev *dev, uint16_t queue_idx,
 	return 0;
 
 err_complq_setup:
+	rte_free(txq->rs_last_id);
+err_rs_last_id_alloc:
+	rte_free(txq->sw_ring);
 err_sw_ring_alloc:
 	idpf_dma_zone_release(mz);
 err_mz_reserve:
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [PATCH v4 25/35] net/ixgbe: use separate array for desc status tracking
  2026-02-09 16:44 ` [PATCH v4 00/35] combine multiple Intel scalar Tx paths Bruce Richardson
                     ` (23 preceding siblings ...)
  2026-02-09 16:45   ` [PATCH v4 24/35] net/intel: use separate array for desc status tracking Bruce Richardson
@ 2026-02-09 16:45   ` Bruce Richardson
  2026-02-10 14:12     ` Burakov, Anatoly
  2026-02-09 16:45   ` [PATCH v4 26/35] net/intel: drop unused Tx queue used count Bruce Richardson
                     ` (9 subsequent siblings)
  34 siblings, 1 reply; 274+ messages in thread
From: Bruce Richardson @ 2026-02-09 16:45 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson, Anatoly Burakov, Vladimir Medvedkin

Due to significant differences in the ixgbe transmit descriptors, the
ixgbe driver does not use the common scalar Tx functionality. Update the
driver directly so its use of the rs_last_id array matches that of the
common Tx code.
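
The bucket arithmetic can be sketched as follows - a minimal model
assuming tx_rs_thresh is a power of two and ignoring ring wrap-around;
the helper names are illustrative, not driver functions:

```c
#include <assert.h>
#include <stdint.h>

/* Index of the RS "bucket" a descriptor falls into, assuming
 * tx_rs_thresh is a power of two so division reduces to a shift. */
static inline uint16_t
rs_bucket(uint16_t desc_id, uint16_t log2_rs_thresh)
{
	return desc_id >> log2_rs_thresh;
}

/* The RS bit is needed on a packet's last descriptor when the next
 * free slot lands in a different bucket than the packet started in. */
static inline int
needs_rs_bit(uint16_t first_desc, uint16_t last_desc, uint16_t log2_rs_thresh)
{
	return rs_bucket(first_desc, log2_rs_thresh) !=
			rs_bucket((uint16_t)(last_desc + 1), log2_rs_thresh);
}
```

With tx_rs_thresh = 32 (log2 = 5), a five-descriptor packet in slots
30-34 starts in bucket 0 while slot 35 is in bucket 1, so the RS bit
goes on descriptor 34 and rs_last_id[0] records 34.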

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/net/intel/ixgbe/ixgbe_rxtx.c | 86 +++++++++++++++-------------
 1 file changed, 47 insertions(+), 39 deletions(-)

diff --git a/drivers/net/intel/ixgbe/ixgbe_rxtx.c b/drivers/net/intel/ixgbe/ixgbe_rxtx.c
index 0af04c9b0d..3e37ccc50d 100644
--- a/drivers/net/intel/ixgbe/ixgbe_rxtx.c
+++ b/drivers/net/intel/ixgbe/ixgbe_rxtx.c
@@ -43,6 +43,7 @@
 #include <rte_ip.h>
 #include <rte_net.h>
 #include <rte_vect.h>
+#include <rte_bitops.h>
 
 #include "ixgbe_logs.h"
 #include "base/ixgbe_api.h"
@@ -571,57 +572,35 @@ tx_desc_ol_flags_to_cmdtype(uint64_t ol_flags)
 static inline int
 ixgbe_xmit_cleanup(struct ci_tx_queue *txq)
 {
-	struct ci_tx_entry *sw_ring = txq->sw_ring;
 	volatile union ixgbe_adv_tx_desc *txr = txq->ixgbe_tx_ring;
-	uint16_t last_desc_cleaned = txq->last_desc_cleaned;
-	uint16_t nb_tx_desc = txq->nb_tx_desc;
-	uint16_t desc_to_clean_to;
-	uint16_t nb_tx_to_clean;
-	uint32_t status;
+	const uint16_t last_desc_cleaned = txq->last_desc_cleaned;
+	const uint16_t nb_tx_desc = txq->nb_tx_desc;
 
-	/* Determine the last descriptor needing to be cleaned */
-	desc_to_clean_to = (uint16_t)(last_desc_cleaned + txq->tx_rs_thresh);
-	if (desc_to_clean_to >= nb_tx_desc)
-		desc_to_clean_to = (uint16_t)(desc_to_clean_to - nb_tx_desc);
+	const uint16_t rs_idx = (last_desc_cleaned == nb_tx_desc - 1) ?
+			0 :
+			(last_desc_cleaned + 1) >> txq->log2_rs_thresh;
+	uint16_t desc_to_clean_to = (rs_idx << txq->log2_rs_thresh) + (txq->tx_rs_thresh - 1);
 
-	/* Check to make sure the last descriptor to clean is done */
-	desc_to_clean_to = sw_ring[desc_to_clean_to].last_id;
-	status = txr[desc_to_clean_to].wb.status;
+	uint32_t status = txr[txq->rs_last_id[rs_idx]].wb.status;
 	if (!(status & rte_cpu_to_le_32(IXGBE_TXD_STAT_DD))) {
 		PMD_TX_LOG(DEBUG,
 			   "TX descriptor %4u is not done"
 			   "(port=%d queue=%d)",
-			   desc_to_clean_to,
+			   txq->rs_last_id[rs_idx],
 			   txq->port_id, txq->queue_id);
 		/* Failed to clean any descriptors, better luck next time */
 		return -(1);
 	}
 
-	/* Figure out how many descriptors will be cleaned */
-	if (last_desc_cleaned > desc_to_clean_to)
-		nb_tx_to_clean = (uint16_t)((nb_tx_desc - last_desc_cleaned) +
-							desc_to_clean_to);
-	else
-		nb_tx_to_clean = (uint16_t)(desc_to_clean_to -
-						last_desc_cleaned);
-
 	PMD_TX_LOG(DEBUG,
 		   "Cleaning %4u TX descriptors: %4u to %4u "
 		   "(port=%d queue=%d)",
-		   nb_tx_to_clean, last_desc_cleaned, desc_to_clean_to,
+		   txq->tx_rs_thresh, last_desc_cleaned, desc_to_clean_to,
 		   txq->port_id, txq->queue_id);
 
-	/*
-	 * The last descriptor to clean is done, so that means all the
-	 * descriptors from the last descriptor that was cleaned
-	 * up to the last descriptor with the RS bit set
-	 * are done. Only reset the threshold descriptor.
-	 */
-	txr[desc_to_clean_to].wb.status = 0;
-
 	/* Update the txq to reflect the last descriptor that was cleaned */
 	txq->last_desc_cleaned = desc_to_clean_to;
-	txq->nb_tx_free = (uint16_t)(txq->nb_tx_free + nb_tx_to_clean);
+	txq->nb_tx_free = (uint16_t)(txq->nb_tx_free + txq->tx_rs_thresh);
 
 	/* No Error */
 	return 0;
@@ -749,6 +728,9 @@ ixgbe_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 		if (tx_last >= txq->nb_tx_desc)
 			tx_last = (uint16_t) (tx_last - txq->nb_tx_desc);
 
+		/* Track the RS threshold bucket at packet start */
+		uint16_t pkt_rs_idx = (uint16_t)(tx_id >> txq->log2_rs_thresh);
+
 		PMD_TX_LOG(DEBUG, "port_id=%u queue_id=%u pktlen=%u"
 			   " tx_first=%u tx_last=%u",
 			   (unsigned) txq->port_id,
@@ -876,7 +858,6 @@ ixgbe_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 					tx_offload,
 					rte_security_dynfield(tx_pkt));
 
-				txe->last_id = tx_last;
 				tx_id = txe->next_id;
 				txe = txn;
 			}
@@ -922,7 +903,6 @@ ixgbe_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 				rte_cpu_to_le_32(cmd_type_len | slen);
 			txd->read.olinfo_status =
 				rte_cpu_to_le_32(olinfo_status);
-			txe->last_id = tx_last;
 			tx_id = txe->next_id;
 			txe = txn;
 			m_seg = m_seg->next;
@@ -935,8 +915,18 @@ ixgbe_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 		txq->nb_tx_used = (uint16_t)(txq->nb_tx_used + nb_used);
 		txq->nb_tx_free = (uint16_t)(txq->nb_tx_free - nb_used);
 
-		/* Set RS bit only on threshold packets' last descriptor */
-		if (txq->nb_tx_used >= txq->tx_rs_thresh) {
+		/*
+		 * Check if packet crosses into a new RS threshold bucket.
+		 * The RS bit is set on the last descriptor when we move from one bucket to another.
+		 * For example, with tx_rs_thresh=32 and a 5-descriptor packet using slots 30-34:
+		 *   - pkt_rs_idx = 30 >> 5 = 0 (started in bucket 0)
+		 *   - tx_last = 34, so 35 >> 5 = 1 (next packet is in bucket 1)
+		 *   - Since 0 != 1, set RS bit on descriptor 34, and record rs_last_id[0] = 34
+		 */
+		uint16_t next_rs_idx = ((tx_last + 1) >> txq->log2_rs_thresh);
+
+		if (next_rs_idx != pkt_rs_idx) {
+			/* Packet crossed into a new bucket - set RS bit on last descriptor */
 			PMD_TX_LOG(DEBUG,
 				   "Setting RS bit on TXD id="
 				   "%4u (port=%d queue=%d)",
@@ -944,9 +934,8 @@ ixgbe_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 
 			cmd_type_len |= IXGBE_TXD_CMD_RS;
 
-			/* Update txq RS bit counters */
-			txq->nb_tx_used = 0;
-			txp = NULL;
+			/* Record the last descriptor ID for the bucket we're leaving */
+			txq->rs_last_id[pkt_rs_idx] = tx_last;
 		} else
 			txp = txd;
 
@@ -2521,6 +2510,7 @@ ixgbe_tx_queue_release(struct ci_tx_queue *txq)
 	if (txq != NULL && txq->ops != NULL) {
 		ci_txq_release_all_mbufs(txq, false);
 		txq->ops->free_swring(txq);
+		rte_free(txq->rs_last_id);
 		rte_memzone_free(txq->mz);
 		rte_free(txq);
 	}
@@ -2825,6 +2815,13 @@ ixgbe_dev_tx_queue_setup(struct rte_eth_dev *dev,
 			     (int)dev->data->port_id, (int)queue_idx);
 		return -(EINVAL);
 	}
+	if (!rte_is_power_of_2(tx_rs_thresh)) {
+		PMD_INIT_LOG(ERR, "tx_rs_thresh must be a power of 2. (tx_rs_thresh=%u port=%d queue=%d)",
+			     (unsigned int)tx_rs_thresh,
+			     (int)dev->data->port_id,
+			     (int)queue_idx);
+		return -(EINVAL);
+	}
 
 	/*
 	 * If rs_bit_thresh is greater than 1, then TX WTHRESH should be
@@ -2870,6 +2867,7 @@ ixgbe_dev_tx_queue_setup(struct rte_eth_dev *dev,
 	txq->mz = tz;
 	txq->nb_tx_desc = nb_desc;
 	txq->tx_rs_thresh = tx_rs_thresh;
+	txq->log2_rs_thresh = rte_log2_u32(tx_rs_thresh);
 	txq->tx_free_thresh = tx_free_thresh;
 	txq->pthresh = tx_conf->tx_thresh.pthresh;
 	txq->hthresh = tx_conf->tx_thresh.hthresh;
@@ -2913,6 +2911,16 @@ ixgbe_dev_tx_queue_setup(struct rte_eth_dev *dev,
 	PMD_INIT_LOG(DEBUG, "sw_ring=%p hw_ring=%p dma_addr=0x%"PRIx64,
 		     txq->sw_ring, txq->ixgbe_tx_ring, txq->tx_ring_dma);
 
+	/* Allocate RS last_id tracking array */
+	uint16_t num_rs_buckets = nb_desc / tx_rs_thresh;
+	txq->rs_last_id = rte_zmalloc_socket(NULL, sizeof(txq->rs_last_id[0]) * num_rs_buckets,
+			RTE_CACHE_LINE_SIZE, socket_id);
+	if (txq->rs_last_id == NULL) {
+		ixgbe_tx_queue_release(txq);
+		PMD_DRV_LOG(ERR, "Failed to allocate memory for RS last_id array");
+		return -ENOMEM;
+	}
+
 	/* set up vector or scalar TX function as appropriate */
 	ixgbe_set_tx_function(dev, txq);
 
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [PATCH v4 26/35] net/intel: drop unused Tx queue used count
  2026-02-09 16:44 ` [PATCH v4 00/35] combine multiple Intel scalar Tx paths Bruce Richardson
                     ` (24 preceding siblings ...)
  2026-02-09 16:45   ` [PATCH v4 25/35] net/ixgbe: " Bruce Richardson
@ 2026-02-09 16:45   ` Bruce Richardson
  2026-02-10 14:14     ` Burakov, Anatoly
  2026-02-09 16:45   ` [PATCH v4 27/35] net/intel: remove index for tracking end of packet Bruce Richardson
                     ` (8 subsequent siblings)
  34 siblings, 1 reply; 274+ messages in thread
From: Bruce Richardson @ 2026-02-09 16:45 UTC (permalink / raw)
  To: dev
  Cc: Bruce Richardson, Vladimir Medvedkin, Anatoly Burakov,
	Jingjing Wu, Praveen Shetty

Since drivers now set the RS bit based on fixed position thresholds
rather than after a counted number of descriptors, we no longer need
to track the number of descriptors used from one call to the next.
Therefore we can remove the nb_tx_used value from the Tx queue
structure.

This value was still used inside the IDPF splitq scalar code; however,
the idpf driver-specific section of the Tx queue structure also has an
rs_compl_count value that was previously used only by the vector code
paths, so we can reuse it in place of the old nb_tx_used value in the
scalar path.
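
For the splitq path the behaviour is unchanged: a running count still
triggers the RE bit every 32 descriptors. A toy model of that check,
with illustrative names rather than the driver's API:

```c
#include <assert.h>
#include <stdint.h>

/* Model of the splitq RE marking: rs_compl_count accumulates the
 * descriptors used per packet; once the total reaches 32, the RE bit
 * is set on the current descriptor and the counter resets to zero. */
static inline int
re_bit_needed(uint16_t rs_compl_count, uint16_t nb_used)
{
	return (uint16_t)(rs_compl_count + nb_used) >= 32;
}
```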

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/net/intel/common/tx.h                   | 1 -
 drivers/net/intel/common/tx_scalar.h            | 1 -
 drivers/net/intel/i40e/i40e_rxtx.c              | 1 -
 drivers/net/intel/iavf/iavf_rxtx.c              | 1 -
 drivers/net/intel/ice/ice_dcf_ethdev.c          | 1 -
 drivers/net/intel/ice/ice_rxtx.c                | 1 -
 drivers/net/intel/idpf/idpf_common_rxtx.c       | 8 +++-----
 drivers/net/intel/ixgbe/ixgbe_rxtx.c            | 8 --------
 drivers/net/intel/ixgbe/ixgbe_rxtx_vec_common.c | 1 -
 9 files changed, 3 insertions(+), 20 deletions(-)

diff --git a/drivers/net/intel/common/tx.h b/drivers/net/intel/common/tx.h
index e7d79eb7d0..56baefe912 100644
--- a/drivers/net/intel/common/tx.h
+++ b/drivers/net/intel/common/tx.h
@@ -131,7 +131,6 @@ struct ci_tx_queue {
 	uint16_t *rs_last_id;
 	uint16_t nb_tx_desc;           /* number of TX descriptors */
 	uint16_t tx_tail; /* current value of tail register */
-	uint16_t nb_tx_used; /* number of TX desc used since RS bit set */
 	/* index to last TX descriptor to have been cleaned */
 	uint16_t last_desc_cleaned;
 	/* Total number of TX descriptors ready to be allocated. */
diff --git a/drivers/net/intel/common/tx_scalar.h b/drivers/net/intel/common/tx_scalar.h
index acda2f0478..cf9a3a817e 100644
--- a/drivers/net/intel/common/tx_scalar.h
+++ b/drivers/net/intel/common/tx_scalar.h
@@ -404,7 +404,6 @@ ci_xmit_pkts(struct ci_tx_queue *txq,
 			m_seg = m_seg->next;
 		} while (m_seg);
 end_pkt:
-		txq->nb_tx_used = (uint16_t)(txq->nb_tx_used + nb_used);
 		txq->nb_tx_free = (uint16_t)(txq->nb_tx_free - nb_used);
 
 		/* Check if packet crosses into a new RS threshold bucket.
diff --git a/drivers/net/intel/i40e/i40e_rxtx.c b/drivers/net/intel/i40e/i40e_rxtx.c
index b554bc6c31..1303010819 100644
--- a/drivers/net/intel/i40e/i40e_rxtx.c
+++ b/drivers/net/intel/i40e/i40e_rxtx.c
@@ -2645,7 +2645,6 @@ i40e_reset_tx_queue(struct ci_tx_queue *txq)
 	txq->tx_next_rs = (uint16_t)(txq->tx_rs_thresh - 1);
 
 	txq->tx_tail = 0;
-	txq->nb_tx_used = 0;
 
 	txq->last_desc_cleaned = (uint16_t)(txq->nb_tx_desc - 1);
 	txq->nb_tx_free = (uint16_t)(txq->nb_tx_desc - 1);
diff --git a/drivers/net/intel/iavf/iavf_rxtx.c b/drivers/net/intel/iavf/iavf_rxtx.c
index e7187f713d..3fcb8d7b79 100644
--- a/drivers/net/intel/iavf/iavf_rxtx.c
+++ b/drivers/net/intel/iavf/iavf_rxtx.c
@@ -288,7 +288,6 @@ reset_tx_queue(struct ci_tx_queue *txq)
 	}
 
 	txq->tx_tail = 0;
-	txq->nb_tx_used = 0;
 
 	txq->last_desc_cleaned = txq->nb_tx_desc - 1;
 	txq->nb_tx_free = txq->nb_tx_desc - 1;
diff --git a/drivers/net/intel/ice/ice_dcf_ethdev.c b/drivers/net/intel/ice/ice_dcf_ethdev.c
index 4ceecc15c6..02a23629d6 100644
--- a/drivers/net/intel/ice/ice_dcf_ethdev.c
+++ b/drivers/net/intel/ice/ice_dcf_ethdev.c
@@ -414,7 +414,6 @@ reset_tx_queue(struct ci_tx_queue *txq)
 	}
 
 	txq->tx_tail = 0;
-	txq->nb_tx_used = 0;
 
 	txq->last_desc_cleaned = txq->nb_tx_desc - 1;
 	txq->nb_tx_free = txq->nb_tx_desc - 1;
diff --git a/drivers/net/intel/ice/ice_rxtx.c b/drivers/net/intel/ice/ice_rxtx.c
index 2915223397..87ffcd3895 100644
--- a/drivers/net/intel/ice/ice_rxtx.c
+++ b/drivers/net/intel/ice/ice_rxtx.c
@@ -1130,7 +1130,6 @@ ice_reset_tx_queue(struct ci_tx_queue *txq)
 	txq->tx_next_rs = (uint16_t)(txq->tx_rs_thresh - 1);
 
 	txq->tx_tail = 0;
-	txq->nb_tx_used = 0;
 
 	txq->last_desc_cleaned = (uint16_t)(txq->nb_tx_desc - 1);
 	txq->nb_tx_free = (uint16_t)(txq->nb_tx_desc - 1);
diff --git a/drivers/net/intel/idpf/idpf_common_rxtx.c b/drivers/net/intel/idpf/idpf_common_rxtx.c
index 8859bcca86..95f2e1deea 100644
--- a/drivers/net/intel/idpf/idpf_common_rxtx.c
+++ b/drivers/net/intel/idpf/idpf_common_rxtx.c
@@ -224,7 +224,6 @@ idpf_qc_split_tx_descq_reset(struct ci_tx_queue *txq)
 	}
 
 	txq->tx_tail = 0;
-	txq->nb_tx_used = 0;
 
 	/* Use this as next to clean for split desc queue */
 	txq->last_desc_cleaned = 0;
@@ -284,7 +283,6 @@ idpf_qc_single_tx_queue_reset(struct ci_tx_queue *txq)
 	}
 
 	txq->tx_tail = 0;
-	txq->nb_tx_used = 0;
 
 	txq->last_desc_cleaned = txq->nb_tx_desc - 1;
 	txq->nb_tx_free = txq->nb_tx_desc - 1;
@@ -992,12 +990,12 @@ idpf_dp_splitq_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 		txd->qw1.cmd_dtype |= IDPF_TXD_FLEX_FLOW_CMD_EOP;
 
 		txq->nb_tx_free = (uint16_t)(txq->nb_tx_free - nb_used);
-		txq->nb_tx_used = (uint16_t)(txq->nb_tx_used + nb_used);
+		txq->rs_compl_count += nb_used;
 
-		if (txq->nb_tx_used >= 32) {
+		if (txq->rs_compl_count >= 32) {
 			txd->qw1.cmd_dtype |= IDPF_TXD_FLEX_FLOW_CMD_RE;
 			/* Update txq RE bit counters */
-			txq->nb_tx_used = 0;
+			txq->rs_compl_count = 0;
 		}
 	}
 
diff --git a/drivers/net/intel/ixgbe/ixgbe_rxtx.c b/drivers/net/intel/ixgbe/ixgbe_rxtx.c
index 3e37ccc50d..ea609d926a 100644
--- a/drivers/net/intel/ixgbe/ixgbe_rxtx.c
+++ b/drivers/net/intel/ixgbe/ixgbe_rxtx.c
@@ -708,12 +708,6 @@ ixgbe_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 		 */
 		nb_used = (uint16_t)(tx_pkt->nb_segs + new_ctx);
 
-		if (txp != NULL &&
-				nb_used + txq->nb_tx_used >= txq->tx_rs_thresh)
-			/* set RS on the previous packet in the burst */
-			txp->read.cmd_type_len |=
-				rte_cpu_to_le_32(IXGBE_TXD_CMD_RS);
-
 		/*
 		 * The number of descriptors that must be allocated for a
 		 * packet is the number of segments of that packet, plus 1
@@ -912,7 +906,6 @@ ixgbe_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 		 * The last packet data descriptor needs End Of Packet (EOP)
 		 */
 		cmd_type_len |= IXGBE_TXD_CMD_EOP;
-		txq->nb_tx_used = (uint16_t)(txq->nb_tx_used + nb_used);
 		txq->nb_tx_free = (uint16_t)(txq->nb_tx_free - nb_used);
 
 		/*
@@ -2551,7 +2544,6 @@ ixgbe_reset_tx_queue(struct ci_tx_queue *txq)
 	txq->tx_next_rs = (uint16_t)(txq->tx_rs_thresh - 1);
 
 	txq->tx_tail = 0;
-	txq->nb_tx_used = 0;
 	/*
 	 * Always allow 1 descriptor to be un-allocated to avoid
 	 * a H/W race condition
diff --git a/drivers/net/intel/ixgbe/ixgbe_rxtx_vec_common.c b/drivers/net/intel/ixgbe/ixgbe_rxtx_vec_common.c
index eb7c79eaf9..63c7cb50d3 100644
--- a/drivers/net/intel/ixgbe/ixgbe_rxtx_vec_common.c
+++ b/drivers/net/intel/ixgbe/ixgbe_rxtx_vec_common.c
@@ -47,7 +47,6 @@ ixgbe_reset_tx_queue_vec(struct ci_tx_queue *txq)
 	txq->tx_next_rs = (uint16_t)(txq->tx_rs_thresh - 1);
 
 	txq->tx_tail = 0;
-	txq->nb_tx_used = 0;
 	/*
 	 * Always allow 1 descriptor to be un-allocated to avoid
 	 * a H/W race condition
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [PATCH v4 27/35] net/intel: remove index for tracking end of packet
  2026-02-09 16:44 ` [PATCH v4 00/35] combine multiple Intel scalar Tx paths Bruce Richardson
                     ` (25 preceding siblings ...)
  2026-02-09 16:45   ` [PATCH v4 26/35] net/intel: drop unused Tx queue used count Bruce Richardson
@ 2026-02-09 16:45   ` Bruce Richardson
  2026-02-10 14:15     ` Burakov, Anatoly
  2026-02-09 16:45   ` [PATCH v4 28/35] net/intel: merge ring writes in simple Tx for ice and i40e Bruce Richardson
                     ` (7 subsequent siblings)
  34 siblings, 1 reply; 274+ messages in thread
From: Bruce Richardson @ 2026-02-09 16:45 UTC (permalink / raw)
  To: dev
  Cc: Bruce Richardson, Vladimir Medvedkin, Anatoly Burakov,
	Jingjing Wu, Praveen Shetty

The last_id value in each sw_ring entry is no longer used in the
datapath, so remove it and its initialization. In the function
releasing packets back to the pool, rather than relying on last_id to
identify the end of a packet, check instead for the mbuf's next
pointer being NULL.
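
The end-of-packet test can be modelled with a toy sw_ring where each
entry records the index of its packet's next segment, or -1 for the
last segment (mirroring mbuf->next == NULL); the helper below is
illustrative only:

```c
#include <assert.h>

/* Count completed packets in a toy sw_ring model: seg_next[i] is the
 * ring index of descriptor i's next segment, or -1 when descriptor i
 * holds a packet's last segment (the mbuf->next == NULL case). */
static inline int
count_completed_pkts(const int *seg_next, int n)
{
	int pkts = 0, i;

	for (i = 0; i < n; i++)
		if (seg_next[i] < 0)
			pkts++;
	return pkts;
}
```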

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/net/intel/common/tx.h             | 1 -
 drivers/net/intel/i40e/i40e_rxtx.c        | 8 +++-----
 drivers/net/intel/iavf/iavf_rxtx.c        | 9 ++++-----
 drivers/net/intel/ice/ice_dcf_ethdev.c    | 1 -
 drivers/net/intel/ice/ice_rxtx.c          | 9 ++++-----
 drivers/net/intel/idpf/idpf_common_rxtx.c | 2 --
 drivers/net/intel/ixgbe/ixgbe_rxtx.c      | 9 ++++-----
 7 files changed, 15 insertions(+), 24 deletions(-)

diff --git a/drivers/net/intel/common/tx.h b/drivers/net/intel/common/tx.h
index 56baefe912..203938180b 100644
--- a/drivers/net/intel/common/tx.h
+++ b/drivers/net/intel/common/tx.h
@@ -105,7 +105,6 @@ struct ci_tx_queue;
 struct ci_tx_entry {
 	struct rte_mbuf *mbuf; /* mbuf associated with TX desc, if any. */
 	uint16_t next_id; /* Index of next descriptor in ring. */
-	uint16_t last_id; /* Index of last scattered descriptor. */
 };
 
 /**
diff --git a/drivers/net/intel/i40e/i40e_rxtx.c b/drivers/net/intel/i40e/i40e_rxtx.c
index 1303010819..ba94c59c0a 100644
--- a/drivers/net/intel/i40e/i40e_rxtx.c
+++ b/drivers/net/intel/i40e/i40e_rxtx.c
@@ -2536,14 +2536,13 @@ i40e_tx_done_cleanup_full(struct ci_tx_queue *txq,
 			pkt_cnt < free_cnt &&
 			tx_id != tx_last; i++) {
 			if (swr_ring[tx_id].mbuf != NULL) {
-				rte_pktmbuf_free_seg(swr_ring[tx_id].mbuf);
-				swr_ring[tx_id].mbuf = NULL;
-
 				/*
 				 * last segment in the packet,
 				 * increment packet count
 				 */
-				pkt_cnt += (swr_ring[tx_id].last_id == tx_id);
+				pkt_cnt += (swr_ring[tx_id].mbuf->next == NULL) ? 1 : 0;
+				rte_pktmbuf_free_seg(swr_ring[tx_id].mbuf);
+				swr_ring[tx_id].mbuf = NULL;
 			}
 
 			tx_id = swr_ring[tx_id].next_id;
@@ -2636,7 +2635,6 @@ i40e_reset_tx_queue(struct ci_tx_queue *txq)
 		txd->cmd_type_offset_bsz =
 			rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE);
 		txe[i].mbuf =  NULL;
-		txe[i].last_id = i;
 		txe[prev].next_id = i;
 		prev = i;
 	}
diff --git a/drivers/net/intel/iavf/iavf_rxtx.c b/drivers/net/intel/iavf/iavf_rxtx.c
index 3fcb8d7b79..cb3b579d20 100644
--- a/drivers/net/intel/iavf/iavf_rxtx.c
+++ b/drivers/net/intel/iavf/iavf_rxtx.c
@@ -282,7 +282,6 @@ reset_tx_queue(struct ci_tx_queue *txq)
 		txq->ci_tx_ring[i].cmd_type_offset_bsz =
 			rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE);
 		txe[i].mbuf =  NULL;
-		txe[i].last_id = i;
 		txe[prev].next_id = i;
 		prev = i;
 	}
@@ -3960,14 +3959,14 @@ iavf_tx_done_cleanup_full(struct ci_tx_queue *txq,
 	while (pkt_cnt < free_cnt) {
 		do {
 			if (swr_ring[tx_id].mbuf != NULL) {
-				rte_pktmbuf_free_seg(swr_ring[tx_id].mbuf);
-				swr_ring[tx_id].mbuf = NULL;
-
 				/*
 				 * last segment in the packet,
 				 * increment packet count
 				 */
-				pkt_cnt += (swr_ring[tx_id].last_id == tx_id);
+				pkt_cnt += (swr_ring[tx_id].mbuf->next == NULL) ? 1 : 0;
+				rte_pktmbuf_free_seg(swr_ring[tx_id].mbuf);
+				swr_ring[tx_id].mbuf = NULL;
+
 			}
 
 			tx_id = swr_ring[tx_id].next_id;
diff --git a/drivers/net/intel/ice/ice_dcf_ethdev.c b/drivers/net/intel/ice/ice_dcf_ethdev.c
index 02a23629d6..abd7875e7b 100644
--- a/drivers/net/intel/ice/ice_dcf_ethdev.c
+++ b/drivers/net/intel/ice/ice_dcf_ethdev.c
@@ -408,7 +408,6 @@ reset_tx_queue(struct ci_tx_queue *txq)
 		txq->ci_tx_ring[i].cmd_type_offset_bsz =
 			rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE);
 		txe[i].mbuf =  NULL;
-		txe[i].last_id = i;
 		txe[prev].next_id = i;
 		prev = i;
 	}
diff --git a/drivers/net/intel/ice/ice_rxtx.c b/drivers/net/intel/ice/ice_rxtx.c
index 87ffcd3895..fe65df94da 100644
--- a/drivers/net/intel/ice/ice_rxtx.c
+++ b/drivers/net/intel/ice/ice_rxtx.c
@@ -1121,7 +1121,6 @@ ice_reset_tx_queue(struct ci_tx_queue *txq)
 		txd->cmd_type_offset_bsz =
 			rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE);
 		txe[i].mbuf =  NULL;
-		txe[i].last_id = i;
 		txe[prev].next_id = i;
 		prev = i;
 	}
@@ -3201,14 +3200,14 @@ ice_tx_done_cleanup_full(struct ci_tx_queue *txq,
 			pkt_cnt < free_cnt &&
 			tx_id != tx_last; i++) {
 			if (swr_ring[tx_id].mbuf != NULL) {
-				rte_pktmbuf_free_seg(swr_ring[tx_id].mbuf);
-				swr_ring[tx_id].mbuf = NULL;
-
 				/*
 				 * last segment in the packet,
 				 * increment packet count
 				 */
-				pkt_cnt += (swr_ring[tx_id].last_id == tx_id);
+				pkt_cnt += swr_ring[tx_id].mbuf->next == NULL ? 1 : 0;
+				rte_pktmbuf_free_seg(swr_ring[tx_id].mbuf);
+				swr_ring[tx_id].mbuf = NULL;
+
 			}
 
 			tx_id = swr_ring[tx_id].next_id;
diff --git a/drivers/net/intel/idpf/idpf_common_rxtx.c b/drivers/net/intel/idpf/idpf_common_rxtx.c
index 95f2e1deea..bd77113551 100644
--- a/drivers/net/intel/idpf/idpf_common_rxtx.c
+++ b/drivers/net/intel/idpf/idpf_common_rxtx.c
@@ -218,7 +218,6 @@ idpf_qc_split_tx_descq_reset(struct ci_tx_queue *txq)
 	prev = (uint16_t)(txq->sw_nb_desc - 1);
 	for (i = 0; i < txq->sw_nb_desc; i++) {
 		txe[i].mbuf = NULL;
-		txe[i].last_id = i;
 		txe[prev].next_id = i;
 		prev = i;
 	}
@@ -277,7 +276,6 @@ idpf_qc_single_tx_queue_reset(struct ci_tx_queue *txq)
 		txq->ci_tx_ring[i].cmd_type_offset_bsz =
 			rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE);
 		txe[i].mbuf =  NULL;
-		txe[i].last_id = i;
 		txe[prev].next_id = i;
 		prev = i;
 	}
diff --git a/drivers/net/intel/ixgbe/ixgbe_rxtx.c b/drivers/net/intel/ixgbe/ixgbe_rxtx.c
index ea609d926a..dc9fda8e21 100644
--- a/drivers/net/intel/ixgbe/ixgbe_rxtx.c
+++ b/drivers/net/intel/ixgbe/ixgbe_rxtx.c
@@ -2407,14 +2407,14 @@ ixgbe_tx_done_cleanup_full(struct ci_tx_queue *txq, uint32_t free_cnt)
 			pkt_cnt < free_cnt &&
 			tx_id != tx_last; i++) {
 			if (swr_ring[tx_id].mbuf != NULL) {
-				rte_pktmbuf_free_seg(swr_ring[tx_id].mbuf);
-				swr_ring[tx_id].mbuf = NULL;
-
 				/*
 				 * last segment in the packet,
 				 * increment packet count
 				 */
-				pkt_cnt += (swr_ring[tx_id].last_id == tx_id);
+				pkt_cnt += swr_ring[tx_id].mbuf->next == NULL ? 1 : 0;
+				rte_pktmbuf_free_seg(swr_ring[tx_id].mbuf);
+				swr_ring[tx_id].mbuf = NULL;
+
 			}
 
 			tx_id = swr_ring[tx_id].next_id;
@@ -2535,7 +2535,6 @@ ixgbe_reset_tx_queue(struct ci_tx_queue *txq)
 
 		txd->wb.status = rte_cpu_to_le_32(IXGBE_TXD_STAT_DD);
 		txe[i].mbuf = NULL;
-		txe[i].last_id = i;
 		txe[prev].next_id = i;
 		prev = i;
 	}
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [PATCH v4 28/35] net/intel: merge ring writes in simple Tx for ice and i40e
  2026-02-09 16:44 ` [PATCH v4 00/35] combine multiple Intel scalar Tx paths Bruce Richardson
                     ` (26 preceding siblings ...)
  2026-02-09 16:45   ` [PATCH v4 27/35] net/intel: remove index for tracking end of packet Bruce Richardson
@ 2026-02-09 16:45   ` Bruce Richardson
  2026-02-09 23:18     ` Medvedkin, Vladimir
  2026-02-09 16:45   ` [PATCH v4 29/35] net/intel: consolidate ice and i40e buffer free function Bruce Richardson
                     ` (6 subsequent siblings)
  34 siblings, 1 reply; 274+ messages in thread
From: Bruce Richardson @ 2026-02-09 16:45 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson, Anatoly Burakov

The ice and i40e drivers have identical code for writing ring entries in
the simple Tx path, so merge in the descriptor writing code.
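
The merged routine writes descriptors four at a time and then handles
any remainder singly; the main/leftover split is plain bit arithmetic,
sketched here with illustrative helper names:

```c
#include <assert.h>
#include <stdint.h>

/* Number of packets handled by the unrolled 4-at-a-time loop. */
static inline uint16_t
fill_mainpart(uint16_t nb_pkts)
{
	return nb_pkts & (uint16_t)~3u;
}

/* Remaining packets written one descriptor at a time. */
static inline uint16_t
fill_leftover(uint16_t nb_pkts)
{
	return nb_pkts & 3u;
}
```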

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/net/intel/common/tx.h                 |  6 ++
 drivers/net/intel/common/tx_scalar.h          | 60 ++++++++++++++
 drivers/net/intel/i40e/i40e_rxtx.c            | 79 +------------------
 drivers/net/intel/i40e/i40e_rxtx.h            |  3 -
 .../net/intel/i40e/i40e_rxtx_vec_altivec.c    |  4 +-
 drivers/net/intel/i40e/i40e_rxtx_vec_avx2.c   |  4 +-
 drivers/net/intel/i40e/i40e_rxtx_vec_avx512.c |  4 +-
 drivers/net/intel/i40e/i40e_rxtx_vec_neon.c   |  4 +-
 drivers/net/intel/ice/ice_rxtx.c              | 69 +---------------
 drivers/net/intel/ice/ice_rxtx.h              |  2 -
 drivers/net/intel/ice/ice_rxtx_vec_avx2.c     |  4 +-
 drivers/net/intel/ice/ice_rxtx_vec_avx512.c   |  4 +-
 12 files changed, 86 insertions(+), 157 deletions(-)

diff --git a/drivers/net/intel/common/tx.h b/drivers/net/intel/common/tx.h
index 203938180b..ef6d543e7a 100644
--- a/drivers/net/intel/common/tx.h
+++ b/drivers/net/intel/common/tx.h
@@ -63,6 +63,12 @@ enum ci_tx_l2tag1_field {
 /* Common maximum data per TX descriptor */
 #define CI_MAX_DATA_PER_TXD     (CI_TXD_QW1_TX_BUF_SZ_M >> CI_TXD_QW1_TX_BUF_SZ_S)
 
+/* Common TX maximum burst size for chunked transmission in simple paths */
+#define CI_TX_MAX_BURST 32
+
+/* Common TX descriptor command flags for simple transmit */
+#define CI_TX_DESC_CMD_DEFAULT (CI_TX_DESC_CMD_ICRC | CI_TX_DESC_CMD_EOP)
+
 /* Checksum offload mask to identify packets requesting offload */
 #define CI_TX_CKSUM_OFFLOAD_MASK (RTE_MBUF_F_TX_IP_CKSUM |		 \
 				   RTE_MBUF_F_TX_L4_MASK |		 \
diff --git a/drivers/net/intel/common/tx_scalar.h b/drivers/net/intel/common/tx_scalar.h
index cf9a3a817e..ef6069efbf 100644
--- a/drivers/net/intel/common/tx_scalar.h
+++ b/drivers/net/intel/common/tx_scalar.h
@@ -12,6 +12,66 @@
 /* depends on common Tx definitions. */
 #include "tx.h"
 
+/* Populate 4 descriptors with data from 4 mbufs */
+static inline void
+ci_tx_fill_hw_ring_tx4(volatile struct ci_tx_desc *txdp, struct rte_mbuf **pkts)
+{
+	uint64_t dma_addr;
+	uint32_t i;
+
+	for (i = 0; i < 4; i++, txdp++, pkts++) {
+		dma_addr = rte_mbuf_data_iova(*pkts);
+		txdp->buffer_addr = rte_cpu_to_le_64(dma_addr);
+		txdp->cmd_type_offset_bsz =
+			rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DATA |
+				((uint64_t)CI_TX_DESC_CMD_DEFAULT << CI_TXD_QW1_CMD_S) |
+				((uint64_t)(*pkts)->data_len << CI_TXD_QW1_TX_BUF_SZ_S));
+	}
+}
+
+/* Populate 1 descriptor with data from 1 mbuf */
+static inline void
+ci_tx_fill_hw_ring_tx1(volatile struct ci_tx_desc *txdp, struct rte_mbuf **pkts)
+{
+	uint64_t dma_addr;
+
+	dma_addr = rte_mbuf_data_iova(*pkts);
+	txdp->buffer_addr = rte_cpu_to_le_64(dma_addr);
+	txdp->cmd_type_offset_bsz =
+		rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DATA |
+			((uint64_t)CI_TX_DESC_CMD_DEFAULT << CI_TXD_QW1_CMD_S) |
+			((uint64_t)(*pkts)->data_len << CI_TXD_QW1_TX_BUF_SZ_S));
+}
+
+/* Fill hardware descriptor ring with mbuf data */
+static inline void
+ci_tx_fill_hw_ring(struct ci_tx_queue *txq, struct rte_mbuf **pkts,
+		   uint16_t nb_pkts)
+{
+	volatile struct ci_tx_desc *txdp = &txq->ci_tx_ring[txq->tx_tail];
+	struct ci_tx_entry *txep = &txq->sw_ring[txq->tx_tail];
+	const int N_PER_LOOP = 4;
+	const int N_PER_LOOP_MASK = N_PER_LOOP - 1;
+	int mainpart, leftover;
+	int i, j;
+
+	mainpart = nb_pkts & ((uint32_t)~N_PER_LOOP_MASK);
+	leftover = nb_pkts & ((uint32_t)N_PER_LOOP_MASK);
+	for (i = 0; i < mainpart; i += N_PER_LOOP) {
+		for (j = 0; j < N_PER_LOOP; ++j)
+			(txep + i + j)->mbuf = *(pkts + i + j);
+		ci_tx_fill_hw_ring_tx4(txdp + i, pkts + i);
+	}
+
+	if (unlikely(leftover > 0)) {
+		for (i = 0; i < leftover; ++i) {
+			(txep + mainpart + i)->mbuf = *(pkts + mainpart + i);
+			ci_tx_fill_hw_ring_tx1(txdp + mainpart + i,
+					       pkts + mainpart + i);
+		}
+	}
+}
+
 /*
  * Common transmit descriptor cleanup function for Intel drivers.
  *
diff --git a/drivers/net/intel/i40e/i40e_rxtx.c b/drivers/net/intel/i40e/i40e_rxtx.c
index ba94c59c0a..174d517e9d 100644
--- a/drivers/net/intel/i40e/i40e_rxtx.c
+++ b/drivers/net/intel/i40e/i40e_rxtx.c
@@ -311,19 +311,6 @@ i40e_parse_tunneling_params(uint64_t ol_flags,
 		*cd_tunneling |= I40E_TXD_CTX_QW0_L4T_CS_MASK;
 }
 
-/* Construct the tx flags */
-static inline uint64_t
-i40e_build_ctob(uint32_t td_cmd,
-		uint32_t td_offset,
-		unsigned int size,
-		uint32_t td_tag)
-{
-	return rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DATA |
-			((uint64_t)td_cmd  << CI_TXD_QW1_CMD_S) |
-			((uint64_t)td_offset << CI_TXD_QW1_OFFSET_S) |
-			((uint64_t)size  << CI_TXD_QW1_TX_BUF_SZ_S) |
-			((uint64_t)td_tag  << CI_TXD_QW1_L2TAG1_S));
-}
 
 static inline int
 #ifdef RTE_LIBRTE_I40E_RX_ALLOW_BULK_ALLOC
@@ -1082,64 +1069,6 @@ i40e_tx_free_bufs(struct ci_tx_queue *txq)
 	return tx_rs_thresh;
 }
 
-/* Populate 4 descriptors with data from 4 mbufs */
-static inline void
-tx4(volatile struct ci_tx_desc *txdp, struct rte_mbuf **pkts)
-{
-	uint64_t dma_addr;
-	uint32_t i;
-
-	for (i = 0; i < 4; i++, txdp++, pkts++) {
-		dma_addr = rte_mbuf_data_iova(*pkts);
-		txdp->buffer_addr = rte_cpu_to_le_64(dma_addr);
-		txdp->cmd_type_offset_bsz =
-			i40e_build_ctob((uint32_t)I40E_TD_CMD, 0,
-					(*pkts)->data_len, 0);
-	}
-}
-
-/* Populate 1 descriptor with data from 1 mbuf */
-static inline void
-tx1(volatile struct ci_tx_desc *txdp, struct rte_mbuf **pkts)
-{
-	uint64_t dma_addr;
-
-	dma_addr = rte_mbuf_data_iova(*pkts);
-	txdp->buffer_addr = rte_cpu_to_le_64(dma_addr);
-	txdp->cmd_type_offset_bsz =
-		i40e_build_ctob((uint32_t)I40E_TD_CMD, 0,
-				(*pkts)->data_len, 0);
-}
-
-/* Fill hardware descriptor ring with mbuf data */
-static inline void
-i40e_tx_fill_hw_ring(struct ci_tx_queue *txq,
-		     struct rte_mbuf **pkts,
-		     uint16_t nb_pkts)
-{
-	volatile struct ci_tx_desc *txdp = &txq->ci_tx_ring[txq->tx_tail];
-	struct ci_tx_entry *txep = &txq->sw_ring[txq->tx_tail];
-	const int N_PER_LOOP = 4;
-	const int N_PER_LOOP_MASK = N_PER_LOOP - 1;
-	int mainpart, leftover;
-	int i, j;
-
-	mainpart = (nb_pkts & ((uint32_t) ~N_PER_LOOP_MASK));
-	leftover = (nb_pkts & ((uint32_t)  N_PER_LOOP_MASK));
-	for (i = 0; i < mainpart; i += N_PER_LOOP) {
-		for (j = 0; j < N_PER_LOOP; ++j) {
-			(txep + i + j)->mbuf = *(pkts + i + j);
-		}
-		tx4(txdp + i, pkts + i);
-	}
-	if (unlikely(leftover > 0)) {
-		for (i = 0; i < leftover; ++i) {
-			(txep + mainpart + i)->mbuf = *(pkts + mainpart + i);
-			tx1(txdp + mainpart + i, pkts + mainpart + i);
-		}
-	}
-}
-
 static inline uint16_t
 tx_xmit_pkts(struct ci_tx_queue *txq,
 	     struct rte_mbuf **tx_pkts,
@@ -1164,7 +1093,7 @@ tx_xmit_pkts(struct ci_tx_queue *txq,
 	txq->nb_tx_free = (uint16_t)(txq->nb_tx_free - nb_pkts);
 	if ((txq->tx_tail + nb_pkts) > txq->nb_tx_desc) {
 		n = (uint16_t)(txq->nb_tx_desc - txq->tx_tail);
-		i40e_tx_fill_hw_ring(txq, tx_pkts, n);
+		ci_tx_fill_hw_ring(txq, tx_pkts, n);
 		txr[txq->tx_next_rs].cmd_type_offset_bsz |=
 			rte_cpu_to_le_64(((uint64_t)CI_TX_DESC_CMD_RS) << CI_TXD_QW1_CMD_S);
 		txq->tx_next_rs = (uint16_t)(txq->tx_rs_thresh - 1);
@@ -1172,7 +1101,7 @@ tx_xmit_pkts(struct ci_tx_queue *txq,
 	}
 
 	/* Fill hardware descriptor ring with mbuf data */
-	i40e_tx_fill_hw_ring(txq, tx_pkts + n, (uint16_t)(nb_pkts - n));
+	ci_tx_fill_hw_ring(txq, tx_pkts + n, (uint16_t)(nb_pkts - n));
 	txq->tx_tail = (uint16_t)(txq->tx_tail + (nb_pkts - n));
 
 	/* Determine if RS bit needs to be set */
@@ -1201,13 +1130,13 @@ i40e_xmit_pkts_simple(void *tx_queue,
 {
 	uint16_t nb_tx = 0;
 
-	if (likely(nb_pkts <= I40E_TX_MAX_BURST))
+	if (likely(nb_pkts <= CI_TX_MAX_BURST))
 		return tx_xmit_pkts((struct ci_tx_queue *)tx_queue,
 						tx_pkts, nb_pkts);
 
 	while (nb_pkts) {
 		uint16_t ret, num = (uint16_t)RTE_MIN(nb_pkts,
-						I40E_TX_MAX_BURST);
+						CI_TX_MAX_BURST);
 
 		ret = tx_xmit_pkts((struct ci_tx_queue *)tx_queue,
 						&tx_pkts[nb_tx], num);
diff --git a/drivers/net/intel/i40e/i40e_rxtx.h b/drivers/net/intel/i40e/i40e_rxtx.h
index db8525d52d..88d47f261e 100644
--- a/drivers/net/intel/i40e/i40e_rxtx.h
+++ b/drivers/net/intel/i40e/i40e_rxtx.h
@@ -47,9 +47,6 @@
 #define I40E_RX_DESC_EXT_STATUS_FLEXBL_MASK   0x03
 #define I40E_RX_DESC_EXT_STATUS_FLEXBL_FLEX   0x01
 
-#define I40E_TD_CMD (CI_TX_DESC_CMD_ICRC |\
-		     CI_TX_DESC_CMD_EOP)
-
 enum i40e_header_split_mode {
 	i40e_header_split_none = 0,
 	i40e_header_split_enabled = 1,
diff --git a/drivers/net/intel/i40e/i40e_rxtx_vec_altivec.c b/drivers/net/intel/i40e/i40e_rxtx_vec_altivec.c
index 4c36748d94..68667bdc9b 100644
--- a/drivers/net/intel/i40e/i40e_rxtx_vec_altivec.c
+++ b/drivers/net/intel/i40e/i40e_rxtx_vec_altivec.c
@@ -476,8 +476,8 @@ i40e_xmit_fixed_burst_vec(void *tx_queue, struct rte_mbuf **tx_pkts,
 	volatile struct ci_tx_desc *txdp;
 	struct ci_tx_entry_vec *txep;
 	uint16_t n, nb_commit, tx_id;
-	uint64_t flags = I40E_TD_CMD;
-	uint64_t rs = CI_TX_DESC_CMD_RS | I40E_TD_CMD;
+	uint64_t flags = CI_TX_DESC_CMD_DEFAULT;
+	uint64_t rs = CI_TX_DESC_CMD_RS | CI_TX_DESC_CMD_DEFAULT;
 	int i;
 
 	if (txq->nb_tx_free < txq->tx_free_thresh)
diff --git a/drivers/net/intel/i40e/i40e_rxtx_vec_avx2.c b/drivers/net/intel/i40e/i40e_rxtx_vec_avx2.c
index 502a1842c6..e1672c4371 100644
--- a/drivers/net/intel/i40e/i40e_rxtx_vec_avx2.c
+++ b/drivers/net/intel/i40e/i40e_rxtx_vec_avx2.c
@@ -741,8 +741,8 @@ i40e_xmit_fixed_burst_vec_avx2(void *tx_queue, struct rte_mbuf **tx_pkts,
 	volatile struct ci_tx_desc *txdp;
 	struct ci_tx_entry_vec *txep;
 	uint16_t n, nb_commit, tx_id;
-	uint64_t flags = I40E_TD_CMD;
-	uint64_t rs = CI_TX_DESC_CMD_RS | I40E_TD_CMD;
+	uint64_t flags = CI_TX_DESC_CMD_DEFAULT;
+	uint64_t rs = CI_TX_DESC_CMD_RS | CI_TX_DESC_CMD_DEFAULT;
 
 	if (txq->nb_tx_free < txq->tx_free_thresh)
 		ci_tx_free_bufs_vec(txq, i40e_tx_desc_done, false);
diff --git a/drivers/net/intel/i40e/i40e_rxtx_vec_avx512.c b/drivers/net/intel/i40e/i40e_rxtx_vec_avx512.c
index d48ff9f51e..bceb95ad2d 100644
--- a/drivers/net/intel/i40e/i40e_rxtx_vec_avx512.c
+++ b/drivers/net/intel/i40e/i40e_rxtx_vec_avx512.c
@@ -801,8 +801,8 @@ i40e_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pkts,
 	volatile struct ci_tx_desc *txdp;
 	struct ci_tx_entry_vec *txep;
 	uint16_t n, nb_commit, tx_id;
-	uint64_t flags = I40E_TD_CMD;
-	uint64_t rs = CI_TX_DESC_CMD_RS | I40E_TD_CMD;
+	uint64_t flags = CI_TX_DESC_CMD_DEFAULT;
+	uint64_t rs = CI_TX_DESC_CMD_RS | CI_TX_DESC_CMD_DEFAULT;
 
 	if (txq->nb_tx_free < txq->tx_free_thresh)
 		ci_tx_free_bufs_vec(txq, i40e_tx_desc_done, false);
diff --git a/drivers/net/intel/i40e/i40e_rxtx_vec_neon.c b/drivers/net/intel/i40e/i40e_rxtx_vec_neon.c
index be4c64942e..debc9bda28 100644
--- a/drivers/net/intel/i40e/i40e_rxtx_vec_neon.c
+++ b/drivers/net/intel/i40e/i40e_rxtx_vec_neon.c
@@ -626,8 +626,8 @@ i40e_xmit_fixed_burst_vec(void *__rte_restrict tx_queue,
 	volatile struct ci_tx_desc *txdp;
 	struct ci_tx_entry_vec *txep;
 	uint16_t n, nb_commit, tx_id;
-	uint64_t flags = I40E_TD_CMD;
-	uint64_t rs = CI_TX_DESC_CMD_RS | I40E_TD_CMD;
+	uint64_t flags = CI_TX_DESC_CMD_DEFAULT;
+	uint64_t rs = CI_TX_DESC_CMD_RS | CI_TX_DESC_CMD_DEFAULT;
 	int i;
 
 	if (txq->nb_tx_free < txq->tx_free_thresh)
diff --git a/drivers/net/intel/ice/ice_rxtx.c b/drivers/net/intel/ice/ice_rxtx.c
index fe65df94da..e4fba453a9 100644
--- a/drivers/net/intel/ice/ice_rxtx.c
+++ b/drivers/net/intel/ice/ice_rxtx.c
@@ -3286,67 +3286,6 @@ ice_tx_done_cleanup(void *txq, uint32_t free_cnt)
 		return ice_tx_done_cleanup_full(q, free_cnt);
 }
 
-/* Populate 4 descriptors with data from 4 mbufs */
-static inline void
-tx4(volatile struct ci_tx_desc *txdp, struct rte_mbuf **pkts)
-{
-	uint64_t dma_addr;
-	uint32_t i;
-
-	for (i = 0; i < 4; i++, txdp++, pkts++) {
-		dma_addr = rte_mbuf_data_iova(*pkts);
-		txdp->buffer_addr = rte_cpu_to_le_64(dma_addr);
-		txdp->cmd_type_offset_bsz =
-			ice_build_ctob((uint32_t)ICE_TD_CMD, 0,
-				       (*pkts)->data_len, 0);
-	}
-}
-
-/* Populate 1 descriptor with data from 1 mbuf */
-static inline void
-tx1(volatile struct ci_tx_desc *txdp, struct rte_mbuf **pkts)
-{
-	uint64_t dma_addr;
-
-	dma_addr = rte_mbuf_data_iova(*pkts);
-	txdp->buffer_addr = rte_cpu_to_le_64(dma_addr);
-	txdp->cmd_type_offset_bsz =
-		ice_build_ctob((uint32_t)ICE_TD_CMD, 0,
-			       (*pkts)->data_len, 0);
-}
-
-static inline void
-ice_tx_fill_hw_ring(struct ci_tx_queue *txq, struct rte_mbuf **pkts,
-		    uint16_t nb_pkts)
-{
-	volatile struct ci_tx_desc *txdp = &txq->ci_tx_ring[txq->tx_tail];
-	struct ci_tx_entry *txep = &txq->sw_ring[txq->tx_tail];
-	const int N_PER_LOOP = 4;
-	const int N_PER_LOOP_MASK = N_PER_LOOP - 1;
-	int mainpart, leftover;
-	int i, j;
-
-	/**
-	 * Process most of the packets in chunks of N pkts.  Any
-	 * leftover packets will get processed one at a time.
-	 */
-	mainpart = nb_pkts & ((uint32_t)~N_PER_LOOP_MASK);
-	leftover = nb_pkts & ((uint32_t)N_PER_LOOP_MASK);
-	for (i = 0; i < mainpart; i += N_PER_LOOP) {
-		/* Copy N mbuf pointers to the S/W ring */
-		for (j = 0; j < N_PER_LOOP; ++j)
-			(txep + i + j)->mbuf = *(pkts + i + j);
-		tx4(txdp + i, pkts + i);
-	}
-
-	if (unlikely(leftover > 0)) {
-		for (i = 0; i < leftover; ++i) {
-			(txep + mainpart + i)->mbuf = *(pkts + mainpart + i);
-			tx1(txdp + mainpart + i, pkts + mainpart + i);
-		}
-	}
-}
-
 static inline uint16_t
 tx_xmit_pkts(struct ci_tx_queue *txq,
 	     struct rte_mbuf **tx_pkts,
@@ -3371,7 +3310,7 @@ tx_xmit_pkts(struct ci_tx_queue *txq,
 	txq->nb_tx_free = (uint16_t)(txq->nb_tx_free - nb_pkts);
 	if ((txq->tx_tail + nb_pkts) > txq->nb_tx_desc) {
 		n = (uint16_t)(txq->nb_tx_desc - txq->tx_tail);
-		ice_tx_fill_hw_ring(txq, tx_pkts, n);
+		ci_tx_fill_hw_ring(txq, tx_pkts, n);
 		txr[txq->tx_next_rs].cmd_type_offset_bsz |=
 			rte_cpu_to_le_64(((uint64_t)CI_TX_DESC_CMD_RS) << CI_TXD_QW1_CMD_S);
 		txq->tx_next_rs = (uint16_t)(txq->tx_rs_thresh - 1);
@@ -3379,7 +3318,7 @@ tx_xmit_pkts(struct ci_tx_queue *txq,
 	}
 
 	/* Fill hardware descriptor ring with mbuf data */
-	ice_tx_fill_hw_ring(txq, tx_pkts + n, (uint16_t)(nb_pkts - n));
+	ci_tx_fill_hw_ring(txq, tx_pkts + n, (uint16_t)(nb_pkts - n));
 	txq->tx_tail = (uint16_t)(txq->tx_tail + (nb_pkts - n));
 
 	/* Determine if RS bit needs to be set */
@@ -3408,13 +3347,13 @@ ice_xmit_pkts_simple(void *tx_queue,
 {
 	uint16_t nb_tx = 0;
 
-	if (likely(nb_pkts <= ICE_TX_MAX_BURST))
+	if (likely(nb_pkts <= CI_TX_MAX_BURST))
 		return tx_xmit_pkts((struct ci_tx_queue *)tx_queue,
 				    tx_pkts, nb_pkts);
 
 	while (nb_pkts) {
 		uint16_t ret, num = (uint16_t)RTE_MIN(nb_pkts,
-						      ICE_TX_MAX_BURST);
+						      CI_TX_MAX_BURST);
 
 		ret = tx_xmit_pkts((struct ci_tx_queue *)tx_queue,
 				   &tx_pkts[nb_tx], num);
diff --git a/drivers/net/intel/ice/ice_rxtx.h b/drivers/net/intel/ice/ice_rxtx.h
index 7d6480b410..77ed41f9fd 100644
--- a/drivers/net/intel/ice/ice_rxtx.h
+++ b/drivers/net/intel/ice/ice_rxtx.h
@@ -46,8 +46,6 @@
 
 #define ICE_SUPPORT_CHAIN_NUM 5
 
-#define ICE_TD_CMD                      CI_TX_DESC_CMD_EOP
-
 #define ICE_VPMD_RX_BURST            CI_VPMD_RX_BURST
 #define ICE_VPMD_TX_BURST            32
 #define ICE_VPMD_RXQ_REARM_THRESH    CI_VPMD_RX_REARM_THRESH
diff --git a/drivers/net/intel/ice/ice_rxtx_vec_avx2.c b/drivers/net/intel/ice/ice_rxtx_vec_avx2.c
index 2922671158..d03f2e5b36 100644
--- a/drivers/net/intel/ice/ice_rxtx_vec_avx2.c
+++ b/drivers/net/intel/ice/ice_rxtx_vec_avx2.c
@@ -845,8 +845,8 @@ ice_xmit_fixed_burst_vec_avx2(void *tx_queue, struct rte_mbuf **tx_pkts,
 	volatile struct ci_tx_desc *txdp;
 	struct ci_tx_entry_vec *txep;
 	uint16_t n, nb_commit, tx_id;
-	uint64_t flags = ICE_TD_CMD;
-	uint64_t rs = CI_TX_DESC_CMD_RS | ICE_TD_CMD;
+	uint64_t flags = CI_TX_DESC_CMD_DEFAULT;
+	uint64_t rs = CI_TX_DESC_CMD_RS | CI_TX_DESC_CMD_DEFAULT;
 
 	/* cross rx_thresh boundary is not allowed */
 	nb_pkts = RTE_MIN(nb_pkts, txq->tx_rs_thresh);
diff --git a/drivers/net/intel/ice/ice_rxtx_vec_avx512.c b/drivers/net/intel/ice/ice_rxtx_vec_avx512.c
index e64b6e227b..004c01054a 100644
--- a/drivers/net/intel/ice/ice_rxtx_vec_avx512.c
+++ b/drivers/net/intel/ice/ice_rxtx_vec_avx512.c
@@ -909,8 +909,8 @@ ice_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pkts,
 	volatile struct ci_tx_desc *txdp;
 	struct ci_tx_entry_vec *txep;
 	uint16_t n, nb_commit, tx_id;
-	uint64_t flags = ICE_TD_CMD;
-	uint64_t rs = CI_TX_DESC_CMD_RS | ICE_TD_CMD;
+	uint64_t flags = CI_TX_DESC_CMD_DEFAULT;
+	uint64_t rs = CI_TX_DESC_CMD_RS | CI_TX_DESC_CMD_DEFAULT;
 
 	/* cross rx_thresh boundary is not allowed */
 	nb_pkts = RTE_MIN(nb_pkts, txq->tx_rs_thresh);
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [PATCH v4 29/35] net/intel: consolidate ice and i40e buffer free function
  2026-02-09 16:44 ` [PATCH v4 00/35] combine multiple Intel scalar Tx paths Bruce Richardson
                     ` (27 preceding siblings ...)
  2026-02-09 16:45   ` [PATCH v4 28/35] net/intel: merge ring writes in simple Tx for ice and i40e Bruce Richardson
@ 2026-02-09 16:45   ` Bruce Richardson
  2026-02-09 23:19     ` Medvedkin, Vladimir
  2026-02-09 16:45   ` [PATCH v4 30/35] net/intel: complete merging simple Tx paths Bruce Richardson
                     ` (5 subsequent siblings)
  34 siblings, 1 reply; 274+ messages in thread
From: Bruce Richardson @ 2026-02-09 16:45 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson, Anatoly Burakov

The buffer freeing function for the simple scalar Tx path is almost
identical in both the ice and i40e drivers, except that the i40e
version adds batching for the FAST_FREE case. Consolidate both
functions into a single common one, based on the better i40e version.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/net/intel/common/tx.h        |  3 ++
 drivers/net/intel/common/tx_scalar.h | 58 +++++++++++++++++++++++++
 drivers/net/intel/i40e/i40e_rxtx.c   | 63 +---------------------------
 drivers/net/intel/ice/ice_rxtx.c     | 45 +-------------------
 4 files changed, 65 insertions(+), 104 deletions(-)

diff --git a/drivers/net/intel/common/tx.h b/drivers/net/intel/common/tx.h
index ef6d543e7a..67378a0803 100644
--- a/drivers/net/intel/common/tx.h
+++ b/drivers/net/intel/common/tx.h
@@ -66,6 +66,9 @@ enum ci_tx_l2tag1_field {
 /* Common TX maximum burst size for chunked transmission in simple paths */
 #define CI_TX_MAX_BURST 32
 
+/* Common TX maximum free buffer size for batched bulk freeing */
+#define CI_TX_MAX_FREE_BUF_SZ 64
+
 /* Common TX descriptor command flags for simple transmit */
 #define CI_TX_DESC_CMD_DEFAULT (CI_TX_DESC_CMD_ICRC | CI_TX_DESC_CMD_EOP)
 
diff --git a/drivers/net/intel/common/tx_scalar.h b/drivers/net/intel/common/tx_scalar.h
index ef6069efbf..f0e7b4664b 100644
--- a/drivers/net/intel/common/tx_scalar.h
+++ b/drivers/net/intel/common/tx_scalar.h
@@ -72,6 +72,64 @@ ci_tx_fill_hw_ring(struct ci_tx_queue *txq, struct rte_mbuf **pkts,
 	}
 }
 
+/* Free transmitted mbufs from descriptor ring with bulk freeing for Tx simple path */
+static __rte_always_inline int
+ci_tx_free_bufs(struct ci_tx_queue *txq)
+{
+	const uint16_t rs_thresh = txq->tx_rs_thresh;
+	const uint16_t k = RTE_ALIGN_FLOOR(rs_thresh, CI_TX_MAX_FREE_BUF_SZ);
+	const uint16_t m = rs_thresh % CI_TX_MAX_FREE_BUF_SZ;
+	struct rte_mbuf *free[CI_TX_MAX_FREE_BUF_SZ];
+	struct ci_tx_entry *txep;
+
+	if ((txq->ci_tx_ring[txq->tx_next_dd].cmd_type_offset_bsz &
+			rte_cpu_to_le_64(CI_TXD_QW1_DTYPE_M)) !=
+			rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE))
+		return 0;
+
+	txep = &txq->sw_ring[txq->tx_next_dd - (rs_thresh - 1)];
+
+	struct rte_mempool *fast_free_mp =
+			likely(txq->fast_free_mp != (void *)UINTPTR_MAX) ?
+				txq->fast_free_mp :
+				(txq->fast_free_mp = txep[0].mbuf->pool);
+
+	if (fast_free_mp) {
+		if (k) {
+			for (uint16_t j = 0; j != k; j += CI_TX_MAX_FREE_BUF_SZ) {
+				for (uint16_t i = 0; i < CI_TX_MAX_FREE_BUF_SZ; ++i, ++txep) {
+					free[i] = txep->mbuf;
+					txep->mbuf = NULL;
+				}
+				rte_mbuf_raw_free_bulk(fast_free_mp, free, CI_TX_MAX_FREE_BUF_SZ);
+			}
+		}
+
+		if (m) {
+			for (uint16_t i = 0; i < m; ++i, ++txep) {
+				free[i] = txep->mbuf;
+				txep->mbuf = NULL;
+			}
+			rte_mbuf_raw_free_bulk(fast_free_mp, free, m);
+		}
+	} else {
+		for (uint16_t i = 0; i < rs_thresh; ++i, ++txep)
+			rte_prefetch0((txep + i)->mbuf);
+
+		for (uint16_t i = 0; i < rs_thresh; ++i, ++txep) {
+			rte_pktmbuf_free_seg(txep->mbuf);
+			txep->mbuf = NULL;
+		}
+	}
+
+	txq->nb_tx_free = (uint16_t)(txq->nb_tx_free + rs_thresh);
+	txq->tx_next_dd = (uint16_t)(txq->tx_next_dd + rs_thresh);
+	if (txq->tx_next_dd >= txq->nb_tx_desc)
+		txq->tx_next_dd = (uint16_t)(rs_thresh - 1);
+
+	return rs_thresh;
+}
+
 /*
  * Common transmit descriptor cleanup function for Intel drivers.
  *
diff --git a/drivers/net/intel/i40e/i40e_rxtx.c b/drivers/net/intel/i40e/i40e_rxtx.c
index 174d517e9d..6b8d9fd70e 100644
--- a/drivers/net/intel/i40e/i40e_rxtx.c
+++ b/drivers/net/intel/i40e/i40e_rxtx.c
@@ -1010,65 +1010,6 @@ i40e_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 			get_context_desc, NULL, NULL);
 }
 
-static __rte_always_inline int
-i40e_tx_free_bufs(struct ci_tx_queue *txq)
-{
-	struct ci_tx_entry *txep;
-	const uint16_t tx_rs_thresh = txq->tx_rs_thresh;
-	uint16_t i, j;
-	struct rte_mbuf *free[I40E_TX_MAX_FREE_BUF_SZ];
-	const uint16_t k = RTE_ALIGN_FLOOR(tx_rs_thresh, I40E_TX_MAX_FREE_BUF_SZ);
-	const uint16_t m = tx_rs_thresh % I40E_TX_MAX_FREE_BUF_SZ;
-
-	if ((txq->ci_tx_ring[txq->tx_next_dd].cmd_type_offset_bsz &
-			rte_cpu_to_le_64(CI_TXD_QW1_DTYPE_M)) !=
-			rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE))
-		return 0;
-
-	txep = &txq->sw_ring[txq->tx_next_dd - (tx_rs_thresh - 1)];
-
-	struct rte_mempool *fast_free_mp =
-			likely(txq->fast_free_mp != (void *)UINTPTR_MAX) ?
-			txq->fast_free_mp :
-			(txq->fast_free_mp = txep[0].mbuf->pool);
-
-	if (fast_free_mp != NULL) {
-		if (k) {
-			for (j = 0; j != k; j += I40E_TX_MAX_FREE_BUF_SZ) {
-				for (i = 0; i < I40E_TX_MAX_FREE_BUF_SZ; ++i, ++txep) {
-					free[i] = txep->mbuf;
-					txep->mbuf = NULL;
-				}
-				rte_mbuf_raw_free_bulk(fast_free_mp, free,
-						I40E_TX_MAX_FREE_BUF_SZ);
-			}
-		}
-
-		if (m) {
-			for (i = 0; i < m; ++i, ++txep) {
-				free[i] = txep->mbuf;
-				txep->mbuf = NULL;
-			}
-			rte_mbuf_raw_free_bulk(fast_free_mp, free, m);
-		}
-	} else {
-		for (i = 0; i < tx_rs_thresh; i++)
-			rte_prefetch0((txep + i)->mbuf);
-
-		for (i = 0; i < tx_rs_thresh; ++i, ++txep) {
-			rte_pktmbuf_free_seg(txep->mbuf);
-			txep->mbuf = NULL;
-		}
-	}
-
-	txq->nb_tx_free = (uint16_t)(txq->nb_tx_free + tx_rs_thresh);
-	txq->tx_next_dd = (uint16_t)(txq->tx_next_dd + tx_rs_thresh);
-	if (txq->tx_next_dd >= txq->nb_tx_desc)
-		txq->tx_next_dd = (uint16_t)(tx_rs_thresh - 1);
-
-	return tx_rs_thresh;
-}
-
 static inline uint16_t
 tx_xmit_pkts(struct ci_tx_queue *txq,
 	     struct rte_mbuf **tx_pkts,
@@ -1083,7 +1024,7 @@ tx_xmit_pkts(struct ci_tx_queue *txq,
 	 * descriptor, free the associated buffer.
 	 */
 	if (txq->nb_tx_free < txq->tx_free_thresh)
-		i40e_tx_free_bufs(txq);
+		ci_tx_free_bufs(txq);
 
 	/* Use available descriptor only */
 	nb_pkts = (uint16_t)RTE_MIN(txq->nb_tx_free, nb_pkts);
@@ -2508,7 +2449,7 @@ i40e_tx_done_cleanup_simple(struct ci_tx_queue *txq,
 		if (txq->nb_tx_desc - txq->nb_tx_free < txq->tx_rs_thresh)
 			break;
 
-		n = i40e_tx_free_bufs(txq);
+		n = ci_tx_free_bufs(txq);
 
 		if (n == 0)
 			break;
diff --git a/drivers/net/intel/ice/ice_rxtx.c b/drivers/net/intel/ice/ice_rxtx.c
index e4fba453a9..a3a94033bf 100644
--- a/drivers/net/intel/ice/ice_rxtx.c
+++ b/drivers/net/intel/ice/ice_rxtx.c
@@ -3129,47 +3129,6 @@ ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 	return ci_xmit_pkts(txq, tx_pkts, nb_pkts, CI_VLAN_IN_L2TAG1, get_context_desc, NULL, NULL);
 }
 
-static __rte_always_inline int
-ice_tx_free_bufs(struct ci_tx_queue *txq)
-{
-	struct ci_tx_entry *txep;
-	uint16_t i;
-
-	if ((txq->ci_tx_ring[txq->tx_next_dd].cmd_type_offset_bsz &
-	     rte_cpu_to_le_64(CI_TXD_QW1_DTYPE_M)) !=
-	    rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE))
-		return 0;
-
-	txep = &txq->sw_ring[txq->tx_next_dd - (txq->tx_rs_thresh - 1)];
-
-	struct rte_mempool *fast_free_mp =
-			likely(txq->fast_free_mp != (void *)UINTPTR_MAX) ?
-			txq->fast_free_mp :
-			(txq->fast_free_mp = txep[0].mbuf->pool);
-
-	if (fast_free_mp != NULL) {
-		for (i = 0; i < txq->tx_rs_thresh; ++i, ++txep) {
-			rte_mempool_put(fast_free_mp, txep->mbuf);
-			txep->mbuf = NULL;
-		}
-	} else {
-		for (i = 0; i < txq->tx_rs_thresh; i++)
-			rte_prefetch0((txep + i)->mbuf);
-
-		for (i = 0; i < txq->tx_rs_thresh; ++i, ++txep) {
-			rte_pktmbuf_free_seg(txep->mbuf);
-			txep->mbuf = NULL;
-		}
-	}
-
-	txq->nb_tx_free = (uint16_t)(txq->nb_tx_free + txq->tx_rs_thresh);
-	txq->tx_next_dd = (uint16_t)(txq->tx_next_dd + txq->tx_rs_thresh);
-	if (txq->tx_next_dd >= txq->nb_tx_desc)
-		txq->tx_next_dd = (uint16_t)(txq->tx_rs_thresh - 1);
-
-	return txq->tx_rs_thresh;
-}
-
 static int
 ice_tx_done_cleanup_full(struct ci_tx_queue *txq,
 			uint32_t free_cnt)
@@ -3259,7 +3218,7 @@ ice_tx_done_cleanup_simple(struct ci_tx_queue *txq,
 		if (txq->nb_tx_desc - txq->nb_tx_free < txq->tx_rs_thresh)
 			break;
 
-		n = ice_tx_free_bufs(txq);
+		n = ci_tx_free_bufs(txq);
 
 		if (n == 0)
 			break;
@@ -3300,7 +3259,7 @@ tx_xmit_pkts(struct ci_tx_queue *txq,
 	 * descriptor, free the associated buffer.
 	 */
 	if (txq->nb_tx_free < txq->tx_free_thresh)
-		ice_tx_free_bufs(txq);
+		ci_tx_free_bufs(txq);
 
 	/* Use available descriptor only */
 	nb_pkts = (uint16_t)RTE_MIN(txq->nb_tx_free, nb_pkts);
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [PATCH v4 30/35] net/intel: complete merging simple Tx paths
  2026-02-09 16:44 ` [PATCH v4 00/35] combine multiple Intel scalar Tx paths Bruce Richardson
                     ` (28 preceding siblings ...)
  2026-02-09 16:45   ` [PATCH v4 29/35] net/intel: consolidate ice and i40e buffer free function Bruce Richardson
@ 2026-02-09 16:45   ` Bruce Richardson
  2026-02-09 23:19     ` Medvedkin, Vladimir
  2026-02-09 16:45   ` [PATCH v4 31/35] net/intel: use non-volatile stores in simple Tx function Bruce Richardson
                     ` (4 subsequent siblings)
  34 siblings, 1 reply; 274+ messages in thread
From: Bruce Richardson @ 2026-02-09 16:45 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson, Anatoly Burakov

Complete the deduplication/merger of the ice and i40e Tx simple scalar
paths.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/net/intel/common/tx_scalar.h | 87 ++++++++++++++++++++++++++++
 drivers/net/intel/i40e/i40e_rxtx.c   | 74 +----------------------
 drivers/net/intel/ice/ice_rxtx.c     | 74 +----------------------
 3 files changed, 89 insertions(+), 146 deletions(-)

diff --git a/drivers/net/intel/common/tx_scalar.h b/drivers/net/intel/common/tx_scalar.h
index f0e7b4664b..4ba97303cb 100644
--- a/drivers/net/intel/common/tx_scalar.h
+++ b/drivers/net/intel/common/tx_scalar.h
@@ -130,6 +130,93 @@ ci_tx_free_bufs(struct ci_tx_queue *txq)
 	return rs_thresh;
 }
 
+/* Simple burst transmit for descriptor-based simple Tx path
+ *
+ * Transmits a burst of packets by filling hardware descriptors with mbuf
+ * data. Handles ring wrap-around and RS bit management. Performs descriptor
+ * cleanup when tx_free_thresh is reached.
+ *
+ * Returns: number of packets transmitted
+ */
+static inline uint16_t
+ci_xmit_burst_simple(struct ci_tx_queue *txq,
+		     struct rte_mbuf **tx_pkts,
+		     uint16_t nb_pkts)
+{
+	volatile struct ci_tx_desc *txr = txq->ci_tx_ring;
+	uint16_t n = 0;
+
+	/**
+	 * Begin scanning the H/W ring for done descriptors when the number
+	 * of available descriptors drops below tx_free_thresh. For each done
+	 * descriptor, free the associated buffer.
+	 */
+	if (txq->nb_tx_free < txq->tx_free_thresh)
+		ci_tx_free_bufs(txq);
+
+	/* Use available descriptor only */
+	nb_pkts = (uint16_t)RTE_MIN(txq->nb_tx_free, nb_pkts);
+	if (unlikely(!nb_pkts))
+		return 0;
+
+	txq->nb_tx_free = (uint16_t)(txq->nb_tx_free - nb_pkts);
+	if ((txq->tx_tail + nb_pkts) > txq->nb_tx_desc) {
+		n = (uint16_t)(txq->nb_tx_desc - txq->tx_tail);
+		ci_tx_fill_hw_ring(txq, tx_pkts, n);
+		txr[txq->tx_next_rs].cmd_type_offset_bsz |=
+			rte_cpu_to_le_64(((uint64_t)CI_TX_DESC_CMD_RS) <<
+					  CI_TXD_QW1_CMD_S);
+		txq->tx_next_rs = (uint16_t)(txq->tx_rs_thresh - 1);
+		txq->tx_tail = 0;
+	}
+
+	/* Fill hardware descriptor ring with mbuf data */
+	ci_tx_fill_hw_ring(txq, tx_pkts + n, (uint16_t)(nb_pkts - n));
+	txq->tx_tail = (uint16_t)(txq->tx_tail + (nb_pkts - n));
+
+	/* Determine if RS bit needs to be set */
+	if (txq->tx_tail > txq->tx_next_rs) {
+		txr[txq->tx_next_rs].cmd_type_offset_bsz |=
+			rte_cpu_to_le_64(((uint64_t)CI_TX_DESC_CMD_RS) <<
+					  CI_TXD_QW1_CMD_S);
+		txq->tx_next_rs =
+			(uint16_t)(txq->tx_next_rs + txq->tx_rs_thresh);
+		if (txq->tx_next_rs >= txq->nb_tx_desc)
+			txq->tx_next_rs = (uint16_t)(txq->tx_rs_thresh - 1);
+	}
+
+	if (txq->tx_tail >= txq->nb_tx_desc)
+		txq->tx_tail = 0;
+
+	/* Update the tx tail register */
+	rte_write32_wc((uint32_t)txq->tx_tail, txq->qtx_tail);
+
+	return nb_pkts;
+}
+
+static __rte_always_inline uint16_t
+ci_xmit_pkts_simple(struct ci_tx_queue *txq,
+		     struct rte_mbuf **tx_pkts,
+		     uint16_t nb_pkts)
+{
+	uint16_t nb_tx = 0;
+
+	if (likely(nb_pkts <= CI_TX_MAX_BURST))
+		return ci_xmit_burst_simple(txq, tx_pkts, nb_pkts);
+
+	while (nb_pkts) {
+		uint16_t ret, num = RTE_MIN(nb_pkts, CI_TX_MAX_BURST);
+
+		ret = ci_xmit_burst_simple(txq, &tx_pkts[nb_tx], num);
+		nb_tx += ret;
+		nb_pkts -= ret;
+		if (ret < num)
+			break;
+	}
+
+	return nb_tx;
+}
+
 /*
  * Common transmit descriptor cleanup function for Intel drivers.
  *
diff --git a/drivers/net/intel/i40e/i40e_rxtx.c b/drivers/net/intel/i40e/i40e_rxtx.c
index 6b8d9fd70e..bedc78b9ff 100644
--- a/drivers/net/intel/i40e/i40e_rxtx.c
+++ b/drivers/net/intel/i40e/i40e_rxtx.c
@@ -1010,84 +1010,12 @@ i40e_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 			get_context_desc, NULL, NULL);
 }
 
-static inline uint16_t
-tx_xmit_pkts(struct ci_tx_queue *txq,
-	     struct rte_mbuf **tx_pkts,
-	     uint16_t nb_pkts)
-{
-	volatile struct ci_tx_desc *txr = txq->ci_tx_ring;
-	uint16_t n = 0;
-
-	/**
-	 * Begin scanning the H/W ring for done descriptors when the number
-	 * of available descriptors drops below tx_free_thresh. For each done
-	 * descriptor, free the associated buffer.
-	 */
-	if (txq->nb_tx_free < txq->tx_free_thresh)
-		ci_tx_free_bufs(txq);
-
-	/* Use available descriptor only */
-	nb_pkts = (uint16_t)RTE_MIN(txq->nb_tx_free, nb_pkts);
-	if (unlikely(!nb_pkts))
-		return 0;
-
-	txq->nb_tx_free = (uint16_t)(txq->nb_tx_free - nb_pkts);
-	if ((txq->tx_tail + nb_pkts) > txq->nb_tx_desc) {
-		n = (uint16_t)(txq->nb_tx_desc - txq->tx_tail);
-		ci_tx_fill_hw_ring(txq, tx_pkts, n);
-		txr[txq->tx_next_rs].cmd_type_offset_bsz |=
-			rte_cpu_to_le_64(((uint64_t)CI_TX_DESC_CMD_RS) << CI_TXD_QW1_CMD_S);
-		txq->tx_next_rs = (uint16_t)(txq->tx_rs_thresh - 1);
-		txq->tx_tail = 0;
-	}
-
-	/* Fill hardware descriptor ring with mbuf data */
-	ci_tx_fill_hw_ring(txq, tx_pkts + n, (uint16_t)(nb_pkts - n));
-	txq->tx_tail = (uint16_t)(txq->tx_tail + (nb_pkts - n));
-
-	/* Determine if RS bit needs to be set */
-	if (txq->tx_tail > txq->tx_next_rs) {
-		txr[txq->tx_next_rs].cmd_type_offset_bsz |=
-			rte_cpu_to_le_64(((uint64_t)CI_TX_DESC_CMD_RS) << CI_TXD_QW1_CMD_S);
-		txq->tx_next_rs =
-			(uint16_t)(txq->tx_next_rs + txq->tx_rs_thresh);
-		if (txq->tx_next_rs >= txq->nb_tx_desc)
-			txq->tx_next_rs = (uint16_t)(txq->tx_rs_thresh - 1);
-	}
-
-	if (txq->tx_tail >= txq->nb_tx_desc)
-		txq->tx_tail = 0;
-
-	/* Update the tx tail register */
-	I40E_PCI_REG_WC_WRITE(txq->qtx_tail, txq->tx_tail);
-
-	return nb_pkts;
-}
-
 static uint16_t
 i40e_xmit_pkts_simple(void *tx_queue,
 		      struct rte_mbuf **tx_pkts,
 		      uint16_t nb_pkts)
 {
-	uint16_t nb_tx = 0;
-
-	if (likely(nb_pkts <= CI_TX_MAX_BURST))
-		return tx_xmit_pkts((struct ci_tx_queue *)tx_queue,
-						tx_pkts, nb_pkts);
-
-	while (nb_pkts) {
-		uint16_t ret, num = (uint16_t)RTE_MIN(nb_pkts,
-						CI_TX_MAX_BURST);
-
-		ret = tx_xmit_pkts((struct ci_tx_queue *)tx_queue,
-						&tx_pkts[nb_tx], num);
-		nb_tx = (uint16_t)(nb_tx + ret);
-		nb_pkts = (uint16_t)(nb_pkts - ret);
-		if (ret < num)
-			break;
-	}
-
-	return nb_tx;
+	return ci_xmit_pkts_simple(tx_queue, tx_pkts, nb_pkts);
 }
 
 #ifndef RTE_ARCH_X86
diff --git a/drivers/net/intel/ice/ice_rxtx.c b/drivers/net/intel/ice/ice_rxtx.c
index a3a94033bf..2b82a16422 100644
--- a/drivers/net/intel/ice/ice_rxtx.c
+++ b/drivers/net/intel/ice/ice_rxtx.c
@@ -3245,84 +3245,12 @@ ice_tx_done_cleanup(void *txq, uint32_t free_cnt)
 		return ice_tx_done_cleanup_full(q, free_cnt);
 }
 
-static inline uint16_t
-tx_xmit_pkts(struct ci_tx_queue *txq,
-	     struct rte_mbuf **tx_pkts,
-	     uint16_t nb_pkts)
-{
-	volatile struct ci_tx_desc *txr = txq->ci_tx_ring;
-	uint16_t n = 0;
-
-	/**
-	 * Begin scanning the H/W ring for done descriptors when the number
-	 * of available descriptors drops below tx_free_thresh. For each done
-	 * descriptor, free the associated buffer.
-	 */
-	if (txq->nb_tx_free < txq->tx_free_thresh)
-		ci_tx_free_bufs(txq);
-
-	/* Use available descriptor only */
-	nb_pkts = (uint16_t)RTE_MIN(txq->nb_tx_free, nb_pkts);
-	if (unlikely(!nb_pkts))
-		return 0;
-
-	txq->nb_tx_free = (uint16_t)(txq->nb_tx_free - nb_pkts);
-	if ((txq->tx_tail + nb_pkts) > txq->nb_tx_desc) {
-		n = (uint16_t)(txq->nb_tx_desc - txq->tx_tail);
-		ci_tx_fill_hw_ring(txq, tx_pkts, n);
-		txr[txq->tx_next_rs].cmd_type_offset_bsz |=
-			rte_cpu_to_le_64(((uint64_t)CI_TX_DESC_CMD_RS) << CI_TXD_QW1_CMD_S);
-		txq->tx_next_rs = (uint16_t)(txq->tx_rs_thresh - 1);
-		txq->tx_tail = 0;
-	}
-
-	/* Fill hardware descriptor ring with mbuf data */
-	ci_tx_fill_hw_ring(txq, tx_pkts + n, (uint16_t)(nb_pkts - n));
-	txq->tx_tail = (uint16_t)(txq->tx_tail + (nb_pkts - n));
-
-	/* Determine if RS bit needs to be set */
-	if (txq->tx_tail > txq->tx_next_rs) {
-		txr[txq->tx_next_rs].cmd_type_offset_bsz |=
-			rte_cpu_to_le_64(((uint64_t)CI_TX_DESC_CMD_RS) << CI_TXD_QW1_CMD_S);
-		txq->tx_next_rs =
-			(uint16_t)(txq->tx_next_rs + txq->tx_rs_thresh);
-		if (txq->tx_next_rs >= txq->nb_tx_desc)
-			txq->tx_next_rs = (uint16_t)(txq->tx_rs_thresh - 1);
-	}
-
-	if (txq->tx_tail >= txq->nb_tx_desc)
-		txq->tx_tail = 0;
-
-	/* Update the tx tail register */
-	ICE_PCI_REG_WC_WRITE(txq->qtx_tail, txq->tx_tail);
-
-	return nb_pkts;
-}
-
 static uint16_t
 ice_xmit_pkts_simple(void *tx_queue,
 		     struct rte_mbuf **tx_pkts,
 		     uint16_t nb_pkts)
 {
-	uint16_t nb_tx = 0;
-
-	if (likely(nb_pkts <= CI_TX_MAX_BURST))
-		return tx_xmit_pkts((struct ci_tx_queue *)tx_queue,
-				    tx_pkts, nb_pkts);
-
-	while (nb_pkts) {
-		uint16_t ret, num = (uint16_t)RTE_MIN(nb_pkts,
-						      CI_TX_MAX_BURST);
-
-		ret = tx_xmit_pkts((struct ci_tx_queue *)tx_queue,
-				   &tx_pkts[nb_tx], num);
-		nb_tx = (uint16_t)(nb_tx + ret);
-		nb_pkts = (uint16_t)(nb_pkts - ret);
-		if (ret < num)
-			break;
-	}
-
-	return nb_tx;
+	return ci_xmit_pkts_simple(tx_queue, tx_pkts, nb_pkts);
 }
 
 static const struct ci_rx_path_info ice_rx_path_infos[] = {
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [PATCH v4 31/35] net/intel: use non-volatile stores in simple Tx function
  2026-02-09 16:44 ` [PATCH v4 00/35] combine multiple Intel scalar Tx paths Bruce Richardson
                     ` (29 preceding siblings ...)
  2026-02-09 16:45   ` [PATCH v4 30/35] net/intel: complete merging simple Tx paths Bruce Richardson
@ 2026-02-09 16:45   ` Bruce Richardson
  2026-02-09 23:19     ` Medvedkin, Vladimir
  2026-02-09 16:45   ` [PATCH v4 32/35] net/intel: align scalar simple Tx path with vector logic Bruce Richardson
                     ` (3 subsequent siblings)
  34 siblings, 1 reply; 274+ messages in thread
From: Bruce Richardson @ 2026-02-09 16:45 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson

The simple Tx code path can be reworked to use non-volatile stores - as
is the case with the full-featured Tx path - by reusing the existing
write_txd function (which just needs to be moved up in the header file).
This gives a small performance boost.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/net/intel/common/tx_scalar.h | 55 ++++++++--------------------
 1 file changed, 16 insertions(+), 39 deletions(-)

diff --git a/drivers/net/intel/common/tx_scalar.h b/drivers/net/intel/common/tx_scalar.h
index 4ba97303cb..918cd806a4 100644
--- a/drivers/net/intel/common/tx_scalar.h
+++ b/drivers/net/intel/common/tx_scalar.h
@@ -12,35 +12,13 @@
 /* depends on common Tx definitions. */
 #include "tx.h"
 
-/* Populate 4 descriptors with data from 4 mbufs */
 static inline void
-ci_tx_fill_hw_ring_tx4(volatile struct ci_tx_desc *txdp, struct rte_mbuf **pkts)
+write_txd(volatile void *txd, uint64_t qw0, uint64_t qw1)
 {
-	uint64_t dma_addr;
-	uint32_t i;
-
-	for (i = 0; i < 4; i++, txdp++, pkts++) {
-		dma_addr = rte_mbuf_data_iova(*pkts);
-		txdp->buffer_addr = rte_cpu_to_le_64(dma_addr);
-		txdp->cmd_type_offset_bsz =
-			rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DATA |
-				((uint64_t)CI_TX_DESC_CMD_DEFAULT << CI_TXD_QW1_CMD_S) |
-				((uint64_t)(*pkts)->data_len << CI_TXD_QW1_TX_BUF_SZ_S));
-	}
-}
+	uint64_t *txd_qw =  __rte_assume_aligned(RTE_CAST_PTR(void *, txd), 16);
 
-/* Populate 1 descriptor with data from 1 mbuf */
-static inline void
-ci_tx_fill_hw_ring_tx1(volatile struct ci_tx_desc *txdp, struct rte_mbuf **pkts)
-{
-	uint64_t dma_addr;
-
-	dma_addr = rte_mbuf_data_iova(*pkts);
-	txdp->buffer_addr = rte_cpu_to_le_64(dma_addr);
-	txdp->cmd_type_offset_bsz =
-		rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DATA |
-			((uint64_t)CI_TX_DESC_CMD_DEFAULT << CI_TXD_QW1_CMD_S) |
-			((uint64_t)(*pkts)->data_len << CI_TXD_QW1_TX_BUF_SZ_S));
+	txd_qw[0] = rte_cpu_to_le_64(qw0);
+	txd_qw[1] = rte_cpu_to_le_64(qw1);
 }
 
 /* Fill hardware descriptor ring with mbuf data */
@@ -60,14 +38,22 @@ ci_tx_fill_hw_ring(struct ci_tx_queue *txq, struct rte_mbuf **pkts,
 	for (i = 0; i < mainpart; i += N_PER_LOOP) {
 		for (j = 0; j < N_PER_LOOP; ++j)
 			(txep + i + j)->mbuf = *(pkts + i + j);
-		ci_tx_fill_hw_ring_tx4(txdp + i, pkts + i);
+		for (j = 0; j < N_PER_LOOP; ++j)
+			write_txd(txdp + i + j, rte_mbuf_data_iova(*(pkts + i + j)),
+				CI_TX_DESC_DTYPE_DATA |
+				((uint64_t)CI_TX_DESC_CMD_DEFAULT << CI_TXD_QW1_CMD_S) |
+				((uint64_t)(*(pkts + i + j))->data_len << CI_TXD_QW1_TX_BUF_SZ_S));
 	}
 
 	if (unlikely(leftover > 0)) {
 		for (i = 0; i < leftover; ++i) {
-			(txep + mainpart + i)->mbuf = *(pkts + mainpart + i);
-			ci_tx_fill_hw_ring_tx1(txdp + mainpart + i,
-					       pkts + mainpart + i);
+			uint16_t idx = mainpart + i;
+			(txep + idx)->mbuf = *(pkts + idx);
+			write_txd(txdp + idx, rte_mbuf_data_iova(*(pkts + idx)),
+				CI_TX_DESC_DTYPE_DATA |
+				((uint64_t)CI_TX_DESC_CMD_DEFAULT << CI_TXD_QW1_CMD_S) |
+				((uint64_t)(*(pkts + idx))->data_len << CI_TXD_QW1_TX_BUF_SZ_S));
+
 		}
 	}
 }
@@ -364,15 +350,6 @@ struct ci_timestamp_queue_fns {
 	write_ts_tail_t write_ts_tail;
 };
 
-static inline void
-write_txd(volatile void *txd, uint64_t qw0, uint64_t qw1)
-{
-	uint64_t *txd_qw = __rte_assume_aligned(RTE_CAST_PTR(void *, txd), 16);
-
-	txd_qw[0] = rte_cpu_to_le_64(qw0);
-	txd_qw[1] = rte_cpu_to_le_64(qw1);
-}
-
 static inline uint16_t
 ci_xmit_pkts(struct ci_tx_queue *txq,
 	     struct rte_mbuf **tx_pkts,
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [PATCH v4 32/35] net/intel: align scalar simple Tx path with vector logic
  2026-02-09 16:44 ` [PATCH v4 00/35] combine multiple Intel scalar Tx paths Bruce Richardson
                     ` (30 preceding siblings ...)
  2026-02-09 16:45   ` [PATCH v4 31/35] net/intel: use non-volatile stores in simple Tx function Bruce Richardson
@ 2026-02-09 16:45   ` Bruce Richardson
  2026-02-09 23:19     ` Medvedkin, Vladimir
  2026-02-09 16:45   ` [PATCH v4 33/35] net/intel: use vector SW ring entry for simple path Bruce Richardson
                     ` (2 subsequent siblings)
  34 siblings, 1 reply; 274+ messages in thread
From: Bruce Richardson @ 2026-02-09 16:45 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson

The scalar simple Tx path has the same restrictions as the vector Tx
path, so we can use the same logic flow in both, helping to ensure we
get the best performance from the scalar path.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/net/intel/common/tx_scalar.h | 54 +++++++++++++++++-----------
 1 file changed, 34 insertions(+), 20 deletions(-)

diff --git a/drivers/net/intel/common/tx_scalar.h b/drivers/net/intel/common/tx_scalar.h
index 918cd806a4..2405d6a2f0 100644
--- a/drivers/net/intel/common/tx_scalar.h
+++ b/drivers/net/intel/common/tx_scalar.h
@@ -21,13 +21,11 @@ write_txd(volatile void *txd, uint64_t qw0, uint64_t qw1)
 	txd_qw[1] = rte_cpu_to_le_64(qw1);
 }
 
-/* Fill hardware descriptor ring with mbuf data */
+/* Fill hardware descriptor ring with mbuf data (simple path) */
 static inline void
-ci_tx_fill_hw_ring(struct ci_tx_queue *txq, struct rte_mbuf **pkts,
-		   uint16_t nb_pkts)
+ci_tx_fill_hw_ring_simple(volatile struct ci_tx_desc *txdp, struct rte_mbuf **pkts,
+			  uint16_t nb_pkts)
 {
-	volatile struct ci_tx_desc *txdp = &txq->ci_tx_ring[txq->tx_tail];
-	struct ci_tx_entry *txep = &txq->sw_ring[txq->tx_tail];
 	const int N_PER_LOOP = 4;
 	const int N_PER_LOOP_MASK = N_PER_LOOP - 1;
 	int mainpart, leftover;
@@ -36,8 +34,6 @@ ci_tx_fill_hw_ring(struct ci_tx_queue *txq, struct rte_mbuf **pkts,
 	mainpart = nb_pkts & ((uint32_t)~N_PER_LOOP_MASK);
 	leftover = nb_pkts & ((uint32_t)N_PER_LOOP_MASK);
 	for (i = 0; i < mainpart; i += N_PER_LOOP) {
-		for (j = 0; j < N_PER_LOOP; ++j)
-			(txep + i + j)->mbuf = *(pkts + i + j);
 		for (j = 0; j < N_PER_LOOP; ++j)
 			write_txd(txdp + i + j, rte_mbuf_data_iova(*(pkts + i + j)),
 				CI_TX_DESC_DTYPE_DATA |
@@ -48,12 +44,10 @@ ci_tx_fill_hw_ring(struct ci_tx_queue *txq, struct rte_mbuf **pkts,
 	if (unlikely(leftover > 0)) {
 		for (i = 0; i < leftover; ++i) {
 			uint16_t idx = mainpart + i;
-			(txep + idx)->mbuf = *(pkts + idx);
 			write_txd(txdp + idx, rte_mbuf_data_iova(*(pkts + idx)),
 				CI_TX_DESC_DTYPE_DATA |
 				((uint64_t)CI_TX_DESC_CMD_DEFAULT << CI_TXD_QW1_CMD_S) |
 				((uint64_t)(*(pkts + idx))->data_len << CI_TXD_QW1_TX_BUF_SZ_S));
-
 		}
 	}
 }
@@ -130,6 +124,9 @@ ci_xmit_burst_simple(struct ci_tx_queue *txq,
 		     uint16_t nb_pkts)
 {
 	volatile struct ci_tx_desc *txr = txq->ci_tx_ring;
+	volatile struct ci_tx_desc *txdp;
+	struct ci_tx_entry *txep;
+	uint16_t tx_id;
 	uint16_t n = 0;
 
 	/**
@@ -145,23 +142,41 @@ ci_xmit_burst_simple(struct ci_tx_queue *txq,
 	if (unlikely(!nb_pkts))
 		return 0;
 
+	tx_id = txq->tx_tail;
+	txdp = &txr[tx_id];
+	txep = &txq->sw_ring[tx_id];
+
 	txq->nb_tx_free = (uint16_t)(txq->nb_tx_free - nb_pkts);
-	if ((txq->tx_tail + nb_pkts) > txq->nb_tx_desc) {
-		n = (uint16_t)(txq->nb_tx_desc - txq->tx_tail);
-		ci_tx_fill_hw_ring(txq, tx_pkts, n);
+
+	if ((tx_id + nb_pkts) > txq->nb_tx_desc) {
+		n = (uint16_t)(txq->nb_tx_desc - tx_id);
+
+		/* Store mbufs in backlog */
+		ci_tx_backlog_entry(txep, tx_pkts, n);
+
+		/* Write descriptors to HW ring */
+		ci_tx_fill_hw_ring_simple(txdp, tx_pkts, n);
+
 		txr[txq->tx_next_rs].cmd_type_offset_bsz |=
 			rte_cpu_to_le_64(((uint64_t)CI_TX_DESC_CMD_RS) <<
 					  CI_TXD_QW1_CMD_S);
 		txq->tx_next_rs = (uint16_t)(txq->tx_rs_thresh - 1);
-		txq->tx_tail = 0;
+
+		tx_id = 0;
+		txdp = &txr[tx_id];
+		txep = &txq->sw_ring[tx_id];
 	}
 
-	/* Fill hardware descriptor ring with mbuf data */
-	ci_tx_fill_hw_ring(txq, tx_pkts + n, (uint16_t)(nb_pkts - n));
-	txq->tx_tail = (uint16_t)(txq->tx_tail + (nb_pkts - n));
+	/* Store remaining mbufs in backlog */
+	ci_tx_backlog_entry(txep, tx_pkts + n, (uint16_t)(nb_pkts - n));
+
+	/* Write remaining descriptors to HW ring */
+	ci_tx_fill_hw_ring_simple(txdp, tx_pkts + n, (uint16_t)(nb_pkts - n));
+
+	tx_id = (uint16_t)(tx_id + (nb_pkts - n));
 
 	/* Determine if RS bit needs to be set */
-	if (txq->tx_tail > txq->tx_next_rs) {
+	if (tx_id > txq->tx_next_rs) {
 		txr[txq->tx_next_rs].cmd_type_offset_bsz |=
 			rte_cpu_to_le_64(((uint64_t)CI_TX_DESC_CMD_RS) <<
 					  CI_TXD_QW1_CMD_S);
@@ -171,11 +186,10 @@ ci_xmit_burst_simple(struct ci_tx_queue *txq,
 			txq->tx_next_rs = (uint16_t)(txq->tx_rs_thresh - 1);
 	}
 
-	if (txq->tx_tail >= txq->nb_tx_desc)
-		txq->tx_tail = 0;
+	txq->tx_tail = tx_id;
 
 	/* Update the tx tail register */
-	rte_write32_wc((uint32_t)txq->tx_tail, txq->qtx_tail);
+	rte_write32_wc((uint32_t)tx_id, txq->qtx_tail);
 
 	return nb_pkts;
 }
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [PATCH v4 33/35] net/intel: use vector SW ring entry for simple path
  2026-02-09 16:44 ` [PATCH v4 00/35] combine multiple Intel scalar Tx paths Bruce Richardson
                     ` (31 preceding siblings ...)
  2026-02-09 16:45   ` [PATCH v4 32/35] net/intel: align scalar simple Tx path with vector logic Bruce Richardson
@ 2026-02-09 16:45   ` Bruce Richardson
  2026-02-09 23:19     ` Medvedkin, Vladimir
  2026-02-09 16:45   ` [PATCH v4 34/35] net/intel: use vector mbuf cleanup from simple scalar path Bruce Richardson
  2026-02-09 16:45   ` [PATCH v4 35/35] net/idpf: enable simple Tx function Bruce Richardson
  34 siblings, 1 reply; 274+ messages in thread
From: Bruce Richardson @ 2026-02-09 16:45 UTC (permalink / raw)
  To: dev
  Cc: Bruce Richardson, Praveen Shetty, Vladimir Medvedkin,
	Anatoly Burakov, Jingjing Wu

The simple scalar Tx path does not need to use the full sw_entry
structure that the full Tx path uses, so rename the "vector_tx" flag
to "use_vec_entry", since its sole purpose is to flag the use of the
smaller tx_entry_vec structure. Then set this flag for the simple Tx
path, giving us a performance boost.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/net/intel/common/tx.h                    |  6 ++++--
 drivers/net/intel/common/tx_scalar.h             | 14 +++++++-------
 drivers/net/intel/cpfl/cpfl_rxtx.c               |  4 ++--
 drivers/net/intel/i40e/i40e_rxtx.c               |  2 +-
 drivers/net/intel/iavf/iavf_rxtx_vec_avx2.c      |  2 +-
 drivers/net/intel/ice/ice_rxtx.c                 |  2 +-
 drivers/net/intel/idpf/idpf_common_rxtx_avx512.c |  2 +-
 drivers/net/intel/ixgbe/ixgbe_rxtx_vec_common.c  |  2 +-
 8 files changed, 18 insertions(+), 16 deletions(-)

diff --git a/drivers/net/intel/common/tx.h b/drivers/net/intel/common/tx.h
index 67378a0803..62d919d338 100644
--- a/drivers/net/intel/common/tx.h
+++ b/drivers/net/intel/common/tx.h
@@ -166,7 +166,7 @@ struct ci_tx_queue {
 	rte_iova_t tx_ring_dma;        /* TX ring DMA address */
 	bool tx_deferred_start; /* don't start this queue in dev start */
 	bool q_set;             /* indicate if tx queue has been configured */
-	bool vector_tx;         /* port is using vector TX */
+	bool use_vec_entry;     /* use sw_ring_vec (true for vector and simple paths) */
 	union {                  /* the VSI this queue belongs to */
 		struct i40e_vsi *i40e_vsi;
 		struct iavf_vsi *iavf_vsi;
@@ -354,7 +354,8 @@ ci_txq_release_all_mbufs(struct ci_tx_queue *txq, bool use_ctx)
 	if (unlikely(!txq || !txq->sw_ring))
 		return;
 
-	if (!txq->vector_tx) {
+	if (!txq->use_vec_entry) {
+		/* Regular scalar path uses sw_ring with ci_tx_entry */
 		for (uint16_t i = 0; i < txq->nb_tx_desc; i++) {
 			if (txq->sw_ring[i].mbuf != NULL) {
 				rte_pktmbuf_free_seg(txq->sw_ring[i].mbuf);
@@ -365,6 +366,7 @@ ci_txq_release_all_mbufs(struct ci_tx_queue *txq, bool use_ctx)
 	}
 
 	/**
+	 *  Vector and simple paths use sw_ring_vec (ci_tx_entry_vec).
 	 *  vPMD tx will not set sw_ring's mbuf to NULL after free,
 	 *  so determining buffers to free is a little more complex.
 	 */
diff --git a/drivers/net/intel/common/tx_scalar.h b/drivers/net/intel/common/tx_scalar.h
index 2405d6a2f0..02c60cdaff 100644
--- a/drivers/net/intel/common/tx_scalar.h
+++ b/drivers/net/intel/common/tx_scalar.h
@@ -60,14 +60,14 @@ ci_tx_free_bufs(struct ci_tx_queue *txq)
 	const uint16_t k = RTE_ALIGN_FLOOR(rs_thresh, CI_TX_MAX_FREE_BUF_SZ);
 	const uint16_t m = rs_thresh % CI_TX_MAX_FREE_BUF_SZ;
 	struct rte_mbuf *free[CI_TX_MAX_FREE_BUF_SZ];
-	struct ci_tx_entry *txep;
+	struct ci_tx_entry_vec *txep;
 
 	if ((txq->ci_tx_ring[txq->tx_next_dd].cmd_type_offset_bsz &
 			rte_cpu_to_le_64(CI_TXD_QW1_DTYPE_M)) !=
 			rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE))
 		return 0;
 
-	txep = &txq->sw_ring[txq->tx_next_dd - (rs_thresh - 1)];
+	txep = &txq->sw_ring_vec[txq->tx_next_dd - (rs_thresh - 1)];
 
 	struct rte_mempool *fast_free_mp =
 			likely(txq->fast_free_mp != (void *)UINTPTR_MAX) ?
@@ -125,7 +125,7 @@ ci_xmit_burst_simple(struct ci_tx_queue *txq,
 {
 	volatile struct ci_tx_desc *txr = txq->ci_tx_ring;
 	volatile struct ci_tx_desc *txdp;
-	struct ci_tx_entry *txep;
+	struct ci_tx_entry_vec *txep;
 	uint16_t tx_id;
 	uint16_t n = 0;
 
@@ -144,7 +144,7 @@ ci_xmit_burst_simple(struct ci_tx_queue *txq,
 
 	tx_id = txq->tx_tail;
 	txdp = &txr[tx_id];
-	txep = &txq->sw_ring[tx_id];
+	txep = &txq->sw_ring_vec[tx_id];
 
 	txq->nb_tx_free = (uint16_t)(txq->nb_tx_free - nb_pkts);
 
@@ -152,7 +152,7 @@ ci_xmit_burst_simple(struct ci_tx_queue *txq,
 		n = (uint16_t)(txq->nb_tx_desc - tx_id);
 
 		/* Store mbufs in backlog */
-		ci_tx_backlog_entry(txep, tx_pkts, n);
+		ci_tx_backlog_entry_vec(txep, tx_pkts, n);
 
 		/* Write descriptors to HW ring */
 		ci_tx_fill_hw_ring_simple(txdp, tx_pkts, n);
@@ -164,11 +164,11 @@ ci_xmit_burst_simple(struct ci_tx_queue *txq,
 
 		tx_id = 0;
 		txdp = &txr[tx_id];
-		txep = &txq->sw_ring[tx_id];
+		txep = &txq->sw_ring_vec[tx_id];
 	}
 
 	/* Store remaining mbufs in backlog */
-	ci_tx_backlog_entry(txep, tx_pkts + n, (uint16_t)(nb_pkts - n));
+	ci_tx_backlog_entry_vec(txep, tx_pkts + n, (uint16_t)(nb_pkts - n));
 
 	/* Write remaining descriptors to HW ring */
 	ci_tx_fill_hw_ring_simple(txdp, tx_pkts + n, (uint16_t)(nb_pkts - n));
diff --git a/drivers/net/intel/cpfl/cpfl_rxtx.c b/drivers/net/intel/cpfl/cpfl_rxtx.c
index e7a98ed4f6..b5b9015310 100644
--- a/drivers/net/intel/cpfl/cpfl_rxtx.c
+++ b/drivers/net/intel/cpfl/cpfl_rxtx.c
@@ -329,7 +329,7 @@ cpfl_tx_queue_release(void *txq)
 		rte_free(q->complq);
 	}
 
-	ci_txq_release_all_mbufs(q, q->vector_tx);
+	ci_txq_release_all_mbufs(q, q->use_vec_entry);
 	rte_free(q->sw_ring);
 	rte_free(q->rs_last_id);
 	rte_memzone_free(q->mz);
@@ -1364,7 +1364,7 @@ cpfl_tx_queue_stop(struct rte_eth_dev *dev, uint16_t tx_queue_id)
 	}
 
 	txq = &cpfl_txq->base;
-	ci_txq_release_all_mbufs(txq, txq->vector_tx);
+	ci_txq_release_all_mbufs(txq, txq->use_vec_entry);
 	if (vport->txq_model == VIRTCHNL2_QUEUE_MODEL_SINGLE) {
 		idpf_qc_single_tx_queue_reset(txq);
 	} else {
diff --git a/drivers/net/intel/i40e/i40e_rxtx.c b/drivers/net/intel/i40e/i40e_rxtx.c
index bedc78b9ff..155eec210e 100644
--- a/drivers/net/intel/i40e/i40e_rxtx.c
+++ b/drivers/net/intel/i40e/i40e_rxtx.c
@@ -1451,7 +1451,7 @@ i40e_dev_tx_queue_start(struct rte_eth_dev *dev, uint16_t tx_queue_id)
 		PMD_DRV_LOG(WARNING, "TX queue %u is deferred start",
 			    tx_queue_id);
 
-	txq->vector_tx = ad->tx_vec_allowed;
+	txq->use_vec_entry = ad->tx_vec_allowed || ad->tx_simple_allowed;
 
 	/*
 	 * tx_queue_id is queue id application refers to, while
diff --git a/drivers/net/intel/iavf/iavf_rxtx_vec_avx2.c b/drivers/net/intel/iavf/iavf_rxtx_vec_avx2.c
index cea4ee9863..374c713a94 100644
--- a/drivers/net/intel/iavf/iavf_rxtx_vec_avx2.c
+++ b/drivers/net/intel/iavf/iavf_rxtx_vec_avx2.c
@@ -1803,7 +1803,7 @@ iavf_xmit_pkts_vec_avx2_offload(void *tx_queue, struct rte_mbuf **tx_pkts,
 int __rte_cold
 iavf_txq_vec_setup(struct ci_tx_queue *txq)
 {
-	txq->vector_tx = true;
+	txq->use_vec_entry = true;
 	return 0;
 }
 
diff --git a/drivers/net/intel/ice/ice_rxtx.c b/drivers/net/intel/ice/ice_rxtx.c
index 2b82a16422..0fc7237234 100644
--- a/drivers/net/intel/ice/ice_rxtx.c
+++ b/drivers/net/intel/ice/ice_rxtx.c
@@ -882,7 +882,7 @@ ice_tx_queue_start(struct rte_eth_dev *dev, uint16_t tx_queue_id)
 		}
 
 	/* record what kind of descriptor cleanup we need on teardown */
-	txq->vector_tx = ad->tx_vec_allowed;
+	txq->use_vec_entry = ad->tx_vec_allowed || ad->tx_simple_allowed;
 
 	if (txq->tsq != NULL && txq->tsq->ts_flag > 0) {
 		struct ice_aqc_set_txtime_qgrp *ts_elem;
diff --git a/drivers/net/intel/idpf/idpf_common_rxtx_avx512.c b/drivers/net/intel/idpf/idpf_common_rxtx_avx512.c
index 49ace35615..666ad1a4dd 100644
--- a/drivers/net/intel/idpf/idpf_common_rxtx_avx512.c
+++ b/drivers/net/intel/idpf/idpf_common_rxtx_avx512.c
@@ -1365,6 +1365,6 @@ idpf_qc_tx_vec_avx512_setup(struct ci_tx_queue *txq)
 	if (!txq)
 		return 0;
 
-	txq->vector_tx = true;
+	txq->use_vec_entry = true;
 	return 0;
 }
diff --git a/drivers/net/intel/ixgbe/ixgbe_rxtx_vec_common.c b/drivers/net/intel/ixgbe/ixgbe_rxtx_vec_common.c
index 63c7cb50d3..c42b8fc96b 100644
--- a/drivers/net/intel/ixgbe/ixgbe_rxtx_vec_common.c
+++ b/drivers/net/intel/ixgbe/ixgbe_rxtx_vec_common.c
@@ -111,7 +111,7 @@ ixgbe_txq_vec_setup(struct ci_tx_queue *txq)
 	/* leave the first one for overflow */
 	txq->sw_ring_vec = txq->sw_ring_vec + 1;
 	txq->ops = &vec_txq_ops;
-	txq->vector_tx = 1;
+	txq->use_vec_entry = true;
 
 	return 0;
 }
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [PATCH v4 34/35] net/intel: use vector mbuf cleanup from simple scalar path
  2026-02-09 16:44 ` [PATCH v4 00/35] combine multiple Intel scalar Tx paths Bruce Richardson
                     ` (32 preceding siblings ...)
  2026-02-09 16:45   ` [PATCH v4 33/35] net/intel: use vector SW ring entry for simple path Bruce Richardson
@ 2026-02-09 16:45   ` Bruce Richardson
  2026-02-09 23:19     ` Medvedkin, Vladimir
  2026-02-09 16:45   ` [PATCH v4 35/35] net/idpf: enable simple Tx function Bruce Richardson
  34 siblings, 1 reply; 274+ messages in thread
From: Bruce Richardson @ 2026-02-09 16:45 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson, Anatoly Burakov

Since the simple scalar path now uses the vector Tx entry struct, we can
leverage the vector mbuf cleanup function from that path and avoid
having a separate cleanup function for it.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/net/intel/common/tx_scalar.h | 74 ++++++----------------------
 drivers/net/intel/i40e/i40e_rxtx.c   |  2 +-
 drivers/net/intel/ice/ice_rxtx.c     |  2 +-
 3 files changed, 17 insertions(+), 61 deletions(-)

diff --git a/drivers/net/intel/common/tx_scalar.h b/drivers/net/intel/common/tx_scalar.h
index 02c60cdaff..b6297437aa 100644
--- a/drivers/net/intel/common/tx_scalar.h
+++ b/drivers/net/intel/common/tx_scalar.h
@@ -21,6 +21,20 @@ write_txd(volatile void *txd, uint64_t qw0, uint64_t qw1)
 	txd_qw[1] = rte_cpu_to_le_64(qw1);
 }
 
+static __rte_always_inline int
+ci_tx_desc_done_simple(struct ci_tx_queue *txq, uint16_t idx)
+{
+	return (txq->ci_tx_ring[idx].cmd_type_offset_bsz & rte_cpu_to_le_64(CI_TXD_QW1_DTYPE_M)) ==
+			rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE);
+}
+
+/* Free transmitted mbufs using vector-style cleanup */
+static __rte_always_inline int
+ci_tx_free_bufs_simple(struct ci_tx_queue *txq)
+{
+	return ci_tx_free_bufs_vec(txq, ci_tx_desc_done_simple, false);
+}
+
 /* Fill hardware descriptor ring with mbuf data (simple path) */
 static inline void
 ci_tx_fill_hw_ring_simple(volatile struct ci_tx_desc *txdp, struct rte_mbuf **pkts,
@@ -52,64 +66,6 @@ ci_tx_fill_hw_ring_simple(volatile struct ci_tx_desc *txdp, struct rte_mbuf **pk
 	}
 }
 
-/* Free transmitted mbufs from descriptor ring with bulk freeing for Tx simple path */
-static __rte_always_inline int
-ci_tx_free_bufs(struct ci_tx_queue *txq)
-{
-	const uint16_t rs_thresh = txq->tx_rs_thresh;
-	const uint16_t k = RTE_ALIGN_FLOOR(rs_thresh, CI_TX_MAX_FREE_BUF_SZ);
-	const uint16_t m = rs_thresh % CI_TX_MAX_FREE_BUF_SZ;
-	struct rte_mbuf *free[CI_TX_MAX_FREE_BUF_SZ];
-	struct ci_tx_entry_vec *txep;
-
-	if ((txq->ci_tx_ring[txq->tx_next_dd].cmd_type_offset_bsz &
-			rte_cpu_to_le_64(CI_TXD_QW1_DTYPE_M)) !=
-			rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE))
-		return 0;
-
-	txep = &txq->sw_ring_vec[txq->tx_next_dd - (rs_thresh - 1)];
-
-	struct rte_mempool *fast_free_mp =
-			likely(txq->fast_free_mp != (void *)UINTPTR_MAX) ?
-				txq->fast_free_mp :
-				(txq->fast_free_mp = txep[0].mbuf->pool);
-
-	if (fast_free_mp) {
-		if (k) {
-			for (uint16_t j = 0; j != k; j += CI_TX_MAX_FREE_BUF_SZ) {
-				for (uint16_t i = 0; i < CI_TX_MAX_FREE_BUF_SZ; ++i, ++txep) {
-					free[i] = txep->mbuf;
-					txep->mbuf = NULL;
-				}
-				rte_mbuf_raw_free_bulk(fast_free_mp, free, CI_TX_MAX_FREE_BUF_SZ);
-			}
-		}
-
-		if (m) {
-			for (uint16_t i = 0; i < m; ++i, ++txep) {
-				free[i] = txep->mbuf;
-				txep->mbuf = NULL;
-			}
-			rte_mbuf_raw_free_bulk(fast_free_mp, free, m);
-		}
-	} else {
-		for (uint16_t i = 0; i < rs_thresh; ++i, ++txep)
-			rte_prefetch0((txep + i)->mbuf);
-
-		for (uint16_t i = 0; i < rs_thresh; ++i, ++txep) {
-			rte_pktmbuf_free_seg(txep->mbuf);
-			txep->mbuf = NULL;
-		}
-	}
-
-	txq->nb_tx_free = (uint16_t)(txq->nb_tx_free + rs_thresh);
-	txq->tx_next_dd = (uint16_t)(txq->tx_next_dd + rs_thresh);
-	if (txq->tx_next_dd >= txq->nb_tx_desc)
-		txq->tx_next_dd = (uint16_t)(rs_thresh - 1);
-
-	return rs_thresh;
-}
-
 /* Simple burst transmit for descriptor-based simple Tx path
  *
  * Transmits a burst of packets by filling hardware descriptors with mbuf
@@ -135,7 +91,7 @@ ci_xmit_burst_simple(struct ci_tx_queue *txq,
 	 * descriptor, free the associated buffer.
 	 */
 	if (txq->nb_tx_free < txq->tx_free_thresh)
-		ci_tx_free_bufs(txq);
+		ci_tx_free_bufs_simple(txq);
 
 	/* Use available descriptor only */
 	nb_pkts = (uint16_t)RTE_MIN(txq->nb_tx_free, nb_pkts);
diff --git a/drivers/net/intel/i40e/i40e_rxtx.c b/drivers/net/intel/i40e/i40e_rxtx.c
index 155eec210e..ffb303158b 100644
--- a/drivers/net/intel/i40e/i40e_rxtx.c
+++ b/drivers/net/intel/i40e/i40e_rxtx.c
@@ -2377,7 +2377,7 @@ i40e_tx_done_cleanup_simple(struct ci_tx_queue *txq,
 		if (txq->nb_tx_desc - txq->nb_tx_free < txq->tx_rs_thresh)
 			break;
 
-		n = ci_tx_free_bufs(txq);
+		n = ci_tx_free_bufs_simple(txq);
 
 		if (n == 0)
 			break;
diff --git a/drivers/net/intel/ice/ice_rxtx.c b/drivers/net/intel/ice/ice_rxtx.c
index 0fc7237234..321415d839 100644
--- a/drivers/net/intel/ice/ice_rxtx.c
+++ b/drivers/net/intel/ice/ice_rxtx.c
@@ -3218,7 +3218,7 @@ ice_tx_done_cleanup_simple(struct ci_tx_queue *txq,
 		if (txq->nb_tx_desc - txq->nb_tx_free < txq->tx_rs_thresh)
 			break;
 
-		n = ci_tx_free_bufs(txq);
+		n = ci_tx_free_bufs_simple(txq);
 
 		if (n == 0)
 			break;
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [PATCH v4 35/35] net/idpf: enable simple Tx function
  2026-02-09 16:44 ` [PATCH v4 00/35] combine multiple Intel scalar Tx paths Bruce Richardson
                     ` (33 preceding siblings ...)
  2026-02-09 16:45   ` [PATCH v4 34/35] net/intel: use vector mbuf cleanup from simple scalar path Bruce Richardson
@ 2026-02-09 16:45   ` Bruce Richardson
  2026-02-09 23:20     ` Medvedkin, Vladimir
  34 siblings, 1 reply; 274+ messages in thread
From: Bruce Richardson @ 2026-02-09 16:45 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson, Jingjing Wu, Praveen Shetty

The common "simple Tx" function - in some ways a scalar version of the
vector Tx functions - can be used by the idpf driver as well as i40e and
ice, so add support for it to the driver.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/net/intel/idpf/idpf_common_device.h |  2 ++
 drivers/net/intel/idpf/idpf_common_rxtx.c   | 19 +++++++++++++++
 drivers/net/intel/idpf/idpf_common_rxtx.h   |  3 +++
 drivers/net/intel/idpf/idpf_rxtx.c          | 26 ++++++++++++++++++++-
 4 files changed, 49 insertions(+), 1 deletion(-)

diff --git a/drivers/net/intel/idpf/idpf_common_device.h b/drivers/net/intel/idpf/idpf_common_device.h
index 31915a03d4..527aa9b3dc 100644
--- a/drivers/net/intel/idpf/idpf_common_device.h
+++ b/drivers/net/intel/idpf/idpf_common_device.h
@@ -78,6 +78,7 @@ enum idpf_rx_func_type {
 enum idpf_tx_func_type {
 	IDPF_TX_DEFAULT,
 	IDPF_TX_SINGLEQ,
+	IDPF_TX_SINGLEQ_SIMPLE,
 	IDPF_TX_SINGLEQ_AVX2,
 	IDPF_TX_AVX512,
 	IDPF_TX_SINGLEQ_AVX512,
@@ -100,6 +101,7 @@ struct idpf_adapter {
 
 	bool is_tx_singleq; /* true - single queue model, false - split queue model */
 	bool is_rx_singleq; /* true - single queue model, false - split queue model */
+	bool tx_simple_allowed; /* true if all queues support simple TX */
 
 	/* For timestamp */
 	uint64_t time_hw;
diff --git a/drivers/net/intel/idpf/idpf_common_rxtx.c b/drivers/net/intel/idpf/idpf_common_rxtx.c
index bd77113551..0da2506bf0 100644
--- a/drivers/net/intel/idpf/idpf_common_rxtx.c
+++ b/drivers/net/intel/idpf/idpf_common_rxtx.c
@@ -1347,6 +1347,15 @@ idpf_dp_singleq_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 			idpf_set_tso_ctx, NULL, NULL);
 }
 
+RTE_EXPORT_INTERNAL_SYMBOL(idpf_dp_singleq_xmit_pkts_simple)
+uint16_t
+idpf_dp_singleq_xmit_pkts_simple(void *tx_queue, struct rte_mbuf **tx_pkts,
+				   uint16_t nb_pkts)
+{
+	return ci_xmit_pkts_simple(tx_queue, tx_pkts, nb_pkts);
+}
+
+
 /* TX prep functions */
 RTE_EXPORT_INTERNAL_SYMBOL(idpf_dp_prep_pkts)
 uint16_t
@@ -1532,6 +1541,16 @@ const struct ci_tx_path_info idpf_tx_path_infos[] = {
 			.single_queue = true
 		}
 	},
+	[IDPF_TX_SINGLEQ_SIMPLE] = {
+		.pkt_burst = idpf_dp_singleq_xmit_pkts_simple,
+		.info = "Single Queue Scalar Simple",
+		.features = {
+			.tx_offloads = IDPF_TX_VECTOR_OFFLOADS,
+			.single_queue = true,
+			.simple_tx = true,
+		}
+	},
+
 #ifdef RTE_ARCH_X86
 	[IDPF_TX_SINGLEQ_AVX2] = {
 		.pkt_burst = idpf_dp_singleq_xmit_pkts_avx2,
diff --git a/drivers/net/intel/idpf/idpf_common_rxtx.h b/drivers/net/intel/idpf/idpf_common_rxtx.h
index fe7094d434..914cab0f25 100644
--- a/drivers/net/intel/idpf/idpf_common_rxtx.h
+++ b/drivers/net/intel/idpf/idpf_common_rxtx.h
@@ -221,6 +221,9 @@ __rte_internal
 uint16_t idpf_dp_singleq_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 				   uint16_t nb_pkts);
 __rte_internal
+uint16_t idpf_dp_singleq_xmit_pkts_simple(void *tx_queue, struct rte_mbuf **tx_pkts,
+				   uint16_t nb_pkts);
+__rte_internal
 uint16_t idpf_dp_prep_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 			   uint16_t nb_pkts);
 __rte_internal
diff --git a/drivers/net/intel/idpf/idpf_rxtx.c b/drivers/net/intel/idpf/idpf_rxtx.c
index 9420200f6d..f2e202d57d 100644
--- a/drivers/net/intel/idpf/idpf_rxtx.c
+++ b/drivers/net/intel/idpf/idpf_rxtx.c
@@ -509,6 +509,22 @@ idpf_tx_queue_setup(struct rte_eth_dev *dev, uint16_t queue_idx,
 	txq->q_set = true;
 	dev->data->tx_queues[queue_idx] = txq;
 
+	/* Set tx_simple_allowed flag based on queue configuration.
+	 * Queue 0 first resets the flag to true; any queue (including 0)
+	 * then clears it if it cannot meet the simple Tx conditions.
+	 */
+	if (vport->txq_model == VIRTCHNL2_QUEUE_MODEL_SPLIT)
+		goto out;
+
+	/* for first queue, default to true, disable later if any queue can't meet conditions */
+	if (queue_idx == 0)
+		adapter->tx_simple_allowed = true;
+
+	if ((txq->offloads != (txq->offloads & RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE)) ||
+			txq->tx_rs_thresh < IDPF_VPMD_TX_MAX_BURST)
+		adapter->tx_simple_allowed = false;
+
+out:
 	return 0;
 
 err_complq_setup:
@@ -651,6 +667,7 @@ int
 idpf_tx_queue_start(struct rte_eth_dev *dev, uint16_t tx_queue_id)
 {
 	struct idpf_vport *vport = dev->data->dev_private;
+	struct idpf_adapter *ad = vport->adapter;
 	struct ci_tx_queue *txq = dev->data->tx_queues[tx_queue_id];
 	int err = 0;
 
@@ -667,6 +684,12 @@ idpf_tx_queue_start(struct rte_eth_dev *dev, uint16_t tx_queue_id)
 		return err;
 	}
 
+	/* Record what kind of descriptor cleanup we need on teardown.
+	 * For single queue mode, vector or simple tx paths use vec entry format.
+	 */
+	if (vport->txq_model == VIRTCHNL2_QUEUE_MODEL_SINGLE)
+		txq->use_vec_entry = ad->tx_simple_allowed;
+
 	/* Ready to switch the queue on */
 	err = idpf_vc_queue_switch(vport, tx_queue_id, false, true,
 							VIRTCHNL2_QUEUE_TYPE_TX);
@@ -847,7 +870,8 @@ idpf_set_tx_function(struct rte_eth_dev *dev)
 	struct ci_tx_path_features req_features = {
 		.tx_offloads = dev->data->dev_conf.txmode.offloads,
 		.simd_width = RTE_VECT_SIMD_DISABLED,
-		.single_queue = (vport->txq_model == VIRTCHNL2_QUEUE_MODEL_SINGLE)
+		.single_queue = (vport->txq_model == VIRTCHNL2_QUEUE_MODEL_SINGLE),
+		.simple_tx = ad->tx_simple_allowed
 	};
 
 	/* The primary process selects the tx path for all processes. */
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* RE: [PATCH v4 19/35] eal: add macro for marking assumed alignment
  2026-02-09 16:45   ` [PATCH v4 19/35] eal: add macro for marking assumed alignment Bruce Richardson
@ 2026-02-09 22:35     ` Morten Brørup
  2026-02-11 14:45       ` Bruce Richardson
  0 siblings, 1 reply; 274+ messages in thread
From: Morten Brørup @ 2026-02-09 22:35 UTC (permalink / raw)
  To: Bruce Richardson, dev

> From: Bruce Richardson [mailto:bruce.richardson@intel.com]
> Sent: Monday, 9 February 2026 17.45
> 
> Provide a common DPDK macro for the gcc/clang builtin
> __rte_assume_aligned to mark pointers as pointing to something with
> known minimum alignment.
> 
> Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> ---
>  lib/eal/include/rte_common.h | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/lib/eal/include/rte_common.h
> b/lib/eal/include/rte_common.h
> index 573bf4f2ce..51a2eaf8b4 100644
> --- a/lib/eal/include/rte_common.h
> +++ b/lib/eal/include/rte_common.h
> @@ -121,6 +121,12 @@ extern "C" {
>  #define __rte_aligned(a) __attribute__((__aligned__(a)))
>  #endif
> 
> +#ifdef RTE_TOOLCHAIN_MSVC
> +#define __rte_assume_aligned(ptr, align) (ptr)
> +#else
> +#define __rte_assume_aligned __builtin_assume_aligned
> +#endif

The GCC/Clang builtin supports an optional 3rd parameter (offset), but the MSVC fallback doesn't.
Maybe it's better to pass only (ptr, align) to the GCC/Clang variant, so the API consistently supports just two parameters.

If the 3rd parameter ever becomes needed, it can be implemented as a new macro.

Also, a short description of the macro would be nice.


Did you look into using e.g. __rte_assume((ptr % 16) == 0) instead?
It would be relevant if it has the desired effect for MSVC, which the macro in this patch doesn't achieve.


^ permalink raw reply	[flat|nested] 274+ messages in thread

* RE: [PATCH v4 20/35] net/intel: write descriptors using non-volatile pointers
  2026-02-09 16:45   ` [PATCH v4 20/35] net/intel: write descriptors using non-volatile pointers Bruce Richardson
@ 2026-02-09 23:08     ` Morten Brørup
  2026-02-10  9:03       ` Bruce Richardson
  2026-02-11 14:44       ` Bruce Richardson
  0 siblings, 2 replies; 274+ messages in thread
From: Morten Brørup @ 2026-02-09 23:08 UTC (permalink / raw)
  To: Bruce Richardson, dev

> +static inline void
> +write_txd(volatile void *txd, uint64_t qw0, uint64_t qw1)
> +{
> +	uint64_t *txd_qw = __rte_assume_aligned(RTE_CAST_PTR(void *,
> txd), 16);
> +
> +	txd_qw[0] = rte_cpu_to_le_64(qw0);
> +	txd_qw[1] = rte_cpu_to_le_64(qw1);
> +}

How about using __rte_aligned() instead, something like this (untested):

struct __rte_aligned(16) txd_t {
	uint64_t	qw0;
	uint64_t	qw1;
};

*RTE_CAST_PTR(volatile struct txd_t *, txd) = {
	rte_cpu_to_le_64(qw0),
	rte_cpu_to_le_64(qw1)
};


And why strip the "volatile"?


^ permalink raw reply	[flat|nested] 274+ messages in thread

* Re: [PATCH v4 28/35] net/intel: merge ring writes in simple Tx for ice and i40e
  2026-02-09 16:45   ` [PATCH v4 28/35] net/intel: merge ring writes in simple Tx for ice and i40e Bruce Richardson
@ 2026-02-09 23:18     ` Medvedkin, Vladimir
  0 siblings, 0 replies; 274+ messages in thread
From: Medvedkin, Vladimir @ 2026-02-09 23:18 UTC (permalink / raw)
  To: Bruce Richardson, dev; +Cc: Anatoly Burakov



On 2/9/2026 4:45 PM, Bruce Richardson wrote:
> The ice and i40e drivers have identical code for writing ring entries in
> the simple Tx path, so merge in the descriptor writing code.
>
> Signed-off-by: Bruce Richardson<bruce.richardson@intel.com>
> ---
>   drivers/net/intel/common/tx.h                 |  6 ++
>   drivers/net/intel/common/tx_scalar.h          | 60 ++++++++++++++
>   drivers/net/intel/i40e/i40e_rxtx.c            | 79 +------------------
>   drivers/net/intel/i40e/i40e_rxtx.h            |  3 -
>   .../net/intel/i40e/i40e_rxtx_vec_altivec.c    |  4 +-
>   drivers/net/intel/i40e/i40e_rxtx_vec_avx2.c   |  4 +-
>   drivers/net/intel/i40e/i40e_rxtx_vec_avx512.c |  4 +-
>   drivers/net/intel/i40e/i40e_rxtx_vec_neon.c   |  4 +-
>   drivers/net/intel/ice/ice_rxtx.c              | 69 +---------------
>   drivers/net/intel/ice/ice_rxtx.h              |  2 -
>   drivers/net/intel/ice/ice_rxtx_vec_avx2.c     |  4 +-
>   drivers/net/intel/ice/ice_rxtx_vec_avx512.c   |  4 +-
>   12 files changed, 86 insertions(+), 157 deletions(-)
>
> diff --git a/drivers/net/intel/common/tx.h b/drivers/net/intel/common/tx.h
> index 203938180b..ef6d543e7a 100644
> --- a/drivers/net/intel/common/tx.h
> +++ b/drivers/net/intel/common/tx.h
> @@ -63,6 +63,12 @@ enum ci_tx_l2tag1_field {
>   /* Common maximum data per TX descriptor */
>   #define CI_MAX_DATA_PER_TXD     (CI_TXD_QW1_TX_BUF_SZ_M >> CI_TXD_QW1_TX_BUF_SZ_S)
>   
> +/* Common TX maximum burst size for chunked transmission in simple paths */
> +#define CI_TX_MAX_BURST 32
eventually it would be good to replace all the rest {IXGBE, I40E, ICE, 
IDPF_VPMD}_TX_MAX_BURST with this
<snip>
>
Acked-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>

-- 
Regards,
Vladimir


^ permalink raw reply	[flat|nested] 274+ messages in thread

* Re: [PATCH v4 29/35] net/intel: consolidate ice and i40e buffer free function
  2026-02-09 16:45   ` [PATCH v4 29/35] net/intel: consolidate ice and i40e buffer free function Bruce Richardson
@ 2026-02-09 23:19     ` Medvedkin, Vladimir
  0 siblings, 0 replies; 274+ messages in thread
From: Medvedkin, Vladimir @ 2026-02-09 23:19 UTC (permalink / raw)
  To: Bruce Richardson, dev; +Cc: Anatoly Burakov



On 2/9/2026 4:45 PM, Bruce Richardson wrote:
> The buffer freeing function for the simple scalar Tx path is almost
> identical in both ice and i40e drivers, except that the i40e has
> batching for the FAST_FREE case. Consolidate both functions into a
> common one based off the better i40e version.
>
> Signed-off-by: Bruce Richardson<bruce.richardson@intel.com>
> ---
>   drivers/net/intel/common/tx.h        |  3 ++
>   drivers/net/intel/common/tx_scalar.h | 58 +++++++++++++++++++++++++
>   drivers/net/intel/i40e/i40e_rxtx.c   | 63 +---------------------------
>   drivers/net/intel/ice/ice_rxtx.c     | 45 +-------------------
>   4 files changed, 65 insertions(+), 104 deletions(-)
>
> diff --git a/drivers/net/intel/common/tx.h b/drivers/net/intel/common/tx.h
> index ef6d543e7a..67378a0803 100644
> --- a/drivers/net/intel/common/tx.h
> +++ b/drivers/net/intel/common/tx.h
> @@ -66,6 +66,9 @@ enum ci_tx_l2tag1_field {
>   /* Common TX maximum burst size for chunked transmission in simple paths */
>   #define CI_TX_MAX_BURST 32
>   
> +/* Common TX maximum free buffer size for batched bulk freeing */
> +#define CI_TX_MAX_FREE_BUF_SZ 64
> +
same here, eventually it would be good to replace all the rest {IXGBE, 
I40E, ICE, RTE_FM10K}_TX_MAX_FREE_BUF_SZ  with this
<snip>
>
Acked-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>

-- 
Regards,
Vladimir


^ permalink raw reply	[flat|nested] 274+ messages in thread

* Re: [PATCH v4 30/35] net/intel: complete merging simple Tx paths
  2026-02-09 16:45   ` [PATCH v4 30/35] net/intel: complete merging simple Tx paths Bruce Richardson
@ 2026-02-09 23:19     ` Medvedkin, Vladimir
  0 siblings, 0 replies; 274+ messages in thread
From: Medvedkin, Vladimir @ 2026-02-09 23:19 UTC (permalink / raw)
  To: Bruce Richardson, dev; +Cc: Anatoly Burakov

Acked-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>

On 2/9/2026 4:45 PM, Bruce Richardson wrote:
> Complete the deduplication/merger of the ice and i40e Tx simple scalar
> paths.
>
> Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> ---
>   drivers/net/intel/common/tx_scalar.h | 87 ++++++++++++++++++++++++++++
>   drivers/net/intel/i40e/i40e_rxtx.c   | 74 +----------------------
>   drivers/net/intel/ice/ice_rxtx.c     | 74 +----------------------
>   3 files changed, 89 insertions(+), 146 deletions(-)
>
<snip>

-- 
Regards,
Vladimir


^ permalink raw reply	[flat|nested] 274+ messages in thread

* Re: [PATCH v4 31/35] net/intel: use non-volatile stores in simple Tx function
  2026-02-09 16:45   ` [PATCH v4 31/35] net/intel: use non-volatile stores in simple Tx function Bruce Richardson
@ 2026-02-09 23:19     ` Medvedkin, Vladimir
  0 siblings, 0 replies; 274+ messages in thread
From: Medvedkin, Vladimir @ 2026-02-09 23:19 UTC (permalink / raw)
  To: Bruce Richardson, dev

Acked-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>

On 2/9/2026 4:45 PM, Bruce Richardson wrote:
> The simple Tx code path can be reworked to use non-volatile stores - as
> is the case with the full-featured Tx path - by reusing the existing
> write_txd function (which just needs to be moved up in the header file).
> This gives a small performance boost.
>
> Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> ---
>   drivers/net/intel/common/tx_scalar.h | 55 ++++++++--------------------
>   1 file changed, 16 insertions(+), 39 deletions(-)
>
<snip>

-- 
Regards,
Vladimir


^ permalink raw reply	[flat|nested] 274+ messages in thread

* Re: [PATCH v4 32/35] net/intel: align scalar simple Tx path with vector logic
  2026-02-09 16:45   ` [PATCH v4 32/35] net/intel: align scalar simple Tx path with vector logic Bruce Richardson
@ 2026-02-09 23:19     ` Medvedkin, Vladimir
  0 siblings, 0 replies; 274+ messages in thread
From: Medvedkin, Vladimir @ 2026-02-09 23:19 UTC (permalink / raw)
  To: Bruce Richardson, dev

Acked-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>

On 2/9/2026 4:45 PM, Bruce Richardson wrote:
> The scalar simple Tx path has the same restrictions as the vector Tx
> path, so we can use the same logic flow in both, to try and ensure we
> get best performance from the scalar path.
>
> Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> ---
>   drivers/net/intel/common/tx_scalar.h | 54 +++++++++++++++++-----------
>   1 file changed, 34 insertions(+), 20 deletions(-)
>
<snip>

-- 
Regards,
Vladimir


^ permalink raw reply	[flat|nested] 274+ messages in thread

* Re: [PATCH v4 33/35] net/intel: use vector SW ring entry for simple path
  2026-02-09 16:45   ` [PATCH v4 33/35] net/intel: use vector SW ring entry for simple path Bruce Richardson
@ 2026-02-09 23:19     ` Medvedkin, Vladimir
  0 siblings, 0 replies; 274+ messages in thread
From: Medvedkin, Vladimir @ 2026-02-09 23:19 UTC (permalink / raw)
  To: Bruce Richardson, dev; +Cc: Praveen Shetty, Anatoly Burakov, Jingjing Wu

Acked-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>

On 2/9/2026 4:45 PM, Bruce Richardson wrote:
> The simple scalar Tx path does not need to use the full sw_entry
> structure that the full Tx path uses, so rename the flag for "vector_tx"
> to instead be "use_vec_entry" since its sole purpose is to flag the use
> of the smaller tx_entry_vec structure. Then set this flag for the simple
> Tx path, giving us a perf boost.
>
> Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> ---
>   drivers/net/intel/common/tx.h                    |  6 ++++--
>   drivers/net/intel/common/tx_scalar.h             | 14 +++++++-------
>   drivers/net/intel/cpfl/cpfl_rxtx.c               |  4 ++--
>   drivers/net/intel/i40e/i40e_rxtx.c               |  2 +-
>   drivers/net/intel/iavf/iavf_rxtx_vec_avx2.c      |  2 +-
>   drivers/net/intel/ice/ice_rxtx.c                 |  2 +-
>   drivers/net/intel/idpf/idpf_common_rxtx_avx512.c |  2 +-
>   drivers/net/intel/ixgbe/ixgbe_rxtx_vec_common.c  |  2 +-
>   8 files changed, 18 insertions(+), 16 deletions(-)
>
<snip>

-- 
Regards,
Vladimir


^ permalink raw reply	[flat|nested] 274+ messages in thread

* Re: [PATCH v4 34/35] net/intel: use vector mbuf cleanup from simple scalar path
  2026-02-09 16:45   ` [PATCH v4 34/35] net/intel: use vector mbuf cleanup from simple scalar path Bruce Richardson
@ 2026-02-09 23:19     ` Medvedkin, Vladimir
  0 siblings, 0 replies; 274+ messages in thread
From: Medvedkin, Vladimir @ 2026-02-09 23:19 UTC (permalink / raw)
  To: Bruce Richardson, dev; +Cc: Anatoly Burakov

Acked-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>

On 2/9/2026 4:45 PM, Bruce Richardson wrote:
> Since the simple scalar path now uses the vector Tx entry struct, we can
> leverage the vector mbuf cleanup function from that path and avoid
> having a separate cleanup function for it.
>
> Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> ---
>   drivers/net/intel/common/tx_scalar.h | 74 ++++++----------------------
>   drivers/net/intel/i40e/i40e_rxtx.c   |  2 +-
>   drivers/net/intel/ice/ice_rxtx.c     |  2 +-
>   3 files changed, 17 insertions(+), 61 deletions(-)
>
<snip>

-- 
Regards,
Vladimir


^ permalink raw reply	[flat|nested] 274+ messages in thread

* Re: [PATCH v4 35/35] net/idpf: enable simple Tx function
  2026-02-09 16:45   ` [PATCH v4 35/35] net/idpf: enable simple Tx function Bruce Richardson
@ 2026-02-09 23:20     ` Medvedkin, Vladimir
  0 siblings, 0 replies; 274+ messages in thread
From: Medvedkin, Vladimir @ 2026-02-09 23:20 UTC (permalink / raw)
  To: Bruce Richardson, dev; +Cc: Jingjing Wu, Praveen Shetty


On 2/9/2026 4:45 PM, Bruce Richardson wrote:
> The common "simple Tx" function - in some ways a scalar version of the
> vector Tx functions - can be used by the idpf driver as well as i40e and
> ice, so add support for it to the driver.
>
> Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> ---
>   drivers/net/intel/idpf/idpf_common_device.h |  2 ++
>   drivers/net/intel/idpf/idpf_common_rxtx.c   | 19 +++++++++++++++
>   drivers/net/intel/idpf/idpf_common_rxtx.h   |  3 +++
>   drivers/net/intel/idpf/idpf_rxtx.c          | 26 ++++++++++++++++++++-
>   4 files changed, 49 insertions(+), 1 deletion(-)
<snip>
> diff --git a/drivers/net/intel/idpf/idpf_rxtx.c b/drivers/net/intel/idpf/idpf_rxtx.c
> index 9420200f6d..f2e202d57d 100644
> --- a/drivers/net/intel/idpf/idpf_rxtx.c
> +++ b/drivers/net/intel/idpf/idpf_rxtx.c
> @@ -509,6 +509,22 @@ idpf_tx_queue_setup(struct rte_eth_dev *dev, uint16_t queue_idx,
>   	txq->q_set = true;
>   	dev->data->tx_queues[queue_idx] = txq;
>   
> +	/* Set tx_simple_allowed flag based on queue configuration.
> +	 * For queue 0: explicitly set the flag based on its configuration.
> +	 * For other queues: only set to false if this queue cannot use simple_tx.
> +	 */
> +	if (vport->txq_model == VIRTCHNL2_QUEUE_MODEL_SPLIT)
> +		goto out;
> +
> +	/* for first queue, default to true, disable later if any queue can't meet conditions */
There are no restrictions on the order in which the user may call 
rte_eth_tx_queue_setup(). If the user called queue_setup() for queue #1 
w/o RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE, and then for queue #0 with 
RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE, there could be inconsistencies.
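
One order-independent way to address this, sketched below with stand-in
names and flag values (the real fix would live in idpf_tx_queue_setup()
and use the actual RTE_ETH_TX_OFFLOAD_* / IDPF_VPMD_TX_MAX_BURST
constants): record each queue's eligibility when it is set up, and
recompute the port-wide flag over all configured queues, so the outcome
no longer depends on setup order.

```c
#include <stdbool.h>
#include <stdint.h>

#define NB_QUEUES 4
#define OFFLOAD_FAST_FREE 0x1	/* stand-in for RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE */
#define MIN_BURST 32		/* stand-in for IDPF_VPMD_TX_MAX_BURST */

static bool q_configured[NB_QUEUES];
static bool q_simple_ok[NB_QUEUES];

/* Recompute the port-wide flag from every configured queue's verdict,
 * so the result is independent of the order of queue_setup() calls. */
static bool
recompute_tx_simple_allowed(void)
{
	for (int i = 0; i < NB_QUEUES; i++)
		if (q_configured[i] && !q_simple_ok[i])
			return false;
	return true;
}

static bool
tx_queue_setup(int idx, uint64_t offloads, uint16_t tx_rs_thresh)
{
	q_configured[idx] = true;
	/* eligible only if no offloads beyond fast-free and threshold is large enough */
	q_simple_ok[idx] = (offloads & ~(uint64_t)OFFLOAD_FAST_FREE) == 0 &&
			tx_rs_thresh >= MIN_BURST;
	return recompute_tx_simple_allowed();
}
```

Here setting up queue #0 last can no longer overwrite a "false" verdict
already recorded for queue #1.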
> +	if (queue_idx == 0)
> +		adapter->tx_simple_allowed = true;
> +
> +	if ((txq->offloads != (txq->offloads & RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE)) ||
> +			txq->tx_rs_thresh < IDPF_VPMD_TX_MAX_BURST)
> +		adapter->tx_simple_allowed = false;
> +
> +out:
>   	return 0;
>   
>   err_complq_setup:
<snip>
>   
>   	/* The primary process selects the tx path for all processes. */

-- 
Regards,
Vladimir


^ permalink raw reply	[flat|nested] 274+ messages in thread

* Re: [PATCH v4 20/35] net/intel: write descriptors using non-volatile pointers
  2026-02-09 23:08     ` Morten Brørup
@ 2026-02-10  9:03       ` Bruce Richardson
  2026-02-10  9:28         ` Morten Brørup
  2026-02-11 14:44       ` Bruce Richardson
  1 sibling, 1 reply; 274+ messages in thread
From: Bruce Richardson @ 2026-02-10  9:03 UTC (permalink / raw)
  To: Morten Brørup; +Cc: dev

On Tue, Feb 10, 2026 at 12:08:44AM +0100, Morten Brørup wrote:
> > +static inline void
> > +write_txd(volatile void *txd, uint64_t qw0, uint64_t qw1)
> > +{
> > +	uint64_t *txd_qw = __rte_assume_aligned(RTE_CAST_PTR(void *,
> > txd), 16);
> > +
> > +	txd_qw[0] = rte_cpu_to_le_64(qw0);
> > +	txd_qw[1] = rte_cpu_to_le_64(qw1);
> > +}
> 
> How about using __rte_aligned() instead, something like this (untested):
> 
> struct __rte_aligned(16) txd_t {
> 	uint64_t	qw0;
> 	uint64_t	qw1;
> };

I can see if this works for us...

> 
> *RTE_CAST_PTR(volatile struct txd_t *, txd) = { rte_cpu_to_le_64(qw0),
> rte_cpu_to_le_64(qw1) };
> 
> 
> And why strip the "volatile"?
> 

For the descriptor writes, the order in which the descriptors and the
descriptor fields are actually written doesn't matter, since the NIC
relies upon the tail pointer update - which includes a fence - to inform it
of when the descriptors are ready. The volatile is necessary for reads,
though, which is why the ring is marked as such, but for Tx it prevents the
compiler from opportunistically e.g. converting two 64-bit writes into a
128-bit write.
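
As a rough illustration of this pattern (a sketch with hypothetical
names, not the actual driver code; DPDK itself would use rte_io_wmb()
and an MMIO write helper rather than a raw C11 fence):

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical 16-byte descriptor: two 64-bit quadwords, as in write_txd(). */
struct txd {
	uint64_t qw0;
	uint64_t qw1;
};

/* Ring writes need no per-store ordering: the NIC only examines
 * descriptors after the tail update, so plain (non-volatile) stores -
 * which the compiler is free to merge into wider stores - suffice,
 * with a single release fence before the tail write. */
static void
write_descs_then_tail(struct txd *ring, volatile uint32_t *tail_reg,
		      uint32_t tail, const struct txd *src, int n)
{
	for (int i = 0; i < n; i++)
		ring[i] = src[i];	/* compiler may coalesce into 128-bit stores */
	atomic_thread_fence(memory_order_release);	/* ring writes before tail */
	*tail_reg = tail;	/* volatile store stands in for the MMIO doorbell */
}
```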

/Bruce

^ permalink raw reply	[flat|nested] 274+ messages in thread

* RE: [PATCH v4 20/35] net/intel: write descriptors using non-volatile pointers
  2026-02-10  9:03       ` Bruce Richardson
@ 2026-02-10  9:28         ` Morten Brørup
  2026-02-11 14:44           ` Bruce Richardson
  0 siblings, 1 reply; 274+ messages in thread
From: Morten Brørup @ 2026-02-10  9:28 UTC (permalink / raw)
  To: Bruce Richardson; +Cc: dev

> From: Bruce Richardson [mailto:bruce.richardson@intel.com]
> Sent: Tuesday, 10 February 2026 10.04
> 
> On Tue, Feb 10, 2026 at 12:08:44AM +0100, Morten Brørup wrote:
> > > +static inline void
> > > +write_txd(volatile void *txd, uint64_t qw0, uint64_t qw1)
> > > +{
> > > +	uint64_t *txd_qw = __rte_assume_aligned(RTE_CAST_PTR(void *,
> > > txd), 16);
> > > +
> > > +	txd_qw[0] = rte_cpu_to_le_64(qw0);
> > > +	txd_qw[1] = rte_cpu_to_le_64(qw1);
> > > +}
> >
> > How about using __rte_aligned() instead, something like this
> (untested):
> >
> > struct __rte_aligned(16) txd_t {
> > 	uint64_t	qw0;
> > 	uint64_t	qw1;
> > };
> 
> I can see if this works for us...
> 
> >
> > *RTE_CAST_PTR(volatile struct txd_t *, txd) = {
> rte_cpu_to_le_64(qw0),
> > rte_cpu_to_le_64(qw1) };
> >
> >
> > And why strip the "volatile"?
> >
> 
> For the descriptor writes, it doesn't matter the order in which the
> descriptors and the descriptor fields are actually written, since the
> NIC
> relies upon the tail pointer update - which includes a fence - to
> inform it
> of when the descriptors are ready. The volatile is necessary for reads,
> though, which is why the ring is marked as such, but for Tx it prevents
> the
> compiler from opportunistically e.g. converting two 64-bit writes into
> a
> 128-bit write.

Makes sense.
Suggest that you spread out a few comments about this at the relevant locations in the source code.



^ permalink raw reply	[flat|nested] 274+ messages in thread

* Re: [PATCH v4 03/35] net/intel: create common post-Tx cleanup function
  2026-02-09 16:45   ` [PATCH v4 03/35] net/intel: create common post-Tx cleanup function Bruce Richardson
@ 2026-02-10 12:18     ` Burakov, Anatoly
  0 siblings, 0 replies; 274+ messages in thread
From: Burakov, Anatoly @ 2026-02-10 12:18 UTC (permalink / raw)
  To: Bruce Richardson, dev; +Cc: Vladimir Medvedkin, Jingjing Wu, Praveen Shetty

On 2/9/2026 5:45 PM, Bruce Richardson wrote:
> The code used in ice, iavf, idpf and i40e for doing cleanup of mbufs
> after they had been transmitted was identical. Therefore deduplicate it
> by moving to common and remove the driver-specific versions.
> 
> Rather than having all Tx code in the one file, which could start
> getting rather long, create a new header file for scalar datapath
> functions.
> 
> Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> ---

<snip>

> +static __rte_always_inline int
> +ci_tx_xmit_cleanup(struct ci_tx_queue *txq)
> +{
> +	struct ci_tx_entry *sw_ring = txq->sw_ring;
> +	volatile struct ci_tx_desc *txd = txq->ci_tx_ring;
> +	uint16_t last_desc_cleaned = txq->last_desc_cleaned;
> +	uint16_t nb_tx_desc = txq->nb_tx_desc;
> +	uint16_t desc_to_clean_to;
> +	uint16_t nb_tx_to_clean;
> +
> +	/* Determine the last descriptor needing to be cleaned */
> +	desc_to_clean_to = (uint16_t)(last_desc_cleaned + txq->tx_rs_thresh);
> +	if (desc_to_clean_to >= nb_tx_desc)
> +		desc_to_clean_to = (uint16_t)(desc_to_clean_to - nb_tx_desc);
> +
> +	/* Check if descriptor is done - all drivers use 0xF as done value in bits 3:0 */
> +	desc_to_clean_to = sw_ring[desc_to_clean_to].last_id;
> +	if ((txd[desc_to_clean_to].cmd_type_offset_bsz & rte_cpu_to_le_64(0x0FUL)) !=
> +			rte_cpu_to_le_64(0x0FUL))

Kind of a nitpick - we do this kind of "(value & mask) == mask" in a lot 
of places, and occasionally it needs these byteswaps and whatnot, which 
IMO hurts readability as there's too much going on on a single line.

I think it would be good to add a helper function e.g.

uint64_t desc_done_msk = rte_cpu_to_le_64(0x0FUL);
if (mask_is_set(txd[desc_to_clean_to].cmd_type_offset_bsz, 
desc_done_msk)) ....

and use it where we can to make the code a bit more semantically meaningful.
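
A self-contained sketch of such a helper (the name mask_is_set is just
the strawman from above, not an existing DPDK API):

```c
#include <stdbool.h>
#include <stdint.h>

/* Return true iff every bit of `mask` is set in `value`; wraps the
 * recurring "(value & mask) == mask" idiom behind a readable name. */
static inline bool
mask_is_set(uint64_t value, uint64_t mask)
{
	return (value & mask) == mask;
}
```

With this, the byteswapped constant can be named once (e.g. a
desc_done_msk local holding rte_cpu_to_le_64(0x0FUL)) and the comparison
itself stays on one readable line.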

That said, not a biggie, so

Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>

-- 
Thanks,
Anatoly

^ permalink raw reply	[flat|nested] 274+ messages in thread

* Re: [PATCH v4 04/35] net/intel: consolidate definitions for Tx desc fields
  2026-02-09 16:45   ` [PATCH v4 04/35] net/intel: consolidate definitions for Tx desc fields Bruce Richardson
@ 2026-02-10 12:26     ` Burakov, Anatoly
  2026-02-10 16:47       ` Bruce Richardson
  0 siblings, 1 reply; 274+ messages in thread
From: Burakov, Anatoly @ 2026-02-10 12:26 UTC (permalink / raw)
  To: Bruce Richardson, dev; +Cc: Vladimir Medvedkin, Jingjing Wu, Praveen Shetty

On 2/9/2026 5:45 PM, Bruce Richardson wrote:
> The offsets of the various fields within the Tx descriptors are common
> for i40e, iavf, ice and idpf, so put a single set of defines in tx.h and
> use those throughout all drivers. (NOTE: there was a small difference in
> mask of CMD field between drivers depending on whether reserved fields
> or not were included. Those can be ignored as those bits are unused in
> the drivers for which they are reserved). Similarly, the various flag
> fields, such as End-of-packet (EOP) and Report-status (RS) are the same,
> as are offload definitions so consolidate them.

Nitpick: the NOTE should IMO be separated, as otherwise the flow of the 
commit message is a bit confusing, as the latter part kinda parses as 
being part of the NOTE section that accidentally got left out of the 
parens, instead of being a continuation of the pre-NOTE section.

Otherwise, LGTM

Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>

-- 
Thanks,
Anatoly

^ permalink raw reply	[flat|nested] 274+ messages in thread

* Re: [PATCH v4 05/35] net/intel: add common fn to calculate needed descriptors
  2026-02-09 16:45   ` [PATCH v4 05/35] net/intel: add common fn to calculate needed descriptors Bruce Richardson
@ 2026-02-10 12:29     ` Burakov, Anatoly
  2026-02-10 14:08       ` Bruce Richardson
  0 siblings, 1 reply; 274+ messages in thread
From: Burakov, Anatoly @ 2026-02-10 12:29 UTC (permalink / raw)
  To: Bruce Richardson, dev; +Cc: Vladimir Medvedkin, Jingjing Wu, Praveen Shetty

On 2/9/2026 5:45 PM, Bruce Richardson wrote:
> Multiple drivers used the same logic to calculate how many Tx data
> descriptors were needed. Move that calculation to common code. In the
> process of updating drivers, fix idpf driver calculation for the TSO
> case.
> 

"Fix TSO for idpf" sounds like a bugfix? Can it be backported to stable?

Otherwise,

Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>

-- 
Thanks,
Anatoly

^ permalink raw reply	[flat|nested] 274+ messages in thread

* Re: [PATCH v4 06/35] net/ice: refactor context descriptor handling
  2026-02-09 16:45   ` [PATCH v4 06/35] net/ice: refactor context descriptor handling Bruce Richardson
@ 2026-02-10 12:42     ` Burakov, Anatoly
  2026-02-10 17:40       ` Bruce Richardson
  0 siblings, 1 reply; 274+ messages in thread
From: Burakov, Anatoly @ 2026-02-10 12:42 UTC (permalink / raw)
  To: Bruce Richardson, dev

On 2/9/2026 5:45 PM, Bruce Richardson wrote:
> Create a single function to manage all context descriptor handling,
> which returns either 0 or 1 depending on whether a descriptor is needed
> or not, as well as returning directly the descriptor contents if
> relevant.
> 
> Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> ---

<snip>

> +static __rte_always_inline uint16_t
> +get_context_desc(uint64_t ol_flags, const struct rte_mbuf *tx_pkt,
> +	const union ci_tx_offload *tx_offload, const struct ci_tx_queue *txq,
> +	uint64_t *qw0, uint64_t *qw1)
> +{
> +	uint16_t cd_l2tag2 = 0;
> +	uint64_t cd_type_cmd_tso_mss = ICE_TX_DESC_DTYPE_CTX;
> +	uint32_t cd_tunneling_params = 0;
> +	uint64_t ptp_tx_index = txq->ice_vsi->adapter->ptp_tx_index;
> +
> +	if (ice_calc_context_desc(ol_flags) == 0)
> +		return 0;
> +
> +	if (ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK)
> +		ice_parse_tunneling_params(ol_flags, *tx_offload, &cd_tunneling_params);
> +
> +	if (ol_flags & (RTE_MBUF_F_TX_TCP_SEG | RTE_MBUF_F_TX_UDP_SEG))
> +		cd_type_cmd_tso_mss |= ice_set_tso_ctx(ol_flags, tx_pkt, *tx_offload);
> +	else if (ol_flags & RTE_MBUF_F_TX_IEEE1588_TMST)
> +		cd_type_cmd_tso_mss |=
> +			((uint64_t)CI_TX_CTX_DESC_TSYN << CI_TXD_QW1_CMD_S) |
> +			((ptp_tx_index << ICE_TXD_CTX_QW1_TSYN_S) & ICE_TXD_CTX_QW1_TSYN_M);

It's tangentially related to this commit but it caught my attention that 
TSO and timestamping are mutually exclusive here. They *are* mutually 
exclusive as far as the driver is concerned so that part is fine, but I 
couldn't find any signs of us enforcing this limitation anywhere in our 
configuration path, so a well behaved application could theoretically 
arrive at this combination of mbuf flags without breaking anything.

(if I understand things correctly, this applies to both ice and i40e)
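
A hypothetical enforcement point - e.g. a tx_prepare-style check, shown
here with stand-in bit values rather than the real RTE_MBUF_F_TX_*
flags - might look like:

```c
#include <stdint.h>

/* Stand-in bit values; the real RTE_MBUF_F_TX_* flags differ. */
#define F_TX_TCP_SEG		(1ULL << 0)
#define F_TX_UDP_SEG		(1ULL << 1)
#define F_TX_IEEE1588_TMST	(1ULL << 2)

/* Reject packets requesting both segmentation offload and Tx
 * timestamping, since the single context descriptor cannot carry
 * both at once. */
static int
check_tx_ol_flags(uint64_t ol_flags)
{
	if ((ol_flags & (F_TX_TCP_SEG | F_TX_UDP_SEG)) &&
			(ol_flags & F_TX_IEEE1588_TMST))
		return -1;	/* would be -EINVAL/-ENOTSUP in a real tx_prepare */
	return 0;
}
```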

-- 
Thanks,
Anatoly

^ permalink raw reply	[flat|nested] 274+ messages in thread

* Re: [PATCH v4 07/35] net/i40e: refactor context descriptor handling
  2026-02-09 16:45   ` [PATCH v4 07/35] net/i40e: " Bruce Richardson
@ 2026-02-10 12:48     ` Burakov, Anatoly
  2026-02-10 14:10       ` Bruce Richardson
  0 siblings, 1 reply; 274+ messages in thread
From: Burakov, Anatoly @ 2026-02-10 12:48 UTC (permalink / raw)
  To: Bruce Richardson, dev

On 2/9/2026 5:45 PM, Bruce Richardson wrote:
> Move all context descriptor handling to a single function, as with the
> ice driver, and use the same function signature as that driver.
> 
> Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> ---

<snip>

> +static __rte_always_inline uint16_t
> +get_context_desc(uint64_t ol_flags, const struct rte_mbuf *tx_pkt,
> +		 const union ci_tx_offload *tx_offload,
> +		 const struct ci_tx_queue *txq __rte_unused,
> +		 uint64_t *qw0, uint64_t *qw1)
> +{
> +	uint16_t cd_l2tag2 = 0;
> +	uint64_t cd_type_cmd_tso_mss = I40E_TX_DESC_DTYPE_CONTEXT;
> +	uint32_t cd_tunneling_params = 0;
> +
> +	if (i40e_calc_context_desc(ol_flags) == 0)
> +		return 0;
> +
> +	if (ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK)
> +		i40e_parse_tunneling_params(ol_flags, *tx_offload, &cd_tunneling_params);
> +
> +	if (ol_flags & RTE_MBUF_F_TX_TCP_SEG) {
> +		cd_type_cmd_tso_mss |= i40e_set_tso_ctx(ol_flags, tx_pkt, *tx_offload);
> +	} else {
> +#ifdef RTE_LIBRTE_IEEE1588
> +		if (ol_flags & RTE_MBUF_F_TX_IEEE1588_TMST)
> +			cd_type_cmd_tso_mss |=
> +				((uint64_t)I40E_TX_CTX_DESC_TSYN << I40E_TXD_CTX_QW1_CMD_SHIFT);
> +#endif

I couldn't find any place where we define this; it appears to be some 
sort of legacy define, making this basically dead code?

-- 
Thanks,
Anatoly

^ permalink raw reply	[flat|nested] 274+ messages in thread

* Re: [PATCH v4 08/35] net/idpf: refactor context descriptor handling
  2026-02-09 16:45   ` [PATCH v4 08/35] net/idpf: " Bruce Richardson
@ 2026-02-10 12:52     ` Burakov, Anatoly
  0 siblings, 0 replies; 274+ messages in thread
From: Burakov, Anatoly @ 2026-02-10 12:52 UTC (permalink / raw)
  To: Bruce Richardson, dev; +Cc: Ciara Loftus, Jingjing Wu, Praveen Shetty

On 2/9/2026 5:45 PM, Bruce Richardson wrote:
> Move all context descriptor handling to a single function, as with the
> ice and i40e drivers.
> 
> Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> Acked-by: Ciara Loftus <ciara.loftus@intel.com>
> ---
>   drivers/net/intel/idpf/idpf_common_rxtx.c | 61 +++++++++++------------
>   1 file changed, 28 insertions(+), 33 deletions(-)
> 
> diff --git a/drivers/net/intel/idpf/idpf_common_rxtx.c b/drivers/net/intel/idpf/idpf_common_rxtx.c
> index 11d6848430..9219ad9047 100644
> --- a/drivers/net/intel/idpf/idpf_common_rxtx.c
> +++ b/drivers/net/intel/idpf/idpf_common_rxtx.c
> @@ -845,37 +845,36 @@ idpf_calc_context_desc(uint64_t flags)
>   	return 0;
>   }
>   
> -/* set TSO context descriptor
> +/* set TSO context descriptor, returns 0 if no context needed, 1 if context set
>    */
> -static inline void
> -idpf_set_splitq_tso_ctx(struct rte_mbuf *mbuf,
> +static inline uint16_t
> +idpf_set_tso_ctx(uint64_t ol_flags, struct rte_mbuf *mbuf,
>   			union ci_tx_offload tx_offload,
> -			volatile union idpf_flex_tx_ctx_desc *ctx_desc)
> +			uint64_t *qw0, uint64_t *qw1)
>   {
> -	uint16_t cmd_dtype;
> +	uint16_t cmd_dtype = IDPF_TX_DESC_DTYPE_FLEX_TSO_CTX | IDPF_TX_FLEX_CTX_DESC_CMD_TSO;
> +	uint16_t tso_segsz = mbuf->tso_segsz;
>   	uint32_t tso_len;
>   	uint8_t hdr_len;
>   
> +	if (idpf_calc_context_desc(ol_flags) == 0)
> +		return 0;
> +
> +	/* TSO context descriptor setup */
>   	if (tx_offload.l4_len == 0) {
>   		TX_LOG(DEBUG, "L4 length set to 0");
> -		return;
> +		return 0;
>   	}
>   
> -	hdr_len = tx_offload.l2_len +
> -		tx_offload.l3_len +
> -		tx_offload.l4_len;
> -	cmd_dtype = IDPF_TX_DESC_DTYPE_FLEX_TSO_CTX |
> -		IDPF_TX_FLEX_CTX_DESC_CMD_TSO;
> +	hdr_len = tx_offload.l2_len + tx_offload.l3_len + tx_offload.l4_len;
>   	tso_len = mbuf->pkt_len - hdr_len;
>   
> -	ctx_desc->tso.qw1.cmd_dtype = rte_cpu_to_le_16(cmd_dtype);
> -	ctx_desc->tso.qw0.hdr_len = hdr_len;
> -	ctx_desc->tso.qw0.mss_rt =
> -		rte_cpu_to_le_16((uint16_t)mbuf->tso_segsz &
> -				 IDPF_TXD_FLEX_CTX_MSS_RT_M);
> -	ctx_desc->tso.qw0.flex_tlen =
> -		rte_cpu_to_le_32(tso_len &
> -				 IDPF_TXD_FLEX_CTX_MSS_RT_M);
> +	*qw0 = rte_cpu_to_le_32(tso_len & IDPF_TXD_FLEX_CTX_MSS_RT_M) |
> +	       ((uint64_t)rte_cpu_to_le_16(tso_segsz & IDPF_TXD_FLEX_CTX_MSS_RT_M) << 32) |
> +	       ((uint64_t)hdr_len << 48);
> +	*qw1 = rte_cpu_to_le_16(cmd_dtype);
> +
> +	return 1;
>   }
>   
>   RTE_EXPORT_INTERNAL_SYMBOL(idpf_dp_splitq_xmit_pkts)
> @@ -933,7 +932,8 @@ idpf_dp_splitq_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
>   		tx_offload.l4_len = tx_pkt->l4_len;
>   		tx_offload.tso_segsz = tx_pkt->tso_segsz;
>   		/* Calculate the number of context descriptors needed. */
> -		nb_ctx = idpf_calc_context_desc(ol_flags);
> +		uint64_t cd_qw0, cd_qw1;
> +		nb_ctx = idpf_set_tso_ctx(ol_flags, tx_pkt, tx_offload, &cd_qw0, &cd_qw1);
>   
>   		/* Calculate the number of TX descriptors needed for
>   		 * each packet. For TSO packets, use ci_calc_pkt_desc as
> @@ -950,12 +950,10 @@ idpf_dp_splitq_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
>   
>   		/* context descriptor */
>   		if (nb_ctx != 0) {
> -			volatile union idpf_flex_tx_ctx_desc *ctx_desc =
> -				(volatile union idpf_flex_tx_ctx_desc *)&txr[tx_id];
> +			uint64_t *ctx_desc = RTE_CAST_PTR(uint64_t *, &txr[tx_id]);
>   
> -			if ((ol_flags & RTE_MBUF_F_TX_TCP_SEG) != 0)
> -				idpf_set_splitq_tso_ctx(tx_pkt, tx_offload,
> -							ctx_desc);
> +			ctx_desc[0] = cd_qw0;
> +			ctx_desc[1] = cd_qw1;
>   
>   			tx_id++;
>   			if (tx_id == txq->nb_tx_desc)
> @@ -1388,7 +1386,8 @@ idpf_dp_singleq_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
>   		tx_offload.l4_len = tx_pkt->l4_len;
>   		tx_offload.tso_segsz = tx_pkt->tso_segsz;
>   		/* Calculate the number of context descriptors needed. */
> -		nb_ctx = idpf_calc_context_desc(ol_flags);
> +		uint64_t cd_qw0, cd_qw1;
> +		nb_ctx = idpf_set_tso_ctx(ol_flags, tx_pkt, tx_offload, &cd_qw0, &cd_qw1);
>   
>   		/* The number of descriptors that must be allocated for
>   		 * a packet. For TSO packets, use ci_calc_pkt_desc as
> @@ -1431,9 +1430,7 @@ idpf_dp_singleq_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
>   
>   		if (nb_ctx != 0) {
>   			/* Setup TX context descriptor if required */
> -			volatile union idpf_flex_tx_ctx_desc *ctx_txd =
> -				(volatile union idpf_flex_tx_ctx_desc *)
> -				&txr[tx_id];
> +			uint64_t *ctx_txd = RTE_CAST_PTR(uint64_t *, &txr[tx_id]);
>   
>   			txn = &sw_ring[txe->next_id];
>   			RTE_MBUF_PREFETCH_TO_FREE(txn->mbuf);
> @@ -1442,10 +1439,8 @@ idpf_dp_singleq_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
>   				txe->mbuf = NULL;
>   			}
>   
> -			/* TSO enabled */
> -			if ((ol_flags & RTE_MBUF_F_TX_TCP_SEG) != 0)
> -				idpf_set_splitq_tso_ctx(tx_pkt, tx_offload,
> -							ctx_txd);
> +			ctx_txd[0] = cd_qw0;
> +			ctx_txd[1] = cd_qw1;
>   
>   			txe->last_id = tx_last;
>   			tx_id = txe->next_id;

Yay type safety! :D Gotta love C

Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>

-- 
Thanks,
Anatoly

^ permalink raw reply	[flat|nested] 274+ messages in thread

* Re: [PATCH v4 09/35] net/intel: consolidate checksum mask definition
  2026-02-09 16:45   ` [PATCH v4 09/35] net/intel: consolidate checksum mask definition Bruce Richardson
@ 2026-02-10 13:00     ` Burakov, Anatoly
  0 siblings, 0 replies; 274+ messages in thread
From: Burakov, Anatoly @ 2026-02-10 13:00 UTC (permalink / raw)
  To: Bruce Richardson, dev; +Cc: Vladimir Medvedkin, Jingjing Wu, Praveen Shetty

On 2/9/2026 5:45 PM, Bruce Richardson wrote:
> Create a common definition for checksum masks across iavf, idpf, i40e
> and ice drivers.
> 
> Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> ---

Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>

-- 
Thanks,
Anatoly

^ permalink raw reply	[flat|nested] 274+ messages in thread

* Re: [PATCH v4 10/35] net/intel: create common checksum Tx offload function
  2026-02-09 16:45   ` [PATCH v4 10/35] net/intel: create common checksum Tx offload function Bruce Richardson
@ 2026-02-10 13:04     ` Burakov, Anatoly
  2026-02-10 17:56       ` Bruce Richardson
  0 siblings, 1 reply; 274+ messages in thread
From: Burakov, Anatoly @ 2026-02-10 13:04 UTC (permalink / raw)
  To: Bruce Richardson, dev

On 2/9/2026 5:45 PM, Bruce Richardson wrote:
> Since i40e and ice have the same checksum offload logic, merge their
> functions into one. Future rework should enable this to be used by more
> drivers also.
> 
> Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> ---
>   drivers/net/intel/common/tx_scalar.h | 58 +++++++++++++++++++++++++++
>   drivers/net/intel/i40e/i40e_rxtx.c   | 52 +-----------------------
>   drivers/net/intel/i40e/i40e_rxtx.h   |  1 +
>   drivers/net/intel/ice/ice_rxtx.c     | 60 +---------------------------
>   drivers/net/intel/ice/ice_rxtx.h     |  1 +
>   5 files changed, 62 insertions(+), 110 deletions(-)
> 
> diff --git a/drivers/net/intel/common/tx_scalar.h b/drivers/net/intel/common/tx_scalar.h
> index 573f5136a9..cf0dcb4b2c 100644
> --- a/drivers/net/intel/common/tx_scalar.h
> +++ b/drivers/net/intel/common/tx_scalar.h
> @@ -59,6 +59,64 @@ ci_tx_xmit_cleanup(struct ci_tx_queue *txq)
>   	return 0;
>   }
>   
> +/* Common checksum enable function for Intel drivers (ice, i40e, etc.) */
> +static inline void
> +ci_txd_enable_checksum(uint64_t ol_flags,
> +		       uint32_t *td_cmd,
> +		       uint32_t *td_offset,
> +		       union ci_tx_offload tx_offload)
> +{
> +	/* Enable L3 checksum offloads */
> +	if (ol_flags & RTE_MBUF_F_TX_IP_CKSUM) {
> +		*td_cmd |= CI_TX_DESC_CMD_IIPT_IPV4_CSUM;
> +		*td_offset |= (tx_offload.l3_len >> 2) <<
> +			CI_TX_DESC_LEN_IPLEN_S;
> +	} else if (ol_flags & RTE_MBUF_F_TX_IPV4) {
> +		*td_cmd |= CI_TX_DESC_CMD_IIPT_IPV4;
> +		*td_offset |= (tx_offload.l3_len >> 2) <<
> +			CI_TX_DESC_LEN_IPLEN_S;
> +	} else if (ol_flags & RTE_MBUF_F_TX_IPV6) {
> +		*td_cmd |= CI_TX_DESC_CMD_IIPT_IPV6;
> +		*td_offset |= (tx_offload.l3_len >> 2) <<
> +			CI_TX_DESC_LEN_IPLEN_S;
> +	}
> +
> +	if (ol_flags & RTE_MBUF_F_TX_TCP_SEG) {
> +		*td_cmd |= CI_TX_DESC_CMD_L4T_EOFT_TCP;
> +		*td_offset |= (tx_offload.l4_len >> 2) <<
> +			      CI_TX_DESC_LEN_L4_LEN_S;
> +		return;
> +	}
> +
> +	if (ol_flags & RTE_MBUF_F_TX_UDP_SEG) {
> +		*td_cmd |= CI_TX_DESC_CMD_L4T_EOFT_UDP;
> +		*td_offset |= (tx_offload.l4_len >> 2) <<
> +			      CI_TX_DESC_LEN_L4_LEN_S;
> +		return;
> +	}
> +
> +	/* Enable L4 checksum offloads */
> +	switch (ol_flags & RTE_MBUF_F_TX_L4_MASK) {
> +	case RTE_MBUF_F_TX_TCP_CKSUM:
> +		*td_cmd |= CI_TX_DESC_CMD_L4T_EOFT_TCP;
> +		*td_offset |= (sizeof(struct rte_tcp_hdr) >> 2) <<
> +			      CI_TX_DESC_LEN_L4_LEN_S;
> +		break;
> +	case RTE_MBUF_F_TX_SCTP_CKSUM:
> +		*td_cmd |= CI_TX_DESC_CMD_L4T_EOFT_SCTP;
> +		*td_offset |= (sizeof(struct rte_sctp_hdr) >> 2) <<
> +			      CI_TX_DESC_LEN_L4_LEN_S;
> +		break;
> +	case RTE_MBUF_F_TX_UDP_CKSUM:
> +		*td_cmd |= CI_TX_DESC_CMD_L4T_EOFT_UDP;
> +		*td_offset |= (sizeof(struct rte_udp_hdr) >> 2) <<
> +			      CI_TX_DESC_LEN_L4_LEN_S;
> +		break;
> +	default:
> +		break;
> +	}

Nitpick: some of the indentation here is inconsistent. Perhaps enabling 
whitespace view in your editor would help, if you haven't done so?

(the inconsistency was already present in the ice function but that 
doesn't mean we have to copy it!)

Otherwise,

Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>

-- 
Thanks,
Anatoly

^ permalink raw reply	[flat|nested] 274+ messages in thread

* Re: [PATCH v4 11/35] net/intel: create a common scalar Tx function
  2026-02-09 16:45   ` [PATCH v4 11/35] net/intel: create a common scalar Tx function Bruce Richardson
@ 2026-02-10 13:14     ` Burakov, Anatoly
  2026-02-10 18:03       ` Bruce Richardson
  0 siblings, 1 reply; 274+ messages in thread
From: Burakov, Anatoly @ 2026-02-10 13:14 UTC (permalink / raw)
  To: Bruce Richardson, dev

On 2/9/2026 5:45 PM, Bruce Richardson wrote:
> Given the similarities between the transmit functions across various
> Intel drivers, make a start on consolidating them by moving the ice Tx
> function into common, for reuse by other drivers.
> 
> Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> ---

> +	if (ts_fns != NULL)
> +		ts_id = ts_fns->get_ts_tail(txq);
> +
> +	/* Check if the descriptor ring needs to be cleaned. */
> +	if (txq->nb_tx_free < txq->tx_free_thresh)
> +		(void)ci_tx_xmit_cleanup(txq);

Why (void) ?

<snip>

> +		if (ol_flags & (RTE_MBUF_F_TX_TCP_SEG | RTE_MBUF_F_TX_UDP_SEG))
> +			nb_used = (uint16_t)(ci_calc_pkt_desc(tx_pkt) + nb_ctx);
> +		else
> +			nb_used = (uint16_t)(tx_pkt->nb_segs + nb_ctx);
> +		tx_last = (uint16_t)(tx_id + nb_used - 1);
> +
> +		/* Circular ring */

nicholas_cage_you_dont_say.jpg

Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>

-- 
Thanks,
Anatoly

^ permalink raw reply	[flat|nested] 274+ messages in thread

* Re: [PATCH v4 12/35] net/i40e: use common scalar Tx function
  2026-02-09 16:45   ` [PATCH v4 12/35] net/i40e: use " Bruce Richardson
@ 2026-02-10 13:14     ` Burakov, Anatoly
  0 siblings, 0 replies; 274+ messages in thread
From: Burakov, Anatoly @ 2026-02-10 13:14 UTC (permalink / raw)
  To: Bruce Richardson, dev

On 2/9/2026 5:45 PM, Bruce Richardson wrote:
> Following earlier rework, the scalar transmit function for i40e can use
> the common function previously moved over from ice driver. This saves
> hundreds of duplicated lines of code.
> 
> Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> ---

Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>

-- 
Thanks,
Anatoly

^ permalink raw reply	[flat|nested] 274+ messages in thread

* Re: [PATCH v4 13/35] net/intel: add IPsec hooks to common Tx function
  2026-02-09 16:45   ` [PATCH v4 13/35] net/intel: add IPsec hooks to common " Bruce Richardson
@ 2026-02-10 13:16     ` Burakov, Anatoly
  0 siblings, 0 replies; 274+ messages in thread
From: Burakov, Anatoly @ 2026-02-10 13:16 UTC (permalink / raw)
  To: Bruce Richardson, dev

On 2/9/2026 5:45 PM, Bruce Richardson wrote:
> The iavf driver has IPsec offload support on Tx, so add hooks to the
> common Tx function to support that. Do so in a way that has zero
> performance impact for drivers which do not have IPsec support, by
> passing in compile-time NULL constants for the function pointers, which
> can be optimized away by the compiler.
> 
> Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> ---
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>

-- 
Thanks,
Anatoly

^ permalink raw reply	[flat|nested] 274+ messages in thread

* Re: [PATCH v4 14/35] net/intel: support configurable VLAN tag insertion on Tx
  2026-02-09 16:45   ` [PATCH v4 14/35] net/intel: support configurable VLAN tag insertion on Tx Bruce Richardson
@ 2026-02-10 13:21     ` Burakov, Anatoly
  2026-02-10 18:20       ` Bruce Richardson
  0 siblings, 1 reply; 274+ messages in thread
From: Burakov, Anatoly @ 2026-02-10 13:21 UTC (permalink / raw)
  To: Bruce Richardson, dev

On 2/9/2026 5:45 PM, Bruce Richardson wrote:
> Make the VLAN tag insertion logic in the common code configurable,
> controlling where the inner/outer tags get placed.
> 
> Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> ---

Hi Bruce,
I might be missing something obvious, but...

>   
> -		/* Descriptor based VLAN insertion */
> -		if (ol_flags & (RTE_MBUF_F_TX_VLAN | RTE_MBUF_F_TX_QINQ)) {
> +		/* Descriptor based VLAN/QinQ insertion */
> +		/* for single VLAN offload, only insert in the data desc when VLAN_IN_L2TAG1 is set;
> +		 * for QinQ offload, we always put the inner tag in L2TAG1
> +		 */
> +		if (((ol_flags & RTE_MBUF_F_TX_VLAN) && l2tag1_field == CI_VLAN_IN_L2TAG1) ||
> +				(ol_flags & RTE_MBUF_F_TX_QINQ)) {
>   			td_cmd |= CI_TX_DESC_CMD_IL2TAG1;
>   			td_tag = tx_pkt->vlan_tci;
>   		}

I can see that we insert VLAN tag on TX_VLAN and VLAN_IN_TAG1. But then...


> @@ -1004,8 +1004,7 @@ get_context_desc(uint64_t ol_flags, const struct rte_mbuf *tx_pkt,
>   	/* TX context descriptor based double VLAN insert */
>   	if (ol_flags & RTE_MBUF_F_TX_QINQ) {
>   		cd_l2tag2 = tx_pkt->vlan_tci_outer;
> -		cd_type_cmd_tso_mss |=
> -				((uint64_t)I40E_TX_CTX_DESC_IL2TAG2 << I40E_TXD_CTX_QW1_CMD_SHIFT);
> +		cd_type_cmd_tso_mss |= (I40E_TX_CTX_DESC_IL2TAG2 << I40E_TXD_CTX_QW1_CMD_SHIFT);

this logic is only triggered for QinQ. Meaning, there's nowhere we 
insert VLAN tag on TX_VLAN and VLAN_IN_TAG2?

-- 
Thanks,
Anatoly

^ permalink raw reply	[flat|nested] 274+ messages in thread

* Re: [PATCH v4 15/35] net/iavf: use common scalar Tx function
  2026-02-09 16:45   ` [PATCH v4 15/35] net/iavf: use common scalar Tx function Bruce Richardson
@ 2026-02-10 13:27     ` Burakov, Anatoly
  2026-02-10 18:31       ` Bruce Richardson
  0 siblings, 1 reply; 274+ messages in thread
From: Burakov, Anatoly @ 2026-02-10 13:27 UTC (permalink / raw)
  To: Bruce Richardson, dev; +Cc: Vladimir Medvedkin

On 2/9/2026 5:45 PM, Bruce Richardson wrote:
> Now that the common scalar Tx function has all necessary hooks for the
> features supported by the iavf driver, use the common function to avoid
> duplicated code.
> 
> Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> ---


> +		}
>   
> -		return;
> +		/* TSO segmentation field */
> +		tlen = iavf_fill_ctx_desc_segmentation_field(&cd_type_cmd,
> +							     mbuf, ipsec_md);
> +		(void)tlen; /* Suppress unused variable warning */

RTE_SET_USED?
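For context on the suggestion: DPDK's RTE_SET_USED (from rte_common.h) is, to my understanding, essentially the same void cast, so the two spellings are equivalent and the macro just makes the intent explicit. A minimal sketch, with the macro redefined locally and the filler function a hypothetical stand-in:

```c
#include <assert.h>

/* Local stand-in for DPDK's RTE_SET_USED; the real macro is
 * effectively the same (void) cast of its argument. */
#define SET_USED(x) (void)(x)

/* Hypothetical analogue of iavf_fill_ctx_desc_segmentation_field():
 * returns a length this caller does not need. */
static int fill_field(int *field)
{
	*field = 42;
	return 4;
}

static int demo(void)
{
	int field = 0;
	int tlen = fill_field(&field);

	SET_USED(tlen); /* suppress -Wunused-variable, same as (void)tlen */
	return field;
}
```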

Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>

-- 
Thanks,
Anatoly

^ permalink raw reply	[flat|nested] 274+ messages in thread

* Re: [PATCH v4 16/35] net/i40e: document requirement for QinQ support
  2026-02-09 16:45   ` [PATCH v4 16/35] net/i40e: document requirement for QinQ support Bruce Richardson
@ 2026-02-10 13:27     ` Burakov, Anatoly
  0 siblings, 0 replies; 274+ messages in thread
From: Burakov, Anatoly @ 2026-02-10 13:27 UTC (permalink / raw)
  To: Bruce Richardson, dev

On 2/9/2026 5:45 PM, Bruce Richardson wrote:
> In order to get multiple VLANs inserted in an outgoing packet with QinQ
> offload, the i40e driver needs to be set to double VLAN mode. This is
> done by using the VLAN_EXTEND Rx config flag. Add a code check for this
> dependency and update the docs about it.
> 
> Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> ---

Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>

-- 
Thanks,
Anatoly

^ permalink raw reply	[flat|nested] 274+ messages in thread

* Re: [PATCH v4 17/35] net/idpf: use common scalar Tx function
  2026-02-09 16:45   ` [PATCH v4 17/35] net/idpf: use common scalar Tx function Bruce Richardson
@ 2026-02-10 13:30     ` Burakov, Anatoly
  0 siblings, 0 replies; 274+ messages in thread
From: Burakov, Anatoly @ 2026-02-10 13:30 UTC (permalink / raw)
  To: Bruce Richardson, dev; +Cc: Jingjing Wu, Praveen Shetty

On 2/9/2026 5:45 PM, Bruce Richardson wrote:
> Update idpf driver to use the common scalar Tx function in single-queue
> configuration.
> 
> Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> ---

Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>

-- 
Thanks,
Anatoly

^ permalink raw reply	[flat|nested] 274+ messages in thread

* Re: [PATCH v4 18/35] net/intel: avoid writing the final pkt descriptor twice
  2026-02-09 16:45   ` [PATCH v4 18/35] net/intel: avoid writing the final pkt descriptor twice Bruce Richardson
@ 2026-02-10 13:31     ` Burakov, Anatoly
  0 siblings, 0 replies; 274+ messages in thread
From: Burakov, Anatoly @ 2026-02-10 13:31 UTC (permalink / raw)
  To: Bruce Richardson, dev

On 2/9/2026 5:45 PM, Bruce Richardson wrote:
> In the scalar datapath, there is a loop to handle multi-segment and
> multi-descriptor packets on Tx. After that loop, the end-of-packet bit
> was written to the descriptor separately, meaning that for each
> single-descriptor packet there were two writes to the second quad-word -
> basically 3 x 64-bit writes rather than just 2. Adjusting the code to
> compute the EOP bit inside the loop saves that extra write per packet
> and so improves performance.
> 
> Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> ---
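The saving described in the commit message can be sketched as follows. This is an illustrative model, not the driver's actual descriptor layout; the bit position and field names are assumptions:

```c
#include <assert.h>
#include <stdint.h>

#define CMD_EOP (1ULL << 4) /* hypothetical EOP bit position */

struct desc { uint64_t qw0, qw1; };

/* Fold the EOP bit into the final descriptor's command word inside
 * the fill loop, so each descriptor's qw1 is stored exactly once:
 * 2 x 64-bit writes per descriptor, not 3 for the last one. */
static unsigned
fill_pkt(struct desc *ring, uint16_t nb_segs, uint64_t td_cmd)
{
	unsigned writes = 0;

	for (uint16_t i = 0; i < nb_segs; i++) {
		uint64_t cmd = td_cmd;

		if (i == nb_segs - 1)
			cmd |= CMD_EOP;   /* last segment: set EOP here... */
		ring[i].qw0 = 0x1000 + i; /* buffer address placeholder */
		ring[i].qw1 = cmd;        /* ...instead of rewriting qw1 after the loop */
		writes += 2;              /* two 64-bit stores per descriptor */
	}
	return writes;
}
```

For a single-descriptor packet this is two stores instead of the three the commit describes for the old write-then-patch approach.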

Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>

-- 
Thanks,
Anatoly

^ permalink raw reply	[flat|nested] 274+ messages in thread

* Re: [PATCH v4 21/35] net/intel: remove unnecessary flag clearing
  2026-02-09 16:45   ` [PATCH v4 21/35] net/intel: remove unnecessary flag clearing Bruce Richardson
@ 2026-02-10 13:33     ` Burakov, Anatoly
  0 siblings, 0 replies; 274+ messages in thread
From: Burakov, Anatoly @ 2026-02-10 13:33 UTC (permalink / raw)
  To: Bruce Richardson, dev

On 2/9/2026 5:45 PM, Bruce Richardson wrote:
> When cleaning the Tx ring, there is no need to zero out the done flag
> from the completed entry. That flag will be automatically cleared when
> the descriptor is next written. This gives a small performance benefit.
> 
> Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> ---

Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>

-- 
Thanks,
Anatoly

^ permalink raw reply	[flat|nested] 274+ messages in thread

* Re: [PATCH v4 22/35] net/intel: mark mid-burst ring cleanup as unlikely
  2026-02-09 16:45   ` [PATCH v4 22/35] net/intel: mark mid-burst ring cleanup as unlikely Bruce Richardson
@ 2026-02-10 13:36     ` Burakov, Anatoly
  2026-02-10 14:13       ` Bruce Richardson
  0 siblings, 1 reply; 274+ messages in thread
From: Burakov, Anatoly @ 2026-02-10 13:36 UTC (permalink / raw)
  To: Bruce Richardson, dev

On 2/9/2026 5:45 PM, Bruce Richardson wrote:
> It should rarely be the case that we need to cleanup the descriptor ring
> mid-burst, so mark as unlikely to help performance.
> 
> Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> ---

Does it measurably help performance? I'm by no means a performance 
person so this isn't really my wheelhouse, but I remember enough patches 
removing "unlikely() for performance reasons" to get suspicious!
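For reference, on GCC/Clang the hint in question is a wrapper around __builtin_expect, which only influences block layout and static branch prediction; the computed value is unchanged, so any effect is purely about code placement and is worth measuring, as the review suggests. A minimal sketch of the pattern, with illustrative names:

```c
#include <assert.h>
#include <stdint.h>

/* How DPDK (and the Linux kernel) define the hint on GCC/Clang. */
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Sketch of the mid-burst cleanup check under discussion: the hint
 * asserts that free space rarely drops below the threshold. */
static int
maybe_cleanup(uint16_t nb_free, uint16_t thresh)
{
	if (unlikely(nb_free < thresh))
		return 1; /* would call the ring-cleanup function here */
	return 0;
}
```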

-- 
Thanks,
Anatoly

^ permalink raw reply	[flat|nested] 274+ messages in thread

* Re: [PATCH v4 23/35] net/intel: add special handling for single desc packets
  2026-02-09 16:45   ` [PATCH v4 23/35] net/intel: add special handling for single desc packets Bruce Richardson
@ 2026-02-10 13:57     ` Burakov, Anatoly
  0 siblings, 0 replies; 274+ messages in thread
From: Burakov, Anatoly @ 2026-02-10 13:57 UTC (permalink / raw)
  To: Bruce Richardson, dev

On 2/9/2026 5:45 PM, Bruce Richardson wrote:
> Within the scalar Tx path, add a shortcut for packets that don't
> use TSO and have only a single data descriptor.
> 
> Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> ---

Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>

-- 
Thanks,
Anatoly

^ permalink raw reply	[flat|nested] 274+ messages in thread

* Re: [PATCH v4 05/35] net/intel: add common fn to calculate needed descriptors
  2026-02-10 12:29     ` Burakov, Anatoly
@ 2026-02-10 14:08       ` Bruce Richardson
  2026-02-10 14:17         ` Burakov, Anatoly
  0 siblings, 1 reply; 274+ messages in thread
From: Bruce Richardson @ 2026-02-10 14:08 UTC (permalink / raw)
  To: Burakov, Anatoly; +Cc: dev, Vladimir Medvedkin, Jingjing Wu, Praveen Shetty

On Tue, Feb 10, 2026 at 01:29:48PM +0100, Burakov, Anatoly wrote:
> On 2/9/2026 5:45 PM, Bruce Richardson wrote:
> > Multiple drivers used the same logic to calculate how many Tx data
> > descriptors were needed. Move that calculation to common code. In the
> > process of updating drivers, fix idpf driver calculation for the TSO
> > case.
> > 
> 
> "Fix TSO for idpf" sounds like a bugfix? Can it be backported to stable?
> 
Yes, it is a bug fix for a particular edge case. However, as done here, the
fix is implied by the code changes in the consolidation, and depends upon
them. Any fix for backport would need to be a different, standalone patch,
based on this.

/Bruce

^ permalink raw reply	[flat|nested] 274+ messages in thread

* Re: [PATCH v4 07/35] net/i40e: refactor context descriptor handling
  2026-02-10 12:48     ` Burakov, Anatoly
@ 2026-02-10 14:10       ` Bruce Richardson
  2026-02-10 14:19         ` Burakov, Anatoly
  0 siblings, 1 reply; 274+ messages in thread
From: Bruce Richardson @ 2026-02-10 14:10 UTC (permalink / raw)
  To: Burakov, Anatoly; +Cc: dev

On Tue, Feb 10, 2026 at 01:48:20PM +0100, Burakov, Anatoly wrote:
> On 2/9/2026 5:45 PM, Bruce Richardson wrote:
> > Move all context descriptor handling to a single function, as with the
> > ice driver, and use the same function signature as that driver.
> > 
> > Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> > ---
> 
> <snip>
> 
> > +static __rte_always_inline uint16_t
> > +get_context_desc(uint64_t ol_flags, const struct rte_mbuf *tx_pkt,
> > +		 const union ci_tx_offload *tx_offload,
> > +		 const struct ci_tx_queue *txq __rte_unused,
> > +		 uint64_t *qw0, uint64_t *qw1)
> > +{
> > +	uint16_t cd_l2tag2 = 0;
> > +	uint64_t cd_type_cmd_tso_mss = I40E_TX_DESC_DTYPE_CONTEXT;
> > +	uint32_t cd_tunneling_params = 0;
> > +
> > +	if (i40e_calc_context_desc(ol_flags) == 0)
> > +		return 0;
> > +
> > +	if (ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK)
> > +		i40e_parse_tunneling_params(ol_flags, *tx_offload, &cd_tunneling_params);
> > +
> > +	if (ol_flags & RTE_MBUF_F_TX_TCP_SEG) {
> > +		cd_type_cmd_tso_mss |= i40e_set_tso_ctx(ol_flags, tx_pkt, *tx_offload);
> > +	} else {
> > +#ifdef RTE_LIBRTE_IEEE1588
> > +		if (ol_flags & RTE_MBUF_F_TX_IEEE1588_TMST)
> > +			cd_type_cmd_tso_mss |=
> > +				((uint64_t)I40E_TX_CTX_DESC_TSYN << I40E_TXD_CTX_QW1_CMD_SHIFT);
> > +#endif
> 
> I couldn't find any places where we define this, it appears to be some sort
> of legacy define, making this basically dead code?
> 

It is legacy, and does need to be fixed, but across all of DPDK I think.
Testpmd, for example, has IEEE1588 ifdefs also.

However, for this patch, it's probably harmless enough to remove the ifdef
here and always allow this code path to execute.

/Bruce

^ permalink raw reply	[flat|nested] 274+ messages in thread

* Re: [PATCH v4 24/35] net/intel: use separate array for desc status tracking
  2026-02-09 16:45   ` [PATCH v4 24/35] net/intel: use separate array for desc status tracking Bruce Richardson
@ 2026-02-10 14:11     ` Burakov, Anatoly
  0 siblings, 0 replies; 274+ messages in thread
From: Burakov, Anatoly @ 2026-02-10 14:11 UTC (permalink / raw)
  To: Bruce Richardson, dev; +Cc: Praveen Shetty, Vladimir Medvedkin, Jingjing Wu

On 2/9/2026 5:45 PM, Bruce Richardson wrote:
> Rather than writing a last_id for each individual descriptor, we can
> write one only for places where the "report status" (RS) bit is set,
> i.e. the descriptors which will be written back when done. The method
> used for marking what descriptors are free is also changed in the
> process, even if the last descriptor with the "done" bits set is past
> the expected point, we only track up to the expected point, and leave
> the rest to be counted as freed next time. This means that we always
> have the RS/DD bits set at fixed intervals, and we always track free
> slots in units of the same tx_free_thresh intervals.
> 
> Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> ---


>   	if (!is_splitq) {
>   		txq->ci_tx_ring = mz->addr;
>   		idpf_qc_single_tx_queue_reset(txq);
> @@ -628,6 +642,9 @@ cpfl_tx_queue_setup(struct rte_eth_dev *dev, uint16_t queue_idx,
>   	return 0;
>   
>   err_complq_setup:
> +	rte_free(txq->rs_last_id);
> +err_rs_last_id_alloc:
> +	rte_free(txq->sw_ring);

This looks like this free wasn't there before and it should've been. 
Separate it out as a bugfix to stable?

<snip>

>   		txq->ci_tx_ring = mz->addr;
>   		idpf_qc_single_tx_queue_reset(txq);
> @@ -502,6 +512,9 @@ idpf_tx_queue_setup(struct rte_eth_dev *dev, uint16_t queue_idx,
>   	return 0;
>   
>   err_complq_setup:
> +	rte_free(txq->rs_last_id);
> +err_rs_last_id_alloc:
> +	rte_free(txq->sw_ring);

Same, bugfix to stable?

>   err_sw_ring_alloc:
>   	idpf_dma_zone_release(mz);
>   err_mz_reserve:

Otherwise LGTM
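The fixed-interval RS scheme the commit message describes can be sketched as below. Names and the interval arithmetic are illustrative, not the driver's actual code:

```c
#include <assert.h>
#include <stdint.h>

/* Walk descriptor indices and count the positions at which the RS
 * bit would be set when it is requested at fixed rs_thresh-sized
 * intervals; at each such point the driver would also record the
 * packet's last_id in the separate rs_last_id array. */
static unsigned
count_rs_points(uint16_t ring_size, uint16_t rs_thresh)
{
	unsigned rs_count = 0;

	for (uint16_t i = 0; i < ring_size; i++) {
		/* RS on the final descriptor of each thresh-sized block */
		if ((i % rs_thresh) == rs_thresh - 1)
			rs_count++;
	}
	return rs_count;
}
```

With RS at fixed intervals, free-slot accounting can likewise advance in whole tx_free_thresh-sized units, as the commit message states.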

Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>

-- 
Thanks,
Anatoly

^ permalink raw reply	[flat|nested] 274+ messages in thread

* Re: [PATCH v4 25/35] net/ixgbe: use separate array for desc status tracking
  2026-02-09 16:45   ` [PATCH v4 25/35] net/ixgbe: " Bruce Richardson
@ 2026-02-10 14:12     ` Burakov, Anatoly
  0 siblings, 0 replies; 274+ messages in thread
From: Burakov, Anatoly @ 2026-02-10 14:12 UTC (permalink / raw)
  To: Bruce Richardson, dev; +Cc: Vladimir Medvedkin

On 2/9/2026 5:45 PM, Bruce Richardson wrote:
> Due to significant differences in the ixgbe transmit descriptors, the
> ixgbe driver does not use the common scalar Tx functionality. Update the
> driver directly so its use of the rs_last_id array matches that of the
> common Tx code.
> 
> Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> ---

Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>

-- 
Thanks,
Anatoly

^ permalink raw reply	[flat|nested] 274+ messages in thread

* Re: [PATCH v4 22/35] net/intel: mark mid-burst ring cleanup as unlikely
  2026-02-10 13:36     ` Burakov, Anatoly
@ 2026-02-10 14:13       ` Bruce Richardson
  2026-02-11 18:12         ` Bruce Richardson
  0 siblings, 1 reply; 274+ messages in thread
From: Bruce Richardson @ 2026-02-10 14:13 UTC (permalink / raw)
  To: Burakov, Anatoly; +Cc: dev

On Tue, Feb 10, 2026 at 02:36:56PM +0100, Burakov, Anatoly wrote:
> On 2/9/2026 5:45 PM, Bruce Richardson wrote:
> > It should rarely be the case that we need to cleanup the descriptor ring
> > mid-burst, so mark as unlikely to help performance.
> > 
> > Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> > ---
> 
> Does it measurably help performance? I'm by no means a performance person so
> this isn't really my wheelhouse, but I remember enough patches removing
> "unlikely() for performance reasons" to get suspicious!
> 
> -- 
Yes, I tend to be suspicious too. I will reverify and, if not measurable,
will drop this patch from the set.

/Bruce

^ permalink raw reply	[flat|nested] 274+ messages in thread

* Re: [PATCH v4 26/35] net/intel: drop unused Tx queue used count
  2026-02-09 16:45   ` [PATCH v4 26/35] net/intel: drop unused Tx queue used count Bruce Richardson
@ 2026-02-10 14:14     ` Burakov, Anatoly
  0 siblings, 0 replies; 274+ messages in thread
From: Burakov, Anatoly @ 2026-02-10 14:14 UTC (permalink / raw)
  To: Bruce Richardson, dev; +Cc: Vladimir Medvedkin, Jingjing Wu, Praveen Shetty

On 2/9/2026 5:45 PM, Bruce Richardson wrote:
> Since drivers now track the setting of the RS bit based on fixed
> thresholds rather than after a fixed number of descriptors, we no longer
> need to track the number of descriptors used from one call to another.
> Therefore we can remove the tx_used value in the Tx queue structure.
> 
> This value was still being used inside the IDPF splitq scalar code,
> however, the idpf driver-specific section of the Tx queue structure also
> had an rs_compl_count value that was only used for the vector code
> paths, so we can use it to replace the old tx_used value in the scalar
> path.
> 
> Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> ---

>   		txd->qw1.cmd_dtype |= IDPF_TXD_FLEX_FLOW_CMD_EOP;
>   
>   		txq->nb_tx_free = (uint16_t)(txq->nb_tx_free - nb_used);
> -		txq->nb_tx_used = (uint16_t)(txq->nb_tx_used + nb_used);
> +		txq->rs_compl_count += nb_used;
>   
> -		if (txq->nb_tx_used >= 32) {
> +		if (txq->rs_compl_count >= 32) {

Should 32 perhaps be a macro?

>   			txd->qw1.cmd_dtype |= IDPF_TXD_FLEX_FLOW_CMD_RE;
>   			/* Update txq RE bit counters */
> -			txq->nb_tx_used = 0;
> +			txq->rs_compl_count = 0;
>   		}
>   	}
>   

Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>

-- 
Thanks,
Anatoly

^ permalink raw reply	[flat|nested] 274+ messages in thread

* Re: [PATCH v4 27/35] net/intel: remove index for tracking end of packet
  2026-02-09 16:45   ` [PATCH v4 27/35] net/intel: remove index for tracking end of packet Bruce Richardson
@ 2026-02-10 14:15     ` Burakov, Anatoly
  0 siblings, 0 replies; 274+ messages in thread
From: Burakov, Anatoly @ 2026-02-10 14:15 UTC (permalink / raw)
  To: Bruce Richardson, dev; +Cc: Vladimir Medvedkin, Jingjing Wu, Praveen Shetty

On 2/9/2026 5:45 PM, Bruce Richardson wrote:
> The last_id value in each tx_sw_queue entry was no longer used in the
> datapath, remove it and its initialization. For the function releasing
> packets back, rather than relying on "last_id" to identify end of
> packet, instead check for the next pointer being NULL.
> 
> Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> ---

Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>

-- 
Thanks,
Anatoly

^ permalink raw reply	[flat|nested] 274+ messages in thread

* Re: [PATCH v4 05/35] net/intel: add common fn to calculate needed descriptors
  2026-02-10 14:08       ` Bruce Richardson
@ 2026-02-10 14:17         ` Burakov, Anatoly
  2026-02-10 17:25           ` Bruce Richardson
  0 siblings, 1 reply; 274+ messages in thread
From: Burakov, Anatoly @ 2026-02-10 14:17 UTC (permalink / raw)
  To: Bruce Richardson; +Cc: dev, Vladimir Medvedkin, Jingjing Wu, Praveen Shetty

On 2/10/2026 3:08 PM, Bruce Richardson wrote:
> On Tue, Feb 10, 2026 at 01:29:48PM +0100, Burakov, Anatoly wrote:
>> On 2/9/2026 5:45 PM, Bruce Richardson wrote:
>>> Multiple drivers used the same logic to calculate how many Tx data
>>> descriptors were needed. Move that calculation to common code. In the
>>> process of updating drivers, fix idpf driver calculation for the TSO
>>> case.
>>>
>>
>> "Fix TSO for idpf" sounds like a bugfix? Can it be backported to stable?
>>
> Yes, it is a bug fix for a particular edge case. However, as done here, the
> fix is implied by the code changes in the consolidation, and depends upon
> them. Any fix for backport would need to be a different, standalone patch,
> based on this.
> 
> /Bruce

So the original code didn't have TSO at all? I.e. this can't be fixed as 
a prerequisite patch to this patchset?

-- 
Thanks,
Anatoly

^ permalink raw reply	[flat|nested] 274+ messages in thread

* Re: [PATCH v4 07/35] net/i40e: refactor context descriptor handling
  2026-02-10 14:10       ` Bruce Richardson
@ 2026-02-10 14:19         ` Burakov, Anatoly
  2026-02-10 17:54           ` Bruce Richardson
  0 siblings, 1 reply; 274+ messages in thread
From: Burakov, Anatoly @ 2026-02-10 14:19 UTC (permalink / raw)
  To: Bruce Richardson; +Cc: dev

On 2/10/2026 3:10 PM, Bruce Richardson wrote:
> On Tue, Feb 10, 2026 at 01:48:20PM +0100, Burakov, Anatoly wrote:
>> On 2/9/2026 5:45 PM, Bruce Richardson wrote:
>>> Move all context descriptor handling to a single function, as with the
>>> ice driver, and use the same function signature as that driver.
>>>
>>> Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
>>> ---
>>
>> <snip>
>>
>>> +static __rte_always_inline uint16_t
>>> +get_context_desc(uint64_t ol_flags, const struct rte_mbuf *tx_pkt,
>>> +		 const union ci_tx_offload *tx_offload,
>>> +		 const struct ci_tx_queue *txq __rte_unused,
>>> +		 uint64_t *qw0, uint64_t *qw1)
>>> +{
>>> +	uint16_t cd_l2tag2 = 0;
>>> +	uint64_t cd_type_cmd_tso_mss = I40E_TX_DESC_DTYPE_CONTEXT;
>>> +	uint32_t cd_tunneling_params = 0;
>>> +
>>> +	if (i40e_calc_context_desc(ol_flags) == 0)
>>> +		return 0;
>>> +
>>> +	if (ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK)
>>> +		i40e_parse_tunneling_params(ol_flags, *tx_offload, &cd_tunneling_params);
>>> +
>>> +	if (ol_flags & RTE_MBUF_F_TX_TCP_SEG) {
>>> +		cd_type_cmd_tso_mss |= i40e_set_tso_ctx(ol_flags, tx_pkt, *tx_offload);
>>> +	} else {
>>> +#ifdef RTE_LIBRTE_IEEE1588
>>> +		if (ol_flags & RTE_MBUF_F_TX_IEEE1588_TMST)
>>> +			cd_type_cmd_tso_mss |=
>>> +				((uint64_t)I40E_TX_CTX_DESC_TSYN << I40E_TXD_CTX_QW1_CMD_SHIFT);
>>> +#endif
>>
>> I couldn't find any places where we define this, it appears to be some sort
>> of legacy define, making this basically dead code?
>>
> 
> It is legacy, and does need to be fixed, but across all of DPDK I think.
> Testpmd, for example, has IEEE1588 ifdefs also.
> 
> However, for this patch, it's probably harmless enough to remove the ifdef
> here and always allow this code path to execute.
> 
> /Bruce

Sure, but I would've preferred this to be a separate patch as it's 
semantically different from what you're doing here. Perhaps it can be 
fixed as one of the early patches in the series as "preparatory work" 
for this one.

-- 
Thanks,
Anatoly

^ permalink raw reply	[flat|nested] 274+ messages in thread

* Re: [PATCH v4 04/35] net/intel: consolidate definitions for Tx desc fields
  2026-02-10 12:26     ` Burakov, Anatoly
@ 2026-02-10 16:47       ` Bruce Richardson
  0 siblings, 0 replies; 274+ messages in thread
From: Bruce Richardson @ 2026-02-10 16:47 UTC (permalink / raw)
  To: Burakov, Anatoly; +Cc: dev, Vladimir Medvedkin, Jingjing Wu, Praveen Shetty

On Tue, Feb 10, 2026 at 01:26:19PM +0100, Burakov, Anatoly wrote:
> On 2/9/2026 5:45 PM, Bruce Richardson wrote:
> > The offsets of the various fields within the Tx descriptors are common
> > for i40e, iavf, ice and idpf, so put a single set of defines in tx.h and
> > use those throughout all drivers. (NOTE: there was a small difference in
> > mask of CMD field between drivers depending on whether reserved fields
> > or not were included. Those can be ignored as those bits are unused in
> > the drivers for which they are reserved). Similarly, the various flag
> > fields, such as End-of-packet (EOP) and Report-status (RS) are the same,
> > as are offload definitions so consolidate them.
> 
> Nitpick: the NOTE should IMO be separated, as otherwise the flow of the
> commit message is a bit confusing, as the latter part kinda parses as being
> part of the NOTE section that accidentally got left out of the parens,
> instead of being a continuation of the pre-NOTE section.
> 
Splitting into separate paragraphs for v5.

^ permalink raw reply	[flat|nested] 274+ messages in thread

* Re: [PATCH v4 05/35] net/intel: add common fn to calculate needed descriptors
  2026-02-10 14:17         ` Burakov, Anatoly
@ 2026-02-10 17:25           ` Bruce Richardson
  2026-02-11  9:14             ` Burakov, Anatoly
  0 siblings, 1 reply; 274+ messages in thread
From: Bruce Richardson @ 2026-02-10 17:25 UTC (permalink / raw)
  To: Burakov, Anatoly; +Cc: dev, Vladimir Medvedkin, Jingjing Wu, Praveen Shetty

On Tue, Feb 10, 2026 at 03:17:32PM +0100, Burakov, Anatoly wrote:
> On 2/10/2026 3:08 PM, Bruce Richardson wrote:
> > On Tue, Feb 10, 2026 at 01:29:48PM +0100, Burakov, Anatoly wrote:
> > > On 2/9/2026 5:45 PM, Bruce Richardson wrote:
> > > > Multiple drivers used the same logic to calculate how many Tx data
> > > > descriptors were needed. Move that calculation to common code. In the
> > > > process of updating drivers, fix idpf driver calculation for the TSO
> > > > case.
> > > > 
> > > 
> > > "Fix TSO for idpf" sounds like a bugfix? Can it be backported to stable?
> > > 
> > Yes, it is a bug fix for a particular edge case. However, as done here, the
> > fix is implied by the code changes in the consolidation, and depends upon
> > them. Any fix for backport would need to be a different, standalone patch,
> > based on this.
> > 
> > /Bruce
> 
> So the original code didn't have TSO at all? I.e. this can't be fixed as a
> prerequisite patch to this patchset?
> 
The original code did have TSO; it just didn't support the case where a single
mbuf segment had more data than the max allowed to be described by a single
descriptor, i.e. where we had multiple descriptors for one mbuf segment, so
that "nb_descs != mbuf->nb_segs + ctx_descs". The other drivers all solve
this by having a separate function that iterates through the segments, and
fixing this in older releases would mean adding such a function to this
driver. There is little point adding that function in this series just to
delete it later here, so I think for 26.03 the fix here is best and for
earlier releases the best fix is to just put the necessary code directly
into idpf driver. I've noted this down to do a backported patch after this
series goes in.
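For illustration, the per-segment calculation the other drivers use (in the style of ci_calc_pkt_desc()) looks roughly like this. The per-descriptor data limit here is an assumed round value, not the exact hardware constant, and the function name is a stand-in:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed per-descriptor data limit; the real NIC limit is in this
 * ballpark, but check the datasheet before relying on a value. */
#define MAX_DATA_PER_TXD 16384u

/* A segment larger than the per-descriptor limit needs several data
 * descriptors, so nb_descs can exceed nb_segs - the edge case the
 * old idpf calculation missed. */
static uint16_t
calc_pkt_descs(const uint32_t *seg_lens, uint16_t nb_segs)
{
	uint16_t count = 0;

	for (uint16_t i = 0; i < nb_segs; i++)
		count += (seg_lens[i] + MAX_DATA_PER_TXD - 1) / MAX_DATA_PER_TXD;
	return count;
}
```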

/Bruce

^ permalink raw reply	[flat|nested] 274+ messages in thread

* Re: [PATCH v4 06/35] net/ice: refactor context descriptor handling
  2026-02-10 12:42     ` Burakov, Anatoly
@ 2026-02-10 17:40       ` Bruce Richardson
  2026-02-11  9:17         ` Burakov, Anatoly
  0 siblings, 1 reply; 274+ messages in thread
From: Bruce Richardson @ 2026-02-10 17:40 UTC (permalink / raw)
  To: Burakov, Anatoly; +Cc: dev

On Tue, Feb 10, 2026 at 01:42:17PM +0100, Burakov, Anatoly wrote:
> On 2/9/2026 5:45 PM, Bruce Richardson wrote:
> > Create a single function to manage all context descriptor handling,
> > which returns either 0 or 1 depending on whether a descriptor is needed
> > or not, as well as returning directly the descriptor contents if
> > relevant.
> > 
> > Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> > ---
> 
> <snip>
> 
> > +static __rte_always_inline uint16_t
> > +get_context_desc(uint64_t ol_flags, const struct rte_mbuf *tx_pkt,
> > +	const union ci_tx_offload *tx_offload, const struct ci_tx_queue *txq,
> > +	uint64_t *qw0, uint64_t *qw1)
> > +{
> > +	uint16_t cd_l2tag2 = 0;
> > +	uint64_t cd_type_cmd_tso_mss = ICE_TX_DESC_DTYPE_CTX;
> > +	uint32_t cd_tunneling_params = 0;
> > +	uint64_t ptp_tx_index = txq->ice_vsi->adapter->ptp_tx_index;
> > +
> > +	if (ice_calc_context_desc(ol_flags) == 0)
> > +		return 0;
> > +
> > +	if (ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK)
> > +		ice_parse_tunneling_params(ol_flags, *tx_offload, &cd_tunneling_params);
> > +
> > +	if (ol_flags & (RTE_MBUF_F_TX_TCP_SEG | RTE_MBUF_F_TX_UDP_SEG))
> > +		cd_type_cmd_tso_mss |= ice_set_tso_ctx(ol_flags, tx_pkt, *tx_offload);
> > +	else if (ol_flags & RTE_MBUF_F_TX_IEEE1588_TMST)
> > +		cd_type_cmd_tso_mss |=
> > +			((uint64_t)CI_TX_CTX_DESC_TSYN << CI_TXD_QW1_CMD_S) |
> > +			((ptp_tx_index << ICE_TXD_CTX_QW1_TSYN_S) & ICE_TXD_CTX_QW1_TSYN_M);
> 
> It's tangentially related to this commit but it caught my attention that TSO
> and timestamping are mutually exclusive here. They *are* mutually exclusive
> as far as the driver is concerned so that part is fine, but I couldn't find
> any signs of us enforcing this limitation anywhere in our configuration
> path, so a well behaved application could theoretically arrive at this
> combination of mbuf flags without breaking anything.
> 
> (if I understand things correctly, this applies to both ice and i40e)
> 
Yes, you are correct here. However, I'm not sure if we can or should
enforce this, as it is completely possible to have a queue where some
packets are sent with TSO and others are sent with the timesync flag on
them. There is no way for the actual Tx function to flag a bad packet.
Best we can possibly do is add a check to the pre-Tx packet prepare
function. WDYT?

/Bruce

^ permalink raw reply	[flat|nested] 274+ messages in thread

* Re: [PATCH v4 07/35] net/i40e: refactor context descriptor handling
  2026-02-10 14:19         ` Burakov, Anatoly
@ 2026-02-10 17:54           ` Bruce Richardson
  2026-02-11  9:20             ` Burakov, Anatoly
  0 siblings, 1 reply; 274+ messages in thread
From: Bruce Richardson @ 2026-02-10 17:54 UTC (permalink / raw)
  To: Burakov, Anatoly; +Cc: dev

On Tue, Feb 10, 2026 at 03:19:27PM +0100, Burakov, Anatoly wrote:
> On 2/10/2026 3:10 PM, Bruce Richardson wrote:
> > On Tue, Feb 10, 2026 at 01:48:20PM +0100, Burakov, Anatoly wrote:
> > > On 2/9/2026 5:45 PM, Bruce Richardson wrote:
> > > > Move all context descriptor handling to a single function, as with
> > > > the ice driver, and use the same function signature as that driver.
> > > > 
> > > > Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> > > > ---
> > > 
> > > <snip>
> > > 
> > > > +static __rte_always_inline uint16_t
> > > > +get_context_desc(uint64_t ol_flags, const struct rte_mbuf *tx_pkt,
> > > > +		 const union ci_tx_offload *tx_offload,
> > > > +		 const struct ci_tx_queue *txq __rte_unused,
> > > > +		 uint64_t *qw0, uint64_t *qw1)
> > > > +{
> > > > +	uint16_t cd_l2tag2 = 0;
> > > > +	uint64_t cd_type_cmd_tso_mss = I40E_TX_DESC_DTYPE_CONTEXT;
> > > > +	uint32_t cd_tunneling_params = 0;
> > > > +
> > > > +	if (i40e_calc_context_desc(ol_flags) == 0)
> > > > +		return 0;
> > > > +
> > > > +	if (ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK)
> > > > +		i40e_parse_tunneling_params(ol_flags, *tx_offload, &cd_tunneling_params);
> > > > +
> > > > +	if (ol_flags & RTE_MBUF_F_TX_TCP_SEG) {
> > > > +		cd_type_cmd_tso_mss |= i40e_set_tso_ctx(ol_flags, tx_pkt, *tx_offload);
> > > > +	} else {
> > > > +#ifdef RTE_LIBRTE_IEEE1588
> > > > +		if (ol_flags & RTE_MBUF_F_TX_IEEE1588_TMST)
> > > > +			cd_type_cmd_tso_mss |=
> > > > +				((uint64_t)I40E_TX_CTX_DESC_TSYN << I40E_TXD_CTX_QW1_CMD_SHIFT);
> > > > +#endif
> > > 
> > > I couldn't find any places where we define this, it appears to be
> > > some sort of legacy define, making this basically dead code?
> > > 
> > 
> > It is legacy, and does need to be fixed, but across all of DPDK I
> > think.  Testpmd, for example, has IEEE1588 ifdefs also.
> > 
> > However, for this patch, it's probably harmless enough to remove the
> > ifdef here and always allow this code path to execute.
> > 
> > /Bruce
> 
> Sure, but I would've preferred this to be a separate patch as it's
> semantically different from what you're doing here. Perhaps it can be
> fixed as one of the early patches in the series as "preparatory work" for
> this one.
> 

I don't think it belongs in this series at all, actually. The IEEE1588
define appears in multiple drivers, not just Intel ones, as well as testpmd
(as I previously said). However, if you think it's worth patching it out
just for i40e I can add the patch to do so to this set.

/Bruce

^ permalink raw reply	[flat|nested] 274+ messages in thread

* Re: [PATCH v4 10/35] net/intel: create common checksum Tx offload function
  2026-02-10 13:04     ` Burakov, Anatoly
@ 2026-02-10 17:56       ` Bruce Richardson
  0 siblings, 0 replies; 274+ messages in thread
From: Bruce Richardson @ 2026-02-10 17:56 UTC (permalink / raw)
  To: Burakov, Anatoly; +Cc: dev

On Tue, Feb 10, 2026 at 02:04:55PM +0100, Burakov, Anatoly wrote:
> On 2/9/2026 5:45 PM, Bruce Richardson wrote:
> > Since i40e and ice have the same checksum offload logic, merge their
> > functions into one. Future rework should enable this to be used by more
> > drivers also.
> > 
> > Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> > ---
> > drivers/net/intel/common/tx_scalar.h | 58 +++++++++++++++++++++++++++
> > drivers/net/intel/i40e/i40e_rxtx.c   | 52 +-----------------------
> > drivers/net/intel/i40e/i40e_rxtx.h   |  1 +
> > drivers/net/intel/ice/ice_rxtx.c     | 60 +---------------------------
> > drivers/net/intel/ice/ice_rxtx.h     |  1 +
> > 5 files changed, 62 insertions(+), 110 deletions(-)
> > 
> > diff --git a/drivers/net/intel/common/tx_scalar.h b/drivers/net/intel/common/tx_scalar.h
> > index 573f5136a9..cf0dcb4b2c 100644
> > --- a/drivers/net/intel/common/tx_scalar.h
> > +++ b/drivers/net/intel/common/tx_scalar.h
> > @@ -59,6 +59,64 @@ ci_tx_xmit_cleanup(struct ci_tx_queue *txq)
> >  	return 0;
> >  }
> > +
> > +/* Common checksum enable function for Intel drivers (ice, i40e, etc.) */
> > +static inline void
> > +ci_txd_enable_checksum(uint64_t ol_flags,
> > +		       uint32_t *td_cmd,
> > +		       uint32_t *td_offset,
> > +		       union ci_tx_offload tx_offload)
> > +{
> > +	/* Enable L3 checksum offloads */
> > +	if (ol_flags & RTE_MBUF_F_TX_IP_CKSUM) {
> > +		*td_cmd |= CI_TX_DESC_CMD_IIPT_IPV4_CSUM;
> > +		*td_offset |= (tx_offload.l3_len >> 2) <<
> > +				CI_TX_DESC_LEN_IPLEN_S;
> > +	} else if (ol_flags & RTE_MBUF_F_TX_IPV4) {
> > +		*td_cmd |= CI_TX_DESC_CMD_IIPT_IPV4;
> > +		*td_offset |= (tx_offload.l3_len >> 2) <<
> > +				CI_TX_DESC_LEN_IPLEN_S;
> > +	} else if (ol_flags & RTE_MBUF_F_TX_IPV6) {
> > +		*td_cmd |= CI_TX_DESC_CMD_IIPT_IPV6;
> > +		*td_offset |= (tx_offload.l3_len >> 2) <<
> > +				CI_TX_DESC_LEN_IPLEN_S;
> > +	}
> > +
> > +	if (ol_flags & RTE_MBUF_F_TX_TCP_SEG) {
> > +		*td_cmd |= CI_TX_DESC_CMD_L4T_EOFT_TCP;
> > +		*td_offset |= (tx_offload.l4_len >> 2) <<
> > +				CI_TX_DESC_LEN_L4_LEN_S;
> > +		return;
> > +	}
> > +
> > +	if (ol_flags & RTE_MBUF_F_TX_UDP_SEG) {
> > +		*td_cmd |= CI_TX_DESC_CMD_L4T_EOFT_UDP;
> > +		*td_offset |= (tx_offload.l4_len >> 2) <<
> > +				CI_TX_DESC_LEN_L4_LEN_S;
> > +		return;
> > +	}
> > +
> > +	/* Enable L4 checksum offloads */
> > +	switch (ol_flags & RTE_MBUF_F_TX_L4_MASK) {
> > +	case RTE_MBUF_F_TX_TCP_CKSUM:
> > +		*td_cmd |= CI_TX_DESC_CMD_L4T_EOFT_TCP;
> > +		*td_offset |= (sizeof(struct rte_tcp_hdr) >> 2) <<
> > +				CI_TX_DESC_LEN_L4_LEN_S;
> > +		break;
> > +	case RTE_MBUF_F_TX_SCTP_CKSUM:
> > +		*td_cmd |= CI_TX_DESC_CMD_L4T_EOFT_SCTP;
> > +		*td_offset |= (sizeof(struct rte_sctp_hdr) >> 2) <<
> > +				CI_TX_DESC_LEN_L4_LEN_S;
> > +		break;
> > +	case RTE_MBUF_F_TX_UDP_CKSUM:
> > +		*td_cmd |= CI_TX_DESC_CMD_L4T_EOFT_UDP;
> > +		*td_offset |= (sizeof(struct rte_udp_hdr) >> 2) <<
> > +				CI_TX_DESC_LEN_L4_LEN_S;
> > +		break;
> > +	default:
> > +		break;
> > +	}
> 
> Nitpick: some of the indentation here is inconsistent. Perhaps enabling
> whitespace view in your editor would help, if you haven't done so?
> 
> (the inconsistency was already present in the ice function but that
> doesn't mean we have to copy it!)
> 
Yep, I suspect the indentation was meant to align with opening braces but
that of course gets messed up when renaming things in refactoring (or in
this case adding in an extra "*" dereference char). Will clean this up in
v5.

/Bruce

^ permalink raw reply	[flat|nested] 274+ messages in thread

* Re: [PATCH v4 11/35] net/intel: create a common scalar Tx function
  2026-02-10 13:14     ` Burakov, Anatoly
@ 2026-02-10 18:03       ` Bruce Richardson
  2026-02-11  9:26         ` Burakov, Anatoly
  0 siblings, 1 reply; 274+ messages in thread
From: Bruce Richardson @ 2026-02-10 18:03 UTC (permalink / raw)
  To: Burakov, Anatoly; +Cc: dev

On Tue, Feb 10, 2026 at 02:14:04PM +0100, Burakov, Anatoly wrote:
> On 2/9/2026 5:45 PM, Bruce Richardson wrote:
> > Given the similarities between the transmit functions across various
> > Intel drivers, make a start on consolidating them by moving the ice Tx
> > function into common, for reuse by other drivers.
> > 
> > Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> > ---
> 
> > +	if (ts_fns != NULL)
> > +		ts_id = ts_fns->get_ts_tail(txq);
> > +
> > +	/* Check if the descriptor ring needs to be cleaned. */
> > +	if (txq->nb_tx_free < txq->tx_free_thresh)
> > +		(void)ci_tx_xmit_cleanup(txq);
> 
> Why (void) ?
>

Not sure, it seems superfluous, but I think it may help with some static
analysis perhaps? I've seen some tools warn you if you generally check a
return value from a function but fail to do so in one place. The other
places where the xmit_cleanup function is called here the return value is
checked each time, making this the outlier and so an explicit "void"
doesn't hurt.

If you feel it should be removed, I can do so though....
 
> 
> > +		if (ol_flags & (RTE_MBUF_F_TX_TCP_SEG | RTE_MBUF_F_TX_UDP_SEG))
> > +			nb_used = (uint16_t)(ci_calc_pkt_desc(tx_pkt) + nb_ctx);
> > +		else
> > +			nb_used = (uint16_t)(tx_pkt->nb_segs + nb_ctx);
> > +		tx_last = (uint16_t)(tx_id + nb_used - 1);
> > +
> > +		/* Circular ring */
> 
> nicholas_cage_you_dont_say.jpg
> 
> Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
> 
> -- 
> Thanks,
> Anatoly

^ permalink raw reply	[flat|nested] 274+ messages in thread

* Re: [PATCH v4 14/35] net/intel: support configurable VLAN tag insertion on Tx
  2026-02-10 13:21     ` Burakov, Anatoly
@ 2026-02-10 18:20       ` Bruce Richardson
  2026-02-11  9:29         ` Burakov, Anatoly
  0 siblings, 1 reply; 274+ messages in thread
From: Bruce Richardson @ 2026-02-10 18:20 UTC (permalink / raw)
  To: Burakov, Anatoly; +Cc: dev

On Tue, Feb 10, 2026 at 02:21:49PM +0100, Burakov, Anatoly wrote:
> On 2/9/2026 5:45 PM, Bruce Richardson wrote:
> > Make the VLAN tag insertion logic configurable in the common code, as to
> > where inner/outer tags get placed.
> > 
> > Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> > ---
> 
> Hi Bruce,
> I might be missing something obvious, but...
> 
> > -		/* Descriptor based VLAN insertion */
> > -		if (ol_flags & (RTE_MBUF_F_TX_VLAN | RTE_MBUF_F_TX_QINQ)) {
> > +		/* Descriptor based VLAN/QinQ insertion */
> > +		/* for single vlan offload, only insert in data desc with VLAN_IN_L2TAG1 is set
> > +		 * for qinq offload, we always put inner tag in L2Tag1
> > +		 */
> > +		if (((ol_flags & RTE_MBUF_F_TX_VLAN) && l2tag1_field == CI_VLAN_IN_L2TAG1) ||
> > +				(ol_flags & RTE_MBUF_F_TX_QINQ)) {
> >   			td_cmd |= CI_TX_DESC_CMD_IL2TAG1;
> >   			td_tag = tx_pkt->vlan_tci;
> >   		}
> 
> I can see that we insert VLAN tag on TX_VLAN and VLAN_IN_TAG1. But then...
> 
> 
> > @@ -1004,8 +1004,7 @@ get_context_desc(uint64_t ol_flags, const struct rte_mbuf *tx_pkt,
> >   	/* TX context descriptor based double VLAN insert */
> >   	if (ol_flags & RTE_MBUF_F_TX_QINQ) {
> >   		cd_l2tag2 = tx_pkt->vlan_tci_outer;
> > -		cd_type_cmd_tso_mss |=
> > -				((uint64_t)I40E_TX_CTX_DESC_IL2TAG2 << I40E_TXD_CTX_QW1_CMD_SHIFT);
> > +		cd_type_cmd_tso_mss |= (I40E_TX_CTX_DESC_IL2TAG2 << I40E_TXD_CTX_QW1_CMD_SHIFT);
> 
> this logic is only triggered for QinQ. Meaning, there's nowhere we insert
> VLAN tag on TX_VLAN and VLAN_IN_TAG2?
> 
For VLAN_IN_L2TAG2 the vlan tag goes in the context descriptor, and
creating the context descriptor is the responsibility of each individual
driver. The two drivers using this common path, i40e and ice both put the
tag in the TAG1 slot, so don't have a path to add it to the TAG2 slot. We
make it configurable in this patch so that in the next patches we can add
in support for the iavf driver - which does support putting the VLAN tag in
the Tag2 slot, in its context descriptor function.

/Bruce

^ permalink raw reply	[flat|nested] 274+ messages in thread

* Re: [PATCH v4 15/35] net/iavf: use common scalar Tx function
  2026-02-10 13:27     ` Burakov, Anatoly
@ 2026-02-10 18:31       ` Bruce Richardson
  0 siblings, 0 replies; 274+ messages in thread
From: Bruce Richardson @ 2026-02-10 18:31 UTC (permalink / raw)
  To: Burakov, Anatoly; +Cc: dev, Vladimir Medvedkin

On Tue, Feb 10, 2026 at 02:27:18PM +0100, Burakov, Anatoly wrote:
> On 2/9/2026 5:45 PM, Bruce Richardson wrote:
> > Now that the common scalar Tx function has all necessary hooks for the
> > features supported by the iavf driver, use the common function to avoid
> > duplicated code.
> > 
> > Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> > ---
> 
> 
> > +		}
> > -		return;
> > +		/* TSO segmentation field */
> > +		tlen = iavf_fill_ctx_desc_segmentation_field(&cd_type_cmd,
> > +							     mbuf, ipsec_md);
> > +		(void)tlen; /* Suppress unused variable warning */
> 
> RTE_SET_USED?
> 
> Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
> 
Since tlen is a local variable and is only used for the return value here,
I can just get rid of it, saving having to pretend it's actually being used.

^ permalink raw reply	[flat|nested] 274+ messages in thread

* Re: [PATCH v4 05/35] net/intel: add common fn to calculate needed descriptors
  2026-02-10 17:25           ` Bruce Richardson
@ 2026-02-11  9:14             ` Burakov, Anatoly
  0 siblings, 0 replies; 274+ messages in thread
From: Burakov, Anatoly @ 2026-02-11  9:14 UTC (permalink / raw)
  To: Bruce Richardson; +Cc: dev, Vladimir Medvedkin, Jingjing Wu, Praveen Shetty

On 2/10/2026 6:25 PM, Bruce Richardson wrote:
> On Tue, Feb 10, 2026 at 03:17:32PM +0100, Burakov, Anatoly wrote:
>> On 2/10/2026 3:08 PM, Bruce Richardson wrote:
>>> On Tue, Feb 10, 2026 at 01:29:48PM +0100, Burakov, Anatoly wrote:
>>>> On 2/9/2026 5:45 PM, Bruce Richardson wrote:
>>>>> Multiple drivers used the same logic to calculate how many Tx data
>>>>> descriptors were needed. Move that calculation to common code. In the
>>>>> process of updating drivers, fix idpf driver calculation for the TSO
>>>>> case.
>>>>>
>>>>
>>>> "Fix TSO for idpf" sounds like a bugfix? Can it be backported to stable?
>>>>
>>> Yes, it is a bug fix for a particular edge case. However, as done here, the
>>> fix is implied by the code changes in the consolidation, and depends upon
>>> them. Any fix for backport would need to be a different, standalone patch,
>>> based on this.
>>>
>>> /Bruce
>>
>> So the original code didn't have TSO at all? I.e. this can't be fixed as a
>> prerequisite patch to this patchset?
>>
> Original code did have TSO, it just didn't support the case where a single
> mbuf segment had more data than the max allowed to be described by a single
> descriptor, i.e. where we had multiple descriptors for one mbuf segment, so
> that "nb_descs != mbuf->nb_segs + ctx_descs". The other drivers all solve
> this by having a separate function that iterated through the descriptors,
> and to fix this in older releases would be to add such a function to this
> driver. There is little point adding that function in this series just to
> delete it later here, so I think for 26.03 the fix here is best and for
> earlier releases the best fix is to just put the necessary code directly
> into idpf driver. I've noted this down to do a backported patch after this
> series goes in.
> 
> /Bruce

Got it, thanks!

-- 
Thanks,
Anatoly

^ permalink raw reply	[flat|nested] 274+ messages in thread

* Re: [PATCH v4 06/35] net/ice: refactor context descriptor handling
  2026-02-10 17:40       ` Bruce Richardson
@ 2026-02-11  9:17         ` Burakov, Anatoly
  2026-02-11 10:38           ` Bruce Richardson
  0 siblings, 1 reply; 274+ messages in thread
From: Burakov, Anatoly @ 2026-02-11  9:17 UTC (permalink / raw)
  To: Bruce Richardson; +Cc: dev

On 2/10/2026 6:40 PM, Bruce Richardson wrote:
> On Tue, Feb 10, 2026 at 01:42:17PM +0100, Burakov, Anatoly wrote:
>> On 2/9/2026 5:45 PM, Bruce Richardson wrote:
>>> Create a single function to manage all context descriptor handling,
>>> which returns either 0 or 1 depending on whether a descriptor is needed
>>> or not, as well as returning directly the descriptor contents if
>>> relevant.
>>>
>>> Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
>>> ---
>>
>> <snip>
>>
>>> +static __rte_always_inline uint16_t
>>> +get_context_desc(uint64_t ol_flags, const struct rte_mbuf *tx_pkt,
>>> +	const union ci_tx_offload *tx_offload, const struct ci_tx_queue *txq,
>>> +	uint64_t *qw0, uint64_t *qw1)
>>> +{
>>> +	uint16_t cd_l2tag2 = 0;
>>> +	uint64_t cd_type_cmd_tso_mss = ICE_TX_DESC_DTYPE_CTX;
>>> +	uint32_t cd_tunneling_params = 0;
>>> +	uint64_t ptp_tx_index = txq->ice_vsi->adapter->ptp_tx_index;
>>> +
>>> +	if (ice_calc_context_desc(ol_flags) == 0)
>>> +		return 0;
>>> +
>>> +	if (ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK)
>>> +		ice_parse_tunneling_params(ol_flags, *tx_offload, &cd_tunneling_params);
>>> +
>>> +	if (ol_flags & (RTE_MBUF_F_TX_TCP_SEG | RTE_MBUF_F_TX_UDP_SEG))
>>> +		cd_type_cmd_tso_mss |= ice_set_tso_ctx(ol_flags, tx_pkt, *tx_offload);
>>> +	else if (ol_flags & RTE_MBUF_F_TX_IEEE1588_TMST)
>>> +		cd_type_cmd_tso_mss |=
>>> +			((uint64_t)CI_TX_CTX_DESC_TSYN << CI_TXD_QW1_CMD_S) |
>>> +			((ptp_tx_index << ICE_TXD_CTX_QW1_TSYN_S) & ICE_TXD_CTX_QW1_TSYN_M);
>>
>> It's tangentially related to this commit but it caught my attention that TSO
>> and timestamping are mutually exclusive here. They *are* mutually exclusive
>> as far as the driver is concerned so that part is fine, but I couldn't find
>> any signs of us enforcing this limitation anywhere in our configuration
>> path, so a well behaved application could theoretically arrive at this
>> combination of mbuf flags without breaking anything.
>>
>> (if I understand things correctly, this applies to both ice and i40e)
>>
> Yes, you are correct here. However, I'm not sure if we can or should
> enforce this, as it is completely possible to have a queue where some
> packets are sent with TSO and others are sent with the timesync flag on
> them. There is no way for the actual Tx function to flag a bad packet.
> Best we can possibly do is add a check to the pre-Tx packet prepare
> function. WDYT?
> 
> /Bruce

Now that I think of it, I believe these features do not make logical 
sense together anyway (TSO means segmented packets while timestamping 
means you have one packet you timestamp) so perhaps this can be 
considered user error? I mean we could add a check if it doesn't hurt 
performance, but maybe this isn't a problem we need to solve.

-- 
Thanks,
Anatoly

^ permalink raw reply	[flat|nested] 274+ messages in thread

* Re: [PATCH v4 07/35] net/i40e: refactor context descriptor handling
  2026-02-10 17:54           ` Bruce Richardson
@ 2026-02-11  9:20             ` Burakov, Anatoly
  2026-02-11 12:04               ` Bruce Richardson
  0 siblings, 1 reply; 274+ messages in thread
From: Burakov, Anatoly @ 2026-02-11  9:20 UTC (permalink / raw)
  To: Bruce Richardson; +Cc: dev

On 2/10/2026 6:54 PM, Bruce Richardson wrote:
> On Tue, Feb 10, 2026 at 03:19:27PM +0100, Burakov, Anatoly wrote:
>> On 2/10/2026 3:10 PM, Bruce Richardson wrote:
>>> On Tue, Feb 10, 2026 at 01:48:20PM +0100, Burakov, Anatoly wrote:
>>>> On 2/9/2026 5:45 PM, Bruce Richardson wrote:
>>>>> Move all context descriptor handling to a single function, as with
>>>>> the ice driver, and use the same function signature as that driver.
>>>>>
>>>>> Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
>>>>> ---
>>>>
>>>> <snip>
>>>>
>>>>> +static __rte_always_inline uint16_t
>>>>> +get_context_desc(uint64_t ol_flags, const struct rte_mbuf *tx_pkt,
>>>>> +		 const union ci_tx_offload *tx_offload,
>>>>> +		 const struct ci_tx_queue *txq __rte_unused,
>>>>> +		 uint64_t *qw0, uint64_t *qw1)
>>>>> +{
>>>>> +	uint16_t cd_l2tag2 = 0;
>>>>> +	uint64_t cd_type_cmd_tso_mss = I40E_TX_DESC_DTYPE_CONTEXT;
>>>>> +	uint32_t cd_tunneling_params = 0;
>>>>> +
>>>>> +	if (i40e_calc_context_desc(ol_flags) == 0)
>>>>> +		return 0;
>>>>> +
>>>>> +	if (ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK)
>>>>> +		i40e_parse_tunneling_params(ol_flags, *tx_offload, &cd_tunneling_params);
>>>>> +
>>>>> +	if (ol_flags & RTE_MBUF_F_TX_TCP_SEG) {
>>>>> +		cd_type_cmd_tso_mss |= i40e_set_tso_ctx(ol_flags, tx_pkt, *tx_offload);
>>>>> +	} else {
>>>>> +#ifdef RTE_LIBRTE_IEEE1588
>>>>> +		if (ol_flags & RTE_MBUF_F_TX_IEEE1588_TMST)
>>>>> +			cd_type_cmd_tso_mss |=
>>>>> +				((uint64_t)I40E_TX_CTX_DESC_TSYN << I40E_TXD_CTX_QW1_CMD_SHIFT);
>>>>> +#endif
>>>>
>>>> I couldn't find any places where we define this, it appears to be
>>>> some sort of legacy define, making this basically dead code?
>>>>
>>>
>>> It is legacy, and does need to be fixed, but across all of DPDK I
>>> think.  Testpmd, for example, has IEEE1588 ifdefs also.
>>>
>>> However, for this patch, it's probably harmless enough to remove the
>>> ifdef here and always allow this code path to execute.
>>>
>>> /Bruce
>>
>> Sure, but I would've preferred this to be a separate patch as it's
>> semantically different from what you're doing here. Perhaps it can be
>> fixed as one of the early patches in the series as "preparatory work" for
>> this one.
>>
> 
> I don't think it belongs in this series at all, actually. The IEEE1588
> define appears in multiple drivers, not just Intel ones, as well as testpmd
> (as I previously said). However, if you think it's worth patching it out
> just for i40e I can add the patch to do so to this set.
> 
> /Bruce

I don't have strong feelings about *when* to remove it, but if we are to 
remove it, IMO we should do it as a separate patch, not as a rolled-in 
change into this one. However, I don't think it's a big issue so if you 
think it's not worth the rework then that's fine.

-- 
Thanks,
Anatoly

^ permalink raw reply	[flat|nested] 274+ messages in thread

* Re: [PATCH v4 11/35] net/intel: create a common scalar Tx function
  2026-02-10 18:03       ` Bruce Richardson
@ 2026-02-11  9:26         ` Burakov, Anatoly
  0 siblings, 0 replies; 274+ messages in thread
From: Burakov, Anatoly @ 2026-02-11  9:26 UTC (permalink / raw)
  To: Bruce Richardson; +Cc: dev

On 2/10/2026 7:03 PM, Bruce Richardson wrote:
> On Tue, Feb 10, 2026 at 02:14:04PM +0100, Burakov, Anatoly wrote:
>> On 2/9/2026 5:45 PM, Bruce Richardson wrote:
>>> Given the similarities between the transmit functions across various
>>> Intel drivers, make a start on consolidating them by moving the ice Tx
>>> function into common, for reuse by other drivers.
>>>
>>> Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
>>> ---
>>
>>> +	if (ts_fns != NULL)
>>> +		ts_id = ts_fns->get_ts_tail(txq);
>>> +
>>> +	/* Check if the descriptor ring needs to be cleaned. */
>>> +	if (txq->nb_tx_free < txq->tx_free_thresh)
>>> +		(void)ci_tx_xmit_cleanup(txq);
>>
>> Why (void) ?
>>
> 
> Not sure, it seems superfluous, but I think it may help with some static
> analysis perhaps? I've seen some tools warn you if you generally check a
> return value from a function but fail to do so in one place. The other
> places where the xmit_cleanup function is called here the return value is
> checked each time, making this the outlier and so an explicit "void"
> doesn't hurt.
> 
> If you feel it should be removed, I can do so though....

If you need to ignore the value perhaps do RTE_SET_USED()? I mean it's 
usually used for variables, so perhaps (void) is fine, in which case you 
can disregard this.

-- 
Thanks,
Anatoly

^ permalink raw reply	[flat|nested] 274+ messages in thread

* Re: [PATCH v4 14/35] net/intel: support configurable VLAN tag insertion on Tx
  2026-02-10 18:20       ` Bruce Richardson
@ 2026-02-11  9:29         ` Burakov, Anatoly
  2026-02-11 14:19           ` Bruce Richardson
  0 siblings, 1 reply; 274+ messages in thread
From: Burakov, Anatoly @ 2026-02-11  9:29 UTC (permalink / raw)
  To: Bruce Richardson; +Cc: dev

On 2/10/2026 7:20 PM, Bruce Richardson wrote:
> On Tue, Feb 10, 2026 at 02:21:49PM +0100, Burakov, Anatoly wrote:
>> On 2/9/2026 5:45 PM, Bruce Richardson wrote:
>>> Make the VLAN tag insertion logic configurable in the common code, as to
>>> where inner/outer tags get placed.
>>>
>>> Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
>>> ---
>>
>> Hi Bruce,
>> I might be missing something obvious, but...
>>
>>> -		/* Descriptor based VLAN insertion */
>>> -		if (ol_flags & (RTE_MBUF_F_TX_VLAN | RTE_MBUF_F_TX_QINQ)) {
>>> +		/* Descriptor based VLAN/QinQ insertion */
>>> +		/* for single vlan offload, only insert in data desc with VLAN_IN_L2TAG1 is set
>>> +		 * for qinq offload, we always put inner tag in L2Tag1
>>> +		 */
>>> +		if (((ol_flags & RTE_MBUF_F_TX_VLAN) && l2tag1_field == CI_VLAN_IN_L2TAG1) ||
>>> +				(ol_flags & RTE_MBUF_F_TX_QINQ)) {
>>>    			td_cmd |= CI_TX_DESC_CMD_IL2TAG1;
>>>    			td_tag = tx_pkt->vlan_tci;
>>>    		}
>>
>> I can see that we insert VLAN tag on TX_VLAN and VLAN_IN_TAG1. But then...
>>
>>
>>> @@ -1004,8 +1004,7 @@ get_context_desc(uint64_t ol_flags, const struct rte_mbuf *tx_pkt,
>>>    	/* TX context descriptor based double VLAN insert */
>>>    	if (ol_flags & RTE_MBUF_F_TX_QINQ) {
>>>    		cd_l2tag2 = tx_pkt->vlan_tci_outer;
>>> -		cd_type_cmd_tso_mss |=
>>> -				((uint64_t)I40E_TX_CTX_DESC_IL2TAG2 << I40E_TXD_CTX_QW1_CMD_SHIFT);
>>> +		cd_type_cmd_tso_mss |= (I40E_TX_CTX_DESC_IL2TAG2 << I40E_TXD_CTX_QW1_CMD_SHIFT);
>>
>> this logic is only triggered for QinQ. Meaning, there's nowhere we insert
>> VLAN tag on TX_VLAN and VLAN_IN_TAG2?
>>
> For VLAN_IN_L2TAG2 the vlan tag goes in the context descriptor, and
> creating the context descriptor is the responsibility of each individual
> driver. The two drivers using this common path, i40e and ice both put the
> tag in the TAG1 slot, so don't have a path to add it to the TAG2 slot. We
> make it configurable in this patch so that in the next patches we can add
> in support for the iavf driver - which does support putting the VLAN tag in
> the Tag2 slot, in its context descriptor function.
> 
> /Bruce

Perhaps we need to document that more explicitly then, and double-check 
that all driver paths actually do this in all required cases (i.e. not 
just for QinQ but also for regular VLAN insertion case).

-- 
Thanks,
Anatoly

^ permalink raw reply	[flat|nested] 274+ messages in thread

* Re: [PATCH v4 06/35] net/ice: refactor context descriptor handling
  2026-02-11  9:17         ` Burakov, Anatoly
@ 2026-02-11 10:38           ` Bruce Richardson
  0 siblings, 0 replies; 274+ messages in thread
From: Bruce Richardson @ 2026-02-11 10:38 UTC (permalink / raw)
  To: Burakov, Anatoly; +Cc: dev

On Wed, Feb 11, 2026 at 10:17:34AM +0100, Burakov, Anatoly wrote:
> On 2/10/2026 6:40 PM, Bruce Richardson wrote:
> > On Tue, Feb 10, 2026 at 01:42:17PM +0100, Burakov, Anatoly wrote:
> > > On 2/9/2026 5:45 PM, Bruce Richardson wrote:
> > > > Create a single function to manage all context descriptor handling,
> > > > which returns either 0 or 1 depending on whether a descriptor is needed
> > > > or not, as well as returning directly the descriptor contents if
> > > > relevant.
> > > > 
> > > > Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> > > > ---
> > > 
> > > <snip>
> > > 
> > > > +static __rte_always_inline uint16_t
> > > > +get_context_desc(uint64_t ol_flags, const struct rte_mbuf *tx_pkt,
> > > > +	const union ci_tx_offload *tx_offload, const struct ci_tx_queue *txq,
> > > > +	uint64_t *qw0, uint64_t *qw1)
> > > > +{
> > > > +	uint16_t cd_l2tag2 = 0;
> > > > +	uint64_t cd_type_cmd_tso_mss = ICE_TX_DESC_DTYPE_CTX;
> > > > +	uint32_t cd_tunneling_params = 0;
> > > > +	uint64_t ptp_tx_index = txq->ice_vsi->adapter->ptp_tx_index;
> > > > +
> > > > +	if (ice_calc_context_desc(ol_flags) == 0)
> > > > +		return 0;
> > > > +
> > > > +	if (ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK)
> > > > +		ice_parse_tunneling_params(ol_flags, *tx_offload, &cd_tunneling_params);
> > > > +
> > > > +	if (ol_flags & (RTE_MBUF_F_TX_TCP_SEG | RTE_MBUF_F_TX_UDP_SEG))
> > > > +		cd_type_cmd_tso_mss |= ice_set_tso_ctx(ol_flags, tx_pkt, *tx_offload);
> > > > +	else if (ol_flags & RTE_MBUF_F_TX_IEEE1588_TMST)
> > > > +		cd_type_cmd_tso_mss |=
> > > > +			((uint64_t)CI_TX_CTX_DESC_TSYN << CI_TXD_QW1_CMD_S) |
> > > > +			((ptp_tx_index << ICE_TXD_CTX_QW1_TSYN_S) & ICE_TXD_CTX_QW1_TSYN_M);
> > > 
> > > It's tangentially related to this commit but it caught my attention that TSO
> > > and timestamping are mutually exclusive here. They *are* mutually exclusive
> > > as far as the driver is concerned so that part is fine, but I couldn't find
> > > any signs of us enforcing this limitation anywhere in our configuration
> > > path, so a well behaved application could theoretically arrive at this
> > > combination of mbuf flags without breaking anything.
> > > 
> > > (if I understand things correctly, this applies to both ice and i40e)
> > > 
> > Yes, you are correct here. However, I'm not sure if we can or should
> > enforce this, as it is completely possible to have a queue where some
> > packets are sent with TSO and others are sent with the timesync flag on
> > them. There is no way for the actual Tx function to flag a bad packet.
> > Best we can possibly do is add a check to the pre-Tx packet prepare
> > function. WDYT?
> > 
> > /Bruce
> 
> Now that I think of it, I believe these features do not make logical sense
> together anyway (TSO means segmented packets while timestamping means you
> have one packet you timestamp) so perhaps this can be considered user error?
> I mean we could add a check if it doesn't hurt performance, but maybe this
> isn't a problem we need to solve.
> 
I wouldn't put a datapath check for this. Maybe one in tx_prepare just, but
even then I'm not sure of the cost/benefit.

/Bruce

^ permalink raw reply	[flat|nested] 274+ messages in thread

* Re: [PATCH v4 07/35] net/i40e: refactor context descriptor handling
  2026-02-11  9:20             ` Burakov, Anatoly
@ 2026-02-11 12:04               ` Bruce Richardson
  0 siblings, 0 replies; 274+ messages in thread
From: Bruce Richardson @ 2026-02-11 12:04 UTC (permalink / raw)
  To: Burakov, Anatoly; +Cc: dev

On Wed, Feb 11, 2026 at 10:20:40AM +0100, Burakov, Anatoly wrote:
> On 2/10/2026 6:54 PM, Bruce Richardson wrote:
> > On Tue, Feb 10, 2026 at 03:19:27PM +0100, Burakov, Anatoly wrote:
> > > On 2/10/2026 3:10 PM, Bruce Richardson wrote:
> > > > On Tue, Feb 10, 2026 at 01:48:20PM +0100, Burakov, Anatoly wrote:
> > > > > On 2/9/2026 5:45 PM, Bruce Richardson wrote:
> > > > > > Move all context descriptor handling to a single function, as with
> > > > > > the ice driver, and use the same function signature as that driver.
> > > > > > 
> > > > > > Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> > > > > > ---
> > > > > 
> > > > > <snip>
> > > > > 
> > > > > > +static __rte_always_inline uint16_t
> > > > > > +get_context_desc(uint64_t ol_flags, const struct rte_mbuf *tx_pkt,
> > > > > > +		 const union ci_tx_offload *tx_offload,
> > > > > > +		 const struct ci_tx_queue *txq __rte_unused,
> > > > > > +		 uint64_t *qw0, uint64_t *qw1)
> > > > > > +{
> > > > > > +	uint16_t cd_l2tag2 = 0;
> > > > > > +	uint64_t cd_type_cmd_tso_mss = I40E_TX_DESC_DTYPE_CONTEXT;
> > > > > > +	uint32_t cd_tunneling_params = 0;
> > > > > > +
> > > > > > +	if (i40e_calc_context_desc(ol_flags) == 0)
> > > > > > +		return 0;
> > > > > > +
> > > > > > +	if (ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK)
> > > > > > +		i40e_parse_tunneling_params(ol_flags, *tx_offload, &cd_tunneling_params);
> > > > > > +
> > > > > > +	if (ol_flags & RTE_MBUF_F_TX_TCP_SEG) {
> > > > > > +		cd_type_cmd_tso_mss |= i40e_set_tso_ctx(ol_flags, tx_pkt, *tx_offload);
> > > > > > +	} else {
> > > > > > +#ifdef RTE_LIBRTE_IEEE1588
> > > > > > +		if (ol_flags & RTE_MBUF_F_TX_IEEE1588_TMST)
> > > > > > +			cd_type_cmd_tso_mss |=
> > > > > > +				((uint64_t)I40E_TX_CTX_DESC_TSYN << I40E_TXD_CTX_QW1_CMD_SHIFT);
> > > > > > +#endif
> > > > > 
> > > > > I couldn't find any places where we define this, it appears to be
> > > > > some sort of legacy define, making this basically dead code?
> > > > > 
> > > > 
> > > > It is legacy, and does need to be fixed, but across all of DPDK I
> > > > think.  Testpmd, for example, has IEEE1588 ifdefs also.
> > > > 
> > > > However, for this patch, it's probably harmless enough to remove the
> > > > ifdef here and always allow this code path to execute.
> > > > 
> > > > /Bruce
> > > 
> > > Sure, but I would've preferred this to be a separate patch as it's
> > > semantically different from what you're doing here. Perhaps it can be
> > > fixed as one of the early patches in the series as "preparatory work" for
> > > this one.
> > > 
> > 
> > I don't think it belongs in this series at all, actually. The IEEE1588
> > define appears in multiple drivers, not just Intel ones, as well as testpmd
> > (as I previously said). However, if you think it's worth patching it out
> > just for i40e I can add the patch to do so to this set.
> > 
> > /Bruce
> 
> I don't have strong feelings about *when* to remove it, but if we are to
> remove it, IMO we should do it as a separate patch, not as a rolled-in
> change into this one. However, I don't think it's a big issue so if you
> think it's not worth the rework then that's fine.
> 
Ack.

^ permalink raw reply	[flat|nested] 274+ messages in thread

* Re: [PATCH v4 14/35] net/intel: support configurable VLAN tag insertion on Tx
  2026-02-11  9:29         ` Burakov, Anatoly
@ 2026-02-11 14:19           ` Bruce Richardson
  0 siblings, 0 replies; 274+ messages in thread
From: Bruce Richardson @ 2026-02-11 14:19 UTC (permalink / raw)
  To: Burakov, Anatoly; +Cc: dev

On Wed, Feb 11, 2026 at 10:29:17AM +0100, Burakov, Anatoly wrote:
> On 2/10/2026 7:20 PM, Bruce Richardson wrote:
> > On Tue, Feb 10, 2026 at 02:21:49PM +0100, Burakov, Anatoly wrote:
> > > On 2/9/2026 5:45 PM, Bruce Richardson wrote:
> > > > Make the VLAN tag insertion logic configurable in the common code, as to
> > > > where inner/outer tags get placed.
> > > > 
> > > > Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> > > > ---
> > > 
> > > Hi Bruce,
> > > I might be missing something obvious, but...
> > > 
> > > > -		/* Descriptor based VLAN insertion */
> > > > -		if (ol_flags & (RTE_MBUF_F_TX_VLAN | RTE_MBUF_F_TX_QINQ)) {
> > > > +		/* Descriptor based VLAN/QinQ insertion */
> > > > +		/* for single vlan offload, only insert in data desc with VLAN_IN_L2TAG1 is set
> > > > +		 * for qinq offload, we always put inner tag in L2Tag1
> > > > +		 */
> > > > +		if (((ol_flags & RTE_MBUF_F_TX_VLAN) && l2tag1_field == CI_VLAN_IN_L2TAG1) ||
> > > > +				(ol_flags & RTE_MBUF_F_TX_QINQ)) {
> > > >    			td_cmd |= CI_TX_DESC_CMD_IL2TAG1;
> > > >    			td_tag = tx_pkt->vlan_tci;
> > > >    		}
> > > 
> > > I can see that we insert VLAN tag on TX_VLAN and VLAN_IN_TAG1. But then...
> > > 
> > > 
> > > > @@ -1004,8 +1004,7 @@ get_context_desc(uint64_t ol_flags, const struct rte_mbuf *tx_pkt,
> > > >    	/* TX context descriptor based double VLAN insert */
> > > >    	if (ol_flags & RTE_MBUF_F_TX_QINQ) {
> > > >    		cd_l2tag2 = tx_pkt->vlan_tci_outer;
> > > > -		cd_type_cmd_tso_mss |=
> > > > -				((uint64_t)I40E_TX_CTX_DESC_IL2TAG2 << I40E_TXD_CTX_QW1_CMD_SHIFT);
> > > > +		cd_type_cmd_tso_mss |= (I40E_TX_CTX_DESC_IL2TAG2 << I40E_TXD_CTX_QW1_CMD_SHIFT);
> > > 
> > > this logic is only triggered for QinQ. Meaning, there's nowhere we insert
> > > VLAN tag on TX_VLAN and VLAN_IN_TAG2?
> > > 
> > For VLAN_IN_L2TAG2 the vlan tag goes in the context descriptor, and
> > creating the context descriptor is the responsibility of each individual
> > driver. The two drivers using this common path, i40e and ice, both put the
> > tag in the TAG1 slot, so they don't have a path to add it to the TAG2 slot.
> > We make it configurable in this patch so that in later patches we can add
> > support for the iavf driver - which does support putting the VLAN tag in
> > the Tag2 slot, in its context descriptor function.
> > 
> > /Bruce
> 
> Perhaps we need to document that more explicitly then, and double-check that
> all driver paths actually do this in all required cases (i.e. not just for
> QinQ but also for regular VLAN insertion case).
> 
There's already a comment at the top of the enum definition, but I've also
expanded out the documentation on the L2TAG2 value to clarify that drivers
need to set the value themselves rather than relying on the common Tx code.

/Bruce

^ permalink raw reply	[flat|nested] 274+ messages in thread

* Re: [PATCH v4 20/35] net/intel: write descriptors using non-volatile pointers
  2026-02-09 23:08     ` Morten Brørup
  2026-02-10  9:03       ` Bruce Richardson
@ 2026-02-11 14:44       ` Bruce Richardson
  1 sibling, 0 replies; 274+ messages in thread
From: Bruce Richardson @ 2026-02-11 14:44 UTC (permalink / raw)
  To: Morten Brørup; +Cc: dev

On Tue, Feb 10, 2026 at 12:08:44AM +0100, Morten Brørup wrote:
> > +static inline void
> > +write_txd(volatile void *txd, uint64_t qw0, uint64_t qw1)
> > +{
> > +	uint64_t *txd_qw = __rte_assume_aligned(RTE_CAST_PTR(void *, txd), 16);
> > +
> > +	txd_qw[0] = rte_cpu_to_le_64(qw0);
> > +	txd_qw[1] = rte_cpu_to_le_64(qw1);
> > +}
> 
> How about using __rte_aligned() instead, something like this (untested):
> 
> struct __rte_aligned(16) txd_t {
> 	uint64_t	qw0;
> 	uint64_t	qw1;
> };
> 
> *RTE_CAST_PTR(volatile struct txd_t *, txd) = {
> 	rte_cpu_to_le_64(qw0),
> 	rte_cpu_to_le_64(qw1)
> };
> 
This approach works fine, and allows me to drop the previous patch too.
Updating to use a struct type in next revision.

/Bruce

^ permalink raw reply	[flat|nested] 274+ messages in thread

* Re: [PATCH v4 20/35] net/intel: write descriptors using non-volatile pointers
  2026-02-10  9:28         ` Morten Brørup
@ 2026-02-11 14:44           ` Bruce Richardson
  0 siblings, 0 replies; 274+ messages in thread
From: Bruce Richardson @ 2026-02-11 14:44 UTC (permalink / raw)
  To: Morten Brørup; +Cc: dev

On Tue, Feb 10, 2026 at 10:28:10AM +0100, Morten Brørup wrote:
> > From: Bruce Richardson [mailto:bruce.richardson@intel.com]
> > Sent: Tuesday, 10 February 2026 10.04
> > 
> > On Tue, Feb 10, 2026 at 12:08:44AM +0100, Morten Brørup wrote:
> > > > +static inline void
> > > > +write_txd(volatile void *txd, uint64_t qw0, uint64_t qw1)
> > > > +{
> > > > +	uint64_t *txd_qw = __rte_assume_aligned(RTE_CAST_PTR(void *, txd), 16);
> > > > +
> > > > +	txd_qw[0] = rte_cpu_to_le_64(qw0);
> > > > +	txd_qw[1] = rte_cpu_to_le_64(qw1);
> > > > +}
> > >
> > > How about using __rte_aligned() instead, something like this (untested):
> > >
> > > struct __rte_aligned(16) txd_t {
> > > 	uint64_t	qw0;
> > > 	uint64_t	qw1;
> > > };
> > 
> > I can see if this works for us...
> > 
> > >
> > > *RTE_CAST_PTR(volatile struct txd_t *, txd) = {
> > > 	rte_cpu_to_le_64(qw0),
> > > 	rte_cpu_to_le_64(qw1)
> > > };
> > >
> > >
> > > And why strip the "volatile"?
> > >
> > 
> > For the descriptor writes, the order in which the descriptors and the
> > descriptor fields are actually written doesn't matter, since the NIC
> > relies upon the tail pointer update - which includes a fence - to inform
> > it of when the descriptors are ready. The volatile is necessary for
> > reads, though, which is why the ring is marked as such, but for Tx it
> > prevents the compiler from opportunistically e.g. converting two 64-bit
> > writes into a 128-bit write.
> 
> Makes sense.
> Suggest that you spread out a few comments about this at the relevant locations in the source code.
> 
Adding an explanation as part of the write_txd function, which is where the
volatile gets cast away.

/Bruce

^ permalink raw reply	[flat|nested] 274+ messages in thread

* Re: [PATCH v4 19/35] eal: add macro for marking assumed alignment
  2026-02-09 22:35     ` Morten Brørup
@ 2026-02-11 14:45       ` Bruce Richardson
  0 siblings, 0 replies; 274+ messages in thread
From: Bruce Richardson @ 2026-02-11 14:45 UTC (permalink / raw)
  To: Morten Brørup; +Cc: dev

On Mon, Feb 09, 2026 at 11:35:26PM +0100, Morten Brørup wrote:
> > From: Bruce Richardson [mailto:bruce.richardson@intel.com]
> > Sent: Monday, 9 February 2026 17.45
> > 
> > Provide a common DPDK macro for the gcc/clang builtin
> > __rte_assume_aligned to mark pointers as pointing to something with
> > known minimum alignment.
> > 
> > Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> > ---
> >  lib/eal/include/rte_common.h | 6 ++++++
> >  1 file changed, 6 insertions(+)
> > 
> > diff --git a/lib/eal/include/rte_common.h
> > b/lib/eal/include/rte_common.h
> > index 573bf4f2ce..51a2eaf8b4 100644
> > --- a/lib/eal/include/rte_common.h
> > +++ b/lib/eal/include/rte_common.h
> > @@ -121,6 +121,12 @@ extern "C" {
> >  #define __rte_aligned(a) __attribute__((__aligned__(a)))
> >  #endif
> > 
> > +#ifdef RTE_TOOLCHAIN_MSVC
> > +#define __rte_assume_aligned(ptr, align) (ptr)
> > +#else
> > +#define __rte_assume_aligned __builtin_assume_aligned
> > +#endif
> 
> The GCC/Clang macro supports the optional 3rd parameter (offset), but the MSVC doesn't.
> Maybe it's better to pass (ptr, align) to the GCC/Clang variant, so the API consistently only supports two parameters.
> 
> If the 3rd parameter ever becomes needed, it can be implemented as a new macro.
> 
> Also, a short description of the macro would be nice.
> 
> 
> Did you look into using e.g. __rte_assume((ptr % 16) == 0) instead?
> It's relevant if it has the desired effect for MSVC, which the macro in this patch doesn't.
>
Based on your feedback on the next patch, I'm dropping this patch from the
series, making this discussion moot. Thanks.

^ permalink raw reply	[flat|nested] 274+ messages in thread

* Re: [PATCH v4 22/35] net/intel: mark mid-burst ring cleanup as unlikely
  2026-02-10 14:13       ` Bruce Richardson
@ 2026-02-11 18:12         ` Bruce Richardson
  0 siblings, 0 replies; 274+ messages in thread
From: Bruce Richardson @ 2026-02-11 18:12 UTC (permalink / raw)
  To: Burakov, Anatoly; +Cc: dev

On Tue, Feb 10, 2026 at 02:13:49PM +0000, Bruce Richardson wrote:
> On Tue, Feb 10, 2026 at 02:36:56PM +0100, Burakov, Anatoly wrote:
> > On 2/9/2026 5:45 PM, Bruce Richardson wrote:
> > > It should rarely be the case that we need to cleanup the descriptor ring
> > > mid-burst, so mark as unlikely to help performance.
> > > 
> > > Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> > > ---
> > 
> > Does it measurably help performance? I'm by no means a performance person so
> > this isn't really my wheelhouse, but I remember enough patches removing
> > "unlikely() for performance reasons" to get suspicious!
> > 
> > -- 
> Yes, I tend to be suspicious too. I will reverify and, if not measurable,
> will drop this patch from the set.
> 
This patch doesn't help performance when tested at the end of the series.
Will drop from v5.

^ permalink raw reply	[flat|nested] 274+ messages in thread

* [PATCH v5 00/35] combine multiple Intel scalar Tx paths
  2025-12-19 17:25 [RFC PATCH 00/27] combine multiple Intel scalar Tx paths Bruce Richardson
                   ` (29 preceding siblings ...)
  2026-02-09 16:44 ` [PATCH v4 00/35] combine multiple Intel scalar Tx paths Bruce Richardson
@ 2026-02-11 18:12 ` Bruce Richardson
  2026-02-11 18:12   ` [PATCH v5 01/35] net/intel: create common Tx descriptor structure Bruce Richardson
                     ` (35 more replies)
  30 siblings, 36 replies; 274+ messages in thread
From: Bruce Richardson @ 2026-02-11 18:12 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson

The scalar Tx paths, with support for offloads and multiple mbufs
per packet, are almost identical across drivers ice, i40e, iavf and
the single-queue mode of idpf. Therefore, we can do some rework to
combine these code paths into a single function which is parameterized
by compile-time constants, reducing duplicated code and giving us a
single path to optimize and maintain - apart from edge cases like IPsec
support in iavf.

The ixgbe driver has a number of similarities too, which we take
advantage of where we can, but the overall descriptor format is
sufficiently different that its main scalar code path is kept
separate.

Once merged, we can optimize the drivers to improve performance, and
also easily extend some drivers to use additional paths, e.g. adding
the "simple scalar" path to the IDPF driver for better performance on
platforms without AVX.

V5:
- more updates following review including:
  * dropped patch 19 for new EAL macro, and used struct alignment instead
  * added extra comments for some code, e.g. reason to remove volatile
  * dropped patch 22 marking a branch as unlikely
  * split bugfix off patch 24
  * corrected idpf path selection logic to not assume in-order queue setup

V4:
- Updates following review:
  - merged patches 3 and 5
  - renamed new file to tx_scalar.h
  - added UDP_TSO flag to drivers
  - other minor fixups

V3:
- rebase on top of latest next-net-intel tree
- fix issues with iavf and cpfl drivers seen in some testing

V2:
 - reworked the simple-scalar path as well as full scalar one
 - added simple scalar path support to idpf driver
 - small cleanups, e.g. issues flagged by checkpatch


Bruce Richardson (36):
  net/intel: create common Tx descriptor structure
  net/intel: fix memory leak on TX queue setup failure
  net/intel: use common Tx ring structure
  net/intel: create common post-Tx cleanup function
  net/intel: consolidate definitions for Tx desc fields
  net/intel: add common fn to calculate needed descriptors
  net/ice: refactor context descriptor handling
  net/i40e: refactor context descriptor handling
  net/idpf: refactor context descriptor handling
  net/intel: consolidate checksum mask definition
  net/intel: create common checksum Tx offload function
  net/intel: create a common scalar Tx function
  net/i40e: use common scalar Tx function
  net/intel: add IPsec hooks to common Tx function
  net/intel: support configurable VLAN tag insertion on Tx
  net/iavf: use common scalar Tx function
  net/i40e: document requirement for QinQ support
  net/idpf: use common scalar Tx function
  net/intel: avoid writing the final pkt descriptor twice
  net/intel: write descriptors using non-volatile pointers
  net/intel: remove unnecessary flag clearing
  net/intel: mark mid-burst ring cleanup as unlikely
  net/intel: add special handling for single desc packets
  net/intel: use separate array for desc status tracking
  net/ixgbe: use separate array for desc status tracking
  net/intel: drop unused Tx queue used count
  net/intel: remove index for tracking end of packet
  net/intel: merge ring writes in simple Tx for ice and i40e
  net/intel: consolidate ice and i40e buffer free function
  net/intel: complete merging simple Tx paths
  net/intel: use non-volatile stores in simple Tx function
  net/intel: align scalar simple Tx path with vector logic
  net/intel: use vector SW ring entry for simple path
  net/intel: use vector mbuf cleanup from simple scalar path
  net/idpf: enable simple Tx function
  net/cpfl: enable simple Tx function

 doc/guides/nics/i40e.rst                      |  18 +
 drivers/net/intel/common/tx.h                 | 124 ++-
 drivers/net/intel/common/tx_scalar.h          | 589 ++++++++++++++
 drivers/net/intel/cpfl/cpfl_rxtx.c            |  64 +-
 drivers/net/intel/i40e/i40e_fdir.c            |  34 +-
 drivers/net/intel/i40e/i40e_rxtx.c            | 673 +++-------------
 drivers/net/intel/i40e/i40e_rxtx.h            |  17 +-
 .../net/intel/i40e/i40e_rxtx_vec_altivec.c    |  25 +-
 drivers/net/intel/i40e/i40e_rxtx_vec_avx2.c   |  36 +-
 drivers/net/intel/i40e/i40e_rxtx_vec_avx512.c |  52 +-
 drivers/net/intel/i40e/i40e_rxtx_vec_common.h |   6 +-
 drivers/net/intel/i40e/i40e_rxtx_vec_neon.c   |  25 +-
 drivers/net/intel/iavf/iavf_rxtx.c            | 637 ++++-----------
 drivers/net/intel/iavf/iavf_rxtx.h            |  31 +-
 drivers/net/intel/iavf/iavf_rxtx_vec_avx2.c   |  55 +-
 drivers/net/intel/iavf/iavf_rxtx_vec_avx512.c | 104 +--
 drivers/net/intel/iavf/iavf_rxtx_vec_common.h |  36 +-
 drivers/net/intel/ice/ice_dcf_ethdev.c        |  10 +-
 drivers/net/intel/ice/ice_rxtx.c              | 740 ++++--------------
 drivers/net/intel/ice/ice_rxtx.h              |  16 +-
 drivers/net/intel/ice/ice_rxtx_vec_avx2.c     |  55 +-
 drivers/net/intel/ice/ice_rxtx_vec_avx512.c   |  53 +-
 drivers/net/intel/ice/ice_rxtx_vec_common.h   |  43 +-
 drivers/net/intel/idpf/idpf_common_device.h   |   1 +
 drivers/net/intel/idpf/idpf_common_rxtx.c     | 314 ++------
 drivers/net/intel/idpf/idpf_common_rxtx.h     |  24 +-
 .../net/intel/idpf/idpf_common_rxtx_avx2.c    |  53 +-
 .../net/intel/idpf/idpf_common_rxtx_avx512.c  |  54 +-
 drivers/net/intel/idpf/idpf_rxtx.c            |  55 +-
 drivers/net/intel/idpf/idpf_rxtx_vec_common.h |   6 +-
 drivers/net/intel/ixgbe/ixgbe_rxtx.c          | 103 ++-
 .../net/intel/ixgbe/ixgbe_rxtx_vec_common.c   |   3 +-
 32 files changed, 1612 insertions(+), 2444 deletions(-)
 create mode 100644 drivers/net/intel/common/tx_scalar.h

--
2.51.0


^ permalink raw reply	[flat|nested] 274+ messages in thread

* [PATCH v5 01/35] net/intel: create common Tx descriptor structure
  2026-02-11 18:12 ` [PATCH v5 00/35] combine multiple Intel scalar Tx paths Bruce Richardson
@ 2026-02-11 18:12   ` Bruce Richardson
  2026-02-11 18:12   ` [PATCH v5 02/35] net/intel: fix memory leak on TX queue setup failure Bruce Richardson
                     ` (34 subsequent siblings)
  35 siblings, 0 replies; 274+ messages in thread
From: Bruce Richardson @ 2026-02-11 18:12 UTC (permalink / raw)
  To: dev
  Cc: Bruce Richardson, Ciara Loftus, Praveen Shetty,
	Vladimir Medvedkin, Anatoly Burakov, Jingjing Wu

The Tx descriptors used by the i40e, iavf, ice and idpf drivers are all
identical 16-byte descriptors, so define a common struct for them. Since
original struct definitions are in base code, leave them in place, but
only use the new structs in DPDK code.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Ciara Loftus <ciara.loftus@intel.com>
---
 drivers/net/intel/common/tx.h                 | 16 ++++++---
 drivers/net/intel/cpfl/cpfl_rxtx.c            |  2 +-
 drivers/net/intel/i40e/i40e_fdir.c            |  4 +--
 drivers/net/intel/i40e/i40e_rxtx.c            | 26 +++++++-------
 .../net/intel/i40e/i40e_rxtx_vec_altivec.c    |  6 ++--
 drivers/net/intel/i40e/i40e_rxtx_vec_avx2.c   |  6 ++--
 drivers/net/intel/i40e/i40e_rxtx_vec_avx512.c |  6 ++--
 drivers/net/intel/i40e/i40e_rxtx_vec_neon.c   |  6 ++--
 drivers/net/intel/iavf/iavf_rxtx.c            | 16 ++++-----
 drivers/net/intel/iavf/iavf_rxtx.h            |  2 +-
 drivers/net/intel/iavf/iavf_rxtx_vec_avx2.c   |  6 ++--
 drivers/net/intel/iavf/iavf_rxtx_vec_avx512.c | 12 +++----
 drivers/net/intel/ice/ice_dcf_ethdev.c        |  2 +-
 drivers/net/intel/ice/ice_rxtx.c              | 36 +++++++++----------
 drivers/net/intel/ice/ice_rxtx_vec_avx2.c     |  6 ++--
 drivers/net/intel/ice/ice_rxtx_vec_avx512.c   |  6 ++--
 drivers/net/intel/idpf/idpf_common_rxtx.c     | 20 +++++------
 drivers/net/intel/idpf/idpf_common_rxtx.h     |  2 +-
 .../net/intel/idpf/idpf_common_rxtx_avx2.c    |  8 ++---
 .../net/intel/idpf/idpf_common_rxtx_avx512.c  |  8 ++---
 drivers/net/intel/idpf/idpf_rxtx.c            |  2 +-
 drivers/net/intel/idpf/idpf_rxtx_vec_common.h |  2 +-
 22 files changed, 104 insertions(+), 96 deletions(-)

diff --git a/drivers/net/intel/common/tx.h b/drivers/net/intel/common/tx.h
index e295d83e3a..d7561a2bbb 100644
--- a/drivers/net/intel/common/tx.h
+++ b/drivers/net/intel/common/tx.h
@@ -10,6 +10,14 @@
 #include <rte_ethdev.h>
 #include <rte_vect.h>
 
+/*
+ * Structure of a 16-byte Tx descriptor common across i40e, ice, iavf and idpf drivers
+ */
+struct ci_tx_desc {
+	uint64_t buffer_addr; /* Address of descriptor's data buf */
+	uint64_t cmd_type_offset_bsz;
+};
+
 /* forward declaration of the common intel (ci) queue structure */
 struct ci_tx_queue;
 
@@ -33,10 +41,10 @@ typedef void (*ice_tx_release_mbufs_t)(struct ci_tx_queue *txq);
 
 struct ci_tx_queue {
 	union { /* TX ring virtual address */
-		volatile struct i40e_tx_desc *i40e_tx_ring;
-		volatile struct iavf_tx_desc *iavf_tx_ring;
-		volatile struct ice_tx_desc *ice_tx_ring;
-		volatile struct idpf_base_tx_desc *idpf_tx_ring;
+		volatile struct ci_tx_desc *i40e_tx_ring;
+		volatile struct ci_tx_desc *iavf_tx_ring;
+		volatile struct ci_tx_desc *ice_tx_ring;
+		volatile struct ci_tx_desc *idpf_tx_ring;
 		volatile union ixgbe_adv_tx_desc *ixgbe_tx_ring;
 	};
 	volatile uint8_t *qtx_tail;               /* register address of tail */
diff --git a/drivers/net/intel/cpfl/cpfl_rxtx.c b/drivers/net/intel/cpfl/cpfl_rxtx.c
index d0438b5da0..78bc3e9b49 100644
--- a/drivers/net/intel/cpfl/cpfl_rxtx.c
+++ b/drivers/net/intel/cpfl/cpfl_rxtx.c
@@ -131,7 +131,7 @@ cpfl_dma_zone_reserve(struct rte_eth_dev *dev, uint16_t queue_idx,
 			ring_size = RTE_ALIGN(len * sizeof(struct idpf_flex_tx_sched_desc),
 					      CPFL_DMA_MEM_ALIGN);
 		else
-			ring_size = RTE_ALIGN(len * sizeof(struct idpf_base_tx_desc),
+			ring_size = RTE_ALIGN(len * sizeof(struct ci_tx_desc),
 					      CPFL_DMA_MEM_ALIGN);
 		memcpy(ring_name, "cpfl Tx ring", sizeof("cpfl Tx ring"));
 		break;
diff --git a/drivers/net/intel/i40e/i40e_fdir.c b/drivers/net/intel/i40e/i40e_fdir.c
index 55d18c5d4a..605df73c9e 100644
--- a/drivers/net/intel/i40e/i40e_fdir.c
+++ b/drivers/net/intel/i40e/i40e_fdir.c
@@ -1377,7 +1377,7 @@ i40e_find_available_buffer(struct rte_eth_dev *dev)
 	 */
 	if (fdir_info->txq_available_buf_count <= 0) {
 		uint16_t tmp_tail;
-		volatile struct i40e_tx_desc *tmp_txdp;
+		volatile struct ci_tx_desc *tmp_txdp;
 
 		tmp_tail = txq->tx_tail;
 		tmp_txdp = &txq->i40e_tx_ring[tmp_tail + 1];
@@ -1628,7 +1628,7 @@ i40e_flow_fdir_filter_programming(struct i40e_pf *pf,
 	struct ci_tx_queue *txq = pf->fdir.txq;
 	struct ci_rx_queue *rxq = pf->fdir.rxq;
 	const struct i40e_fdir_action *fdir_action = &filter->action;
-	volatile struct i40e_tx_desc *txdp;
+	volatile struct ci_tx_desc *txdp;
 	volatile struct i40e_filter_program_desc *fdirdp;
 	uint32_t td_cmd;
 	uint16_t vsi_id;
diff --git a/drivers/net/intel/i40e/i40e_rxtx.c b/drivers/net/intel/i40e/i40e_rxtx.c
index 1c3586778c..92d49ccb79 100644
--- a/drivers/net/intel/i40e/i40e_rxtx.c
+++ b/drivers/net/intel/i40e/i40e_rxtx.c
@@ -388,7 +388,7 @@ static inline int
 i40e_xmit_cleanup(struct ci_tx_queue *txq)
 {
 	struct ci_tx_entry *sw_ring = txq->sw_ring;
-	volatile struct i40e_tx_desc *txd = txq->i40e_tx_ring;
+	volatile struct ci_tx_desc *txd = txq->i40e_tx_ring;
 	uint16_t last_desc_cleaned = txq->last_desc_cleaned;
 	uint16_t nb_tx_desc = txq->nb_tx_desc;
 	uint16_t desc_to_clean_to;
@@ -1092,8 +1092,8 @@ i40e_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 	struct ci_tx_queue *txq;
 	struct ci_tx_entry *sw_ring;
 	struct ci_tx_entry *txe, *txn;
-	volatile struct i40e_tx_desc *txd;
-	volatile struct i40e_tx_desc *txr;
+	volatile struct ci_tx_desc *txd;
+	volatile struct ci_tx_desc *txr;
 	struct rte_mbuf *tx_pkt;
 	struct rte_mbuf *m_seg;
 	uint32_t cd_tunneling_params;
@@ -1398,7 +1398,7 @@ i40e_tx_free_bufs(struct ci_tx_queue *txq)
 
 /* Populate 4 descriptors with data from 4 mbufs */
 static inline void
-tx4(volatile struct i40e_tx_desc *txdp, struct rte_mbuf **pkts)
+tx4(volatile struct ci_tx_desc *txdp, struct rte_mbuf **pkts)
 {
 	uint64_t dma_addr;
 	uint32_t i;
@@ -1414,7 +1414,7 @@ tx4(volatile struct i40e_tx_desc *txdp, struct rte_mbuf **pkts)
 
 /* Populate 1 descriptor with data from 1 mbuf */
 static inline void
-tx1(volatile struct i40e_tx_desc *txdp, struct rte_mbuf **pkts)
+tx1(volatile struct ci_tx_desc *txdp, struct rte_mbuf **pkts)
 {
 	uint64_t dma_addr;
 
@@ -1431,7 +1431,7 @@ i40e_tx_fill_hw_ring(struct ci_tx_queue *txq,
 		     struct rte_mbuf **pkts,
 		     uint16_t nb_pkts)
 {
-	volatile struct i40e_tx_desc *txdp = &txq->i40e_tx_ring[txq->tx_tail];
+	volatile struct ci_tx_desc *txdp = &txq->i40e_tx_ring[txq->tx_tail];
 	struct ci_tx_entry *txep = &txq->sw_ring[txq->tx_tail];
 	const int N_PER_LOOP = 4;
 	const int N_PER_LOOP_MASK = N_PER_LOOP - 1;
@@ -1459,7 +1459,7 @@ tx_xmit_pkts(struct ci_tx_queue *txq,
 	     struct rte_mbuf **tx_pkts,
 	     uint16_t nb_pkts)
 {
-	volatile struct i40e_tx_desc *txr = txq->i40e_tx_ring;
+	volatile struct ci_tx_desc *txr = txq->i40e_tx_ring;
 	uint16_t n = 0;
 
 	/**
@@ -2616,7 +2616,7 @@ i40e_dev_tx_queue_setup(struct rte_eth_dev *dev,
 	}
 
 	/* Allocate TX hardware ring descriptors. */
-	ring_size = sizeof(struct i40e_tx_desc) * I40E_MAX_RING_DESC;
+	ring_size = sizeof(struct ci_tx_desc) * I40E_MAX_RING_DESC;
 	ring_size = RTE_ALIGN(ring_size, I40E_DMA_MEM_ALIGN);
 	tz = rte_eth_dma_zone_reserve(dev, "i40e_tx_ring", queue_idx,
 			      ring_size, I40E_RING_BASE_ALIGN, socket_id);
@@ -2640,7 +2640,7 @@ i40e_dev_tx_queue_setup(struct rte_eth_dev *dev,
 	txq->tx_deferred_start = tx_conf->tx_deferred_start;
 
 	txq->tx_ring_dma = tz->iova;
-	txq->i40e_tx_ring = (struct i40e_tx_desc *)tz->addr;
+	txq->i40e_tx_ring = (struct ci_tx_desc *)tz->addr;
 
 	/* Allocate software ring */
 	txq->sw_ring =
@@ -2913,13 +2913,13 @@ i40e_reset_tx_queue(struct ci_tx_queue *txq)
 	}
 
 	txe = txq->sw_ring;
-	size = sizeof(struct i40e_tx_desc) * txq->nb_tx_desc;
+	size = sizeof(struct ci_tx_desc) * txq->nb_tx_desc;
 	for (i = 0; i < size; i++)
 		((volatile char *)txq->i40e_tx_ring)[i] = 0;
 
 	prev = (uint16_t)(txq->nb_tx_desc - 1);
 	for (i = 0; i < txq->nb_tx_desc; i++) {
-		volatile struct i40e_tx_desc *txd = &txq->i40e_tx_ring[i];
+		volatile struct ci_tx_desc *txd = &txq->i40e_tx_ring[i];
 
 		txd->cmd_type_offset_bsz =
 			rte_cpu_to_le_64(I40E_TX_DESC_DTYPE_DESC_DONE);
@@ -3221,7 +3221,7 @@ i40e_fdir_setup_tx_resources(struct i40e_pf *pf)
 	}
 
 	/* Allocate TX hardware ring descriptors. */
-	ring_size = sizeof(struct i40e_tx_desc) * I40E_FDIR_NUM_TX_DESC;
+	ring_size = sizeof(struct ci_tx_desc) * I40E_FDIR_NUM_TX_DESC;
 	ring_size = RTE_ALIGN(ring_size, I40E_DMA_MEM_ALIGN);
 
 	tz = rte_eth_dma_zone_reserve(dev, "fdir_tx_ring",
@@ -3240,7 +3240,7 @@ i40e_fdir_setup_tx_resources(struct i40e_pf *pf)
 	txq->i40e_vsi = pf->fdir.fdir_vsi;
 
 	txq->tx_ring_dma = tz->iova;
-	txq->i40e_tx_ring = (struct i40e_tx_desc *)tz->addr;
+	txq->i40e_tx_ring = (struct ci_tx_desc *)tz->addr;
 
 	/*
 	 * don't need to allocate software ring and reset for the fdir
diff --git a/drivers/net/intel/i40e/i40e_rxtx_vec_altivec.c b/drivers/net/intel/i40e/i40e_rxtx_vec_altivec.c
index bbb6d907cf..ef5b252898 100644
--- a/drivers/net/intel/i40e/i40e_rxtx_vec_altivec.c
+++ b/drivers/net/intel/i40e/i40e_rxtx_vec_altivec.c
@@ -446,7 +446,7 @@ i40e_recv_scattered_pkts_vec(void *rx_queue, struct rte_mbuf **rx_pkts,
 }
 
 static inline void
-vtx1(volatile struct i40e_tx_desc *txdp,
+vtx1(volatile struct ci_tx_desc *txdp,
 	struct rte_mbuf *pkt, uint64_t flags)
 {
 	uint64_t high_qw = (I40E_TX_DESC_DTYPE_DATA |
@@ -459,7 +459,7 @@ vtx1(volatile struct i40e_tx_desc *txdp,
 }
 
 static inline void
-vtx(volatile struct i40e_tx_desc *txdp,
+vtx(volatile struct ci_tx_desc *txdp,
 	struct rte_mbuf **pkt, uint16_t nb_pkts,  uint64_t flags)
 {
 	int i;
@@ -473,7 +473,7 @@ i40e_xmit_fixed_burst_vec(void *tx_queue, struct rte_mbuf **tx_pkts,
 			  uint16_t nb_pkts)
 {
 	struct ci_tx_queue *txq = (struct ci_tx_queue *)tx_queue;
-	volatile struct i40e_tx_desc *txdp;
+	volatile struct ci_tx_desc *txdp;
 	struct ci_tx_entry_vec *txep;
 	uint16_t n, nb_commit, tx_id;
 	uint64_t flags = I40E_TD_CMD;
diff --git a/drivers/net/intel/i40e/i40e_rxtx_vec_avx2.c b/drivers/net/intel/i40e/i40e_rxtx_vec_avx2.c
index 4e398b3140..137c1f9765 100644
--- a/drivers/net/intel/i40e/i40e_rxtx_vec_avx2.c
+++ b/drivers/net/intel/i40e/i40e_rxtx_vec_avx2.c
@@ -681,7 +681,7 @@ i40e_recv_scattered_pkts_vec_avx2(void *rx_queue, struct rte_mbuf **rx_pkts,
 
 
 static inline void
-vtx1(volatile struct i40e_tx_desc *txdp,
+vtx1(volatile struct ci_tx_desc *txdp,
 		struct rte_mbuf *pkt, uint64_t flags)
 {
 	uint64_t high_qw = (I40E_TX_DESC_DTYPE_DATA |
@@ -694,7 +694,7 @@ vtx1(volatile struct i40e_tx_desc *txdp,
 }
 
 static inline void
-vtx(volatile struct i40e_tx_desc *txdp,
+vtx(volatile struct ci_tx_desc *txdp,
 		struct rte_mbuf **pkt, uint16_t nb_pkts,  uint64_t flags)
 {
 	const uint64_t hi_qw_tmpl = (I40E_TX_DESC_DTYPE_DATA |
@@ -739,7 +739,7 @@ i40e_xmit_fixed_burst_vec_avx2(void *tx_queue, struct rte_mbuf **tx_pkts,
 			  uint16_t nb_pkts)
 {
 	struct ci_tx_queue *txq = (struct ci_tx_queue *)tx_queue;
-	volatile struct i40e_tx_desc *txdp;
+	volatile struct ci_tx_desc *txdp;
 	struct ci_tx_entry_vec *txep;
 	uint16_t n, nb_commit, tx_id;
 	uint64_t flags = I40E_TD_CMD;
diff --git a/drivers/net/intel/i40e/i40e_rxtx_vec_avx512.c b/drivers/net/intel/i40e/i40e_rxtx_vec_avx512.c
index 571987d27a..6971488750 100644
--- a/drivers/net/intel/i40e/i40e_rxtx_vec_avx512.c
+++ b/drivers/net/intel/i40e/i40e_rxtx_vec_avx512.c
@@ -750,7 +750,7 @@ i40e_recv_scattered_pkts_vec_avx512(void *rx_queue,
 }
 
 static inline void
-vtx1(volatile struct i40e_tx_desc *txdp, struct rte_mbuf *pkt, uint64_t flags)
+vtx1(volatile struct ci_tx_desc *txdp, struct rte_mbuf *pkt, uint64_t flags)
 {
 	uint64_t high_qw = (I40E_TX_DESC_DTYPE_DATA |
 		((uint64_t)flags  << I40E_TXD_QW1_CMD_SHIFT) |
@@ -762,7 +762,7 @@ vtx1(volatile struct i40e_tx_desc *txdp, struct rte_mbuf *pkt, uint64_t flags)
 }
 
 static inline void
-vtx(volatile struct i40e_tx_desc *txdp,
+vtx(volatile struct ci_tx_desc *txdp,
 	struct rte_mbuf **pkt, uint16_t nb_pkts,  uint64_t flags)
 {
 	const uint64_t hi_qw_tmpl = (I40E_TX_DESC_DTYPE_DATA |
@@ -807,7 +807,7 @@ i40e_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pkts,
 				 uint16_t nb_pkts)
 {
 	struct ci_tx_queue *txq = (struct ci_tx_queue *)tx_queue;
-	volatile struct i40e_tx_desc *txdp;
+	volatile struct ci_tx_desc *txdp;
 	struct ci_tx_entry_vec *txep;
 	uint16_t n, nb_commit, tx_id;
 	uint64_t flags = I40E_TD_CMD;
diff --git a/drivers/net/intel/i40e/i40e_rxtx_vec_neon.c b/drivers/net/intel/i40e/i40e_rxtx_vec_neon.c
index b5be0c1b59..6404b70c56 100644
--- a/drivers/net/intel/i40e/i40e_rxtx_vec_neon.c
+++ b/drivers/net/intel/i40e/i40e_rxtx_vec_neon.c
@@ -597,7 +597,7 @@ i40e_recv_scattered_pkts_vec(void *rx_queue, struct rte_mbuf **rx_pkts,
 }
 
 static inline void
-vtx1(volatile struct i40e_tx_desc *txdp,
+vtx1(volatile struct ci_tx_desc *txdp,
 		struct rte_mbuf *pkt, uint64_t flags)
 {
 	uint64_t high_qw = (I40E_TX_DESC_DTYPE_DATA |
@@ -609,7 +609,7 @@ vtx1(volatile struct i40e_tx_desc *txdp,
 }
 
 static inline void
-vtx(volatile struct i40e_tx_desc *txdp, struct rte_mbuf **pkt,
+vtx(volatile struct ci_tx_desc *txdp, struct rte_mbuf **pkt,
 		uint16_t nb_pkts,  uint64_t flags)
 {
 	int i;
@@ -623,7 +623,7 @@ i40e_xmit_fixed_burst_vec(void *__rte_restrict tx_queue,
 	struct rte_mbuf **__rte_restrict tx_pkts, uint16_t nb_pkts)
 {
 	struct ci_tx_queue *txq = (struct ci_tx_queue *)tx_queue;
-	volatile struct i40e_tx_desc *txdp;
+	volatile struct ci_tx_desc *txdp;
 	struct ci_tx_entry_vec *txep;
 	uint16_t n, nb_commit, tx_id;
 	uint64_t flags = I40E_TD_CMD;
diff --git a/drivers/net/intel/iavf/iavf_rxtx.c b/drivers/net/intel/iavf/iavf_rxtx.c
index 4b763627bc..e4421a9932 100644
--- a/drivers/net/intel/iavf/iavf_rxtx.c
+++ b/drivers/net/intel/iavf/iavf_rxtx.c
@@ -267,7 +267,7 @@ reset_tx_queue(struct ci_tx_queue *txq)
 	}
 
 	txe = txq->sw_ring;
-	size = sizeof(struct iavf_tx_desc) * txq->nb_tx_desc;
+	size = sizeof(struct ci_tx_desc) * txq->nb_tx_desc;
 	for (i = 0; i < size; i++)
 		((volatile char *)txq->iavf_tx_ring)[i] = 0;
 
@@ -827,7 +827,7 @@ iavf_dev_tx_queue_setup(struct rte_eth_dev *dev,
 	}
 
 	/* Allocate TX hardware ring descriptors. */
-	ring_size = sizeof(struct iavf_tx_desc) * IAVF_MAX_RING_DESC;
+	ring_size = sizeof(struct ci_tx_desc) * IAVF_MAX_RING_DESC;
 	ring_size = RTE_ALIGN(ring_size, IAVF_DMA_MEM_ALIGN);
 	mz = rte_eth_dma_zone_reserve(dev, "iavf_tx_ring", queue_idx,
 				      ring_size, IAVF_RING_BASE_ALIGN,
@@ -839,7 +839,7 @@ iavf_dev_tx_queue_setup(struct rte_eth_dev *dev,
 		return -ENOMEM;
 	}
 	txq->tx_ring_dma = mz->iova;
-	txq->iavf_tx_ring = (struct iavf_tx_desc *)mz->addr;
+	txq->iavf_tx_ring = (struct ci_tx_desc *)mz->addr;
 
 	txq->mz = mz;
 	reset_tx_queue(txq);
@@ -2333,7 +2333,7 @@ iavf_xmit_cleanup(struct ci_tx_queue *txq)
 	uint16_t desc_to_clean_to;
 	uint16_t nb_tx_to_clean;
 
-	volatile struct iavf_tx_desc *txd = txq->iavf_tx_ring;
+	volatile struct ci_tx_desc *txd = txq->iavf_tx_ring;
 
 	desc_to_clean_to = (uint16_t)(last_desc_cleaned + txq->tx_rs_thresh);
 	if (desc_to_clean_to >= nb_tx_desc)
@@ -2723,7 +2723,7 @@ iavf_calc_pkt_desc(struct rte_mbuf *tx_pkt)
 }
 
 static inline void
-iavf_fill_data_desc(volatile struct iavf_tx_desc *desc,
+iavf_fill_data_desc(volatile struct ci_tx_desc *desc,
 	uint64_t desc_template,	uint16_t buffsz,
 	uint64_t buffer_addr)
 {
@@ -2756,7 +2756,7 @@ uint16_t
 iavf_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 {
 	struct ci_tx_queue *txq = tx_queue;
-	volatile struct iavf_tx_desc *txr = txq->iavf_tx_ring;
+	volatile struct ci_tx_desc *txr = txq->iavf_tx_ring;
 	struct ci_tx_entry *txe_ring = txq->sw_ring;
 	struct ci_tx_entry *txe, *txn;
 	struct rte_mbuf *mb, *mb_seg;
@@ -2774,7 +2774,7 @@ iavf_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 	txe = &txe_ring[desc_idx];
 
 	for (idx = 0; idx < nb_pkts; idx++) {
-		volatile struct iavf_tx_desc *ddesc;
+		volatile struct ci_tx_desc *ddesc;
 		struct iavf_ipsec_crypto_pkt_metadata *ipsec_md;
 
 		uint16_t nb_desc_ctx, nb_desc_ipsec;
@@ -2895,7 +2895,7 @@ iavf_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 		mb_seg = mb;
 
 		do {
-			ddesc = (volatile struct iavf_tx_desc *)
+			ddesc = (volatile struct ci_tx_desc *)
 					&txr[desc_idx];
 
 			txn = &txe_ring[txe->next_id];
diff --git a/drivers/net/intel/iavf/iavf_rxtx.h b/drivers/net/intel/iavf/iavf_rxtx.h
index e1f78dcde0..dd6d884fc1 100644
--- a/drivers/net/intel/iavf/iavf_rxtx.h
+++ b/drivers/net/intel/iavf/iavf_rxtx.h
@@ -678,7 +678,7 @@ void iavf_dump_tx_descriptor(const struct ci_tx_queue *txq,
 			    const volatile void *desc, uint16_t tx_id)
 {
 	const char *name;
-	const volatile struct iavf_tx_desc *tx_desc = desc;
+	const volatile struct ci_tx_desc *tx_desc = desc;
 	enum iavf_tx_desc_dtype_value type;
 
 
diff --git a/drivers/net/intel/iavf/iavf_rxtx_vec_avx2.c b/drivers/net/intel/iavf/iavf_rxtx_vec_avx2.c
index e29958e0bc..5b62d51cf7 100644
--- a/drivers/net/intel/iavf/iavf_rxtx_vec_avx2.c
+++ b/drivers/net/intel/iavf/iavf_rxtx_vec_avx2.c
@@ -1630,7 +1630,7 @@ iavf_recv_scattered_pkts_vec_avx2_flex_rxd_offload(void *rx_queue,
 
 
 static __rte_always_inline void
-iavf_vtx1(volatile struct iavf_tx_desc *txdp,
+iavf_vtx1(volatile struct ci_tx_desc *txdp,
 	  struct rte_mbuf *pkt, uint64_t flags, bool offload, uint8_t vlan_flag)
 {
 	uint64_t high_qw =
@@ -1646,7 +1646,7 @@ iavf_vtx1(volatile struct iavf_tx_desc *txdp,
 }
 
 static __rte_always_inline void
-iavf_vtx(volatile struct iavf_tx_desc *txdp,
+iavf_vtx(volatile struct ci_tx_desc *txdp,
 	 struct rte_mbuf **pkt, uint16_t nb_pkts,  uint64_t flags, bool offload, uint8_t vlan_flag)
 {
 	const uint64_t hi_qw_tmpl = (IAVF_TX_DESC_DTYPE_DATA |
@@ -1713,7 +1713,7 @@ iavf_xmit_fixed_burst_vec_avx2(void *tx_queue, struct rte_mbuf **tx_pkts,
 			       uint16_t nb_pkts, bool offload)
 {
 	struct ci_tx_queue *txq = (struct ci_tx_queue *)tx_queue;
-	volatile struct iavf_tx_desc *txdp;
+	volatile struct ci_tx_desc *txdp;
 	struct ci_tx_entry_vec *txep;
 	uint16_t n, nb_commit, tx_id;
 	/* bit2 is reserved and must be set to 1 according to Spec */
diff --git a/drivers/net/intel/iavf/iavf_rxtx_vec_avx512.c b/drivers/net/intel/iavf/iavf_rxtx_vec_avx512.c
index 7c0907b7cf..d79d96c7b7 100644
--- a/drivers/net/intel/iavf/iavf_rxtx_vec_avx512.c
+++ b/drivers/net/intel/iavf/iavf_rxtx_vec_avx512.c
@@ -1840,7 +1840,7 @@ tx_backlog_entry_avx512(struct ci_tx_entry_vec *txep,
 }
 
 static __rte_always_inline void
-iavf_vtx1(volatile struct iavf_tx_desc *txdp,
+iavf_vtx1(volatile struct ci_tx_desc *txdp,
 	  struct rte_mbuf *pkt, uint64_t flags,
 	  bool offload, uint8_t vlan_flag)
 {
@@ -1859,7 +1859,7 @@ iavf_vtx1(volatile struct iavf_tx_desc *txdp,
 #define IAVF_TX_LEN_MASK 0xAA
 #define IAVF_TX_OFF_MASK 0x55
 static __rte_always_inline void
-iavf_vtx(volatile struct iavf_tx_desc *txdp,
+iavf_vtx(volatile struct ci_tx_desc *txdp,
 		struct rte_mbuf **pkt, uint16_t nb_pkts,  uint64_t flags,
 		bool offload, uint8_t vlan_flag)
 {
@@ -2068,7 +2068,7 @@ iavf_fill_ctx_desc_tunnelling_field(volatile uint64_t *qw0,
 }
 
 static __rte_always_inline void
-ctx_vtx1(volatile struct iavf_tx_desc *txdp, struct rte_mbuf *pkt,
+ctx_vtx1(volatile struct ci_tx_desc *txdp, struct rte_mbuf *pkt,
 		uint64_t flags, bool offload, uint8_t vlan_flag)
 {
 	uint64_t high_ctx_qw = IAVF_TX_DESC_DTYPE_CONTEXT;
@@ -2106,7 +2106,7 @@ ctx_vtx1(volatile struct iavf_tx_desc *txdp, struct rte_mbuf *pkt,
 }
 
 static __rte_always_inline void
-ctx_vtx(volatile struct iavf_tx_desc *txdp,
+ctx_vtx(volatile struct ci_tx_desc *txdp,
 		struct rte_mbuf **pkt, uint16_t nb_pkts,  uint64_t flags,
 		bool offload, uint8_t vlan_flag)
 {
@@ -2203,7 +2203,7 @@ iavf_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pkts,
 				 uint16_t nb_pkts, bool offload)
 {
 	struct ci_tx_queue *txq = (struct ci_tx_queue *)tx_queue;
-	volatile struct iavf_tx_desc *txdp;
+	volatile struct ci_tx_desc *txdp;
 	struct ci_tx_entry_vec *txep;
 	uint16_t n, nb_commit, tx_id;
 	/* bit2 is reserved and must be set to 1 according to Spec */
@@ -2271,7 +2271,7 @@ iavf_xmit_fixed_burst_vec_avx512_ctx(void *tx_queue, struct rte_mbuf **tx_pkts,
 				 uint16_t nb_pkts, bool offload)
 {
 	struct ci_tx_queue *txq = (struct ci_tx_queue *)tx_queue;
-	volatile struct iavf_tx_desc *txdp;
+	volatile struct ci_tx_desc *txdp;
 	struct ci_tx_entry_vec *txep;
 	uint16_t n, nb_commit, nb_mbuf, tx_id;
 	/* bit2 is reserved and must be set to 1 according to Spec */
diff --git a/drivers/net/intel/ice/ice_dcf_ethdev.c b/drivers/net/intel/ice/ice_dcf_ethdev.c
index 81da5a4656..ab1d499cef 100644
--- a/drivers/net/intel/ice/ice_dcf_ethdev.c
+++ b/drivers/net/intel/ice/ice_dcf_ethdev.c
@@ -399,7 +399,7 @@ reset_tx_queue(struct ci_tx_queue *txq)
 	}
 
 	txe = txq->sw_ring;
-	size = sizeof(struct ice_tx_desc) * txq->nb_tx_desc;
+	size = sizeof(struct ci_tx_desc) * txq->nb_tx_desc;
 	for (i = 0; i < size; i++)
 		((volatile char *)txq->ice_tx_ring)[i] = 0;
 
diff --git a/drivers/net/intel/ice/ice_rxtx.c b/drivers/net/intel/ice/ice_rxtx.c
index f3bc79423d..74b80e7df3 100644
--- a/drivers/net/intel/ice/ice_rxtx.c
+++ b/drivers/net/intel/ice/ice_rxtx.c
@@ -1115,13 +1115,13 @@ ice_reset_tx_queue(struct ci_tx_queue *txq)
 	}
 
 	txe = txq->sw_ring;
-	size = sizeof(struct ice_tx_desc) * txq->nb_tx_desc;
+	size = sizeof(struct ci_tx_desc) * txq->nb_tx_desc;
 	for (i = 0; i < size; i++)
 		((volatile char *)txq->ice_tx_ring)[i] = 0;
 
 	prev = (uint16_t)(txq->nb_tx_desc - 1);
 	for (i = 0; i < txq->nb_tx_desc; i++) {
-		volatile struct ice_tx_desc *txd = &txq->ice_tx_ring[i];
+		volatile struct ci_tx_desc *txd = &txq->ice_tx_ring[i];
 
 		txd->cmd_type_offset_bsz =
 			rte_cpu_to_le_64(ICE_TX_DESC_DTYPE_DESC_DONE);
@@ -1623,7 +1623,7 @@ ice_tx_queue_setup(struct rte_eth_dev *dev,
 	}
 
 	/* Allocate TX hardware ring descriptors. */
-	ring_size = sizeof(struct ice_tx_desc) * ICE_MAX_NUM_DESC_BY_MAC(hw);
+	ring_size = sizeof(struct ci_tx_desc) * ICE_MAX_NUM_DESC_BY_MAC(hw);
 	ring_size = RTE_ALIGN(ring_size, ICE_DMA_MEM_ALIGN);
 	tz = rte_eth_dma_zone_reserve(dev, "ice_tx_ring", queue_idx,
 				      ring_size, ICE_RING_BASE_ALIGN,
@@ -2619,7 +2619,7 @@ ice_fdir_setup_tx_resources(struct ice_pf *pf)
 	}
 
 	/* Allocate TX hardware ring descriptors. */
-	ring_size = sizeof(struct ice_tx_desc) * ICE_FDIR_NUM_TX_DESC;
+	ring_size = sizeof(struct ci_tx_desc) * ICE_FDIR_NUM_TX_DESC;
 	ring_size = RTE_ALIGN(ring_size, ICE_DMA_MEM_ALIGN);
 
 	tz = rte_eth_dma_zone_reserve(dev, "fdir_tx_ring",
@@ -2638,7 +2638,7 @@ ice_fdir_setup_tx_resources(struct ice_pf *pf)
 	txq->ice_vsi = pf->fdir.fdir_vsi;
 
 	txq->tx_ring_dma = tz->iova;
-	txq->ice_tx_ring = (struct ice_tx_desc *)tz->addr;
+	txq->ice_tx_ring = (struct ci_tx_desc *)tz->addr;
 	/*
 	 * don't need to allocate software ring and reset for the fdir
 	 * program queue just set the queue has been configured.
@@ -3027,7 +3027,7 @@ static inline int
 ice_xmit_cleanup(struct ci_tx_queue *txq)
 {
 	struct ci_tx_entry *sw_ring = txq->sw_ring;
-	volatile struct ice_tx_desc *txd = txq->ice_tx_ring;
+	volatile struct ci_tx_desc *txd = txq->ice_tx_ring;
 	uint16_t last_desc_cleaned = txq->last_desc_cleaned;
 	uint16_t nb_tx_desc = txq->nb_tx_desc;
 	uint16_t desc_to_clean_to;
@@ -3148,8 +3148,8 @@ uint16_t
 ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 {
 	struct ci_tx_queue *txq;
-	volatile struct ice_tx_desc *ice_tx_ring;
-	volatile struct ice_tx_desc *txd;
+	volatile struct ci_tx_desc *ice_tx_ring;
+	volatile struct ci_tx_desc *txd;
 	struct ci_tx_entry *sw_ring;
 	struct ci_tx_entry *txe, *txn;
 	struct rte_mbuf *tx_pkt;
@@ -3312,7 +3312,7 @@ ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 
 			while ((ol_flags & (RTE_MBUF_F_TX_TCP_SEG | RTE_MBUF_F_TX_UDP_SEG)) &&
 				unlikely(slen > ICE_MAX_DATA_PER_TXD)) {
-				txd->buf_addr = rte_cpu_to_le_64(buf_dma_addr);
+				txd->buffer_addr = rte_cpu_to_le_64(buf_dma_addr);
 				txd->cmd_type_offset_bsz =
 				rte_cpu_to_le_64(ICE_TX_DESC_DTYPE_DATA |
 				((uint64_t)td_cmd << ICE_TXD_QW1_CMD_S) |
@@ -3331,7 +3331,7 @@ ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 				txn = &sw_ring[txe->next_id];
 			}
 
-			txd->buf_addr = rte_cpu_to_le_64(buf_dma_addr);
+			txd->buffer_addr = rte_cpu_to_le_64(buf_dma_addr);
 			txd->cmd_type_offset_bsz =
 				rte_cpu_to_le_64(ICE_TX_DESC_DTYPE_DATA |
 				((uint64_t)td_cmd << ICE_TXD_QW1_CMD_S) |
@@ -3563,14 +3563,14 @@ ice_tx_done_cleanup(void *txq, uint32_t free_cnt)
 
 /* Populate 4 descriptors with data from 4 mbufs */
 static inline void
-tx4(volatile struct ice_tx_desc *txdp, struct rte_mbuf **pkts)
+tx4(volatile struct ci_tx_desc *txdp, struct rte_mbuf **pkts)
 {
 	uint64_t dma_addr;
 	uint32_t i;
 
 	for (i = 0; i < 4; i++, txdp++, pkts++) {
 		dma_addr = rte_mbuf_data_iova(*pkts);
-		txdp->buf_addr = rte_cpu_to_le_64(dma_addr);
+		txdp->buffer_addr = rte_cpu_to_le_64(dma_addr);
 		txdp->cmd_type_offset_bsz =
 			ice_build_ctob((uint32_t)ICE_TD_CMD, 0,
 				       (*pkts)->data_len, 0);
@@ -3579,12 +3579,12 @@ tx4(volatile struct ice_tx_desc *txdp, struct rte_mbuf **pkts)
 
 /* Populate 1 descriptor with data from 1 mbuf */
 static inline void
-tx1(volatile struct ice_tx_desc *txdp, struct rte_mbuf **pkts)
+tx1(volatile struct ci_tx_desc *txdp, struct rte_mbuf **pkts)
 {
 	uint64_t dma_addr;
 
 	dma_addr = rte_mbuf_data_iova(*pkts);
-	txdp->buf_addr = rte_cpu_to_le_64(dma_addr);
+	txdp->buffer_addr = rte_cpu_to_le_64(dma_addr);
 	txdp->cmd_type_offset_bsz =
 		ice_build_ctob((uint32_t)ICE_TD_CMD, 0,
 			       (*pkts)->data_len, 0);
@@ -3594,7 +3594,7 @@ static inline void
 ice_tx_fill_hw_ring(struct ci_tx_queue *txq, struct rte_mbuf **pkts,
 		    uint16_t nb_pkts)
 {
-	volatile struct ice_tx_desc *txdp = &txq->ice_tx_ring[txq->tx_tail];
+	volatile struct ci_tx_desc *txdp = &txq->ice_tx_ring[txq->tx_tail];
 	struct ci_tx_entry *txep = &txq->sw_ring[txq->tx_tail];
 	const int N_PER_LOOP = 4;
 	const int N_PER_LOOP_MASK = N_PER_LOOP - 1;
@@ -3627,7 +3627,7 @@ tx_xmit_pkts(struct ci_tx_queue *txq,
 	     struct rte_mbuf **tx_pkts,
 	     uint16_t nb_pkts)
 {
-	volatile struct ice_tx_desc *txr = txq->ice_tx_ring;
+	volatile struct ci_tx_desc *txr = txq->ice_tx_ring;
 	uint16_t n = 0;
 
 	/**
@@ -4882,7 +4882,7 @@ ice_fdir_programming(struct ice_pf *pf, struct ice_fltr_desc *fdir_desc)
 	struct ci_tx_queue *txq = pf->fdir.txq;
 	struct ci_rx_queue *rxq = pf->fdir.rxq;
 	volatile struct ice_fltr_desc *fdirdp;
-	volatile struct ice_tx_desc *txdp;
+	volatile struct ci_tx_desc *txdp;
 	uint32_t td_cmd;
 	uint16_t i;
 
@@ -4892,7 +4892,7 @@ ice_fdir_programming(struct ice_pf *pf, struct ice_fltr_desc *fdir_desc)
 	fdirdp->dtype_cmd_vsi_fdid = fdir_desc->dtype_cmd_vsi_fdid;
 
 	txdp = &txq->ice_tx_ring[txq->tx_tail + 1];
-	txdp->buf_addr = rte_cpu_to_le_64(pf->fdir.dma_addr);
+	txdp->buffer_addr = rte_cpu_to_le_64(pf->fdir.dma_addr);
 	td_cmd = ICE_TX_DESC_CMD_EOP |
 		ICE_TX_DESC_CMD_RS  |
 		ICE_TX_DESC_CMD_DUMMY;
diff --git a/drivers/net/intel/ice/ice_rxtx_vec_avx2.c b/drivers/net/intel/ice/ice_rxtx_vec_avx2.c
index 0ba1d557ca..bef7bb00ba 100644
--- a/drivers/net/intel/ice/ice_rxtx_vec_avx2.c
+++ b/drivers/net/intel/ice/ice_rxtx_vec_avx2.c
@@ -774,7 +774,7 @@ ice_recv_scattered_pkts_vec_avx2_offload(void *rx_queue,
 }
 
 static __rte_always_inline void
-ice_vtx1(volatile struct ice_tx_desc *txdp,
+ice_vtx1(volatile struct ci_tx_desc *txdp,
 	 struct rte_mbuf *pkt, uint64_t flags, bool offload)
 {
 	uint64_t high_qw =
@@ -789,7 +789,7 @@ ice_vtx1(volatile struct ice_tx_desc *txdp,
 }
 
 static __rte_always_inline void
-ice_vtx(volatile struct ice_tx_desc *txdp,
+ice_vtx(volatile struct ci_tx_desc *txdp,
 	struct rte_mbuf **pkt, uint16_t nb_pkts,  uint64_t flags, bool offload)
 {
 	const uint64_t hi_qw_tmpl = (ICE_TX_DESC_DTYPE_DATA |
@@ -852,7 +852,7 @@ ice_xmit_fixed_burst_vec_avx2(void *tx_queue, struct rte_mbuf **tx_pkts,
 			      uint16_t nb_pkts, bool offload)
 {
 	struct ci_tx_queue *txq = (struct ci_tx_queue *)tx_queue;
-	volatile struct ice_tx_desc *txdp;
+	volatile struct ci_tx_desc *txdp;
 	struct ci_tx_entry_vec *txep;
 	uint16_t n, nb_commit, tx_id;
 	uint64_t flags = ICE_TD_CMD;
diff --git a/drivers/net/intel/ice/ice_rxtx_vec_avx512.c b/drivers/net/intel/ice/ice_rxtx_vec_avx512.c
index 7c6fe82072..1f6bf5fc8e 100644
--- a/drivers/net/intel/ice/ice_rxtx_vec_avx512.c
+++ b/drivers/net/intel/ice/ice_rxtx_vec_avx512.c
@@ -847,7 +847,7 @@ ice_recv_scattered_pkts_vec_avx512_offload(void *rx_queue,
 }
 
 static __rte_always_inline void
-ice_vtx1(volatile struct ice_tx_desc *txdp,
+ice_vtx1(volatile struct ci_tx_desc *txdp,
 	 struct rte_mbuf *pkt, uint64_t flags, bool do_offload)
 {
 	uint64_t high_qw =
@@ -863,7 +863,7 @@ ice_vtx1(volatile struct ice_tx_desc *txdp,
 }
 
 static __rte_always_inline void
-ice_vtx(volatile struct ice_tx_desc *txdp, struct rte_mbuf **pkt,
+ice_vtx(volatile struct ci_tx_desc *txdp, struct rte_mbuf **pkt,
 	uint16_t nb_pkts,  uint64_t flags, bool do_offload)
 {
 	const uint64_t hi_qw_tmpl = (ICE_TX_DESC_DTYPE_DATA |
@@ -916,7 +916,7 @@ ice_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pkts,
 				uint16_t nb_pkts, bool do_offload)
 {
 	struct ci_tx_queue *txq = (struct ci_tx_queue *)tx_queue;
-	volatile struct ice_tx_desc *txdp;
+	volatile struct ci_tx_desc *txdp;
 	struct ci_tx_entry_vec *txep;
 	uint16_t n, nb_commit, tx_id;
 	uint64_t flags = ICE_TD_CMD;
diff --git a/drivers/net/intel/idpf/idpf_common_rxtx.c b/drivers/net/intel/idpf/idpf_common_rxtx.c
index 797ee515dd..be3c1ef216 100644
--- a/drivers/net/intel/idpf/idpf_common_rxtx.c
+++ b/drivers/net/intel/idpf/idpf_common_rxtx.c
@@ -264,13 +264,13 @@ idpf_qc_single_tx_queue_reset(struct ci_tx_queue *txq)
 	}
 
 	txe = txq->sw_ring;
-	size = sizeof(struct idpf_base_tx_desc) * txq->nb_tx_desc;
+	size = sizeof(struct ci_tx_desc) * txq->nb_tx_desc;
 	for (i = 0; i < size; i++)
 		((volatile char *)txq->idpf_tx_ring)[i] = 0;
 
 	prev = (uint16_t)(txq->nb_tx_desc - 1);
 	for (i = 0; i < txq->nb_tx_desc; i++) {
-		txq->idpf_tx_ring[i].qw1 =
+		txq->idpf_tx_ring[i].cmd_type_offset_bsz =
 			rte_cpu_to_le_64(IDPF_TX_DESC_DTYPE_DESC_DONE);
 		txe[i].mbuf =  NULL;
 		txe[i].last_id = i;
@@ -1335,14 +1335,14 @@ idpf_xmit_cleanup(struct ci_tx_queue *txq)
 	uint16_t desc_to_clean_to;
 	uint16_t nb_tx_to_clean;
 
-	volatile struct idpf_base_tx_desc *txd = txq->idpf_tx_ring;
+	volatile struct ci_tx_desc *txd = txq->idpf_tx_ring;
 
 	desc_to_clean_to = (uint16_t)(last_desc_cleaned + txq->tx_rs_thresh);
 	if (desc_to_clean_to >= nb_tx_desc)
 		desc_to_clean_to = (uint16_t)(desc_to_clean_to - nb_tx_desc);
 
 	desc_to_clean_to = sw_ring[desc_to_clean_to].last_id;
-	if ((txd[desc_to_clean_to].qw1 &
+	if ((txd[desc_to_clean_to].cmd_type_offset_bsz &
 	     rte_cpu_to_le_64(IDPF_TXD_QW1_DTYPE_M)) !=
 	    rte_cpu_to_le_64(IDPF_TX_DESC_DTYPE_DESC_DONE)) {
 		TX_LOG(DEBUG, "TX descriptor %4u is not done "
@@ -1358,7 +1358,7 @@ idpf_xmit_cleanup(struct ci_tx_queue *txq)
 		nb_tx_to_clean = (uint16_t)(desc_to_clean_to -
 					    last_desc_cleaned);
 
-	txd[desc_to_clean_to].qw1 = 0;
+	txd[desc_to_clean_to].cmd_type_offset_bsz = 0;
 
 	txq->last_desc_cleaned = desc_to_clean_to;
 	txq->nb_tx_free = (uint16_t)(txq->nb_tx_free + nb_tx_to_clean);
@@ -1372,8 +1372,8 @@ uint16_t
 idpf_dp_singleq_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 			  uint16_t nb_pkts)
 {
-	volatile struct idpf_base_tx_desc *txd;
-	volatile struct idpf_base_tx_desc *txr;
+	volatile struct ci_tx_desc *txd;
+	volatile struct ci_tx_desc *txr;
 	union idpf_tx_offload tx_offload = {0};
 	struct ci_tx_entry *txe, *txn;
 	struct ci_tx_entry *sw_ring;
@@ -1491,8 +1491,8 @@ idpf_dp_singleq_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 			/* Setup TX Descriptor */
 			slen = m_seg->data_len;
 			buf_dma_addr = rte_mbuf_data_iova(m_seg);
-			txd->buf_addr = rte_cpu_to_le_64(buf_dma_addr);
-			txd->qw1 = rte_cpu_to_le_64(IDPF_TX_DESC_DTYPE_DATA |
+			txd->buffer_addr = rte_cpu_to_le_64(buf_dma_addr);
+			txd->cmd_type_offset_bsz = rte_cpu_to_le_64(IDPF_TX_DESC_DTYPE_DATA |
 				((uint64_t)td_cmd  << IDPF_TXD_QW1_CMD_S) |
 				((uint64_t)td_offset << IDPF_TXD_QW1_OFFSET_S) |
 				((uint64_t)slen << IDPF_TXD_QW1_TX_BUF_SZ_S));
@@ -1519,7 +1519,7 @@ idpf_dp_singleq_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 			txq->nb_tx_used = 0;
 		}
 
-		txd->qw1 |= rte_cpu_to_le_16(td_cmd << IDPF_TXD_QW1_CMD_S);
+		txd->cmd_type_offset_bsz |= rte_cpu_to_le_16(td_cmd << IDPF_TXD_QW1_CMD_S);
 	}
 
 end_of_tx:
diff --git a/drivers/net/intel/idpf/idpf_common_rxtx.h b/drivers/net/intel/idpf/idpf_common_rxtx.h
index 7c6ff5d047..2f2fa153b2 100644
--- a/drivers/net/intel/idpf/idpf_common_rxtx.h
+++ b/drivers/net/intel/idpf/idpf_common_rxtx.h
@@ -182,7 +182,7 @@ union idpf_tx_offload {
 };
 
 union idpf_tx_desc {
-	struct idpf_base_tx_desc *tx_ring;
+	struct ci_tx_desc *tx_ring;
 	struct idpf_flex_tx_sched_desc *desc_ring;
 	struct idpf_splitq_tx_compl_desc *compl_ring;
 };
diff --git a/drivers/net/intel/idpf/idpf_common_rxtx_avx2.c b/drivers/net/intel/idpf/idpf_common_rxtx_avx2.c
index 21c8f79254..5f5d538dcb 100644
--- a/drivers/net/intel/idpf/idpf_common_rxtx_avx2.c
+++ b/drivers/net/intel/idpf/idpf_common_rxtx_avx2.c
@@ -483,7 +483,7 @@ idpf_dp_singleq_recv_pkts_avx2(void *rx_queue, struct rte_mbuf **rx_pkts, uint16
 }
 
 static inline void
-idpf_singleq_vtx1(volatile struct idpf_base_tx_desc *txdp,
+idpf_singleq_vtx1(volatile struct ci_tx_desc *txdp,
 		  struct rte_mbuf *pkt, uint64_t flags)
 {
 	uint64_t high_qw =
@@ -497,7 +497,7 @@ idpf_singleq_vtx1(volatile struct idpf_base_tx_desc *txdp,
 }
 
 static inline void
-idpf_singleq_vtx(volatile struct idpf_base_tx_desc *txdp,
+idpf_singleq_vtx(volatile struct ci_tx_desc *txdp,
 		 struct rte_mbuf **pkt, uint16_t nb_pkts,  uint64_t flags)
 {
 	const uint64_t hi_qw_tmpl = (IDPF_TX_DESC_DTYPE_DATA |
@@ -556,7 +556,7 @@ idpf_singleq_xmit_fixed_burst_vec_avx2(void *tx_queue, struct rte_mbuf **tx_pkts
 				       uint16_t nb_pkts)
 {
 	struct ci_tx_queue *txq = (struct ci_tx_queue *)tx_queue;
-	volatile struct idpf_base_tx_desc *txdp;
+	volatile struct ci_tx_desc *txdp;
 	struct ci_tx_entry_vec *txep;
 	uint16_t n, nb_commit, tx_id;
 	uint64_t flags = IDPF_TX_DESC_CMD_EOP;
@@ -604,7 +604,7 @@ idpf_singleq_xmit_fixed_burst_vec_avx2(void *tx_queue, struct rte_mbuf **tx_pkts
 
 	tx_id = (uint16_t)(tx_id + nb_commit);
 	if (tx_id > txq->tx_next_rs) {
-		txq->idpf_tx_ring[txq->tx_next_rs].qw1 |=
+		txq->idpf_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
 			rte_cpu_to_le_64(((uint64_t)IDPF_TX_DESC_CMD_RS) <<
 					 IDPF_TXD_QW1_CMD_S);
 		txq->tx_next_rs =
diff --git a/drivers/net/intel/idpf/idpf_common_rxtx_avx512.c b/drivers/net/intel/idpf/idpf_common_rxtx_avx512.c
index bc2cadd738..c1ec3d1222 100644
--- a/drivers/net/intel/idpf/idpf_common_rxtx_avx512.c
+++ b/drivers/net/intel/idpf/idpf_common_rxtx_avx512.c
@@ -1000,7 +1000,7 @@ idpf_dp_splitq_recv_pkts_avx512(void *rx_queue, struct rte_mbuf **rx_pkts,
 }
 
 static __rte_always_inline void
-idpf_singleq_vtx1(volatile struct idpf_base_tx_desc *txdp,
+idpf_singleq_vtx1(volatile struct ci_tx_desc *txdp,
 	  struct rte_mbuf *pkt, uint64_t flags)
 {
 	uint64_t high_qw =
@@ -1016,7 +1016,7 @@ idpf_singleq_vtx1(volatile struct idpf_base_tx_desc *txdp,
 #define IDPF_TX_LEN_MASK 0xAA
 #define IDPF_TX_OFF_MASK 0x55
 static __rte_always_inline void
-idpf_singleq_vtx(volatile struct idpf_base_tx_desc *txdp,
+idpf_singleq_vtx(volatile struct ci_tx_desc *txdp,
 	 struct rte_mbuf **pkt, uint16_t nb_pkts,  uint64_t flags)
 {
 	const uint64_t hi_qw_tmpl = (IDPF_TX_DESC_DTYPE_DATA  |
@@ -1072,7 +1072,7 @@ idpf_singleq_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pk
 					 uint16_t nb_pkts)
 {
 	struct ci_tx_queue *txq = tx_queue;
-	volatile struct idpf_base_tx_desc *txdp;
+	volatile struct ci_tx_desc *txdp;
 	struct ci_tx_entry_vec *txep;
 	uint16_t n, nb_commit, tx_id;
 	uint64_t flags = IDPF_TX_DESC_CMD_EOP;
@@ -1123,7 +1123,7 @@ idpf_singleq_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pk
 
 	tx_id = (uint16_t)(tx_id + nb_commit);
 	if (tx_id > txq->tx_next_rs) {
-		txq->idpf_tx_ring[txq->tx_next_rs].qw1 |=
+		txq->idpf_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
 			rte_cpu_to_le_64(((uint64_t)IDPF_TX_DESC_CMD_RS) <<
 					 IDPF_TXD_QW1_CMD_S);
 		txq->tx_next_rs =
diff --git a/drivers/net/intel/idpf/idpf_rxtx.c b/drivers/net/intel/idpf/idpf_rxtx.c
index cee454244f..8aa44585fe 100644
--- a/drivers/net/intel/idpf/idpf_rxtx.c
+++ b/drivers/net/intel/idpf/idpf_rxtx.c
@@ -72,7 +72,7 @@ idpf_dma_zone_reserve(struct rte_eth_dev *dev, uint16_t queue_idx,
 			ring_size = RTE_ALIGN(len * sizeof(struct idpf_flex_tx_sched_desc),
 					      IDPF_DMA_MEM_ALIGN);
 		else
-			ring_size = RTE_ALIGN(len * sizeof(struct idpf_base_tx_desc),
+			ring_size = RTE_ALIGN(len * sizeof(struct ci_tx_desc),
 					      IDPF_DMA_MEM_ALIGN);
 		rte_memcpy(ring_name, "idpf Tx ring", sizeof("idpf Tx ring"));
 		break;
diff --git a/drivers/net/intel/idpf/idpf_rxtx_vec_common.h b/drivers/net/intel/idpf/idpf_rxtx_vec_common.h
index 425f0792a1..4702061484 100644
--- a/drivers/net/intel/idpf/idpf_rxtx_vec_common.h
+++ b/drivers/net/intel/idpf/idpf_rxtx_vec_common.h
@@ -31,7 +31,7 @@ idpf_tx_desc_done(struct ci_tx_queue *txq, uint16_t idx)
 	if (txq->complq != NULL)
 		return 1;
 
-	return (txq->idpf_tx_ring[idx].qw1 &
+	return (txq->idpf_tx_ring[idx].cmd_type_offset_bsz &
 			rte_cpu_to_le_64(IDPF_TXD_QW1_DTYPE_M)) ==
 				rte_cpu_to_le_64(IDPF_TX_DESC_DTYPE_DESC_DONE);
 }
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [PATCH v5 02/35] net/intel: fix memory leak on TX queue setup failure
  2026-02-11 18:12 ` [PATCH v5 00/35] combine multiple Intel scalar Tx paths Bruce Richardson
  2026-02-11 18:12   ` [PATCH v5 01/35] net/intel: create common Tx descriptor structure Bruce Richardson
@ 2026-02-11 18:12   ` Bruce Richardson
  2026-02-12 12:14     ` Burakov, Anatoly
  2026-02-11 18:12   ` [PATCH v5 03/35] net/intel: use common Tx ring structure Bruce Richardson
                     ` (33 subsequent siblings)
  35 siblings, 1 reply; 274+ messages in thread
From: Bruce Richardson @ 2026-02-11 18:12 UTC (permalink / raw)
  To: dev
  Cc: Bruce Richardson, stable, Praveen Shetty, Jingjing Wu,
	Mingxia Liu, Beilei Xing, Qi Zhang

When TX queue setup fails after the sw_ring has been allocated, i.e.
during completion queue setup, the allocated sw_ring memory is not
freed, causing a memory leak.

Add the missing rte_free() call to the error path of both the cpfl and
idpf drivers so that sw_ring is properly cleaned up before returning
from the function.

Fixes: 6c2d333cd418 ("net/cpfl: support Tx queue setup")
Fixes: c008a5e740bd ("common/idpf: add queue setup/release")
Cc: stable@dpdk.org

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/net/intel/cpfl/cpfl_rxtx.c | 1 +
 drivers/net/intel/idpf/idpf_rxtx.c | 1 +
 2 files changed, 2 insertions(+)

diff --git a/drivers/net/intel/cpfl/cpfl_rxtx.c b/drivers/net/intel/cpfl/cpfl_rxtx.c
index 78bc3e9b49..392a7fcc98 100644
--- a/drivers/net/intel/cpfl/cpfl_rxtx.c
+++ b/drivers/net/intel/cpfl/cpfl_rxtx.c
@@ -628,6 +628,7 @@ cpfl_tx_queue_setup(struct rte_eth_dev *dev, uint16_t queue_idx,
 	return 0;
 
 err_complq_setup:
+	rte_free(txq->sw_ring);
 err_sw_ring_alloc:
 	cpfl_dma_zone_release(mz);
 err_mz_reserve:
diff --git a/drivers/net/intel/idpf/idpf_rxtx.c b/drivers/net/intel/idpf/idpf_rxtx.c
index 8aa44585fe..9317c8b175 100644
--- a/drivers/net/intel/idpf/idpf_rxtx.c
+++ b/drivers/net/intel/idpf/idpf_rxtx.c
@@ -502,6 +502,7 @@ idpf_tx_queue_setup(struct rte_eth_dev *dev, uint16_t queue_idx,
 	return 0;
 
 err_complq_setup:
+	rte_free(txq->sw_ring);
 err_sw_ring_alloc:
 	idpf_dma_zone_release(mz);
 err_mz_reserve:
-- 
2.51.0



* [PATCH v5 03/35] net/intel: use common Tx ring structure
  2026-02-11 18:12 ` [PATCH v5 00/35] combine multiple Intel scalar Tx paths Bruce Richardson
  2026-02-11 18:12   ` [PATCH v5 01/35] net/intel: create common Tx descriptor structure Bruce Richardson
  2026-02-11 18:12   ` [PATCH v5 02/35] net/intel: fix memory leak on TX queue setup failure Bruce Richardson
@ 2026-02-11 18:12   ` Bruce Richardson
  2026-02-11 18:12   ` [PATCH v5 04/35] net/intel: create common post-Tx cleanup function Bruce Richardson
                     ` (32 subsequent siblings)
  35 siblings, 0 replies; 274+ messages in thread
From: Bruce Richardson @ 2026-02-11 18:12 UTC (permalink / raw)
  To: dev
  Cc: Bruce Richardson, Ciara Loftus, Praveen Shetty,
	Vladimir Medvedkin, Anatoly Burakov, Jingjing Wu

Rather than having separate per-driver ring pointers in a union, since
we now have a common descriptor type, we can merge all but the ixgbe
pointer into one value.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Ciara Loftus <ciara.loftus@intel.com>
---
 drivers/net/intel/common/tx.h                 |  5 +--
 drivers/net/intel/cpfl/cpfl_rxtx.c            |  2 +-
 drivers/net/intel/i40e/i40e_fdir.c            |  6 ++--
 drivers/net/intel/i40e/i40e_rxtx.c            | 22 ++++++------
 .../net/intel/i40e/i40e_rxtx_vec_altivec.c    |  6 ++--
 drivers/net/intel/i40e/i40e_rxtx_vec_avx2.c   |  6 ++--
 drivers/net/intel/i40e/i40e_rxtx_vec_avx512.c |  6 ++--
 drivers/net/intel/i40e/i40e_rxtx_vec_common.h |  2 +-
 drivers/net/intel/i40e/i40e_rxtx_vec_neon.c   |  6 ++--
 drivers/net/intel/iavf/iavf_rxtx.c            | 14 ++++----
 drivers/net/intel/iavf/iavf_rxtx_vec_avx2.c   |  6 ++--
 drivers/net/intel/iavf/iavf_rxtx_vec_avx512.c | 12 +++----
 drivers/net/intel/iavf/iavf_rxtx_vec_common.h |  2 +-
 drivers/net/intel/ice/ice_dcf_ethdev.c        |  4 +--
 drivers/net/intel/ice/ice_rxtx.c              | 34 +++++++++----------
 drivers/net/intel/ice/ice_rxtx_vec_avx2.c     |  6 ++--
 drivers/net/intel/ice/ice_rxtx_vec_avx512.c   |  6 ++--
 drivers/net/intel/ice/ice_rxtx_vec_common.h   |  2 +-
 drivers/net/intel/idpf/idpf_common_rxtx.c     |  8 ++---
 .../net/intel/idpf/idpf_common_rxtx_avx2.c    |  6 ++--
 .../net/intel/idpf/idpf_common_rxtx_avx512.c  |  6 ++--
 drivers/net/intel/idpf/idpf_rxtx.c            |  2 +-
 drivers/net/intel/idpf/idpf_rxtx_vec_common.h |  2 +-
 23 files changed, 84 insertions(+), 87 deletions(-)

diff --git a/drivers/net/intel/common/tx.h b/drivers/net/intel/common/tx.h
index d7561a2bbb..8cf63e59ab 100644
--- a/drivers/net/intel/common/tx.h
+++ b/drivers/net/intel/common/tx.h
@@ -41,10 +41,7 @@ typedef void (*ice_tx_release_mbufs_t)(struct ci_tx_queue *txq);
 
 struct ci_tx_queue {
 	union { /* TX ring virtual address */
-		volatile struct ci_tx_desc *i40e_tx_ring;
-		volatile struct ci_tx_desc *iavf_tx_ring;
-		volatile struct ci_tx_desc *ice_tx_ring;
-		volatile struct ci_tx_desc *idpf_tx_ring;
+		volatile struct ci_tx_desc *ci_tx_ring;
 		volatile union ixgbe_adv_tx_desc *ixgbe_tx_ring;
 	};
 	volatile uint8_t *qtx_tail;               /* register address of tail */
diff --git a/drivers/net/intel/cpfl/cpfl_rxtx.c b/drivers/net/intel/cpfl/cpfl_rxtx.c
index 392a7fcc98..a4d15b7f9c 100644
--- a/drivers/net/intel/cpfl/cpfl_rxtx.c
+++ b/drivers/net/intel/cpfl/cpfl_rxtx.c
@@ -606,7 +606,7 @@ cpfl_tx_queue_setup(struct rte_eth_dev *dev, uint16_t queue_idx,
 	}
 
 	if (!is_splitq) {
-		txq->idpf_tx_ring = mz->addr;
+		txq->ci_tx_ring = mz->addr;
 		idpf_qc_single_tx_queue_reset(txq);
 	} else {
 		txq->desc_ring = mz->addr;
diff --git a/drivers/net/intel/i40e/i40e_fdir.c b/drivers/net/intel/i40e/i40e_fdir.c
index 605df73c9e..8a01aec0e2 100644
--- a/drivers/net/intel/i40e/i40e_fdir.c
+++ b/drivers/net/intel/i40e/i40e_fdir.c
@@ -1380,7 +1380,7 @@ i40e_find_available_buffer(struct rte_eth_dev *dev)
 		volatile struct ci_tx_desc *tmp_txdp;
 
 		tmp_tail = txq->tx_tail;
-		tmp_txdp = &txq->i40e_tx_ring[tmp_tail + 1];
+		tmp_txdp = &txq->ci_tx_ring[tmp_tail + 1];
 
 		do {
 			if ((tmp_txdp->cmd_type_offset_bsz &
@@ -1637,7 +1637,7 @@ i40e_flow_fdir_filter_programming(struct i40e_pf *pf,
 
 	PMD_DRV_LOG(INFO, "filling filter programming descriptor.");
 	fdirdp = (volatile struct i40e_filter_program_desc *)
-				(&txq->i40e_tx_ring[txq->tx_tail]);
+				(&txq->ci_tx_ring[txq->tx_tail]);
 
 	fdirdp->qindex_flex_ptype_vsi =
 			rte_cpu_to_le_32((fdir_action->rx_queue <<
@@ -1707,7 +1707,7 @@ i40e_flow_fdir_filter_programming(struct i40e_pf *pf,
 	fdirdp->fd_id = rte_cpu_to_le_32(filter->soft_id);
 
 	PMD_DRV_LOG(INFO, "filling transmit descriptor.");
-	txdp = &txq->i40e_tx_ring[txq->tx_tail + 1];
+	txdp = &txq->ci_tx_ring[txq->tx_tail + 1];
 	txdp->buffer_addr = rte_cpu_to_le_64(pf->fdir.dma_addr[txq->tx_tail >> 1]);
 
 	td_cmd = I40E_TX_DESC_CMD_EOP |
diff --git a/drivers/net/intel/i40e/i40e_rxtx.c b/drivers/net/intel/i40e/i40e_rxtx.c
index 92d49ccb79..210fc0201e 100644
--- a/drivers/net/intel/i40e/i40e_rxtx.c
+++ b/drivers/net/intel/i40e/i40e_rxtx.c
@@ -388,7 +388,7 @@ static inline int
 i40e_xmit_cleanup(struct ci_tx_queue *txq)
 {
 	struct ci_tx_entry *sw_ring = txq->sw_ring;
-	volatile struct ci_tx_desc *txd = txq->i40e_tx_ring;
+	volatile struct ci_tx_desc *txd = txq->ci_tx_ring;
 	uint16_t last_desc_cleaned = txq->last_desc_cleaned;
 	uint16_t nb_tx_desc = txq->nb_tx_desc;
 	uint16_t desc_to_clean_to;
@@ -1112,7 +1112,7 @@ i40e_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 
 	txq = tx_queue;
 	sw_ring = txq->sw_ring;
-	txr = txq->i40e_tx_ring;
+	txr = txq->ci_tx_ring;
 	tx_id = txq->tx_tail;
 	txe = &sw_ring[tx_id];
 
@@ -1347,7 +1347,7 @@ i40e_tx_free_bufs(struct ci_tx_queue *txq)
 	const uint16_t k = RTE_ALIGN_FLOOR(tx_rs_thresh, I40E_TX_MAX_FREE_BUF_SZ);
 	const uint16_t m = tx_rs_thresh % I40E_TX_MAX_FREE_BUF_SZ;
 
-	if ((txq->i40e_tx_ring[txq->tx_next_dd].cmd_type_offset_bsz &
+	if ((txq->ci_tx_ring[txq->tx_next_dd].cmd_type_offset_bsz &
 			rte_cpu_to_le_64(I40E_TXD_QW1_DTYPE_MASK)) !=
 			rte_cpu_to_le_64(I40E_TX_DESC_DTYPE_DESC_DONE))
 		return 0;
@@ -1431,7 +1431,7 @@ i40e_tx_fill_hw_ring(struct ci_tx_queue *txq,
 		     struct rte_mbuf **pkts,
 		     uint16_t nb_pkts)
 {
-	volatile struct ci_tx_desc *txdp = &txq->i40e_tx_ring[txq->tx_tail];
+	volatile struct ci_tx_desc *txdp = &txq->ci_tx_ring[txq->tx_tail];
 	struct ci_tx_entry *txep = &txq->sw_ring[txq->tx_tail];
 	const int N_PER_LOOP = 4;
 	const int N_PER_LOOP_MASK = N_PER_LOOP - 1;
@@ -1459,7 +1459,7 @@ tx_xmit_pkts(struct ci_tx_queue *txq,
 	     struct rte_mbuf **tx_pkts,
 	     uint16_t nb_pkts)
 {
-	volatile struct ci_tx_desc *txr = txq->i40e_tx_ring;
+	volatile struct ci_tx_desc *txr = txq->ci_tx_ring;
 	uint16_t n = 0;
 
 	/**
@@ -2421,7 +2421,7 @@ i40e_dev_tx_descriptor_status(void *tx_queue, uint16_t offset)
 			desc -= txq->nb_tx_desc;
 	}
 
-	status = &txq->i40e_tx_ring[desc].cmd_type_offset_bsz;
+	status = &txq->ci_tx_ring[desc].cmd_type_offset_bsz;
 	mask = rte_le_to_cpu_64(I40E_TXD_QW1_DTYPE_MASK);
 	expect = rte_cpu_to_le_64(
 		I40E_TX_DESC_DTYPE_DESC_DONE << I40E_TXD_QW1_DTYPE_SHIFT);
@@ -2618,7 +2618,7 @@ i40e_dev_tx_queue_setup(struct rte_eth_dev *dev,
 	/* Allocate TX hardware ring descriptors. */
 	ring_size = sizeof(struct ci_tx_desc) * I40E_MAX_RING_DESC;
 	ring_size = RTE_ALIGN(ring_size, I40E_DMA_MEM_ALIGN);
-	tz = rte_eth_dma_zone_reserve(dev, "i40e_tx_ring", queue_idx,
+	tz = rte_eth_dma_zone_reserve(dev, "ci_tx_ring", queue_idx,
 			      ring_size, I40E_RING_BASE_ALIGN, socket_id);
 	if (!tz) {
 		i40e_tx_queue_release(txq);
@@ -2640,7 +2640,7 @@ i40e_dev_tx_queue_setup(struct rte_eth_dev *dev,
 	txq->tx_deferred_start = tx_conf->tx_deferred_start;
 
 	txq->tx_ring_dma = tz->iova;
-	txq->i40e_tx_ring = (struct ci_tx_desc *)tz->addr;
+	txq->ci_tx_ring = (struct ci_tx_desc *)tz->addr;
 
 	/* Allocate software ring */
 	txq->sw_ring =
@@ -2915,11 +2915,11 @@ i40e_reset_tx_queue(struct ci_tx_queue *txq)
 	txe = txq->sw_ring;
 	size = sizeof(struct ci_tx_desc) * txq->nb_tx_desc;
 	for (i = 0; i < size; i++)
-		((volatile char *)txq->i40e_tx_ring)[i] = 0;
+		((volatile char *)txq->ci_tx_ring)[i] = 0;
 
 	prev = (uint16_t)(txq->nb_tx_desc - 1);
 	for (i = 0; i < txq->nb_tx_desc; i++) {
-		volatile struct ci_tx_desc *txd = &txq->i40e_tx_ring[i];
+		volatile struct ci_tx_desc *txd = &txq->ci_tx_ring[i];
 
 		txd->cmd_type_offset_bsz =
 			rte_cpu_to_le_64(I40E_TX_DESC_DTYPE_DESC_DONE);
@@ -3240,7 +3240,7 @@ i40e_fdir_setup_tx_resources(struct i40e_pf *pf)
 	txq->i40e_vsi = pf->fdir.fdir_vsi;
 
 	txq->tx_ring_dma = tz->iova;
-	txq->i40e_tx_ring = (struct ci_tx_desc *)tz->addr;
+	txq->ci_tx_ring = (struct ci_tx_desc *)tz->addr;
 
 	/*
 	 * don't need to allocate software ring and reset for the fdir
diff --git a/drivers/net/intel/i40e/i40e_rxtx_vec_altivec.c b/drivers/net/intel/i40e/i40e_rxtx_vec_altivec.c
index ef5b252898..81e9e2bc0b 100644
--- a/drivers/net/intel/i40e/i40e_rxtx_vec_altivec.c
+++ b/drivers/net/intel/i40e/i40e_rxtx_vec_altivec.c
@@ -489,7 +489,7 @@ i40e_xmit_fixed_burst_vec(void *tx_queue, struct rte_mbuf **tx_pkts,
 		return 0;
 
 	tx_id = txq->tx_tail;
-	txdp = &txq->i40e_tx_ring[tx_id];
+	txdp = &txq->ci_tx_ring[tx_id];
 	txep = &txq->sw_ring_vec[tx_id];
 
 	txq->nb_tx_free = (uint16_t)(txq->nb_tx_free - nb_pkts);
@@ -509,7 +509,7 @@ i40e_xmit_fixed_burst_vec(void *tx_queue, struct rte_mbuf **tx_pkts,
 		txq->tx_next_rs = (uint16_t)(txq->tx_rs_thresh - 1);
 
 		/* avoid reach the end of ring */
-		txdp = &txq->i40e_tx_ring[tx_id];
+		txdp = &txq->ci_tx_ring[tx_id];
 		txep = &txq->sw_ring_vec[tx_id];
 	}
 
@@ -519,7 +519,7 @@ i40e_xmit_fixed_burst_vec(void *tx_queue, struct rte_mbuf **tx_pkts,
 
 	tx_id = (uint16_t)(tx_id + nb_commit);
 	if (tx_id > txq->tx_next_rs) {
-		txq->i40e_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
+		txq->ci_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
 			rte_cpu_to_le_64(((uint64_t)I40E_TX_DESC_CMD_RS) <<
 						I40E_TXD_QW1_CMD_SHIFT);
 		txq->tx_next_rs =
diff --git a/drivers/net/intel/i40e/i40e_rxtx_vec_avx2.c b/drivers/net/intel/i40e/i40e_rxtx_vec_avx2.c
index 137c1f9765..f054bd41bf 100644
--- a/drivers/net/intel/i40e/i40e_rxtx_vec_avx2.c
+++ b/drivers/net/intel/i40e/i40e_rxtx_vec_avx2.c
@@ -753,7 +753,7 @@ i40e_xmit_fixed_burst_vec_avx2(void *tx_queue, struct rte_mbuf **tx_pkts,
 		return 0;
 
 	tx_id = txq->tx_tail;
-	txdp = &txq->i40e_tx_ring[tx_id];
+	txdp = &txq->ci_tx_ring[tx_id];
 	txep = &txq->sw_ring_vec[tx_id];
 
 	txq->nb_tx_free = (uint16_t)(txq->nb_tx_free - nb_pkts);
@@ -774,7 +774,7 @@ i40e_xmit_fixed_burst_vec_avx2(void *tx_queue, struct rte_mbuf **tx_pkts,
 		txq->tx_next_rs = (uint16_t)(txq->tx_rs_thresh - 1);
 
 		/* avoid reach the end of ring */
-		txdp = &txq->i40e_tx_ring[tx_id];
+		txdp = &txq->ci_tx_ring[tx_id];
 		txep = &txq->sw_ring_vec[tx_id];
 	}
 
@@ -784,7 +784,7 @@ i40e_xmit_fixed_burst_vec_avx2(void *tx_queue, struct rte_mbuf **tx_pkts,
 
 	tx_id = (uint16_t)(tx_id + nb_commit);
 	if (tx_id > txq->tx_next_rs) {
-		txq->i40e_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
+		txq->ci_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
 			rte_cpu_to_le_64(((uint64_t)I40E_TX_DESC_CMD_RS) <<
 						I40E_TXD_QW1_CMD_SHIFT);
 		txq->tx_next_rs =
diff --git a/drivers/net/intel/i40e/i40e_rxtx_vec_avx512.c b/drivers/net/intel/i40e/i40e_rxtx_vec_avx512.c
index 6971488750..9a967faeee 100644
--- a/drivers/net/intel/i40e/i40e_rxtx_vec_avx512.c
+++ b/drivers/net/intel/i40e/i40e_rxtx_vec_avx512.c
@@ -821,7 +821,7 @@ i40e_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pkts,
 		return 0;
 
 	tx_id = txq->tx_tail;
-	txdp = &txq->i40e_tx_ring[tx_id];
+	txdp = &txq->ci_tx_ring[tx_id];
 	txep = (void *)txq->sw_ring;
 	txep += tx_id;
 
@@ -843,7 +843,7 @@ i40e_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pkts,
 		txq->tx_next_rs = (uint16_t)(txq->tx_rs_thresh - 1);
 
 		/* avoid reach the end of ring */
-		txdp = txq->i40e_tx_ring;
+		txdp = txq->ci_tx_ring;
 		txep = (void *)txq->sw_ring;
 	}
 
@@ -853,7 +853,7 @@ i40e_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pkts,
 
 	tx_id = (uint16_t)(tx_id + nb_commit);
 	if (tx_id > txq->tx_next_rs) {
-		txq->i40e_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
+		txq->ci_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
 			rte_cpu_to_le_64(((uint64_t)I40E_TX_DESC_CMD_RS) <<
 						I40E_TXD_QW1_CMD_SHIFT);
 		txq->tx_next_rs =
diff --git a/drivers/net/intel/i40e/i40e_rxtx_vec_common.h b/drivers/net/intel/i40e/i40e_rxtx_vec_common.h
index 14651f2f06..1fd7fc75bf 100644
--- a/drivers/net/intel/i40e/i40e_rxtx_vec_common.h
+++ b/drivers/net/intel/i40e/i40e_rxtx_vec_common.h
@@ -15,7 +15,7 @@
 static inline int
 i40e_tx_desc_done(struct ci_tx_queue *txq, uint16_t idx)
 {
-	return (txq->i40e_tx_ring[idx].cmd_type_offset_bsz &
+	return (txq->ci_tx_ring[idx].cmd_type_offset_bsz &
 			rte_cpu_to_le_64(I40E_TXD_QW1_DTYPE_MASK)) ==
 				rte_cpu_to_le_64(I40E_TX_DESC_DTYPE_DESC_DONE);
 }
diff --git a/drivers/net/intel/i40e/i40e_rxtx_vec_neon.c b/drivers/net/intel/i40e/i40e_rxtx_vec_neon.c
index 6404b70c56..0b95152232 100644
--- a/drivers/net/intel/i40e/i40e_rxtx_vec_neon.c
+++ b/drivers/net/intel/i40e/i40e_rxtx_vec_neon.c
@@ -638,7 +638,7 @@ i40e_xmit_fixed_burst_vec(void *__rte_restrict tx_queue,
 		return 0;
 
 	tx_id = txq->tx_tail;
-	txdp = &txq->i40e_tx_ring[tx_id];
+	txdp = &txq->ci_tx_ring[tx_id];
 	txep = &txq->sw_ring_vec[tx_id];
 
 	txq->nb_tx_free = (uint16_t)(txq->nb_tx_free - nb_pkts);
@@ -658,7 +658,7 @@ i40e_xmit_fixed_burst_vec(void *__rte_restrict tx_queue,
 		txq->tx_next_rs = (uint16_t)(txq->tx_rs_thresh - 1);
 
 		/* avoid reach the end of ring */
-		txdp = &txq->i40e_tx_ring[tx_id];
+		txdp = &txq->ci_tx_ring[tx_id];
 		txep = &txq->sw_ring_vec[tx_id];
 	}
 
@@ -668,7 +668,7 @@ i40e_xmit_fixed_burst_vec(void *__rte_restrict tx_queue,
 
 	tx_id = (uint16_t)(tx_id + nb_commit);
 	if (tx_id > txq->tx_next_rs) {
-		txq->i40e_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
+		txq->ci_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
 			rte_cpu_to_le_64(((uint64_t)I40E_TX_DESC_CMD_RS) <<
 						I40E_TXD_QW1_CMD_SHIFT);
 		txq->tx_next_rs =
diff --git a/drivers/net/intel/iavf/iavf_rxtx.c b/drivers/net/intel/iavf/iavf_rxtx.c
index e4421a9932..807bc92a45 100644
--- a/drivers/net/intel/iavf/iavf_rxtx.c
+++ b/drivers/net/intel/iavf/iavf_rxtx.c
@@ -269,11 +269,11 @@ reset_tx_queue(struct ci_tx_queue *txq)
 	txe = txq->sw_ring;
 	size = sizeof(struct ci_tx_desc) * txq->nb_tx_desc;
 	for (i = 0; i < size; i++)
-		((volatile char *)txq->iavf_tx_ring)[i] = 0;
+		((volatile char *)txq->ci_tx_ring)[i] = 0;
 
 	prev = (uint16_t)(txq->nb_tx_desc - 1);
 	for (i = 0; i < txq->nb_tx_desc; i++) {
-		txq->iavf_tx_ring[i].cmd_type_offset_bsz =
+		txq->ci_tx_ring[i].cmd_type_offset_bsz =
 			rte_cpu_to_le_64(IAVF_TX_DESC_DTYPE_DESC_DONE);
 		txe[i].mbuf =  NULL;
 		txe[i].last_id = i;
@@ -829,7 +829,7 @@ iavf_dev_tx_queue_setup(struct rte_eth_dev *dev,
 	/* Allocate TX hardware ring descriptors. */
 	ring_size = sizeof(struct ci_tx_desc) * IAVF_MAX_RING_DESC;
 	ring_size = RTE_ALIGN(ring_size, IAVF_DMA_MEM_ALIGN);
-	mz = rte_eth_dma_zone_reserve(dev, "iavf_tx_ring", queue_idx,
+	mz = rte_eth_dma_zone_reserve(dev, "ci_tx_ring", queue_idx,
 				      ring_size, IAVF_RING_BASE_ALIGN,
 				      socket_id);
 	if (!mz) {
@@ -839,7 +839,7 @@ iavf_dev_tx_queue_setup(struct rte_eth_dev *dev,
 		return -ENOMEM;
 	}
 	txq->tx_ring_dma = mz->iova;
-	txq->iavf_tx_ring = (struct ci_tx_desc *)mz->addr;
+	txq->ci_tx_ring = (struct ci_tx_desc *)mz->addr;
 
 	txq->mz = mz;
 	reset_tx_queue(txq);
@@ -2333,7 +2333,7 @@ iavf_xmit_cleanup(struct ci_tx_queue *txq)
 	uint16_t desc_to_clean_to;
 	uint16_t nb_tx_to_clean;
 
-	volatile struct ci_tx_desc *txd = txq->iavf_tx_ring;
+	volatile struct ci_tx_desc *txd = txq->ci_tx_ring;
 
 	desc_to_clean_to = (uint16_t)(last_desc_cleaned + txq->tx_rs_thresh);
 	if (desc_to_clean_to >= nb_tx_desc)
@@ -2756,7 +2756,7 @@ uint16_t
 iavf_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 {
 	struct ci_tx_queue *txq = tx_queue;
-	volatile struct ci_tx_desc *txr = txq->iavf_tx_ring;
+	volatile struct ci_tx_desc *txr = txq->ci_tx_ring;
 	struct ci_tx_entry *txe_ring = txq->sw_ring;
 	struct ci_tx_entry *txe, *txn;
 	struct rte_mbuf *mb, *mb_seg;
@@ -4462,7 +4462,7 @@ iavf_dev_tx_desc_status(void *tx_queue, uint16_t offset)
 			desc -= txq->nb_tx_desc;
 	}
 
-	status = &txq->iavf_tx_ring[desc].cmd_type_offset_bsz;
+	status = &txq->ci_tx_ring[desc].cmd_type_offset_bsz;
 	mask = rte_le_to_cpu_64(IAVF_TXD_QW1_DTYPE_MASK);
 	expect = rte_cpu_to_le_64(
 		 IAVF_TX_DESC_DTYPE_DESC_DONE << IAVF_TXD_QW1_DTYPE_SHIFT);
diff --git a/drivers/net/intel/iavf/iavf_rxtx_vec_avx2.c b/drivers/net/intel/iavf/iavf_rxtx_vec_avx2.c
index 5b62d51cf7..89ce841b9e 100644
--- a/drivers/net/intel/iavf/iavf_rxtx_vec_avx2.c
+++ b/drivers/net/intel/iavf/iavf_rxtx_vec_avx2.c
@@ -1729,7 +1729,7 @@ iavf_xmit_fixed_burst_vec_avx2(void *tx_queue, struct rte_mbuf **tx_pkts,
 	nb_commit = nb_pkts;
 
 	tx_id = txq->tx_tail;
-	txdp = &txq->iavf_tx_ring[tx_id];
+	txdp = &txq->ci_tx_ring[tx_id];
 	txep = &txq->sw_ring_vec[tx_id];
 
 	txq->nb_tx_free = (uint16_t)(txq->nb_tx_free - nb_pkts);
@@ -1750,7 +1750,7 @@ iavf_xmit_fixed_burst_vec_avx2(void *tx_queue, struct rte_mbuf **tx_pkts,
 		txq->tx_next_rs = (uint16_t)(txq->tx_rs_thresh - 1);
 
 		/* avoid reach the end of ring */
-		txdp = &txq->iavf_tx_ring[tx_id];
+		txdp = &txq->ci_tx_ring[tx_id];
 		txep = &txq->sw_ring_vec[tx_id];
 	}
 
@@ -1760,7 +1760,7 @@ iavf_xmit_fixed_burst_vec_avx2(void *tx_queue, struct rte_mbuf **tx_pkts,
 
 	tx_id = (uint16_t)(tx_id + nb_commit);
 	if (tx_id > txq->tx_next_rs) {
-		txq->iavf_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
+		txq->ci_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
 			rte_cpu_to_le_64(((uint64_t)IAVF_TX_DESC_CMD_RS) <<
 					 IAVF_TXD_QW1_CMD_SHIFT);
 		txq->tx_next_rs =
diff --git a/drivers/net/intel/iavf/iavf_rxtx_vec_avx512.c b/drivers/net/intel/iavf/iavf_rxtx_vec_avx512.c
index d79d96c7b7..ad1b0b90cd 100644
--- a/drivers/net/intel/iavf/iavf_rxtx_vec_avx512.c
+++ b/drivers/net/intel/iavf/iavf_rxtx_vec_avx512.c
@@ -2219,7 +2219,7 @@ iavf_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pkts,
 	nb_commit = nb_pkts;
 
 	tx_id = txq->tx_tail;
-	txdp = &txq->iavf_tx_ring[tx_id];
+	txdp = &txq->ci_tx_ring[tx_id];
 	txep = (void *)txq->sw_ring;
 	txep += tx_id;
 
@@ -2241,7 +2241,7 @@ iavf_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pkts,
 		txq->tx_next_rs = (uint16_t)(txq->tx_rs_thresh - 1);
 
 		/* avoid reach the end of ring */
-		txdp = &txq->iavf_tx_ring[tx_id];
+		txdp = &txq->ci_tx_ring[tx_id];
 		txep = (void *)txq->sw_ring;
 		txep += tx_id;
 	}
@@ -2252,7 +2252,7 @@ iavf_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pkts,
 
 	tx_id = (uint16_t)(tx_id + nb_commit);
 	if (tx_id > txq->tx_next_rs) {
-		txq->iavf_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
+		txq->ci_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
 			rte_cpu_to_le_64(((uint64_t)IAVF_TX_DESC_CMD_RS) <<
 					 IAVF_TXD_QW1_CMD_SHIFT);
 		txq->tx_next_rs =
@@ -2288,7 +2288,7 @@ iavf_xmit_fixed_burst_vec_avx512_ctx(void *tx_queue, struct rte_mbuf **tx_pkts,
 
 	nb_pkts = nb_commit >> 1;
 	tx_id = txq->tx_tail;
-	txdp = &txq->iavf_tx_ring[tx_id];
+	txdp = &txq->ci_tx_ring[tx_id];
 	txep = (void *)txq->sw_ring;
 	txep += (tx_id >> 1);
 
@@ -2309,7 +2309,7 @@ iavf_xmit_fixed_burst_vec_avx512_ctx(void *tx_queue, struct rte_mbuf **tx_pkts,
 		txq->tx_next_rs = (uint16_t)(txq->tx_rs_thresh - 1);
 		tx_id = 0;
 		/* avoid reach the end of ring */
-		txdp = txq->iavf_tx_ring;
+		txdp = txq->ci_tx_ring;
 		txep = (void *)txq->sw_ring;
 	}
 
@@ -2320,7 +2320,7 @@ iavf_xmit_fixed_burst_vec_avx512_ctx(void *tx_queue, struct rte_mbuf **tx_pkts,
 	tx_id = (uint16_t)(tx_id + nb_commit);
 
 	if (tx_id > txq->tx_next_rs) {
-		txq->iavf_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
+		txq->ci_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
 			rte_cpu_to_le_64(((uint64_t)IAVF_TX_DESC_CMD_RS) <<
 					 IAVF_TXD_QW1_CMD_SHIFT);
 		txq->tx_next_rs =
diff --git a/drivers/net/intel/iavf/iavf_rxtx_vec_common.h b/drivers/net/intel/iavf/iavf_rxtx_vec_common.h
index f1ea57034f..1832b76f89 100644
--- a/drivers/net/intel/iavf/iavf_rxtx_vec_common.h
+++ b/drivers/net/intel/iavf/iavf_rxtx_vec_common.h
@@ -14,7 +14,7 @@
 static inline int
 iavf_tx_desc_done(struct ci_tx_queue *txq, uint16_t idx)
 {
-	return (txq->iavf_tx_ring[idx].cmd_type_offset_bsz &
+	return (txq->ci_tx_ring[idx].cmd_type_offset_bsz &
 			rte_cpu_to_le_64(IAVF_TXD_QW1_DTYPE_MASK)) ==
 				rte_cpu_to_le_64(IAVF_TX_DESC_DTYPE_DESC_DONE);
 }
diff --git a/drivers/net/intel/ice/ice_dcf_ethdev.c b/drivers/net/intel/ice/ice_dcf_ethdev.c
index ab1d499cef..5f537b4c12 100644
--- a/drivers/net/intel/ice/ice_dcf_ethdev.c
+++ b/drivers/net/intel/ice/ice_dcf_ethdev.c
@@ -401,11 +401,11 @@ reset_tx_queue(struct ci_tx_queue *txq)
 	txe = txq->sw_ring;
 	size = sizeof(struct ci_tx_desc) * txq->nb_tx_desc;
 	for (i = 0; i < size; i++)
-		((volatile char *)txq->ice_tx_ring)[i] = 0;
+		((volatile char *)txq->ci_tx_ring)[i] = 0;
 
 	prev = (uint16_t)(txq->nb_tx_desc - 1);
 	for (i = 0; i < txq->nb_tx_desc; i++) {
-		txq->ice_tx_ring[i].cmd_type_offset_bsz =
+		txq->ci_tx_ring[i].cmd_type_offset_bsz =
 			rte_cpu_to_le_64(IAVF_TX_DESC_DTYPE_DESC_DONE);
 		txe[i].mbuf =  NULL;
 		txe[i].last_id = i;
diff --git a/drivers/net/intel/ice/ice_rxtx.c b/drivers/net/intel/ice/ice_rxtx.c
index 74b80e7df3..e3ffbdb587 100644
--- a/drivers/net/intel/ice/ice_rxtx.c
+++ b/drivers/net/intel/ice/ice_rxtx.c
@@ -1117,11 +1117,11 @@ ice_reset_tx_queue(struct ci_tx_queue *txq)
 	txe = txq->sw_ring;
 	size = sizeof(struct ci_tx_desc) * txq->nb_tx_desc;
 	for (i = 0; i < size; i++)
-		((volatile char *)txq->ice_tx_ring)[i] = 0;
+		((volatile char *)txq->ci_tx_ring)[i] = 0;
 
 	prev = (uint16_t)(txq->nb_tx_desc - 1);
 	for (i = 0; i < txq->nb_tx_desc; i++) {
-		volatile struct ci_tx_desc *txd = &txq->ice_tx_ring[i];
+		volatile struct ci_tx_desc *txd = &txq->ci_tx_ring[i];
 
 		txd->cmd_type_offset_bsz =
 			rte_cpu_to_le_64(ICE_TX_DESC_DTYPE_DESC_DONE);
@@ -1625,7 +1625,7 @@ ice_tx_queue_setup(struct rte_eth_dev *dev,
 	/* Allocate TX hardware ring descriptors. */
 	ring_size = sizeof(struct ci_tx_desc) * ICE_MAX_NUM_DESC_BY_MAC(hw);
 	ring_size = RTE_ALIGN(ring_size, ICE_DMA_MEM_ALIGN);
-	tz = rte_eth_dma_zone_reserve(dev, "ice_tx_ring", queue_idx,
+	tz = rte_eth_dma_zone_reserve(dev, "ci_tx_ring", queue_idx,
 				      ring_size, ICE_RING_BASE_ALIGN,
 				      socket_id);
 	if (!tz) {
@@ -1649,7 +1649,7 @@ ice_tx_queue_setup(struct rte_eth_dev *dev,
 	txq->tx_deferred_start = tx_conf->tx_deferred_start;
 
 	txq->tx_ring_dma = tz->iova;
-	txq->ice_tx_ring = tz->addr;
+	txq->ci_tx_ring = tz->addr;
 
 	/* Allocate software ring */
 	txq->sw_ring =
@@ -2555,7 +2555,7 @@ ice_tx_descriptor_status(void *tx_queue, uint16_t offset)
 			desc -= txq->nb_tx_desc;
 	}
 
-	status = &txq->ice_tx_ring[desc].cmd_type_offset_bsz;
+	status = &txq->ci_tx_ring[desc].cmd_type_offset_bsz;
 	mask = rte_cpu_to_le_64(ICE_TXD_QW1_DTYPE_M);
 	expect = rte_cpu_to_le_64(ICE_TX_DESC_DTYPE_DESC_DONE <<
 				  ICE_TXD_QW1_DTYPE_S);
@@ -2638,7 +2638,7 @@ ice_fdir_setup_tx_resources(struct ice_pf *pf)
 	txq->ice_vsi = pf->fdir.fdir_vsi;
 
 	txq->tx_ring_dma = tz->iova;
-	txq->ice_tx_ring = (struct ci_tx_desc *)tz->addr;
+	txq->ci_tx_ring = (struct ci_tx_desc *)tz->addr;
 	/*
 	 * don't need to allocate software ring and reset for the fdir
 	 * program queue just set the queue has been configured.
@@ -3027,7 +3027,7 @@ static inline int
 ice_xmit_cleanup(struct ci_tx_queue *txq)
 {
 	struct ci_tx_entry *sw_ring = txq->sw_ring;
-	volatile struct ci_tx_desc *txd = txq->ice_tx_ring;
+	volatile struct ci_tx_desc *txd = txq->ci_tx_ring;
 	uint16_t last_desc_cleaned = txq->last_desc_cleaned;
 	uint16_t nb_tx_desc = txq->nb_tx_desc;
 	uint16_t desc_to_clean_to;
@@ -3148,7 +3148,7 @@ uint16_t
 ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 {
 	struct ci_tx_queue *txq;
-	volatile struct ci_tx_desc *ice_tx_ring;
+	volatile struct ci_tx_desc *ci_tx_ring;
 	volatile struct ci_tx_desc *txd;
 	struct ci_tx_entry *sw_ring;
 	struct ci_tx_entry *txe, *txn;
@@ -3171,7 +3171,7 @@ ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 
 	txq = tx_queue;
 	sw_ring = txq->sw_ring;
-	ice_tx_ring = txq->ice_tx_ring;
+	ci_tx_ring = txq->ci_tx_ring;
 	tx_id = txq->tx_tail;
 	txe = &sw_ring[tx_id];
 
@@ -3257,7 +3257,7 @@ ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 			/* Setup TX context descriptor if required */
 			volatile struct ice_tx_ctx_desc *ctx_txd =
 				(volatile struct ice_tx_ctx_desc *)
-					&ice_tx_ring[tx_id];
+					&ci_tx_ring[tx_id];
 			uint16_t cd_l2tag2 = 0;
 			uint64_t cd_type_cmd_tso_mss = ICE_TX_DESC_DTYPE_CTX;
 
@@ -3299,7 +3299,7 @@ ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 		m_seg = tx_pkt;
 
 		do {
-			txd = &ice_tx_ring[tx_id];
+			txd = &ci_tx_ring[tx_id];
 			txn = &sw_ring[txe->next_id];
 
 			if (txe->mbuf)
@@ -3327,7 +3327,7 @@ ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 				txe->last_id = tx_last;
 				tx_id = txe->next_id;
 				txe = txn;
-				txd = &ice_tx_ring[tx_id];
+				txd = &ci_tx_ring[tx_id];
 				txn = &sw_ring[txe->next_id];
 			}
 
@@ -3410,7 +3410,7 @@ ice_tx_free_bufs(struct ci_tx_queue *txq)
 	struct ci_tx_entry *txep;
 	uint16_t i;
 
-	if ((txq->ice_tx_ring[txq->tx_next_dd].cmd_type_offset_bsz &
+	if ((txq->ci_tx_ring[txq->tx_next_dd].cmd_type_offset_bsz &
 	     rte_cpu_to_le_64(ICE_TXD_QW1_DTYPE_M)) !=
 	    rte_cpu_to_le_64(ICE_TX_DESC_DTYPE_DESC_DONE))
 		return 0;
@@ -3594,7 +3594,7 @@ static inline void
 ice_tx_fill_hw_ring(struct ci_tx_queue *txq, struct rte_mbuf **pkts,
 		    uint16_t nb_pkts)
 {
-	volatile struct ci_tx_desc *txdp = &txq->ice_tx_ring[txq->tx_tail];
+	volatile struct ci_tx_desc *txdp = &txq->ci_tx_ring[txq->tx_tail];
 	struct ci_tx_entry *txep = &txq->sw_ring[txq->tx_tail];
 	const int N_PER_LOOP = 4;
 	const int N_PER_LOOP_MASK = N_PER_LOOP - 1;
@@ -3627,7 +3627,7 @@ tx_xmit_pkts(struct ci_tx_queue *txq,
 	     struct rte_mbuf **tx_pkts,
 	     uint16_t nb_pkts)
 {
-	volatile struct ci_tx_desc *txr = txq->ice_tx_ring;
+	volatile struct ci_tx_desc *txr = txq->ci_tx_ring;
 	uint16_t n = 0;
 
 	/**
@@ -4887,11 +4887,11 @@ ice_fdir_programming(struct ice_pf *pf, struct ice_fltr_desc *fdir_desc)
 	uint16_t i;
 
 	fdirdp = (volatile struct ice_fltr_desc *)
-		(&txq->ice_tx_ring[txq->tx_tail]);
+		(&txq->ci_tx_ring[txq->tx_tail]);
 	fdirdp->qidx_compq_space_stat = fdir_desc->qidx_compq_space_stat;
 	fdirdp->dtype_cmd_vsi_fdid = fdir_desc->dtype_cmd_vsi_fdid;
 
-	txdp = &txq->ice_tx_ring[txq->tx_tail + 1];
+	txdp = &txq->ci_tx_ring[txq->tx_tail + 1];
 	txdp->buffer_addr = rte_cpu_to_le_64(pf->fdir.dma_addr);
 	td_cmd = ICE_TX_DESC_CMD_EOP |
 		ICE_TX_DESC_CMD_RS  |
diff --git a/drivers/net/intel/ice/ice_rxtx_vec_avx2.c b/drivers/net/intel/ice/ice_rxtx_vec_avx2.c
index bef7bb00ba..0a1df0b2f6 100644
--- a/drivers/net/intel/ice/ice_rxtx_vec_avx2.c
+++ b/drivers/net/intel/ice/ice_rxtx_vec_avx2.c
@@ -869,7 +869,7 @@ ice_xmit_fixed_burst_vec_avx2(void *tx_queue, struct rte_mbuf **tx_pkts,
 		return 0;
 
 	tx_id = txq->tx_tail;
-	txdp = &txq->ice_tx_ring[tx_id];
+	txdp = &txq->ci_tx_ring[tx_id];
 	txep = &txq->sw_ring_vec[tx_id];
 
 	txq->nb_tx_free = (uint16_t)(txq->nb_tx_free - nb_pkts);
@@ -890,7 +890,7 @@ ice_xmit_fixed_burst_vec_avx2(void *tx_queue, struct rte_mbuf **tx_pkts,
 		txq->tx_next_rs = (uint16_t)(txq->tx_rs_thresh - 1);
 
 		/* avoid reach the end of ring */
-		txdp = &txq->ice_tx_ring[tx_id];
+		txdp = &txq->ci_tx_ring[tx_id];
 		txep = &txq->sw_ring_vec[tx_id];
 	}
 
@@ -900,7 +900,7 @@ ice_xmit_fixed_burst_vec_avx2(void *tx_queue, struct rte_mbuf **tx_pkts,
 
 	tx_id = (uint16_t)(tx_id + nb_commit);
 	if (tx_id > txq->tx_next_rs) {
-		txq->ice_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
+		txq->ci_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
 			rte_cpu_to_le_64(((uint64_t)ICE_TX_DESC_CMD_RS) <<
 					 ICE_TXD_QW1_CMD_S);
 		txq->tx_next_rs =
diff --git a/drivers/net/intel/ice/ice_rxtx_vec_avx512.c b/drivers/net/intel/ice/ice_rxtx_vec_avx512.c
index 1f6bf5fc8e..d42f41461f 100644
--- a/drivers/net/intel/ice/ice_rxtx_vec_avx512.c
+++ b/drivers/net/intel/ice/ice_rxtx_vec_avx512.c
@@ -933,7 +933,7 @@ ice_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pkts,
 		return 0;
 
 	tx_id = txq->tx_tail;
-	txdp = &txq->ice_tx_ring[tx_id];
+	txdp = &txq->ci_tx_ring[tx_id];
 	txep = (void *)txq->sw_ring;
 	txep += tx_id;
 
@@ -955,7 +955,7 @@ ice_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pkts,
 		txq->tx_next_rs = (uint16_t)(txq->tx_rs_thresh - 1);
 
 		/* avoid reach the end of ring */
-		txdp = txq->ice_tx_ring;
+		txdp = txq->ci_tx_ring;
 		txep = (void *)txq->sw_ring;
 	}
 
@@ -965,7 +965,7 @@ ice_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pkts,
 
 	tx_id = (uint16_t)(tx_id + nb_commit);
 	if (tx_id > txq->tx_next_rs) {
-		txq->ice_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
+		txq->ci_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
 			rte_cpu_to_le_64(((uint64_t)ICE_TX_DESC_CMD_RS) <<
 					 ICE_TXD_QW1_CMD_S);
 		txq->tx_next_rs =
diff --git a/drivers/net/intel/ice/ice_rxtx_vec_common.h b/drivers/net/intel/ice/ice_rxtx_vec_common.h
index ff46a8fb49..8ba591e403 100644
--- a/drivers/net/intel/ice/ice_rxtx_vec_common.h
+++ b/drivers/net/intel/ice/ice_rxtx_vec_common.h
@@ -11,7 +11,7 @@
 static inline int
 ice_tx_desc_done(struct ci_tx_queue *txq, uint16_t idx)
 {
-	return (txq->ice_tx_ring[idx].cmd_type_offset_bsz &
+	return (txq->ci_tx_ring[idx].cmd_type_offset_bsz &
 			rte_cpu_to_le_64(ICE_TXD_QW1_DTYPE_M)) ==
 				rte_cpu_to_le_64(ICE_TX_DESC_DTYPE_DESC_DONE);
 }
diff --git a/drivers/net/intel/idpf/idpf_common_rxtx.c b/drivers/net/intel/idpf/idpf_common_rxtx.c
index be3c1ef216..51074bda3a 100644
--- a/drivers/net/intel/idpf/idpf_common_rxtx.c
+++ b/drivers/net/intel/idpf/idpf_common_rxtx.c
@@ -266,11 +266,11 @@ idpf_qc_single_tx_queue_reset(struct ci_tx_queue *txq)
 	txe = txq->sw_ring;
 	size = sizeof(struct ci_tx_desc) * txq->nb_tx_desc;
 	for (i = 0; i < size; i++)
-		((volatile char *)txq->idpf_tx_ring)[i] = 0;
+		((volatile char *)txq->ci_tx_ring)[i] = 0;
 
 	prev = (uint16_t)(txq->nb_tx_desc - 1);
 	for (i = 0; i < txq->nb_tx_desc; i++) {
-		txq->idpf_tx_ring[i].cmd_type_offset_bsz =
+		txq->ci_tx_ring[i].cmd_type_offset_bsz =
 			rte_cpu_to_le_64(IDPF_TX_DESC_DTYPE_DESC_DONE);
 		txe[i].mbuf =  NULL;
 		txe[i].last_id = i;
@@ -1335,7 +1335,7 @@ idpf_xmit_cleanup(struct ci_tx_queue *txq)
 	uint16_t desc_to_clean_to;
 	uint16_t nb_tx_to_clean;
 
-	volatile struct ci_tx_desc *txd = txq->idpf_tx_ring;
+	volatile struct ci_tx_desc *txd = txq->ci_tx_ring;
 
 	desc_to_clean_to = (uint16_t)(last_desc_cleaned + txq->tx_rs_thresh);
 	if (desc_to_clean_to >= nb_tx_desc)
@@ -1398,7 +1398,7 @@ idpf_dp_singleq_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 		return nb_tx;
 
 	sw_ring = txq->sw_ring;
-	txr = txq->idpf_tx_ring;
+	txr = txq->ci_tx_ring;
 	tx_id = txq->tx_tail;
 	txe = &sw_ring[tx_id];
 
diff --git a/drivers/net/intel/idpf/idpf_common_rxtx_avx2.c b/drivers/net/intel/idpf/idpf_common_rxtx_avx2.c
index 5f5d538dcb..04efee3722 100644
--- a/drivers/net/intel/idpf/idpf_common_rxtx_avx2.c
+++ b/drivers/net/intel/idpf/idpf_common_rxtx_avx2.c
@@ -573,7 +573,7 @@ idpf_singleq_xmit_fixed_burst_vec_avx2(void *tx_queue, struct rte_mbuf **tx_pkts
 		return 0;
 
 	tx_id = txq->tx_tail;
-	txdp = &txq->idpf_tx_ring[tx_id];
+	txdp = &txq->ci_tx_ring[tx_id];
 	txep = &txq->sw_ring_vec[tx_id];
 
 	txq->nb_tx_free = (uint16_t)(txq->nb_tx_free - nb_pkts);
@@ -594,7 +594,7 @@ idpf_singleq_xmit_fixed_burst_vec_avx2(void *tx_queue, struct rte_mbuf **tx_pkts
 		txq->tx_next_rs = (uint16_t)(txq->tx_rs_thresh - 1);
 
 		/* avoid reach the end of ring */
-		txdp = &txq->idpf_tx_ring[tx_id];
+		txdp = &txq->ci_tx_ring[tx_id];
 		txep = &txq->sw_ring_vec[tx_id];
 	}
 
@@ -604,7 +604,7 @@ idpf_singleq_xmit_fixed_burst_vec_avx2(void *tx_queue, struct rte_mbuf **tx_pkts
 
 	tx_id = (uint16_t)(tx_id + nb_commit);
 	if (tx_id > txq->tx_next_rs) {
-		txq->idpf_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
+		txq->ci_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
 			rte_cpu_to_le_64(((uint64_t)IDPF_TX_DESC_CMD_RS) <<
 					 IDPF_TXD_QW1_CMD_S);
 		txq->tx_next_rs =
diff --git a/drivers/net/intel/idpf/idpf_common_rxtx_avx512.c b/drivers/net/intel/idpf/idpf_common_rxtx_avx512.c
index c1ec3d1222..d5e5a2ca5f 100644
--- a/drivers/net/intel/idpf/idpf_common_rxtx_avx512.c
+++ b/drivers/net/intel/idpf/idpf_common_rxtx_avx512.c
@@ -1090,7 +1090,7 @@ idpf_singleq_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pk
 		return 0;
 
 	tx_id = txq->tx_tail;
-	txdp = &txq->idpf_tx_ring[tx_id];
+	txdp = &txq->ci_tx_ring[tx_id];
 	txep = (void *)txq->sw_ring;
 	txep += tx_id;
 
@@ -1112,7 +1112,7 @@ idpf_singleq_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pk
 		txq->tx_next_rs = (uint16_t)(txq->tx_rs_thresh - 1);
 
 		/* avoid reach the end of ring */
-		txdp = &txq->idpf_tx_ring[tx_id];
+		txdp = &txq->ci_tx_ring[tx_id];
 		txep = (void *)txq->sw_ring;
 		txep += tx_id;
 	}
@@ -1123,7 +1123,7 @@ idpf_singleq_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pk
 
 	tx_id = (uint16_t)(tx_id + nb_commit);
 	if (tx_id > txq->tx_next_rs) {
-		txq->idpf_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
+		txq->ci_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
 			rte_cpu_to_le_64(((uint64_t)IDPF_TX_DESC_CMD_RS) <<
 					 IDPF_TXD_QW1_CMD_S);
 		txq->tx_next_rs =
diff --git a/drivers/net/intel/idpf/idpf_rxtx.c b/drivers/net/intel/idpf/idpf_rxtx.c
index 9317c8b175..7d9c885458 100644
--- a/drivers/net/intel/idpf/idpf_rxtx.c
+++ b/drivers/net/intel/idpf/idpf_rxtx.c
@@ -481,7 +481,7 @@ idpf_tx_queue_setup(struct rte_eth_dev *dev, uint16_t queue_idx,
 	}
 
 	if (!is_splitq) {
-		txq->idpf_tx_ring = mz->addr;
+		txq->ci_tx_ring = mz->addr;
 		idpf_qc_single_tx_queue_reset(txq);
 	} else {
 		txq->desc_ring = mz->addr;
diff --git a/drivers/net/intel/idpf/idpf_rxtx_vec_common.h b/drivers/net/intel/idpf/idpf_rxtx_vec_common.h
index 4702061484..b5e8574667 100644
--- a/drivers/net/intel/idpf/idpf_rxtx_vec_common.h
+++ b/drivers/net/intel/idpf/idpf_rxtx_vec_common.h
@@ -31,7 +31,7 @@ idpf_tx_desc_done(struct ci_tx_queue *txq, uint16_t idx)
 	if (txq->complq != NULL)
 		return 1;
 
-	return (txq->idpf_tx_ring[idx].cmd_type_offset_bsz &
+	return (txq->ci_tx_ring[idx].cmd_type_offset_bsz &
 			rte_cpu_to_le_64(IDPF_TXD_QW1_DTYPE_M)) ==
 				rte_cpu_to_le_64(IDPF_TX_DESC_DTYPE_DESC_DONE);
 }
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [PATCH v5 04/35] net/intel: create common post-Tx cleanup function
  2026-02-11 18:12 ` [PATCH v5 00/35] combine multiple Intel scalar Tx paths Bruce Richardson
                     ` (2 preceding siblings ...)
  2026-02-11 18:12   ` [PATCH v5 03/35] net/intel: use common Tx ring structure Bruce Richardson
@ 2026-02-11 18:12   ` Bruce Richardson
  2026-02-11 18:12   ` [PATCH v5 05/35] net/intel: consolidate definitions for Tx desc fields Bruce Richardson
                     ` (31 subsequent siblings)
  35 siblings, 0 replies; 274+ messages in thread
From: Bruce Richardson @ 2026-02-11 18:12 UTC (permalink / raw)
  To: dev
  Cc: Bruce Richardson, Anatoly Burakov, Vladimir Medvedkin,
	Jingjing Wu, Praveen Shetty

The code used in ice, iavf, idpf and i40e to clean up mbufs after they
have been transmitted was identical. Therefore deduplicate it by moving
it to the common directory and removing the driver-specific versions.

Rather than keeping all Tx code in one file, which could grow rather
long, create a new header file for the scalar datapath functions.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 drivers/net/intel/common/tx.h             |  5 ++
 drivers/net/intel/common/tx_scalar.h      | 62 +++++++++++++++++++++++
 drivers/net/intel/i40e/i40e_rxtx.c        | 49 ++----------------
 drivers/net/intel/iavf/iavf_rxtx.c        | 50 ++----------------
 drivers/net/intel/ice/ice_rxtx.c          | 60 ++--------------------
 drivers/net/intel/idpf/idpf_common_rxtx.c | 46 ++---------------
 6 files changed, 85 insertions(+), 187 deletions(-)
 create mode 100644 drivers/net/intel/common/tx_scalar.h

diff --git a/drivers/net/intel/common/tx.h b/drivers/net/intel/common/tx.h
index 8cf63e59ab..558c861df0 100644
--- a/drivers/net/intel/common/tx.h
+++ b/drivers/net/intel/common/tx.h
@@ -377,4 +377,9 @@ ci_tx_path_select(const struct ci_tx_path_features *req_features,
 	return idx;
 }
 
+/* Include the scalar functions at the end, so they can use the common
+ * definitions. This is done so drivers can use all functions just by
+ * including tx.h.
+ */
+#include "tx_scalar.h"
+
 #endif /* _COMMON_INTEL_TX_H_ */
diff --git a/drivers/net/intel/common/tx_scalar.h b/drivers/net/intel/common/tx_scalar.h
new file mode 100644
index 0000000000..181629d856
--- /dev/null
+++ b/drivers/net/intel/common/tx_scalar.h
@@ -0,0 +1,62 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2025 Intel Corporation
+ */
+
+#ifndef _COMMON_INTEL_TX_SCALAR_H_
+#define _COMMON_INTEL_TX_SCALAR_H_
+
+#include <stdint.h>
+#include <rte_byteorder.h>
+
+/* depends on common Tx definitions. */
+#include "tx.h"
+
+/*
+ * Common transmit descriptor cleanup function for Intel drivers.
+ *
+ * Returns:
+ *   0 on success
+ *  -1 if cleanup cannot proceed (descriptors not yet processed by HW)
+ */
+static __rte_always_inline int
+ci_tx_xmit_cleanup(struct ci_tx_queue *txq)
+{
+	struct ci_tx_entry *sw_ring = txq->sw_ring;
+	volatile struct ci_tx_desc *txd = txq->ci_tx_ring;
+	uint16_t last_desc_cleaned = txq->last_desc_cleaned;
+	uint16_t nb_tx_desc = txq->nb_tx_desc;
+	uint16_t desc_to_clean_to;
+	uint16_t nb_tx_to_clean;
+
+	/* Determine the last descriptor needing to be cleaned */
+	desc_to_clean_to = (uint16_t)(last_desc_cleaned + txq->tx_rs_thresh);
+	if (desc_to_clean_to >= nb_tx_desc)
+		desc_to_clean_to = (uint16_t)(desc_to_clean_to - nb_tx_desc);
+
+	/* Check if descriptor is done - all drivers use 0xF as done value in bits 3:0 */
+	desc_to_clean_to = sw_ring[desc_to_clean_to].last_id;
+	if ((txd[desc_to_clean_to].cmd_type_offset_bsz & rte_cpu_to_le_64(0x0FUL)) !=
+			rte_cpu_to_le_64(0x0FUL))
+		return -1;
+
+	/* Figure out how many descriptors will be cleaned */
+	if (last_desc_cleaned > desc_to_clean_to)
+		nb_tx_to_clean = (uint16_t)((nb_tx_desc - last_desc_cleaned) + desc_to_clean_to);
+	else
+		nb_tx_to_clean = (uint16_t)(desc_to_clean_to - last_desc_cleaned);
+
+	/* The last descriptor to clean is done, so that means all the
+	 * descriptors from the last descriptor that was cleaned
+	 * up to the last descriptor with the RS bit set
+	 * are done. Only reset the threshold descriptor.
+	 */
+	txd[desc_to_clean_to].cmd_type_offset_bsz = 0;
+
+	/* Update the txq to reflect the last descriptor that was cleaned */
+	txq->last_desc_cleaned = desc_to_clean_to;
+	txq->nb_tx_free = (uint16_t)(txq->nb_tx_free + nb_tx_to_clean);
+
+	return 0;
+}
+
+#endif /* _COMMON_INTEL_TX_SCALAR_H_ */
diff --git a/drivers/net/intel/i40e/i40e_rxtx.c b/drivers/net/intel/i40e/i40e_rxtx.c
index 210fc0201e..2760e76e99 100644
--- a/drivers/net/intel/i40e/i40e_rxtx.c
+++ b/drivers/net/intel/i40e/i40e_rxtx.c
@@ -384,45 +384,6 @@ i40e_build_ctob(uint32_t td_cmd,
 			((uint64_t)td_tag  << I40E_TXD_QW1_L2TAG1_SHIFT));
 }
 
-static inline int
-i40e_xmit_cleanup(struct ci_tx_queue *txq)
-{
-	struct ci_tx_entry *sw_ring = txq->sw_ring;
-	volatile struct ci_tx_desc *txd = txq->ci_tx_ring;
-	uint16_t last_desc_cleaned = txq->last_desc_cleaned;
-	uint16_t nb_tx_desc = txq->nb_tx_desc;
-	uint16_t desc_to_clean_to;
-	uint16_t nb_tx_to_clean;
-
-	desc_to_clean_to = (uint16_t)(last_desc_cleaned + txq->tx_rs_thresh);
-	if (desc_to_clean_to >= nb_tx_desc)
-		desc_to_clean_to = (uint16_t)(desc_to_clean_to - nb_tx_desc);
-
-	desc_to_clean_to = sw_ring[desc_to_clean_to].last_id;
-	if ((txd[desc_to_clean_to].cmd_type_offset_bsz &
-			rte_cpu_to_le_64(I40E_TXD_QW1_DTYPE_MASK)) !=
-			rte_cpu_to_le_64(I40E_TX_DESC_DTYPE_DESC_DONE)) {
-		PMD_TX_LOG(DEBUG, "TX descriptor %4u is not done "
-			   "(port=%d queue=%d)", desc_to_clean_to,
-			   txq->port_id, txq->queue_id);
-		return -1;
-	}
-
-	if (last_desc_cleaned > desc_to_clean_to)
-		nb_tx_to_clean = (uint16_t)((nb_tx_desc - last_desc_cleaned) +
-							desc_to_clean_to);
-	else
-		nb_tx_to_clean = (uint16_t)(desc_to_clean_to -
-					last_desc_cleaned);
-
-	txd[desc_to_clean_to].cmd_type_offset_bsz = 0;
-
-	txq->last_desc_cleaned = desc_to_clean_to;
-	txq->nb_tx_free = (uint16_t)(txq->nb_tx_free + nb_tx_to_clean);
-
-	return 0;
-}
-
 static inline int
 #ifdef RTE_LIBRTE_I40E_RX_ALLOW_BULK_ALLOC
 check_rx_burst_bulk_alloc_preconditions(struct ci_rx_queue *rxq)
@@ -1118,7 +1079,7 @@ i40e_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 
 	/* Check if the descriptor ring needs to be cleaned. */
 	if (txq->nb_tx_free < txq->tx_free_thresh)
-		(void)i40e_xmit_cleanup(txq);
+		(void)ci_tx_xmit_cleanup(txq);
 
 	for (nb_tx = 0; nb_tx < nb_pkts; nb_tx++) {
 		td_cmd = 0;
@@ -1159,14 +1120,14 @@ i40e_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 			tx_last = (uint16_t)(tx_last - txq->nb_tx_desc);
 
 		if (nb_used > txq->nb_tx_free) {
-			if (i40e_xmit_cleanup(txq) != 0) {
+			if (ci_tx_xmit_cleanup(txq) != 0) {
 				if (nb_tx == 0)
 					return 0;
 				goto end_of_tx;
 			}
 			if (unlikely(nb_used > txq->tx_rs_thresh)) {
 				while (nb_used > txq->nb_tx_free) {
-					if (i40e_xmit_cleanup(txq) != 0) {
+					if (ci_tx_xmit_cleanup(txq) != 0) {
 						if (nb_tx == 0)
 							return 0;
 						goto end_of_tx;
@@ -2808,7 +2769,7 @@ i40e_tx_done_cleanup_full(struct ci_tx_queue *txq,
 	tx_last = txq->tx_tail;
 	tx_id  = swr_ring[tx_last].next_id;
 
-	if (txq->nb_tx_free == 0 && i40e_xmit_cleanup(txq))
+	if (txq->nb_tx_free == 0 && ci_tx_xmit_cleanup(txq))
 		return 0;
 
 	nb_tx_to_clean = txq->nb_tx_free;
@@ -2842,7 +2803,7 @@ i40e_tx_done_cleanup_full(struct ci_tx_queue *txq,
 			break;
 
 		if (pkt_cnt < free_cnt) {
-			if (i40e_xmit_cleanup(txq))
+			if (ci_tx_xmit_cleanup(txq))
 				break;
 
 			nb_tx_to_clean = txq->nb_tx_free - nb_tx_free_last;
diff --git a/drivers/net/intel/iavf/iavf_rxtx.c b/drivers/net/intel/iavf/iavf_rxtx.c
index 807bc92a45..560abfc1ef 100644
--- a/drivers/net/intel/iavf/iavf_rxtx.c
+++ b/drivers/net/intel/iavf/iavf_rxtx.c
@@ -2324,46 +2324,6 @@ iavf_recv_pkts_bulk_alloc(void *rx_queue,
 	return nb_rx;
 }
 
-static inline int
-iavf_xmit_cleanup(struct ci_tx_queue *txq)
-{
-	struct ci_tx_entry *sw_ring = txq->sw_ring;
-	uint16_t last_desc_cleaned = txq->last_desc_cleaned;
-	uint16_t nb_tx_desc = txq->nb_tx_desc;
-	uint16_t desc_to_clean_to;
-	uint16_t nb_tx_to_clean;
-
-	volatile struct ci_tx_desc *txd = txq->ci_tx_ring;
-
-	desc_to_clean_to = (uint16_t)(last_desc_cleaned + txq->tx_rs_thresh);
-	if (desc_to_clean_to >= nb_tx_desc)
-		desc_to_clean_to = (uint16_t)(desc_to_clean_to - nb_tx_desc);
-
-	desc_to_clean_to = sw_ring[desc_to_clean_to].last_id;
-	if ((txd[desc_to_clean_to].cmd_type_offset_bsz &
-			rte_cpu_to_le_64(IAVF_TXD_QW1_DTYPE_MASK)) !=
-			rte_cpu_to_le_64(IAVF_TX_DESC_DTYPE_DESC_DONE)) {
-		PMD_TX_LOG(DEBUG, "TX descriptor %4u is not done "
-			   "(port=%d queue=%d)", desc_to_clean_to,
-			   txq->port_id, txq->queue_id);
-		return -1;
-	}
-
-	if (last_desc_cleaned > desc_to_clean_to)
-		nb_tx_to_clean = (uint16_t)((nb_tx_desc - last_desc_cleaned) +
-							desc_to_clean_to);
-	else
-		nb_tx_to_clean = (uint16_t)(desc_to_clean_to -
-					last_desc_cleaned);
-
-	txd[desc_to_clean_to].cmd_type_offset_bsz = 0;
-
-	txq->last_desc_cleaned = desc_to_clean_to;
-	txq->nb_tx_free = (uint16_t)(txq->nb_tx_free + nb_tx_to_clean);
-
-	return 0;
-}
-
 /* Check if the context descriptor is needed for TX offloading */
 static inline uint16_t
 iavf_calc_context_desc(struct rte_mbuf *mb, uint8_t vlan_flag)
@@ -2768,7 +2728,7 @@ iavf_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 
 	/* Check if the descriptor ring needs to be cleaned. */
 	if (txq->nb_tx_free < txq->tx_free_thresh)
-		iavf_xmit_cleanup(txq);
+		ci_tx_xmit_cleanup(txq);
 
 	desc_idx = txq->tx_tail;
 	txe = &txe_ring[desc_idx];
@@ -2823,14 +2783,14 @@ iavf_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 			txq->port_id, txq->queue_id, desc_idx, desc_idx_last);
 
 		if (nb_desc_required > txq->nb_tx_free) {
-			if (iavf_xmit_cleanup(txq)) {
+			if (ci_tx_xmit_cleanup(txq)) {
 				if (idx == 0)
 					return 0;
 				goto end_of_tx;
 			}
 			if (unlikely(nb_desc_required > txq->tx_rs_thresh)) {
 				while (nb_desc_required > txq->nb_tx_free) {
-					if (iavf_xmit_cleanup(txq)) {
+					if (ci_tx_xmit_cleanup(txq)) {
 						if (idx == 0)
 							return 0;
 						goto end_of_tx;
@@ -4300,7 +4260,7 @@ iavf_tx_done_cleanup_full(struct ci_tx_queue *txq,
 	tx_id = txq->tx_tail;
 	tx_last = tx_id;
 
-	if (txq->nb_tx_free == 0 && iavf_xmit_cleanup(txq))
+	if (txq->nb_tx_free == 0 && ci_tx_xmit_cleanup(txq))
 		return 0;
 
 	nb_tx_to_clean = txq->nb_tx_free;
@@ -4332,7 +4292,7 @@ iavf_tx_done_cleanup_full(struct ci_tx_queue *txq,
 			break;
 
 		if (pkt_cnt < free_cnt) {
-			if (iavf_xmit_cleanup(txq))
+			if (ci_tx_xmit_cleanup(txq))
 				break;
 
 			nb_tx_to_clean = txq->nb_tx_free - nb_tx_free_last;
diff --git a/drivers/net/intel/ice/ice_rxtx.c b/drivers/net/intel/ice/ice_rxtx.c
index e3ffbdb587..7a33e1e980 100644
--- a/drivers/net/intel/ice/ice_rxtx.c
+++ b/drivers/net/intel/ice/ice_rxtx.c
@@ -3023,56 +3023,6 @@ ice_txd_enable_checksum(uint64_t ol_flags,
 	}
 }
 
-static inline int
-ice_xmit_cleanup(struct ci_tx_queue *txq)
-{
-	struct ci_tx_entry *sw_ring = txq->sw_ring;
-	volatile struct ci_tx_desc *txd = txq->ci_tx_ring;
-	uint16_t last_desc_cleaned = txq->last_desc_cleaned;
-	uint16_t nb_tx_desc = txq->nb_tx_desc;
-	uint16_t desc_to_clean_to;
-	uint16_t nb_tx_to_clean;
-
-	/* Determine the last descriptor needing to be cleaned */
-	desc_to_clean_to = (uint16_t)(last_desc_cleaned + txq->tx_rs_thresh);
-	if (desc_to_clean_to >= nb_tx_desc)
-		desc_to_clean_to = (uint16_t)(desc_to_clean_to - nb_tx_desc);
-
-	/* Check to make sure the last descriptor to clean is done */
-	desc_to_clean_to = sw_ring[desc_to_clean_to].last_id;
-	if (!(txd[desc_to_clean_to].cmd_type_offset_bsz &
-	    rte_cpu_to_le_64(ICE_TX_DESC_DTYPE_DESC_DONE))) {
-		PMD_TX_LOG(DEBUG, "TX descriptor %4u is not done "
-			   "(port=%d queue=%d) value=0x%"PRIx64,
-			   desc_to_clean_to,
-			   txq->port_id, txq->queue_id,
-			   txd[desc_to_clean_to].cmd_type_offset_bsz);
-		/* Failed to clean any descriptors */
-		return -1;
-	}
-
-	/* Figure out how many descriptors will be cleaned */
-	if (last_desc_cleaned > desc_to_clean_to)
-		nb_tx_to_clean = (uint16_t)((nb_tx_desc - last_desc_cleaned) +
-					    desc_to_clean_to);
-	else
-		nb_tx_to_clean = (uint16_t)(desc_to_clean_to -
-					    last_desc_cleaned);
-
-	/* The last descriptor to clean is done, so that means all the
-	 * descriptors from the last descriptor that was cleaned
-	 * up to the last descriptor with the RS bit set
-	 * are done. Only reset the threshold descriptor.
-	 */
-	txd[desc_to_clean_to].cmd_type_offset_bsz = 0;
-
-	/* Update the txq to reflect the last descriptor that was cleaned */
-	txq->last_desc_cleaned = desc_to_clean_to;
-	txq->nb_tx_free = (uint16_t)(txq->nb_tx_free + nb_tx_to_clean);
-
-	return 0;
-}
-
 /* Construct the tx flags */
 static inline uint64_t
 ice_build_ctob(uint32_t td_cmd,
@@ -3180,7 +3130,7 @@ ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 
 	/* Check if the descriptor ring needs to be cleaned. */
 	if (txq->nb_tx_free < txq->tx_free_thresh)
-		(void)ice_xmit_cleanup(txq);
+		(void)ci_tx_xmit_cleanup(txq);
 
 	for (nb_tx = 0; nb_tx < nb_pkts; nb_tx++) {
 		tx_pkt = *tx_pkts++;
@@ -3217,14 +3167,14 @@ ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 			tx_last = (uint16_t)(tx_last - txq->nb_tx_desc);
 
 		if (nb_used > txq->nb_tx_free) {
-			if (ice_xmit_cleanup(txq) != 0) {
+			if (ci_tx_xmit_cleanup(txq) != 0) {
 				if (nb_tx == 0)
 					return 0;
 				goto end_of_tx;
 			}
 			if (unlikely(nb_used > txq->tx_rs_thresh)) {
 				while (nb_used > txq->nb_tx_free) {
-					if (ice_xmit_cleanup(txq) != 0) {
+					if (ci_tx_xmit_cleanup(txq) != 0) {
 						if (nb_tx == 0)
 							return 0;
 						goto end_of_tx;
@@ -3459,7 +3409,7 @@ ice_tx_done_cleanup_full(struct ci_tx_queue *txq,
 	tx_last = txq->tx_tail;
 	tx_id  = swr_ring[tx_last].next_id;
 
-	if (txq->nb_tx_free == 0 && ice_xmit_cleanup(txq))
+	if (txq->nb_tx_free == 0 && ci_tx_xmit_cleanup(txq))
 		return 0;
 
 	nb_tx_to_clean = txq->nb_tx_free;
@@ -3493,7 +3443,7 @@ ice_tx_done_cleanup_full(struct ci_tx_queue *txq,
 			break;
 
 		if (pkt_cnt < free_cnt) {
-			if (ice_xmit_cleanup(txq))
+			if (ci_tx_xmit_cleanup(txq))
 				break;
 
 			nb_tx_to_clean = txq->nb_tx_free - nb_tx_free_last;
diff --git a/drivers/net/intel/idpf/idpf_common_rxtx.c b/drivers/net/intel/idpf/idpf_common_rxtx.c
index 51074bda3a..23666539ab 100644
--- a/drivers/net/intel/idpf/idpf_common_rxtx.c
+++ b/drivers/net/intel/idpf/idpf_common_rxtx.c
@@ -1326,46 +1326,6 @@ idpf_dp_singleq_recv_scatter_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
 	return nb_rx;
 }
 
-static inline int
-idpf_xmit_cleanup(struct ci_tx_queue *txq)
-{
-	uint16_t last_desc_cleaned = txq->last_desc_cleaned;
-	struct ci_tx_entry *sw_ring = txq->sw_ring;
-	uint16_t nb_tx_desc = txq->nb_tx_desc;
-	uint16_t desc_to_clean_to;
-	uint16_t nb_tx_to_clean;
-
-	volatile struct ci_tx_desc *txd = txq->ci_tx_ring;
-
-	desc_to_clean_to = (uint16_t)(last_desc_cleaned + txq->tx_rs_thresh);
-	if (desc_to_clean_to >= nb_tx_desc)
-		desc_to_clean_to = (uint16_t)(desc_to_clean_to - nb_tx_desc);
-
-	desc_to_clean_to = sw_ring[desc_to_clean_to].last_id;
-	if ((txd[desc_to_clean_to].cmd_type_offset_bsz &
-	     rte_cpu_to_le_64(IDPF_TXD_QW1_DTYPE_M)) !=
-	    rte_cpu_to_le_64(IDPF_TX_DESC_DTYPE_DESC_DONE)) {
-		TX_LOG(DEBUG, "TX descriptor %4u is not done "
-		       "(port=%d queue=%d)", desc_to_clean_to,
-		       txq->port_id, txq->queue_id);
-		return -1;
-	}
-
-	if (last_desc_cleaned > desc_to_clean_to)
-		nb_tx_to_clean = (uint16_t)((nb_tx_desc - last_desc_cleaned) +
-					    desc_to_clean_to);
-	else
-		nb_tx_to_clean = (uint16_t)(desc_to_clean_to -
-					    last_desc_cleaned);
-
-	txd[desc_to_clean_to].cmd_type_offset_bsz = 0;
-
-	txq->last_desc_cleaned = desc_to_clean_to;
-	txq->nb_tx_free = (uint16_t)(txq->nb_tx_free + nb_tx_to_clean);
-
-	return 0;
-}
-
 /* TX function */
 RTE_EXPORT_INTERNAL_SYMBOL(idpf_dp_singleq_xmit_pkts)
 uint16_t
@@ -1404,7 +1364,7 @@ idpf_dp_singleq_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 
 	/* Check if the descriptor ring needs to be cleaned. */
 	if (txq->nb_tx_free < txq->tx_free_thresh)
-		(void)idpf_xmit_cleanup(txq);
+		(void)ci_tx_xmit_cleanup(txq);
 
 	for (nb_tx = 0; nb_tx < nb_pkts; nb_tx++) {
 		td_cmd = 0;
@@ -1437,14 +1397,14 @@ idpf_dp_singleq_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 		       txq->port_id, txq->queue_id, tx_id, tx_last);
 
 		if (nb_used > txq->nb_tx_free) {
-			if (idpf_xmit_cleanup(txq) != 0) {
+			if (ci_tx_xmit_cleanup(txq) != 0) {
 				if (nb_tx == 0)
 					return 0;
 				goto end_of_tx;
 			}
 			if (unlikely(nb_used > txq->tx_rs_thresh)) {
 				while (nb_used > txq->nb_tx_free) {
-					if (idpf_xmit_cleanup(txq) != 0) {
+					if (ci_tx_xmit_cleanup(txq) != 0) {
 						if (nb_tx == 0)
 							return 0;
 						goto end_of_tx;
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [PATCH v5 05/35] net/intel: consolidate definitions for Tx desc fields
  2026-02-11 18:12 ` [PATCH v5 00/35] combine multiple Intel scalar Tx paths Bruce Richardson
                     ` (3 preceding siblings ...)
  2026-02-11 18:12   ` [PATCH v5 04/35] net/intel: create common post-Tx cleanup function Bruce Richardson
@ 2026-02-11 18:12   ` Bruce Richardson
  2026-02-11 18:12   ` [PATCH v5 06/35] net/intel: add common fn to calculate needed descriptors Bruce Richardson
                     ` (30 subsequent siblings)
  35 siblings, 0 replies; 274+ messages in thread
From: Bruce Richardson @ 2026-02-11 18:12 UTC (permalink / raw)
  To: dev
  Cc: Bruce Richardson, Anatoly Burakov, Vladimir Medvedkin,
	Jingjing Wu, Praveen Shetty

The offsets of the various fields within the Tx descriptors are common
for i40e, iavf, ice and idpf, so put a single set of defines in tx.h and
use those throughout all drivers.

NOTE: there was a small difference in the CMD field mask between
drivers, depending on whether or not reserved fields were included. This
difference can be ignored, since those bits are unused in the drivers
for which they are reserved.

Similarly, the various flag fields, such as End-of-packet (EOP) and
Report-status (RS), are the same across drivers, as are the offload
definitions, so consolidate those too.

The original definitions are part of the base code and are therefore
left in place, but they are now unused.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 drivers/net/intel/common/tx.h                 |  60 ++++++++
 drivers/net/intel/common/tx_scalar.h          |   6 +-
 drivers/net/intel/i40e/i40e_fdir.c            |  24 +--
 drivers/net/intel/i40e/i40e_rxtx.c            |  92 ++++++------
 drivers/net/intel/i40e/i40e_rxtx.h            |  17 +--
 .../net/intel/i40e/i40e_rxtx_vec_altivec.c    |  11 +-
 drivers/net/intel/i40e/i40e_rxtx_vec_avx2.c   |  22 ++-
 drivers/net/intel/i40e/i40e_rxtx_vec_avx512.c |  38 ++---
 drivers/net/intel/i40e/i40e_rxtx_vec_common.h |   4 +-
 drivers/net/intel/i40e/i40e_rxtx_vec_neon.c   |  11 +-
 drivers/net/intel/iavf/iavf_rxtx.c            |  68 +++++----
 drivers/net/intel/iavf/iavf_rxtx.h            |  20 +--
 drivers/net/intel/iavf/iavf_rxtx_vec_avx2.c   |  41 ++----
 drivers/net/intel/iavf/iavf_rxtx_vec_avx512.c |  80 ++++------
 drivers/net/intel/iavf/iavf_rxtx_vec_common.h |  34 ++---
 drivers/net/intel/ice/ice_dcf_ethdev.c        |   2 +-
 drivers/net/intel/ice/ice_rxtx.c              | 137 ++++++++----------
 drivers/net/intel/ice/ice_rxtx.h              |  15 +-
 drivers/net/intel/ice/ice_rxtx_vec_avx2.c     |  41 ++----
 drivers/net/intel/ice/ice_rxtx_vec_avx512.c   |  39 ++---
 drivers/net/intel/ice/ice_rxtx_vec_common.h   |  41 +++---
 drivers/net/intel/idpf/idpf_common_rxtx.c     |  22 +--
 drivers/net/intel/idpf/idpf_common_rxtx.h     |  12 --
 .../net/intel/idpf/idpf_common_rxtx_avx2.c    |  41 ++----
 .../net/intel/idpf/idpf_common_rxtx_avx512.c  |  41 ++----
 drivers/net/intel/idpf/idpf_rxtx_vec_common.h |   4 +-
 26 files changed, 409 insertions(+), 514 deletions(-)

diff --git a/drivers/net/intel/common/tx.h b/drivers/net/intel/common/tx.h
index 558c861df0..091f220f1c 100644
--- a/drivers/net/intel/common/tx.h
+++ b/drivers/net/intel/common/tx.h
@@ -10,6 +10,66 @@
 #include <rte_ethdev.h>
 #include <rte_vect.h>
 
+/* Common TX Descriptor QW1 Field Definitions */
+#define CI_TXD_QW1_DTYPE_S      0
+#define CI_TXD_QW1_DTYPE_M      (0xFUL << CI_TXD_QW1_DTYPE_S)
+#define CI_TXD_QW1_CMD_S        4
+#define CI_TXD_QW1_CMD_M        (0xFFFUL << CI_TXD_QW1_CMD_S)
+#define CI_TXD_QW1_OFFSET_S     16
+#define CI_TXD_QW1_OFFSET_M     (0x3FFFFULL << CI_TXD_QW1_OFFSET_S)
+#define CI_TXD_QW1_TX_BUF_SZ_S  34
+#define CI_TXD_QW1_TX_BUF_SZ_M  (0x3FFFULL << CI_TXD_QW1_TX_BUF_SZ_S)
+#define CI_TXD_QW1_L2TAG1_S     48
+#define CI_TXD_QW1_L2TAG1_M     (0xFFFFULL << CI_TXD_QW1_L2TAG1_S)
+
+/* Common Descriptor Types */
+#define CI_TX_DESC_DTYPE_DATA           0x0
+#define CI_TX_DESC_DTYPE_CTX            0x1
+#define CI_TX_DESC_DTYPE_DESC_DONE      0xF
+
+/* Common TX Descriptor Command Flags */
+#define CI_TX_DESC_CMD_EOP              0x0001
+#define CI_TX_DESC_CMD_RS               0x0002
+#define CI_TX_DESC_CMD_ICRC             0x0004
+#define CI_TX_DESC_CMD_IL2TAG1          0x0008
+#define CI_TX_DESC_CMD_DUMMY            0x0010
+#define CI_TX_DESC_CMD_IIPT_IPV6        0x0020
+#define CI_TX_DESC_CMD_IIPT_IPV4        0x0040
+#define CI_TX_DESC_CMD_IIPT_IPV4_CSUM   0x0060
+#define CI_TX_DESC_CMD_L4T_EOFT_TCP     0x0100
+#define CI_TX_DESC_CMD_L4T_EOFT_SCTP    0x0200
+#define CI_TX_DESC_CMD_L4T_EOFT_UDP     0x0300
+
+/* Common TX Context Descriptor Commands */
+#define CI_TX_CTX_DESC_TSO              0x01
+#define CI_TX_CTX_DESC_TSYN             0x02
+#define CI_TX_CTX_DESC_IL2TAG2          0x04
+
+/* Common TX Descriptor Length Field Shifts */
+#define CI_TX_DESC_LEN_MACLEN_S         0  /* 7 BITS */
+#define CI_TX_DESC_LEN_IPLEN_S          7  /* 7 BITS */
+#define CI_TX_DESC_LEN_L4_LEN_S         14 /* 4 BITS */
+
+/* Common maximum data per TX descriptor */
+#define CI_MAX_DATA_PER_TXD     (CI_TXD_QW1_TX_BUF_SZ_M >> CI_TXD_QW1_TX_BUF_SZ_S)
+
+/**
+ * Common TX offload union for Intel drivers.
+ * Supports both basic offloads (l2_len, l3_len, l4_len, tso_segsz) and
+ * extended offloads (outer_l2_len, outer_l3_len) for tunneling support.
+ */
+union ci_tx_offload {
+	uint64_t data;
+	struct {
+		uint64_t l2_len:7;        /**< L2 (MAC) Header Length. */
+		uint64_t l3_len:9;        /**< L3 (IP) Header Length. */
+		uint64_t l4_len:8;        /**< L4 Header Length. */
+		uint64_t tso_segsz:16;    /**< TCP TSO segment size */
+		uint64_t outer_l2_len:8;  /**< outer L2 Header Length */
+		uint64_t outer_l3_len:16; /**< outer L3 Header Length */
+	};
+};
+
 /*
  * Structure of a 16-byte Tx descriptor common across i40e, ice, iavf and idpf drivers
  */
diff --git a/drivers/net/intel/common/tx_scalar.h b/drivers/net/intel/common/tx_scalar.h
index 181629d856..6f2024273b 100644
--- a/drivers/net/intel/common/tx_scalar.h
+++ b/drivers/net/intel/common/tx_scalar.h
@@ -33,10 +33,10 @@ ci_tx_xmit_cleanup(struct ci_tx_queue *txq)
 	if (desc_to_clean_to >= nb_tx_desc)
 		desc_to_clean_to = (uint16_t)(desc_to_clean_to - nb_tx_desc);
 
-	/* Check if descriptor is done - all drivers use 0xF as done value in bits 3:0 */
+	/* Check if descriptor is done */
 	desc_to_clean_to = sw_ring[desc_to_clean_to].last_id;
-	if ((txd[desc_to_clean_to].cmd_type_offset_bsz & rte_cpu_to_le_64(0x0FUL)) !=
-			rte_cpu_to_le_64(0x0FUL))
+	if ((txd[desc_to_clean_to].cmd_type_offset_bsz & rte_cpu_to_le_64(CI_TXD_QW1_DTYPE_M)) !=
+			rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE))
 		return -1;
 
 	/* Figure out how many descriptors will be cleaned */
diff --git a/drivers/net/intel/i40e/i40e_fdir.c b/drivers/net/intel/i40e/i40e_fdir.c
index 8a01aec0e2..3b099d5a9e 100644
--- a/drivers/net/intel/i40e/i40e_fdir.c
+++ b/drivers/net/intel/i40e/i40e_fdir.c
@@ -916,11 +916,11 @@ i40e_build_ctob(uint32_t td_cmd,
 		unsigned int size,
 		uint32_t td_tag)
 {
-	return rte_cpu_to_le_64(I40E_TX_DESC_DTYPE_DATA |
-			((uint64_t)td_cmd  << I40E_TXD_QW1_CMD_SHIFT) |
-			((uint64_t)td_offset << I40E_TXD_QW1_OFFSET_SHIFT) |
-			((uint64_t)size  << I40E_TXD_QW1_TX_BUF_SZ_SHIFT) |
-			((uint64_t)td_tag  << I40E_TXD_QW1_L2TAG1_SHIFT));
+	return rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DATA |
+			((uint64_t)td_cmd  << CI_TXD_QW1_CMD_S) |
+			((uint64_t)td_offset << CI_TXD_QW1_OFFSET_S) |
+			((uint64_t)size  << CI_TXD_QW1_TX_BUF_SZ_S) |
+			((uint64_t)td_tag  << CI_TXD_QW1_L2TAG1_S));
 }
 
 /*
@@ -1384,8 +1384,8 @@ i40e_find_available_buffer(struct rte_eth_dev *dev)
 
 		do {
 			if ((tmp_txdp->cmd_type_offset_bsz &
-					rte_cpu_to_le_64(I40E_TXD_QW1_DTYPE_MASK)) ==
-					rte_cpu_to_le_64(I40E_TX_DESC_DTYPE_DESC_DONE))
+					rte_cpu_to_le_64(CI_TXD_QW1_DTYPE_M)) ==
+					rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE))
 				fdir_info->txq_available_buf_count++;
 			else
 				break;
@@ -1710,9 +1710,9 @@ i40e_flow_fdir_filter_programming(struct i40e_pf *pf,
 	txdp = &txq->ci_tx_ring[txq->tx_tail + 1];
 	txdp->buffer_addr = rte_cpu_to_le_64(pf->fdir.dma_addr[txq->tx_tail >> 1]);
 
-	td_cmd = I40E_TX_DESC_CMD_EOP |
-		 I40E_TX_DESC_CMD_RS  |
-		 I40E_TX_DESC_CMD_DUMMY;
+	td_cmd = CI_TX_DESC_CMD_EOP |
+		 CI_TX_DESC_CMD_RS  |
+		 CI_TX_DESC_CMD_DUMMY;
 
 	txdp->cmd_type_offset_bsz =
 		i40e_build_ctob(td_cmd, 0, I40E_FDIR_PKT_LEN, 0);
@@ -1731,8 +1731,8 @@ i40e_flow_fdir_filter_programming(struct i40e_pf *pf,
 	if (wait_status) {
 		for (i = 0; i < I40E_FDIR_MAX_WAIT_US; i++) {
 			if ((txdp->cmd_type_offset_bsz &
-					rte_cpu_to_le_64(I40E_TXD_QW1_DTYPE_MASK)) ==
-					rte_cpu_to_le_64(I40E_TX_DESC_DTYPE_DESC_DONE))
+					rte_cpu_to_le_64(CI_TXD_QW1_DTYPE_M)) ==
+					rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE))
 				break;
 			rte_delay_us(1);
 		}
diff --git a/drivers/net/intel/i40e/i40e_rxtx.c b/drivers/net/intel/i40e/i40e_rxtx.c
index 2760e76e99..f96c5c7f1e 100644
--- a/drivers/net/intel/i40e/i40e_rxtx.c
+++ b/drivers/net/intel/i40e/i40e_rxtx.c
@@ -45,7 +45,7 @@
 /* Base address of the HW descriptor ring should be 128B aligned. */
 #define I40E_RING_BASE_ALIGN	128
 
-#define I40E_TXD_CMD (I40E_TX_DESC_CMD_EOP | I40E_TX_DESC_CMD_RS)
+#define I40E_TXD_CMD (CI_TX_DESC_CMD_EOP | CI_TX_DESC_CMD_RS)
 
 #ifdef RTE_LIBRTE_IEEE1588
 #define I40E_TX_IEEE1588_TMST RTE_MBUF_F_TX_IEEE1588_TMST
@@ -260,7 +260,7 @@ i40e_rxd_build_fdir(volatile union ci_rx_desc *rxdp, struct rte_mbuf *mb)
 
 static inline void
 i40e_parse_tunneling_params(uint64_t ol_flags,
-			    union i40e_tx_offload tx_offload,
+			    union ci_tx_offload tx_offload,
 			    uint32_t *cd_tunneling)
 {
 	/* EIPT: External (outer) IP header type */
@@ -319,51 +319,51 @@ static inline void
 i40e_txd_enable_checksum(uint64_t ol_flags,
 			uint32_t *td_cmd,
 			uint32_t *td_offset,
-			union i40e_tx_offload tx_offload)
+			union ci_tx_offload tx_offload)
 {
 	/* Set MACLEN */
 	if (!(ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK))
 		*td_offset |= (tx_offload.l2_len >> 1)
-			<< I40E_TX_DESC_LENGTH_MACLEN_SHIFT;
+			<< CI_TX_DESC_LEN_MACLEN_S;
 
 	/* Enable L3 checksum offloads */
 	if (ol_flags & RTE_MBUF_F_TX_IP_CKSUM) {
-		*td_cmd |= I40E_TX_DESC_CMD_IIPT_IPV4_CSUM;
+		*td_cmd |= CI_TX_DESC_CMD_IIPT_IPV4_CSUM;
 		*td_offset |= (tx_offload.l3_len >> 2)
-				<< I40E_TX_DESC_LENGTH_IPLEN_SHIFT;
+				<< CI_TX_DESC_LEN_IPLEN_S;
 	} else if (ol_flags & RTE_MBUF_F_TX_IPV4) {
-		*td_cmd |= I40E_TX_DESC_CMD_IIPT_IPV4;
+		*td_cmd |= CI_TX_DESC_CMD_IIPT_IPV4;
 		*td_offset |= (tx_offload.l3_len >> 2)
-				<< I40E_TX_DESC_LENGTH_IPLEN_SHIFT;
+				<< CI_TX_DESC_LEN_IPLEN_S;
 	} else if (ol_flags & RTE_MBUF_F_TX_IPV6) {
-		*td_cmd |= I40E_TX_DESC_CMD_IIPT_IPV6;
+		*td_cmd |= CI_TX_DESC_CMD_IIPT_IPV6;
 		*td_offset |= (tx_offload.l3_len >> 2)
-				<< I40E_TX_DESC_LENGTH_IPLEN_SHIFT;
+				<< CI_TX_DESC_LEN_IPLEN_S;
 	}
 
 	if (ol_flags & RTE_MBUF_F_TX_TCP_SEG) {
-		*td_cmd |= I40E_TX_DESC_CMD_L4T_EOFT_TCP;
+		*td_cmd |= CI_TX_DESC_CMD_L4T_EOFT_TCP;
 		*td_offset |= (tx_offload.l4_len >> 2)
-			<< I40E_TX_DESC_LENGTH_L4_FC_LEN_SHIFT;
+			<< CI_TX_DESC_LEN_L4_LEN_S;
 		return;
 	}
 
 	/* Enable L4 checksum offloads */
 	switch (ol_flags & RTE_MBUF_F_TX_L4_MASK) {
 	case RTE_MBUF_F_TX_TCP_CKSUM:
-		*td_cmd |= I40E_TX_DESC_CMD_L4T_EOFT_TCP;
+		*td_cmd |= CI_TX_DESC_CMD_L4T_EOFT_TCP;
 		*td_offset |= (sizeof(struct rte_tcp_hdr) >> 2) <<
-				I40E_TX_DESC_LENGTH_L4_FC_LEN_SHIFT;
+				CI_TX_DESC_LEN_L4_LEN_S;
 		break;
 	case RTE_MBUF_F_TX_SCTP_CKSUM:
-		*td_cmd |= I40E_TX_DESC_CMD_L4T_EOFT_SCTP;
+		*td_cmd |= CI_TX_DESC_CMD_L4T_EOFT_SCTP;
 		*td_offset |= (sizeof(struct rte_sctp_hdr) >> 2) <<
-				I40E_TX_DESC_LENGTH_L4_FC_LEN_SHIFT;
+				CI_TX_DESC_LEN_L4_LEN_S;
 		break;
 	case RTE_MBUF_F_TX_UDP_CKSUM:
-		*td_cmd |= I40E_TX_DESC_CMD_L4T_EOFT_UDP;
+		*td_cmd |= CI_TX_DESC_CMD_L4T_EOFT_UDP;
 		*td_offset |= (sizeof(struct rte_udp_hdr) >> 2) <<
-				I40E_TX_DESC_LENGTH_L4_FC_LEN_SHIFT;
+				CI_TX_DESC_LEN_L4_LEN_S;
 		break;
 	default:
 		break;
@@ -377,11 +377,11 @@ i40e_build_ctob(uint32_t td_cmd,
 		unsigned int size,
 		uint32_t td_tag)
 {
-	return rte_cpu_to_le_64(I40E_TX_DESC_DTYPE_DATA |
-			((uint64_t)td_cmd  << I40E_TXD_QW1_CMD_SHIFT) |
-			((uint64_t)td_offset << I40E_TXD_QW1_OFFSET_SHIFT) |
-			((uint64_t)size  << I40E_TXD_QW1_TX_BUF_SZ_SHIFT) |
-			((uint64_t)td_tag  << I40E_TXD_QW1_L2TAG1_SHIFT));
+	return rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DATA |
+			((uint64_t)td_cmd  << CI_TXD_QW1_CMD_S) |
+			((uint64_t)td_offset << CI_TXD_QW1_OFFSET_S) |
+			((uint64_t)size  << CI_TXD_QW1_TX_BUF_SZ_S) |
+			((uint64_t)td_tag  << CI_TXD_QW1_L2TAG1_S));
 }
 
 static inline int
@@ -1004,7 +1004,7 @@ i40e_calc_context_desc(uint64_t flags)
 
 /* set i40e TSO context descriptor */
 static inline uint64_t
-i40e_set_tso_ctx(struct rte_mbuf *mbuf, union i40e_tx_offload tx_offload)
+i40e_set_tso_ctx(struct rte_mbuf *mbuf, union ci_tx_offload tx_offload)
 {
 	uint64_t ctx_desc = 0;
 	uint32_t cd_cmd, hdr_len, cd_tso_len;
@@ -1029,9 +1029,6 @@ i40e_set_tso_ctx(struct rte_mbuf *mbuf, union i40e_tx_offload tx_offload)
 	return ctx_desc;
 }
 
-/* HW requires that Tx buffer size ranges from 1B up to (16K-1)B. */
-#define I40E_MAX_DATA_PER_TXD \
-	(I40E_TXD_QW1_TX_BUF_SZ_MASK >> I40E_TXD_QW1_TX_BUF_SZ_SHIFT)
 /* Calculate the number of TX descriptors needed for each pkt */
 static inline uint16_t
 i40e_calc_pkt_desc(struct rte_mbuf *tx_pkt)
@@ -1040,7 +1037,7 @@ i40e_calc_pkt_desc(struct rte_mbuf *tx_pkt)
 	uint16_t count = 0;
 
 	while (txd != NULL) {
-		count += DIV_ROUND_UP(txd->data_len, I40E_MAX_DATA_PER_TXD);
+		count += DIV_ROUND_UP(txd->data_len, CI_MAX_DATA_PER_TXD);
 		txd = txd->next;
 	}
 
@@ -1069,7 +1066,7 @@ i40e_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 	uint16_t tx_last;
 	uint16_t slen;
 	uint64_t buf_dma_addr;
-	union i40e_tx_offload tx_offload = {0};
+	union ci_tx_offload tx_offload = {0};
 
 	txq = tx_queue;
 	sw_ring = txq->sw_ring;
@@ -1138,18 +1135,18 @@ i40e_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 
 		/* Descriptor based VLAN insertion */
 		if (ol_flags & (RTE_MBUF_F_TX_VLAN | RTE_MBUF_F_TX_QINQ)) {
-			td_cmd |= I40E_TX_DESC_CMD_IL2TAG1;
+			td_cmd |= CI_TX_DESC_CMD_IL2TAG1;
 			td_tag = tx_pkt->vlan_tci;
 		}
 
 		/* Always enable CRC offload insertion */
-		td_cmd |= I40E_TX_DESC_CMD_ICRC;
+		td_cmd |= CI_TX_DESC_CMD_ICRC;
 
 		/* Fill in tunneling parameters if necessary */
 		cd_tunneling_params = 0;
 		if (ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK) {
 			td_offset |= (tx_offload.outer_l2_len >> 1)
-					<< I40E_TX_DESC_LENGTH_MACLEN_SHIFT;
+					<< CI_TX_DESC_LEN_MACLEN_S;
 			i40e_parse_tunneling_params(ol_flags, tx_offload,
 						    &cd_tunneling_params);
 		}
@@ -1229,16 +1226,16 @@ i40e_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 			buf_dma_addr = rte_mbuf_data_iova(m_seg);
 
 			while ((ol_flags & RTE_MBUF_F_TX_TCP_SEG) &&
-				unlikely(slen > I40E_MAX_DATA_PER_TXD)) {
+				unlikely(slen > CI_MAX_DATA_PER_TXD)) {
 				txd->buffer_addr =
 					rte_cpu_to_le_64(buf_dma_addr);
 				txd->cmd_type_offset_bsz =
 					i40e_build_ctob(td_cmd,
-					td_offset, I40E_MAX_DATA_PER_TXD,
+					td_offset, CI_MAX_DATA_PER_TXD,
 					td_tag);
 
-				buf_dma_addr += I40E_MAX_DATA_PER_TXD;
-				slen -= I40E_MAX_DATA_PER_TXD;
+				buf_dma_addr += CI_MAX_DATA_PER_TXD;
+				slen -= CI_MAX_DATA_PER_TXD;
 
 				txe->last_id = tx_last;
 				tx_id = txe->next_id;
@@ -1265,7 +1262,7 @@ i40e_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 		} while (m_seg != NULL);
 
 		/* The last packet data descriptor needs End Of Packet (EOP) */
-		td_cmd |= I40E_TX_DESC_CMD_EOP;
+		td_cmd |= CI_TX_DESC_CMD_EOP;
 		txq->nb_tx_used = (uint16_t)(txq->nb_tx_used + nb_used);
 		txq->nb_tx_free = (uint16_t)(txq->nb_tx_free - nb_used);
 
@@ -1275,15 +1272,14 @@ i40e_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 				   "%4u (port=%d queue=%d)",
 				   tx_last, txq->port_id, txq->queue_id);
 
-			td_cmd |= I40E_TX_DESC_CMD_RS;
+			td_cmd |= CI_TX_DESC_CMD_RS;
 
 			/* Update txq RS bit counters */
 			txq->nb_tx_used = 0;
 		}
 
 		txd->cmd_type_offset_bsz |=
-			rte_cpu_to_le_64(((uint64_t)td_cmd) <<
-					I40E_TXD_QW1_CMD_SHIFT);
+			rte_cpu_to_le_64(((uint64_t)td_cmd) << CI_TXD_QW1_CMD_S);
 	}
 
 end_of_tx:
@@ -1309,8 +1305,8 @@ i40e_tx_free_bufs(struct ci_tx_queue *txq)
 	const uint16_t m = tx_rs_thresh % I40E_TX_MAX_FREE_BUF_SZ;
 
 	if ((txq->ci_tx_ring[txq->tx_next_dd].cmd_type_offset_bsz &
-			rte_cpu_to_le_64(I40E_TXD_QW1_DTYPE_MASK)) !=
-			rte_cpu_to_le_64(I40E_TX_DESC_DTYPE_DESC_DONE))
+			rte_cpu_to_le_64(CI_TXD_QW1_DTYPE_M)) !=
+			rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE))
 		return 0;
 
 	txep = &txq->sw_ring[txq->tx_next_dd - (tx_rs_thresh - 1)];
@@ -1441,8 +1437,7 @@ tx_xmit_pkts(struct ci_tx_queue *txq,
 		n = (uint16_t)(txq->nb_tx_desc - txq->tx_tail);
 		i40e_tx_fill_hw_ring(txq, tx_pkts, n);
 		txr[txq->tx_next_rs].cmd_type_offset_bsz |=
-			rte_cpu_to_le_64(((uint64_t)I40E_TX_DESC_CMD_RS) <<
-						I40E_TXD_QW1_CMD_SHIFT);
+			rte_cpu_to_le_64(((uint64_t)CI_TX_DESC_CMD_RS) << CI_TXD_QW1_CMD_S);
 		txq->tx_next_rs = (uint16_t)(txq->tx_rs_thresh - 1);
 		txq->tx_tail = 0;
 	}
@@ -1454,8 +1449,7 @@ tx_xmit_pkts(struct ci_tx_queue *txq,
 	/* Determine if RS bit needs to be set */
 	if (txq->tx_tail > txq->tx_next_rs) {
 		txr[txq->tx_next_rs].cmd_type_offset_bsz |=
-			rte_cpu_to_le_64(((uint64_t)I40E_TX_DESC_CMD_RS) <<
-						I40E_TXD_QW1_CMD_SHIFT);
+			rte_cpu_to_le_64(((uint64_t)CI_TX_DESC_CMD_RS) << CI_TXD_QW1_CMD_S);
 		txq->tx_next_rs =
 			(uint16_t)(txq->tx_next_rs + txq->tx_rs_thresh);
 		if (txq->tx_next_rs >= txq->nb_tx_desc)
@@ -2383,9 +2377,9 @@ i40e_dev_tx_descriptor_status(void *tx_queue, uint16_t offset)
 	}
 
 	status = &txq->ci_tx_ring[desc].cmd_type_offset_bsz;
-	mask = rte_le_to_cpu_64(I40E_TXD_QW1_DTYPE_MASK);
+	mask = rte_le_to_cpu_64(CI_TXD_QW1_DTYPE_M);
 	expect = rte_cpu_to_le_64(
-		I40E_TX_DESC_DTYPE_DESC_DONE << I40E_TXD_QW1_DTYPE_SHIFT);
+		CI_TX_DESC_DTYPE_DESC_DONE << CI_TXD_QW1_DTYPE_S);
 	if ((*status & mask) == expect)
 		return RTE_ETH_TX_DESC_DONE;
 
@@ -2883,7 +2877,7 @@ i40e_reset_tx_queue(struct ci_tx_queue *txq)
 		volatile struct ci_tx_desc *txd = &txq->ci_tx_ring[i];
 
 		txd->cmd_type_offset_bsz =
-			rte_cpu_to_le_64(I40E_TX_DESC_DTYPE_DESC_DONE);
+			rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE);
 		txe[i].mbuf =  NULL;
 		txe[i].last_id = i;
 		txe[prev].next_id = i;
diff --git a/drivers/net/intel/i40e/i40e_rxtx.h b/drivers/net/intel/i40e/i40e_rxtx.h
index ed173d8f17..307ffa3049 100644
--- a/drivers/net/intel/i40e/i40e_rxtx.h
+++ b/drivers/net/intel/i40e/i40e_rxtx.h
@@ -47,8 +47,8 @@
 #define I40E_RX_DESC_EXT_STATUS_FLEXBL_MASK   0x03
 #define I40E_RX_DESC_EXT_STATUS_FLEXBL_FLEX   0x01
 
-#define I40E_TD_CMD (I40E_TX_DESC_CMD_ICRC |\
-		     I40E_TX_DESC_CMD_EOP)
+#define I40E_TD_CMD (CI_TX_DESC_CMD_ICRC |\
+		     CI_TX_DESC_CMD_EOP)
 
 enum i40e_header_split_mode {
 	i40e_header_split_none = 0,
@@ -110,19 +110,6 @@ enum i40e_header_split_mode {
 
 #define I40E_TX_VECTOR_OFFLOADS RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE
 
-/** Offload features */
-union i40e_tx_offload {
-	uint64_t data;
-	struct {
-		uint64_t l2_len:7; /**< L2 (MAC) Header Length. */
-		uint64_t l3_len:9; /**< L3 (IP) Header Length. */
-		uint64_t l4_len:8; /**< L4 Header Length. */
-		uint64_t tso_segsz:16; /**< TCP TSO segment size */
-		uint64_t outer_l2_len:8; /**< outer L2 Header Length */
-		uint64_t outer_l3_len:16; /**< outer L3 Header Length */
-	};
-};
-
 int i40e_dev_rx_queue_start(struct rte_eth_dev *dev, uint16_t rx_queue_id);
 int i40e_dev_rx_queue_stop(struct rte_eth_dev *dev, uint16_t rx_queue_id);
 int i40e_dev_tx_queue_start(struct rte_eth_dev *dev, uint16_t tx_queue_id);
diff --git a/drivers/net/intel/i40e/i40e_rxtx_vec_altivec.c b/drivers/net/intel/i40e/i40e_rxtx_vec_altivec.c
index 81e9e2bc0b..4c36748d94 100644
--- a/drivers/net/intel/i40e/i40e_rxtx_vec_altivec.c
+++ b/drivers/net/intel/i40e/i40e_rxtx_vec_altivec.c
@@ -449,9 +449,9 @@ static inline void
 vtx1(volatile struct ci_tx_desc *txdp,
 	struct rte_mbuf *pkt, uint64_t flags)
 {
-	uint64_t high_qw = (I40E_TX_DESC_DTYPE_DATA |
-		((uint64_t)flags  << I40E_TXD_QW1_CMD_SHIFT) |
-		((uint64_t)pkt->data_len << I40E_TXD_QW1_TX_BUF_SZ_SHIFT));
+	uint64_t high_qw = (CI_TX_DESC_DTYPE_DATA |
+		((uint64_t)flags << CI_TXD_QW1_CMD_S) |
+		((uint64_t)pkt->data_len << CI_TXD_QW1_TX_BUF_SZ_S));
 
 	__vector unsigned long descriptor = (__vector unsigned long){
 		pkt->buf_iova + pkt->data_off, high_qw};
@@ -477,7 +477,7 @@ i40e_xmit_fixed_burst_vec(void *tx_queue, struct rte_mbuf **tx_pkts,
 	struct ci_tx_entry_vec *txep;
 	uint16_t n, nb_commit, tx_id;
 	uint64_t flags = I40E_TD_CMD;
-	uint64_t rs = I40E_TX_DESC_CMD_RS | I40E_TD_CMD;
+	uint64_t rs = CI_TX_DESC_CMD_RS | I40E_TD_CMD;
 	int i;
 
 	if (txq->nb_tx_free < txq->tx_free_thresh)
@@ -520,8 +520,7 @@ i40e_xmit_fixed_burst_vec(void *tx_queue, struct rte_mbuf **tx_pkts,
 	tx_id = (uint16_t)(tx_id + nb_commit);
 	if (tx_id > txq->tx_next_rs) {
 		txq->ci_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
-			rte_cpu_to_le_64(((uint64_t)I40E_TX_DESC_CMD_RS) <<
-						I40E_TXD_QW1_CMD_SHIFT);
+			rte_cpu_to_le_64(((uint64_t)CI_TX_DESC_CMD_RS) << CI_TXD_QW1_CMD_S);
 		txq->tx_next_rs =
 			(uint16_t)(txq->tx_next_rs + txq->tx_rs_thresh);
 	}
diff --git a/drivers/net/intel/i40e/i40e_rxtx_vec_avx2.c b/drivers/net/intel/i40e/i40e_rxtx_vec_avx2.c
index f054bd41bf..502a1842c6 100644
--- a/drivers/net/intel/i40e/i40e_rxtx_vec_avx2.c
+++ b/drivers/net/intel/i40e/i40e_rxtx_vec_avx2.c
@@ -684,9 +684,9 @@ static inline void
 vtx1(volatile struct ci_tx_desc *txdp,
 		struct rte_mbuf *pkt, uint64_t flags)
 {
-	uint64_t high_qw = (I40E_TX_DESC_DTYPE_DATA |
-			((uint64_t)flags  << I40E_TXD_QW1_CMD_SHIFT) |
-			((uint64_t)pkt->data_len << I40E_TXD_QW1_TX_BUF_SZ_SHIFT));
+	uint64_t high_qw = (CI_TX_DESC_DTYPE_DATA |
+			((uint64_t)flags << CI_TXD_QW1_CMD_S) |
+			((uint64_t)pkt->data_len << CI_TXD_QW1_TX_BUF_SZ_S));
 
 	__m128i descriptor = _mm_set_epi64x(high_qw,
 				pkt->buf_iova + pkt->data_off);
@@ -697,8 +697,7 @@ static inline void
 vtx(volatile struct ci_tx_desc *txdp,
 		struct rte_mbuf **pkt, uint16_t nb_pkts,  uint64_t flags)
 {
-	const uint64_t hi_qw_tmpl = (I40E_TX_DESC_DTYPE_DATA |
-			((uint64_t)flags  << I40E_TXD_QW1_CMD_SHIFT));
+	const uint64_t hi_qw_tmpl = (CI_TX_DESC_DTYPE_DATA | (flags << CI_TXD_QW1_CMD_S));
 
 	/* if unaligned on 32-bit boundary, do one to align */
 	if (((uintptr_t)txdp & 0x1F) != 0 && nb_pkts != 0) {
@@ -709,13 +708,13 @@ vtx(volatile struct ci_tx_desc *txdp,
 	/* do two at a time while possible, in bursts */
 	for (; nb_pkts > 3; txdp += 4, pkt += 4, nb_pkts -= 4) {
 		uint64_t hi_qw3 = hi_qw_tmpl |
-				((uint64_t)pkt[3]->data_len << I40E_TXD_QW1_TX_BUF_SZ_SHIFT);
+				((uint64_t)pkt[3]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
 		uint64_t hi_qw2 = hi_qw_tmpl |
-				((uint64_t)pkt[2]->data_len << I40E_TXD_QW1_TX_BUF_SZ_SHIFT);
+				((uint64_t)pkt[2]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
 		uint64_t hi_qw1 = hi_qw_tmpl |
-				((uint64_t)pkt[1]->data_len << I40E_TXD_QW1_TX_BUF_SZ_SHIFT);
+				((uint64_t)pkt[1]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
 		uint64_t hi_qw0 = hi_qw_tmpl |
-				((uint64_t)pkt[0]->data_len << I40E_TXD_QW1_TX_BUF_SZ_SHIFT);
+				((uint64_t)pkt[0]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
 
 		__m256i desc2_3 = _mm256_set_epi64x(
 				hi_qw3, pkt[3]->buf_iova + pkt[3]->data_off,
@@ -743,7 +742,7 @@ i40e_xmit_fixed_burst_vec_avx2(void *tx_queue, struct rte_mbuf **tx_pkts,
 	struct ci_tx_entry_vec *txep;
 	uint16_t n, nb_commit, tx_id;
 	uint64_t flags = I40E_TD_CMD;
-	uint64_t rs = I40E_TX_DESC_CMD_RS | I40E_TD_CMD;
+	uint64_t rs = CI_TX_DESC_CMD_RS | I40E_TD_CMD;
 
 	if (txq->nb_tx_free < txq->tx_free_thresh)
 		ci_tx_free_bufs_vec(txq, i40e_tx_desc_done, false);
@@ -785,8 +784,7 @@ i40e_xmit_fixed_burst_vec_avx2(void *tx_queue, struct rte_mbuf **tx_pkts,
 	tx_id = (uint16_t)(tx_id + nb_commit);
 	if (tx_id > txq->tx_next_rs) {
 		txq->ci_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
-			rte_cpu_to_le_64(((uint64_t)I40E_TX_DESC_CMD_RS) <<
-						I40E_TXD_QW1_CMD_SHIFT);
+			rte_cpu_to_le_64(((uint64_t)CI_TX_DESC_CMD_RS) << CI_TXD_QW1_CMD_S);
 		txq->tx_next_rs =
 			(uint16_t)(txq->tx_next_rs + txq->tx_rs_thresh);
 	}
diff --git a/drivers/net/intel/i40e/i40e_rxtx_vec_avx512.c b/drivers/net/intel/i40e/i40e_rxtx_vec_avx512.c
index 9a967faeee..d48ff9f51e 100644
--- a/drivers/net/intel/i40e/i40e_rxtx_vec_avx512.c
+++ b/drivers/net/intel/i40e/i40e_rxtx_vec_avx512.c
@@ -752,9 +752,9 @@ i40e_recv_scattered_pkts_vec_avx512(void *rx_queue,
 static inline void
 vtx1(volatile struct ci_tx_desc *txdp, struct rte_mbuf *pkt, uint64_t flags)
 {
-	uint64_t high_qw = (I40E_TX_DESC_DTYPE_DATA |
-		((uint64_t)flags  << I40E_TXD_QW1_CMD_SHIFT) |
-		((uint64_t)pkt->data_len << I40E_TXD_QW1_TX_BUF_SZ_SHIFT));
+	uint64_t high_qw = (CI_TX_DESC_DTYPE_DATA |
+		((uint64_t)flags << CI_TXD_QW1_CMD_S) |
+		((uint64_t)pkt->data_len << CI_TXD_QW1_TX_BUF_SZ_S));
 
 	__m128i descriptor = _mm_set_epi64x(high_qw,
 				pkt->buf_iova + pkt->data_off);
@@ -765,26 +765,17 @@ static inline void
 vtx(volatile struct ci_tx_desc *txdp,
 	struct rte_mbuf **pkt, uint16_t nb_pkts,  uint64_t flags)
 {
-	const uint64_t hi_qw_tmpl = (I40E_TX_DESC_DTYPE_DATA |
-			((uint64_t)flags  << I40E_TXD_QW1_CMD_SHIFT));
+	const uint64_t hi_qw_tmpl = (CI_TX_DESC_DTYPE_DATA | (flags << CI_TXD_QW1_CMD_S));
 
 	for (; nb_pkts > 3; txdp += 4, pkt += 4, nb_pkts -= 4) {
-		uint64_t hi_qw3 =
-			hi_qw_tmpl |
-			((uint64_t)pkt[3]->data_len <<
-			 I40E_TXD_QW1_TX_BUF_SZ_SHIFT);
-		uint64_t hi_qw2 =
-			hi_qw_tmpl |
-			((uint64_t)pkt[2]->data_len <<
-			 I40E_TXD_QW1_TX_BUF_SZ_SHIFT);
-		uint64_t hi_qw1 =
-			hi_qw_tmpl |
-			((uint64_t)pkt[1]->data_len <<
-			 I40E_TXD_QW1_TX_BUF_SZ_SHIFT);
-		uint64_t hi_qw0 =
-			hi_qw_tmpl |
-			((uint64_t)pkt[0]->data_len <<
-			 I40E_TXD_QW1_TX_BUF_SZ_SHIFT);
+		uint64_t hi_qw3 = hi_qw_tmpl |
+			((uint64_t)pkt[3]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
+		uint64_t hi_qw2 = hi_qw_tmpl |
+			((uint64_t)pkt[2]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
+		uint64_t hi_qw1 = hi_qw_tmpl |
+			((uint64_t)pkt[1]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
+		uint64_t hi_qw0 = hi_qw_tmpl |
+			((uint64_t)pkt[0]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
 
 		__m512i desc0_3 =
 			_mm512_set_epi64
@@ -811,7 +802,7 @@ i40e_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pkts,
 	struct ci_tx_entry_vec *txep;
 	uint16_t n, nb_commit, tx_id;
 	uint64_t flags = I40E_TD_CMD;
-	uint64_t rs = I40E_TX_DESC_CMD_RS | I40E_TD_CMD;
+	uint64_t rs = CI_TX_DESC_CMD_RS | I40E_TD_CMD;
 
 	if (txq->nb_tx_free < txq->tx_free_thresh)
 		ci_tx_free_bufs_vec(txq, i40e_tx_desc_done, false);
@@ -854,8 +845,7 @@ i40e_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pkts,
 	tx_id = (uint16_t)(tx_id + nb_commit);
 	if (tx_id > txq->tx_next_rs) {
 		txq->ci_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
-			rte_cpu_to_le_64(((uint64_t)I40E_TX_DESC_CMD_RS) <<
-						I40E_TXD_QW1_CMD_SHIFT);
+			rte_cpu_to_le_64(((uint64_t)CI_TX_DESC_CMD_RS) << CI_TXD_QW1_CMD_S);
 		txq->tx_next_rs =
 			(uint16_t)(txq->tx_next_rs + txq->tx_rs_thresh);
 	}
diff --git a/drivers/net/intel/i40e/i40e_rxtx_vec_common.h b/drivers/net/intel/i40e/i40e_rxtx_vec_common.h
index 1fd7fc75bf..292a39501e 100644
--- a/drivers/net/intel/i40e/i40e_rxtx_vec_common.h
+++ b/drivers/net/intel/i40e/i40e_rxtx_vec_common.h
@@ -16,8 +16,8 @@ static inline int
 i40e_tx_desc_done(struct ci_tx_queue *txq, uint16_t idx)
 {
 	return (txq->ci_tx_ring[idx].cmd_type_offset_bsz &
-			rte_cpu_to_le_64(I40E_TXD_QW1_DTYPE_MASK)) ==
-				rte_cpu_to_le_64(I40E_TX_DESC_DTYPE_DESC_DONE);
+			rte_cpu_to_le_64(CI_TXD_QW1_DTYPE_M)) ==
+				rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE);
 }
 
 static inline void
diff --git a/drivers/net/intel/i40e/i40e_rxtx_vec_neon.c b/drivers/net/intel/i40e/i40e_rxtx_vec_neon.c
index 0b95152232..be4c64942e 100644
--- a/drivers/net/intel/i40e/i40e_rxtx_vec_neon.c
+++ b/drivers/net/intel/i40e/i40e_rxtx_vec_neon.c
@@ -600,9 +600,9 @@ static inline void
 vtx1(volatile struct ci_tx_desc *txdp,
 		struct rte_mbuf *pkt, uint64_t flags)
 {
-	uint64_t high_qw = (I40E_TX_DESC_DTYPE_DATA |
-			((uint64_t)flags  << I40E_TXD_QW1_CMD_SHIFT) |
-			((uint64_t)pkt->data_len << I40E_TXD_QW1_TX_BUF_SZ_SHIFT));
+	uint64_t high_qw = (CI_TX_DESC_DTYPE_DATA |
+			((uint64_t)flags << CI_TXD_QW1_CMD_S) |
+			((uint64_t)pkt->data_len << CI_TXD_QW1_TX_BUF_SZ_S));
 
 	uint64x2_t descriptor = {pkt->buf_iova + pkt->data_off, high_qw};
 	vst1q_u64(RTE_CAST_PTR(uint64_t *, txdp), descriptor);
@@ -627,7 +627,7 @@ i40e_xmit_fixed_burst_vec(void *__rte_restrict tx_queue,
 	struct ci_tx_entry_vec *txep;
 	uint16_t n, nb_commit, tx_id;
 	uint64_t flags = I40E_TD_CMD;
-	uint64_t rs = I40E_TX_DESC_CMD_RS | I40E_TD_CMD;
+	uint64_t rs = CI_TX_DESC_CMD_RS | I40E_TD_CMD;
 	int i;
 
 	if (txq->nb_tx_free < txq->tx_free_thresh)
@@ -669,8 +669,7 @@ i40e_xmit_fixed_burst_vec(void *__rte_restrict tx_queue,
 	tx_id = (uint16_t)(tx_id + nb_commit);
 	if (tx_id > txq->tx_next_rs) {
 		txq->ci_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
-			rte_cpu_to_le_64(((uint64_t)I40E_TX_DESC_CMD_RS) <<
-						I40E_TXD_QW1_CMD_SHIFT);
+			rte_cpu_to_le_64(((uint64_t)CI_TX_DESC_CMD_RS) << CI_TXD_QW1_CMD_S);
 		txq->tx_next_rs =
 			(uint16_t)(txq->tx_next_rs + txq->tx_rs_thresh);
 	}
diff --git a/drivers/net/intel/iavf/iavf_rxtx.c b/drivers/net/intel/iavf/iavf_rxtx.c
index 560abfc1ef..947b6c24d2 100644
--- a/drivers/net/intel/iavf/iavf_rxtx.c
+++ b/drivers/net/intel/iavf/iavf_rxtx.c
@@ -274,7 +274,7 @@ reset_tx_queue(struct ci_tx_queue *txq)
 	prev = (uint16_t)(txq->nb_tx_desc - 1);
 	for (i = 0; i < txq->nb_tx_desc; i++) {
 		txq->ci_tx_ring[i].cmd_type_offset_bsz =
-			rte_cpu_to_le_64(IAVF_TX_DESC_DTYPE_DESC_DONE);
+			rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE);
 		txe[i].mbuf =  NULL;
 		txe[i].last_id = i;
 		txe[prev].next_id = i;
@@ -2351,12 +2351,12 @@ iavf_fill_ctx_desc_cmd_field(volatile uint64_t *field, struct rte_mbuf *m,
 
 	/* TSO enabled */
 	if (m->ol_flags & (RTE_MBUF_F_TX_TCP_SEG | RTE_MBUF_F_TX_UDP_SEG))
-		cmd = IAVF_TX_CTX_DESC_TSO << IAVF_TXD_CTX_QW1_CMD_SHIFT;
+		cmd = CI_TX_CTX_DESC_TSO << IAVF_TXD_CTX_QW1_CMD_SHIFT;
 
 	if ((m->ol_flags & RTE_MBUF_F_TX_VLAN &&
 			vlan_flag & IAVF_TX_FLAGS_VLAN_TAG_LOC_L2TAG2) ||
 			m->ol_flags & RTE_MBUF_F_TX_QINQ) {
-		cmd |= IAVF_TX_CTX_DESC_IL2TAG2
+		cmd |= CI_TX_CTX_DESC_IL2TAG2
 			<< IAVF_TXD_CTX_QW1_CMD_SHIFT;
 	}
 
@@ -2577,20 +2577,20 @@ iavf_build_data_desc_cmd_offset_fields(volatile uint64_t *qw1,
 	uint64_t offset = 0;
 	uint64_t l2tag1 = 0;
 
-	*qw1 = IAVF_TX_DESC_DTYPE_DATA;
+	*qw1 = CI_TX_DESC_DTYPE_DATA;
 
-	command = (uint64_t)IAVF_TX_DESC_CMD_ICRC;
+	command = (uint64_t)CI_TX_DESC_CMD_ICRC;
 
 	/* Descriptor based VLAN insertion */
 	if ((vlan_flag & IAVF_TX_FLAGS_VLAN_TAG_LOC_L2TAG1) &&
 			m->ol_flags & RTE_MBUF_F_TX_VLAN) {
-		command |= (uint64_t)IAVF_TX_DESC_CMD_IL2TAG1;
+		command |= (uint64_t)CI_TX_DESC_CMD_IL2TAG1;
 		l2tag1 |= m->vlan_tci;
 	}
 
 	/* Descriptor based QinQ insertion. vlan_flag specifies outer tag location. */
 	if (m->ol_flags & RTE_MBUF_F_TX_QINQ) {
-		command |= (uint64_t)IAVF_TX_DESC_CMD_IL2TAG1;
+		command |= (uint64_t)CI_TX_DESC_CMD_IL2TAG1;
 		l2tag1 = vlan_flag & IAVF_TX_FLAGS_VLAN_TAG_LOC_L2TAG1 ? m->vlan_tci_outer :
 									m->vlan_tci;
 	}
@@ -2603,32 +2603,32 @@ iavf_build_data_desc_cmd_offset_fields(volatile uint64_t *qw1,
 	if (m->ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK &&
 			!(m->ol_flags & RTE_MBUF_F_TX_SEC_OFFLOAD))
 		offset |= (m->outer_l2_len >> 1)
-			<< IAVF_TX_DESC_LENGTH_MACLEN_SHIFT;
+			<< CI_TX_DESC_LEN_MACLEN_S;
 	else
 		offset |= (m->l2_len >> 1)
-			<< IAVF_TX_DESC_LENGTH_MACLEN_SHIFT;
+			<< CI_TX_DESC_LEN_MACLEN_S;
 
 	/* Enable L3 checksum offloading inner */
 	if (m->ol_flags & RTE_MBUF_F_TX_IP_CKSUM) {
 		if (m->ol_flags & RTE_MBUF_F_TX_IPV4) {
-			command |= IAVF_TX_DESC_CMD_IIPT_IPV4_CSUM;
-			offset |= (m->l3_len >> 2) << IAVF_TX_DESC_LENGTH_IPLEN_SHIFT;
+			command |= CI_TX_DESC_CMD_IIPT_IPV4_CSUM;
+			offset |= (m->l3_len >> 2) << CI_TX_DESC_LEN_IPLEN_S;
 		}
 	} else if (m->ol_flags & RTE_MBUF_F_TX_IPV4) {
-		command |= IAVF_TX_DESC_CMD_IIPT_IPV4;
-		offset |= (m->l3_len >> 2) << IAVF_TX_DESC_LENGTH_IPLEN_SHIFT;
+		command |= CI_TX_DESC_CMD_IIPT_IPV4;
+		offset |= (m->l3_len >> 2) << CI_TX_DESC_LEN_IPLEN_S;
 	} else if (m->ol_flags & RTE_MBUF_F_TX_IPV6) {
-		command |= IAVF_TX_DESC_CMD_IIPT_IPV6;
-		offset |= (m->l3_len >> 2) << IAVF_TX_DESC_LENGTH_IPLEN_SHIFT;
+		command |= CI_TX_DESC_CMD_IIPT_IPV6;
+		offset |= (m->l3_len >> 2) << CI_TX_DESC_LEN_IPLEN_S;
 	}
 
 	if (m->ol_flags & (RTE_MBUF_F_TX_TCP_SEG | RTE_MBUF_F_TX_UDP_SEG)) {
 		if (m->ol_flags & RTE_MBUF_F_TX_TCP_SEG)
-			command |= IAVF_TX_DESC_CMD_L4T_EOFT_TCP;
+			command |= CI_TX_DESC_CMD_L4T_EOFT_TCP;
 		else
-			command |= IAVF_TX_DESC_CMD_L4T_EOFT_UDP;
+			command |= CI_TX_DESC_CMD_L4T_EOFT_UDP;
 		offset |= (m->l4_len >> 2) <<
-			      IAVF_TX_DESC_LENGTH_L4_FC_LEN_SHIFT;
+			      CI_TX_DESC_LEN_L4_LEN_S;
 
 		*qw1 = rte_cpu_to_le_64((((uint64_t)command <<
 			IAVF_TXD_DATA_QW1_CMD_SHIFT) & IAVF_TXD_DATA_QW1_CMD_MASK) |
@@ -2642,19 +2642,19 @@ iavf_build_data_desc_cmd_offset_fields(volatile uint64_t *qw1,
 	/* Enable L4 checksum offloads */
 	switch (m->ol_flags & RTE_MBUF_F_TX_L4_MASK) {
 	case RTE_MBUF_F_TX_TCP_CKSUM:
-		command |= IAVF_TX_DESC_CMD_L4T_EOFT_TCP;
+		command |= CI_TX_DESC_CMD_L4T_EOFT_TCP;
 		offset |= (sizeof(struct rte_tcp_hdr) >> 2) <<
-				IAVF_TX_DESC_LENGTH_L4_FC_LEN_SHIFT;
+				CI_TX_DESC_LEN_L4_LEN_S;
 		break;
 	case RTE_MBUF_F_TX_SCTP_CKSUM:
-		command |= IAVF_TX_DESC_CMD_L4T_EOFT_SCTP;
+		command |= CI_TX_DESC_CMD_L4T_EOFT_SCTP;
 		offset |= (sizeof(struct rte_sctp_hdr) >> 2) <<
-				IAVF_TX_DESC_LENGTH_L4_FC_LEN_SHIFT;
+				CI_TX_DESC_LEN_L4_LEN_S;
 		break;
 	case RTE_MBUF_F_TX_UDP_CKSUM:
-		command |= IAVF_TX_DESC_CMD_L4T_EOFT_UDP;
+		command |= CI_TX_DESC_CMD_L4T_EOFT_UDP;
 		offset |= (sizeof(struct rte_udp_hdr) >> 2) <<
-				IAVF_TX_DESC_LENGTH_L4_FC_LEN_SHIFT;
+				CI_TX_DESC_LEN_L4_LEN_S;
 		break;
 	}
 
@@ -2674,8 +2674,7 @@ iavf_calc_pkt_desc(struct rte_mbuf *tx_pkt)
 	uint16_t count = 0;
 
 	while (txd != NULL) {
-		count += (txd->data_len + IAVF_MAX_DATA_PER_TXD - 1) /
-			IAVF_MAX_DATA_PER_TXD;
+		count += (txd->data_len + CI_MAX_DATA_PER_TXD - 1) / CI_MAX_DATA_PER_TXD;
 		txd = txd->next;
 	}
 
@@ -2881,14 +2880,14 @@ iavf_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 			buf_dma_addr = rte_mbuf_data_iova(mb_seg);
 			while ((mb_seg->ol_flags & (RTE_MBUF_F_TX_TCP_SEG |
 					RTE_MBUF_F_TX_UDP_SEG)) &&
-					unlikely(slen > IAVF_MAX_DATA_PER_TXD)) {
+					unlikely(slen > CI_MAX_DATA_PER_TXD)) {
 				iavf_fill_data_desc(ddesc, ddesc_template,
-					IAVF_MAX_DATA_PER_TXD, buf_dma_addr);
+					CI_MAX_DATA_PER_TXD, buf_dma_addr);
 
 				IAVF_DUMP_TX_DESC(txq, ddesc, desc_idx);
 
-				buf_dma_addr += IAVF_MAX_DATA_PER_TXD;
-				slen -= IAVF_MAX_DATA_PER_TXD;
+				buf_dma_addr += CI_MAX_DATA_PER_TXD;
+				slen -= CI_MAX_DATA_PER_TXD;
 
 				txe->last_id = desc_idx_last;
 				desc_idx = txe->next_id;
@@ -2909,7 +2908,7 @@ iavf_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 		} while (mb_seg);
 
 		/* The last packet data descriptor needs End Of Packet (EOP) */
-		ddesc_cmd = IAVF_TX_DESC_CMD_EOP;
+		ddesc_cmd = CI_TX_DESC_CMD_EOP;
 
 		txq->nb_tx_used = (uint16_t)(txq->nb_tx_used + nb_desc_required);
 		txq->nb_tx_free = (uint16_t)(txq->nb_tx_free - nb_desc_required);
@@ -2919,7 +2918,7 @@ iavf_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 				   "%4u (port=%d queue=%d)",
 				   desc_idx_last, txq->port_id, txq->queue_id);
 
-			ddesc_cmd |= IAVF_TX_DESC_CMD_RS;
+			ddesc_cmd |= CI_TX_DESC_CMD_RS;
 
 			/* Update txq RS bit counters */
 			txq->nb_tx_used = 0;
@@ -4423,9 +4422,8 @@ iavf_dev_tx_desc_status(void *tx_queue, uint16_t offset)
 	}
 
 	status = &txq->ci_tx_ring[desc].cmd_type_offset_bsz;
-	mask = rte_le_to_cpu_64(IAVF_TXD_QW1_DTYPE_MASK);
-	expect = rte_cpu_to_le_64(
-		 IAVF_TX_DESC_DTYPE_DESC_DONE << IAVF_TXD_QW1_DTYPE_SHIFT);
+	mask = rte_le_to_cpu_64(CI_TXD_QW1_DTYPE_M);
+	expect = rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE << CI_TXD_QW1_DTYPE_S);
 	if ((*status & mask) == expect)
 		return RTE_ETH_TX_DESC_DONE;
 
diff --git a/drivers/net/intel/iavf/iavf_rxtx.h b/drivers/net/intel/iavf/iavf_rxtx.h
index dd6d884fc1..395d97b4ee 100644
--- a/drivers/net/intel/iavf/iavf_rxtx.h
+++ b/drivers/net/intel/iavf/iavf_rxtx.h
@@ -162,10 +162,6 @@
 #define IAVF_TX_OFFLOAD_NOTSUP_MASK \
 		(RTE_MBUF_F_TX_OFFLOAD_MASK ^ IAVF_TX_OFFLOAD_MASK)
 
-/* HW requires that TX buffer size ranges from 1B up to (16K-1)B. */
-#define IAVF_MAX_DATA_PER_TXD \
-	(IAVF_TXD_QW1_TX_BUF_SZ_MASK >> IAVF_TXD_QW1_TX_BUF_SZ_SHIFT)
-
 #define IAVF_TX_LLDP_DYNFIELD "intel_pmd_dynfield_tx_lldp"
 #define IAVF_CHECK_TX_LLDP(m) \
 	((rte_pmd_iavf_tx_lldp_dynfield_offset > 0) && \
@@ -195,18 +191,6 @@ struct iavf_rx_queue_stats {
 	struct iavf_ipsec_crypto_stats ipsec_crypto;
 };
 
-/* Offload features */
-union iavf_tx_offload {
-	uint64_t data;
-	struct {
-		uint64_t l2_len:7; /* L2 (MAC) Header Length. */
-		uint64_t l3_len:9; /* L3 (IP) Header Length. */
-		uint64_t l4_len:8; /* L4 Header Length. */
-		uint64_t tso_segsz:16; /* TCP TSO segment size */
-		/* uint64_t unused : 24; */
-	};
-};
-
 /* Rx Flex Descriptor
  * RxDID Profile ID 16-21
  * Flex-field 0: RSS hash lower 16-bits
@@ -409,7 +393,7 @@ enum iavf_rx_flex_desc_ipsec_crypto_status {
 
 
 #define IAVF_TXD_DATA_QW1_DTYPE_SHIFT	(0)
-#define IAVF_TXD_DATA_QW1_DTYPE_MASK	(0xFUL << IAVF_TXD_QW1_DTYPE_SHIFT)
+#define IAVF_TXD_DATA_QW1_DTYPE_MASK	(0xFUL << CI_TXD_QW1_DTYPE_S)
 
 #define IAVF_TXD_DATA_QW1_CMD_SHIFT	(4)
 #define IAVF_TXD_DATA_QW1_CMD_MASK	(0x3FFUL << IAVF_TXD_DATA_QW1_CMD_SHIFT)
@@ -686,7 +670,7 @@ void iavf_dump_tx_descriptor(const struct ci_tx_queue *txq,
 		rte_le_to_cpu_64(tx_desc->cmd_type_offset_bsz &
 			rte_cpu_to_le_64(IAVF_TXD_DATA_QW1_DTYPE_MASK));
 	switch (type) {
-	case IAVF_TX_DESC_DTYPE_DATA:
+	case CI_TX_DESC_DTYPE_DATA:
 		name = "Tx_data_desc";
 		break;
 	case IAVF_TX_DESC_DTYPE_CONTEXT:
diff --git a/drivers/net/intel/iavf/iavf_rxtx_vec_avx2.c b/drivers/net/intel/iavf/iavf_rxtx_vec_avx2.c
index 89ce841b9e..cea4ee9863 100644
--- a/drivers/net/intel/iavf/iavf_rxtx_vec_avx2.c
+++ b/drivers/net/intel/iavf/iavf_rxtx_vec_avx2.c
@@ -1633,10 +1633,9 @@ static __rte_always_inline void
 iavf_vtx1(volatile struct ci_tx_desc *txdp,
 	  struct rte_mbuf *pkt, uint64_t flags, bool offload, uint8_t vlan_flag)
 {
-	uint64_t high_qw =
-		(IAVF_TX_DESC_DTYPE_DATA |
-		 ((uint64_t)flags  << IAVF_TXD_QW1_CMD_SHIFT) |
-		 ((uint64_t)pkt->data_len << IAVF_TXD_QW1_TX_BUF_SZ_SHIFT));
+	uint64_t high_qw = (CI_TX_DESC_DTYPE_DATA |
+		 ((uint64_t)flags << CI_TXD_QW1_CMD_S) |
+		 ((uint64_t)pkt->data_len << CI_TXD_QW1_TX_BUF_SZ_S));
 	if (offload)
 		iavf_txd_enable_offload(pkt, &high_qw, vlan_flag);
 
@@ -1649,8 +1648,7 @@ static __rte_always_inline void
 iavf_vtx(volatile struct ci_tx_desc *txdp,
 	 struct rte_mbuf **pkt, uint16_t nb_pkts,  uint64_t flags, bool offload, uint8_t vlan_flag)
 {
-	const uint64_t hi_qw_tmpl = (IAVF_TX_DESC_DTYPE_DATA |
-			((uint64_t)flags  << IAVF_TXD_QW1_CMD_SHIFT));
+	const uint64_t hi_qw_tmpl = (CI_TX_DESC_DTYPE_DATA | (flags << CI_TXD_QW1_CMD_S));
 
 	/* if unaligned on 32-bit boundary, do one to align */
 	if (((uintptr_t)txdp & 0x1F) != 0 && nb_pkts != 0) {
@@ -1660,28 +1658,20 @@ iavf_vtx(volatile struct ci_tx_desc *txdp,
 
 	/* do two at a time while possible, in bursts */
 	for (; nb_pkts > 3; txdp += 4, pkt += 4, nb_pkts -= 4) {
-		uint64_t hi_qw3 =
-			hi_qw_tmpl |
-			((uint64_t)pkt[3]->data_len <<
-			 IAVF_TXD_QW1_TX_BUF_SZ_SHIFT);
+		uint64_t hi_qw3 = hi_qw_tmpl |
+			((uint64_t)pkt[3]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
 		if (offload)
 			iavf_txd_enable_offload(pkt[3], &hi_qw3, vlan_flag);
-		uint64_t hi_qw2 =
-			hi_qw_tmpl |
-			((uint64_t)pkt[2]->data_len <<
-			 IAVF_TXD_QW1_TX_BUF_SZ_SHIFT);
+		uint64_t hi_qw2 = hi_qw_tmpl |
+			((uint64_t)pkt[2]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
 		if (offload)
 			iavf_txd_enable_offload(pkt[2], &hi_qw2, vlan_flag);
-		uint64_t hi_qw1 =
-			hi_qw_tmpl |
-			((uint64_t)pkt[1]->data_len <<
-			 IAVF_TXD_QW1_TX_BUF_SZ_SHIFT);
+		uint64_t hi_qw1 = hi_qw_tmpl |
+			((uint64_t)pkt[1]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
 		if (offload)
 			iavf_txd_enable_offload(pkt[1], &hi_qw1, vlan_flag);
-		uint64_t hi_qw0 =
-			hi_qw_tmpl |
-			((uint64_t)pkt[0]->data_len <<
-			 IAVF_TXD_QW1_TX_BUF_SZ_SHIFT);
+		uint64_t hi_qw0 = hi_qw_tmpl |
+			((uint64_t)pkt[0]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
 		if (offload)
 			iavf_txd_enable_offload(pkt[0], &hi_qw0, vlan_flag);
 
@@ -1717,8 +1707,8 @@ iavf_xmit_fixed_burst_vec_avx2(void *tx_queue, struct rte_mbuf **tx_pkts,
 	struct ci_tx_entry_vec *txep;
 	uint16_t n, nb_commit, tx_id;
 	/* bit2 is reserved and must be set to 1 according to Spec */
-	uint64_t flags = IAVF_TX_DESC_CMD_EOP | IAVF_TX_DESC_CMD_ICRC;
-	uint64_t rs = IAVF_TX_DESC_CMD_RS | flags;
+	uint64_t flags = CI_TX_DESC_CMD_EOP | CI_TX_DESC_CMD_ICRC;
+	uint64_t rs = CI_TX_DESC_CMD_RS | flags;
 
 	if (txq->nb_tx_free < txq->tx_free_thresh)
 		ci_tx_free_bufs_vec(txq, iavf_tx_desc_done, false);
@@ -1761,8 +1751,7 @@ iavf_xmit_fixed_burst_vec_avx2(void *tx_queue, struct rte_mbuf **tx_pkts,
 	tx_id = (uint16_t)(tx_id + nb_commit);
 	if (tx_id > txq->tx_next_rs) {
 		txq->ci_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
-			rte_cpu_to_le_64(((uint64_t)IAVF_TX_DESC_CMD_RS) <<
-					 IAVF_TXD_QW1_CMD_SHIFT);
+			rte_cpu_to_le_64(((uint64_t)CI_TX_DESC_CMD_RS) << CI_TXD_QW1_CMD_S);
 		txq->tx_next_rs =
 			(uint16_t)(txq->tx_next_rs + txq->tx_rs_thresh);
 	}
diff --git a/drivers/net/intel/iavf/iavf_rxtx_vec_avx512.c b/drivers/net/intel/iavf/iavf_rxtx_vec_avx512.c
index ad1b0b90cd..01477fd501 100644
--- a/drivers/net/intel/iavf/iavf_rxtx_vec_avx512.c
+++ b/drivers/net/intel/iavf/iavf_rxtx_vec_avx512.c
@@ -1844,10 +1844,9 @@ iavf_vtx1(volatile struct ci_tx_desc *txdp,
 	  struct rte_mbuf *pkt, uint64_t flags,
 	  bool offload, uint8_t vlan_flag)
 {
-	uint64_t high_qw =
-		(IAVF_TX_DESC_DTYPE_DATA |
-		 ((uint64_t)flags  << IAVF_TXD_QW1_CMD_SHIFT) |
-		 ((uint64_t)pkt->data_len << IAVF_TXD_QW1_TX_BUF_SZ_SHIFT));
+	uint64_t high_qw = (CI_TX_DESC_DTYPE_DATA |
+		 ((uint64_t)flags << CI_TXD_QW1_CMD_S) |
+		 ((uint64_t)pkt->data_len << CI_TXD_QW1_TX_BUF_SZ_S));
 	if (offload)
 		iavf_txd_enable_offload(pkt, &high_qw, vlan_flag);
 
@@ -1863,8 +1862,7 @@ iavf_vtx(volatile struct ci_tx_desc *txdp,
 		struct rte_mbuf **pkt, uint16_t nb_pkts,  uint64_t flags,
 		bool offload, uint8_t vlan_flag)
 {
-	const uint64_t hi_qw_tmpl = (IAVF_TX_DESC_DTYPE_DATA |
-			((uint64_t)flags  << IAVF_TXD_QW1_CMD_SHIFT));
+	const uint64_t hi_qw_tmpl = (CI_TX_DESC_DTYPE_DATA | (flags << CI_TXD_QW1_CMD_S));
 
 	/* if unaligned on 32-bit boundary, do one to align */
 	if (((uintptr_t)txdp & 0x1F) != 0 && nb_pkts != 0) {
@@ -1874,22 +1872,14 @@ iavf_vtx(volatile struct ci_tx_desc *txdp,
 
 	/* do 4 at a time while possible, in bursts */
 	for (; nb_pkts > 3; txdp += 4, pkt += 4, nb_pkts -= 4) {
-		uint64_t hi_qw3 =
-			hi_qw_tmpl |
-			((uint64_t)pkt[3]->data_len <<
-			 IAVF_TXD_QW1_TX_BUF_SZ_SHIFT);
-		uint64_t hi_qw2 =
-			hi_qw_tmpl |
-			((uint64_t)pkt[2]->data_len <<
-			 IAVF_TXD_QW1_TX_BUF_SZ_SHIFT);
-		uint64_t hi_qw1 =
-			hi_qw_tmpl |
-			((uint64_t)pkt[1]->data_len <<
-			 IAVF_TXD_QW1_TX_BUF_SZ_SHIFT);
-		uint64_t hi_qw0 =
-			hi_qw_tmpl |
-			((uint64_t)pkt[0]->data_len <<
-			 IAVF_TXD_QW1_TX_BUF_SZ_SHIFT);
+		uint64_t hi_qw3 = hi_qw_tmpl |
+			((uint64_t)pkt[3]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
+		uint64_t hi_qw2 = hi_qw_tmpl |
+			((uint64_t)pkt[2]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
+		uint64_t hi_qw1 = hi_qw_tmpl |
+			((uint64_t)pkt[1]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
+		uint64_t hi_qw0 = hi_qw_tmpl |
+			((uint64_t)pkt[0]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
 		if (offload) {
 			iavf_txd_enable_offload(pkt[3], &hi_qw3, vlan_flag);
 			iavf_txd_enable_offload(pkt[2], &hi_qw2, vlan_flag);
@@ -2093,9 +2083,9 @@ ctx_vtx1(volatile struct ci_tx_desc *txdp, struct rte_mbuf *pkt,
 	if (IAVF_CHECK_TX_LLDP(pkt))
 		high_ctx_qw |= IAVF_TX_CTX_DESC_SWTCH_UPLINK
 			<< IAVF_TXD_CTX_QW1_CMD_SHIFT;
-	uint64_t high_data_qw = (IAVF_TX_DESC_DTYPE_DATA |
-				((uint64_t)flags  << IAVF_TXD_QW1_CMD_SHIFT) |
-				((uint64_t)pkt->data_len << IAVF_TXD_QW1_TX_BUF_SZ_SHIFT));
+	uint64_t high_data_qw = (CI_TX_DESC_DTYPE_DATA |
+				((uint64_t)flags << CI_TXD_QW1_CMD_S) |
+				((uint64_t)pkt->data_len << CI_TXD_QW1_TX_BUF_SZ_S));
 	if (offload)
 		iavf_txd_enable_offload(pkt, &high_data_qw, vlan_flag);
 
@@ -2110,8 +2100,7 @@ ctx_vtx(volatile struct ci_tx_desc *txdp,
 		struct rte_mbuf **pkt, uint16_t nb_pkts,  uint64_t flags,
 		bool offload, uint8_t vlan_flag)
 {
-	uint64_t hi_data_qw_tmpl = (IAVF_TX_DESC_DTYPE_DATA |
-					((uint64_t)flags  << IAVF_TXD_QW1_CMD_SHIFT));
+	uint64_t hi_data_qw_tmpl = (CI_TX_DESC_DTYPE_DATA | (flags << CI_TXD_QW1_CMD_S));
 
 	/* if unaligned on 32-bit boundary, do one to align */
 	if (((uintptr_t)txdp & 0x1F) != 0 && nb_pkts != 0) {
@@ -2128,11 +2117,9 @@ ctx_vtx(volatile struct ci_tx_desc *txdp,
 		uint64_t hi_data_qw0 = 0;
 
 		hi_data_qw1 = hi_data_qw_tmpl |
-				((uint64_t)pkt[1]->data_len <<
-					IAVF_TXD_QW1_TX_BUF_SZ_SHIFT);
+				((uint64_t)pkt[1]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
 		hi_data_qw0 = hi_data_qw_tmpl |
-				((uint64_t)pkt[0]->data_len <<
-					IAVF_TXD_QW1_TX_BUF_SZ_SHIFT);
+				((uint64_t)pkt[0]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
 
 #ifdef IAVF_TX_VLAN_QINQ_OFFLOAD
 		if (offload) {
@@ -2140,13 +2127,11 @@ ctx_vtx(volatile struct ci_tx_desc *txdp,
 				uint64_t qinq_tag = vlan_flag & IAVF_TX_FLAGS_VLAN_TAG_LOC_L2TAG2 ?
 					(uint64_t)pkt[1]->vlan_tci :
 					(uint64_t)pkt[1]->vlan_tci_outer;
-				hi_ctx_qw1 |= IAVF_TX_CTX_DESC_IL2TAG2 <<
-						IAVF_TXD_CTX_QW1_CMD_SHIFT;
+				hi_ctx_qw1 |= CI_TX_CTX_DESC_IL2TAG2 << CI_TXD_QW1_CMD_S;
 				low_ctx_qw1 |= qinq_tag << IAVF_TXD_CTX_QW0_L2TAG2_PARAM;
 			} else if (pkt[1]->ol_flags & RTE_MBUF_F_TX_VLAN &&
 					vlan_flag & IAVF_TX_FLAGS_VLAN_TAG_LOC_L2TAG2) {
-				hi_ctx_qw1 |=
-					IAVF_TX_CTX_DESC_IL2TAG2 << IAVF_TXD_CTX_QW1_CMD_SHIFT;
+				hi_ctx_qw1 |= CI_TX_CTX_DESC_IL2TAG2 << CI_TXD_QW1_CMD_S;
 				low_ctx_qw1 |=
 					(uint64_t)pkt[1]->vlan_tci << IAVF_TXD_CTX_QW0_L2TAG2_PARAM;
 			}
@@ -2154,7 +2139,7 @@ ctx_vtx(volatile struct ci_tx_desc *txdp,
 #endif
 		if (IAVF_CHECK_TX_LLDP(pkt[1]))
 			hi_ctx_qw1 |= IAVF_TX_CTX_DESC_SWTCH_UPLINK
-				<< IAVF_TXD_CTX_QW1_CMD_SHIFT;
+				<< CI_TXD_QW1_CMD_S;
 
 #ifdef IAVF_TX_VLAN_QINQ_OFFLOAD
 		if (offload) {
@@ -2162,21 +2147,18 @@ ctx_vtx(volatile struct ci_tx_desc *txdp,
 				uint64_t qinq_tag = vlan_flag & IAVF_TX_FLAGS_VLAN_TAG_LOC_L2TAG2 ?
 					(uint64_t)pkt[0]->vlan_tci :
 					(uint64_t)pkt[0]->vlan_tci_outer;
-				hi_ctx_qw0 |= IAVF_TX_CTX_DESC_IL2TAG2 <<
-						IAVF_TXD_CTX_QW1_CMD_SHIFT;
+				hi_ctx_qw0 |= CI_TX_CTX_DESC_IL2TAG2 << CI_TXD_QW1_CMD_S;
 				low_ctx_qw0 |= qinq_tag << IAVF_TXD_CTX_QW0_L2TAG2_PARAM;
 			} else if (pkt[0]->ol_flags & RTE_MBUF_F_TX_VLAN &&
 					vlan_flag & IAVF_TX_FLAGS_VLAN_TAG_LOC_L2TAG2) {
-				hi_ctx_qw0 |=
-					IAVF_TX_CTX_DESC_IL2TAG2 << IAVF_TXD_CTX_QW1_CMD_SHIFT;
+				hi_ctx_qw0 |= CI_TX_CTX_DESC_IL2TAG2 << CI_TXD_QW1_CMD_S;
 				low_ctx_qw0 |=
 					(uint64_t)pkt[0]->vlan_tci << IAVF_TXD_CTX_QW0_L2TAG2_PARAM;
 			}
 		}
 #endif
 		if (IAVF_CHECK_TX_LLDP(pkt[0]))
-			hi_ctx_qw0 |= IAVF_TX_CTX_DESC_SWTCH_UPLINK
-				<< IAVF_TXD_CTX_QW1_CMD_SHIFT;
+			hi_ctx_qw0 |= IAVF_TX_CTX_DESC_SWTCH_UPLINK << CI_TXD_QW1_CMD_S;
 
 		if (offload) {
 			iavf_txd_enable_offload(pkt[1], &hi_data_qw1, vlan_flag);
@@ -2207,8 +2189,8 @@ iavf_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pkts,
 	struct ci_tx_entry_vec *txep;
 	uint16_t n, nb_commit, tx_id;
 	/* bit2 is reserved and must be set to 1 according to Spec */
-	uint64_t flags = IAVF_TX_DESC_CMD_EOP | IAVF_TX_DESC_CMD_ICRC;
-	uint64_t rs = IAVF_TX_DESC_CMD_RS | flags;
+	uint64_t flags = CI_TX_DESC_CMD_EOP | CI_TX_DESC_CMD_ICRC;
+	uint64_t rs = CI_TX_DESC_CMD_RS | flags;
 
 	if (txq->nb_tx_free < txq->tx_free_thresh)
 		ci_tx_free_bufs_vec(txq, iavf_tx_desc_done, false);
@@ -2253,8 +2235,7 @@ iavf_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pkts,
 	tx_id = (uint16_t)(tx_id + nb_commit);
 	if (tx_id > txq->tx_next_rs) {
 		txq->ci_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
-			rte_cpu_to_le_64(((uint64_t)IAVF_TX_DESC_CMD_RS) <<
-					 IAVF_TXD_QW1_CMD_SHIFT);
+			rte_cpu_to_le_64(((uint64_t)CI_TX_DESC_CMD_RS) << CI_TXD_QW1_CMD_S);
 		txq->tx_next_rs =
 			(uint16_t)(txq->tx_next_rs + txq->tx_rs_thresh);
 	}
@@ -2275,8 +2256,8 @@ iavf_xmit_fixed_burst_vec_avx512_ctx(void *tx_queue, struct rte_mbuf **tx_pkts,
 	struct ci_tx_entry_vec *txep;
 	uint16_t n, nb_commit, nb_mbuf, tx_id;
 	/* bit2 is reserved and must be set to 1 according to Spec */
-	uint64_t flags = IAVF_TX_DESC_CMD_EOP | IAVF_TX_DESC_CMD_ICRC;
-	uint64_t rs = IAVF_TX_DESC_CMD_RS | flags;
+	uint64_t flags = CI_TX_DESC_CMD_EOP | CI_TX_DESC_CMD_ICRC;
+	uint64_t rs = CI_TX_DESC_CMD_RS | flags;
 
 	if (txq->nb_tx_free < txq->tx_free_thresh)
 		ci_tx_free_bufs_vec(txq, iavf_tx_desc_done, true);
@@ -2321,8 +2302,7 @@ iavf_xmit_fixed_burst_vec_avx512_ctx(void *tx_queue, struct rte_mbuf **tx_pkts,
 
 	if (tx_id > txq->tx_next_rs) {
 		txq->ci_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
-			rte_cpu_to_le_64(((uint64_t)IAVF_TX_DESC_CMD_RS) <<
-					 IAVF_TXD_QW1_CMD_SHIFT);
+			rte_cpu_to_le_64(((uint64_t)CI_TX_DESC_CMD_RS) << CI_TXD_QW1_CMD_S);
 		txq->tx_next_rs =
 			(uint16_t)(txq->tx_next_rs + txq->tx_rs_thresh);
 	}
diff --git a/drivers/net/intel/iavf/iavf_rxtx_vec_common.h b/drivers/net/intel/iavf/iavf_rxtx_vec_common.h
index 1832b76f89..1538a44892 100644
--- a/drivers/net/intel/iavf/iavf_rxtx_vec_common.h
+++ b/drivers/net/intel/iavf/iavf_rxtx_vec_common.h
@@ -15,8 +15,8 @@ static inline int
 iavf_tx_desc_done(struct ci_tx_queue *txq, uint16_t idx)
 {
 	return (txq->ci_tx_ring[idx].cmd_type_offset_bsz &
-			rte_cpu_to_le_64(IAVF_TXD_QW1_DTYPE_MASK)) ==
-				rte_cpu_to_le_64(IAVF_TX_DESC_DTYPE_DESC_DONE);
+			rte_cpu_to_le_64(CI_TXD_QW1_DTYPE_M)) ==
+				rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE);
 }
 
 static inline void
@@ -147,26 +147,26 @@ iavf_txd_enable_offload(__rte_unused struct rte_mbuf *tx_pkt,
 	/* Set MACLEN */
 	if (ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK)
 		td_offset |= (tx_pkt->outer_l2_len >> 1)
-			<< IAVF_TX_DESC_LENGTH_MACLEN_SHIFT;
+			<< CI_TX_DESC_LEN_MACLEN_S;
 	else
 		td_offset |= (tx_pkt->l2_len >> 1)
-			<< IAVF_TX_DESC_LENGTH_MACLEN_SHIFT;
+			<< CI_TX_DESC_LEN_MACLEN_S;
 
 	/* Enable L3 checksum offloads */
 	if (ol_flags & RTE_MBUF_F_TX_IP_CKSUM) {
 		if (ol_flags & RTE_MBUF_F_TX_IPV4) {
-			td_cmd |= IAVF_TX_DESC_CMD_IIPT_IPV4_CSUM;
+			td_cmd |= CI_TX_DESC_CMD_IIPT_IPV4_CSUM;
 			td_offset |= (tx_pkt->l3_len >> 2) <<
-				     IAVF_TX_DESC_LENGTH_IPLEN_SHIFT;
+				     CI_TX_DESC_LEN_IPLEN_S;
 		}
 	} else if (ol_flags & RTE_MBUF_F_TX_IPV4) {
-		td_cmd |= IAVF_TX_DESC_CMD_IIPT_IPV4;
+		td_cmd |= CI_TX_DESC_CMD_IIPT_IPV4;
 		td_offset |= (tx_pkt->l3_len >> 2) <<
-			     IAVF_TX_DESC_LENGTH_IPLEN_SHIFT;
+			     CI_TX_DESC_LEN_IPLEN_S;
 	} else if (ol_flags & RTE_MBUF_F_TX_IPV6) {
-		td_cmd |= IAVF_TX_DESC_CMD_IIPT_IPV6;
+		td_cmd |= CI_TX_DESC_CMD_IIPT_IPV6;
 		td_offset |= (tx_pkt->l3_len >> 2) <<
-			     IAVF_TX_DESC_LENGTH_IPLEN_SHIFT;
+			     CI_TX_DESC_LEN_IPLEN_S;
 	}
 
 	/* Enable L4 checksum offloads */
@@ -190,7 +190,7 @@ iavf_txd_enable_offload(__rte_unused struct rte_mbuf *tx_pkt,
 		break;
 	}
 
-	*txd_hi |= ((uint64_t)td_offset) << IAVF_TXD_QW1_OFFSET_SHIFT;
+	*txd_hi |= ((uint64_t)td_offset) << CI_TXD_QW1_OFFSET_S;
 #endif
 
 #ifdef IAVF_TX_VLAN_QINQ_OFFLOAD
@@ -198,17 +198,15 @@ iavf_txd_enable_offload(__rte_unused struct rte_mbuf *tx_pkt,
 		td_cmd |= IAVF_TX_DESC_CMD_IL2TAG1;
 		/* vlan_flag specifies outer tag location for QinQ. */
 		if (vlan_flag & IAVF_TX_FLAGS_VLAN_TAG_LOC_L2TAG1)
-			*txd_hi |= ((uint64_t)tx_pkt->vlan_tci_outer <<
-					IAVF_TXD_QW1_L2TAG1_SHIFT);
+			*txd_hi |= ((uint64_t)tx_pkt->vlan_tci_outer << CI_TXD_QW1_L2TAG1_S);
 		else
-			*txd_hi |= ((uint64_t)tx_pkt->vlan_tci <<
-					IAVF_TXD_QW1_L2TAG1_SHIFT);
+			*txd_hi |= ((uint64_t)tx_pkt->vlan_tci << CI_TXD_QW1_L2TAG1_S);
 	} else if (ol_flags & RTE_MBUF_F_TX_VLAN && vlan_flag & IAVF_TX_FLAGS_VLAN_TAG_LOC_L2TAG1) {
-		td_cmd |= IAVF_TX_DESC_CMD_IL2TAG1;
-		*txd_hi |= ((uint64_t)tx_pkt->vlan_tci << IAVF_TXD_QW1_L2TAG1_SHIFT);
+		td_cmd |= CI_TX_DESC_CMD_IL2TAG1;
+		*txd_hi |= ((uint64_t)tx_pkt->vlan_tci << CI_TXD_QW1_L2TAG1_S);
 	}
 #endif
 
-	*txd_hi |= ((uint64_t)td_cmd) << IAVF_TXD_QW1_CMD_SHIFT;
+	*txd_hi |= ((uint64_t)td_cmd) << CI_TXD_QW1_CMD_S;
 }
 #endif
diff --git a/drivers/net/intel/ice/ice_dcf_ethdev.c b/drivers/net/intel/ice/ice_dcf_ethdev.c
index 5f537b4c12..4ceecc15c6 100644
--- a/drivers/net/intel/ice/ice_dcf_ethdev.c
+++ b/drivers/net/intel/ice/ice_dcf_ethdev.c
@@ -406,7 +406,7 @@ reset_tx_queue(struct ci_tx_queue *txq)
 	prev = (uint16_t)(txq->nb_tx_desc - 1);
 	for (i = 0; i < txq->nb_tx_desc; i++) {
 		txq->ci_tx_ring[i].cmd_type_offset_bsz =
-			rte_cpu_to_le_64(IAVF_TX_DESC_DTYPE_DESC_DONE);
+			rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE);
 		txe[i].mbuf =  NULL;
 		txe[i].last_id = i;
 		txe[prev].next_id = i;
diff --git a/drivers/net/intel/ice/ice_rxtx.c b/drivers/net/intel/ice/ice_rxtx.c
index 7a33e1e980..52bbf95967 100644
--- a/drivers/net/intel/ice/ice_rxtx.c
+++ b/drivers/net/intel/ice/ice_rxtx.c
@@ -1124,7 +1124,7 @@ ice_reset_tx_queue(struct ci_tx_queue *txq)
 		volatile struct ci_tx_desc *txd = &txq->ci_tx_ring[i];
 
 		txd->cmd_type_offset_bsz =
-			rte_cpu_to_le_64(ICE_TX_DESC_DTYPE_DESC_DONE);
+			rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE);
 		txe[i].mbuf =  NULL;
 		txe[i].last_id = i;
 		txe[prev].next_id = i;
@@ -2556,9 +2556,8 @@ ice_tx_descriptor_status(void *tx_queue, uint16_t offset)
 	}
 
 	status = &txq->ci_tx_ring[desc].cmd_type_offset_bsz;
-	mask = rte_cpu_to_le_64(ICE_TXD_QW1_DTYPE_M);
-	expect = rte_cpu_to_le_64(ICE_TX_DESC_DTYPE_DESC_DONE <<
-				  ICE_TXD_QW1_DTYPE_S);
+	mask = rte_cpu_to_le_64(CI_TXD_QW1_DTYPE_M);
+	expect = rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE << CI_TXD_QW1_DTYPE_S);
 	if ((*status & mask) == expect)
 		return RTE_ETH_TX_DESC_DONE;
 
@@ -2904,7 +2903,7 @@ ice_recv_pkts(void *rx_queue,
 
 static inline void
 ice_parse_tunneling_params(uint64_t ol_flags,
-			    union ice_tx_offload tx_offload,
+			    union ci_tx_offload tx_offload,
 			    uint32_t *cd_tunneling)
 {
 	/* EIPT: External (outer) IP header type */
@@ -2965,58 +2964,58 @@ static inline void
 ice_txd_enable_checksum(uint64_t ol_flags,
 			uint32_t *td_cmd,
 			uint32_t *td_offset,
-			union ice_tx_offload tx_offload)
+			union ci_tx_offload tx_offload)
 {
 	/* Set MACLEN */
 	if (!(ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK))
 		*td_offset |= (tx_offload.l2_len >> 1)
-			<< ICE_TX_DESC_LEN_MACLEN_S;
+			<< CI_TX_DESC_LEN_MACLEN_S;
 
 	/* Enable L3 checksum offloads */
 	if (ol_flags & RTE_MBUF_F_TX_IP_CKSUM) {
-		*td_cmd |= ICE_TX_DESC_CMD_IIPT_IPV4_CSUM;
+		*td_cmd |= CI_TX_DESC_CMD_IIPT_IPV4_CSUM;
 		*td_offset |= (tx_offload.l3_len >> 2) <<
-			ICE_TX_DESC_LEN_IPLEN_S;
+			CI_TX_DESC_LEN_IPLEN_S;
 	} else if (ol_flags & RTE_MBUF_F_TX_IPV4) {
-		*td_cmd |= ICE_TX_DESC_CMD_IIPT_IPV4;
+		*td_cmd |= CI_TX_DESC_CMD_IIPT_IPV4;
 		*td_offset |= (tx_offload.l3_len >> 2) <<
-			ICE_TX_DESC_LEN_IPLEN_S;
+			CI_TX_DESC_LEN_IPLEN_S;
 	} else if (ol_flags & RTE_MBUF_F_TX_IPV6) {
-		*td_cmd |= ICE_TX_DESC_CMD_IIPT_IPV6;
+		*td_cmd |= CI_TX_DESC_CMD_IIPT_IPV6;
 		*td_offset |= (tx_offload.l3_len >> 2) <<
-			ICE_TX_DESC_LEN_IPLEN_S;
+			CI_TX_DESC_LEN_IPLEN_S;
 	}
 
 	if (ol_flags & RTE_MBUF_F_TX_TCP_SEG) {
-		*td_cmd |= ICE_TX_DESC_CMD_L4T_EOFT_TCP;
+		*td_cmd |= CI_TX_DESC_CMD_L4T_EOFT_TCP;
 		*td_offset |= (tx_offload.l4_len >> 2) <<
-			      ICE_TX_DESC_LEN_L4_LEN_S;
+			      CI_TX_DESC_LEN_L4_LEN_S;
 		return;
 	}
 
 	if (ol_flags & RTE_MBUF_F_TX_UDP_SEG) {
-		*td_cmd |= ICE_TX_DESC_CMD_L4T_EOFT_UDP;
+		*td_cmd |= CI_TX_DESC_CMD_L4T_EOFT_UDP;
 		*td_offset |= (tx_offload.l4_len >> 2) <<
-			      ICE_TX_DESC_LEN_L4_LEN_S;
+			      CI_TX_DESC_LEN_L4_LEN_S;
 		return;
 	}
 
 	/* Enable L4 checksum offloads */
 	switch (ol_flags & RTE_MBUF_F_TX_L4_MASK) {
 	case RTE_MBUF_F_TX_TCP_CKSUM:
-		*td_cmd |= ICE_TX_DESC_CMD_L4T_EOFT_TCP;
+		*td_cmd |= CI_TX_DESC_CMD_L4T_EOFT_TCP;
 		*td_offset |= (sizeof(struct rte_tcp_hdr) >> 2) <<
-			      ICE_TX_DESC_LEN_L4_LEN_S;
+			      CI_TX_DESC_LEN_L4_LEN_S;
 		break;
 	case RTE_MBUF_F_TX_SCTP_CKSUM:
-		*td_cmd |= ICE_TX_DESC_CMD_L4T_EOFT_SCTP;
+		*td_cmd |= CI_TX_DESC_CMD_L4T_EOFT_SCTP;
 		*td_offset |= (sizeof(struct rte_sctp_hdr) >> 2) <<
-			      ICE_TX_DESC_LEN_L4_LEN_S;
+			      CI_TX_DESC_LEN_L4_LEN_S;
 		break;
 	case RTE_MBUF_F_TX_UDP_CKSUM:
-		*td_cmd |= ICE_TX_DESC_CMD_L4T_EOFT_UDP;
+		*td_cmd |= CI_TX_DESC_CMD_L4T_EOFT_UDP;
 		*td_offset |= (sizeof(struct rte_udp_hdr) >> 2) <<
-			      ICE_TX_DESC_LEN_L4_LEN_S;
+			      CI_TX_DESC_LEN_L4_LEN_S;
 		break;
 	default:
 		break;
@@ -3030,11 +3029,11 @@ ice_build_ctob(uint32_t td_cmd,
 	       uint16_t size,
 	       uint32_t td_tag)
 {
-	return rte_cpu_to_le_64(ICE_TX_DESC_DTYPE_DATA |
-				((uint64_t)td_cmd << ICE_TXD_QW1_CMD_S) |
-				((uint64_t)td_offset << ICE_TXD_QW1_OFFSET_S) |
-				((uint64_t)size << ICE_TXD_QW1_TX_BUF_SZ_S) |
-				((uint64_t)td_tag << ICE_TXD_QW1_L2TAG1_S));
+	return rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DATA |
+				((uint64_t)td_cmd << CI_TXD_QW1_CMD_S) |
+				((uint64_t)td_offset << CI_TXD_QW1_OFFSET_S) |
+				((uint64_t)size << CI_TXD_QW1_TX_BUF_SZ_S) |
+				((uint64_t)td_tag << CI_TXD_QW1_L2TAG1_S));
 }
 
 /* Check if the context descriptor is needed for TX offloading */
@@ -3053,7 +3052,7 @@ ice_calc_context_desc(uint64_t flags)
 
 /* set ice TSO context descriptor */
 static inline uint64_t
-ice_set_tso_ctx(struct rte_mbuf *mbuf, union ice_tx_offload tx_offload)
+ice_set_tso_ctx(struct rte_mbuf *mbuf, union ci_tx_offload tx_offload)
 {
 	uint64_t ctx_desc = 0;
 	uint32_t cd_cmd, hdr_len, cd_tso_len;
@@ -3067,18 +3066,15 @@ ice_set_tso_ctx(struct rte_mbuf *mbuf, union ice_tx_offload tx_offload)
 	hdr_len += (mbuf->ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK) ?
 		   tx_offload.outer_l2_len + tx_offload.outer_l3_len : 0;
 
-	cd_cmd = ICE_TX_CTX_DESC_TSO;
+	cd_cmd = CI_TX_CTX_DESC_TSO;
 	cd_tso_len = mbuf->pkt_len - hdr_len;
-	ctx_desc |= ((uint64_t)cd_cmd << ICE_TXD_CTX_QW1_CMD_S) |
+	ctx_desc |= ((uint64_t)cd_cmd << CI_TXD_QW1_CMD_S) |
 		    ((uint64_t)cd_tso_len << ICE_TXD_CTX_QW1_TSO_LEN_S) |
 		    ((uint64_t)mbuf->tso_segsz << ICE_TXD_CTX_QW1_MSS_S);
 
 	return ctx_desc;
 }
 
-/* HW requires that TX buffer size ranges from 1B up to (16K-1)B. */
-#define ICE_MAX_DATA_PER_TXD \
-	(ICE_TXD_QW1_TX_BUF_SZ_M >> ICE_TXD_QW1_TX_BUF_SZ_S)
 /* Calculate the number of TX descriptors needed for each pkt */
 static inline uint16_t
 ice_calc_pkt_desc(struct rte_mbuf *tx_pkt)
@@ -3087,7 +3083,7 @@ ice_calc_pkt_desc(struct rte_mbuf *tx_pkt)
 	uint16_t count = 0;
 
 	while (txd != NULL) {
-		count += DIV_ROUND_UP(txd->data_len, ICE_MAX_DATA_PER_TXD);
+		count += DIV_ROUND_UP(txd->data_len, CI_MAX_DATA_PER_TXD);
 		txd = txd->next;
 	}
 
@@ -3117,7 +3113,7 @@ ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 	uint16_t slen;
 	uint64_t buf_dma_addr;
 	uint64_t ol_flags;
-	union ice_tx_offload tx_offload = {0};
+	union ci_tx_offload tx_offload = {0};
 
 	txq = tx_queue;
 	sw_ring = txq->sw_ring;
@@ -3185,7 +3181,7 @@ ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 
 		/* Descriptor based VLAN insertion */
 		if (ol_flags & (RTE_MBUF_F_TX_VLAN | RTE_MBUF_F_TX_QINQ)) {
-			td_cmd |= ICE_TX_DESC_CMD_IL2TAG1;
+			td_cmd |= CI_TX_DESC_CMD_IL2TAG1;
 			td_tag = tx_pkt->vlan_tci;
 		}
 
@@ -3193,7 +3189,7 @@ ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 		cd_tunneling_params = 0;
 		if (ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK) {
 			td_offset |= (tx_offload.outer_l2_len >> 1)
-				<< ICE_TX_DESC_LEN_MACLEN_S;
+				<< CI_TX_DESC_LEN_MACLEN_S;
 			ice_parse_tunneling_params(ol_flags, tx_offload,
 						   &cd_tunneling_params);
 		}
@@ -3223,8 +3219,8 @@ ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 					ice_set_tso_ctx(tx_pkt, tx_offload);
 			else if (ol_flags & RTE_MBUF_F_TX_IEEE1588_TMST)
 				cd_type_cmd_tso_mss |=
-					((uint64_t)ICE_TX_CTX_DESC_TSYN <<
-					ICE_TXD_CTX_QW1_CMD_S) |
+					((uint64_t)CI_TX_CTX_DESC_TSYN <<
+					CI_TXD_QW1_CMD_S) |
 					 (((uint64_t)txq->ice_vsi->adapter->ptp_tx_index <<
 					 ICE_TXD_CTX_QW1_TSYN_S) & ICE_TXD_CTX_QW1_TSYN_M);
 
@@ -3235,8 +3231,8 @@ ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 			if (ol_flags & RTE_MBUF_F_TX_QINQ) {
 				cd_l2tag2 = tx_pkt->vlan_tci_outer;
 				cd_type_cmd_tso_mss |=
-					((uint64_t)ICE_TX_CTX_DESC_IL2TAG2 <<
-					 ICE_TXD_CTX_QW1_CMD_S);
+					((uint64_t)CI_TX_CTX_DESC_IL2TAG2 <<
+					 CI_TXD_QW1_CMD_S);
 			}
 			ctx_txd->l2tag2 = rte_cpu_to_le_16(cd_l2tag2);
 			ctx_txd->qw1 =
@@ -3261,18 +3257,16 @@ ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 			buf_dma_addr = rte_mbuf_data_iova(m_seg);
 
 			while ((ol_flags & (RTE_MBUF_F_TX_TCP_SEG | RTE_MBUF_F_TX_UDP_SEG)) &&
-				unlikely(slen > ICE_MAX_DATA_PER_TXD)) {
+					unlikely(slen > CI_MAX_DATA_PER_TXD)) {
 				txd->buffer_addr = rte_cpu_to_le_64(buf_dma_addr);
-				txd->cmd_type_offset_bsz =
-				rte_cpu_to_le_64(ICE_TX_DESC_DTYPE_DATA |
-				((uint64_t)td_cmd << ICE_TXD_QW1_CMD_S) |
-				((uint64_t)td_offset << ICE_TXD_QW1_OFFSET_S) |
-				((uint64_t)ICE_MAX_DATA_PER_TXD <<
-				 ICE_TXD_QW1_TX_BUF_SZ_S) |
-				((uint64_t)td_tag << ICE_TXD_QW1_L2TAG1_S));
+				txd->cmd_type_offset_bsz = rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DATA |
+					((uint64_t)td_cmd << CI_TXD_QW1_CMD_S) |
+					((uint64_t)td_offset << CI_TXD_QW1_OFFSET_S) |
+					((uint64_t)CI_MAX_DATA_PER_TXD << CI_TXD_QW1_TX_BUF_SZ_S) |
+					((uint64_t)td_tag << CI_TXD_QW1_L2TAG1_S));
 
-				buf_dma_addr += ICE_MAX_DATA_PER_TXD;
-				slen -= ICE_MAX_DATA_PER_TXD;
+				buf_dma_addr += CI_MAX_DATA_PER_TXD;
+				slen -= CI_MAX_DATA_PER_TXD;
 
 				txe->last_id = tx_last;
 				tx_id = txe->next_id;
@@ -3282,12 +3276,11 @@ ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 			}
 
 			txd->buffer_addr = rte_cpu_to_le_64(buf_dma_addr);
-			txd->cmd_type_offset_bsz =
-				rte_cpu_to_le_64(ICE_TX_DESC_DTYPE_DATA |
-				((uint64_t)td_cmd << ICE_TXD_QW1_CMD_S) |
-				((uint64_t)td_offset << ICE_TXD_QW1_OFFSET_S) |
-				((uint64_t)slen << ICE_TXD_QW1_TX_BUF_SZ_S) |
-				((uint64_t)td_tag << ICE_TXD_QW1_L2TAG1_S));
+			txd->cmd_type_offset_bsz = rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DATA |
+				((uint64_t)td_cmd << CI_TXD_QW1_CMD_S) |
+				((uint64_t)td_offset << CI_TXD_QW1_OFFSET_S) |
+				((uint64_t)slen << CI_TXD_QW1_TX_BUF_SZ_S) |
+				((uint64_t)td_tag << CI_TXD_QW1_L2TAG1_S));
 
 			txe->last_id = tx_last;
 			tx_id = txe->next_id;
@@ -3296,7 +3289,7 @@ ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 		} while (m_seg);
 
 		/* fill the last descriptor with End of Packet (EOP) bit */
-		td_cmd |= ICE_TX_DESC_CMD_EOP;
+		td_cmd |= CI_TX_DESC_CMD_EOP;
 		txq->nb_tx_used = (uint16_t)(txq->nb_tx_used + nb_used);
 		txq->nb_tx_free = (uint16_t)(txq->nb_tx_free - nb_used);
 
@@ -3307,14 +3300,13 @@ ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 				   "%4u (port=%d queue=%d)",
 				   tx_last, txq->port_id, txq->queue_id);
 
-			td_cmd |= ICE_TX_DESC_CMD_RS;
+			td_cmd |= CI_TX_DESC_CMD_RS;
 
 			/* Update txq RS bit counters */
 			txq->nb_tx_used = 0;
 		}
 		txd->cmd_type_offset_bsz |=
-			rte_cpu_to_le_64(((uint64_t)td_cmd) <<
-					 ICE_TXD_QW1_CMD_S);
+			rte_cpu_to_le_64(((uint64_t)td_cmd) << CI_TXD_QW1_CMD_S);
 
 		if (txq->tsq != NULL && txq->tsq->ts_flag > 0) {
 			uint64_t txtime = *RTE_MBUF_DYNFIELD(tx_pkt,
@@ -3361,8 +3353,8 @@ ice_tx_free_bufs(struct ci_tx_queue *txq)
 	uint16_t i;
 
 	if ((txq->ci_tx_ring[txq->tx_next_dd].cmd_type_offset_bsz &
-	     rte_cpu_to_le_64(ICE_TXD_QW1_DTYPE_M)) !=
-	    rte_cpu_to_le_64(ICE_TX_DESC_DTYPE_DESC_DONE))
+	     rte_cpu_to_le_64(CI_TXD_QW1_DTYPE_M)) !=
+	    rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE))
 		return 0;
 
 	txep = &txq->sw_ring[txq->tx_next_dd - (txq->tx_rs_thresh - 1)];
@@ -3598,8 +3590,7 @@ tx_xmit_pkts(struct ci_tx_queue *txq,
 		n = (uint16_t)(txq->nb_tx_desc - txq->tx_tail);
 		ice_tx_fill_hw_ring(txq, tx_pkts, n);
 		txr[txq->tx_next_rs].cmd_type_offset_bsz |=
-			rte_cpu_to_le_64(((uint64_t)ICE_TX_DESC_CMD_RS) <<
-					 ICE_TXD_QW1_CMD_S);
+			rte_cpu_to_le_64(((uint64_t)CI_TX_DESC_CMD_RS) << CI_TXD_QW1_CMD_S);
 		txq->tx_next_rs = (uint16_t)(txq->tx_rs_thresh - 1);
 		txq->tx_tail = 0;
 	}
@@ -3611,8 +3602,7 @@ tx_xmit_pkts(struct ci_tx_queue *txq,
 	/* Determine if RS bit needs to be set */
 	if (txq->tx_tail > txq->tx_next_rs) {
 		txr[txq->tx_next_rs].cmd_type_offset_bsz |=
-			rte_cpu_to_le_64(((uint64_t)ICE_TX_DESC_CMD_RS) <<
-					 ICE_TXD_QW1_CMD_S);
+			rte_cpu_to_le_64(((uint64_t)CI_TX_DESC_CMD_RS) << CI_TXD_QW1_CMD_S);
 		txq->tx_next_rs =
 			(uint16_t)(txq->tx_next_rs + txq->tx_rs_thresh);
 		if (txq->tx_next_rs >= txq->nb_tx_desc)
@@ -4843,9 +4833,9 @@ ice_fdir_programming(struct ice_pf *pf, struct ice_fltr_desc *fdir_desc)
 
 	txdp = &txq->ci_tx_ring[txq->tx_tail + 1];
 	txdp->buffer_addr = rte_cpu_to_le_64(pf->fdir.dma_addr);
-	td_cmd = ICE_TX_DESC_CMD_EOP |
-		ICE_TX_DESC_CMD_RS  |
-		ICE_TX_DESC_CMD_DUMMY;
+	td_cmd = CI_TX_DESC_CMD_EOP |
+		CI_TX_DESC_CMD_RS  |
+		CI_TX_DESC_CMD_DUMMY;
 
 	txdp->cmd_type_offset_bsz =
 		ice_build_ctob(td_cmd, 0, ICE_FDIR_PKT_LEN, 0);
@@ -4856,9 +4846,8 @@ ice_fdir_programming(struct ice_pf *pf, struct ice_fltr_desc *fdir_desc)
 	/* Update the tx tail register */
 	ICE_PCI_REG_WRITE(txq->qtx_tail, txq->tx_tail);
 	for (i = 0; i < ICE_FDIR_MAX_WAIT_US; i++) {
-		if ((txdp->cmd_type_offset_bsz &
-		     rte_cpu_to_le_64(ICE_TXD_QW1_DTYPE_M)) ==
-		    rte_cpu_to_le_64(ICE_TX_DESC_DTYPE_DESC_DONE))
+		if ((txdp->cmd_type_offset_bsz & rte_cpu_to_le_64(CI_TXD_QW1_DTYPE_M)) ==
+		    rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE))
 			break;
 		rte_delay_us(1);
 	}
diff --git a/drivers/net/intel/ice/ice_rxtx.h b/drivers/net/intel/ice/ice_rxtx.h
index c524e9f756..cd5fa93d1c 100644
--- a/drivers/net/intel/ice/ice_rxtx.h
+++ b/drivers/net/intel/ice/ice_rxtx.h
@@ -46,7 +46,7 @@
 
 #define ICE_SUPPORT_CHAIN_NUM 5
 
-#define ICE_TD_CMD                      ICE_TX_DESC_CMD_EOP
+#define ICE_TD_CMD                      CI_TX_DESC_CMD_EOP
 
 #define ICE_VPMD_RX_BURST            CI_VPMD_RX_BURST
 #define ICE_VPMD_TX_BURST            32
@@ -169,19 +169,6 @@ struct ice_txtime {
 	const struct rte_memzone *ts_mz;
 };
 
-/* Offload features */
-union ice_tx_offload {
-	uint64_t data;
-	struct {
-		uint64_t l2_len:7; /* L2 (MAC) Header Length. */
-		uint64_t l3_len:9; /* L3 (IP) Header Length. */
-		uint64_t l4_len:8; /* L4 Header Length. */
-		uint64_t tso_segsz:16; /* TCP TSO segment size */
-		uint64_t outer_l2_len:8; /* outer L2 Header Length */
-		uint64_t outer_l3_len:16; /* outer L3 Header Length */
-	};
-};
-
 /* Rx Flex Descriptor for Comms Package Profile
  * RxDID Profile ID 22 (swap Hash and FlowID)
  * Flex-field 0: Flow ID lower 16-bits
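The union removed above is presumably byte-identical to the common ci_tx_offload introduced earlier in the series, so the substitution is layout-neutral. As a sanity sketch (local names, not the real header; requires C11 anonymous structs, which DPDK already uses):

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the removed ice_tx_offload / assumed ci_tx_offload layout:
 * all header lengths plus the TSO segment size pack into one 64-bit word
 * (7 + 9 + 8 + 16 + 8 + 16 = 64 bits), so it can be copied or zeroed as
 * a single uint64_t via the .data member. */
union tx_offload {
	uint64_t data;
	struct {
		uint64_t l2_len:7;        /* L2 (MAC) header length */
		uint64_t l3_len:9;        /* L3 (IP) header length */
		uint64_t l4_len:8;        /* L4 header length */
		uint64_t tso_segsz:16;    /* TCP TSO segment size */
		uint64_t outer_l2_len:8;  /* outer L2 header length */
		uint64_t outer_l3_len:16; /* outer L3 header length */
	};
};

/* Populate from typical mbuf metadata values. */
static inline union tx_offload
make_offload(uint8_t l2, uint16_t l3, uint8_t l4, uint16_t segsz)
{
	union tx_offload o = { .data = 0 };
	o.l2_len = l2;
	o.l3_len = l3;
	o.l4_len = l4;
	o.tso_segsz = segsz;
	return o;
}
```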
diff --git a/drivers/net/intel/ice/ice_rxtx_vec_avx2.c b/drivers/net/intel/ice/ice_rxtx_vec_avx2.c
index 0a1df0b2f6..2922671158 100644
--- a/drivers/net/intel/ice/ice_rxtx_vec_avx2.c
+++ b/drivers/net/intel/ice/ice_rxtx_vec_avx2.c
@@ -777,10 +777,9 @@ static __rte_always_inline void
 ice_vtx1(volatile struct ci_tx_desc *txdp,
 	 struct rte_mbuf *pkt, uint64_t flags, bool offload)
 {
-	uint64_t high_qw =
-		(ICE_TX_DESC_DTYPE_DATA |
-		 ((uint64_t)flags  << ICE_TXD_QW1_CMD_S) |
-		 ((uint64_t)pkt->data_len << ICE_TXD_QW1_TX_BUF_SZ_S));
+	uint64_t high_qw = (CI_TX_DESC_DTYPE_DATA |
+		 ((uint64_t)flags << CI_TXD_QW1_CMD_S) |
+		 ((uint64_t)pkt->data_len << CI_TXD_QW1_TX_BUF_SZ_S));
 	if (offload)
 		ice_txd_enable_offload(pkt, &high_qw);
 
@@ -792,8 +791,7 @@ static __rte_always_inline void
 ice_vtx(volatile struct ci_tx_desc *txdp,
 	struct rte_mbuf **pkt, uint16_t nb_pkts,  uint64_t flags, bool offload)
 {
-	const uint64_t hi_qw_tmpl = (ICE_TX_DESC_DTYPE_DATA |
-			((uint64_t)flags  << ICE_TXD_QW1_CMD_S));
+	const uint64_t hi_qw_tmpl = (CI_TX_DESC_DTYPE_DATA | (flags << CI_TXD_QW1_CMD_S));
 
 	/* if unaligned on 32-bit boundary, do one to align */
 	if (((uintptr_t)txdp & 0x1F) != 0 && nb_pkts != 0) {
@@ -801,30 +799,22 @@ ice_vtx(volatile struct ci_tx_desc *txdp,
 		nb_pkts--, txdp++, pkt++;
 	}
 
-	/* do two at a time while possible, in bursts */
+	/* do four at a time while possible, in bursts */
 	for (; nb_pkts > 3; txdp += 4, pkt += 4, nb_pkts -= 4) {
-		uint64_t hi_qw3 =
-			hi_qw_tmpl |
-			((uint64_t)pkt[3]->data_len <<
-			 ICE_TXD_QW1_TX_BUF_SZ_S);
+		uint64_t hi_qw3 = hi_qw_tmpl |
+			((uint64_t)pkt[3]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
 		if (offload)
 			ice_txd_enable_offload(pkt[3], &hi_qw3);
-		uint64_t hi_qw2 =
-			hi_qw_tmpl |
-			((uint64_t)pkt[2]->data_len <<
-			 ICE_TXD_QW1_TX_BUF_SZ_S);
+		uint64_t hi_qw2 = hi_qw_tmpl |
+			((uint64_t)pkt[2]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
 		if (offload)
 			ice_txd_enable_offload(pkt[2], &hi_qw2);
-		uint64_t hi_qw1 =
-			hi_qw_tmpl |
-			((uint64_t)pkt[1]->data_len <<
-			 ICE_TXD_QW1_TX_BUF_SZ_S);
+		uint64_t hi_qw1 = hi_qw_tmpl |
+			((uint64_t)pkt[1]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
 		if (offload)
 			ice_txd_enable_offload(pkt[1], &hi_qw1);
-		uint64_t hi_qw0 =
-			hi_qw_tmpl |
-			((uint64_t)pkt[0]->data_len <<
-			 ICE_TXD_QW1_TX_BUF_SZ_S);
+		uint64_t hi_qw0 = hi_qw_tmpl |
+			((uint64_t)pkt[0]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
 		if (offload)
 			ice_txd_enable_offload(pkt[0], &hi_qw0);
 
@@ -856,7 +846,7 @@ ice_xmit_fixed_burst_vec_avx2(void *tx_queue, struct rte_mbuf **tx_pkts,
 	struct ci_tx_entry_vec *txep;
 	uint16_t n, nb_commit, tx_id;
 	uint64_t flags = ICE_TD_CMD;
-	uint64_t rs = ICE_TX_DESC_CMD_RS | ICE_TD_CMD;
+	uint64_t rs = CI_TX_DESC_CMD_RS | ICE_TD_CMD;
 
 	/* cross rx_thresh boundary is not allowed */
 	nb_pkts = RTE_MIN(nb_pkts, txq->tx_rs_thresh);
@@ -901,8 +891,7 @@ ice_xmit_fixed_burst_vec_avx2(void *tx_queue, struct rte_mbuf **tx_pkts,
 	tx_id = (uint16_t)(tx_id + nb_commit);
 	if (tx_id > txq->tx_next_rs) {
 		txq->ci_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
-			rte_cpu_to_le_64(((uint64_t)ICE_TX_DESC_CMD_RS) <<
-					 ICE_TXD_QW1_CMD_S);
+			rte_cpu_to_le_64(((uint64_t)CI_TX_DESC_CMD_RS) << CI_TXD_QW1_CMD_S);
 		txq->tx_next_rs =
 			(uint16_t)(txq->tx_next_rs + txq->tx_rs_thresh);
 	}
diff --git a/drivers/net/intel/ice/ice_rxtx_vec_avx512.c b/drivers/net/intel/ice/ice_rxtx_vec_avx512.c
index d42f41461f..e64b6e227b 100644
--- a/drivers/net/intel/ice/ice_rxtx_vec_avx512.c
+++ b/drivers/net/intel/ice/ice_rxtx_vec_avx512.c
@@ -850,10 +850,9 @@ static __rte_always_inline void
 ice_vtx1(volatile struct ci_tx_desc *txdp,
 	 struct rte_mbuf *pkt, uint64_t flags, bool do_offload)
 {
-	uint64_t high_qw =
-		(ICE_TX_DESC_DTYPE_DATA |
-		 ((uint64_t)flags  << ICE_TXD_QW1_CMD_S) |
-		 ((uint64_t)pkt->data_len << ICE_TXD_QW1_TX_BUF_SZ_S));
+	uint64_t high_qw = (CI_TX_DESC_DTYPE_DATA |
+		 ((uint64_t)flags << CI_TXD_QW1_CMD_S) |
+		 ((uint64_t)pkt->data_len << CI_TXD_QW1_TX_BUF_SZ_S));
 
 	if (do_offload)
 		ice_txd_enable_offload(pkt, &high_qw);
@@ -866,32 +865,23 @@ static __rte_always_inline void
 ice_vtx(volatile struct ci_tx_desc *txdp, struct rte_mbuf **pkt,
 	uint16_t nb_pkts,  uint64_t flags, bool do_offload)
 {
-	const uint64_t hi_qw_tmpl = (ICE_TX_DESC_DTYPE_DATA |
-			((uint64_t)flags  << ICE_TXD_QW1_CMD_S));
+	const uint64_t hi_qw_tmpl = (CI_TX_DESC_DTYPE_DATA | (flags << CI_TXD_QW1_CMD_S));
 
 	for (; nb_pkts > 3; txdp += 4, pkt += 4, nb_pkts -= 4) {
-		uint64_t hi_qw3 =
-			hi_qw_tmpl |
-			((uint64_t)pkt[3]->data_len <<
-			 ICE_TXD_QW1_TX_BUF_SZ_S);
+		uint64_t hi_qw3 = hi_qw_tmpl |
+			((uint64_t)pkt[3]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
 		if (do_offload)
 			ice_txd_enable_offload(pkt[3], &hi_qw3);
-		uint64_t hi_qw2 =
-			hi_qw_tmpl |
-			((uint64_t)pkt[2]->data_len <<
-			 ICE_TXD_QW1_TX_BUF_SZ_S);
+		uint64_t hi_qw2 = hi_qw_tmpl |
+			((uint64_t)pkt[2]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
 		if (do_offload)
 			ice_txd_enable_offload(pkt[2], &hi_qw2);
-		uint64_t hi_qw1 =
-			hi_qw_tmpl |
-			((uint64_t)pkt[1]->data_len <<
-			 ICE_TXD_QW1_TX_BUF_SZ_S);
+		uint64_t hi_qw1 = hi_qw_tmpl |
+			((uint64_t)pkt[1]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
 		if (do_offload)
 			ice_txd_enable_offload(pkt[1], &hi_qw1);
-		uint64_t hi_qw0 =
-			hi_qw_tmpl |
-			((uint64_t)pkt[0]->data_len <<
-			 ICE_TXD_QW1_TX_BUF_SZ_S);
+		uint64_t hi_qw0 = hi_qw_tmpl |
+			((uint64_t)pkt[0]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
 		if (do_offload)
 			ice_txd_enable_offload(pkt[0], &hi_qw0);
 
@@ -920,7 +910,7 @@ ice_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pkts,
 	struct ci_tx_entry_vec *txep;
 	uint16_t n, nb_commit, tx_id;
 	uint64_t flags = ICE_TD_CMD;
-	uint64_t rs = ICE_TX_DESC_CMD_RS | ICE_TD_CMD;
+	uint64_t rs = CI_TX_DESC_CMD_RS | ICE_TD_CMD;
 
 	/* cross rx_thresh boundary is not allowed */
 	nb_pkts = RTE_MIN(nb_pkts, txq->tx_rs_thresh);
@@ -966,8 +956,7 @@ ice_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pkts,
 	tx_id = (uint16_t)(tx_id + nb_commit);
 	if (tx_id > txq->tx_next_rs) {
 		txq->ci_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
-			rte_cpu_to_le_64(((uint64_t)ICE_TX_DESC_CMD_RS) <<
-					 ICE_TXD_QW1_CMD_S);
+			rte_cpu_to_le_64(((uint64_t)CI_TX_DESC_CMD_RS) << CI_TXD_QW1_CMD_S);
 		txq->tx_next_rs =
 			(uint16_t)(txq->tx_next_rs + txq->tx_rs_thresh);
 	}
diff --git a/drivers/net/intel/ice/ice_rxtx_vec_common.h b/drivers/net/intel/ice/ice_rxtx_vec_common.h
index 8ba591e403..1d83a087cc 100644
--- a/drivers/net/intel/ice/ice_rxtx_vec_common.h
+++ b/drivers/net/intel/ice/ice_rxtx_vec_common.h
@@ -12,8 +12,8 @@ static inline int
 ice_tx_desc_done(struct ci_tx_queue *txq, uint16_t idx)
 {
 	return (txq->ci_tx_ring[idx].cmd_type_offset_bsz &
-			rte_cpu_to_le_64(ICE_TXD_QW1_DTYPE_M)) ==
-				rte_cpu_to_le_64(ICE_TX_DESC_DTYPE_DESC_DONE);
+			rte_cpu_to_le_64(CI_TXD_QW1_DTYPE_M)) ==
+				rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE);
 }
 
 static inline void
@@ -124,53 +124,52 @@ ice_txd_enable_offload(struct rte_mbuf *tx_pkt,
 	/* Tx Checksum Offload */
 	/* SET MACLEN */
 	td_offset |= (tx_pkt->l2_len >> 1) <<
-		ICE_TX_DESC_LEN_MACLEN_S;
+		CI_TX_DESC_LEN_MACLEN_S;
 
 	/* Enable L3 checksum offload */
 	if (ol_flags & RTE_MBUF_F_TX_IP_CKSUM) {
-		td_cmd |= ICE_TX_DESC_CMD_IIPT_IPV4_CSUM;
+		td_cmd |= CI_TX_DESC_CMD_IIPT_IPV4_CSUM;
 		td_offset |= (tx_pkt->l3_len >> 2) <<
-			ICE_TX_DESC_LEN_IPLEN_S;
+			CI_TX_DESC_LEN_IPLEN_S;
 	} else if (ol_flags & RTE_MBUF_F_TX_IPV4) {
-		td_cmd |= ICE_TX_DESC_CMD_IIPT_IPV4;
+		td_cmd |= CI_TX_DESC_CMD_IIPT_IPV4;
 		td_offset |= (tx_pkt->l3_len >> 2) <<
-			ICE_TX_DESC_LEN_IPLEN_S;
+			CI_TX_DESC_LEN_IPLEN_S;
 	} else if (ol_flags & RTE_MBUF_F_TX_IPV6) {
-		td_cmd |= ICE_TX_DESC_CMD_IIPT_IPV6;
+		td_cmd |= CI_TX_DESC_CMD_IIPT_IPV6;
 		td_offset |= (tx_pkt->l3_len >> 2) <<
-			ICE_TX_DESC_LEN_IPLEN_S;
+			CI_TX_DESC_LEN_IPLEN_S;
 	}
 
 	/* Enable L4 checksum offloads */
 	switch (ol_flags & RTE_MBUF_F_TX_L4_MASK) {
 	case RTE_MBUF_F_TX_TCP_CKSUM:
-		td_cmd |= ICE_TX_DESC_CMD_L4T_EOFT_TCP;
+		td_cmd |= CI_TX_DESC_CMD_L4T_EOFT_TCP;
 		td_offset |= (sizeof(struct rte_tcp_hdr) >> 2) <<
-			ICE_TX_DESC_LEN_L4_LEN_S;
+			CI_TX_DESC_LEN_L4_LEN_S;
 		break;
 	case RTE_MBUF_F_TX_SCTP_CKSUM:
-		td_cmd |= ICE_TX_DESC_CMD_L4T_EOFT_SCTP;
+		td_cmd |= CI_TX_DESC_CMD_L4T_EOFT_SCTP;
 		td_offset |= (sizeof(struct rte_sctp_hdr) >> 2) <<
-			ICE_TX_DESC_LEN_L4_LEN_S;
+			CI_TX_DESC_LEN_L4_LEN_S;
 		break;
 	case RTE_MBUF_F_TX_UDP_CKSUM:
-		td_cmd |= ICE_TX_DESC_CMD_L4T_EOFT_UDP;
+		td_cmd |= CI_TX_DESC_CMD_L4T_EOFT_UDP;
 		td_offset |= (sizeof(struct rte_udp_hdr) >> 2) <<
-			ICE_TX_DESC_LEN_L4_LEN_S;
+			CI_TX_DESC_LEN_L4_LEN_S;
 		break;
 	default:
 		break;
 	}
 
-	*txd_hi |= ((uint64_t)td_offset) << ICE_TXD_QW1_OFFSET_S;
+	*txd_hi |= ((uint64_t)td_offset) << CI_TXD_QW1_OFFSET_S;
 
-	/* Tx VLAN insertion Offload */
+	/* Tx VLAN/QinQ insertion offload */
 	if (ol_flags & RTE_MBUF_F_TX_VLAN) {
-		td_cmd |= ICE_TX_DESC_CMD_IL2TAG1;
-		*txd_hi |= ((uint64_t)tx_pkt->vlan_tci <<
-				ICE_TXD_QW1_L2TAG1_S);
+		td_cmd |= CI_TX_DESC_CMD_IL2TAG1;
+		*txd_hi |= ((uint64_t)tx_pkt->vlan_tci << CI_TXD_QW1_L2TAG1_S);
 	}
 
-	*txd_hi |= ((uint64_t)td_cmd) << ICE_TXD_QW1_CMD_S;
+	*txd_hi |= ((uint64_t)td_cmd) << CI_TXD_QW1_CMD_S;
 }
 #endif
diff --git a/drivers/net/intel/idpf/idpf_common_rxtx.c b/drivers/net/intel/idpf/idpf_common_rxtx.c
index 23666539ab..587871b54a 100644
--- a/drivers/net/intel/idpf/idpf_common_rxtx.c
+++ b/drivers/net/intel/idpf/idpf_common_rxtx.c
@@ -271,7 +271,7 @@ idpf_qc_single_tx_queue_reset(struct ci_tx_queue *txq)
 	prev = (uint16_t)(txq->nb_tx_desc - 1);
 	for (i = 0; i < txq->nb_tx_desc; i++) {
 		txq->ci_tx_ring[i].cmd_type_offset_bsz =
-			rte_cpu_to_le_64(IDPF_TX_DESC_DTYPE_DESC_DONE);
+			rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE);
 		txe[i].mbuf =  NULL;
 		txe[i].last_id = i;
 		txe[prev].next_id = i;
@@ -849,7 +849,7 @@ idpf_calc_context_desc(uint64_t flags)
  */
 static inline void
 idpf_set_splitq_tso_ctx(struct rte_mbuf *mbuf,
-			union idpf_tx_offload tx_offload,
+			union ci_tx_offload tx_offload,
 			volatile union idpf_flex_tx_ctx_desc *ctx_desc)
 {
 	uint16_t cmd_dtype;
@@ -887,7 +887,7 @@ idpf_dp_splitq_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 	volatile struct idpf_flex_tx_sched_desc *txr;
 	volatile struct idpf_flex_tx_sched_desc *txd;
 	struct ci_tx_entry *sw_ring;
-	union idpf_tx_offload tx_offload = {0};
+	union ci_tx_offload tx_offload = {0};
 	struct ci_tx_entry *txe, *txn;
 	uint16_t nb_used, tx_id, sw_id;
 	struct rte_mbuf *tx_pkt;
@@ -1334,7 +1334,7 @@ idpf_dp_singleq_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 {
 	volatile struct ci_tx_desc *txd;
 	volatile struct ci_tx_desc *txr;
-	union idpf_tx_offload tx_offload = {0};
+	union ci_tx_offload tx_offload = {0};
 	struct ci_tx_entry *txe, *txn;
 	struct ci_tx_entry *sw_ring;
 	struct ci_tx_queue *txq;
@@ -1452,10 +1452,10 @@ idpf_dp_singleq_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 			slen = m_seg->data_len;
 			buf_dma_addr = rte_mbuf_data_iova(m_seg);
 			txd->buffer_addr = rte_cpu_to_le_64(buf_dma_addr);
-			txd->cmd_type_offset_bsz = rte_cpu_to_le_64(IDPF_TX_DESC_DTYPE_DATA |
-				((uint64_t)td_cmd  << IDPF_TXD_QW1_CMD_S) |
-				((uint64_t)td_offset << IDPF_TXD_QW1_OFFSET_S) |
-				((uint64_t)slen << IDPF_TXD_QW1_TX_BUF_SZ_S));
+			txd->cmd_type_offset_bsz = rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DATA |
+				((uint64_t)td_cmd << CI_TXD_QW1_CMD_S) |
+				((uint64_t)td_offset << CI_TXD_QW1_OFFSET_S) |
+				((uint64_t)slen << CI_TXD_QW1_TX_BUF_SZ_S));
 
 			txe->last_id = tx_last;
 			tx_id = txe->next_id;
@@ -1464,7 +1464,7 @@ idpf_dp_singleq_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 		} while (m_seg);
 
 		/* The last packet data descriptor needs End Of Packet (EOP) */
-		td_cmd |= IDPF_TX_DESC_CMD_EOP;
+		td_cmd |= CI_TX_DESC_CMD_EOP;
 		txq->nb_tx_used = (uint16_t)(txq->nb_tx_used + nb_used);
 		txq->nb_tx_free = (uint16_t)(txq->nb_tx_free - nb_used);
 
@@ -1473,13 +1473,13 @@ idpf_dp_singleq_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 			       "%4u (port=%d queue=%d)",
 			       tx_last, txq->port_id, txq->queue_id);
 
-			td_cmd |= IDPF_TX_DESC_CMD_RS;
+			td_cmd |= CI_TX_DESC_CMD_RS;
 
 			/* Update txq RS bit counters */
 			txq->nb_tx_used = 0;
 		}
 
-		txd->cmd_type_offset_bsz |= rte_cpu_to_le_16(td_cmd << IDPF_TXD_QW1_CMD_S);
+		txd->cmd_type_offset_bsz |= rte_cpu_to_le_64(((uint64_t)td_cmd) << CI_TXD_QW1_CMD_S);
 	}
 
 end_of_tx:
diff --git a/drivers/net/intel/idpf/idpf_common_rxtx.h b/drivers/net/intel/idpf/idpf_common_rxtx.h
index 2f2fa153b2..b88a87402d 100644
--- a/drivers/net/intel/idpf/idpf_common_rxtx.h
+++ b/drivers/net/intel/idpf/idpf_common_rxtx.h
@@ -169,18 +169,6 @@ struct idpf_rx_queue {
 	uint32_t hw_register_set;
 };
 
-/* Offload features */
-union idpf_tx_offload {
-	uint64_t data;
-	struct {
-		uint64_t l2_len:7; /* L2 (MAC) Header Length. */
-		uint64_t l3_len:9; /* L3 (IP) Header Length. */
-		uint64_t l4_len:8; /* L4 Header Length. */
-		uint64_t tso_segsz:16; /* TCP TSO segment size */
-		/* uint64_t unused : 24; */
-	};
-};
-
 union idpf_tx_desc {
 	struct ci_tx_desc *tx_ring;
 	struct idpf_flex_tx_sched_desc *desc_ring;
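The idpf union dropped above is the same idea as the ice one, minus the outer-header fields. The other consumer of the new common definitions in these hunks is the descriptor-count calculation; it can be illustrated standalone, assuming CI_MAX_DATA_PER_TXD keeps the (16K - 1) byte per-descriptor limit that the removed ICE_MAX_DATA_PER_TXD encoded (the struct below is a hypothetical stand-in for the mbuf segment chain):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* HW limits each data descriptor to (16K - 1) bytes; assumed value of
 * CI_MAX_DATA_PER_TXD, i.e. CI_TXD_QW1_TX_BUF_SZ_M >> CI_TXD_QW1_TX_BUF_SZ_S. */
#define MAX_DATA_PER_TXD ((16u * 1024u) - 1u)

#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Simplified mbuf segment chain, standing in for struct rte_mbuf. */
struct seg {
	uint16_t data_len;
	struct seg *next;
};

/* Count the descriptors a chained packet needs: one per MAX_DATA_PER_TXD
 * chunk of each segment, as ice_calc_pkt_desc() does for the TSO path. */
static inline uint16_t
calc_pkt_desc(const struct seg *s)
{
	uint16_t count = 0;

	while (s != NULL) {
		count += DIV_ROUND_UP(s->data_len, MAX_DATA_PER_TXD);
		s = s->next;
	}
	return count;
}
```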
diff --git a/drivers/net/intel/idpf/idpf_common_rxtx_avx2.c b/drivers/net/intel/idpf/idpf_common_rxtx_avx2.c
index 04efee3722..411b171b97 100644
--- a/drivers/net/intel/idpf/idpf_common_rxtx_avx2.c
+++ b/drivers/net/intel/idpf/idpf_common_rxtx_avx2.c
@@ -486,10 +486,9 @@ static inline void
 idpf_singleq_vtx1(volatile struct ci_tx_desc *txdp,
 		  struct rte_mbuf *pkt, uint64_t flags)
 {
-	uint64_t high_qw =
-		(IDPF_TX_DESC_DTYPE_DATA |
-		 ((uint64_t)flags  << IDPF_TXD_QW1_CMD_S) |
-		 ((uint64_t)pkt->data_len << IDPF_TXD_QW1_TX_BUF_SZ_S));
+	uint64_t high_qw = (CI_TX_DESC_DTYPE_DATA |
+		 ((uint64_t)flags << CI_TXD_QW1_CMD_S) |
+		 ((uint64_t)pkt->data_len << CI_TXD_QW1_TX_BUF_SZ_S));
 
 	__m128i descriptor = _mm_set_epi64x(high_qw,
 				pkt->buf_iova + pkt->data_off);
@@ -500,8 +499,7 @@ static inline void
 idpf_singleq_vtx(volatile struct ci_tx_desc *txdp,
 		 struct rte_mbuf **pkt, uint16_t nb_pkts,  uint64_t flags)
 {
-	const uint64_t hi_qw_tmpl = (IDPF_TX_DESC_DTYPE_DATA |
-			((uint64_t)flags  << IDPF_TXD_QW1_CMD_S));
+	const uint64_t hi_qw_tmpl = (CI_TX_DESC_DTYPE_DATA | (flags << CI_TXD_QW1_CMD_S));
 
 	/* if unaligned on 32-bit boundary, do one to align */
 	if (((uintptr_t)txdp & 0x1F) != 0 && nb_pkts != 0) {
@@ -511,22 +509,14 @@ idpf_singleq_vtx(volatile struct ci_tx_desc *txdp,
 
-	/* do two at a time while possible, in bursts */
+	/* do four at a time while possible, in bursts */
 	for (; nb_pkts > 3; txdp += 4, pkt += 4, nb_pkts -= 4) {
-		uint64_t hi_qw3 =
-			hi_qw_tmpl |
-			((uint64_t)pkt[3]->data_len <<
-			 IDPF_TXD_QW1_TX_BUF_SZ_S);
-		uint64_t hi_qw2 =
-			hi_qw_tmpl |
-			((uint64_t)pkt[2]->data_len <<
-			 IDPF_TXD_QW1_TX_BUF_SZ_S);
-		uint64_t hi_qw1 =
-			hi_qw_tmpl |
-			((uint64_t)pkt[1]->data_len <<
-			 IDPF_TXD_QW1_TX_BUF_SZ_S);
-		uint64_t hi_qw0 =
-			hi_qw_tmpl |
-			((uint64_t)pkt[0]->data_len <<
-			 IDPF_TXD_QW1_TX_BUF_SZ_S);
+		uint64_t hi_qw3 = hi_qw_tmpl |
+			((uint64_t)pkt[3]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
+		uint64_t hi_qw2 = hi_qw_tmpl |
+			((uint64_t)pkt[2]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
+		uint64_t hi_qw1 = hi_qw_tmpl |
+			((uint64_t)pkt[1]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
+		uint64_t hi_qw0 = hi_qw_tmpl |
+			((uint64_t)pkt[0]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
 
 		__m256i desc2_3 =
 			_mm256_set_epi64x
@@ -559,8 +549,8 @@ idpf_singleq_xmit_fixed_burst_vec_avx2(void *tx_queue, struct rte_mbuf **tx_pkts
 	volatile struct ci_tx_desc *txdp;
 	struct ci_tx_entry_vec *txep;
 	uint16_t n, nb_commit, tx_id;
-	uint64_t flags = IDPF_TX_DESC_CMD_EOP;
-	uint64_t rs = IDPF_TX_DESC_CMD_RS | flags;
+	uint64_t flags = CI_TX_DESC_CMD_EOP;
+	uint64_t rs = CI_TX_DESC_CMD_RS | flags;
 
 	/* cross rx_thresh boundary is not allowed */
 	nb_pkts = RTE_MIN(nb_pkts, txq->tx_rs_thresh);
@@ -605,8 +595,7 @@ idpf_singleq_xmit_fixed_burst_vec_avx2(void *tx_queue, struct rte_mbuf **tx_pkts
 	tx_id = (uint16_t)(tx_id + nb_commit);
 	if (tx_id > txq->tx_next_rs) {
 		txq->ci_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
-			rte_cpu_to_le_64(((uint64_t)IDPF_TX_DESC_CMD_RS) <<
-					 IDPF_TXD_QW1_CMD_S);
+			rte_cpu_to_le_64(((uint64_t)CI_TX_DESC_CMD_RS) << CI_TXD_QW1_CMD_S);
 		txq->tx_next_rs =
 			(uint16_t)(txq->tx_next_rs + txq->tx_rs_thresh);
 	}
diff --git a/drivers/net/intel/idpf/idpf_common_rxtx_avx512.c b/drivers/net/intel/idpf/idpf_common_rxtx_avx512.c
index d5e5a2ca5f..49ace35615 100644
--- a/drivers/net/intel/idpf/idpf_common_rxtx_avx512.c
+++ b/drivers/net/intel/idpf/idpf_common_rxtx_avx512.c
@@ -1003,10 +1003,9 @@ static __rte_always_inline void
 idpf_singleq_vtx1(volatile struct ci_tx_desc *txdp,
 	  struct rte_mbuf *pkt, uint64_t flags)
 {
-	uint64_t high_qw =
-		(IDPF_TX_DESC_DTYPE_DATA |
-		 ((uint64_t)flags  << IDPF_TXD_QW1_CMD_S) |
-		 ((uint64_t)pkt->data_len << IDPF_TXD_QW1_TX_BUF_SZ_S));
+	uint64_t high_qw = (CI_TX_DESC_DTYPE_DATA |
+		 ((uint64_t)flags << CI_TXD_QW1_CMD_S) |
+		 ((uint64_t)pkt->data_len << CI_TXD_QW1_TX_BUF_SZ_S));
 
 	__m128i descriptor = _mm_set_epi64x(high_qw,
 					    pkt->buf_iova + pkt->data_off);
@@ -1019,8 +1018,7 @@ static __rte_always_inline void
 idpf_singleq_vtx(volatile struct ci_tx_desc *txdp,
 	 struct rte_mbuf **pkt, uint16_t nb_pkts,  uint64_t flags)
 {
-	const uint64_t hi_qw_tmpl = (IDPF_TX_DESC_DTYPE_DATA  |
-			((uint64_t)flags  << IDPF_TXD_QW1_CMD_S));
+	const uint64_t hi_qw_tmpl = (CI_TX_DESC_DTYPE_DATA | (flags << CI_TXD_QW1_CMD_S));
 
 	/* if unaligned on 32-bit boundary, do one to align */
 	if (((uintptr_t)txdp & 0x1F) != 0 && nb_pkts != 0) {
@@ -1030,22 +1028,14 @@ idpf_singleq_vtx(volatile struct ci_tx_desc *txdp,
 
 	/* do 4 at a time while possible, in bursts */
 	for (; nb_pkts > 3; txdp += 4, pkt += 4, nb_pkts -= 4) {
-		uint64_t hi_qw3 =
-			hi_qw_tmpl |
-			((uint64_t)pkt[3]->data_len <<
-			 IDPF_TXD_QW1_TX_BUF_SZ_S);
-		uint64_t hi_qw2 =
-			hi_qw_tmpl |
-			((uint64_t)pkt[2]->data_len <<
-			 IDPF_TXD_QW1_TX_BUF_SZ_S);
-		uint64_t hi_qw1 =
-			hi_qw_tmpl |
-			((uint64_t)pkt[1]->data_len <<
-			 IDPF_TXD_QW1_TX_BUF_SZ_S);
-		uint64_t hi_qw0 =
-			hi_qw_tmpl |
-			((uint64_t)pkt[0]->data_len <<
-			 IDPF_TXD_QW1_TX_BUF_SZ_S);
+		uint64_t hi_qw3 = hi_qw_tmpl |
+			((uint64_t)pkt[3]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
+		uint64_t hi_qw2 = hi_qw_tmpl |
+			((uint64_t)pkt[2]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
+		uint64_t hi_qw1 = hi_qw_tmpl |
+			((uint64_t)pkt[1]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
+		uint64_t hi_qw0 = hi_qw_tmpl |
+			((uint64_t)pkt[0]->data_len << CI_TXD_QW1_TX_BUF_SZ_S);
 
 		__m512i desc0_3 =
 			_mm512_set_epi64
@@ -1075,8 +1065,8 @@ idpf_singleq_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pk
 	volatile struct ci_tx_desc *txdp;
 	struct ci_tx_entry_vec *txep;
 	uint16_t n, nb_commit, tx_id;
-	uint64_t flags = IDPF_TX_DESC_CMD_EOP;
-	uint64_t rs = IDPF_TX_DESC_CMD_RS | flags;
+	uint64_t flags = CI_TX_DESC_CMD_EOP;
+	uint64_t rs = CI_TX_DESC_CMD_RS | flags;
 
 	/* cross rx_thresh boundary is not allowed */
 	nb_pkts = RTE_MIN(nb_pkts, txq->tx_rs_thresh);
@@ -1124,8 +1114,7 @@ idpf_singleq_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pk
 	tx_id = (uint16_t)(tx_id + nb_commit);
 	if (tx_id > txq->tx_next_rs) {
 		txq->ci_tx_ring[txq->tx_next_rs].cmd_type_offset_bsz |=
-			rte_cpu_to_le_64(((uint64_t)IDPF_TX_DESC_CMD_RS) <<
-					 IDPF_TXD_QW1_CMD_S);
+			rte_cpu_to_le_64(((uint64_t)CI_TX_DESC_CMD_RS) << CI_TXD_QW1_CMD_S);
 		txq->tx_next_rs =
 			(uint16_t)(txq->tx_next_rs + txq->tx_rs_thresh);
 	}
diff --git a/drivers/net/intel/idpf/idpf_rxtx_vec_common.h b/drivers/net/intel/idpf/idpf_rxtx_vec_common.h
index b5e8574667..a43d8f78e2 100644
--- a/drivers/net/intel/idpf/idpf_rxtx_vec_common.h
+++ b/drivers/net/intel/idpf/idpf_rxtx_vec_common.h
@@ -32,8 +32,8 @@ idpf_tx_desc_done(struct ci_tx_queue *txq, uint16_t idx)
 		return 1;
 
 	return (txq->ci_tx_ring[idx].cmd_type_offset_bsz &
-			rte_cpu_to_le_64(IDPF_TXD_QW1_DTYPE_M)) ==
-				rte_cpu_to_le_64(IDPF_TX_DESC_DTYPE_DESC_DONE);
+			rte_cpu_to_le_64(CI_TXD_QW1_DTYPE_M)) ==
+				rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE);
 }
 
 static inline int
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [PATCH v5 06/35] net/intel: add common fn to calculate needed descriptors
  2026-02-11 18:12 ` [PATCH v5 00/35] combine multiple Intel scalar Tx paths Bruce Richardson
                     ` (4 preceding siblings ...)
  2026-02-11 18:12   ` [PATCH v5 05/35] net/intel: consolidate definitions for Tx desc fields Bruce Richardson
@ 2026-02-11 18:12   ` Bruce Richardson
  2026-02-11 18:12   ` [PATCH v5 07/35] net/ice: refactor context descriptor handling Bruce Richardson
                     ` (29 subsequent siblings)
  35 siblings, 0 replies; 274+ messages in thread
From: Bruce Richardson @ 2026-02-11 18:12 UTC (permalink / raw)
  To: dev
  Cc: Bruce Richardson, Anatoly Burakov, Vladimir Medvedkin,
	Jingjing Wu, Praveen Shetty

Multiple drivers used the same logic to calculate how many Tx data
descriptors each packet needs. Move that calculation to common code.
While updating the drivers, fix the idpf driver's calculation for the
TSO case.
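The descriptor-count logic being consolidated is easy to demonstrate standalone. This is a minimal sketch, not the driver code: the struct and constant below are simplified stand-ins for rte_mbuf and CI_MAX_DATA_PER_TXD (assumed here to be 16KB per descriptor).

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

/* Simplified stand-in for rte_mbuf: only the fields the calculation reads. */
struct mbuf {
	uint16_t data_len;
	struct mbuf *next;
};

/* Stand-in for CI_MAX_DATA_PER_TXD: assumed max bytes per data descriptor. */
#define MAX_DATA_PER_TXD 16384

static uint16_t div_roundup16(uint16_t x, uint16_t y)
{
	return (uint16_t)((x + y - 1) / y);
}

/* Each segment needs ceil(data_len / max) descriptors; sum over the chain.
 * For TSO, a single segment can exceed MAX_DATA_PER_TXD, which is why
 * nb_segs alone undercounts - the bug the patch fixes for idpf. */
static uint16_t calc_pkt_desc(const struct mbuf *pkt)
{
	uint16_t count = 0;

	while (pkt != NULL) {
		count += div_roundup16(pkt->data_len, MAX_DATA_PER_TXD);
		pkt = pkt->next;
	}
	return count;
}
```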

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 drivers/net/intel/common/tx_scalar.h      | 21 +++++++++++++++++++++
 drivers/net/intel/i40e/i40e_rxtx.c        | 18 +-----------------
 drivers/net/intel/iavf/iavf_rxtx.c        | 17 +----------------
 drivers/net/intel/ice/ice_rxtx.c          | 18 +-----------------
 drivers/net/intel/idpf/idpf_common_rxtx.c | 21 +++++++++++++++++----
 5 files changed, 41 insertions(+), 54 deletions(-)

diff --git a/drivers/net/intel/common/tx_scalar.h b/drivers/net/intel/common/tx_scalar.h
index 6f2024273b..573f5136a9 100644
--- a/drivers/net/intel/common/tx_scalar.h
+++ b/drivers/net/intel/common/tx_scalar.h
@@ -59,4 +59,25 @@ ci_tx_xmit_cleanup(struct ci_tx_queue *txq)
 	return 0;
 }
 
+static inline uint16_t
+ci_div_roundup16(uint16_t x, uint16_t y)
+{
+	return (uint16_t)((x + y - 1) / y);
+}
+
+/* Calculate the number of TX descriptors needed for each pkt */
+static inline uint16_t
+ci_calc_pkt_desc(const struct rte_mbuf *tx_pkt)
+{
+	uint16_t count = 0;
+
+	while (tx_pkt != NULL) {
+		count += ci_div_roundup16(tx_pkt->data_len, CI_MAX_DATA_PER_TXD);
+		tx_pkt = tx_pkt->next;
+	}
+
+	return count;
+}
+
+
 #endif /* _COMMON_INTEL_TX_SCALAR_H_ */
diff --git a/drivers/net/intel/i40e/i40e_rxtx.c b/drivers/net/intel/i40e/i40e_rxtx.c
index f96c5c7f1e..b75306931a 100644
--- a/drivers/net/intel/i40e/i40e_rxtx.c
+++ b/drivers/net/intel/i40e/i40e_rxtx.c
@@ -1029,21 +1029,6 @@ i40e_set_tso_ctx(struct rte_mbuf *mbuf, union ci_tx_offload tx_offload)
 	return ctx_desc;
 }
 
-/* Calculate the number of TX descriptors needed for each pkt */
-static inline uint16_t
-i40e_calc_pkt_desc(struct rte_mbuf *tx_pkt)
-{
-	struct rte_mbuf *txd = tx_pkt;
-	uint16_t count = 0;
-
-	while (txd != NULL) {
-		count += DIV_ROUND_UP(txd->data_len, CI_MAX_DATA_PER_TXD);
-		txd = txd->next;
-	}
-
-	return count;
-}
-
 uint16_t
 i40e_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 {
@@ -1106,8 +1091,7 @@ i40e_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 		 * per tx desc.
 		 */
 		if (ol_flags & RTE_MBUF_F_TX_TCP_SEG)
-			nb_used = (uint16_t)(i40e_calc_pkt_desc(tx_pkt) +
-					     nb_ctx);
+			nb_used = (uint16_t)(ci_calc_pkt_desc(tx_pkt) + nb_ctx);
 		else
 			nb_used = (uint16_t)(tx_pkt->nb_segs + nb_ctx);
 		tx_last = (uint16_t)(tx_id + nb_used - 1);
diff --git a/drivers/net/intel/iavf/iavf_rxtx.c b/drivers/net/intel/iavf/iavf_rxtx.c
index 947b6c24d2..885d9309cc 100644
--- a/drivers/net/intel/iavf/iavf_rxtx.c
+++ b/drivers/net/intel/iavf/iavf_rxtx.c
@@ -2666,21 +2666,6 @@ iavf_build_data_desc_cmd_offset_fields(volatile uint64_t *qw1,
 		((uint64_t)l2tag1 << IAVF_TXD_DATA_QW1_L2TAG1_SHIFT));
 }
 
-/* Calculate the number of TX descriptors needed for each pkt */
-static inline uint16_t
-iavf_calc_pkt_desc(struct rte_mbuf *tx_pkt)
-{
-	struct rte_mbuf *txd = tx_pkt;
-	uint16_t count = 0;
-
-	while (txd != NULL) {
-		count += (txd->data_len + CI_MAX_DATA_PER_TXD - 1) / CI_MAX_DATA_PER_TXD;
-		txd = txd->next;
-	}
-
-	return count;
-}
-
 static inline void
 iavf_fill_data_desc(volatile struct ci_tx_desc *desc,
 	uint64_t desc_template,	uint16_t buffsz,
@@ -2766,7 +2751,7 @@ iavf_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 		 * per tx desc.
 		 */
 		if (mb->ol_flags & RTE_MBUF_F_TX_TCP_SEG)
-			nb_desc_required = iavf_calc_pkt_desc(mb) + nb_desc_ctx + nb_desc_ipsec;
+			nb_desc_required = ci_calc_pkt_desc(mb) + nb_desc_ctx + nb_desc_ipsec;
 		else
 			nb_desc_required = nb_desc_data + nb_desc_ctx + nb_desc_ipsec;
 
diff --git a/drivers/net/intel/ice/ice_rxtx.c b/drivers/net/intel/ice/ice_rxtx.c
index 52bbf95967..2a53b614b2 100644
--- a/drivers/net/intel/ice/ice_rxtx.c
+++ b/drivers/net/intel/ice/ice_rxtx.c
@@ -3075,21 +3075,6 @@ ice_set_tso_ctx(struct rte_mbuf *mbuf, union ci_tx_offload tx_offload)
 	return ctx_desc;
 }
 
-/* Calculate the number of TX descriptors needed for each pkt */
-static inline uint16_t
-ice_calc_pkt_desc(struct rte_mbuf *tx_pkt)
-{
-	struct rte_mbuf *txd = tx_pkt;
-	uint16_t count = 0;
-
-	while (txd != NULL) {
-		count += DIV_ROUND_UP(txd->data_len, CI_MAX_DATA_PER_TXD);
-		txd = txd->next;
-	}
-
-	return count;
-}
-
 uint16_t
 ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 {
@@ -3152,8 +3137,7 @@ ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 		 * per tx desc.
 		 */
 		if (ol_flags & (RTE_MBUF_F_TX_TCP_SEG | RTE_MBUF_F_TX_UDP_SEG))
-			nb_used = (uint16_t)(ice_calc_pkt_desc(tx_pkt) +
-					     nb_ctx);
+			nb_used = (uint16_t)(ci_calc_pkt_desc(tx_pkt) + nb_ctx);
 		else
 			nb_used = (uint16_t)(tx_pkt->nb_segs + nb_ctx);
 		tx_last = (uint16_t)(tx_id + nb_used - 1);
diff --git a/drivers/net/intel/idpf/idpf_common_rxtx.c b/drivers/net/intel/idpf/idpf_common_rxtx.c
index 587871b54a..11d6848430 100644
--- a/drivers/net/intel/idpf/idpf_common_rxtx.c
+++ b/drivers/net/intel/idpf/idpf_common_rxtx.c
@@ -934,7 +934,16 @@ idpf_dp_splitq_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 		tx_offload.tso_segsz = tx_pkt->tso_segsz;
 		/* Calculate the number of context descriptors needed. */
 		nb_ctx = idpf_calc_context_desc(ol_flags);
-		nb_used = tx_pkt->nb_segs + nb_ctx;
+
+		/* Calculate the number of TX descriptors needed for
+		 * each packet. For TSO packets, use ci_calc_pkt_desc as
+		 * the mbuf data size might exceed max data size that hw allows
+		 * per tx desc.
+		 */
+		if (ol_flags & RTE_MBUF_F_TX_TCP_SEG)
+			nb_used = ci_calc_pkt_desc(tx_pkt) + nb_ctx;
+		else
+			nb_used = tx_pkt->nb_segs + nb_ctx;
 
 		if (ol_flags & IDPF_TX_CKSUM_OFFLOAD_MASK)
 			cmd_dtype = IDPF_TXD_FLEX_FLOW_CMD_CS_EN;
@@ -1382,10 +1391,14 @@ idpf_dp_singleq_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 		nb_ctx = idpf_calc_context_desc(ol_flags);
 
 		/* The number of descriptors that must be allocated for
-		 * a packet equals to the number of the segments of that
-		 * packet plus 1 context descriptor if needed.
+		 * a packet. For TSO packets, use ci_calc_pkt_desc as
+		 * the mbuf data size might exceed max data size that hw allows
+		 * per tx desc.
 		 */
-		nb_used = (uint16_t)(tx_pkt->nb_segs + nb_ctx);
+		if (ol_flags & RTE_MBUF_F_TX_TCP_SEG)
+			nb_used = (uint16_t)(ci_calc_pkt_desc(tx_pkt) + nb_ctx);
+		else
+			nb_used = (uint16_t)(tx_pkt->nb_segs + nb_ctx);
 		tx_last = (uint16_t)(tx_id + nb_used - 1);
 
 		/* Circular ring */
-- 
2.51.0



* [PATCH v5 07/35] net/ice: refactor context descriptor handling
  2026-02-11 18:12 ` [PATCH v5 00/35] combine multiple Intel scalar Tx paths Bruce Richardson
                     ` (5 preceding siblings ...)
  2026-02-11 18:12   ` [PATCH v5 06/35] net/intel: add common fn to calculate needed descriptors Bruce Richardson
@ 2026-02-11 18:12   ` Bruce Richardson
  2026-02-12 12:16     ` Burakov, Anatoly
  2026-02-11 18:12   ` [PATCH v5 08/35] net/i40e: " Bruce Richardson
                     ` (28 subsequent siblings)
  35 siblings, 1 reply; 274+ messages in thread
From: Bruce Richardson @ 2026-02-11 18:12 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson, Anatoly Burakov

Create a single function to manage all context descriptor handling.
It returns 0 or 1 depending on whether a context descriptor is needed,
and, when one is, also returns the descriptor contents directly.
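The return-count-plus-out-params pattern described above can be sketched in isolation. The flag and dtype values below are hypothetical placeholders, not the real RTE_MBUF_F_TX_* or ICE_TX_DESC_DTYPE_CTX values; the point is the shape of the API, where one call both reserves ring space (via the return value) and produces the slot contents.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag and dtype bits; real drivers use RTE_MBUF_F_TX_* etc. */
#define F_TSO     (1ULL << 0)
#define F_QINQ    (1ULL << 1)
#define DTYPE_CTX 0x1ULL

/* Return the number of context descriptors needed (0 or 1), writing the
 * descriptor quadwords through qw0/qw1 when one is needed. The caller
 * adds the return value to its descriptor budget and, if nonzero,
 * copies qw0/qw1 straight into the ring slot. */
static uint16_t get_context_desc(uint64_t ol_flags, uint16_t vlan_outer,
		uint64_t *qw0, uint64_t *qw1)
{
	uint64_t cmd = DTYPE_CTX;
	uint16_t l2tag2 = 0;

	if ((ol_flags & (F_TSO | F_QINQ)) == 0)
		return 0; /* no context descriptor needed */

	if (ol_flags & F_QINQ)
		l2tag2 = vlan_outer; /* outer VLAN tag rides in qw0 bits 32..47 */

	*qw0 = (uint64_t)l2tag2 << 32;
	*qw1 = cmd;
	return 1;
}
```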

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/net/intel/ice/ice_rxtx.c | 104 +++++++++++++++++--------------
 1 file changed, 57 insertions(+), 47 deletions(-)

diff --git a/drivers/net/intel/ice/ice_rxtx.c b/drivers/net/intel/ice/ice_rxtx.c
index 2a53b614b2..1c789d45da 100644
--- a/drivers/net/intel/ice/ice_rxtx.c
+++ b/drivers/net/intel/ice/ice_rxtx.c
@@ -2966,10 +2966,6 @@ ice_txd_enable_checksum(uint64_t ol_flags,
 			uint32_t *td_offset,
 			union ci_tx_offload tx_offload)
 {
-	/* Set MACLEN */
-	if (!(ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK))
-		*td_offset |= (tx_offload.l2_len >> 1)
-			<< CI_TX_DESC_LEN_MACLEN_S;
 
 	/* Enable L3 checksum offloads */
 	if (ol_flags & RTE_MBUF_F_TX_IP_CKSUM) {
@@ -3052,7 +3048,7 @@ ice_calc_context_desc(uint64_t flags)
 
 /* set ice TSO context descriptor */
 static inline uint64_t
-ice_set_tso_ctx(struct rte_mbuf *mbuf, union ci_tx_offload tx_offload)
+ice_set_tso_ctx(uint64_t ol_flags, const struct rte_mbuf *mbuf, union ci_tx_offload tx_offload)
 {
 	uint64_t ctx_desc = 0;
 	uint32_t cd_cmd, hdr_len, cd_tso_len;
@@ -3063,7 +3059,7 @@ ice_set_tso_ctx(struct rte_mbuf *mbuf, union ci_tx_offload tx_offload)
 	}
 
 	hdr_len = tx_offload.l2_len + tx_offload.l3_len + tx_offload.l4_len;
-	hdr_len += (mbuf->ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK) ?
+	hdr_len += (ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK) ?
 		   tx_offload.outer_l2_len + tx_offload.outer_l3_len : 0;
 
 	cd_cmd = CI_TX_CTX_DESC_TSO;
@@ -3075,6 +3071,49 @@ ice_set_tso_ctx(struct rte_mbuf *mbuf, union ci_tx_offload tx_offload)
 	return ctx_desc;
 }
 
+/* compute a context descriptor if one is necessary based on the ol_flags
+ *
+ * Returns 0 if no descriptor is necessary.
+ * Returns 1 if one is necessary and the contents of the descriptor are returned
+ *   in the values pointed to by qw0 and qw1.
+ */
+static __rte_always_inline uint16_t
+get_context_desc(uint64_t ol_flags, const struct rte_mbuf *tx_pkt,
+	const union ci_tx_offload *tx_offload, const struct ci_tx_queue *txq,
+	uint64_t *qw0, uint64_t *qw1)
+{
+	uint16_t cd_l2tag2 = 0;
+	uint64_t cd_type_cmd_tso_mss = ICE_TX_DESC_DTYPE_CTX;
+	uint32_t cd_tunneling_params = 0;
+	uint64_t ptp_tx_index = txq->ice_vsi->adapter->ptp_tx_index;
+
+	if (ice_calc_context_desc(ol_flags) == 0)
+		return 0;
+
+	if (ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK)
+		ice_parse_tunneling_params(ol_flags, *tx_offload, &cd_tunneling_params);
+
+	if (ol_flags & (RTE_MBUF_F_TX_TCP_SEG | RTE_MBUF_F_TX_UDP_SEG))
+		cd_type_cmd_tso_mss |= ice_set_tso_ctx(ol_flags, tx_pkt, *tx_offload);
+	else if (ol_flags & RTE_MBUF_F_TX_IEEE1588_TMST)
+		cd_type_cmd_tso_mss |=
+			((uint64_t)CI_TX_CTX_DESC_TSYN << CI_TXD_QW1_CMD_S) |
+			((ptp_tx_index << ICE_TXD_CTX_QW1_TSYN_S) & ICE_TXD_CTX_QW1_TSYN_M);
+
+
+	/* TX context descriptor based double VLAN insert */
+	if (ol_flags & RTE_MBUF_F_TX_QINQ) {
+		cd_l2tag2 = tx_pkt->vlan_tci_outer;
+		cd_type_cmd_tso_mss |= ((uint64_t)CI_TX_CTX_DESC_IL2TAG2 << CI_TXD_QW1_CMD_S);
+	}
+
+	*qw0 = rte_cpu_to_le_32(cd_tunneling_params) |
+		((uint64_t)rte_cpu_to_le_16(cd_l2tag2) << 32);
+	*qw1 = rte_cpu_to_le_64(cd_type_cmd_tso_mss);
+
+	return 1;
+}
+
 uint16_t
 ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 {
@@ -3085,7 +3124,6 @@ ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 	struct ci_tx_entry *txe, *txn;
 	struct rte_mbuf *tx_pkt;
 	struct rte_mbuf *m_seg;
-	uint32_t cd_tunneling_params;
 	uint16_t tx_id;
 	uint16_t ts_id = -1;
 	uint16_t nb_tx;
@@ -3096,6 +3134,7 @@ ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 	uint32_t td_tag = 0;
 	uint16_t tx_last;
 	uint16_t slen;
+	uint16_t l2_len;
 	uint64_t buf_dma_addr;
 	uint64_t ol_flags;
 	union ci_tx_offload tx_offload = {0};
@@ -3114,20 +3153,25 @@ ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 		(void)ci_tx_xmit_cleanup(txq);
 
 	for (nb_tx = 0; nb_tx < nb_pkts; nb_tx++) {
+		uint64_t cd_qw0, cd_qw1;
 		tx_pkt = *tx_pkts++;
 
+		ol_flags = tx_pkt->ol_flags;
 		td_cmd = 0;
 		td_tag = 0;
-		td_offset = 0;
-		ol_flags = tx_pkt->ol_flags;
+		l2_len = ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK ?
+				tx_pkt->outer_l2_len : tx_pkt->l2_len;
+		td_offset = (l2_len >> 1) << CI_TX_DESC_LEN_MACLEN_S;
+
 		tx_offload.l2_len = tx_pkt->l2_len;
 		tx_offload.l3_len = tx_pkt->l3_len;
 		tx_offload.outer_l2_len = tx_pkt->outer_l2_len;
 		tx_offload.outer_l3_len = tx_pkt->outer_l3_len;
 		tx_offload.l4_len = tx_pkt->l4_len;
 		tx_offload.tso_segsz = tx_pkt->tso_segsz;
+
 		/* Calculate the number of context descriptors needed. */
-		nb_ctx = ice_calc_context_desc(ol_flags);
+		nb_ctx = get_context_desc(ol_flags, tx_pkt, &tx_offload, txq, &cd_qw0, &cd_qw1);
 
 		/* The number of descriptors that must be allocated for
 		 * a packet equals to the number of the segments of that
@@ -3169,15 +3213,6 @@ ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 			td_tag = tx_pkt->vlan_tci;
 		}
 
-		/* Fill in tunneling parameters if necessary */
-		cd_tunneling_params = 0;
-		if (ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK) {
-			td_offset |= (tx_offload.outer_l2_len >> 1)
-				<< CI_TX_DESC_LEN_MACLEN_S;
-			ice_parse_tunneling_params(ol_flags, tx_offload,
-						   &cd_tunneling_params);
-		}
-
 		/* Enable checksum offloading */
 		if (ol_flags & ICE_TX_CKSUM_OFFLOAD_MASK)
 			ice_txd_enable_checksum(ol_flags, &td_cmd,
@@ -3185,11 +3220,7 @@ ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 
 		if (nb_ctx) {
 			/* Setup TX context descriptor if required */
-			volatile struct ice_tx_ctx_desc *ctx_txd =
-				(volatile struct ice_tx_ctx_desc *)
-					&ci_tx_ring[tx_id];
-			uint16_t cd_l2tag2 = 0;
-			uint64_t cd_type_cmd_tso_mss = ICE_TX_DESC_DTYPE_CTX;
+			uint64_t *ctx_txd = RTE_CAST_PTR(uint64_t *, &ci_tx_ring[tx_id]);
 
 			txn = &sw_ring[txe->next_id];
 			RTE_MBUF_PREFETCH_TO_FREE(txn->mbuf);
@@ -3198,29 +3229,8 @@ ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 				txe->mbuf = NULL;
 			}
 
-			if (ol_flags & (RTE_MBUF_F_TX_TCP_SEG | RTE_MBUF_F_TX_UDP_SEG))
-				cd_type_cmd_tso_mss |=
-					ice_set_tso_ctx(tx_pkt, tx_offload);
-			else if (ol_flags & RTE_MBUF_F_TX_IEEE1588_TMST)
-				cd_type_cmd_tso_mss |=
-					((uint64_t)CI_TX_CTX_DESC_TSYN <<
-					CI_TXD_QW1_CMD_S) |
-					 (((uint64_t)txq->ice_vsi->adapter->ptp_tx_index <<
-					 ICE_TXD_CTX_QW1_TSYN_S) & ICE_TXD_CTX_QW1_TSYN_M);
-
-			ctx_txd->tunneling_params =
-				rte_cpu_to_le_32(cd_tunneling_params);
-
-			/* TX context descriptor based double VLAN insert */
-			if (ol_flags & RTE_MBUF_F_TX_QINQ) {
-				cd_l2tag2 = tx_pkt->vlan_tci_outer;
-				cd_type_cmd_tso_mss |=
-					((uint64_t)CI_TX_CTX_DESC_IL2TAG2 <<
-					 CI_TXD_QW1_CMD_S);
-			}
-			ctx_txd->l2tag2 = rte_cpu_to_le_16(cd_l2tag2);
-			ctx_txd->qw1 =
-				rte_cpu_to_le_64(cd_type_cmd_tso_mss);
+			ctx_txd[0] = cd_qw0;
+			ctx_txd[1] = cd_qw1;
 
 			txe->last_id = tx_last;
 			tx_id = txe->next_id;
-- 
2.51.0



* [PATCH v5 08/35] net/i40e: refactor context descriptor handling
  2026-02-11 18:12 ` [PATCH v5 00/35] combine multiple Intel scalar Tx paths Bruce Richardson
                     ` (6 preceding siblings ...)
  2026-02-11 18:12   ` [PATCH v5 07/35] net/ice: refactor context descriptor handling Bruce Richardson
@ 2026-02-11 18:12   ` Bruce Richardson
  2026-02-12 12:19     ` Burakov, Anatoly
  2026-02-11 18:12   ` [PATCH v5 09/35] net/idpf: " Bruce Richardson
                     ` (27 subsequent siblings)
  35 siblings, 1 reply; 274+ messages in thread
From: Bruce Richardson @ 2026-02-11 18:12 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson

Move all context descriptor handling to a single function, as with the
ice driver, and use the same function signature as that driver.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/net/intel/i40e/i40e_rxtx.c | 123 +++++++++++++++--------------
 1 file changed, 63 insertions(+), 60 deletions(-)

diff --git a/drivers/net/intel/i40e/i40e_rxtx.c b/drivers/net/intel/i40e/i40e_rxtx.c
index b75306931a..601d4b98f2 100644
--- a/drivers/net/intel/i40e/i40e_rxtx.c
+++ b/drivers/net/intel/i40e/i40e_rxtx.c
@@ -321,11 +321,6 @@ i40e_txd_enable_checksum(uint64_t ol_flags,
 			uint32_t *td_offset,
 			union ci_tx_offload tx_offload)
 {
-	/* Set MACLEN */
-	if (!(ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK))
-		*td_offset |= (tx_offload.l2_len >> 1)
-			<< CI_TX_DESC_LEN_MACLEN_S;
-
 	/* Enable L3 checksum offloads */
 	if (ol_flags & RTE_MBUF_F_TX_IP_CKSUM) {
 		*td_cmd |= CI_TX_DESC_CMD_IIPT_IPV4_CSUM;
@@ -1004,7 +999,7 @@ i40e_calc_context_desc(uint64_t flags)
 
 /* set i40e TSO context descriptor */
 static inline uint64_t
-i40e_set_tso_ctx(struct rte_mbuf *mbuf, union ci_tx_offload tx_offload)
+i40e_set_tso_ctx(uint64_t ol_flags, const struct rte_mbuf *mbuf, union ci_tx_offload tx_offload)
 {
 	uint64_t ctx_desc = 0;
 	uint32_t cd_cmd, hdr_len, cd_tso_len;
@@ -1015,7 +1010,7 @@ i40e_set_tso_ctx(struct rte_mbuf *mbuf, union ci_tx_offload tx_offload)
 	}
 
 	hdr_len = tx_offload.l2_len + tx_offload.l3_len + tx_offload.l4_len;
-	hdr_len += (mbuf->ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK) ?
+	hdr_len += (ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK) ?
 		   tx_offload.outer_l2_len + tx_offload.outer_l3_len : 0;
 
 	cd_cmd = I40E_TX_CTX_DESC_TSO;
@@ -1029,6 +1024,52 @@ i40e_set_tso_ctx(struct rte_mbuf *mbuf, union ci_tx_offload tx_offload)
 	return ctx_desc;
 }
 
+/* compute a context descriptor if one is necessary based on the ol_flags
+ *
+ * Returns 0 if no descriptor is necessary.
+ * Returns 1 if one is necessary and the contents of the descriptor are returned
+ *   in the values pointed to by qw0 and qw1.
+ */
+static __rte_always_inline uint16_t
+get_context_desc(uint64_t ol_flags, const struct rte_mbuf *tx_pkt,
+		 const union ci_tx_offload *tx_offload,
+		 const struct ci_tx_queue *txq __rte_unused,
+		 uint64_t *qw0, uint64_t *qw1)
+{
+	uint16_t cd_l2tag2 = 0;
+	uint64_t cd_type_cmd_tso_mss = I40E_TX_DESC_DTYPE_CONTEXT;
+	uint32_t cd_tunneling_params = 0;
+
+	if (i40e_calc_context_desc(ol_flags) == 0)
+		return 0;
+
+	if (ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK)
+		i40e_parse_tunneling_params(ol_flags, *tx_offload, &cd_tunneling_params);
+
+	if (ol_flags & RTE_MBUF_F_TX_TCP_SEG) {
+		cd_type_cmd_tso_mss |= i40e_set_tso_ctx(ol_flags, tx_pkt, *tx_offload);
+	} else {
+#ifdef RTE_LIBRTE_IEEE1588
+		if (ol_flags & RTE_MBUF_F_TX_IEEE1588_TMST)
+			cd_type_cmd_tso_mss |=
+				((uint64_t)I40E_TX_CTX_DESC_TSYN << I40E_TXD_CTX_QW1_CMD_SHIFT);
+#endif
+	}
+
+	/* TX context descriptor based double VLAN insert */
+	if (ol_flags & RTE_MBUF_F_TX_QINQ) {
+		cd_l2tag2 = tx_pkt->vlan_tci_outer;
+		cd_type_cmd_tso_mss |=
+				((uint64_t)I40E_TX_CTX_DESC_IL2TAG2 << I40E_TXD_CTX_QW1_CMD_SHIFT);
+	}
+
+	*qw0 = rte_cpu_to_le_32(cd_tunneling_params) |
+		((uint64_t)rte_cpu_to_le_16(cd_l2tag2) << 32);
+	*qw1 = rte_cpu_to_le_64(cd_type_cmd_tso_mss);
+
+	return 1;
+}
+
 uint16_t
 i40e_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 {
@@ -1039,7 +1080,6 @@ i40e_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 	volatile struct ci_tx_desc *txr;
 	struct rte_mbuf *tx_pkt;
 	struct rte_mbuf *m_seg;
-	uint32_t cd_tunneling_params;
 	uint16_t tx_id;
 	uint16_t nb_tx;
 	uint32_t td_cmd;
@@ -1050,6 +1090,7 @@ i40e_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 	uint16_t nb_ctx;
 	uint16_t tx_last;
 	uint16_t slen;
+	uint16_t l2_len;
 	uint64_t buf_dma_addr;
 	union ci_tx_offload tx_offload = {0};
 
@@ -1064,14 +1105,15 @@ i40e_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 		(void)ci_tx_xmit_cleanup(txq);
 
 	for (nb_tx = 0; nb_tx < nb_pkts; nb_tx++) {
-		td_cmd = 0;
-		td_tag = 0;
-		td_offset = 0;
-
 		tx_pkt = *tx_pkts++;
 		RTE_MBUF_PREFETCH_TO_FREE(txe->mbuf);
 
 		ol_flags = tx_pkt->ol_flags;
+		td_cmd = 0;
+		td_tag = 0;
+		l2_len = ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK ?
+				tx_pkt->outer_l2_len : tx_pkt->l2_len;
+		td_offset = (l2_len >> 1) << CI_TX_DESC_LEN_MACLEN_S;
 		tx_offload.l2_len = tx_pkt->l2_len;
 		tx_offload.l3_len = tx_pkt->l3_len;
 		tx_offload.outer_l2_len = tx_pkt->outer_l2_len;
@@ -1080,7 +1122,9 @@ i40e_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 		tx_offload.tso_segsz = tx_pkt->tso_segsz;
 
 		/* Calculate the number of context descriptors needed. */
-		nb_ctx = i40e_calc_context_desc(ol_flags);
+		uint64_t cd_qw0 = 0, cd_qw1 = 0;
+		nb_ctx = get_context_desc(ol_flags, tx_pkt, &tx_offload, txq,
+				&cd_qw0, &cd_qw1);
 
 		/**
 		 * The number of descriptors that must be allocated for
@@ -1126,14 +1170,6 @@ i40e_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 		/* Always enable CRC offload insertion */
 		td_cmd |= CI_TX_DESC_CMD_ICRC;
 
-		/* Fill in tunneling parameters if necessary */
-		cd_tunneling_params = 0;
-		if (ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK) {
-			td_offset |= (tx_offload.outer_l2_len >> 1)
-					<< CI_TX_DESC_LEN_MACLEN_S;
-			i40e_parse_tunneling_params(ol_flags, tx_offload,
-						    &cd_tunneling_params);
-		}
 		/* Enable checksum offloading */
 		if (ol_flags & I40E_TX_CKSUM_OFFLOAD_MASK)
 			i40e_txd_enable_checksum(ol_flags, &td_cmd,
@@ -1141,12 +1177,7 @@ i40e_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 
 		if (nb_ctx) {
 			/* Setup TX context descriptor if required */
-			volatile struct i40e_tx_context_desc *ctx_txd =
-				(volatile struct i40e_tx_context_desc *)\
-							&txr[tx_id];
-			uint16_t cd_l2tag2 = 0;
-			uint64_t cd_type_cmd_tso_mss =
-				I40E_TX_DESC_DTYPE_CONTEXT;
+			uint64_t *desc = RTE_CAST_PTR(uint64_t *, &txr[tx_id]);
 
 			txn = &sw_ring[txe->next_id];
 			RTE_MBUF_PREFETCH_TO_FREE(txn->mbuf);
@@ -1155,41 +1186,13 @@ i40e_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 				txe->mbuf = NULL;
 			}
 
-			/* TSO enabled means no timestamp */
-			if (ol_flags & RTE_MBUF_F_TX_TCP_SEG)
-				cd_type_cmd_tso_mss |=
-					i40e_set_tso_ctx(tx_pkt, tx_offload);
-			else {
-#ifdef RTE_LIBRTE_IEEE1588
-				if (ol_flags & RTE_MBUF_F_TX_IEEE1588_TMST)
-					cd_type_cmd_tso_mss |=
-						((uint64_t)I40E_TX_CTX_DESC_TSYN <<
-						 I40E_TXD_CTX_QW1_CMD_SHIFT);
-#endif
-			}
-
-			ctx_txd->tunneling_params =
-				rte_cpu_to_le_32(cd_tunneling_params);
-			if (ol_flags & RTE_MBUF_F_TX_QINQ) {
-				cd_l2tag2 = tx_pkt->vlan_tci_outer;
-				cd_type_cmd_tso_mss |=
-					((uint64_t)I40E_TX_CTX_DESC_IL2TAG2 <<
-						I40E_TXD_CTX_QW1_CMD_SHIFT);
-			}
-			ctx_txd->l2tag2 = rte_cpu_to_le_16(cd_l2tag2);
-			ctx_txd->type_cmd_tso_mss =
-				rte_cpu_to_le_64(cd_type_cmd_tso_mss);
+			desc[0] = cd_qw0;
+			desc[1] = cd_qw1;
 
 			PMD_TX_LOG(DEBUG, "mbuf: %p, TCD[%u]: "
-				"tunneling_params: %#x; "
-				"l2tag2: %#hx; "
-				"rsvd: %#hx; "
-				"type_cmd_tso_mss: %#"PRIx64";",
-				tx_pkt, tx_id,
-				ctx_txd->tunneling_params,
-				ctx_txd->l2tag2,
-				ctx_txd->rsvd,
-				ctx_txd->type_cmd_tso_mss);
+				"qw0: %#"PRIx64"; "
+				"qw1: %#"PRIx64";",
+				tx_pkt, tx_id, cd_qw0, cd_qw1);
 
 			txe->last_id = tx_last;
 			tx_id = txe->next_id;
-- 
2.51.0



* [PATCH v5 09/35] net/idpf: refactor context descriptor handling
  2026-02-11 18:12 ` [PATCH v5 00/35] combine multiple Intel scalar Tx paths Bruce Richardson
                     ` (7 preceding siblings ...)
  2026-02-11 18:12   ` [PATCH v5 08/35] net/i40e: " Bruce Richardson
@ 2026-02-11 18:12   ` Bruce Richardson
  2026-02-11 18:12   ` [PATCH v5 10/35] net/intel: consolidate checksum mask definition Bruce Richardson
                     ` (26 subsequent siblings)
  35 siblings, 0 replies; 274+ messages in thread
From: Bruce Richardson @ 2026-02-11 18:12 UTC (permalink / raw)
  To: dev
  Cc: Bruce Richardson, Ciara Loftus, Anatoly Burakov, Jingjing Wu,
	Praveen Shetty

Move all context descriptor handling to a single function, as with the
ice and i40e drivers.
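The qw0 packing performed by the reworked idpf_set_tso_ctx() can be sketched as plain bit arithmetic. This assumes a little-endian host (where the rte_cpu_to_le_* calls in the patch are no-ops) and treats IDPF_TXD_FLEX_CTX_MSS_RT_M as a 14-bit mask (0x3FFF); consult the real idpf headers for the authoritative value.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed value of IDPF_TXD_FLEX_CTX_MSS_RT_M: 14-bit field mask. */
#define CTX_MSS_RT_M 0x3FFFu

/* Mirror of the patch's qw0 layout: TSO payload length in the low bits,
 * MSS at bit 32, header length at bit 48. Endianness conversion omitted
 * for clarity (little-endian host assumed). */
static uint64_t pack_tso_qw0(uint32_t tso_len, uint16_t tso_segsz, uint8_t hdr_len)
{
	return (uint64_t)(tso_len & CTX_MSS_RT_M) |
	       ((uint64_t)(tso_segsz & CTX_MSS_RT_M) << 32) |
	       ((uint64_t)hdr_len << 48);
}
```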

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Ciara Loftus <ciara.loftus@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 drivers/net/intel/idpf/idpf_common_rxtx.c | 61 +++++++++++------------
 1 file changed, 28 insertions(+), 33 deletions(-)

diff --git a/drivers/net/intel/idpf/idpf_common_rxtx.c b/drivers/net/intel/idpf/idpf_common_rxtx.c
index 11d6848430..383cfc8567 100644
--- a/drivers/net/intel/idpf/idpf_common_rxtx.c
+++ b/drivers/net/intel/idpf/idpf_common_rxtx.c
@@ -845,37 +845,36 @@ idpf_calc_context_desc(uint64_t flags)
 	return 0;
 }
 
-/* set TSO context descriptor
+/* set TSO context descriptor, returns 0 if no context needed, 1 if context set
  */
-static inline void
-idpf_set_splitq_tso_ctx(struct rte_mbuf *mbuf,
+static inline uint16_t
+idpf_set_tso_ctx(uint64_t ol_flags, struct rte_mbuf *mbuf,
 			union ci_tx_offload tx_offload,
-			volatile union idpf_flex_tx_ctx_desc *ctx_desc)
+			uint64_t *qw0, uint64_t *qw1)
 {
-	uint16_t cmd_dtype;
+	uint16_t cmd_dtype = IDPF_TX_DESC_DTYPE_FLEX_TSO_CTX | IDPF_TX_FLEX_CTX_DESC_CMD_TSO;
+	uint16_t tso_segsz = mbuf->tso_segsz;
 	uint32_t tso_len;
 	uint8_t hdr_len;
 
+	if (idpf_calc_context_desc(ol_flags) == 0)
+		return 0;
+
+	/* TSO context descriptor setup */
 	if (tx_offload.l4_len == 0) {
 		TX_LOG(DEBUG, "L4 length set to 0");
-		return;
+		return 0;
 	}
 
-	hdr_len = tx_offload.l2_len +
-		tx_offload.l3_len +
-		tx_offload.l4_len;
-	cmd_dtype = IDPF_TX_DESC_DTYPE_FLEX_TSO_CTX |
-		IDPF_TX_FLEX_CTX_DESC_CMD_TSO;
+	hdr_len = tx_offload.l2_len + tx_offload.l3_len + tx_offload.l4_len;
 	tso_len = mbuf->pkt_len - hdr_len;
 
-	ctx_desc->tso.qw1.cmd_dtype = rte_cpu_to_le_16(cmd_dtype);
-	ctx_desc->tso.qw0.hdr_len = hdr_len;
-	ctx_desc->tso.qw0.mss_rt =
-		rte_cpu_to_le_16((uint16_t)mbuf->tso_segsz &
-				 IDPF_TXD_FLEX_CTX_MSS_RT_M);
-	ctx_desc->tso.qw0.flex_tlen =
-		rte_cpu_to_le_32(tso_len &
-				 IDPF_TXD_FLEX_CTX_MSS_RT_M);
+	*qw0 = rte_cpu_to_le_32(tso_len & IDPF_TXD_FLEX_CTX_MSS_RT_M) |
+	       ((uint64_t)rte_cpu_to_le_16(tso_segsz & IDPF_TXD_FLEX_CTX_MSS_RT_M) << 32) |
+	       ((uint64_t)hdr_len << 48);
+	*qw1 = rte_cpu_to_le_16(cmd_dtype);
+
+	return 1;
 }
 
 RTE_EXPORT_INTERNAL_SYMBOL(idpf_dp_splitq_xmit_pkts)
@@ -933,7 +932,8 @@ idpf_dp_splitq_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 		tx_offload.l4_len = tx_pkt->l4_len;
 		tx_offload.tso_segsz = tx_pkt->tso_segsz;
 		/* Calculate the number of context descriptors needed. */
-		nb_ctx = idpf_calc_context_desc(ol_flags);
+		uint64_t cd_qw0 = 0, cd_qw1 = 0;
+		nb_ctx = idpf_set_tso_ctx(ol_flags, tx_pkt, tx_offload, &cd_qw0, &cd_qw1);
 
 		/* Calculate the number of TX descriptors needed for
 		 * each packet. For TSO packets, use ci_calc_pkt_desc as
@@ -950,12 +950,10 @@ idpf_dp_splitq_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 
 		/* context descriptor */
 		if (nb_ctx != 0) {
-			volatile union idpf_flex_tx_ctx_desc *ctx_desc =
-				(volatile union idpf_flex_tx_ctx_desc *)&txr[tx_id];
+			uint64_t *ctx_desc = RTE_CAST_PTR(uint64_t *, &txr[tx_id]);
 
-			if ((ol_flags & RTE_MBUF_F_TX_TCP_SEG) != 0)
-				idpf_set_splitq_tso_ctx(tx_pkt, tx_offload,
-							ctx_desc);
+			ctx_desc[0] = cd_qw0;
+			ctx_desc[1] = cd_qw1;
 
 			tx_id++;
 			if (tx_id == txq->nb_tx_desc)
@@ -1388,7 +1386,8 @@ idpf_dp_singleq_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 		tx_offload.l4_len = tx_pkt->l4_len;
 		tx_offload.tso_segsz = tx_pkt->tso_segsz;
 		/* Calculate the number of context descriptors needed. */
-		nb_ctx = idpf_calc_context_desc(ol_flags);
+		uint64_t cd_qw0, cd_qw1;
+		nb_ctx = idpf_set_tso_ctx(ol_flags, tx_pkt, tx_offload, &cd_qw0, &cd_qw1);
 
 		/* The number of descriptors that must be allocated for
 		 * a packet. For TSO packets, use ci_calc_pkt_desc as
@@ -1431,9 +1430,7 @@ idpf_dp_singleq_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 
 		if (nb_ctx != 0) {
 			/* Setup TX context descriptor if required */
-			volatile union idpf_flex_tx_ctx_desc *ctx_txd =
-				(volatile union idpf_flex_tx_ctx_desc *)
-				&txr[tx_id];
+			uint64_t *ctx_txd = RTE_CAST_PTR(uint64_t *, &txr[tx_id]);
 
 			txn = &sw_ring[txe->next_id];
 			RTE_MBUF_PREFETCH_TO_FREE(txn->mbuf);
@@ -1442,10 +1439,8 @@ idpf_dp_singleq_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 				txe->mbuf = NULL;
 			}
 
-			/* TSO enabled */
-			if ((ol_flags & RTE_MBUF_F_TX_TCP_SEG) != 0)
-				idpf_set_splitq_tso_ctx(tx_pkt, tx_offload,
-							ctx_txd);
+			ctx_txd[0] = cd_qw0;
+			ctx_txd[1] = cd_qw1;
 
 			txe->last_id = tx_last;
 			tx_id = txe->next_id;
-- 
2.51.0



* [PATCH v5 10/35] net/intel: consolidate checksum mask definition
  2026-02-11 18:12 ` [PATCH v5 00/35] combine multiple Intel scalar Tx paths Bruce Richardson
                     ` (8 preceding siblings ...)
  2026-02-11 18:12   ` [PATCH v5 09/35] net/idpf: " Bruce Richardson
@ 2026-02-11 18:12   ` Bruce Richardson
  2026-02-11 18:12   ` [PATCH v5 11/35] net/intel: create common checksum Tx offload function Bruce Richardson
                     ` (25 subsequent siblings)
  35 siblings, 0 replies; 274+ messages in thread
From: Bruce Richardson @ 2026-02-11 18:12 UTC (permalink / raw)
  To: dev
  Cc: Bruce Richardson, Anatoly Burakov, Vladimir Medvedkin,
	Jingjing Wu, Praveen Shetty

Create a common definition for checksum masks across iavf, idpf, i40e
and ice drivers.
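As a rough, self-contained sketch of the idea (the flag values below are illustrative stand-ins, not the real RTE_MBUF_F_TX_* definitions from rte_mbuf), consolidating the per-driver masks into one union means a single branch can gate all checksum offload setup:

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in offload flag bits (hypothetical values for illustration). */
#define TX_IP_CKSUM        (1ULL << 0)
#define TX_L4_MASK         (3ULL << 1)   /* two-bit field: TCP/UDP/SCTP */
#define TX_TCP_SEG         (1ULL << 3)
#define TX_UDP_SEG         (1ULL << 4)
#define TX_OUTER_IP_CKSUM  (1ULL << 5)
#define TX_OUTER_UDP_CKSUM (1ULL << 6)

/* Union of every flag requesting a checksum offload, mirroring the
 * CI_TX_CKSUM_OFFLOAD_MASK added in this patch. */
#define TX_CKSUM_OFFLOAD_MASK (TX_IP_CKSUM | TX_L4_MASK | TX_TCP_SEG | \
			       TX_UDP_SEG | TX_OUTER_IP_CKSUM |        \
			       TX_OUTER_UDP_CKSUM)

/* One test against the union replaces the four per-driver masks. */
static inline int
needs_cksum_offload(uint64_t ol_flags)
{
	return (ol_flags & TX_CKSUM_OFFLOAD_MASK) != 0;
}
```

Note the common mask is a superset of some of the old per-driver masks (e.g. idpf's did not include the outer-checksum bits), which is harmless since the per-flag setup code still checks each flag individually.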

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 drivers/net/intel/common/tx.h             | 8 ++++++++
 drivers/net/intel/i40e/i40e_rxtx.c        | 7 +------
 drivers/net/intel/iavf/iavf_rxtx.c        | 2 +-
 drivers/net/intel/iavf/iavf_rxtx.h        | 8 --------
 drivers/net/intel/ice/ice_rxtx.c          | 8 +-------
 drivers/net/intel/idpf/idpf_common_rxtx.c | 4 ++--
 drivers/net/intel/idpf/idpf_common_rxtx.h | 7 +------
 7 files changed, 14 insertions(+), 30 deletions(-)

diff --git a/drivers/net/intel/common/tx.h b/drivers/net/intel/common/tx.h
index 091f220f1c..23deabc5d1 100644
--- a/drivers/net/intel/common/tx.h
+++ b/drivers/net/intel/common/tx.h
@@ -53,6 +53,14 @@
 /* Common maximum data per TX descriptor */
 #define CI_MAX_DATA_PER_TXD     (CI_TXD_QW1_TX_BUF_SZ_M >> CI_TXD_QW1_TX_BUF_SZ_S)
 
+/* Checksum offload mask to identify packets requesting offload */
+#define CI_TX_CKSUM_OFFLOAD_MASK (RTE_MBUF_F_TX_IP_CKSUM |		 \
+				   RTE_MBUF_F_TX_L4_MASK |		 \
+				   RTE_MBUF_F_TX_TCP_SEG |		 \
+				   RTE_MBUF_F_TX_UDP_SEG |		 \
+				   RTE_MBUF_F_TX_OUTER_IP_CKSUM |	 \
+				   RTE_MBUF_F_TX_OUTER_UDP_CKSUM)
+
 /**
  * Common TX offload union for Intel drivers.
  * Supports both basic offloads (l2_len, l3_len, l4_len, tso_segsz) and
diff --git a/drivers/net/intel/i40e/i40e_rxtx.c b/drivers/net/intel/i40e/i40e_rxtx.c
index 601d4b98f2..12a21407c5 100644
--- a/drivers/net/intel/i40e/i40e_rxtx.c
+++ b/drivers/net/intel/i40e/i40e_rxtx.c
@@ -53,11 +53,6 @@
 #define I40E_TX_IEEE1588_TMST 0
 #endif
 
-#define I40E_TX_CKSUM_OFFLOAD_MASK (RTE_MBUF_F_TX_IP_CKSUM |		 \
-		RTE_MBUF_F_TX_L4_MASK |		 \
-		RTE_MBUF_F_TX_TCP_SEG |		 \
-		RTE_MBUF_F_TX_OUTER_IP_CKSUM)
-
 #define I40E_TX_OFFLOAD_MASK (RTE_MBUF_F_TX_OUTER_IPV4 |	\
 		RTE_MBUF_F_TX_OUTER_IPV6 |	\
 		RTE_MBUF_F_TX_IPV4 |		\
@@ -1171,7 +1166,7 @@ i40e_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 		td_cmd |= CI_TX_DESC_CMD_ICRC;
 
 		/* Enable checksum offloading */
-		if (ol_flags & I40E_TX_CKSUM_OFFLOAD_MASK)
+		if (ol_flags & CI_TX_CKSUM_OFFLOAD_MASK)
 			i40e_txd_enable_checksum(ol_flags, &td_cmd,
 						 &td_offset, tx_offload);
 
diff --git a/drivers/net/intel/iavf/iavf_rxtx.c b/drivers/net/intel/iavf/iavf_rxtx.c
index 885d9309cc..3dbcfd5355 100644
--- a/drivers/net/intel/iavf/iavf_rxtx.c
+++ b/drivers/net/intel/iavf/iavf_rxtx.c
@@ -2596,7 +2596,7 @@ iavf_build_data_desc_cmd_offset_fields(volatile uint64_t *qw1,
 	}
 
 	if ((m->ol_flags &
-	    (IAVF_TX_CKSUM_OFFLOAD_MASK | RTE_MBUF_F_TX_SEC_OFFLOAD)) == 0)
+	    (CI_TX_CKSUM_OFFLOAD_MASK | RTE_MBUF_F_TX_SEC_OFFLOAD)) == 0)
 		goto skip_cksum;
 
 	/* Set MACLEN */
diff --git a/drivers/net/intel/iavf/iavf_rxtx.h b/drivers/net/intel/iavf/iavf_rxtx.h
index 395d97b4ee..cca5c25119 100644
--- a/drivers/net/intel/iavf/iavf_rxtx.h
+++ b/drivers/net/intel/iavf/iavf_rxtx.h
@@ -136,14 +136,6 @@
 
 #define IAVF_TX_MIN_PKT_LEN 17
 
-#define IAVF_TX_CKSUM_OFFLOAD_MASK (		 \
-		RTE_MBUF_F_TX_IP_CKSUM |		 \
-		RTE_MBUF_F_TX_L4_MASK |		 \
-		RTE_MBUF_F_TX_TCP_SEG |          \
-		RTE_MBUF_F_TX_UDP_SEG |          \
-		RTE_MBUF_F_TX_OUTER_IP_CKSUM |   \
-		RTE_MBUF_F_TX_OUTER_UDP_CKSUM)
-
 #define IAVF_TX_OFFLOAD_MASK (  \
 		RTE_MBUF_F_TX_OUTER_IPV6 |		 \
 		RTE_MBUF_F_TX_OUTER_IPV4 |		 \
diff --git a/drivers/net/intel/ice/ice_rxtx.c b/drivers/net/intel/ice/ice_rxtx.c
index 1c789d45da..63bce7bd9e 100644
--- a/drivers/net/intel/ice/ice_rxtx.c
+++ b/drivers/net/intel/ice/ice_rxtx.c
@@ -13,12 +13,6 @@
 #include "../common/rx_vec_x86.h"
 #endif
 
-#define ICE_TX_CKSUM_OFFLOAD_MASK (RTE_MBUF_F_TX_IP_CKSUM |		 \
-		RTE_MBUF_F_TX_L4_MASK |		 \
-		RTE_MBUF_F_TX_TCP_SEG |		 \
-		RTE_MBUF_F_TX_UDP_SEG |		 \
-		RTE_MBUF_F_TX_OUTER_IP_CKSUM)
-
 /**
  * The mbuf dynamic field pointer for protocol extraction metadata.
  */
@@ -3214,7 +3208,7 @@ ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 		}
 
 		/* Enable checksum offloading */
-		if (ol_flags & ICE_TX_CKSUM_OFFLOAD_MASK)
+		if (ol_flags & CI_TX_CKSUM_OFFLOAD_MASK)
 			ice_txd_enable_checksum(ol_flags, &td_cmd,
 						&td_offset, tx_offload);
 
diff --git a/drivers/net/intel/idpf/idpf_common_rxtx.c b/drivers/net/intel/idpf/idpf_common_rxtx.c
index 383cfc8567..c34dde2796 100644
--- a/drivers/net/intel/idpf/idpf_common_rxtx.c
+++ b/drivers/net/intel/idpf/idpf_common_rxtx.c
@@ -945,7 +945,7 @@ idpf_dp_splitq_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 		else
 			nb_used = tx_pkt->nb_segs + nb_ctx;
 
-		if (ol_flags & IDPF_TX_CKSUM_OFFLOAD_MASK)
+		if (ol_flags & CI_TX_CKSUM_OFFLOAD_MASK)
 			cmd_dtype = IDPF_TXD_FLEX_FLOW_CMD_CS_EN;
 
 		/* context descriptor */
@@ -1425,7 +1425,7 @@ idpf_dp_singleq_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 			}
 		}
 
-		if (ol_flags & IDPF_TX_CKSUM_OFFLOAD_MASK)
+		if (ol_flags & CI_TX_CKSUM_OFFLOAD_MASK)
 			td_cmd |= IDPF_TX_FLEX_DESC_CMD_CS_EN;
 
 		if (nb_ctx != 0) {
diff --git a/drivers/net/intel/idpf/idpf_common_rxtx.h b/drivers/net/intel/idpf/idpf_common_rxtx.h
index b88a87402d..fe7094d434 100644
--- a/drivers/net/intel/idpf/idpf_common_rxtx.h
+++ b/drivers/net/intel/idpf/idpf_common_rxtx.h
@@ -39,13 +39,8 @@
 #define IDPF_RLAN_CTX_DBUF_S	7
 #define IDPF_RX_MAX_DATA_BUF_SIZE	(16 * 1024 - 128)
 
-#define IDPF_TX_CKSUM_OFFLOAD_MASK (		\
-		RTE_MBUF_F_TX_IP_CKSUM |	\
-		RTE_MBUF_F_TX_L4_MASK |		\
-		RTE_MBUF_F_TX_TCP_SEG)
-
 #define IDPF_TX_OFFLOAD_MASK (			\
-		IDPF_TX_CKSUM_OFFLOAD_MASK |	\
+		CI_TX_CKSUM_OFFLOAD_MASK |	\
 		RTE_MBUF_F_TX_IPV4 |		\
 		RTE_MBUF_F_TX_IPV6)
 
-- 
2.51.0



* [PATCH v5 11/35] net/intel: create common checksum Tx offload function
  2026-02-11 18:12 ` [PATCH v5 00/35] combine multiple Intel scalar Tx paths Bruce Richardson
                     ` (9 preceding siblings ...)
  2026-02-11 18:12   ` [PATCH v5 10/35] net/intel: consolidate checksum mask definition Bruce Richardson
@ 2026-02-11 18:12   ` Bruce Richardson
  2026-02-11 18:12   ` [PATCH v5 12/35] net/intel: create a common scalar Tx function Bruce Richardson
                     ` (24 subsequent siblings)
  35 siblings, 0 replies; 274+ messages in thread
From: Bruce Richardson @ 2026-02-11 18:12 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson, Anatoly Burakov

Since i40e and ice share the same checksum offload logic, merge their
functions into one. Future rework should enable more drivers to use it
as well.
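The merged function packs header lengths into the descriptor offset field. As a self-contained sketch (the shift positions below are stand-ins; the real CI_TX_DESC_LEN_* values live in drivers/net/intel/common/tx.h), the descriptor encodes header lengths in 4-byte words, so byte lengths are divided by four before being shifted into place, just as ci_txd_enable_checksum() does:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical shift positions for the IP and L4 length fields. */
#define LEN_IPLEN_S   7
#define LEN_L4_LEN_S 14

/* Pack L3/L4 header lengths (in bytes) into a td_offset-style word,
 * converting to 4-byte-word units first. */
static inline uint32_t
encode_len_offsets(uint16_t l3_len, uint16_t l4_len)
{
	uint32_t td_offset = 0;

	td_offset |= (uint32_t)(l3_len >> 2) << LEN_IPLEN_S;
	td_offset |= (uint32_t)(l4_len >> 2) << LEN_L4_LEN_S;
	return td_offset;
}
```

This is why the fixed-size cases in the common function use e.g. `sizeof(struct rte_tcp_hdr) >> 2` while the TSO path uses the mbuf's `l4_len >> 2`.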

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 drivers/net/intel/common/tx_scalar.h | 50 +++++++++++++++++++++++
 drivers/net/intel/i40e/i40e_rxtx.c   | 52 +-----------------------
 drivers/net/intel/i40e/i40e_rxtx.h   |  1 +
 drivers/net/intel/ice/ice_rxtx.c     | 60 +---------------------------
 drivers/net/intel/ice/ice_rxtx.h     |  1 +
 5 files changed, 54 insertions(+), 110 deletions(-)

diff --git a/drivers/net/intel/common/tx_scalar.h b/drivers/net/intel/common/tx_scalar.h
index 573f5136a9..9b19d56ea6 100644
--- a/drivers/net/intel/common/tx_scalar.h
+++ b/drivers/net/intel/common/tx_scalar.h
@@ -59,6 +59,56 @@ ci_tx_xmit_cleanup(struct ci_tx_queue *txq)
 	return 0;
 }
 
+/* Common checksum enable function for Intel drivers (ice, i40e, etc.) */
+static inline void
+ci_txd_enable_checksum(uint64_t ol_flags,
+		       uint32_t *td_cmd,
+		       uint32_t *td_offset,
+		       union ci_tx_offload tx_offload)
+{
+	/* Enable L3 checksum offloads */
+	if (ol_flags & RTE_MBUF_F_TX_IP_CKSUM) {
+		*td_cmd |= CI_TX_DESC_CMD_IIPT_IPV4_CSUM;
+		*td_offset |= (tx_offload.l3_len >> 2) << CI_TX_DESC_LEN_IPLEN_S;
+	} else if (ol_flags & RTE_MBUF_F_TX_IPV4) {
+		*td_cmd |= CI_TX_DESC_CMD_IIPT_IPV4;
+		*td_offset |= (tx_offload.l3_len >> 2) << CI_TX_DESC_LEN_IPLEN_S;
+	} else if (ol_flags & RTE_MBUF_F_TX_IPV6) {
+		*td_cmd |= CI_TX_DESC_CMD_IIPT_IPV6;
+		*td_offset |= (tx_offload.l3_len >> 2) << CI_TX_DESC_LEN_IPLEN_S;
+	}
+
+	if (ol_flags & RTE_MBUF_F_TX_TCP_SEG) {
+		*td_cmd |= CI_TX_DESC_CMD_L4T_EOFT_TCP;
+		*td_offset |= (tx_offload.l4_len >> 2) << CI_TX_DESC_LEN_L4_LEN_S;
+		return;
+	}
+
+	if (ol_flags & RTE_MBUF_F_TX_UDP_SEG) {
+		*td_cmd |= CI_TX_DESC_CMD_L4T_EOFT_UDP;
+		*td_offset |= (tx_offload.l4_len >> 2) << CI_TX_DESC_LEN_L4_LEN_S;
+		return;
+	}
+
+	/* Enable L4 checksum offloads */
+	switch (ol_flags & RTE_MBUF_F_TX_L4_MASK) {
+	case RTE_MBUF_F_TX_TCP_CKSUM:
+		*td_cmd |= CI_TX_DESC_CMD_L4T_EOFT_TCP;
+		*td_offset |= (sizeof(struct rte_tcp_hdr) >> 2) << CI_TX_DESC_LEN_L4_LEN_S;
+		break;
+	case RTE_MBUF_F_TX_SCTP_CKSUM:
+		*td_cmd |= CI_TX_DESC_CMD_L4T_EOFT_SCTP;
+		*td_offset |= (sizeof(struct rte_sctp_hdr) >> 2) << CI_TX_DESC_LEN_L4_LEN_S;
+		break;
+	case RTE_MBUF_F_TX_UDP_CKSUM:
+		*td_cmd |= CI_TX_DESC_CMD_L4T_EOFT_UDP;
+		*td_offset |= (sizeof(struct rte_udp_hdr) >> 2) << CI_TX_DESC_LEN_L4_LEN_S;
+		break;
+	default:
+		break;
+	}
+}
+
 static inline uint16_t
 ci_div_roundup16(uint16_t x, uint16_t y)
 {
diff --git a/drivers/net/intel/i40e/i40e_rxtx.c b/drivers/net/intel/i40e/i40e_rxtx.c
index 12a21407c5..c318b4c84e 100644
--- a/drivers/net/intel/i40e/i40e_rxtx.c
+++ b/drivers/net/intel/i40e/i40e_rxtx.c
@@ -310,56 +310,6 @@ i40e_parse_tunneling_params(uint64_t ol_flags,
 		*cd_tunneling |= I40E_TXD_CTX_QW0_L4T_CS_MASK;
 }
 
-static inline void
-i40e_txd_enable_checksum(uint64_t ol_flags,
-			uint32_t *td_cmd,
-			uint32_t *td_offset,
-			union ci_tx_offload tx_offload)
-{
-	/* Enable L3 checksum offloads */
-	if (ol_flags & RTE_MBUF_F_TX_IP_CKSUM) {
-		*td_cmd |= CI_TX_DESC_CMD_IIPT_IPV4_CSUM;
-		*td_offset |= (tx_offload.l3_len >> 2)
-				<< CI_TX_DESC_LEN_IPLEN_S;
-	} else if (ol_flags & RTE_MBUF_F_TX_IPV4) {
-		*td_cmd |= CI_TX_DESC_CMD_IIPT_IPV4;
-		*td_offset |= (tx_offload.l3_len >> 2)
-				<< CI_TX_DESC_LEN_IPLEN_S;
-	} else if (ol_flags & RTE_MBUF_F_TX_IPV6) {
-		*td_cmd |= CI_TX_DESC_CMD_IIPT_IPV6;
-		*td_offset |= (tx_offload.l3_len >> 2)
-				<< CI_TX_DESC_LEN_IPLEN_S;
-	}
-
-	if (ol_flags & RTE_MBUF_F_TX_TCP_SEG) {
-		*td_cmd |= CI_TX_DESC_CMD_L4T_EOFT_TCP;
-		*td_offset |= (tx_offload.l4_len >> 2)
-			<< CI_TX_DESC_LEN_L4_LEN_S;
-		return;
-	}
-
-	/* Enable L4 checksum offloads */
-	switch (ol_flags & RTE_MBUF_F_TX_L4_MASK) {
-	case RTE_MBUF_F_TX_TCP_CKSUM:
-		*td_cmd |= CI_TX_DESC_CMD_L4T_EOFT_TCP;
-		*td_offset |= (sizeof(struct rte_tcp_hdr) >> 2) <<
-				CI_TX_DESC_LEN_L4_LEN_S;
-		break;
-	case RTE_MBUF_F_TX_SCTP_CKSUM:
-		*td_cmd |= CI_TX_DESC_CMD_L4T_EOFT_SCTP;
-		*td_offset |= (sizeof(struct rte_sctp_hdr) >> 2) <<
-				CI_TX_DESC_LEN_L4_LEN_S;
-		break;
-	case RTE_MBUF_F_TX_UDP_CKSUM:
-		*td_cmd |= CI_TX_DESC_CMD_L4T_EOFT_UDP;
-		*td_offset |= (sizeof(struct rte_udp_hdr) >> 2) <<
-				CI_TX_DESC_LEN_L4_LEN_S;
-		break;
-	default:
-		break;
-	}
-}
-
 /* Construct the tx flags */
 static inline uint64_t
 i40e_build_ctob(uint32_t td_cmd,
@@ -1167,7 +1117,7 @@ i40e_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 
 		/* Enable checksum offloading */
 		if (ol_flags & CI_TX_CKSUM_OFFLOAD_MASK)
-			i40e_txd_enable_checksum(ol_flags, &td_cmd,
+			ci_txd_enable_checksum(ol_flags, &td_cmd,
 						 &td_offset, tx_offload);
 
 		if (nb_ctx) {
diff --git a/drivers/net/intel/i40e/i40e_rxtx.h b/drivers/net/intel/i40e/i40e_rxtx.h
index 307ffa3049..db8525d52d 100644
--- a/drivers/net/intel/i40e/i40e_rxtx.h
+++ b/drivers/net/intel/i40e/i40e_rxtx.h
@@ -100,6 +100,7 @@ enum i40e_header_split_mode {
 		RTE_ETH_TX_OFFLOAD_SCTP_CKSUM |		\
 		RTE_ETH_TX_OFFLOAD_OUTER_IPV4_CKSUM |	\
 		RTE_ETH_TX_OFFLOAD_TCP_TSO |		\
+		RTE_ETH_TX_OFFLOAD_UDP_TSO |		\
 		RTE_ETH_TX_OFFLOAD_VXLAN_TNL_TSO |	\
 		RTE_ETH_TX_OFFLOAD_GRE_TNL_TSO |	\
 		RTE_ETH_TX_OFFLOAD_IPIP_TNL_TSO |	\
diff --git a/drivers/net/intel/ice/ice_rxtx.c b/drivers/net/intel/ice/ice_rxtx.c
index 63bce7bd9e..4792aa9a8b 100644
--- a/drivers/net/intel/ice/ice_rxtx.c
+++ b/drivers/net/intel/ice/ice_rxtx.c
@@ -2954,64 +2954,6 @@ ice_parse_tunneling_params(uint64_t ol_flags,
 		*cd_tunneling |= ICE_TXD_CTX_QW0_L4T_CS_M;
 }
 
-static inline void
-ice_txd_enable_checksum(uint64_t ol_flags,
-			uint32_t *td_cmd,
-			uint32_t *td_offset,
-			union ci_tx_offload tx_offload)
-{
-
-	/* Enable L3 checksum offloads */
-	if (ol_flags & RTE_MBUF_F_TX_IP_CKSUM) {
-		*td_cmd |= CI_TX_DESC_CMD_IIPT_IPV4_CSUM;
-		*td_offset |= (tx_offload.l3_len >> 2) <<
-			CI_TX_DESC_LEN_IPLEN_S;
-	} else if (ol_flags & RTE_MBUF_F_TX_IPV4) {
-		*td_cmd |= CI_TX_DESC_CMD_IIPT_IPV4;
-		*td_offset |= (tx_offload.l3_len >> 2) <<
-			CI_TX_DESC_LEN_IPLEN_S;
-	} else if (ol_flags & RTE_MBUF_F_TX_IPV6) {
-		*td_cmd |= CI_TX_DESC_CMD_IIPT_IPV6;
-		*td_offset |= (tx_offload.l3_len >> 2) <<
-			CI_TX_DESC_LEN_IPLEN_S;
-	}
-
-	if (ol_flags & RTE_MBUF_F_TX_TCP_SEG) {
-		*td_cmd |= CI_TX_DESC_CMD_L4T_EOFT_TCP;
-		*td_offset |= (tx_offload.l4_len >> 2) <<
-			      CI_TX_DESC_LEN_L4_LEN_S;
-		return;
-	}
-
-	if (ol_flags & RTE_MBUF_F_TX_UDP_SEG) {
-		*td_cmd |= CI_TX_DESC_CMD_L4T_EOFT_UDP;
-		*td_offset |= (tx_offload.l4_len >> 2) <<
-			      CI_TX_DESC_LEN_L4_LEN_S;
-		return;
-	}
-
-	/* Enable L4 checksum offloads */
-	switch (ol_flags & RTE_MBUF_F_TX_L4_MASK) {
-	case RTE_MBUF_F_TX_TCP_CKSUM:
-		*td_cmd |= CI_TX_DESC_CMD_L4T_EOFT_TCP;
-		*td_offset |= (sizeof(struct rte_tcp_hdr) >> 2) <<
-			      CI_TX_DESC_LEN_L4_LEN_S;
-		break;
-	case RTE_MBUF_F_TX_SCTP_CKSUM:
-		*td_cmd |= CI_TX_DESC_CMD_L4T_EOFT_SCTP;
-		*td_offset |= (sizeof(struct rte_sctp_hdr) >> 2) <<
-			      CI_TX_DESC_LEN_L4_LEN_S;
-		break;
-	case RTE_MBUF_F_TX_UDP_CKSUM:
-		*td_cmd |= CI_TX_DESC_CMD_L4T_EOFT_UDP;
-		*td_offset |= (sizeof(struct rte_udp_hdr) >> 2) <<
-			      CI_TX_DESC_LEN_L4_LEN_S;
-		break;
-	default:
-		break;
-	}
-}
-
 /* Construct the tx flags */
 static inline uint64_t
 ice_build_ctob(uint32_t td_cmd,
@@ -3209,7 +3151,7 @@ ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 
 		/* Enable checksum offloading */
 		if (ol_flags & CI_TX_CKSUM_OFFLOAD_MASK)
-			ice_txd_enable_checksum(ol_flags, &td_cmd,
+			ci_txd_enable_checksum(ol_flags, &td_cmd,
 						&td_offset, tx_offload);
 
 		if (nb_ctx) {
diff --git a/drivers/net/intel/ice/ice_rxtx.h b/drivers/net/intel/ice/ice_rxtx.h
index cd5fa93d1c..7d6480b410 100644
--- a/drivers/net/intel/ice/ice_rxtx.h
+++ b/drivers/net/intel/ice/ice_rxtx.h
@@ -112,6 +112,7 @@
 #define ICE_TX_SCALAR_OFFLOADS (		\
 	RTE_ETH_TX_OFFLOAD_VLAN_INSERT |	\
 	RTE_ETH_TX_OFFLOAD_TCP_TSO |		\
+	RTE_ETH_TX_OFFLOAD_UDP_TSO |		\
 	RTE_ETH_TX_OFFLOAD_MULTI_SEGS |		\
 	RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE |	\
 	RTE_ETH_TX_OFFLOAD_QINQ_INSERT |	\
-- 
2.51.0



* [PATCH v5 12/35] net/intel: create a common scalar Tx function
  2026-02-11 18:12 ` [PATCH v5 00/35] combine multiple Intel scalar Tx paths Bruce Richardson
                     ` (10 preceding siblings ...)
  2026-02-11 18:12   ` [PATCH v5 11/35] net/intel: create common checksum Tx offload function Bruce Richardson
@ 2026-02-11 18:12   ` Bruce Richardson
  2026-02-11 18:12   ` [PATCH v5 13/35] net/i40e: use " Bruce Richardson
                     ` (23 subsequent siblings)
  35 siblings, 0 replies; 274+ messages in thread
From: Bruce Richardson @ 2026-02-11 18:12 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson, Anatoly Burakov

Given the similarities between the transmit functions of the various
Intel drivers, make a start on consolidating them by moving the ice Tx
function into the common code, for reuse by other drivers.
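The key design choice is parameterizing the common loop with a per-driver context-descriptor callback plus an optional ops struct for timestamp-queue handling (NULL when unused), so the compiler can inline each specialization. A pared-down, hypothetical mirror of that parameterization (names and signatures simplified from ci_xmit_pkts() for illustration):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Per-driver callback: fills the two context-descriptor quadwords and
 * returns how many context descriptors the packet needs (0 or 1). */
typedef uint16_t (*get_ctx_desc_fn)(uint64_t ol_flags,
				    uint64_t *qw0, uint64_t *qw1);

/* Optional driver hooks, analogous to ci_timestamp_queue_fns. */
struct ts_queue_ops {
	uint16_t (*get_tail)(void);
};

static uint16_t
no_ctx_desc(uint64_t ol_flags, uint64_t *qw0, uint64_t *qw1)
{
	(void)ol_flags; (void)qw0; (void)qw1;
	return 0;	/* no context descriptor needed */
}

static uint16_t
tso_ctx_desc(uint64_t ol_flags, uint64_t *qw0, uint64_t *qw1)
{
	(void)ol_flags;
	*qw0 = 0;
	*qw1 = 1;	/* pretend TSO context words */
	return 1;
}

/* Common loop body (one packet): returns descriptors consumed. */
static uint16_t
xmit_one(uint16_t nb_segs, uint64_t ol_flags,
	 get_ctx_desc_fn get_ctx, const struct ts_queue_ops *ts_ops)
{
	uint64_t qw0 = 0, qw1 = 0;
	uint16_t nb_ctx = get_ctx(ol_flags, &qw0, &qw1);

	if (ts_ops != NULL)
		(void)ts_ops->get_tail();	/* driver-specific hook */
	return nb_segs + nb_ctx;
}
```

In the real patch, ice calls the common function twice from ice_xmit_pkts(): once with its timestamp ops when the queue has timestamping enabled, and once with NULL otherwise, so the non-timestamp fast path carries no indirect calls.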

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 drivers/net/intel/common/tx_scalar.h | 218 +++++++++++++++++++++
 drivers/net/intel/ice/ice_rxtx.c     | 270 ++++++---------------------
 2 files changed, 270 insertions(+), 218 deletions(-)

diff --git a/drivers/net/intel/common/tx_scalar.h b/drivers/net/intel/common/tx_scalar.h
index 9b19d56ea6..3616600a2d 100644
--- a/drivers/net/intel/common/tx_scalar.h
+++ b/drivers/net/intel/common/tx_scalar.h
@@ -6,6 +6,7 @@
 #define _COMMON_INTEL_TX_SCALAR_H_
 
 #include <stdint.h>
+#include <rte_io.h>
 #include <rte_byteorder.h>
 
 /* depends on common Tx definitions. */
@@ -129,5 +130,222 @@ ci_calc_pkt_desc(const struct rte_mbuf *tx_pkt)
 	return count;
 }
 
+typedef uint16_t (*ci_get_ctx_desc_fn)(uint64_t ol_flags, const struct rte_mbuf *mbuf,
+		const union ci_tx_offload *tx_offload, const struct ci_tx_queue *txq,
+		uint64_t *qw0, uint64_t *qw1);
+
+/* gets current timestamp tail index */
+typedef uint16_t (*get_ts_tail_t)(struct ci_tx_queue *txq);
+/* writes a timestamp descriptor and returns new tail index */
+typedef uint16_t (*write_ts_desc_t)(struct ci_tx_queue *txq, struct rte_mbuf *mbuf,
+		uint16_t tx_id, uint16_t ts_id);
+/* writes a timestamp tail index - doorbell */
+typedef void (*write_ts_tail_t)(struct ci_tx_queue *txq, uint16_t ts_id);
+
+struct ci_timestamp_queue_fns {
+	get_ts_tail_t get_ts_tail;
+	write_ts_desc_t write_ts_desc;
+	write_ts_tail_t write_ts_tail;
+};
+
+static inline uint16_t
+ci_xmit_pkts(struct ci_tx_queue *txq,
+	     struct rte_mbuf **tx_pkts,
+	     uint16_t nb_pkts,
+	     ci_get_ctx_desc_fn get_ctx_desc,
+	     const struct ci_timestamp_queue_fns *ts_fns)
+{
+	volatile struct ci_tx_desc *ci_tx_ring;
+	volatile struct ci_tx_desc *txd;
+	struct ci_tx_entry *sw_ring;
+	struct ci_tx_entry *txe, *txn;
+	struct rte_mbuf *tx_pkt;
+	struct rte_mbuf *m_seg;
+	uint16_t tx_id;
+	uint16_t ts_id = -1;
+	uint16_t nb_tx;
+	uint16_t nb_used;
+	uint16_t nb_ctx;
+	uint32_t td_cmd = 0;
+	uint32_t td_offset = 0;
+	uint32_t td_tag = 0;
+	uint16_t tx_last;
+	uint16_t slen;
+	uint16_t l2_len;
+	uint64_t buf_dma_addr;
+	uint64_t ol_flags;
+	union ci_tx_offload tx_offload = {0};
+
+	sw_ring = txq->sw_ring;
+	ci_tx_ring = txq->ci_tx_ring;
+	tx_id = txq->tx_tail;
+	txe = &sw_ring[tx_id];
+
+	if (ts_fns != NULL)
+		ts_id = ts_fns->get_ts_tail(txq);
+
+	/* Check if the descriptor ring needs to be cleaned. */
+	if (txq->nb_tx_free < txq->tx_free_thresh)
+		(void)ci_tx_xmit_cleanup(txq);
+
+	for (nb_tx = 0; nb_tx < nb_pkts; nb_tx++) {
+		uint64_t cd_qw0 = 0, cd_qw1 = 0;
+		tx_pkt = *tx_pkts++;
+
+		ol_flags = tx_pkt->ol_flags;
+		td_cmd = CI_TX_DESC_CMD_ICRC;
+		td_tag = 0;
+		l2_len = ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK ?
+				tx_pkt->outer_l2_len : tx_pkt->l2_len;
+		td_offset = (l2_len >> 1) << CI_TX_DESC_LEN_MACLEN_S;
+
+
+		tx_offload.l2_len = tx_pkt->l2_len;
+		tx_offload.l3_len = tx_pkt->l3_len;
+		tx_offload.outer_l2_len = tx_pkt->outer_l2_len;
+		tx_offload.outer_l3_len = tx_pkt->outer_l3_len;
+		tx_offload.l4_len = tx_pkt->l4_len;
+		tx_offload.tso_segsz = tx_pkt->tso_segsz;
+
+		/* Calculate the number of context descriptors needed. */
+		nb_ctx = get_ctx_desc(ol_flags, tx_pkt, &tx_offload, txq, &cd_qw0, &cd_qw1);
+
+		/* The number of descriptors that must be allocated for
+		 * a packet equals to the number of the segments of that
+		 * packet plus the number of context descriptor if needed.
+		 * Recalculate the needed tx descs when TSO enabled in case
+		 * the mbuf data size exceeds max data size that hw allows
+		 * per tx desc.
+		 */
+		if (ol_flags & (RTE_MBUF_F_TX_TCP_SEG | RTE_MBUF_F_TX_UDP_SEG))
+			nb_used = (uint16_t)(ci_calc_pkt_desc(tx_pkt) + nb_ctx);
+		else
+			nb_used = (uint16_t)(tx_pkt->nb_segs + nb_ctx);
+		tx_last = (uint16_t)(tx_id + nb_used - 1);
+
+		/* Circular ring */
+		if (tx_last >= txq->nb_tx_desc)
+			tx_last = (uint16_t)(tx_last - txq->nb_tx_desc);
+
+		if (nb_used > txq->nb_tx_free) {
+			if (ci_tx_xmit_cleanup(txq) != 0) {
+				if (nb_tx == 0)
+					return 0;
+				goto end_of_tx;
+			}
+			if (unlikely(nb_used > txq->tx_rs_thresh)) {
+				while (nb_used > txq->nb_tx_free) {
+					if (ci_tx_xmit_cleanup(txq) != 0) {
+						if (nb_tx == 0)
+							return 0;
+						goto end_of_tx;
+					}
+				}
+			}
+		}
+
+		/* Descriptor based VLAN insertion */
+		if (ol_flags & (RTE_MBUF_F_TX_VLAN | RTE_MBUF_F_TX_QINQ)) {
+			td_cmd |= CI_TX_DESC_CMD_IL2TAG1;
+			td_tag = tx_pkt->vlan_tci;
+		}
+
+		/* Enable checksum offloading */
+		if (ol_flags & CI_TX_CKSUM_OFFLOAD_MASK)
+			ci_txd_enable_checksum(ol_flags, &td_cmd,
+						&td_offset, tx_offload);
+
+		if (nb_ctx) {
+			/* Setup TX context descriptor if required */
+			uint64_t *ctx_txd = RTE_CAST_PTR(uint64_t *, &ci_tx_ring[tx_id]);
+
+			txn = &sw_ring[txe->next_id];
+			RTE_MBUF_PREFETCH_TO_FREE(txn->mbuf);
+			if (txe->mbuf) {
+				rte_pktmbuf_free_seg(txe->mbuf);
+				txe->mbuf = NULL;
+			}
+
+			ctx_txd[0] = cd_qw0;
+			ctx_txd[1] = cd_qw1;
+
+			txe->last_id = tx_last;
+			tx_id = txe->next_id;
+			txe = txn;
+		}
+		m_seg = tx_pkt;
+
+		do {
+			txd = &ci_tx_ring[tx_id];
+			txn = &sw_ring[txe->next_id];
+
+			if (txe->mbuf)
+				rte_pktmbuf_free_seg(txe->mbuf);
+			txe->mbuf = m_seg;
+
+			/* Setup TX Descriptor */
+			slen = m_seg->data_len;
+			buf_dma_addr = rte_mbuf_data_iova(m_seg);
+
+			while ((ol_flags & (RTE_MBUF_F_TX_TCP_SEG | RTE_MBUF_F_TX_UDP_SEG)) &&
+					unlikely(slen > CI_MAX_DATA_PER_TXD)) {
+				txd->buffer_addr = rte_cpu_to_le_64(buf_dma_addr);
+				txd->cmd_type_offset_bsz = rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DATA |
+					((uint64_t)td_cmd << CI_TXD_QW1_CMD_S) |
+					((uint64_t)td_offset << CI_TXD_QW1_OFFSET_S) |
+					((uint64_t)CI_MAX_DATA_PER_TXD << CI_TXD_QW1_TX_BUF_SZ_S) |
+					((uint64_t)td_tag << CI_TXD_QW1_L2TAG1_S));
+
+				buf_dma_addr += CI_MAX_DATA_PER_TXD;
+				slen -= CI_MAX_DATA_PER_TXD;
+
+				txe->last_id = tx_last;
+				tx_id = txe->next_id;
+				txe = txn;
+				txd = &ci_tx_ring[tx_id];
+				txn = &sw_ring[txe->next_id];
+			}
+
+			txd->buffer_addr = rte_cpu_to_le_64(buf_dma_addr);
+			txd->cmd_type_offset_bsz = rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DATA |
+				((uint64_t)td_cmd << CI_TXD_QW1_CMD_S) |
+				((uint64_t)td_offset << CI_TXD_QW1_OFFSET_S) |
+				((uint64_t)slen << CI_TXD_QW1_TX_BUF_SZ_S) |
+				((uint64_t)td_tag << CI_TXD_QW1_L2TAG1_S));
+
+			txe->last_id = tx_last;
+			tx_id = txe->next_id;
+			txe = txn;
+			m_seg = m_seg->next;
+		} while (m_seg);
+
+		/* fill the last descriptor with End of Packet (EOP) bit */
+		td_cmd |= CI_TX_DESC_CMD_EOP;
+		txq->nb_tx_used = (uint16_t)(txq->nb_tx_used + nb_used);
+		txq->nb_tx_free = (uint16_t)(txq->nb_tx_free - nb_used);
+
+		/* set RS bit on the last descriptor of one packet */
+		if (txq->nb_tx_used >= txq->tx_rs_thresh) {
+			td_cmd |= CI_TX_DESC_CMD_RS;
+
+			/* Update txq RS bit counters */
+			txq->nb_tx_used = 0;
+		}
+		txd->cmd_type_offset_bsz |=
+				rte_cpu_to_le_64(((uint64_t)td_cmd) << CI_TXD_QW1_CMD_S);
+
+		if (ts_fns != NULL)
+			ts_id = ts_fns->write_ts_desc(txq, tx_pkt, tx_id, ts_id);
+	}
+end_of_tx:
+	/* update Tail register */
+	if (ts_fns != NULL)
+		ts_fns->write_ts_tail(txq, ts_id);
+	else
+		rte_write32_wc(tx_id, txq->qtx_tail);
+	txq->tx_tail = tx_id;
+
+	return nb_tx;
+}
 
 #endif /* _COMMON_INTEL_TX_SCALAR_H_ */
diff --git a/drivers/net/intel/ice/ice_rxtx.c b/drivers/net/intel/ice/ice_rxtx.c
index 4792aa9a8b..561a6617a6 100644
--- a/drivers/net/intel/ice/ice_rxtx.c
+++ b/drivers/net/intel/ice/ice_rxtx.c
@@ -3050,230 +3050,64 @@ get_context_desc(uint64_t ol_flags, const struct rte_mbuf *tx_pkt,
 	return 1;
 }
 
-uint16_t
-ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
+static uint16_t
+ice_get_ts_tail(struct ci_tx_queue *txq)
 {
-	struct ci_tx_queue *txq;
-	volatile struct ci_tx_desc *ci_tx_ring;
-	volatile struct ci_tx_desc *txd;
-	struct ci_tx_entry *sw_ring;
-	struct ci_tx_entry *txe, *txn;
-	struct rte_mbuf *tx_pkt;
-	struct rte_mbuf *m_seg;
-	uint16_t tx_id;
-	uint16_t ts_id = -1;
-	uint16_t nb_tx;
-	uint16_t nb_used;
-	uint16_t nb_ctx;
-	uint32_t td_cmd = 0;
-	uint32_t td_offset = 0;
-	uint32_t td_tag = 0;
-	uint16_t tx_last;
-	uint16_t slen;
-	uint16_t l2_len;
-	uint64_t buf_dma_addr;
-	uint64_t ol_flags;
-	union ci_tx_offload tx_offload = {0};
-
-	txq = tx_queue;
-	sw_ring = txq->sw_ring;
-	ci_tx_ring = txq->ci_tx_ring;
-	tx_id = txq->tx_tail;
-	txe = &sw_ring[tx_id];
-
-	if (txq->tsq != NULL && txq->tsq->ts_flag > 0)
-		ts_id = txq->tsq->ts_tail;
-
-	/* Check if the descriptor ring needs to be cleaned. */
-	if (txq->nb_tx_free < txq->tx_free_thresh)
-		(void)ci_tx_xmit_cleanup(txq);
-
-	for (nb_tx = 0; nb_tx < nb_pkts; nb_tx++) {
-		uint64_t cd_qw0, cd_qw1;
-		tx_pkt = *tx_pkts++;
-
-		ol_flags = tx_pkt->ol_flags;
-		td_cmd = 0;
-		td_tag = 0;
-		l2_len = ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK ?
-				tx_pkt->outer_l2_len : tx_pkt->l2_len;
-		td_offset = (l2_len >> 1) << CI_TX_DESC_LEN_MACLEN_S;
-
-		tx_offload.l2_len = tx_pkt->l2_len;
-		tx_offload.l3_len = tx_pkt->l3_len;
-		tx_offload.outer_l2_len = tx_pkt->outer_l2_len;
-		tx_offload.outer_l3_len = tx_pkt->outer_l3_len;
-		tx_offload.l4_len = tx_pkt->l4_len;
-		tx_offload.tso_segsz = tx_pkt->tso_segsz;
-
-		/* Calculate the number of context descriptors needed. */
-		nb_ctx = get_context_desc(ol_flags, tx_pkt, &tx_offload, txq, &cd_qw0, &cd_qw1);
-
-		/* The number of descriptors that must be allocated for
-		 * a packet equals to the number of the segments of that
-		 * packet plus the number of context descriptor if needed.
-		 * Recalculate the needed tx descs when TSO enabled in case
-		 * the mbuf data size exceeds max data size that hw allows
-		 * per tx desc.
-		 */
-		if (ol_flags & (RTE_MBUF_F_TX_TCP_SEG | RTE_MBUF_F_TX_UDP_SEG))
-			nb_used = (uint16_t)(ci_calc_pkt_desc(tx_pkt) + nb_ctx);
-		else
-			nb_used = (uint16_t)(tx_pkt->nb_segs + nb_ctx);
-		tx_last = (uint16_t)(tx_id + nb_used - 1);
-
-		/* Circular ring */
-		if (tx_last >= txq->nb_tx_desc)
-			tx_last = (uint16_t)(tx_last - txq->nb_tx_desc);
-
-		if (nb_used > txq->nb_tx_free) {
-			if (ci_tx_xmit_cleanup(txq) != 0) {
-				if (nb_tx == 0)
-					return 0;
-				goto end_of_tx;
-			}
-			if (unlikely(nb_used > txq->tx_rs_thresh)) {
-				while (nb_used > txq->nb_tx_free) {
-					if (ci_tx_xmit_cleanup(txq) != 0) {
-						if (nb_tx == 0)
-							return 0;
-						goto end_of_tx;
-					}
-				}
-			}
-		}
-
-		/* Descriptor based VLAN insertion */
-		if (ol_flags & (RTE_MBUF_F_TX_VLAN | RTE_MBUF_F_TX_QINQ)) {
-			td_cmd |= CI_TX_DESC_CMD_IL2TAG1;
-			td_tag = tx_pkt->vlan_tci;
-		}
-
-		/* Enable checksum offloading */
-		if (ol_flags & CI_TX_CKSUM_OFFLOAD_MASK)
-			ci_txd_enable_checksum(ol_flags, &td_cmd,
-						&td_offset, tx_offload);
-
-		if (nb_ctx) {
-			/* Setup TX context descriptor if required */
-			uint64_t *ctx_txd = RTE_CAST_PTR(uint64_t *, &ci_tx_ring[tx_id]);
-
-			txn = &sw_ring[txe->next_id];
-			RTE_MBUF_PREFETCH_TO_FREE(txn->mbuf);
-			if (txe->mbuf) {
-				rte_pktmbuf_free_seg(txe->mbuf);
-				txe->mbuf = NULL;
-			}
-
-			ctx_txd[0] = cd_qw0;
-			ctx_txd[1] = cd_qw1;
-
-			txe->last_id = tx_last;
-			tx_id = txe->next_id;
-			txe = txn;
-		}
-		m_seg = tx_pkt;
-
-		do {
-			txd = &ci_tx_ring[tx_id];
-			txn = &sw_ring[txe->next_id];
-
-			if (txe->mbuf)
-				rte_pktmbuf_free_seg(txe->mbuf);
-			txe->mbuf = m_seg;
-
-			/* Setup TX Descriptor */
-			slen = m_seg->data_len;
-			buf_dma_addr = rte_mbuf_data_iova(m_seg);
-
-			while ((ol_flags & (RTE_MBUF_F_TX_TCP_SEG | RTE_MBUF_F_TX_UDP_SEG)) &&
-					unlikely(slen > CI_MAX_DATA_PER_TXD)) {
-				txd->buffer_addr = rte_cpu_to_le_64(buf_dma_addr);
-				txd->cmd_type_offset_bsz = rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DATA |
-					((uint64_t)td_cmd << CI_TXD_QW1_CMD_S) |
-					((uint64_t)td_offset << CI_TXD_QW1_OFFSET_S) |
-					((uint64_t)CI_MAX_DATA_PER_TXD << CI_TXD_QW1_TX_BUF_SZ_S) |
-					((uint64_t)td_tag << CI_TXD_QW1_L2TAG1_S));
-
-				buf_dma_addr += CI_MAX_DATA_PER_TXD;
-				slen -= CI_MAX_DATA_PER_TXD;
-
-				txe->last_id = tx_last;
-				tx_id = txe->next_id;
-				txe = txn;
-				txd = &ci_tx_ring[tx_id];
-				txn = &sw_ring[txe->next_id];
-			}
-
-			txd->buffer_addr = rte_cpu_to_le_64(buf_dma_addr);
-			txd->cmd_type_offset_bsz = rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DATA |
-				((uint64_t)td_cmd << CI_TXD_QW1_CMD_S) |
-				((uint64_t)td_offset << CI_TXD_QW1_OFFSET_S) |
-				((uint64_t)slen << CI_TXD_QW1_TX_BUF_SZ_S) |
-				((uint64_t)td_tag << CI_TXD_QW1_L2TAG1_S));
-
-			txe->last_id = tx_last;
-			tx_id = txe->next_id;
-			txe = txn;
-			m_seg = m_seg->next;
-		} while (m_seg);
+	return txq->tsq->ts_tail;
+}
 
-		/* fill the last descriptor with End of Packet (EOP) bit */
-		td_cmd |= CI_TX_DESC_CMD_EOP;
-		txq->nb_tx_used = (uint16_t)(txq->nb_tx_used + nb_used);
-		txq->nb_tx_free = (uint16_t)(txq->nb_tx_free - nb_used);
+static uint16_t
+ice_write_ts_desc(struct ci_tx_queue *txq,
+		  struct rte_mbuf *tx_pkt,
+		  uint16_t tx_id,
+		  uint16_t ts_id)
+{
+	uint64_t txtime = *RTE_MBUF_DYNFIELD(tx_pkt, txq->tsq->ts_offset, uint64_t *);
+	uint32_t tstamp = (uint32_t)(txtime % NS_PER_S) >> ICE_TXTIME_CTX_RESOLUTION_128NS;
+	const uint32_t desc_tx_id = (tx_id == 0) ? txq->nb_tx_desc : tx_id;
+	__le32 ts_desc = rte_cpu_to_le_32(FIELD_PREP(ICE_TXTIME_TX_DESC_IDX_M, desc_tx_id) |
+			FIELD_PREP(ICE_TXTIME_STAMP_M, tstamp));
+
+	txq->tsq->ice_ts_ring[ts_id].tx_desc_idx_tstamp = ts_desc;
+	ts_id++;
+
+	/* To prevent an MDD, when wrapping the tstamp
+	 * ring create additional TS descriptors equal
+	 * to the number of the fetch TS descriptors
+	 * value. HW will merge the TS descriptors with
+	 * the same timestamp value into a single
+	 * descriptor.
+	 */
+	if (ts_id == txq->tsq->nb_ts_desc) {
+		uint16_t fetch = txq->tsq->nb_ts_desc - txq->nb_tx_desc;
+		ts_id = 0;
+		for (; ts_id < fetch; ts_id++)
+			txq->tsq->ice_ts_ring[ts_id].tx_desc_idx_tstamp = ts_desc;
+	}
+	return ts_id;
+}
 
-		/* set RS bit on the last descriptor of one packet */
-		if (txq->nb_tx_used >= txq->tx_rs_thresh) {
-			PMD_TX_LOG(DEBUG,
-				   "Setting RS bit on TXD id="
-				   "%4u (port=%d queue=%d)",
-				   tx_last, txq->port_id, txq->queue_id);
+static void
+ice_write_ts_tail(struct ci_tx_queue *txq, uint16_t ts_tail)
+{
+	ICE_PCI_REG_WRITE(txq->qtx_tail, ts_tail);
+	txq->tsq->ts_tail = ts_tail;
+}
 
-			td_cmd |= CI_TX_DESC_CMD_RS;
+uint16_t
+ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
+{
+	const struct ci_timestamp_queue_fns ts_fns = {
+		.get_ts_tail = ice_get_ts_tail,
+		.write_ts_desc = ice_write_ts_desc,
+		.write_ts_tail = ice_write_ts_tail,
+	};
+	struct ci_tx_queue *txq = (struct ci_tx_queue *)tx_queue;
 
-			/* Update txq RS bit counters */
-			txq->nb_tx_used = 0;
-		}
-		txd->cmd_type_offset_bsz |=
-			rte_cpu_to_le_64(((uint64_t)td_cmd) << CI_TXD_QW1_CMD_S);
-
-		if (txq->tsq != NULL && txq->tsq->ts_flag > 0) {
-			uint64_t txtime = *RTE_MBUF_DYNFIELD(tx_pkt,
-					txq->tsq->ts_offset, uint64_t *);
-			uint32_t tstamp = (uint32_t)(txtime % NS_PER_S) >>
-						ICE_TXTIME_CTX_RESOLUTION_128NS;
-			const uint32_t desc_tx_id = (tx_id == 0) ? txq->nb_tx_desc : tx_id;
-			__le32 ts_desc = rte_cpu_to_le_32(FIELD_PREP(ICE_TXTIME_TX_DESC_IDX_M,
-					desc_tx_id) | FIELD_PREP(ICE_TXTIME_STAMP_M, tstamp));
-			txq->tsq->ice_ts_ring[ts_id].tx_desc_idx_tstamp = ts_desc;
-			ts_id++;
-			/* To prevent an MDD, when wrapping the tstamp
-			 * ring create additional TS descriptors equal
-			 * to the number of the fetch TS descriptors
-			 * value. HW will merge the TS descriptors with
-			 * the same timestamp value into a single
-			 * descriptor.
-			 */
-			if (ts_id == txq->tsq->nb_ts_desc) {
-				uint16_t fetch = txq->tsq->nb_ts_desc - txq->nb_tx_desc;
-				ts_id = 0;
-				for (; ts_id < fetch; ts_id++)
-					txq->tsq->ice_ts_ring[ts_id].tx_desc_idx_tstamp = ts_desc;
-			}
-		}
-	}
-end_of_tx:
-	/* update Tail register */
-	if (txq->tsq != NULL && txq->tsq->ts_flag > 0) {
-		ICE_PCI_REG_WRITE(txq->qtx_tail, ts_id);
-		txq->tsq->ts_tail = ts_id;
-	} else {
-		ICE_PCI_REG_WRITE(txq->qtx_tail, tx_id);
-	}
-	txq->tx_tail = tx_id;
+	if (txq->tsq != NULL && txq->tsq->ts_flag > 0)
+		return ci_xmit_pkts(txq, tx_pkts, nb_pkts, get_context_desc, &ts_fns);
 
-	return nb_tx;
+	return ci_xmit_pkts(txq, tx_pkts, nb_pkts, get_context_desc, NULL);
 }
 
 static __rte_always_inline int
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [PATCH v5 13/35] net/i40e: use common scalar Tx function
  2026-02-11 18:12 ` [PATCH v5 00/35] combine multiple Intel scalar Tx paths Bruce Richardson
                     ` (11 preceding siblings ...)
  2026-02-11 18:12   ` [PATCH v5 12/35] net/intel: create a common scalar Tx function Bruce Richardson
@ 2026-02-11 18:12   ` Bruce Richardson
  2026-02-11 18:12   ` [PATCH v5 14/35] net/intel: add IPsec hooks to common " Bruce Richardson
                     ` (22 subsequent siblings)
  35 siblings, 0 replies; 274+ messages in thread
From: Bruce Richardson @ 2026-02-11 18:12 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson, Anatoly Burakov

Following the earlier rework, the scalar transmit function for i40e can
use the common function previously moved over from the ice driver. This
removes hundreds of lines of duplicated code.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 drivers/net/intel/i40e/i40e_rxtx.c | 208 +----------------------------
 1 file changed, 2 insertions(+), 206 deletions(-)

diff --git a/drivers/net/intel/i40e/i40e_rxtx.c b/drivers/net/intel/i40e/i40e_rxtx.c
index c318b4c84e..41310b4c6c 100644
--- a/drivers/net/intel/i40e/i40e_rxtx.c
+++ b/drivers/net/intel/i40e/i40e_rxtx.c
@@ -1018,212 +1018,8 @@ get_context_desc(uint64_t ol_flags, const struct rte_mbuf *tx_pkt,
 uint16_t
 i40e_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 {
-	struct ci_tx_queue *txq;
-	struct ci_tx_entry *sw_ring;
-	struct ci_tx_entry *txe, *txn;
-	volatile struct ci_tx_desc *txd;
-	volatile struct ci_tx_desc *txr;
-	struct rte_mbuf *tx_pkt;
-	struct rte_mbuf *m_seg;
-	uint16_t tx_id;
-	uint16_t nb_tx;
-	uint32_t td_cmd;
-	uint32_t td_offset;
-	uint32_t td_tag;
-	uint64_t ol_flags;
-	uint16_t nb_used;
-	uint16_t nb_ctx;
-	uint16_t tx_last;
-	uint16_t slen;
-	uint16_t l2_len;
-	uint64_t buf_dma_addr;
-	union ci_tx_offload tx_offload = {0};
-
-	txq = tx_queue;
-	sw_ring = txq->sw_ring;
-	txr = txq->ci_tx_ring;
-	tx_id = txq->tx_tail;
-	txe = &sw_ring[tx_id];
-
-	/* Check if the descriptor ring needs to be cleaned. */
-	if (txq->nb_tx_free < txq->tx_free_thresh)
-		(void)ci_tx_xmit_cleanup(txq);
-
-	for (nb_tx = 0; nb_tx < nb_pkts; nb_tx++) {
-		tx_pkt = *tx_pkts++;
-		RTE_MBUF_PREFETCH_TO_FREE(txe->mbuf);
-
-		ol_flags = tx_pkt->ol_flags;
-		td_cmd = 0;
-		td_tag = 0;
-		l2_len = ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK ?
-				tx_pkt->outer_l2_len : tx_pkt->l2_len;
-		td_offset = (l2_len >> 1) << CI_TX_DESC_LEN_MACLEN_S;
-		tx_offload.l2_len = tx_pkt->l2_len;
-		tx_offload.l3_len = tx_pkt->l3_len;
-		tx_offload.outer_l2_len = tx_pkt->outer_l2_len;
-		tx_offload.outer_l3_len = tx_pkt->outer_l3_len;
-		tx_offload.l4_len = tx_pkt->l4_len;
-		tx_offload.tso_segsz = tx_pkt->tso_segsz;
-
-		/* Calculate the number of context descriptors needed. */
-		uint64_t cd_qw0 = 0, cd_qw1 = 0;
-		nb_ctx = get_context_desc(ol_flags, tx_pkt, &tx_offload, txq,
-				&cd_qw0, &cd_qw1);
-
-		/**
-		 * The number of descriptors that must be allocated for
-		 * a packet equals to the number of the segments of that
-		 * packet plus 1 context descriptor if needed.
-		 * Recalculate the needed tx descs when TSO enabled in case
-		 * the mbuf data size exceeds max data size that hw allows
-		 * per tx desc.
-		 */
-		if (ol_flags & RTE_MBUF_F_TX_TCP_SEG)
-			nb_used = (uint16_t)(ci_calc_pkt_desc(tx_pkt) + nb_ctx);
-		else
-			nb_used = (uint16_t)(tx_pkt->nb_segs + nb_ctx);
-		tx_last = (uint16_t)(tx_id + nb_used - 1);
-
-		/* Circular ring */
-		if (tx_last >= txq->nb_tx_desc)
-			tx_last = (uint16_t)(tx_last - txq->nb_tx_desc);
-
-		if (nb_used > txq->nb_tx_free) {
-			if (ci_tx_xmit_cleanup(txq) != 0) {
-				if (nb_tx == 0)
-					return 0;
-				goto end_of_tx;
-			}
-			if (unlikely(nb_used > txq->tx_rs_thresh)) {
-				while (nb_used > txq->nb_tx_free) {
-					if (ci_tx_xmit_cleanup(txq) != 0) {
-						if (nb_tx == 0)
-							return 0;
-						goto end_of_tx;
-					}
-				}
-			}
-		}
-
-		/* Descriptor based VLAN insertion */
-		if (ol_flags & (RTE_MBUF_F_TX_VLAN | RTE_MBUF_F_TX_QINQ)) {
-			td_cmd |= CI_TX_DESC_CMD_IL2TAG1;
-			td_tag = tx_pkt->vlan_tci;
-		}
-
-		/* Always enable CRC offload insertion */
-		td_cmd |= CI_TX_DESC_CMD_ICRC;
-
-		/* Enable checksum offloading */
-		if (ol_flags & CI_TX_CKSUM_OFFLOAD_MASK)
-			ci_txd_enable_checksum(ol_flags, &td_cmd,
-						 &td_offset, tx_offload);
-
-		if (nb_ctx) {
-			/* Setup TX context descriptor if required */
-			uint64_t *desc = RTE_CAST_PTR(uint64_t *, &txr[tx_id]);
-
-			txn = &sw_ring[txe->next_id];
-			RTE_MBUF_PREFETCH_TO_FREE(txn->mbuf);
-			if (txe->mbuf != NULL) {
-				rte_pktmbuf_free_seg(txe->mbuf);
-				txe->mbuf = NULL;
-			}
-
-			desc[0] = cd_qw0;
-			desc[1] = cd_qw1;
-
-			PMD_TX_LOG(DEBUG, "mbuf: %p, TCD[%u]: "
-				"qw0: %#"PRIx64"; "
-				"qw1: %#"PRIx64";",
-				tx_pkt, tx_id, cd_qw0, cd_qw1);
-
-			txe->last_id = tx_last;
-			tx_id = txe->next_id;
-			txe = txn;
-		}
-
-		m_seg = tx_pkt;
-		do {
-			txd = &txr[tx_id];
-			txn = &sw_ring[txe->next_id];
-
-			if (txe->mbuf)
-				rte_pktmbuf_free_seg(txe->mbuf);
-			txe->mbuf = m_seg;
-
-			/* Setup TX Descriptor */
-			slen = m_seg->data_len;
-			buf_dma_addr = rte_mbuf_data_iova(m_seg);
-
-			while ((ol_flags & RTE_MBUF_F_TX_TCP_SEG) &&
-				unlikely(slen > CI_MAX_DATA_PER_TXD)) {
-				txd->buffer_addr =
-					rte_cpu_to_le_64(buf_dma_addr);
-				txd->cmd_type_offset_bsz =
-					i40e_build_ctob(td_cmd,
-					td_offset, CI_MAX_DATA_PER_TXD,
-					td_tag);
-
-				buf_dma_addr += CI_MAX_DATA_PER_TXD;
-				slen -= CI_MAX_DATA_PER_TXD;
-
-				txe->last_id = tx_last;
-				tx_id = txe->next_id;
-				txe = txn;
-				txd = &txr[tx_id];
-				txn = &sw_ring[txe->next_id];
-			}
-			PMD_TX_LOG(DEBUG, "mbuf: %p, TDD[%u]: "
-				"buf_dma_addr: %#"PRIx64"; "
-				"td_cmd: %#x; "
-				"td_offset: %#x; "
-				"td_len: %u; "
-				"td_tag: %#x;",
-				tx_pkt, tx_id, buf_dma_addr,
-				td_cmd, td_offset, slen, td_tag);
-
-			txd->buffer_addr = rte_cpu_to_le_64(buf_dma_addr);
-			txd->cmd_type_offset_bsz = i40e_build_ctob(td_cmd,
-						td_offset, slen, td_tag);
-			txe->last_id = tx_last;
-			tx_id = txe->next_id;
-			txe = txn;
-			m_seg = m_seg->next;
-		} while (m_seg != NULL);
-
-		/* The last packet data descriptor needs End Of Packet (EOP) */
-		td_cmd |= CI_TX_DESC_CMD_EOP;
-		txq->nb_tx_used = (uint16_t)(txq->nb_tx_used + nb_used);
-		txq->nb_tx_free = (uint16_t)(txq->nb_tx_free - nb_used);
-
-		if (txq->nb_tx_used >= txq->tx_rs_thresh) {
-			PMD_TX_LOG(DEBUG,
-				   "Setting RS bit on TXD id="
-				   "%4u (port=%d queue=%d)",
-				   tx_last, txq->port_id, txq->queue_id);
-
-			td_cmd |= CI_TX_DESC_CMD_RS;
-
-			/* Update txq RS bit counters */
-			txq->nb_tx_used = 0;
-		}
-
-		txd->cmd_type_offset_bsz |=
-			rte_cpu_to_le_64(((uint64_t)td_cmd) << CI_TXD_QW1_CMD_S);
-	}
-
-end_of_tx:
-	PMD_TX_LOG(DEBUG, "port_id=%u queue_id=%u tx_tail=%u nb_tx=%u",
-		   (unsigned) txq->port_id, (unsigned) txq->queue_id,
-		   (unsigned) tx_id, (unsigned) nb_tx);
-
-	rte_io_wmb();
-	I40E_PCI_REG_WC_WRITE_RELAXED(txq->qtx_tail, tx_id);
-	txq->tx_tail = tx_id;
-
-	return nb_tx;
+	/* i40e does not support timestamp queues, so pass NULL for ts_fns */
+	return ci_xmit_pkts(tx_queue, tx_pkts, nb_pkts, get_context_desc, NULL);
 }
 
 static __rte_always_inline int
-- 
2.51.0



* [PATCH v5 14/35] net/intel: add IPsec hooks to common Tx function
  2026-02-11 18:12 ` [PATCH v5 00/35] combine multiple Intel scalar Tx paths Bruce Richardson
                     ` (12 preceding siblings ...)
  2026-02-11 18:12   ` [PATCH v5 13/35] net/i40e: use " Bruce Richardson
@ 2026-02-11 18:12   ` Bruce Richardson
  2026-02-11 18:12   ` [PATCH v5 15/35] net/intel: support configurable VLAN tag insertion on Tx Bruce Richardson
                     ` (21 subsequent siblings)
  35 siblings, 0 replies; 274+ messages in thread
From: Bruce Richardson @ 2026-02-11 18:12 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson, Anatoly Burakov

The iavf driver supports IPsec offload on Tx, so add hooks to the
common Tx function to support that. Do so in a way that has zero
performance impact on drivers without IPsec support: by passing in
compile-time NULL constants for the function pointers, the compiler can
optimize the unused hooks away.
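
The pattern can be sketched as below. This is a minimal illustration of
the zero-cost hook idea, not the actual ci_xmit_pkts() signature; all
names here are hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback type, shaped like the IPsec hook: returns the
 * number of extra descriptors a packet needs (0 or 1). */
typedef uint16_t (*extra_desc_fn)(uint16_t pkt_segs);

/* Inline worker parameterized by the callback. When a caller passes a
 * literal NULL, the "cb != NULL" test is a compile-time constant, so
 * the compiler drops the branch and the indirect call entirely. */
static inline uint16_t
count_descs(uint16_t pkt_segs, extra_desc_fn cb)
{
	uint16_t nb_used = pkt_segs;

	if (cb != NULL)
		nb_used += cb(pkt_segs);
	return nb_used;
}

/* Driver with the offload: one extra descriptor per packet. */
static uint16_t
ipsec_extra_desc(uint16_t pkt_segs)
{
	(void)pkt_segs;
	return 1;
}

/* Per-driver entry points, mirroring how i40e passes NULL while iavf
 * passes real hooks into the common transmit function. */
static uint16_t
xmit_plain(uint16_t pkt_segs)
{
	return count_descs(pkt_segs, NULL); /* branch optimized away */
}

static uint16_t
xmit_with_ipsec(uint16_t pkt_segs)
{
	return count_descs(pkt_segs, ipsec_extra_desc);
}
```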

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 drivers/net/intel/common/tx_scalar.h | 60 ++++++++++++++++++++++++++--
 drivers/net/intel/i40e/i40e_rxtx.c   |  4 +-
 drivers/net/intel/ice/ice_rxtx.c     |  4 +-
 3 files changed, 60 insertions(+), 8 deletions(-)

diff --git a/drivers/net/intel/common/tx_scalar.h b/drivers/net/intel/common/tx_scalar.h
index 3616600a2d..cd0abe4179 100644
--- a/drivers/net/intel/common/tx_scalar.h
+++ b/drivers/net/intel/common/tx_scalar.h
@@ -134,6 +134,24 @@ typedef uint16_t (*ci_get_ctx_desc_fn)(uint64_t ol_flags, const struct rte_mbuf
 		const union ci_tx_offload *tx_offload, const struct ci_tx_queue *txq,
 		uint64_t *qw0, uint64_t *qw1);
 
+/* gets IPsec descriptor information and returns number of descriptors needed (0 or 1) */
+typedef uint16_t (*get_ipsec_desc_t)(const struct rte_mbuf *mbuf,
+		const struct ci_tx_queue *txq,
+		void **ipsec_metadata,
+		uint64_t *qw0,
+		uint64_t *qw1);
+/* calculates segment length for IPsec + TSO combinations */
+typedef uint16_t (*calc_ipsec_segment_len_t)(const struct rte_mbuf *mb_seg,
+		uint64_t ol_flags,
+		const void *ipsec_metadata,
+		uint16_t tlen);
+
+/** IPsec descriptor operations for drivers that support inline IPsec crypto. */
+struct ci_ipsec_ops {
+	get_ipsec_desc_t get_ipsec_desc;
+	calc_ipsec_segment_len_t calc_segment_len;
+};
+
 /* gets current timestamp tail index */
 typedef uint16_t (*get_ts_tail_t)(struct ci_tx_queue *txq);
 /* writes a timestamp descriptor and returns new tail index */
@@ -153,6 +171,7 @@ ci_xmit_pkts(struct ci_tx_queue *txq,
 	     struct rte_mbuf **tx_pkts,
 	     uint16_t nb_pkts,
 	     ci_get_ctx_desc_fn get_ctx_desc,
+	     const struct ci_ipsec_ops *ipsec_ops,
 	     const struct ci_timestamp_queue_fns *ts_fns)
 {
 	volatile struct ci_tx_desc *ci_tx_ring;
@@ -189,6 +208,9 @@ ci_xmit_pkts(struct ci_tx_queue *txq,
 		(void)ci_tx_xmit_cleanup(txq);
 
 	for (nb_tx = 0; nb_tx < nb_pkts; nb_tx++) {
+		void *ipsec_md = NULL;
+		uint16_t nb_ipsec = 0;
+		uint64_t ipsec_qw0 = 0, ipsec_qw1 = 0;
 		uint64_t cd_qw0 = 0, cd_qw1 = 0;
 		tx_pkt = *tx_pkts++;
 
@@ -210,17 +232,22 @@ ci_xmit_pkts(struct ci_tx_queue *txq,
 		/* Calculate the number of context descriptors needed. */
 		nb_ctx = get_ctx_desc(ol_flags, tx_pkt, &tx_offload, txq, &cd_qw0, &cd_qw1);
 
+		/* Get IPsec descriptor information if IPsec ops provided */
+		if (ipsec_ops != NULL)
+			nb_ipsec = ipsec_ops->get_ipsec_desc(tx_pkt, txq, &ipsec_md,
+					&ipsec_qw0, &ipsec_qw1);
+
 		/* The number of descriptors that must be allocated for
 		 * a packet equals to the number of the segments of that
-		 * packet plus the number of context descriptor if needed.
+		 * packet plus the number of context and IPsec descriptors if needed.
 		 * Recalculate the needed tx descs when TSO enabled in case
 		 * the mbuf data size exceeds max data size that hw allows
 		 * per tx desc.
 		 */
 		if (ol_flags & (RTE_MBUF_F_TX_TCP_SEG | RTE_MBUF_F_TX_UDP_SEG))
-			nb_used = (uint16_t)(ci_calc_pkt_desc(tx_pkt) + nb_ctx);
+			nb_used = (uint16_t)(ci_calc_pkt_desc(tx_pkt) + nb_ctx + nb_ipsec);
 		else
-			nb_used = (uint16_t)(tx_pkt->nb_segs + nb_ctx);
+			nb_used = (uint16_t)(tx_pkt->nb_segs + nb_ctx + nb_ipsec);
 		tx_last = (uint16_t)(tx_id + nb_used - 1);
 
 		/* Circular ring */
@@ -273,6 +300,26 @@ ci_xmit_pkts(struct ci_tx_queue *txq,
 			tx_id = txe->next_id;
 			txe = txn;
 		}
+
+		if (ipsec_ops != NULL && nb_ipsec > 0) {
+			/* Setup TX IPsec descriptor if required */
+			uint64_t *ipsec_txd = RTE_CAST_PTR(uint64_t *, &ci_tx_ring[tx_id]);
+
+			txn = &sw_ring[txe->next_id];
+			RTE_MBUF_PREFETCH_TO_FREE(txn->mbuf);
+			if (txe->mbuf) {
+				rte_pktmbuf_free_seg(txe->mbuf);
+				txe->mbuf = NULL;
+			}
+
+			ipsec_txd[0] = ipsec_qw0;
+			ipsec_txd[1] = ipsec_qw1;
+
+			txe->last_id = tx_last;
+			tx_id = txe->next_id;
+			txe = txn;
+		}
+
 		m_seg = tx_pkt;
 
 		do {
@@ -284,7 +331,12 @@ ci_xmit_pkts(struct ci_tx_queue *txq,
 			txe->mbuf = m_seg;
 
 			/* Setup TX Descriptor */
-			slen = m_seg->data_len;
+			/* Calculate segment length, using IPsec callback if provided */
+			if (ipsec_ops != NULL)
+				slen = ipsec_ops->calc_segment_len(m_seg, ol_flags, ipsec_md, 0);
+			else
+				slen = m_seg->data_len;
+
 			buf_dma_addr = rte_mbuf_data_iova(m_seg);
 
 			while ((ol_flags & (RTE_MBUF_F_TX_TCP_SEG | RTE_MBUF_F_TX_UDP_SEG)) &&
diff --git a/drivers/net/intel/i40e/i40e_rxtx.c b/drivers/net/intel/i40e/i40e_rxtx.c
index 41310b4c6c..4e362e737e 100644
--- a/drivers/net/intel/i40e/i40e_rxtx.c
+++ b/drivers/net/intel/i40e/i40e_rxtx.c
@@ -1018,8 +1018,8 @@ get_context_desc(uint64_t ol_flags, const struct rte_mbuf *tx_pkt,
 uint16_t
 i40e_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 {
-	/* i40e does not support timestamp queues, so pass NULL for ts_fns */
-	return ci_xmit_pkts(tx_queue, tx_pkts, nb_pkts, get_context_desc, NULL);
+	/* i40e does not support IPsec or timestamp queues, so pass NULL for both */
+	return ci_xmit_pkts(tx_queue, tx_pkts, nb_pkts, get_context_desc, NULL, NULL);
 }
 
 static __rte_always_inline int
diff --git a/drivers/net/intel/ice/ice_rxtx.c b/drivers/net/intel/ice/ice_rxtx.c
index 561a6617a6..9643dd3817 100644
--- a/drivers/net/intel/ice/ice_rxtx.c
+++ b/drivers/net/intel/ice/ice_rxtx.c
@@ -3105,9 +3105,9 @@ ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 	struct ci_tx_queue *txq = (struct ci_tx_queue *)tx_queue;
 
 	if (txq->tsq != NULL && txq->tsq->ts_flag > 0)
-		return ci_xmit_pkts(txq, tx_pkts, nb_pkts, get_context_desc, &ts_fns);
+		return ci_xmit_pkts(txq, tx_pkts, nb_pkts, get_context_desc, NULL, &ts_fns);
 
-	return ci_xmit_pkts(txq, tx_pkts, nb_pkts, get_context_desc, NULL);
+	return ci_xmit_pkts(txq, tx_pkts, nb_pkts, get_context_desc, NULL, NULL);
 }
 
 static __rte_always_inline int
-- 
2.51.0



* [PATCH v5 15/35] net/intel: support configurable VLAN tag insertion on Tx
  2026-02-11 18:12 ` [PATCH v5 00/35] combine multiple Intel scalar Tx paths Bruce Richardson
                     ` (13 preceding siblings ...)
  2026-02-11 18:12   ` [PATCH v5 14/35] net/intel: add IPsec hooks to common " Bruce Richardson
@ 2026-02-11 18:12   ` Bruce Richardson
  2026-02-12 12:20     ` Burakov, Anatoly
  2026-02-11 18:12   ` [PATCH v5 16/35] net/iavf: use common scalar Tx function Bruce Richardson
                     ` (20 subsequent siblings)
  35 siblings, 1 reply; 274+ messages in thread
From: Bruce Richardson @ 2026-02-11 18:12 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson, Anatoly Burakov

Make the VLAN tag insertion logic in the common code configurable, so
each driver can specify where the inner and outer tags are placed.
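
The placement decision reduces to the check sketched below. This is a
simplified illustration; the enum mirrors the one added by this patch,
but the flag values and function name are illustrative stand-ins, not
the actual DPDK definitions:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mirrors the new enum: where a single (non-QinQ) VLAN tag is placed. */
enum l2tag1_field { VLAN_IN_L2TAG1, VLAN_IN_L2TAG2 };

/* Illustrative offload flags, standing in for RTE_MBUF_F_TX_VLAN and
 * RTE_MBUF_F_TX_QINQ. */
#define F_TX_VLAN (1ULL << 0)
#define F_TX_QINQ (1ULL << 1)

/* A single VLAN tag goes in the data descriptor (L2TAG1) only when the
 * driver selects VLAN_IN_L2TAG1; for QinQ the inner tag always uses
 * L2TAG1, with the outer tag handled in the context descriptor. */
static bool
insert_in_l2tag1(uint64_t ol_flags, enum l2tag1_field field)
{
	return ((ol_flags & F_TX_VLAN) && field == VLAN_IN_L2TAG1) ||
		((ol_flags & F_TX_QINQ) != 0);
}
```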

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/net/intel/common/tx.h        | 17 +++++++++++++++++
 drivers/net/intel/common/tx_scalar.h |  9 +++++++--
 drivers/net/intel/i40e/i40e_rxtx.c   |  6 +++---
 drivers/net/intel/ice/ice_rxtx.c     |  5 +++--
 4 files changed, 30 insertions(+), 7 deletions(-)

diff --git a/drivers/net/intel/common/tx.h b/drivers/net/intel/common/tx.h
index 23deabc5d1..5da6c7c15d 100644
--- a/drivers/net/intel/common/tx.h
+++ b/drivers/net/intel/common/tx.h
@@ -45,6 +45,23 @@
 #define CI_TX_CTX_DESC_TSYN             0x02
 #define CI_TX_CTX_DESC_IL2TAG2          0x04
 
+/**
+ * L2TAG1 Field Source Selection
+ * Specifies which mbuf VLAN field to use for the L2TAG1 field in data descriptors.
+ * Context descriptor VLAN handling (L2TAG2) is managed by driver-specific callbacks.
+ */
+enum ci_tx_l2tag1_field {
+	/** For VLAN (not QinQ), use L2Tag1 field in data desc */
+	CI_VLAN_IN_L2TAG1,
+
+	/** For VLAN (not QinQ), use L2Tag2 field in ctx desc.
+	 * NOTE: When set, drivers must set the VLAN tag in the context
+	 * descriptor callback function, rather than relying on the
+	 * common Tx code to insert it.
+	 */
+	CI_VLAN_IN_L2TAG2,
+};
+
 /* Common TX Descriptor Length Field Shifts */
 #define CI_TX_DESC_LEN_MACLEN_S         0  /* 7 BITS */
 #define CI_TX_DESC_LEN_IPLEN_S          7  /* 7 BITS */
diff --git a/drivers/net/intel/common/tx_scalar.h b/drivers/net/intel/common/tx_scalar.h
index cd0abe4179..db002aad21 100644
--- a/drivers/net/intel/common/tx_scalar.h
+++ b/drivers/net/intel/common/tx_scalar.h
@@ -170,6 +170,7 @@ static inline uint16_t
 ci_xmit_pkts(struct ci_tx_queue *txq,
 	     struct rte_mbuf **tx_pkts,
 	     uint16_t nb_pkts,
+	     enum ci_tx_l2tag1_field l2tag1_field,
 	     ci_get_ctx_desc_fn get_ctx_desc,
 	     const struct ci_ipsec_ops *ipsec_ops,
 	     const struct ci_timestamp_queue_fns *ts_fns)
@@ -271,8 +272,12 @@ ci_xmit_pkts(struct ci_tx_queue *txq,
 			}
 		}
 
-		/* Descriptor based VLAN insertion */
-		if (ol_flags & (RTE_MBUF_F_TX_VLAN | RTE_MBUF_F_TX_QINQ)) {
+		/* Descriptor based VLAN/QinQ insertion */
+		/* for single VLAN offload, only insert in data desc when CI_VLAN_IN_L2TAG1 is set;
+		 * for QinQ offload, we always put the inner tag in L2TAG1
+		 */
+		if (((ol_flags & RTE_MBUF_F_TX_VLAN) && l2tag1_field == CI_VLAN_IN_L2TAG1) ||
+				(ol_flags & RTE_MBUF_F_TX_QINQ)) {
 			td_cmd |= CI_TX_DESC_CMD_IL2TAG1;
 			td_tag = tx_pkt->vlan_tci;
 		}
diff --git a/drivers/net/intel/i40e/i40e_rxtx.c b/drivers/net/intel/i40e/i40e_rxtx.c
index 4e362e737e..35c1b53c1e 100644
--- a/drivers/net/intel/i40e/i40e_rxtx.c
+++ b/drivers/net/intel/i40e/i40e_rxtx.c
@@ -1004,8 +1004,7 @@ get_context_desc(uint64_t ol_flags, const struct rte_mbuf *tx_pkt,
 	/* TX context descriptor based double VLAN insert */
 	if (ol_flags & RTE_MBUF_F_TX_QINQ) {
 		cd_l2tag2 = tx_pkt->vlan_tci_outer;
-		cd_type_cmd_tso_mss |=
-				((uint64_t)I40E_TX_CTX_DESC_IL2TAG2 << I40E_TXD_CTX_QW1_CMD_SHIFT);
+		cd_type_cmd_tso_mss |= (I40E_TX_CTX_DESC_IL2TAG2 << I40E_TXD_CTX_QW1_CMD_SHIFT);
 	}
 
 	*qw0 = rte_cpu_to_le_32(cd_tunneling_params) |
@@ -1019,7 +1018,8 @@ uint16_t
 i40e_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 {
 	/* i40e does not support IPsec or timestamp queues, so pass NULL for both */
-	return ci_xmit_pkts(tx_queue, tx_pkts, nb_pkts, get_context_desc, NULL, NULL);
+	return ci_xmit_pkts(tx_queue, tx_pkts, nb_pkts, CI_VLAN_IN_L2TAG1,
+			get_context_desc, NULL, NULL);
 }
 
 static __rte_always_inline int
diff --git a/drivers/net/intel/ice/ice_rxtx.c b/drivers/net/intel/ice/ice_rxtx.c
index 9643dd3817..111cb5e37f 100644
--- a/drivers/net/intel/ice/ice_rxtx.c
+++ b/drivers/net/intel/ice/ice_rxtx.c
@@ -3105,9 +3105,10 @@ ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 	struct ci_tx_queue *txq = (struct ci_tx_queue *)tx_queue;
 
 	if (txq->tsq != NULL && txq->tsq->ts_flag > 0)
-		return ci_xmit_pkts(txq, tx_pkts, nb_pkts, get_context_desc, NULL, &ts_fns);
+		return ci_xmit_pkts(txq, tx_pkts, nb_pkts, CI_VLAN_IN_L2TAG1,
+				get_context_desc, NULL, &ts_fns);
 
-	return ci_xmit_pkts(txq, tx_pkts, nb_pkts, get_context_desc, NULL, NULL);
+	return ci_xmit_pkts(txq, tx_pkts, nb_pkts, CI_VLAN_IN_L2TAG1, get_context_desc, NULL, NULL);
 }
 
 static __rte_always_inline int
-- 
2.51.0



* [PATCH v5 16/35] net/iavf: use common scalar Tx function
  2026-02-11 18:12 ` [PATCH v5 00/35] combine multiple Intel scalar Tx paths Bruce Richardson
                     ` (14 preceding siblings ...)
  2026-02-11 18:12   ` [PATCH v5 15/35] net/intel: support configurable VLAN tag insertion on Tx Bruce Richardson
@ 2026-02-11 18:12   ` Bruce Richardson
  2026-02-11 18:12   ` [PATCH v5 17/35] net/i40e: document requirement for QinQ support Bruce Richardson
                     ` (19 subsequent siblings)
  35 siblings, 0 replies; 274+ messages in thread
From: Bruce Richardson @ 2026-02-11 18:12 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson, Anatoly Burakov, Vladimir Medvedkin

Now that the common scalar Tx function has all necessary hooks for the
features supported by the iavf driver, use the common function to avoid
duplicated code.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 drivers/net/intel/common/tx_scalar.h |   3 +-
 drivers/net/intel/iavf/iavf_rxtx.c   | 529 ++++++---------------------
 drivers/net/intel/iavf/iavf_rxtx.h   |   1 +
 3 files changed, 108 insertions(+), 425 deletions(-)

diff --git a/drivers/net/intel/common/tx_scalar.h b/drivers/net/intel/common/tx_scalar.h
index db002aad21..15dc3dfa59 100644
--- a/drivers/net/intel/common/tx_scalar.h
+++ b/drivers/net/intel/common/tx_scalar.h
@@ -218,7 +218,8 @@ ci_xmit_pkts(struct ci_tx_queue *txq,
 		ol_flags = tx_pkt->ol_flags;
 		td_cmd = CI_TX_DESC_CMD_ICRC;
 		td_tag = 0;
-		l2_len = ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK ?
+		l2_len = (ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK &&
+					!(ol_flags & RTE_MBUF_F_TX_SEC_OFFLOAD)) ?
 				tx_pkt->outer_l2_len : tx_pkt->l2_len;
 		td_offset = (l2_len >> 1) << CI_TX_DESC_LEN_MACLEN_S;
 
diff --git a/drivers/net/intel/iavf/iavf_rxtx.c b/drivers/net/intel/iavf/iavf_rxtx.c
index 3dbcfd5355..67906841da 100644
--- a/drivers/net/intel/iavf/iavf_rxtx.c
+++ b/drivers/net/intel/iavf/iavf_rxtx.c
@@ -2326,7 +2326,7 @@ iavf_recv_pkts_bulk_alloc(void *rx_queue,
 
 /* Check if the context descriptor is needed for TX offloading */
 static inline uint16_t
-iavf_calc_context_desc(struct rte_mbuf *mb, uint8_t vlan_flag)
+iavf_calc_context_desc(const struct rte_mbuf *mb, uint8_t vlan_flag)
 {
 	uint64_t flags = mb->ol_flags;
 	if (flags & (RTE_MBUF_F_TX_TCP_SEG | RTE_MBUF_F_TX_UDP_SEG |
@@ -2344,44 +2344,7 @@ iavf_calc_context_desc(struct rte_mbuf *mb, uint8_t vlan_flag)
 }
 
 static inline void
-iavf_fill_ctx_desc_cmd_field(volatile uint64_t *field, struct rte_mbuf *m,
-		uint8_t vlan_flag)
-{
-	uint64_t cmd = 0;
-
-	/* TSO enabled */
-	if (m->ol_flags & (RTE_MBUF_F_TX_TCP_SEG | RTE_MBUF_F_TX_UDP_SEG))
-		cmd = CI_TX_CTX_DESC_TSO << IAVF_TXD_CTX_QW1_CMD_SHIFT;
-
-	if ((m->ol_flags & RTE_MBUF_F_TX_VLAN &&
-			vlan_flag & IAVF_TX_FLAGS_VLAN_TAG_LOC_L2TAG2) ||
-			m->ol_flags & RTE_MBUF_F_TX_QINQ) {
-		cmd |= CI_TX_CTX_DESC_IL2TAG2
-			<< IAVF_TXD_CTX_QW1_CMD_SHIFT;
-	}
-
-	if (IAVF_CHECK_TX_LLDP(m))
-		cmd |= IAVF_TX_CTX_DESC_SWTCH_UPLINK
-			<< IAVF_TXD_CTX_QW1_CMD_SHIFT;
-
-	*field |= cmd;
-}
-
-static inline void
-iavf_fill_ctx_desc_ipsec_field(volatile uint64_t *field,
-	struct iavf_ipsec_crypto_pkt_metadata *ipsec_md)
-{
-	uint64_t ipsec_field =
-		(uint64_t)ipsec_md->ctx_desc_ipsec_params <<
-			IAVF_TXD_CTX_QW1_IPSEC_PARAMS_CIPHERBLK_SHIFT;
-
-	*field |= ipsec_field;
-}
-
-
-static inline void
-iavf_fill_ctx_desc_tunnelling_field(volatile uint64_t *qw0,
-		const struct rte_mbuf *m)
+iavf_fill_ctx_desc_tunnelling_field(uint64_t *qw0, const struct rte_mbuf *m)
 {
 	uint64_t eip_typ = IAVF_TX_CTX_DESC_EIPT_NONE;
 	uint64_t eip_len = 0;
@@ -2456,7 +2419,7 @@ iavf_fill_ctx_desc_tunnelling_field(volatile uint64_t *qw0,
 
 static inline uint16_t
 iavf_fill_ctx_desc_segmentation_field(volatile uint64_t *field,
-	struct rte_mbuf *m, struct iavf_ipsec_crypto_pkt_metadata *ipsec_md)
+	const struct rte_mbuf *m, struct iavf_ipsec_crypto_pkt_metadata *ipsec_md)
 {
 	uint64_t segmentation_field = 0;
 	uint64_t total_length = 0;
@@ -2495,59 +2458,31 @@ struct iavf_tx_context_desc_qws {
 	__le64 qw1;
 };
 
-static inline void
-iavf_fill_context_desc(volatile struct iavf_tx_context_desc *desc,
-	struct rte_mbuf *m, struct iavf_ipsec_crypto_pkt_metadata *ipsec_md,
-	uint16_t *tlen, uint8_t vlan_flag)
+/* IPsec callback for ci_xmit_pkts - gets IPsec descriptor information */
+static uint16_t
+iavf_get_ipsec_desc(const struct rte_mbuf *mbuf, const struct ci_tx_queue *txq,
+		    void **ipsec_metadata, uint64_t *qw0, uint64_t *qw1)
 {
-	volatile struct iavf_tx_context_desc_qws *desc_qws =
-			(volatile struct iavf_tx_context_desc_qws *)desc;
-	/* fill descriptor type field */
-	desc_qws->qw1 = IAVF_TX_DESC_DTYPE_CONTEXT;
-
-	/* fill command field */
-	iavf_fill_ctx_desc_cmd_field(&desc_qws->qw1, m, vlan_flag);
-
-	/* fill segmentation field */
-	if (m->ol_flags & (RTE_MBUF_F_TX_TCP_SEG | RTE_MBUF_F_TX_UDP_SEG)) {
-		/* fill IPsec field */
-		if (m->ol_flags & RTE_MBUF_F_TX_SEC_OFFLOAD)
-			iavf_fill_ctx_desc_ipsec_field(&desc_qws->qw1,
-				ipsec_md);
-
-		*tlen = iavf_fill_ctx_desc_segmentation_field(&desc_qws->qw1,
-				m, ipsec_md);
-	}
-
-	/* fill tunnelling field */
-	if (m->ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK)
-		iavf_fill_ctx_desc_tunnelling_field(&desc_qws->qw0, m);
-	else
-		desc_qws->qw0 = 0;
+	struct iavf_ipsec_crypto_pkt_metadata *md;
 
-	desc_qws->qw0 = rte_cpu_to_le_64(desc_qws->qw0);
-	desc_qws->qw1 = rte_cpu_to_le_64(desc_qws->qw1);
+	if (!(mbuf->ol_flags & RTE_MBUF_F_TX_SEC_OFFLOAD))
+		return 0;
 
-	/* vlan_flag specifies VLAN tag location for VLAN, and outer tag location for QinQ. */
-	if (m->ol_flags & RTE_MBUF_F_TX_QINQ)
-		desc->l2tag2 = vlan_flag & IAVF_TX_FLAGS_VLAN_TAG_LOC_L2TAG2 ? m->vlan_tci_outer :
-						m->vlan_tci;
-	else if (m->ol_flags & RTE_MBUF_F_TX_VLAN && vlan_flag & IAVF_TX_FLAGS_VLAN_TAG_LOC_L2TAG2)
-		desc->l2tag2 = m->vlan_tci;
-}
+	md = RTE_MBUF_DYNFIELD(mbuf, txq->ipsec_crypto_pkt_md_offset,
+				     struct iavf_ipsec_crypto_pkt_metadata *);
+	if (!md)
+		return 0;
 
+	*ipsec_metadata = md;
 
-static inline void
-iavf_fill_ipsec_desc(volatile struct iavf_tx_ipsec_desc *desc,
-	const struct iavf_ipsec_crypto_pkt_metadata *md, uint16_t *ipsec_len)
-{
-	desc->qw0 = rte_cpu_to_le_64(((uint64_t)md->l4_payload_len <<
+	/* Fill IPsec descriptor using existing logic */
+	*qw0 = rte_cpu_to_le_64(((uint64_t)md->l4_payload_len <<
 		IAVF_IPSEC_TX_DESC_QW0_L4PAYLEN_SHIFT) |
 		((uint64_t)md->esn << IAVF_IPSEC_TX_DESC_QW0_IPSECESN_SHIFT) |
 		((uint64_t)md->esp_trailer_len <<
 				IAVF_IPSEC_TX_DESC_QW0_TRAILERLEN_SHIFT));
 
-	desc->qw1 = rte_cpu_to_le_64(((uint64_t)md->sa_idx <<
+	*qw1 = rte_cpu_to_le_64(((uint64_t)md->sa_idx <<
 		IAVF_IPSEC_TX_DESC_QW1_IPSECSA_SHIFT) |
 		((uint64_t)md->next_proto <<
 				IAVF_IPSEC_TX_DESC_QW1_IPSECNH_SHIFT) |
@@ -2556,143 +2491,103 @@ iavf_fill_ipsec_desc(volatile struct iavf_tx_ipsec_desc *desc,
 		((uint64_t)(md->ol_flags & IAVF_IPSEC_CRYPTO_OL_FLAGS_NATT ?
 				1ULL : 0ULL) <<
 				IAVF_IPSEC_TX_DESC_QW1_UDP_SHIFT) |
-		(uint64_t)IAVF_TX_DESC_DTYPE_IPSEC);
+		((uint64_t)IAVF_TX_DESC_DTYPE_IPSEC <<
+				CI_TXD_QW1_DTYPE_S));
 
-	/**
-	 * TODO: Pre-calculate this in the Session initialization
-	 *
-	 * Calculate IPsec length required in data descriptor func when TSO
-	 * offload is enabled
-	 */
-	*ipsec_len = sizeof(struct rte_esp_hdr) + (md->len_iv >> 2) +
-			(md->ol_flags & IAVF_IPSEC_CRYPTO_OL_FLAGS_NATT ?
-			sizeof(struct rte_udp_hdr) : 0);
+	return 1; /* One IPsec descriptor needed */
 }
 
-static inline void
-iavf_build_data_desc_cmd_offset_fields(volatile uint64_t *qw1,
-		struct rte_mbuf *m, uint8_t vlan_flag)
+/* IPsec callback for ci_xmit_pkts - calculates segment length for IPsec+TSO */
+static uint16_t
+iavf_calc_ipsec_segment_len(const struct rte_mbuf *mb_seg, uint64_t ol_flags,
+			    const void *ipsec_metadata, uint16_t tlen)
 {
-	uint64_t command = 0;
-	uint64_t offset = 0;
-	uint64_t l2tag1 = 0;
-
-	*qw1 = CI_TX_DESC_DTYPE_DATA;
-
-	command = (uint64_t)CI_TX_DESC_CMD_ICRC;
-
-	/* Descriptor based VLAN insertion */
-	if ((vlan_flag & IAVF_TX_FLAGS_VLAN_TAG_LOC_L2TAG1) &&
-			m->ol_flags & RTE_MBUF_F_TX_VLAN) {
-		command |= (uint64_t)CI_TX_DESC_CMD_IL2TAG1;
-		l2tag1 |= m->vlan_tci;
+	const struct iavf_ipsec_crypto_pkt_metadata *ipsec_md = ipsec_metadata;
+
+	if ((ol_flags & RTE_MBUF_F_TX_SEC_OFFLOAD) &&
+	    (ol_flags & (RTE_MBUF_F_TX_TCP_SEG | RTE_MBUF_F_TX_UDP_SEG))) {
+		uint16_t ipseclen = ipsec_md ? (ipsec_md->esp_trailer_len +
+						ipsec_md->len_iv) : 0;
+		uint16_t slen = tlen + mb_seg->l2_len + mb_seg->l3_len +
+				mb_seg->outer_l3_len + ipseclen;
+		if (ol_flags & RTE_MBUF_F_TX_L4_MASK)
+			slen += mb_seg->l4_len;
+		return slen;
 	}
 
-	/* Descriptor based QinQ insertion. vlan_flag specifies outer tag location. */
-	if (m->ol_flags & RTE_MBUF_F_TX_QINQ) {
-		command |= (uint64_t)CI_TX_DESC_CMD_IL2TAG1;
-		l2tag1 = vlan_flag & IAVF_TX_FLAGS_VLAN_TAG_LOC_L2TAG1 ? m->vlan_tci_outer :
-									m->vlan_tci;
-	}
+	return mb_seg->data_len;
+}
 
-	if ((m->ol_flags &
-	    (CI_TX_CKSUM_OFFLOAD_MASK | RTE_MBUF_F_TX_SEC_OFFLOAD)) == 0)
-		goto skip_cksum;
+/* Context descriptor callback for ci_xmit_pkts */
+static uint16_t
+iavf_get_context_desc(uint64_t ol_flags, const struct rte_mbuf *mbuf,
+		      const union ci_tx_offload *tx_offload __rte_unused,
+		      const struct ci_tx_queue *txq,
+		      uint64_t *qw0, uint64_t *qw1)
+{
+	uint8_t iavf_vlan_flag;
+	uint16_t cd_l2tag2 = 0;
+	uint64_t cd_type_cmd = IAVF_TX_DESC_DTYPE_CONTEXT;
+	uint64_t cd_tunneling_params = 0;
+	struct iavf_ipsec_crypto_pkt_metadata *ipsec_md = NULL;
 
-	/* Set MACLEN */
-	if (m->ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK &&
-			!(m->ol_flags & RTE_MBUF_F_TX_SEC_OFFLOAD))
-		offset |= (m->outer_l2_len >> 1)
-			<< CI_TX_DESC_LEN_MACLEN_S;
-	else
-		offset |= (m->l2_len >> 1)
-			<< CI_TX_DESC_LEN_MACLEN_S;
+	/* Use IAVF-specific vlan_flag from txq */
+	iavf_vlan_flag = txq->vlan_flag;
 
-	/* Enable L3 checksum offloading inner */
-	if (m->ol_flags & RTE_MBUF_F_TX_IP_CKSUM) {
-		if (m->ol_flags & RTE_MBUF_F_TX_IPV4) {
-			command |= CI_TX_DESC_CMD_IIPT_IPV4_CSUM;
-			offset |= (m->l3_len >> 2) << CI_TX_DESC_LEN_IPLEN_S;
-		}
-	} else if (m->ol_flags & RTE_MBUF_F_TX_IPV4) {
-		command |= CI_TX_DESC_CMD_IIPT_IPV4;
-		offset |= (m->l3_len >> 2) << CI_TX_DESC_LEN_IPLEN_S;
-	} else if (m->ol_flags & RTE_MBUF_F_TX_IPV6) {
-		command |= CI_TX_DESC_CMD_IIPT_IPV6;
-		offset |= (m->l3_len >> 2) << CI_TX_DESC_LEN_IPLEN_S;
+	/* Check if context descriptor is needed using existing IAVF logic */
+	if (!iavf_calc_context_desc(mbuf, iavf_vlan_flag))
+		return 0;
+
+	/* Get IPsec metadata if needed */
+	if (ol_flags & RTE_MBUF_F_TX_SEC_OFFLOAD) {
+		ipsec_md = RTE_MBUF_DYNFIELD(mbuf, txq->ipsec_crypto_pkt_md_offset,
+					     struct iavf_ipsec_crypto_pkt_metadata *);
 	}
 
-	if (m->ol_flags & (RTE_MBUF_F_TX_TCP_SEG | RTE_MBUF_F_TX_UDP_SEG)) {
-		if (m->ol_flags & RTE_MBUF_F_TX_TCP_SEG)
-			command |= CI_TX_DESC_CMD_L4T_EOFT_TCP;
-		else
-			command |= CI_TX_DESC_CMD_L4T_EOFT_UDP;
-		offset |= (m->l4_len >> 2) <<
-			      CI_TX_DESC_LEN_L4_LEN_S;
+	/* TSO command field */
+	if (ol_flags & (RTE_MBUF_F_TX_TCP_SEG | RTE_MBUF_F_TX_UDP_SEG)) {
+		cd_type_cmd |= (uint64_t)CI_TX_CTX_DESC_TSO << IAVF_TXD_CTX_QW1_CMD_SHIFT;
 
-		*qw1 = rte_cpu_to_le_64((((uint64_t)command <<
-			IAVF_TXD_DATA_QW1_CMD_SHIFT) & IAVF_TXD_DATA_QW1_CMD_MASK) |
-			(((uint64_t)offset << IAVF_TXD_DATA_QW1_OFFSET_SHIFT) &
-			IAVF_TXD_DATA_QW1_OFFSET_MASK) |
-			((uint64_t)l2tag1 << IAVF_TXD_DATA_QW1_L2TAG1_SHIFT));
+		/* IPsec field for TSO */
+		if (ol_flags & RTE_MBUF_F_TX_SEC_OFFLOAD && ipsec_md) {
+			uint64_t ipsec_field = (uint64_t)ipsec_md->ctx_desc_ipsec_params <<
+				IAVF_TXD_CTX_QW1_IPSEC_PARAMS_CIPHERBLK_SHIFT;
+			cd_type_cmd |= ipsec_field;
+		}
 
-		return;
+		/* TSO segmentation field */
+		iavf_fill_ctx_desc_segmentation_field(&cd_type_cmd, mbuf, ipsec_md);
 	}
 
-	/* Enable L4 checksum offloads */
-	switch (m->ol_flags & RTE_MBUF_F_TX_L4_MASK) {
-	case RTE_MBUF_F_TX_TCP_CKSUM:
-		command |= CI_TX_DESC_CMD_L4T_EOFT_TCP;
-		offset |= (sizeof(struct rte_tcp_hdr) >> 2) <<
-				CI_TX_DESC_LEN_L4_LEN_S;
-		break;
-	case RTE_MBUF_F_TX_SCTP_CKSUM:
-		command |= CI_TX_DESC_CMD_L4T_EOFT_SCTP;
-		offset |= (sizeof(struct rte_sctp_hdr) >> 2) <<
-				CI_TX_DESC_LEN_L4_LEN_S;
-		break;
-	case RTE_MBUF_F_TX_UDP_CKSUM:
-		command |= CI_TX_DESC_CMD_L4T_EOFT_UDP;
-		offset |= (sizeof(struct rte_udp_hdr) >> 2) <<
-				CI_TX_DESC_LEN_L4_LEN_S;
-		break;
+	/* VLAN field for L2TAG2 */
+	if ((ol_flags & RTE_MBUF_F_TX_VLAN &&
+	     iavf_vlan_flag & IAVF_TX_FLAGS_VLAN_TAG_LOC_L2TAG2) ||
+	    ol_flags & RTE_MBUF_F_TX_QINQ) {
+		cd_type_cmd |= (uint64_t)CI_TX_CTX_DESC_IL2TAG2 << IAVF_TXD_CTX_QW1_CMD_SHIFT;
 	}
 
-skip_cksum:
-	*qw1 = rte_cpu_to_le_64((((uint64_t)command <<
-		IAVF_TXD_DATA_QW1_CMD_SHIFT) & IAVF_TXD_DATA_QW1_CMD_MASK) |
-		(((uint64_t)offset << IAVF_TXD_DATA_QW1_OFFSET_SHIFT) &
-		IAVF_TXD_DATA_QW1_OFFSET_MASK) |
-		((uint64_t)l2tag1 << IAVF_TXD_DATA_QW1_L2TAG1_SHIFT));
-}
-
-static inline void
-iavf_fill_data_desc(volatile struct ci_tx_desc *desc,
-	uint64_t desc_template,	uint16_t buffsz,
-	uint64_t buffer_addr)
-{
-	/* fill data descriptor qw1 from template */
-	desc->cmd_type_offset_bsz = desc_template;
-
-	/* set data buffer size */
-	desc->cmd_type_offset_bsz |=
-		(((uint64_t)buffsz << IAVF_TXD_DATA_QW1_TX_BUF_SZ_SHIFT) &
-		IAVF_TXD_DATA_QW1_TX_BUF_SZ_MASK);
-
-	desc->buffer_addr = rte_cpu_to_le_64(buffer_addr);
-	desc->cmd_type_offset_bsz = rte_cpu_to_le_64(desc->cmd_type_offset_bsz);
-}
-
+	/* LLDP switching field */
+	if (IAVF_CHECK_TX_LLDP(mbuf))
+		cd_type_cmd |= IAVF_TX_CTX_DESC_SWTCH_UPLINK << IAVF_TXD_CTX_QW1_CMD_SHIFT;
+
+	/* Tunneling field */
+	if (ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK)
+		iavf_fill_ctx_desc_tunnelling_field((uint64_t *)&cd_tunneling_params, mbuf);
+
+	/* L2TAG2 field (VLAN) */
+	if (ol_flags & RTE_MBUF_F_TX_QINQ) {
+		cd_l2tag2 = iavf_vlan_flag & IAVF_TX_FLAGS_VLAN_TAG_LOC_L2TAG2 ?
+			    mbuf->vlan_tci_outer : mbuf->vlan_tci;
+	} else if (ol_flags & RTE_MBUF_F_TX_VLAN &&
+		   iavf_vlan_flag & IAVF_TX_FLAGS_VLAN_TAG_LOC_L2TAG2) {
+		cd_l2tag2 = mbuf->vlan_tci;
+	}
 
-static struct iavf_ipsec_crypto_pkt_metadata *
-iavf_ipsec_crypto_get_pkt_metadata(const struct ci_tx_queue *txq,
-		struct rte_mbuf *m)
-{
-	if (m->ol_flags & RTE_MBUF_F_TX_SEC_OFFLOAD)
-		return RTE_MBUF_DYNFIELD(m, txq->ipsec_crypto_pkt_md_offset,
-				struct iavf_ipsec_crypto_pkt_metadata *);
+	/* Set outputs */
+	*qw0 = rte_cpu_to_le_64(cd_tunneling_params | ((uint64_t)cd_l2tag2 << 32));
+	*qw1 = rte_cpu_to_le_64(cd_type_cmd);
 
-	return NULL;
+	return 1; /* One context descriptor needed */
 }
 
 /* TX function */
@@ -2700,231 +2595,17 @@ uint16_t
 iavf_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 {
 	struct ci_tx_queue *txq = tx_queue;
-	volatile struct ci_tx_desc *txr = txq->ci_tx_ring;
-	struct ci_tx_entry *txe_ring = txq->sw_ring;
-	struct ci_tx_entry *txe, *txn;
-	struct rte_mbuf *mb, *mb_seg;
-	uint64_t buf_dma_addr;
-	uint16_t desc_idx, desc_idx_last;
-	uint16_t idx;
-	uint16_t slen;
-
-
-	/* Check if the descriptor ring needs to be cleaned. */
-	if (txq->nb_tx_free < txq->tx_free_thresh)
-		ci_tx_xmit_cleanup(txq);
-
-	desc_idx = txq->tx_tail;
-	txe = &txe_ring[desc_idx];
-
-	for (idx = 0; idx < nb_pkts; idx++) {
-		volatile struct ci_tx_desc *ddesc;
-		struct iavf_ipsec_crypto_pkt_metadata *ipsec_md;
-
-		uint16_t nb_desc_ctx, nb_desc_ipsec;
-		uint16_t nb_desc_data, nb_desc_required;
-		uint16_t tlen = 0, ipseclen = 0;
-		uint64_t ddesc_template = 0;
-		uint64_t ddesc_cmd = 0;
-
-		mb = tx_pkts[idx];
 
-		RTE_MBUF_PREFETCH_TO_FREE(txe->mbuf);
-
-		/**
-		 * Get metadata for ipsec crypto from mbuf dynamic fields if
-		 * security offload is specified.
-		 */
-		ipsec_md = iavf_ipsec_crypto_get_pkt_metadata(txq, mb);
-
-		nb_desc_data = mb->nb_segs;
-		nb_desc_ctx =
-			iavf_calc_context_desc(mb, txq->vlan_flag);
-		nb_desc_ipsec = !!(mb->ol_flags & RTE_MBUF_F_TX_SEC_OFFLOAD);
-
-		/**
-		 * The number of descriptors that must be allocated for
-		 * a packet equals to the number of the segments of that
-		 * packet plus the context and ipsec descriptors if needed.
-		 * Recalculate the needed tx descs when TSO enabled in case
-		 * the mbuf data size exceeds max data size that hw allows
-		 * per tx desc.
-		 */
-		if (mb->ol_flags & RTE_MBUF_F_TX_TCP_SEG)
-			nb_desc_required = ci_calc_pkt_desc(mb) + nb_desc_ctx + nb_desc_ipsec;
-		else
-			nb_desc_required = nb_desc_data + nb_desc_ctx + nb_desc_ipsec;
-
-		desc_idx_last = (uint16_t)(desc_idx + nb_desc_required - 1);
-
-		/* wrap descriptor ring */
-		if (desc_idx_last >= txq->nb_tx_desc)
-			desc_idx_last =
-				(uint16_t)(desc_idx_last - txq->nb_tx_desc);
-
-		PMD_TX_LOG(DEBUG,
-			"port_id=%u queue_id=%u tx_first=%u tx_last=%u",
-			txq->port_id, txq->queue_id, desc_idx, desc_idx_last);
-
-		if (nb_desc_required > txq->nb_tx_free) {
-			if (ci_tx_xmit_cleanup(txq)) {
-				if (idx == 0)
-					return 0;
-				goto end_of_tx;
-			}
-			if (unlikely(nb_desc_required > txq->tx_rs_thresh)) {
-				while (nb_desc_required > txq->nb_tx_free) {
-					if (ci_tx_xmit_cleanup(txq)) {
-						if (idx == 0)
-							return 0;
-						goto end_of_tx;
-					}
-				}
-			}
-		}
-
-		iavf_build_data_desc_cmd_offset_fields(&ddesc_template, mb,
-			txq->vlan_flag);
-
-			/* Setup TX context descriptor if required */
-		if (nb_desc_ctx) {
-			volatile struct iavf_tx_context_desc *ctx_desc =
-				(volatile struct iavf_tx_context_desc *)
-					&txr[desc_idx];
-
-			/* clear QW0 or the previous writeback value
-			 * may impact next write
-			 */
-			*(volatile uint64_t *)ctx_desc = 0;
-
-			txn = &txe_ring[txe->next_id];
-			RTE_MBUF_PREFETCH_TO_FREE(txn->mbuf);
-
-			if (txe->mbuf) {
-				rte_pktmbuf_free_seg(txe->mbuf);
-				txe->mbuf = NULL;
-			}
-
-			iavf_fill_context_desc(ctx_desc, mb, ipsec_md, &tlen,
-				txq->vlan_flag);
-			IAVF_DUMP_TX_DESC(txq, ctx_desc, desc_idx);
-
-			txe->last_id = desc_idx_last;
-			desc_idx = txe->next_id;
-			txe = txn;
-		}
-
-		if (nb_desc_ipsec) {
-			volatile struct iavf_tx_ipsec_desc *ipsec_desc =
-				(volatile struct iavf_tx_ipsec_desc *)
-					&txr[desc_idx];
-
-			txn = &txe_ring[txe->next_id];
-			RTE_MBUF_PREFETCH_TO_FREE(txn->mbuf);
-
-			if (txe->mbuf) {
-				rte_pktmbuf_free_seg(txe->mbuf);
-				txe->mbuf = NULL;
-			}
-
-			iavf_fill_ipsec_desc(ipsec_desc, ipsec_md, &ipseclen);
-
-			IAVF_DUMP_TX_DESC(txq, ipsec_desc, desc_idx);
-
-			txe->last_id = desc_idx_last;
-			desc_idx = txe->next_id;
-			txe = txn;
-		}
-
-		mb_seg = mb;
-
-		do {
-			ddesc = (volatile struct ci_tx_desc *)
-					&txr[desc_idx];
-
-			txn = &txe_ring[txe->next_id];
-			RTE_MBUF_PREFETCH_TO_FREE(txn->mbuf);
-
-			if (txe->mbuf)
-				rte_pktmbuf_free_seg(txe->mbuf);
-
-			txe->mbuf = mb_seg;
-
-			if ((mb_seg->ol_flags & RTE_MBUF_F_TX_SEC_OFFLOAD) &&
-					(mb_seg->ol_flags &
-						(RTE_MBUF_F_TX_TCP_SEG |
-						RTE_MBUF_F_TX_UDP_SEG))) {
-				slen = tlen + mb_seg->l2_len + mb_seg->l3_len +
-						mb_seg->outer_l3_len + ipseclen;
-				if (mb_seg->ol_flags & RTE_MBUF_F_TX_L4_MASK)
-					slen += mb_seg->l4_len;
-			} else {
-				slen = mb_seg->data_len;
-			}
-
-			buf_dma_addr = rte_mbuf_data_iova(mb_seg);
-			while ((mb_seg->ol_flags & (RTE_MBUF_F_TX_TCP_SEG |
-					RTE_MBUF_F_TX_UDP_SEG)) &&
-					unlikely(slen > CI_MAX_DATA_PER_TXD)) {
-				iavf_fill_data_desc(ddesc, ddesc_template,
-					CI_MAX_DATA_PER_TXD, buf_dma_addr);
-
-				IAVF_DUMP_TX_DESC(txq, ddesc, desc_idx);
-
-				buf_dma_addr += CI_MAX_DATA_PER_TXD;
-				slen -= CI_MAX_DATA_PER_TXD;
-
-				txe->last_id = desc_idx_last;
-				desc_idx = txe->next_id;
-				txe = txn;
-				ddesc = &txr[desc_idx];
-				txn = &txe_ring[txe->next_id];
-			}
-
-			iavf_fill_data_desc(ddesc, ddesc_template,
-					slen, buf_dma_addr);
-
-			IAVF_DUMP_TX_DESC(txq, ddesc, desc_idx);
-
-			txe->last_id = desc_idx_last;
-			desc_idx = txe->next_id;
-			txe = txn;
-			mb_seg = mb_seg->next;
-		} while (mb_seg);
-
-		/* The last packet data descriptor needs End Of Packet (EOP) */
-		ddesc_cmd = CI_TX_DESC_CMD_EOP;
-
-		txq->nb_tx_used = (uint16_t)(txq->nb_tx_used + nb_desc_required);
-		txq->nb_tx_free = (uint16_t)(txq->nb_tx_free - nb_desc_required);
-
-		if (txq->nb_tx_used >= txq->tx_rs_thresh) {
-			PMD_TX_LOG(DEBUG, "Setting RS bit on TXD id="
-				   "%4u (port=%d queue=%d)",
-				   desc_idx_last, txq->port_id, txq->queue_id);
-
-			ddesc_cmd |= CI_TX_DESC_CMD_RS;
-
-			/* Update txq RS bit counters */
-			txq->nb_tx_used = 0;
-		}
-
-		ddesc->cmd_type_offset_bsz |= rte_cpu_to_le_64(ddesc_cmd <<
-				IAVF_TXD_DATA_QW1_CMD_SHIFT);
-
-		IAVF_DUMP_TX_DESC(txq, ddesc, desc_idx - 1);
-	}
-
-end_of_tx:
-	rte_wmb();
-
-	PMD_TX_LOG(DEBUG, "port_id=%u queue_id=%u tx_tail=%u nb_tx=%u",
-		   txq->port_id, txq->queue_id, desc_idx, idx);
-
-	IAVF_PCI_REG_WRITE_RELAXED(txq->qtx_tail, desc_idx);
-	txq->tx_tail = desc_idx;
+	const struct ci_ipsec_ops ipsec_ops = {
+		.get_ipsec_desc = iavf_get_ipsec_desc,
+		.calc_segment_len = iavf_calc_ipsec_segment_len,
+	};
 
-	return idx;
+	/* IAVF does not support timestamp queues, so pass NULL for ts_fns */
+	return ci_xmit_pkts(txq, tx_pkts, nb_pkts,
+			    (txq->vlan_flag & IAVF_TX_FLAGS_VLAN_TAG_LOC_L2TAG1) ?
+				CI_VLAN_IN_L2TAG1 : CI_VLAN_IN_L2TAG2,
+			    iavf_get_context_desc, &ipsec_ops, NULL);
 }
 
 /* Check if the packet with vlan user priority is transmitted in the
diff --git a/drivers/net/intel/iavf/iavf_rxtx.h b/drivers/net/intel/iavf/iavf_rxtx.h
index cca5c25119..fe3385dcf6 100644
--- a/drivers/net/intel/iavf/iavf_rxtx.h
+++ b/drivers/net/intel/iavf/iavf_rxtx.h
@@ -43,6 +43,7 @@
 		RTE_ETH_TX_OFFLOAD_TCP_CKSUM |		\
 		RTE_ETH_TX_OFFLOAD_SCTP_CKSUM |		\
 		RTE_ETH_TX_OFFLOAD_TCP_TSO |		\
+		RTE_ETH_TX_OFFLOAD_UDP_TSO |		\
 		RTE_ETH_TX_OFFLOAD_OUTER_IPV4_CKSUM |	\
 		RTE_ETH_TX_OFFLOAD_VXLAN_TNL_TSO |	\
 		RTE_ETH_TX_OFFLOAD_QINQ_INSERT |	\
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [PATCH v5 17/35] net/i40e: document requirement for QinQ support
  2026-02-11 18:12 ` [PATCH v5 00/35] combine multiple Intel scalar Tx paths Bruce Richardson
                     ` (15 preceding siblings ...)
  2026-02-11 18:12   ` [PATCH v5 16/35] net/iavf: use common scalar Tx function Bruce Richardson
@ 2026-02-11 18:12   ` Bruce Richardson
  2026-02-11 18:12   ` [PATCH v5 18/35] net/idpf: use common scalar Tx function Bruce Richardson
                     ` (18 subsequent siblings)
  35 siblings, 0 replies; 274+ messages in thread
From: Bruce Richardson @ 2026-02-11 18:12 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson, Anatoly Burakov

In order to get multiple VLANs inserted in an outgoing packet with QinQ
offload, the i40e driver needs to be set to double VLAN mode. This is
done by using the VLAN_EXTEND Rx config flag. Add a code check for this
dependency and update the docs accordingly.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 doc/guides/nics/i40e.rst           | 18 ++++++++++++++++++
 drivers/net/intel/i40e/i40e_rxtx.c |  9 +++++++++
 2 files changed, 27 insertions(+)

diff --git a/doc/guides/nics/i40e.rst b/doc/guides/nics/i40e.rst
index 40be9aa755..750af3c8b3 100644
--- a/doc/guides/nics/i40e.rst
+++ b/doc/guides/nics/i40e.rst
@@ -245,6 +245,24 @@ Runtime Configuration
   * ``segment``: Check number of mbuf segments not exceed hw limitation.
   * ``offload``: Check any unsupported offload flag.
 
+QinQ Configuration
+~~~~~~~~~~~~~~~~~~
+
+When using QinQ TX offload (``RTE_ETH_TX_OFFLOAD_QINQ_INSERT``), you must also
+enable ``RTE_ETH_RX_OFFLOAD_VLAN_EXTEND`` to configure the hardware for double
+VLAN mode. Without this, only the inner VLAN tag will be inserted.
+
+Example::
+
+  struct rte_eth_conf port_conf = {
+      .rxmode = {
+          .offloads = RTE_ETH_RX_OFFLOAD_VLAN_EXTEND,
+      },
+      .txmode = {
+          .offloads = RTE_ETH_TX_OFFLOAD_QINQ_INSERT,
+      },
+  };
+
 Vector RX Pre-conditions
 ~~~~~~~~~~~~~~~~~~~~~~~~
 For Vector RX it is assumed that the number of descriptor rings will be a power
diff --git a/drivers/net/intel/i40e/i40e_rxtx.c b/drivers/net/intel/i40e/i40e_rxtx.c
index 35c1b53c1e..dfd2213020 100644
--- a/drivers/net/intel/i40e/i40e_rxtx.c
+++ b/drivers/net/intel/i40e/i40e_rxtx.c
@@ -2182,6 +2182,15 @@ i40e_dev_tx_queue_setup(struct rte_eth_dev *dev,
 	vsi = i40e_pf_get_vsi_by_qindex(pf, queue_idx);
 	if (!vsi)
 		return -EINVAL;
+
+	/* Check if QinQ TX offload requires VLAN extend mode */
+	if ((offloads & RTE_ETH_TX_OFFLOAD_QINQ_INSERT) &&
+			!(dev->data->dev_conf.rxmode.offloads & RTE_ETH_RX_OFFLOAD_VLAN_EXTEND)) {
+		PMD_DRV_LOG(WARNING, "Port %u: QinQ TX offload is enabled but VLAN extend mode is not set. ",
+				dev->data->port_id);
+		PMD_DRV_LOG(WARNING, "Double VLAN insertion may not work correctly without RTE_ETH_RX_OFFLOAD_VLAN_EXTEND set in Rx configuration.");
+	}
+
 	q_offset = i40e_get_queue_offset_by_qindex(pf, queue_idx);
 	if (q_offset < 0)
 		return -EINVAL;
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [PATCH v5 18/35] net/idpf: use common scalar Tx function
  2026-02-11 18:12 ` [PATCH v5 00/35] combine multiple Intel scalar Tx paths Bruce Richardson
                     ` (16 preceding siblings ...)
  2026-02-11 18:12   ` [PATCH v5 17/35] net/i40e: document requirement for QinQ support Bruce Richardson
@ 2026-02-11 18:12   ` Bruce Richardson
  2026-02-11 18:12   ` [PATCH v5 19/35] net/intel: avoid writing the final pkt descriptor twice Bruce Richardson
                     ` (17 subsequent siblings)
  35 siblings, 0 replies; 274+ messages in thread
From: Bruce Richardson @ 2026-02-11 18:12 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson, Anatoly Burakov, Jingjing Wu, Praveen Shetty

Update idpf driver to use the common scalar Tx function in single-queue
configuration.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 drivers/net/intel/idpf/idpf_common_rxtx.c | 178 ++--------------------
 1 file changed, 10 insertions(+), 168 deletions(-)

diff --git a/drivers/net/intel/idpf/idpf_common_rxtx.c b/drivers/net/intel/idpf/idpf_common_rxtx.c
index c34dde2796..77f4099f2b 100644
--- a/drivers/net/intel/idpf/idpf_common_rxtx.c
+++ b/drivers/net/intel/idpf/idpf_common_rxtx.c
@@ -8,7 +8,6 @@
 
 #include "idpf_common_rxtx.h"
 #include "idpf_common_device.h"
-#include "../common/rx.h"
 
 int idpf_timestamp_dynfield_offset = -1;
 uint64_t idpf_timestamp_dynflag;
@@ -848,9 +847,10 @@ idpf_calc_context_desc(uint64_t flags)
 /* set TSO context descriptor, returns 0 if no context needed, 1 if context set
  */
 static inline uint16_t
-idpf_set_tso_ctx(uint64_t ol_flags, struct rte_mbuf *mbuf,
-			union ci_tx_offload tx_offload,
-			uint64_t *qw0, uint64_t *qw1)
+idpf_set_tso_ctx(uint64_t ol_flags, const struct rte_mbuf *mbuf,
+		 const union ci_tx_offload *tx_offload,
+		 const struct ci_tx_queue *txq __rte_unused,
+		 uint64_t *qw0, uint64_t *qw1)
 {
 	uint16_t cmd_dtype = IDPF_TX_DESC_DTYPE_FLEX_TSO_CTX | IDPF_TX_FLEX_CTX_DESC_CMD_TSO;
 	uint16_t tso_segsz = mbuf->tso_segsz;
@@ -861,12 +861,12 @@ idpf_set_tso_ctx(uint64_t ol_flags, struct rte_mbuf *mbuf,
 		return 0;
 
 	/* TSO context descriptor setup */
-	if (tx_offload.l4_len == 0) {
+	if (tx_offload->l4_len == 0) {
 		TX_LOG(DEBUG, "L4 length set to 0");
 		return 0;
 	}
 
-	hdr_len = tx_offload.l2_len + tx_offload.l3_len + tx_offload.l4_len;
+	hdr_len = tx_offload->l2_len + tx_offload->l3_len + tx_offload->l4_len;
 	tso_len = mbuf->pkt_len - hdr_len;
 
 	*qw0 = rte_cpu_to_le_32(tso_len & IDPF_TXD_FLEX_CTX_MSS_RT_M) |
@@ -933,7 +933,8 @@ idpf_dp_splitq_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 		tx_offload.tso_segsz = tx_pkt->tso_segsz;
 		/* Calculate the number of context descriptors needed. */
 		uint64_t cd_qw0 = 0, cd_qw1 = 0;
-		nb_ctx = idpf_set_tso_ctx(ol_flags, tx_pkt, tx_offload, &cd_qw0, &cd_qw1);
+		nb_ctx = idpf_set_tso_ctx(ol_flags, tx_pkt, &tx_offload, txq,
+					  &cd_qw0, &cd_qw1);
 
 		/* Calculate the number of TX descriptors needed for
 		 * each packet. For TSO packets, use ci_calc_pkt_desc as
@@ -1339,167 +1340,8 @@ uint16_t
 idpf_dp_singleq_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 			  uint16_t nb_pkts)
 {
-	volatile struct ci_tx_desc *txd;
-	volatile struct ci_tx_desc *txr;
-	union ci_tx_offload tx_offload = {0};
-	struct ci_tx_entry *txe, *txn;
-	struct ci_tx_entry *sw_ring;
-	struct ci_tx_queue *txq;
-	struct rte_mbuf *tx_pkt;
-	struct rte_mbuf *m_seg;
-	uint64_t buf_dma_addr;
-	uint32_t td_offset;
-	uint64_t ol_flags;
-	uint16_t tx_last;
-	uint16_t nb_used;
-	uint16_t nb_ctx;
-	uint16_t td_cmd;
-	uint16_t tx_id;
-	uint16_t nb_tx;
-	uint16_t slen;
-
-	nb_tx = 0;
-	txq = tx_queue;
-
-	if (unlikely(txq == NULL))
-		return nb_tx;
-
-	sw_ring = txq->sw_ring;
-	txr = txq->ci_tx_ring;
-	tx_id = txq->tx_tail;
-	txe = &sw_ring[tx_id];
-
-	/* Check if the descriptor ring needs to be cleaned. */
-	if (txq->nb_tx_free < txq->tx_free_thresh)
-		(void)ci_tx_xmit_cleanup(txq);
-
-	for (nb_tx = 0; nb_tx < nb_pkts; nb_tx++) {
-		td_cmd = 0;
-		td_offset = 0;
-
-		tx_pkt = *tx_pkts++;
-		RTE_MBUF_PREFETCH_TO_FREE(txe->mbuf);
-
-		ol_flags = tx_pkt->ol_flags;
-		tx_offload.l2_len = tx_pkt->l2_len;
-		tx_offload.l3_len = tx_pkt->l3_len;
-		tx_offload.l4_len = tx_pkt->l4_len;
-		tx_offload.tso_segsz = tx_pkt->tso_segsz;
-		/* Calculate the number of context descriptors needed. */
-		uint64_t cd_qw0, cd_qw1;
-		nb_ctx = idpf_set_tso_ctx(ol_flags, tx_pkt, tx_offload, &cd_qw0, &cd_qw1);
-
-		/* The number of descriptors that must be allocated for
-		 * a packet. For TSO packets, use ci_calc_pkt_desc as
-		 * the mbuf data size might exceed max data size that hw allows
-		 * per tx desc.
-		 */
-		if (ol_flags & RTE_MBUF_F_TX_TCP_SEG)
-			nb_used = (uint16_t)(ci_calc_pkt_desc(tx_pkt) + nb_ctx);
-		else
-			nb_used = (uint16_t)(tx_pkt->nb_segs + nb_ctx);
-		tx_last = (uint16_t)(tx_id + nb_used - 1);
-
-		/* Circular ring */
-		if (tx_last >= txq->nb_tx_desc)
-			tx_last = (uint16_t)(tx_last - txq->nb_tx_desc);
-
-		TX_LOG(DEBUG, "port_id=%u queue_id=%u"
-		       " tx_first=%u tx_last=%u",
-		       txq->port_id, txq->queue_id, tx_id, tx_last);
-
-		if (nb_used > txq->nb_tx_free) {
-			if (ci_tx_xmit_cleanup(txq) != 0) {
-				if (nb_tx == 0)
-					return 0;
-				goto end_of_tx;
-			}
-			if (unlikely(nb_used > txq->tx_rs_thresh)) {
-				while (nb_used > txq->nb_tx_free) {
-					if (ci_tx_xmit_cleanup(txq) != 0) {
-						if (nb_tx == 0)
-							return 0;
-						goto end_of_tx;
-					}
-				}
-			}
-		}
-
-		if (ol_flags & CI_TX_CKSUM_OFFLOAD_MASK)
-			td_cmd |= IDPF_TX_FLEX_DESC_CMD_CS_EN;
-
-		if (nb_ctx != 0) {
-			/* Setup TX context descriptor if required */
-			uint64_t *ctx_txd = RTE_CAST_PTR(uint64_t *, &txr[tx_id]);
-
-			txn = &sw_ring[txe->next_id];
-			RTE_MBUF_PREFETCH_TO_FREE(txn->mbuf);
-			if (txe->mbuf != NULL) {
-				rte_pktmbuf_free_seg(txe->mbuf);
-				txe->mbuf = NULL;
-			}
-
-			ctx_txd[0] = cd_qw0;
-			ctx_txd[1] = cd_qw1;
-
-			txe->last_id = tx_last;
-			tx_id = txe->next_id;
-			txe = txn;
-		}
-
-		m_seg = tx_pkt;
-		do {
-			txd = &txr[tx_id];
-			txn = &sw_ring[txe->next_id];
-
-			if (txe->mbuf != NULL)
-				rte_pktmbuf_free_seg(txe->mbuf);
-			txe->mbuf = m_seg;
-
-			/* Setup TX Descriptor */
-			slen = m_seg->data_len;
-			buf_dma_addr = rte_mbuf_data_iova(m_seg);
-			txd->buffer_addr = rte_cpu_to_le_64(buf_dma_addr);
-			txd->cmd_type_offset_bsz = rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DATA |
-				((uint64_t)td_cmd  << CI_TXD_QW1_CMD_S) |
-				((uint64_t)td_offset << CI_TXD_QW1_OFFSET_S) |
-				((uint64_t)slen << CI_TXD_QW1_TX_BUF_SZ_S));
-
-			txe->last_id = tx_last;
-			tx_id = txe->next_id;
-			txe = txn;
-			m_seg = m_seg->next;
-		} while (m_seg);
-
-		/* The last packet data descriptor needs End Of Packet (EOP) */
-		td_cmd |= CI_TX_DESC_CMD_EOP;
-		txq->nb_tx_used = (uint16_t)(txq->nb_tx_used + nb_used);
-		txq->nb_tx_free = (uint16_t)(txq->nb_tx_free - nb_used);
-
-		if (txq->nb_tx_used >= txq->tx_rs_thresh) {
-			TX_LOG(DEBUG, "Setting RS bit on TXD id="
-			       "%4u (port=%d queue=%d)",
-			       tx_last, txq->port_id, txq->queue_id);
-
-			td_cmd |= CI_TX_DESC_CMD_RS;
-
-			/* Update txq RS bit counters */
-			txq->nb_tx_used = 0;
-		}
-
-		txd->cmd_type_offset_bsz |= rte_cpu_to_le_16(td_cmd << CI_TXD_QW1_CMD_S);
-	}
-
-end_of_tx:
-	rte_wmb();
-
-	TX_LOG(DEBUG, "port_id=%u queue_id=%u tx_tail=%u nb_tx=%u",
-	       txq->port_id, txq->queue_id, tx_id, nb_tx);
-
-	IDPF_PCI_REG_WRITE(txq->qtx_tail, tx_id);
-	txq->tx_tail = tx_id;
-
-	return nb_tx;
+	return ci_xmit_pkts(tx_queue, tx_pkts, nb_pkts, CI_VLAN_IN_L2TAG1,
+			idpf_set_tso_ctx, NULL, NULL);
 }
 
 /* TX prep functions */
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [PATCH v5 19/35] net/intel: avoid writing the final pkt descriptor twice
  2026-02-11 18:12 ` [PATCH v5 00/35] combine multiple Intel scalar Tx paths Bruce Richardson
                     ` (17 preceding siblings ...)
  2026-02-11 18:12   ` [PATCH v5 18/35] net/idpf: use common scalar Tx function Bruce Richardson
@ 2026-02-11 18:12   ` Bruce Richardson
  2026-02-11 18:12   ` [PATCH v5 20/35] net/intel: write descriptors using non-volatile pointers Bruce Richardson
                     ` (16 subsequent siblings)
  35 siblings, 0 replies; 274+ messages in thread
From: Bruce Richardson @ 2026-02-11 18:12 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson, Anatoly Burakov

In the scalar datapath, there is a loop to handle multi-segment,
multi-descriptor packets on Tx. After that loop, the end-of-packet
(EOP) bit was written to the descriptor separately, meaning that each
single-descriptor packet got two writes to the second quad-word -
three 64-bit writes in total rather than just two. Adjusting the code
to compute the EOP bit inside the loop saves that extra write per
packet and so improves performance.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 drivers/net/intel/common/tx_scalar.h | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/net/intel/common/tx_scalar.h b/drivers/net/intel/common/tx_scalar.h
index 15dc3dfa59..f6ed11a5a8 100644
--- a/drivers/net/intel/common/tx_scalar.h
+++ b/drivers/net/intel/common/tx_scalar.h
@@ -364,6 +364,10 @@ ci_xmit_pkts(struct ci_tx_queue *txq,
 				txn = &sw_ring[txe->next_id];
 			}
 
+			/* fill the last descriptor with End of Packet (EOP) bit */
+			if (m_seg->next == NULL)
+				td_cmd |= CI_TX_DESC_CMD_EOP;
+
 			txd->buffer_addr = rte_cpu_to_le_64(buf_dma_addr);
 			txd->cmd_type_offset_bsz = rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DATA |
 				((uint64_t)td_cmd << CI_TXD_QW1_CMD_S) |
@@ -376,21 +380,17 @@ ci_xmit_pkts(struct ci_tx_queue *txq,
 			txe = txn;
 			m_seg = m_seg->next;
 		} while (m_seg);
-
-		/* fill the last descriptor with End of Packet (EOP) bit */
-		td_cmd |= CI_TX_DESC_CMD_EOP;
 		txq->nb_tx_used = (uint16_t)(txq->nb_tx_used + nb_used);
 		txq->nb_tx_free = (uint16_t)(txq->nb_tx_free - nb_used);
 
 		/* set RS bit on the last descriptor of one packet */
 		if (txq->nb_tx_used >= txq->tx_rs_thresh) {
-			td_cmd |= CI_TX_DESC_CMD_RS;
+			txd->cmd_type_offset_bsz |=
+					rte_cpu_to_le_64(CI_TX_DESC_CMD_RS << CI_TXD_QW1_CMD_S);
 
 			/* Update txq RS bit counters */
 			txq->nb_tx_used = 0;
 		}
-		txd->cmd_type_offset_bsz |=
-				rte_cpu_to_le_64(((uint64_t)td_cmd) << CI_TXD_QW1_CMD_S);
 
 		if (ts_fns != NULL)
 			ts_id = ts_fns->write_ts_desc(txq, tx_pkt, tx_id, ts_id);
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [PATCH v5 20/35] net/intel: write descriptors using non-volatile pointers
  2026-02-11 18:12 ` [PATCH v5 00/35] combine multiple Intel scalar Tx paths Bruce Richardson
                     ` (18 preceding siblings ...)
  2026-02-11 18:12   ` [PATCH v5 19/35] net/intel: avoid writing the final pkt descriptor twice Bruce Richardson
@ 2026-02-11 18:12   ` Bruce Richardson
  2026-02-11 21:14     ` Morten Brørup
  2026-02-11 18:12   ` [PATCH v5 21/35] net/intel: remove unnecessary flag clearing Bruce Richardson
                     ` (15 subsequent siblings)
  35 siblings, 1 reply; 274+ messages in thread
From: Bruce Richardson @ 2026-02-11 18:12 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson

Use a non-volatile uint64_t pointer when storing to the descriptor
ring. This allows the compiler to merge the stores as it sees fit.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/net/intel/common/tx_scalar.h | 28 ++++++++++++++++++++--------
 1 file changed, 20 insertions(+), 8 deletions(-)

diff --git a/drivers/net/intel/common/tx_scalar.h b/drivers/net/intel/common/tx_scalar.h
index f6ed11a5a8..00771402f8 100644
--- a/drivers/net/intel/common/tx_scalar.h
+++ b/drivers/net/intel/common/tx_scalar.h
@@ -166,6 +166,19 @@ struct ci_timestamp_queue_fns {
 	write_ts_tail_t write_ts_tail;
 };
 
+static inline void
+write_txd(volatile void *txd, uint64_t qw0, uint64_t qw1)
+{
+	/* we use an aligned structure and cast away the volatile to allow the compiler
+	 * to opportunistically optimize the two 64-bit writes as a single 128-bit write.
+	 */
+	__rte_aligned(16) struct txdesc {
+		uint64_t qw0, qw1;
+	} *txdesc = RTE_CAST_PTR(struct txdesc *, txd);
+	txdesc->qw0 = rte_cpu_to_le_64(qw0);
+	txdesc->qw1 = rte_cpu_to_le_64(qw1);
+}
+
 static inline uint16_t
 ci_xmit_pkts(struct ci_tx_queue *txq,
 	     struct rte_mbuf **tx_pkts,
@@ -299,8 +312,7 @@ ci_xmit_pkts(struct ci_tx_queue *txq,
 				txe->mbuf = NULL;
 			}
 
-			ctx_txd[0] = cd_qw0;
-			ctx_txd[1] = cd_qw1;
+			write_txd(ctx_txd, cd_qw0, cd_qw1);
 
 			txe->last_id = tx_last;
 			tx_id = txe->next_id;
@@ -347,12 +359,12 @@ ci_xmit_pkts(struct ci_tx_queue *txq,
 
 			while ((ol_flags & (RTE_MBUF_F_TX_TCP_SEG | RTE_MBUF_F_TX_UDP_SEG)) &&
 					unlikely(slen > CI_MAX_DATA_PER_TXD)) {
-				txd->buffer_addr = rte_cpu_to_le_64(buf_dma_addr);
-				txd->cmd_type_offset_bsz = rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DATA |
+				const uint64_t cmd_type_offset_bsz = CI_TX_DESC_DTYPE_DATA |
 					((uint64_t)td_cmd << CI_TXD_QW1_CMD_S) |
 					((uint64_t)td_offset << CI_TXD_QW1_OFFSET_S) |
 					((uint64_t)CI_MAX_DATA_PER_TXD << CI_TXD_QW1_TX_BUF_SZ_S) |
-					((uint64_t)td_tag << CI_TXD_QW1_L2TAG1_S));
+					((uint64_t)td_tag << CI_TXD_QW1_L2TAG1_S);
+				write_txd(txd, buf_dma_addr, cmd_type_offset_bsz);
 
 				buf_dma_addr += CI_MAX_DATA_PER_TXD;
 				slen -= CI_MAX_DATA_PER_TXD;
@@ -368,12 +380,12 @@ ci_xmit_pkts(struct ci_tx_queue *txq,
 			if (m_seg->next == NULL)
 				td_cmd |= CI_TX_DESC_CMD_EOP;
 
-			txd->buffer_addr = rte_cpu_to_le_64(buf_dma_addr);
-			txd->cmd_type_offset_bsz = rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DATA |
+			const uint64_t cmd_type_offset_bsz = CI_TX_DESC_DTYPE_DATA |
 				((uint64_t)td_cmd << CI_TXD_QW1_CMD_S) |
 				((uint64_t)td_offset << CI_TXD_QW1_OFFSET_S) |
 				((uint64_t)slen << CI_TXD_QW1_TX_BUF_SZ_S) |
-				((uint64_t)td_tag << CI_TXD_QW1_L2TAG1_S));
+				((uint64_t)td_tag << CI_TXD_QW1_L2TAG1_S);
+			write_txd(txd, buf_dma_addr, cmd_type_offset_bsz);
 
 			txe->last_id = tx_last;
 			tx_id = txe->next_id;
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [PATCH v5 21/35] net/intel: remove unnecessary flag clearing
  2026-02-11 18:12 ` [PATCH v5 00/35] combine multiple Intel scalar Tx paths Bruce Richardson
                     ` (19 preceding siblings ...)
  2026-02-11 18:12   ` [PATCH v5 20/35] net/intel: write descriptors using non-volatile pointers Bruce Richardson
@ 2026-02-11 18:12   ` Bruce Richardson
  2026-02-11 18:12   ` [PATCH v5 22/35] net/intel: add special handling for single desc packets Bruce Richardson
                     ` (14 subsequent siblings)
  35 siblings, 0 replies; 274+ messages in thread
From: Bruce Richardson @ 2026-02-11 18:12 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson, Anatoly Burakov

When cleaning the Tx ring, there is no need to zero out the done flag
from the completed entry. That flag will be automatically cleared when
the descriptor is next written. This gives a small performance benefit.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 drivers/net/intel/common/tx_scalar.h | 7 -------
 1 file changed, 7 deletions(-)

diff --git a/drivers/net/intel/common/tx_scalar.h b/drivers/net/intel/common/tx_scalar.h
index 00771402f8..56c2dd526f 100644
--- a/drivers/net/intel/common/tx_scalar.h
+++ b/drivers/net/intel/common/tx_scalar.h
@@ -46,13 +46,6 @@ ci_tx_xmit_cleanup(struct ci_tx_queue *txq)
 	else
 		nb_tx_to_clean = (uint16_t)(desc_to_clean_to - last_desc_cleaned);
 
-	/* The last descriptor to clean is done, so that means all the
-	 * descriptors from the last descriptor that was cleaned
-	 * up to the last descriptor with the RS bit set
-	 * are done. Only reset the threshold descriptor.
-	 */
-	txd[desc_to_clean_to].cmd_type_offset_bsz = 0;
-
 	/* Update the txq to reflect the last descriptor that was cleaned */
 	txq->last_desc_cleaned = desc_to_clean_to;
 	txq->nb_tx_free = (uint16_t)(txq->nb_tx_free + nb_tx_to_clean);
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [PATCH v5 22/35] net/intel: add special handling for single desc packets
  2026-02-11 18:12 ` [PATCH v5 00/35] combine multiple Intel scalar Tx paths Bruce Richardson
                     ` (20 preceding siblings ...)
  2026-02-11 18:12   ` [PATCH v5 21/35] net/intel: remove unnecessary flag clearing Bruce Richardson
@ 2026-02-11 18:12   ` Bruce Richardson
  2026-02-11 18:12   ` [PATCH v5 23/35] net/intel: use separate array for desc status tracking Bruce Richardson
                     ` (13 subsequent siblings)
  35 siblings, 0 replies; 274+ messages in thread
From: Bruce Richardson @ 2026-02-11 18:12 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson, Anatoly Burakov

Within the scalar Tx path, add a shortcut for packets that do not use
TSO and need only a single data descriptor.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 drivers/net/intel/common/tx_scalar.h | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

diff --git a/drivers/net/intel/common/tx_scalar.h b/drivers/net/intel/common/tx_scalar.h
index 56c2dd526f..0bc2956dcf 100644
--- a/drivers/net/intel/common/tx_scalar.h
+++ b/drivers/net/intel/common/tx_scalar.h
@@ -294,6 +294,31 @@ ci_xmit_pkts(struct ci_tx_queue *txq,
 			ci_txd_enable_checksum(ol_flags, &td_cmd,
 						&td_offset, tx_offload);
 
+		/* special case for single descriptor packet, without TSO offload */
+		if (nb_used == 1 &&
+				(ol_flags & (RTE_MBUF_F_TX_TCP_SEG | RTE_MBUF_F_TX_UDP_SEG)) == 0) {
+			txd = &ci_tx_ring[tx_id];
+			tx_id = txe->next_id;
+
+			if (txe->mbuf)
+				rte_pktmbuf_free_seg(txe->mbuf);
+			*txe = (struct ci_tx_entry){
+				.mbuf = tx_pkt, .last_id = tx_last, .next_id = tx_id
+			};
+
+			/* Setup TX Descriptor */
+			td_cmd |= CI_TX_DESC_CMD_EOP;
+			const uint64_t cmd_type_offset_bsz = CI_TX_DESC_DTYPE_DATA |
+				((uint64_t)td_cmd << CI_TXD_QW1_CMD_S) |
+				((uint64_t)td_offset << CI_TXD_QW1_OFFSET_S) |
+				((uint64_t)tx_pkt->data_len << CI_TXD_QW1_TX_BUF_SZ_S) |
+				((uint64_t)td_tag << CI_TXD_QW1_L2TAG1_S);
+			write_txd(txd, rte_mbuf_data_iova(tx_pkt), cmd_type_offset_bsz);
+
+			txe = &sw_ring[tx_id];
+			goto end_pkt;
+		}
+
 		if (nb_ctx) {
 			/* Setup TX context descriptor if required */
 			uint64_t *ctx_txd = RTE_CAST_PTR(uint64_t *, &ci_tx_ring[tx_id]);
@@ -385,6 +410,7 @@ ci_xmit_pkts(struct ci_tx_queue *txq,
 			txe = txn;
 			m_seg = m_seg->next;
 		} while (m_seg);
+end_pkt:
 		txq->nb_tx_used = (uint16_t)(txq->nb_tx_used + nb_used);
 		txq->nb_tx_free = (uint16_t)(txq->nb_tx_free - nb_used);
 
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [PATCH v5 23/35] net/intel: use separate array for desc status tracking
  2026-02-11 18:12 ` [PATCH v5 00/35] combine multiple Intel scalar Tx paths Bruce Richardson
                     ` (21 preceding siblings ...)
  2026-02-11 18:12   ` [PATCH v5 22/35] net/intel: add special handling for single desc packets Bruce Richardson
@ 2026-02-11 18:12   ` Bruce Richardson
  2026-02-11 21:51     ` Morten Brørup
  2026-02-11 18:12   ` [PATCH v5 24/35] net/ixgbe: " Bruce Richardson
                     ` (12 subsequent siblings)
  35 siblings, 1 reply; 274+ messages in thread
From: Bruce Richardson @ 2026-02-11 18:12 UTC (permalink / raw)
  To: dev
  Cc: Bruce Richardson, Anatoly Burakov, Praveen Shetty,
	Vladimir Medvedkin, Jingjing Wu

Rather than writing a last_id for each individual descriptor, we can
write one only where the "report status" (RS) bit is set, i.e. on the
descriptors which will be written back when done. The method used for
marking descriptors as free is also changed in the process: even if the
last descriptor with the "done" bits set is past the expected point, we
only track up to the expected point, and leave the rest to be counted
as freed next time. This means that we always have the RS/DD bits set
at fixed intervals, and we always track free slots in units of the same
tx_rs_thresh intervals.
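
The bucket arithmetic used in the patch can be sketched in isolation as follows. The names `log2_rs_thresh` and `rs_last_id` come from the patch; the helpers themselves are hypothetical, and the worked case (tx_rs_thresh=32, a 5-descriptor packet in slots 30-34) is the one from the in-code comment.

```c
#include <stdint.h>

/* Index of the RS-threshold bucket a descriptor slot falls into. */
static inline uint16_t
rs_bucket(uint16_t desc_id, uint8_t log2_rs_thresh)
{
	return desc_id >> log2_rs_thresh;
}

/* If a packet occupying descriptors [tx_first, tx_last] crosses into a
 * new bucket, record tx_last for the bucket being left and return 1,
 * meaning the RS bit should be set on tx_last. */
static inline int
pkt_crosses_bucket(uint16_t tx_first, uint16_t tx_last,
		uint8_t log2_rs_thresh, uint16_t *rs_last_id)
{
	uint16_t pkt_rs_idx = rs_bucket(tx_first, log2_rs_thresh);
	uint16_t next_rs_idx = rs_bucket((uint16_t)(tx_last + 1), log2_rs_thresh);

	if (next_rs_idx == pkt_rs_idx)
		return 0;
	rs_last_id[pkt_rs_idx] = tx_last; /* last desc of the bucket we leave */
	return 1;
}
```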

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 drivers/net/intel/common/tx.h             |  4 ++
 drivers/net/intel/common/tx_scalar.h      | 66 +++++++++++------------
 drivers/net/intel/cpfl/cpfl_rxtx.c        | 16 ++++++
 drivers/net/intel/i40e/i40e_rxtx.c        | 20 +++++++
 drivers/net/intel/iavf/iavf_rxtx.c        | 19 +++++++
 drivers/net/intel/ice/ice_rxtx.c          | 20 +++++++
 drivers/net/intel/idpf/idpf_common_rxtx.c |  7 +++
 drivers/net/intel/idpf/idpf_rxtx.c        | 12 +++++
 8 files changed, 130 insertions(+), 34 deletions(-)

diff --git a/drivers/net/intel/common/tx.h b/drivers/net/intel/common/tx.h
index 5da6c7c15d..acd362dca3 100644
--- a/drivers/net/intel/common/tx.h
+++ b/drivers/net/intel/common/tx.h
@@ -134,6 +134,8 @@ struct ci_tx_queue {
 		struct ci_tx_entry *sw_ring; /* virtual address of SW ring */
 		struct ci_tx_entry_vec *sw_ring_vec;
 	};
+	/* Scalar TX path: Array tracking last_id at each RS threshold boundary */
+	uint16_t *rs_last_id;
 	uint16_t nb_tx_desc;           /* number of TX descriptors */
 	uint16_t tx_tail; /* current value of tail register */
 	uint16_t nb_tx_used; /* number of TX desc used since RS bit set */
@@ -147,6 +149,8 @@ struct ci_tx_queue {
 	uint16_t tx_free_thresh;
 	/* Number of TX descriptors to use before RS bit is set. */
 	uint16_t tx_rs_thresh;
+	/* Scalar TX path: log2 of tx_rs_thresh for efficient bit operations */
+	uint8_t log2_rs_thresh;
 	uint16_t port_id;  /* Device port identifier. */
 	uint16_t queue_id; /* TX queue index. */
 	uint16_t reg_idx;
diff --git a/drivers/net/intel/common/tx_scalar.h b/drivers/net/intel/common/tx_scalar.h
index 0bc2956dcf..7499e5ed20 100644
--- a/drivers/net/intel/common/tx_scalar.h
+++ b/drivers/net/intel/common/tx_scalar.h
@@ -22,33 +22,25 @@
 static __rte_always_inline int
 ci_tx_xmit_cleanup(struct ci_tx_queue *txq)
 {
-	struct ci_tx_entry *sw_ring = txq->sw_ring;
 	volatile struct ci_tx_desc *txd = txq->ci_tx_ring;
-	uint16_t last_desc_cleaned = txq->last_desc_cleaned;
-	uint16_t nb_tx_desc = txq->nb_tx_desc;
-	uint16_t desc_to_clean_to;
-	uint16_t nb_tx_to_clean;
-
-	/* Determine the last descriptor needing to be cleaned */
-	desc_to_clean_to = (uint16_t)(last_desc_cleaned + txq->tx_rs_thresh);
-	if (desc_to_clean_to >= nb_tx_desc)
-		desc_to_clean_to = (uint16_t)(desc_to_clean_to - nb_tx_desc);
-
-	/* Check if descriptor is done */
-	desc_to_clean_to = sw_ring[desc_to_clean_to].last_id;
-	if ((txd[desc_to_clean_to].cmd_type_offset_bsz & rte_cpu_to_le_64(CI_TXD_QW1_DTYPE_M)) !=
-			rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE))
+	const uint16_t last_desc_cleaned = txq->last_desc_cleaned;
+	const uint16_t nb_tx_desc = txq->nb_tx_desc;
+
+	/* Calculate where the next descriptor write-back will occur */
+	const uint16_t rs_idx = (last_desc_cleaned == nb_tx_desc - 1) ?
+			0 :
+			(last_desc_cleaned + 1) >> txq->log2_rs_thresh;
+	uint16_t desc_to_clean_to = (rs_idx << txq->log2_rs_thresh) + (txq->tx_rs_thresh - 1);
+
+	/* Check if descriptor is done  */
+	if ((txd[txq->rs_last_id[rs_idx]].cmd_type_offset_bsz &
+			rte_cpu_to_le_64(CI_TXD_QW1_DTYPE_M)) !=
+				rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE))
 		return -1;
 
-	/* Figure out how many descriptors will be cleaned */
-	if (last_desc_cleaned > desc_to_clean_to)
-		nb_tx_to_clean = (uint16_t)((nb_tx_desc - last_desc_cleaned) + desc_to_clean_to);
-	else
-		nb_tx_to_clean = (uint16_t)(desc_to_clean_to - last_desc_cleaned);
-
 	/* Update the txq to reflect the last descriptor that was cleaned */
 	txq->last_desc_cleaned = desc_to_clean_to;
-	txq->nb_tx_free = (uint16_t)(txq->nb_tx_free + nb_tx_to_clean);
+	txq->nb_tx_free += txq->tx_rs_thresh;
 
 	return 0;
 }
@@ -219,6 +211,7 @@ ci_xmit_pkts(struct ci_tx_queue *txq,
 		uint16_t nb_ipsec = 0;
 		uint64_t ipsec_qw0 = 0, ipsec_qw1 = 0;
 		uint64_t cd_qw0 = 0, cd_qw1 = 0;
+		uint16_t pkt_rs_idx;
 		tx_pkt = *tx_pkts++;
 
 		ol_flags = tx_pkt->ol_flags;
@@ -262,6 +255,9 @@ ci_xmit_pkts(struct ci_tx_queue *txq,
 		if (tx_last >= txq->nb_tx_desc)
 			tx_last = (uint16_t)(tx_last - txq->nb_tx_desc);
 
+		/* Track the RS threshold bucket at packet start */
+		pkt_rs_idx = (uint16_t)(tx_id >> txq->log2_rs_thresh);
+
 		if (nb_used > txq->nb_tx_free) {
 			if (ci_tx_xmit_cleanup(txq) != 0) {
 				if (nb_tx == 0)
@@ -302,10 +298,7 @@ ci_xmit_pkts(struct ci_tx_queue *txq,
 
 			if (txe->mbuf)
 				rte_pktmbuf_free_seg(txe->mbuf);
-			*txe = (struct ci_tx_entry){
-				.mbuf = tx_pkt, .last_id = tx_last, .next_id = tx_id
-			};
-
+			txe->mbuf = tx_pkt;
 			/* Setup TX Descriptor */
 			td_cmd |= CI_TX_DESC_CMD_EOP;
 			const uint64_t cmd_type_offset_bsz = CI_TX_DESC_DTYPE_DATA |
@@ -332,7 +325,6 @@ ci_xmit_pkts(struct ci_tx_queue *txq,
 
 			write_txd(ctx_txd, cd_qw0, cd_qw1);
 
-			txe->last_id = tx_last;
 			tx_id = txe->next_id;
 			txe = txn;
 		}
@@ -351,7 +343,6 @@ ci_xmit_pkts(struct ci_tx_queue *txq,
 			ipsec_txd[0] = ipsec_qw0;
 			ipsec_txd[1] = ipsec_qw1;
 
-			txe->last_id = tx_last;
 			tx_id = txe->next_id;
 			txe = txn;
 		}
@@ -387,7 +378,6 @@ ci_xmit_pkts(struct ci_tx_queue *txq,
 				buf_dma_addr += CI_MAX_DATA_PER_TXD;
 				slen -= CI_MAX_DATA_PER_TXD;
 
-				txe->last_id = tx_last;
 				tx_id = txe->next_id;
 				txe = txn;
 				txd = &ci_tx_ring[tx_id];
@@ -405,7 +395,6 @@ ci_xmit_pkts(struct ci_tx_queue *txq,
 				((uint64_t)td_tag << CI_TXD_QW1_L2TAG1_S);
 			write_txd(txd, buf_dma_addr, cmd_type_offset_bsz);
 
-			txe->last_id = tx_last;
 			tx_id = txe->next_id;
 			txe = txn;
 			m_seg = m_seg->next;
@@ -414,13 +403,22 @@ ci_xmit_pkts(struct ci_tx_queue *txq,
 		txq->nb_tx_used = (uint16_t)(txq->nb_tx_used + nb_used);
 		txq->nb_tx_free = (uint16_t)(txq->nb_tx_free - nb_used);
 
-		/* set RS bit on the last descriptor of one packet */
-		if (txq->nb_tx_used >= txq->tx_rs_thresh) {
+		/* Check if packet crosses into a new RS threshold bucket.
+		 * The RS bit is set on the last descriptor when we move from one bucket to another.
+		 * For example, with tx_rs_thresh=32 and a 5-descriptor packet using slots 30-34:
+		 *   - pkt_rs_idx = 30 >> 5 = 0 (started in bucket 0)
+		 *   - tx_last = 34, so 35 >> 5 = 1 (next packet is in bucket 1)
+		 *   - Since 0 != 1, set RS bit on descriptor 34, and record rs_last_id[0] = 34
+		 */
+		uint16_t next_rs_idx = ((tx_last + 1) >> txq->log2_rs_thresh);
+
+		if (next_rs_idx != pkt_rs_idx) {
+			/* Packet crossed into a new bucket - set RS bit on last descriptor */
 			txd->cmd_type_offset_bsz |=
 					rte_cpu_to_le_64(CI_TX_DESC_CMD_RS << CI_TXD_QW1_CMD_S);
 
-			/* Update txq RS bit counters */
-			txq->nb_tx_used = 0;
+			/* Record the last descriptor ID for the bucket we're leaving */
+			txq->rs_last_id[pkt_rs_idx] = tx_last;
 		}
 
 		if (ts_fns != NULL)
diff --git a/drivers/net/intel/cpfl/cpfl_rxtx.c b/drivers/net/intel/cpfl/cpfl_rxtx.c
index a4d15b7f9c..e7a98ed4f6 100644
--- a/drivers/net/intel/cpfl/cpfl_rxtx.c
+++ b/drivers/net/intel/cpfl/cpfl_rxtx.c
@@ -5,6 +5,7 @@
 #include <ethdev_driver.h>
 #include <rte_net.h>
 #include <rte_vect.h>
+#include <rte_bitops.h>
 
 #include "cpfl_ethdev.h"
 #include "cpfl_rxtx.h"
@@ -330,6 +331,7 @@ cpfl_tx_queue_release(void *txq)
 
 	ci_txq_release_all_mbufs(q, q->vector_tx);
 	rte_free(q->sw_ring);
+	rte_free(q->rs_last_id);
 	rte_memzone_free(q->mz);
 	rte_free(cpfl_txq);
 }
@@ -572,6 +574,7 @@ cpfl_tx_queue_setup(struct rte_eth_dev *dev, uint16_t queue_idx,
 
 	txq->nb_tx_desc = nb_desc;
 	txq->tx_rs_thresh = tx_rs_thresh;
+	txq->log2_rs_thresh = rte_log2_u32(tx_rs_thresh);
 	txq->tx_free_thresh = tx_free_thresh;
 	txq->queue_id = vport->chunks_info.tx_start_qid + queue_idx;
 	txq->port_id = dev->data->port_id;
@@ -605,6 +608,17 @@ cpfl_tx_queue_setup(struct rte_eth_dev *dev, uint16_t queue_idx,
 		goto err_sw_ring_alloc;
 	}
 
+	/* Allocate RS last_id tracking array */
+	uint16_t num_rs_buckets = nb_desc / tx_rs_thresh;
+	txq->rs_last_id = rte_zmalloc_socket("cpfl tx rs_last_id",
+			sizeof(txq->rs_last_id[0]) * num_rs_buckets,
+			RTE_CACHE_LINE_SIZE, socket_id);
+	if (txq->rs_last_id == NULL) {
+		PMD_INIT_LOG(ERR, "Failed to allocate memory for RS last_id array");
+		ret = -ENOMEM;
+		goto err_rs_last_id_alloc;
+	}
+
 	if (!is_splitq) {
 		txq->ci_tx_ring = mz->addr;
 		idpf_qc_single_tx_queue_reset(txq);
@@ -628,6 +642,8 @@ cpfl_tx_queue_setup(struct rte_eth_dev *dev, uint16_t queue_idx,
 	return 0;
 
 err_complq_setup:
+	rte_free(txq->rs_last_id);
+err_rs_last_id_alloc:
 	rte_free(txq->sw_ring);
 err_sw_ring_alloc:
 	cpfl_dma_zone_release(mz);
diff --git a/drivers/net/intel/i40e/i40e_rxtx.c b/drivers/net/intel/i40e/i40e_rxtx.c
index dfd2213020..b554bc6c31 100644
--- a/drivers/net/intel/i40e/i40e_rxtx.c
+++ b/drivers/net/intel/i40e/i40e_rxtx.c
@@ -24,6 +24,7 @@
 #include <rte_ip.h>
 #include <rte_net.h>
 #include <rte_vect.h>
+#include <rte_bitops.h>
 
 #include "i40e_logs.h"
 #include "base/i40e_prototype.h"
@@ -2280,6 +2281,13 @@ i40e_dev_tx_queue_setup(struct rte_eth_dev *dev,
 			     (int)queue_idx);
 		return I40E_ERR_PARAM;
 	}
+	if (!rte_is_power_of_2(tx_rs_thresh)) {
+		PMD_INIT_LOG(ERR, "tx_rs_thresh must be a power of 2. (tx_rs_thresh=%u port=%d queue=%d)",
+			     (unsigned int)tx_rs_thresh,
+			     (int)dev->data->port_id,
+			     (int)queue_idx);
+		return I40E_ERR_PARAM;
+	}
 	if ((tx_rs_thresh > 1) && (tx_conf->tx_thresh.wthresh != 0)) {
 		PMD_INIT_LOG(ERR, "TX WTHRESH must be set to 0 if "
 			     "tx_rs_thresh is greater than 1. "
@@ -2321,6 +2329,7 @@ i40e_dev_tx_queue_setup(struct rte_eth_dev *dev,
 	txq->mz = tz;
 	txq->nb_tx_desc = nb_desc;
 	txq->tx_rs_thresh = tx_rs_thresh;
+	txq->log2_rs_thresh = rte_log2_u32(tx_rs_thresh);
 	txq->tx_free_thresh = tx_free_thresh;
 	txq->queue_id = queue_idx;
 	txq->reg_idx = reg_idx;
@@ -2346,6 +2355,16 @@ i40e_dev_tx_queue_setup(struct rte_eth_dev *dev,
 		return -ENOMEM;
 	}
 
+	/* Allocate RS last_id tracking array */
+	uint16_t num_rs_buckets = nb_desc / tx_rs_thresh;
+	txq->rs_last_id = rte_zmalloc_socket(NULL, sizeof(txq->rs_last_id[0]) * num_rs_buckets,
+			RTE_CACHE_LINE_SIZE, socket_id);
+	if (txq->rs_last_id == NULL) {
+		i40e_tx_queue_release(txq);
+		PMD_DRV_LOG(ERR, "Failed to allocate memory for RS last_id array");
+		return -ENOMEM;
+	}
+
 	i40e_reset_tx_queue(txq);
 	txq->q_set = TRUE;
 
@@ -2391,6 +2410,7 @@ i40e_tx_queue_release(void *txq)
 
 	ci_txq_release_all_mbufs(q, false);
 	rte_free(q->sw_ring);
+	rte_free(q->rs_last_id);
 	rte_memzone_free(q->mz);
 	rte_free(q);
 }
diff --git a/drivers/net/intel/iavf/iavf_rxtx.c b/drivers/net/intel/iavf/iavf_rxtx.c
index 67906841da..d63590d660 100644
--- a/drivers/net/intel/iavf/iavf_rxtx.c
+++ b/drivers/net/intel/iavf/iavf_rxtx.c
@@ -25,6 +25,7 @@
 #include <rte_ip.h>
 #include <rte_net.h>
 #include <rte_vect.h>
+#include <rte_bitops.h>
 #include <rte_vxlan.h>
 #include <rte_gtp.h>
 #include <rte_geneve.h>
@@ -194,6 +195,11 @@ check_tx_thresh(uint16_t nb_desc, uint16_t tx_rs_thresh,
 			     tx_rs_thresh, nb_desc);
 		return -EINVAL;
 	}
+	if (!rte_is_power_of_2(tx_rs_thresh)) {
+		PMD_INIT_LOG(ERR, "tx_rs_thresh must be a power of 2. (tx_rs_thresh=%u)",
+			     tx_rs_thresh);
+		return -EINVAL;
+	}
 
 	return 0;
 }
@@ -801,6 +807,7 @@ iavf_dev_tx_queue_setup(struct rte_eth_dev *dev,
 
 	txq->nb_tx_desc = nb_desc;
 	txq->tx_rs_thresh = tx_rs_thresh;
+	txq->log2_rs_thresh = rte_log2_u32(tx_rs_thresh);
 	txq->tx_free_thresh = tx_free_thresh;
 	txq->queue_id = queue_idx;
 	txq->port_id = dev->data->port_id;
@@ -826,6 +833,17 @@ iavf_dev_tx_queue_setup(struct rte_eth_dev *dev,
 		return -ENOMEM;
 	}
 
+	/* Allocate RS last_id tracking array */
+	uint16_t num_rs_buckets = nb_desc / tx_rs_thresh;
+	txq->rs_last_id = rte_zmalloc_socket(NULL, sizeof(txq->rs_last_id[0]) * num_rs_buckets,
+			RTE_CACHE_LINE_SIZE, socket_id);
+	if (txq->rs_last_id == NULL) {
+		PMD_INIT_LOG(ERR, "Failed to allocate memory for RS last_id array");
+		rte_free(txq->sw_ring);
+		rte_free(txq);
+		return -ENOMEM;
+	}
+
 	/* Allocate TX hardware ring descriptors. */
 	ring_size = sizeof(struct ci_tx_desc) * IAVF_MAX_RING_DESC;
 	ring_size = RTE_ALIGN(ring_size, IAVF_DMA_MEM_ALIGN);
@@ -1050,6 +1068,7 @@ iavf_dev_tx_queue_release(struct rte_eth_dev *dev, uint16_t qid)
 
 	ci_txq_release_all_mbufs(q, q->use_ctx);
 	rte_free(q->sw_ring);
+	rte_free(q->rs_last_id);
 	rte_memzone_free(q->mz);
 	rte_free(q);
 }
diff --git a/drivers/net/intel/ice/ice_rxtx.c b/drivers/net/intel/ice/ice_rxtx.c
index 111cb5e37f..2915223397 100644
--- a/drivers/net/intel/ice/ice_rxtx.c
+++ b/drivers/net/intel/ice/ice_rxtx.c
@@ -5,6 +5,7 @@
 #include <ethdev_driver.h>
 #include <rte_net.h>
 #include <rte_vect.h>
+#include <rte_bitops.h>
 
 #include "ice_rxtx.h"
 #include "ice_rxtx_vec_common.h"
@@ -1589,6 +1590,13 @@ ice_tx_queue_setup(struct rte_eth_dev *dev,
 			     (int)queue_idx);
 		return -EINVAL;
 	}
+	if (!rte_is_power_of_2(tx_rs_thresh)) {
+		PMD_INIT_LOG(ERR, "tx_rs_thresh must be a power of 2. (tx_rs_thresh=%u port=%d queue=%d)",
+			     (unsigned int)tx_rs_thresh,
+			     (int)dev->data->port_id,
+			     (int)queue_idx);
+		return -EINVAL;
+	}
 	if (tx_rs_thresh > 1 && tx_conf->tx_thresh.wthresh != 0) {
 		PMD_INIT_LOG(ERR, "TX WTHRESH must be set to 0 if "
 			     "tx_rs_thresh is greater than 1. "
@@ -1631,6 +1639,7 @@ ice_tx_queue_setup(struct rte_eth_dev *dev,
 	txq->mz = tz;
 	txq->nb_tx_desc = nb_desc;
 	txq->tx_rs_thresh = tx_rs_thresh;
+	txq->log2_rs_thresh = rte_log2_u32(tx_rs_thresh);
 	txq->tx_free_thresh = tx_free_thresh;
 	txq->queue_id = queue_idx;
 
@@ -1657,6 +1666,16 @@ ice_tx_queue_setup(struct rte_eth_dev *dev,
 		return -ENOMEM;
 	}
 
+	/* Allocate RS last_id tracking array */
+	uint16_t num_rs_buckets = nb_desc / tx_rs_thresh;
+	txq->rs_last_id = rte_zmalloc_socket(NULL, sizeof(txq->rs_last_id[0]) * num_rs_buckets,
+			RTE_CACHE_LINE_SIZE, socket_id);
+	if (txq->rs_last_id == NULL) {
+		ice_tx_queue_release(txq);
+		PMD_INIT_LOG(ERR, "Failed to allocate memory for RS last_id array");
+		return -ENOMEM;
+	}
+
 	if (vsi->type == ICE_VSI_PF && (offloads & RTE_ETH_TX_OFFLOAD_SEND_ON_TIMESTAMP)) {
 		if (hw->phy_model != ICE_PHY_E830) {
 			ice_tx_queue_release(txq);
@@ -1729,6 +1748,7 @@ ice_tx_queue_release(void *txq)
 
 	ci_txq_release_all_mbufs(q, false);
 	rte_free(q->sw_ring);
+	rte_free(q->rs_last_id);
 	if (q->tsq) {
 		rte_memzone_free(q->tsq->ts_mz);
 		rte_free(q->tsq);
diff --git a/drivers/net/intel/idpf/idpf_common_rxtx.c b/drivers/net/intel/idpf/idpf_common_rxtx.c
index 77f4099f2b..04db8823eb 100644
--- a/drivers/net/intel/idpf/idpf_common_rxtx.c
+++ b/drivers/net/intel/idpf/idpf_common_rxtx.c
@@ -5,6 +5,7 @@
 #include <eal_export.h>
 #include <rte_mbuf_dyn.h>
 #include <rte_errno.h>
+#include <rte_bitops.h>
 
 #include "idpf_common_rxtx.h"
 #include "idpf_common_device.h"
@@ -73,6 +74,11 @@ idpf_qc_tx_thresh_check(uint16_t nb_desc, uint16_t tx_rs_thresh,
 			tx_rs_thresh, nb_desc);
 		return -EINVAL;
 	}
+	if (!rte_is_power_of_2(tx_rs_thresh)) {
+		DRV_LOG(ERR, "tx_rs_thresh must be a power of 2. (tx_rs_thresh=%u)",
+			tx_rs_thresh);
+		return -EINVAL;
+	}
 
 	return 0;
 }
@@ -333,6 +339,7 @@ idpf_qc_tx_queue_release(void *txq)
 	}
 
 	ci_txq_release_all_mbufs(q, false);
+	rte_free(q->rs_last_id);
 	rte_free(q->sw_ring);
 	rte_memzone_free(q->mz);
 	rte_free(q);
diff --git a/drivers/net/intel/idpf/idpf_rxtx.c b/drivers/net/intel/idpf/idpf_rxtx.c
index 7d9c885458..9420200f6d 100644
--- a/drivers/net/intel/idpf/idpf_rxtx.c
+++ b/drivers/net/intel/idpf/idpf_rxtx.c
@@ -447,6 +447,7 @@ idpf_tx_queue_setup(struct rte_eth_dev *dev, uint16_t queue_idx,
 
 	txq->nb_tx_desc = nb_desc;
 	txq->tx_rs_thresh = tx_rs_thresh;
+	txq->log2_rs_thresh = rte_log2_u32(tx_rs_thresh);
 	txq->tx_free_thresh = tx_free_thresh;
 	txq->queue_id = vport->chunks_info.tx_start_qid + queue_idx;
 	txq->port_id = dev->data->port_id;
@@ -480,6 +481,15 @@ idpf_tx_queue_setup(struct rte_eth_dev *dev, uint16_t queue_idx,
 		goto err_sw_ring_alloc;
 	}
 
+	txq->rs_last_id = rte_zmalloc_socket("idpf tx rs_last_id",
+			sizeof(txq->rs_last_id[0]) * (nb_desc >> txq->log2_rs_thresh),
+			RTE_CACHE_LINE_SIZE, socket_id);
+	if (txq->rs_last_id == NULL) {
+		PMD_INIT_LOG(ERR, "Failed to allocate memory for TX RS tracking");
+		ret = -ENOMEM;
+		goto err_rs_last_id_alloc;
+	}
+
 	if (!is_splitq) {
 		txq->ci_tx_ring = mz->addr;
 		idpf_qc_single_tx_queue_reset(txq);
@@ -502,6 +512,8 @@ idpf_tx_queue_setup(struct rte_eth_dev *dev, uint16_t queue_idx,
 	return 0;
 
 err_complq_setup:
+	rte_free(txq->rs_last_id);
+err_rs_last_id_alloc:
 	rte_free(txq->sw_ring);
 err_sw_ring_alloc:
 	idpf_dma_zone_release(mz);
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [PATCH v5 24/35] net/ixgbe: use separate array for desc status tracking
  2026-02-11 18:12 ` [PATCH v5 00/35] combine multiple Intel scalar Tx paths Bruce Richardson
                     ` (22 preceding siblings ...)
  2026-02-11 18:12   ` [PATCH v5 23/35] net/intel: use separate array for desc status tracking Bruce Richardson
@ 2026-02-11 18:12   ` Bruce Richardson
  2026-02-11 18:12   ` [PATCH v5 25/35] net/intel: drop unused Tx queue used count Bruce Richardson
                     ` (11 subsequent siblings)
  35 siblings, 0 replies; 274+ messages in thread
From: Bruce Richardson @ 2026-02-11 18:12 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson, Anatoly Burakov, Vladimir Medvedkin

Due to significant differences in the ixgbe transmit descriptors, the
ixgbe driver does not use the common scalar Tx functionality. Update the
driver directly so its use of the rs_last_id array matches that of the
common Tx code.
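
The cleanup side of the scheme, shared between this patch and the common code, can be sketched as below: from the last cleaned descriptor, derive the next bucket's index and the descriptor the queue will be cleaned up to. The math follows the patch's `ixgbe_xmit_cleanup()` hunk; the helper itself is a hypothetical illustration.

```c
#include <stdint.h>

/* Given the last cleaned descriptor, compute the RS bucket to poll next
 * (rs_idx) and the last descriptor of that bucket (desc_to_clean_to).
 * Cleaning always advances by exactly tx_rs_thresh descriptors. */
static inline void
next_clean_target(uint16_t last_desc_cleaned, uint16_t nb_tx_desc,
		uint8_t log2_rs_thresh, uint16_t tx_rs_thresh,
		uint16_t *rs_idx, uint16_t *desc_to_clean_to)
{
	*rs_idx = (last_desc_cleaned == (uint16_t)(nb_tx_desc - 1)) ?
			0 :
			(uint16_t)((last_desc_cleaned + 1) >> log2_rs_thresh);
	*desc_to_clean_to =
		(uint16_t)((*rs_idx << log2_rs_thresh) + (tx_rs_thresh - 1));
}
```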

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 drivers/net/intel/ixgbe/ixgbe_rxtx.c | 86 +++++++++++++++-------------
 1 file changed, 47 insertions(+), 39 deletions(-)

diff --git a/drivers/net/intel/ixgbe/ixgbe_rxtx.c b/drivers/net/intel/ixgbe/ixgbe_rxtx.c
index 0af04c9b0d..3e37ccc50d 100644
--- a/drivers/net/intel/ixgbe/ixgbe_rxtx.c
+++ b/drivers/net/intel/ixgbe/ixgbe_rxtx.c
@@ -43,6 +43,7 @@
 #include <rte_ip.h>
 #include <rte_net.h>
 #include <rte_vect.h>
+#include <rte_bitops.h>
 
 #include "ixgbe_logs.h"
 #include "base/ixgbe_api.h"
@@ -571,57 +572,35 @@ tx_desc_ol_flags_to_cmdtype(uint64_t ol_flags)
 static inline int
 ixgbe_xmit_cleanup(struct ci_tx_queue *txq)
 {
-	struct ci_tx_entry *sw_ring = txq->sw_ring;
 	volatile union ixgbe_adv_tx_desc *txr = txq->ixgbe_tx_ring;
-	uint16_t last_desc_cleaned = txq->last_desc_cleaned;
-	uint16_t nb_tx_desc = txq->nb_tx_desc;
-	uint16_t desc_to_clean_to;
-	uint16_t nb_tx_to_clean;
-	uint32_t status;
+	const uint16_t last_desc_cleaned = txq->last_desc_cleaned;
+	const uint16_t nb_tx_desc = txq->nb_tx_desc;
 
-	/* Determine the last descriptor needing to be cleaned */
-	desc_to_clean_to = (uint16_t)(last_desc_cleaned + txq->tx_rs_thresh);
-	if (desc_to_clean_to >= nb_tx_desc)
-		desc_to_clean_to = (uint16_t)(desc_to_clean_to - nb_tx_desc);
+	const uint16_t rs_idx = (last_desc_cleaned == nb_tx_desc - 1) ?
+			0 :
+			(last_desc_cleaned + 1) >> txq->log2_rs_thresh;
+	uint16_t desc_to_clean_to = (rs_idx << txq->log2_rs_thresh) + (txq->tx_rs_thresh - 1);
 
-	/* Check to make sure the last descriptor to clean is done */
-	desc_to_clean_to = sw_ring[desc_to_clean_to].last_id;
-	status = txr[desc_to_clean_to].wb.status;
+	uint32_t status = txr[txq->rs_last_id[rs_idx]].wb.status;
 	if (!(status & rte_cpu_to_le_32(IXGBE_TXD_STAT_DD))) {
 		PMD_TX_LOG(DEBUG,
 			   "TX descriptor %4u is not done"
 			   "(port=%d queue=%d)",
-			   desc_to_clean_to,
+			   txq->rs_last_id[rs_idx],
 			   txq->port_id, txq->queue_id);
 		/* Failed to clean any descriptors, better luck next time */
 		return -(1);
 	}
 
-	/* Figure out how many descriptors will be cleaned */
-	if (last_desc_cleaned > desc_to_clean_to)
-		nb_tx_to_clean = (uint16_t)((nb_tx_desc - last_desc_cleaned) +
-							desc_to_clean_to);
-	else
-		nb_tx_to_clean = (uint16_t)(desc_to_clean_to -
-						last_desc_cleaned);
-
 	PMD_TX_LOG(DEBUG,
 		   "Cleaning %4u TX descriptors: %4u to %4u "
 		   "(port=%d queue=%d)",
-		   nb_tx_to_clean, last_desc_cleaned, desc_to_clean_to,
+		   txq->tx_rs_thresh, last_desc_cleaned, desc_to_clean_to,
 		   txq->port_id, txq->queue_id);
 
-	/*
-	 * The last descriptor to clean is done, so that means all the
-	 * descriptors from the last descriptor that was cleaned
-	 * up to the last descriptor with the RS bit set
-	 * are done. Only reset the threshold descriptor.
-	 */
-	txr[desc_to_clean_to].wb.status = 0;
-
 	/* Update the txq to reflect the last descriptor that was cleaned */
 	txq->last_desc_cleaned = desc_to_clean_to;
-	txq->nb_tx_free = (uint16_t)(txq->nb_tx_free + nb_tx_to_clean);
+	txq->nb_tx_free = (uint16_t)(txq->nb_tx_free + txq->tx_rs_thresh);
 
 	/* No Error */
 	return 0;
@@ -749,6 +728,9 @@ ixgbe_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 		if (tx_last >= txq->nb_tx_desc)
 			tx_last = (uint16_t) (tx_last - txq->nb_tx_desc);
 
+		/* Track the RS threshold bucket at packet start */
+		uint16_t pkt_rs_idx = (uint16_t)(tx_id >> txq->log2_rs_thresh);
+
 		PMD_TX_LOG(DEBUG, "port_id=%u queue_id=%u pktlen=%u"
 			   " tx_first=%u tx_last=%u",
 			   (unsigned) txq->port_id,
@@ -876,7 +858,6 @@ ixgbe_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 					tx_offload,
 					rte_security_dynfield(tx_pkt));
 
-				txe->last_id = tx_last;
 				tx_id = txe->next_id;
 				txe = txn;
 			}
@@ -922,7 +903,6 @@ ixgbe_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 				rte_cpu_to_le_32(cmd_type_len | slen);
 			txd->read.olinfo_status =
 				rte_cpu_to_le_32(olinfo_status);
-			txe->last_id = tx_last;
 			tx_id = txe->next_id;
 			txe = txn;
 			m_seg = m_seg->next;
@@ -935,8 +915,18 @@ ixgbe_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 		txq->nb_tx_used = (uint16_t)(txq->nb_tx_used + nb_used);
 		txq->nb_tx_free = (uint16_t)(txq->nb_tx_free - nb_used);
 
-		/* Set RS bit only on threshold packets' last descriptor */
-		if (txq->nb_tx_used >= txq->tx_rs_thresh) {
+		/*
+		 * Check if packet crosses into a new RS threshold bucket.
+		 * The RS bit is set on the last descriptor when we move from one bucket to another.
+		 * For example, with tx_rs_thresh=32 and a 5-descriptor packet using slots 30-34:
+		 *   - pkt_rs_idx = 30 >> 5 = 0 (started in bucket 0)
+		 *   - tx_last = 34, so 35 >> 5 = 1 (next packet is in bucket 1)
+		 *   - Since 0 != 1, set RS bit on descriptor 34, and record rs_last_id[0] = 34
+		 */
+		uint16_t next_rs_idx = ((tx_last + 1) >> txq->log2_rs_thresh);
+
+		if (next_rs_idx != pkt_rs_idx) {
+			/* Packet crossed into a new bucket - set RS bit on last descriptor */
 			PMD_TX_LOG(DEBUG,
 				   "Setting RS bit on TXD id="
 				   "%4u (port=%d queue=%d)",
@@ -944,9 +934,8 @@ ixgbe_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 
 			cmd_type_len |= IXGBE_TXD_CMD_RS;
 
-			/* Update txq RS bit counters */
-			txq->nb_tx_used = 0;
-			txp = NULL;
+			/* Record the last descriptor ID for the bucket we're leaving */
+			txq->rs_last_id[pkt_rs_idx] = tx_last;
 		} else
 			txp = txd;
 
@@ -2521,6 +2510,7 @@ ixgbe_tx_queue_release(struct ci_tx_queue *txq)
 	if (txq != NULL && txq->ops != NULL) {
 		ci_txq_release_all_mbufs(txq, false);
 		txq->ops->free_swring(txq);
+		rte_free(txq->rs_last_id);
 		rte_memzone_free(txq->mz);
 		rte_free(txq);
 	}
@@ -2825,6 +2815,13 @@ ixgbe_dev_tx_queue_setup(struct rte_eth_dev *dev,
 			     (int)dev->data->port_id, (int)queue_idx);
 		return -(EINVAL);
 	}
+	if (!rte_is_power_of_2(tx_rs_thresh)) {
+		PMD_INIT_LOG(ERR, "tx_rs_thresh must be a power of 2. (tx_rs_thresh=%u port=%d queue=%d)",
+			     (unsigned int)tx_rs_thresh,
+			     (int)dev->data->port_id,
+			     (int)queue_idx);
+		return -(EINVAL);
+	}
 
 	/*
 	 * If rs_bit_thresh is greater than 1, then TX WTHRESH should be
@@ -2870,6 +2867,7 @@ ixgbe_dev_tx_queue_setup(struct rte_eth_dev *dev,
 	txq->mz = tz;
 	txq->nb_tx_desc = nb_desc;
 	txq->tx_rs_thresh = tx_rs_thresh;
+	txq->log2_rs_thresh = rte_log2_u32(tx_rs_thresh);
 	txq->tx_free_thresh = tx_free_thresh;
 	txq->pthresh = tx_conf->tx_thresh.pthresh;
 	txq->hthresh = tx_conf->tx_thresh.hthresh;
@@ -2913,6 +2911,16 @@ ixgbe_dev_tx_queue_setup(struct rte_eth_dev *dev,
 	PMD_INIT_LOG(DEBUG, "sw_ring=%p hw_ring=%p dma_addr=0x%"PRIx64,
 		     txq->sw_ring, txq->ixgbe_tx_ring, txq->tx_ring_dma);
 
+	/* Allocate RS last_id tracking array */
+	uint16_t num_rs_buckets = nb_desc / tx_rs_thresh;
+	txq->rs_last_id = rte_zmalloc_socket(NULL, sizeof(txq->rs_last_id[0]) * num_rs_buckets,
+			RTE_CACHE_LINE_SIZE, socket_id);
+	if (txq->rs_last_id == NULL) {
+		ixgbe_tx_queue_release(txq);
+		PMD_DRV_LOG(ERR, "Failed to allocate memory for RS last_id array");
+		return -ENOMEM;
+	}
+
 	/* set up vector or scalar TX function as appropriate */
 	ixgbe_set_tx_function(dev, txq);
 
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [PATCH v5 25/35] net/intel: drop unused Tx queue used count
  2026-02-11 18:12 ` [PATCH v5 00/35] combine multiple Intel scalar Tx paths Bruce Richardson
                     ` (23 preceding siblings ...)
  2026-02-11 18:12   ` [PATCH v5 24/35] net/ixgbe: " Bruce Richardson
@ 2026-02-11 18:12   ` Bruce Richardson
  2026-02-11 18:12   ` [PATCH v5 26/35] net/intel: remove index for tracking end of packet Bruce Richardson
                     ` (10 subsequent siblings)
  35 siblings, 0 replies; 274+ messages in thread
From: Bruce Richardson @ 2026-02-11 18:12 UTC (permalink / raw)
  To: dev
  Cc: Bruce Richardson, Anatoly Burakov, Vladimir Medvedkin,
	Jingjing Wu, Praveen Shetty

Since drivers now set the RS bit at fixed threshold boundaries rather
than after a fixed number of descriptors, we no longer need to track the
number of descriptors used from one call to the next. Therefore we can
remove the nb_tx_used value from the Tx queue structure.

This value was still being used inside the IDPF splitq scalar code.
However, the idpf driver-specific section of the Tx queue structure also
had an rs_compl_count value that was only used by the vector code paths,
so we can use it to replace the old nb_tx_used value in the scalar path.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 drivers/net/intel/common/tx.h                   | 1 -
 drivers/net/intel/common/tx_scalar.h            | 1 -
 drivers/net/intel/i40e/i40e_rxtx.c              | 1 -
 drivers/net/intel/iavf/iavf_rxtx.c              | 1 -
 drivers/net/intel/ice/ice_dcf_ethdev.c          | 1 -
 drivers/net/intel/ice/ice_rxtx.c                | 1 -
 drivers/net/intel/idpf/idpf_common_rxtx.c       | 8 +++-----
 drivers/net/intel/ixgbe/ixgbe_rxtx.c            | 8 --------
 drivers/net/intel/ixgbe/ixgbe_rxtx_vec_common.c | 1 -
 9 files changed, 3 insertions(+), 20 deletions(-)

diff --git a/drivers/net/intel/common/tx.h b/drivers/net/intel/common/tx.h
index acd362dca3..a4ef230523 100644
--- a/drivers/net/intel/common/tx.h
+++ b/drivers/net/intel/common/tx.h
@@ -138,7 +138,6 @@ struct ci_tx_queue {
 	uint16_t *rs_last_id;
 	uint16_t nb_tx_desc;           /* number of TX descriptors */
 	uint16_t tx_tail; /* current value of tail register */
-	uint16_t nb_tx_used; /* number of TX desc used since RS bit set */
 	/* index to last TX descriptor to have been cleaned */
 	uint16_t last_desc_cleaned;
 	/* Total number of TX descriptors ready to be allocated. */
diff --git a/drivers/net/intel/common/tx_scalar.h b/drivers/net/intel/common/tx_scalar.h
index 7499e5ed20..c91a8156a2 100644
--- a/drivers/net/intel/common/tx_scalar.h
+++ b/drivers/net/intel/common/tx_scalar.h
@@ -400,7 +400,6 @@ ci_xmit_pkts(struct ci_tx_queue *txq,
 			m_seg = m_seg->next;
 		} while (m_seg);
 end_pkt:
-		txq->nb_tx_used = (uint16_t)(txq->nb_tx_used + nb_used);
 		txq->nb_tx_free = (uint16_t)(txq->nb_tx_free - nb_used);
 
 		/* Check if packet crosses into a new RS threshold bucket.
diff --git a/drivers/net/intel/i40e/i40e_rxtx.c b/drivers/net/intel/i40e/i40e_rxtx.c
index b554bc6c31..1303010819 100644
--- a/drivers/net/intel/i40e/i40e_rxtx.c
+++ b/drivers/net/intel/i40e/i40e_rxtx.c
@@ -2645,7 +2645,6 @@ i40e_reset_tx_queue(struct ci_tx_queue *txq)
 	txq->tx_next_rs = (uint16_t)(txq->tx_rs_thresh - 1);
 
 	txq->tx_tail = 0;
-	txq->nb_tx_used = 0;
 
 	txq->last_desc_cleaned = (uint16_t)(txq->nb_tx_desc - 1);
 	txq->nb_tx_free = (uint16_t)(txq->nb_tx_desc - 1);
diff --git a/drivers/net/intel/iavf/iavf_rxtx.c b/drivers/net/intel/iavf/iavf_rxtx.c
index d63590d660..05aca9b1dd 100644
--- a/drivers/net/intel/iavf/iavf_rxtx.c
+++ b/drivers/net/intel/iavf/iavf_rxtx.c
@@ -288,7 +288,6 @@ reset_tx_queue(struct ci_tx_queue *txq)
 	}
 
 	txq->tx_tail = 0;
-	txq->nb_tx_used = 0;
 
 	txq->last_desc_cleaned = txq->nb_tx_desc - 1;
 	txq->nb_tx_free = txq->nb_tx_desc - 1;
diff --git a/drivers/net/intel/ice/ice_dcf_ethdev.c b/drivers/net/intel/ice/ice_dcf_ethdev.c
index 4ceecc15c6..02a23629d6 100644
--- a/drivers/net/intel/ice/ice_dcf_ethdev.c
+++ b/drivers/net/intel/ice/ice_dcf_ethdev.c
@@ -414,7 +414,6 @@ reset_tx_queue(struct ci_tx_queue *txq)
 	}
 
 	txq->tx_tail = 0;
-	txq->nb_tx_used = 0;
 
 	txq->last_desc_cleaned = txq->nb_tx_desc - 1;
 	txq->nb_tx_free = txq->nb_tx_desc - 1;
diff --git a/drivers/net/intel/ice/ice_rxtx.c b/drivers/net/intel/ice/ice_rxtx.c
index 2915223397..87ffcd3895 100644
--- a/drivers/net/intel/ice/ice_rxtx.c
+++ b/drivers/net/intel/ice/ice_rxtx.c
@@ -1130,7 +1130,6 @@ ice_reset_tx_queue(struct ci_tx_queue *txq)
 	txq->tx_next_rs = (uint16_t)(txq->tx_rs_thresh - 1);
 
 	txq->tx_tail = 0;
-	txq->nb_tx_used = 0;
 
 	txq->last_desc_cleaned = (uint16_t)(txq->nb_tx_desc - 1);
 	txq->nb_tx_free = (uint16_t)(txq->nb_tx_desc - 1);
diff --git a/drivers/net/intel/idpf/idpf_common_rxtx.c b/drivers/net/intel/idpf/idpf_common_rxtx.c
index 04db8823eb..c2dcf3cde3 100644
--- a/drivers/net/intel/idpf/idpf_common_rxtx.c
+++ b/drivers/net/intel/idpf/idpf_common_rxtx.c
@@ -224,7 +224,6 @@ idpf_qc_split_tx_descq_reset(struct ci_tx_queue *txq)
 	}
 
 	txq->tx_tail = 0;
-	txq->nb_tx_used = 0;
 
 	/* Use this as next to clean for split desc queue */
 	txq->last_desc_cleaned = 0;
@@ -284,7 +283,6 @@ idpf_qc_single_tx_queue_reset(struct ci_tx_queue *txq)
 	}
 
 	txq->tx_tail = 0;
-	txq->nb_tx_used = 0;
 
 	txq->last_desc_cleaned = txq->nb_tx_desc - 1;
 	txq->nb_tx_free = txq->nb_tx_desc - 1;
@@ -992,12 +990,12 @@ idpf_dp_splitq_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 		txd->qw1.cmd_dtype |= IDPF_TXD_FLEX_FLOW_CMD_EOP;
 
 		txq->nb_tx_free = (uint16_t)(txq->nb_tx_free - nb_used);
-		txq->nb_tx_used = (uint16_t)(txq->nb_tx_used + nb_used);
+		txq->rs_compl_count += nb_used;
 
-		if (txq->nb_tx_used >= 32) {
+		if (txq->rs_compl_count >= 32) {
 			txd->qw1.cmd_dtype |= IDPF_TXD_FLEX_FLOW_CMD_RE;
 			/* Update txq RE bit counters */
-			txq->nb_tx_used = 0;
+			txq->rs_compl_count = 0;
 		}
 	}
 
diff --git a/drivers/net/intel/ixgbe/ixgbe_rxtx.c b/drivers/net/intel/ixgbe/ixgbe_rxtx.c
index 3e37ccc50d..ea609d926a 100644
--- a/drivers/net/intel/ixgbe/ixgbe_rxtx.c
+++ b/drivers/net/intel/ixgbe/ixgbe_rxtx.c
@@ -708,12 +708,6 @@ ixgbe_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 		 */
 		nb_used = (uint16_t)(tx_pkt->nb_segs + new_ctx);
 
-		if (txp != NULL &&
-				nb_used + txq->nb_tx_used >= txq->tx_rs_thresh)
-			/* set RS on the previous packet in the burst */
-			txp->read.cmd_type_len |=
-				rte_cpu_to_le_32(IXGBE_TXD_CMD_RS);
-
 		/*
 		 * The number of descriptors that must be allocated for a
 		 * packet is the number of segments of that packet, plus 1
@@ -912,7 +906,6 @@ ixgbe_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 		 * The last packet data descriptor needs End Of Packet (EOP)
 		 */
 		cmd_type_len |= IXGBE_TXD_CMD_EOP;
-		txq->nb_tx_used = (uint16_t)(txq->nb_tx_used + nb_used);
 		txq->nb_tx_free = (uint16_t)(txq->nb_tx_free - nb_used);
 
 		/*
@@ -2551,7 +2544,6 @@ ixgbe_reset_tx_queue(struct ci_tx_queue *txq)
 	txq->tx_next_rs = (uint16_t)(txq->tx_rs_thresh - 1);
 
 	txq->tx_tail = 0;
-	txq->nb_tx_used = 0;
 	/*
 	 * Always allow 1 descriptor to be un-allocated to avoid
 	 * a H/W race condition
diff --git a/drivers/net/intel/ixgbe/ixgbe_rxtx_vec_common.c b/drivers/net/intel/ixgbe/ixgbe_rxtx_vec_common.c
index eb7c79eaf9..63c7cb50d3 100644
--- a/drivers/net/intel/ixgbe/ixgbe_rxtx_vec_common.c
+++ b/drivers/net/intel/ixgbe/ixgbe_rxtx_vec_common.c
@@ -47,7 +47,6 @@ ixgbe_reset_tx_queue_vec(struct ci_tx_queue *txq)
 	txq->tx_next_rs = (uint16_t)(txq->tx_rs_thresh - 1);
 
 	txq->tx_tail = 0;
-	txq->nb_tx_used = 0;
 	/*
 	 * Always allow 1 descriptor to be un-allocated to avoid
 	 * a H/W race condition
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [PATCH v5 26/35] net/intel: remove index for tracking end of packet
  2026-02-11 18:12 ` [PATCH v5 00/35] combine multiple Intel scalar Tx paths Bruce Richardson
                     ` (24 preceding siblings ...)
  2026-02-11 18:12   ` [PATCH v5 25/35] net/intel: drop unused Tx queue used count Bruce Richardson
@ 2026-02-11 18:12   ` Bruce Richardson
  2026-02-11 18:12   ` [PATCH v5 27/35] net/intel: merge ring writes in simple Tx for ice and i40e Bruce Richardson
                     ` (9 subsequent siblings)
  35 siblings, 0 replies; 274+ messages in thread
From: Bruce Richardson @ 2026-02-11 18:12 UTC (permalink / raw)
  To: dev
  Cc: Bruce Richardson, Anatoly Burakov, Vladimir Medvedkin,
	Jingjing Wu, Praveen Shetty

The last_id value in each tx_sw_queue entry is no longer used in the
datapath, so remove it and its initialization. In the function releasing
packets back to the mempool, rather than relying on "last_id" to
identify the end of a packet, check for the mbuf's next pointer being
NULL.
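
The counting change can be sketched in isolation (a simplified stand-in
for rte_mbuf with hypothetical names, not the driver code): a packet's
last segment is now identified by its next pointer being NULL, so no
per-entry last_id bookkeeping is needed:

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for rte_mbuf: only the segment chain link matters. */
struct mbuf {
	struct mbuf *next; /* NULL on the last segment of a packet */
};

/* Count completed packets among n software-ring entries: a packet ends
 * wherever a segment's next pointer is NULL. */
static unsigned
count_packets(struct mbuf **segs, unsigned n)
{
	unsigned pkts = 0;
	for (unsigned i = 0; i < n; i++)
		if (segs[i] != NULL && segs[i]->next == NULL)
			pkts++;
	return pkts;
}
```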

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 drivers/net/intel/common/tx.h             | 1 -
 drivers/net/intel/i40e/i40e_rxtx.c        | 8 +++-----
 drivers/net/intel/iavf/iavf_rxtx.c        | 9 ++++-----
 drivers/net/intel/ice/ice_dcf_ethdev.c    | 1 -
 drivers/net/intel/ice/ice_rxtx.c          | 9 ++++-----
 drivers/net/intel/idpf/idpf_common_rxtx.c | 2 --
 drivers/net/intel/ixgbe/ixgbe_rxtx.c      | 9 ++++-----
 7 files changed, 15 insertions(+), 24 deletions(-)

diff --git a/drivers/net/intel/common/tx.h b/drivers/net/intel/common/tx.h
index a4ef230523..ee7c83cf00 100644
--- a/drivers/net/intel/common/tx.h
+++ b/drivers/net/intel/common/tx.h
@@ -112,7 +112,6 @@ struct ci_tx_queue;
 struct ci_tx_entry {
 	struct rte_mbuf *mbuf; /* mbuf associated with TX desc, if any. */
 	uint16_t next_id; /* Index of next descriptor in ring. */
-	uint16_t last_id; /* Index of last scattered descriptor. */
 };
 
 /**
diff --git a/drivers/net/intel/i40e/i40e_rxtx.c b/drivers/net/intel/i40e/i40e_rxtx.c
index 1303010819..ba94c59c0a 100644
--- a/drivers/net/intel/i40e/i40e_rxtx.c
+++ b/drivers/net/intel/i40e/i40e_rxtx.c
@@ -2536,14 +2536,13 @@ i40e_tx_done_cleanup_full(struct ci_tx_queue *txq,
 			pkt_cnt < free_cnt &&
 			tx_id != tx_last; i++) {
 			if (swr_ring[tx_id].mbuf != NULL) {
-				rte_pktmbuf_free_seg(swr_ring[tx_id].mbuf);
-				swr_ring[tx_id].mbuf = NULL;
-
 				/*
 				 * last segment in the packet,
 				 * increment packet count
 				 */
-				pkt_cnt += (swr_ring[tx_id].last_id == tx_id);
+				pkt_cnt += (swr_ring[tx_id].mbuf->next == NULL) ? 1 : 0;
+				rte_pktmbuf_free_seg(swr_ring[tx_id].mbuf);
+				swr_ring[tx_id].mbuf = NULL;
 			}
 
 			tx_id = swr_ring[tx_id].next_id;
@@ -2636,7 +2635,6 @@ i40e_reset_tx_queue(struct ci_tx_queue *txq)
 		txd->cmd_type_offset_bsz =
 			rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE);
 		txe[i].mbuf =  NULL;
-		txe[i].last_id = i;
 		txe[prev].next_id = i;
 		prev = i;
 	}
diff --git a/drivers/net/intel/iavf/iavf_rxtx.c b/drivers/net/intel/iavf/iavf_rxtx.c
index 05aca9b1dd..e621d4bf47 100644
--- a/drivers/net/intel/iavf/iavf_rxtx.c
+++ b/drivers/net/intel/iavf/iavf_rxtx.c
@@ -282,7 +282,6 @@ reset_tx_queue(struct ci_tx_queue *txq)
 		txq->ci_tx_ring[i].cmd_type_offset_bsz =
 			rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE);
 		txe[i].mbuf =  NULL;
-		txe[i].last_id = i;
 		txe[prev].next_id = i;
 		prev = i;
 	}
@@ -3957,14 +3956,14 @@ iavf_tx_done_cleanup_full(struct ci_tx_queue *txq,
 	while (pkt_cnt < free_cnt) {
 		do {
 			if (swr_ring[tx_id].mbuf != NULL) {
-				rte_pktmbuf_free_seg(swr_ring[tx_id].mbuf);
-				swr_ring[tx_id].mbuf = NULL;
-
 				/*
 				 * last segment in the packet,
 				 * increment packet count
 				 */
-				pkt_cnt += (swr_ring[tx_id].last_id == tx_id);
+				pkt_cnt += (swr_ring[tx_id].mbuf->next == NULL) ? 1 : 0;
+				rte_pktmbuf_free_seg(swr_ring[tx_id].mbuf);
+				swr_ring[tx_id].mbuf = NULL;
+
 			}
 
 			tx_id = swr_ring[tx_id].next_id;
diff --git a/drivers/net/intel/ice/ice_dcf_ethdev.c b/drivers/net/intel/ice/ice_dcf_ethdev.c
index 02a23629d6..abd7875e7b 100644
--- a/drivers/net/intel/ice/ice_dcf_ethdev.c
+++ b/drivers/net/intel/ice/ice_dcf_ethdev.c
@@ -408,7 +408,6 @@ reset_tx_queue(struct ci_tx_queue *txq)
 		txq->ci_tx_ring[i].cmd_type_offset_bsz =
 			rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE);
 		txe[i].mbuf =  NULL;
-		txe[i].last_id = i;
 		txe[prev].next_id = i;
 		prev = i;
 	}
diff --git a/drivers/net/intel/ice/ice_rxtx.c b/drivers/net/intel/ice/ice_rxtx.c
index 87ffcd3895..fe65df94da 100644
--- a/drivers/net/intel/ice/ice_rxtx.c
+++ b/drivers/net/intel/ice/ice_rxtx.c
@@ -1121,7 +1121,6 @@ ice_reset_tx_queue(struct ci_tx_queue *txq)
 		txd->cmd_type_offset_bsz =
 			rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE);
 		txe[i].mbuf =  NULL;
-		txe[i].last_id = i;
 		txe[prev].next_id = i;
 		prev = i;
 	}
@@ -3201,14 +3200,14 @@ ice_tx_done_cleanup_full(struct ci_tx_queue *txq,
 			pkt_cnt < free_cnt &&
 			tx_id != tx_last; i++) {
 			if (swr_ring[tx_id].mbuf != NULL) {
-				rte_pktmbuf_free_seg(swr_ring[tx_id].mbuf);
-				swr_ring[tx_id].mbuf = NULL;
-
 				/*
 				 * last segment in the packet,
 				 * increment packet count
 				 */
-				pkt_cnt += (swr_ring[tx_id].last_id == tx_id);
+				pkt_cnt += swr_ring[tx_id].mbuf->next == NULL ? 1 : 0;
+				rte_pktmbuf_free_seg(swr_ring[tx_id].mbuf);
+				swr_ring[tx_id].mbuf = NULL;
+
 			}
 
 			tx_id = swr_ring[tx_id].next_id;
diff --git a/drivers/net/intel/idpf/idpf_common_rxtx.c b/drivers/net/intel/idpf/idpf_common_rxtx.c
index c2dcf3cde3..f14a20d6ec 100644
--- a/drivers/net/intel/idpf/idpf_common_rxtx.c
+++ b/drivers/net/intel/idpf/idpf_common_rxtx.c
@@ -218,7 +218,6 @@ idpf_qc_split_tx_descq_reset(struct ci_tx_queue *txq)
 	prev = (uint16_t)(txq->sw_nb_desc - 1);
 	for (i = 0; i < txq->sw_nb_desc; i++) {
 		txe[i].mbuf = NULL;
-		txe[i].last_id = i;
 		txe[prev].next_id = i;
 		prev = i;
 	}
@@ -277,7 +276,6 @@ idpf_qc_single_tx_queue_reset(struct ci_tx_queue *txq)
 		txq->ci_tx_ring[i].cmd_type_offset_bsz =
 			rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE);
 		txe[i].mbuf =  NULL;
-		txe[i].last_id = i;
 		txe[prev].next_id = i;
 		prev = i;
 	}
diff --git a/drivers/net/intel/ixgbe/ixgbe_rxtx.c b/drivers/net/intel/ixgbe/ixgbe_rxtx.c
index ea609d926a..dc9fda8e21 100644
--- a/drivers/net/intel/ixgbe/ixgbe_rxtx.c
+++ b/drivers/net/intel/ixgbe/ixgbe_rxtx.c
@@ -2407,14 +2407,14 @@ ixgbe_tx_done_cleanup_full(struct ci_tx_queue *txq, uint32_t free_cnt)
 			pkt_cnt < free_cnt &&
 			tx_id != tx_last; i++) {
 			if (swr_ring[tx_id].mbuf != NULL) {
-				rte_pktmbuf_free_seg(swr_ring[tx_id].mbuf);
-				swr_ring[tx_id].mbuf = NULL;
-
 				/*
 				 * last segment in the packet,
 				 * increment packet count
 				 */
-				pkt_cnt += (swr_ring[tx_id].last_id == tx_id);
+				pkt_cnt += swr_ring[tx_id].mbuf->next == NULL ? 1 : 0;
+				rte_pktmbuf_free_seg(swr_ring[tx_id].mbuf);
+				swr_ring[tx_id].mbuf = NULL;
+
 			}
 
 			tx_id = swr_ring[tx_id].next_id;
@@ -2535,7 +2535,6 @@ ixgbe_reset_tx_queue(struct ci_tx_queue *txq)
 
 		txd->wb.status = rte_cpu_to_le_32(IXGBE_TXD_STAT_DD);
 		txe[i].mbuf = NULL;
-		txe[i].last_id = i;
 		txe[prev].next_id = i;
 		prev = i;
 	}
-- 
2.51.0



* [PATCH v5 27/35] net/intel: merge ring writes in simple Tx for ice and i40e
  2026-02-11 18:12 ` [PATCH v5 00/35] combine multiple Intel scalar Tx paths Bruce Richardson
                     ` (25 preceding siblings ...)
  2026-02-11 18:12   ` [PATCH v5 26/35] net/intel: remove index for tracking end of packet Bruce Richardson
@ 2026-02-11 18:12   ` Bruce Richardson
  2026-02-11 18:12   ` [PATCH v5 28/35] net/intel: consolidate ice and i40e buffer free function Bruce Richardson
                     ` (8 subsequent siblings)
  35 siblings, 0 replies; 274+ messages in thread
From: Bruce Richardson @ 2026-02-11 18:12 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson, Vladimir Medvedkin, Anatoly Burakov

The ice and i40e drivers have identical code for writing ring entries in
the simple Tx path, so merge that descriptor-writing code into the
common tx_scalar.h header.
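
The chunking arithmetic used by the merged ci_tx_fill_hw_ring can be
shown standalone (a sketch with hypothetical names, not the driver
code): the burst is split into a 4-aligned main part, written four
descriptors at a time, plus a remainder written singly:

```c
#include <assert.h>

#define N_PER_LOOP 4 /* descriptors written per unrolled iteration */

/* Split a burst into a 4-aligned main part and a remainder, mirroring
 * the mainpart/leftover computation in ci_tx_fill_hw_ring. */
static void
split_burst(unsigned nb_pkts, unsigned *mainpart, unsigned *leftover)
{
	*mainpart = nb_pkts & ~(unsigned)(N_PER_LOOP - 1);
	*leftover = nb_pkts & (unsigned)(N_PER_LOOP - 1);
}
```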

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>
---
 drivers/net/intel/common/tx.h                 |  6 ++
 drivers/net/intel/common/tx_scalar.h          | 60 ++++++++++++++
 drivers/net/intel/i40e/i40e_rxtx.c            | 79 +------------------
 drivers/net/intel/i40e/i40e_rxtx.h            |  3 -
 .../net/intel/i40e/i40e_rxtx_vec_altivec.c    |  4 +-
 drivers/net/intel/i40e/i40e_rxtx_vec_avx2.c   |  4 +-
 drivers/net/intel/i40e/i40e_rxtx_vec_avx512.c |  4 +-
 drivers/net/intel/i40e/i40e_rxtx_vec_neon.c   |  4 +-
 drivers/net/intel/ice/ice_rxtx.c              | 69 +---------------
 drivers/net/intel/ice/ice_rxtx.h              |  2 -
 drivers/net/intel/ice/ice_rxtx_vec_avx2.c     |  4 +-
 drivers/net/intel/ice/ice_rxtx_vec_avx512.c   |  4 +-
 12 files changed, 86 insertions(+), 157 deletions(-)

diff --git a/drivers/net/intel/common/tx.h b/drivers/net/intel/common/tx.h
index ee7c83cf00..a5cbe070fc 100644
--- a/drivers/net/intel/common/tx.h
+++ b/drivers/net/intel/common/tx.h
@@ -70,6 +70,12 @@ enum ci_tx_l2tag1_field {
 /* Common maximum data per TX descriptor */
 #define CI_MAX_DATA_PER_TXD     (CI_TXD_QW1_TX_BUF_SZ_M >> CI_TXD_QW1_TX_BUF_SZ_S)
 
+/* Common TX maximum burst size for chunked transmission in simple paths */
+#define CI_TX_MAX_BURST 32
+
+/* Common TX descriptor command flags for simple transmit */
+#define CI_TX_DESC_CMD_DEFAULT (CI_TX_DESC_CMD_ICRC | CI_TX_DESC_CMD_EOP)
+
 /* Checksum offload mask to identify packets requesting offload */
 #define CI_TX_CKSUM_OFFLOAD_MASK (RTE_MBUF_F_TX_IP_CKSUM |		 \
 				   RTE_MBUF_F_TX_L4_MASK |		 \
diff --git a/drivers/net/intel/common/tx_scalar.h b/drivers/net/intel/common/tx_scalar.h
index c91a8156a2..3c9c1f611c 100644
--- a/drivers/net/intel/common/tx_scalar.h
+++ b/drivers/net/intel/common/tx_scalar.h
@@ -12,6 +12,66 @@
 /* depends on common Tx definitions. */
 #include "tx.h"
 
+/* Populate 4 descriptors with data from 4 mbufs */
+static inline void
+ci_tx_fill_hw_ring_tx4(volatile struct ci_tx_desc *txdp, struct rte_mbuf **pkts)
+{
+	uint64_t dma_addr;
+	uint32_t i;
+
+	for (i = 0; i < 4; i++, txdp++, pkts++) {
+		dma_addr = rte_mbuf_data_iova(*pkts);
+		txdp->buffer_addr = rte_cpu_to_le_64(dma_addr);
+		txdp->cmd_type_offset_bsz =
+			rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DATA |
+				((uint64_t)CI_TX_DESC_CMD_DEFAULT << CI_TXD_QW1_CMD_S) |
+				((uint64_t)(*pkts)->data_len << CI_TXD_QW1_TX_BUF_SZ_S));
+	}
+}
+
+/* Populate 1 descriptor with data from 1 mbuf */
+static inline void
+ci_tx_fill_hw_ring_tx1(volatile struct ci_tx_desc *txdp, struct rte_mbuf **pkts)
+{
+	uint64_t dma_addr;
+
+	dma_addr = rte_mbuf_data_iova(*pkts);
+	txdp->buffer_addr = rte_cpu_to_le_64(dma_addr);
+	txdp->cmd_type_offset_bsz =
+		rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DATA |
+			((uint64_t)CI_TX_DESC_CMD_DEFAULT << CI_TXD_QW1_CMD_S) |
+			((uint64_t)(*pkts)->data_len << CI_TXD_QW1_TX_BUF_SZ_S));
+}
+
+/* Fill hardware descriptor ring with mbuf data */
+static inline void
+ci_tx_fill_hw_ring(struct ci_tx_queue *txq, struct rte_mbuf **pkts,
+		   uint16_t nb_pkts)
+{
+	volatile struct ci_tx_desc *txdp = &txq->ci_tx_ring[txq->tx_tail];
+	struct ci_tx_entry *txep = &txq->sw_ring[txq->tx_tail];
+	const int N_PER_LOOP = 4;
+	const int N_PER_LOOP_MASK = N_PER_LOOP - 1;
+	int mainpart, leftover;
+	int i, j;
+
+	mainpart = nb_pkts & ((uint32_t)~N_PER_LOOP_MASK);
+	leftover = nb_pkts & ((uint32_t)N_PER_LOOP_MASK);
+	for (i = 0; i < mainpart; i += N_PER_LOOP) {
+		for (j = 0; j < N_PER_LOOP; ++j)
+			(txep + i + j)->mbuf = *(pkts + i + j);
+		ci_tx_fill_hw_ring_tx4(txdp + i, pkts + i);
+	}
+
+	if (unlikely(leftover > 0)) {
+		for (i = 0; i < leftover; ++i) {
+			(txep + mainpart + i)->mbuf = *(pkts + mainpart + i);
+			ci_tx_fill_hw_ring_tx1(txdp + mainpart + i,
+					       pkts + mainpart + i);
+		}
+	}
+}
+
 /*
  * Common transmit descriptor cleanup function for Intel drivers.
  *
diff --git a/drivers/net/intel/i40e/i40e_rxtx.c b/drivers/net/intel/i40e/i40e_rxtx.c
index ba94c59c0a..174d517e9d 100644
--- a/drivers/net/intel/i40e/i40e_rxtx.c
+++ b/drivers/net/intel/i40e/i40e_rxtx.c
@@ -311,19 +311,6 @@ i40e_parse_tunneling_params(uint64_t ol_flags,
 		*cd_tunneling |= I40E_TXD_CTX_QW0_L4T_CS_MASK;
 }
 
-/* Construct the tx flags */
-static inline uint64_t
-i40e_build_ctob(uint32_t td_cmd,
-		uint32_t td_offset,
-		unsigned int size,
-		uint32_t td_tag)
-{
-	return rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DATA |
-			((uint64_t)td_cmd  << CI_TXD_QW1_CMD_S) |
-			((uint64_t)td_offset << CI_TXD_QW1_OFFSET_S) |
-			((uint64_t)size  << CI_TXD_QW1_TX_BUF_SZ_S) |
-			((uint64_t)td_tag  << CI_TXD_QW1_L2TAG1_S));
-}
 
 static inline int
 #ifdef RTE_LIBRTE_I40E_RX_ALLOW_BULK_ALLOC
@@ -1082,64 +1069,6 @@ i40e_tx_free_bufs(struct ci_tx_queue *txq)
 	return tx_rs_thresh;
 }
 
-/* Populate 4 descriptors with data from 4 mbufs */
-static inline void
-tx4(volatile struct ci_tx_desc *txdp, struct rte_mbuf **pkts)
-{
-	uint64_t dma_addr;
-	uint32_t i;
-
-	for (i = 0; i < 4; i++, txdp++, pkts++) {
-		dma_addr = rte_mbuf_data_iova(*pkts);
-		txdp->buffer_addr = rte_cpu_to_le_64(dma_addr);
-		txdp->cmd_type_offset_bsz =
-			i40e_build_ctob((uint32_t)I40E_TD_CMD, 0,
-					(*pkts)->data_len, 0);
-	}
-}
-
-/* Populate 1 descriptor with data from 1 mbuf */
-static inline void
-tx1(volatile struct ci_tx_desc *txdp, struct rte_mbuf **pkts)
-{
-	uint64_t dma_addr;
-
-	dma_addr = rte_mbuf_data_iova(*pkts);
-	txdp->buffer_addr = rte_cpu_to_le_64(dma_addr);
-	txdp->cmd_type_offset_bsz =
-		i40e_build_ctob((uint32_t)I40E_TD_CMD, 0,
-				(*pkts)->data_len, 0);
-}
-
-/* Fill hardware descriptor ring with mbuf data */
-static inline void
-i40e_tx_fill_hw_ring(struct ci_tx_queue *txq,
-		     struct rte_mbuf **pkts,
-		     uint16_t nb_pkts)
-{
-	volatile struct ci_tx_desc *txdp = &txq->ci_tx_ring[txq->tx_tail];
-	struct ci_tx_entry *txep = &txq->sw_ring[txq->tx_tail];
-	const int N_PER_LOOP = 4;
-	const int N_PER_LOOP_MASK = N_PER_LOOP - 1;
-	int mainpart, leftover;
-	int i, j;
-
-	mainpart = (nb_pkts & ((uint32_t) ~N_PER_LOOP_MASK));
-	leftover = (nb_pkts & ((uint32_t)  N_PER_LOOP_MASK));
-	for (i = 0; i < mainpart; i += N_PER_LOOP) {
-		for (j = 0; j < N_PER_LOOP; ++j) {
-			(txep + i + j)->mbuf = *(pkts + i + j);
-		}
-		tx4(txdp + i, pkts + i);
-	}
-	if (unlikely(leftover > 0)) {
-		for (i = 0; i < leftover; ++i) {
-			(txep + mainpart + i)->mbuf = *(pkts + mainpart + i);
-			tx1(txdp + mainpart + i, pkts + mainpart + i);
-		}
-	}
-}
-
 static inline uint16_t
 tx_xmit_pkts(struct ci_tx_queue *txq,
 	     struct rte_mbuf **tx_pkts,
@@ -1164,7 +1093,7 @@ tx_xmit_pkts(struct ci_tx_queue *txq,
 	txq->nb_tx_free = (uint16_t)(txq->nb_tx_free - nb_pkts);
 	if ((txq->tx_tail + nb_pkts) > txq->nb_tx_desc) {
 		n = (uint16_t)(txq->nb_tx_desc - txq->tx_tail);
-		i40e_tx_fill_hw_ring(txq, tx_pkts, n);
+		ci_tx_fill_hw_ring(txq, tx_pkts, n);
 		txr[txq->tx_next_rs].cmd_type_offset_bsz |=
 			rte_cpu_to_le_64(((uint64_t)CI_TX_DESC_CMD_RS) << CI_TXD_QW1_CMD_S);
 		txq->tx_next_rs = (uint16_t)(txq->tx_rs_thresh - 1);
@@ -1172,7 +1101,7 @@ tx_xmit_pkts(struct ci_tx_queue *txq,
 	}
 
 	/* Fill hardware descriptor ring with mbuf data */
-	i40e_tx_fill_hw_ring(txq, tx_pkts + n, (uint16_t)(nb_pkts - n));
+	ci_tx_fill_hw_ring(txq, tx_pkts + n, (uint16_t)(nb_pkts - n));
 	txq->tx_tail = (uint16_t)(txq->tx_tail + (nb_pkts - n));
 
 	/* Determine if RS bit needs to be set */
@@ -1201,13 +1130,13 @@ i40e_xmit_pkts_simple(void *tx_queue,
 {
 	uint16_t nb_tx = 0;
 
-	if (likely(nb_pkts <= I40E_TX_MAX_BURST))
+	if (likely(nb_pkts <= CI_TX_MAX_BURST))
 		return tx_xmit_pkts((struct ci_tx_queue *)tx_queue,
 						tx_pkts, nb_pkts);
 
 	while (nb_pkts) {
 		uint16_t ret, num = (uint16_t)RTE_MIN(nb_pkts,
-						I40E_TX_MAX_BURST);
+						CI_TX_MAX_BURST);
 
 		ret = tx_xmit_pkts((struct ci_tx_queue *)tx_queue,
 						&tx_pkts[nb_tx], num);
diff --git a/drivers/net/intel/i40e/i40e_rxtx.h b/drivers/net/intel/i40e/i40e_rxtx.h
index db8525d52d..88d47f261e 100644
--- a/drivers/net/intel/i40e/i40e_rxtx.h
+++ b/drivers/net/intel/i40e/i40e_rxtx.h
@@ -47,9 +47,6 @@
 #define I40E_RX_DESC_EXT_STATUS_FLEXBL_MASK   0x03
 #define I40E_RX_DESC_EXT_STATUS_FLEXBL_FLEX   0x01
 
-#define I40E_TD_CMD (CI_TX_DESC_CMD_ICRC |\
-		     CI_TX_DESC_CMD_EOP)
-
 enum i40e_header_split_mode {
 	i40e_header_split_none = 0,
 	i40e_header_split_enabled = 1,
diff --git a/drivers/net/intel/i40e/i40e_rxtx_vec_altivec.c b/drivers/net/intel/i40e/i40e_rxtx_vec_altivec.c
index 4c36748d94..68667bdc9b 100644
--- a/drivers/net/intel/i40e/i40e_rxtx_vec_altivec.c
+++ b/drivers/net/intel/i40e/i40e_rxtx_vec_altivec.c
@@ -476,8 +476,8 @@ i40e_xmit_fixed_burst_vec(void *tx_queue, struct rte_mbuf **tx_pkts,
 	volatile struct ci_tx_desc *txdp;
 	struct ci_tx_entry_vec *txep;
 	uint16_t n, nb_commit, tx_id;
-	uint64_t flags = I40E_TD_CMD;
-	uint64_t rs = CI_TX_DESC_CMD_RS | I40E_TD_CMD;
+	uint64_t flags = CI_TX_DESC_CMD_DEFAULT;
+	uint64_t rs = CI_TX_DESC_CMD_RS | CI_TX_DESC_CMD_DEFAULT;
 	int i;
 
 	if (txq->nb_tx_free < txq->tx_free_thresh)
diff --git a/drivers/net/intel/i40e/i40e_rxtx_vec_avx2.c b/drivers/net/intel/i40e/i40e_rxtx_vec_avx2.c
index 502a1842c6..e1672c4371 100644
--- a/drivers/net/intel/i40e/i40e_rxtx_vec_avx2.c
+++ b/drivers/net/intel/i40e/i40e_rxtx_vec_avx2.c
@@ -741,8 +741,8 @@ i40e_xmit_fixed_burst_vec_avx2(void *tx_queue, struct rte_mbuf **tx_pkts,
 	volatile struct ci_tx_desc *txdp;
 	struct ci_tx_entry_vec *txep;
 	uint16_t n, nb_commit, tx_id;
-	uint64_t flags = I40E_TD_CMD;
-	uint64_t rs = CI_TX_DESC_CMD_RS | I40E_TD_CMD;
+	uint64_t flags = CI_TX_DESC_CMD_DEFAULT;
+	uint64_t rs = CI_TX_DESC_CMD_RS | CI_TX_DESC_CMD_DEFAULT;
 
 	if (txq->nb_tx_free < txq->tx_free_thresh)
 		ci_tx_free_bufs_vec(txq, i40e_tx_desc_done, false);
diff --git a/drivers/net/intel/i40e/i40e_rxtx_vec_avx512.c b/drivers/net/intel/i40e/i40e_rxtx_vec_avx512.c
index d48ff9f51e..bceb95ad2d 100644
--- a/drivers/net/intel/i40e/i40e_rxtx_vec_avx512.c
+++ b/drivers/net/intel/i40e/i40e_rxtx_vec_avx512.c
@@ -801,8 +801,8 @@ i40e_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pkts,
 	volatile struct ci_tx_desc *txdp;
 	struct ci_tx_entry_vec *txep;
 	uint16_t n, nb_commit, tx_id;
-	uint64_t flags = I40E_TD_CMD;
-	uint64_t rs = CI_TX_DESC_CMD_RS | I40E_TD_CMD;
+	uint64_t flags = CI_TX_DESC_CMD_DEFAULT;
+	uint64_t rs = CI_TX_DESC_CMD_RS | CI_TX_DESC_CMD_DEFAULT;
 
 	if (txq->nb_tx_free < txq->tx_free_thresh)
 		ci_tx_free_bufs_vec(txq, i40e_tx_desc_done, false);
diff --git a/drivers/net/intel/i40e/i40e_rxtx_vec_neon.c b/drivers/net/intel/i40e/i40e_rxtx_vec_neon.c
index be4c64942e..debc9bda28 100644
--- a/drivers/net/intel/i40e/i40e_rxtx_vec_neon.c
+++ b/drivers/net/intel/i40e/i40e_rxtx_vec_neon.c
@@ -626,8 +626,8 @@ i40e_xmit_fixed_burst_vec(void *__rte_restrict tx_queue,
 	volatile struct ci_tx_desc *txdp;
 	struct ci_tx_entry_vec *txep;
 	uint16_t n, nb_commit, tx_id;
-	uint64_t flags = I40E_TD_CMD;
-	uint64_t rs = CI_TX_DESC_CMD_RS | I40E_TD_CMD;
+	uint64_t flags = CI_TX_DESC_CMD_DEFAULT;
+	uint64_t rs = CI_TX_DESC_CMD_RS | CI_TX_DESC_CMD_DEFAULT;
 	int i;
 
 	if (txq->nb_tx_free < txq->tx_free_thresh)
diff --git a/drivers/net/intel/ice/ice_rxtx.c b/drivers/net/intel/ice/ice_rxtx.c
index fe65df94da..e4fba453a9 100644
--- a/drivers/net/intel/ice/ice_rxtx.c
+++ b/drivers/net/intel/ice/ice_rxtx.c
@@ -3286,67 +3286,6 @@ ice_tx_done_cleanup(void *txq, uint32_t free_cnt)
 		return ice_tx_done_cleanup_full(q, free_cnt);
 }
 
-/* Populate 4 descriptors with data from 4 mbufs */
-static inline void
-tx4(volatile struct ci_tx_desc *txdp, struct rte_mbuf **pkts)
-{
-	uint64_t dma_addr;
-	uint32_t i;
-
-	for (i = 0; i < 4; i++, txdp++, pkts++) {
-		dma_addr = rte_mbuf_data_iova(*pkts);
-		txdp->buffer_addr = rte_cpu_to_le_64(dma_addr);
-		txdp->cmd_type_offset_bsz =
-			ice_build_ctob((uint32_t)ICE_TD_CMD, 0,
-				       (*pkts)->data_len, 0);
-	}
-}
-
-/* Populate 1 descriptor with data from 1 mbuf */
-static inline void
-tx1(volatile struct ci_tx_desc *txdp, struct rte_mbuf **pkts)
-{
-	uint64_t dma_addr;
-
-	dma_addr = rte_mbuf_data_iova(*pkts);
-	txdp->buffer_addr = rte_cpu_to_le_64(dma_addr);
-	txdp->cmd_type_offset_bsz =
-		ice_build_ctob((uint32_t)ICE_TD_CMD, 0,
-			       (*pkts)->data_len, 0);
-}
-
-static inline void
-ice_tx_fill_hw_ring(struct ci_tx_queue *txq, struct rte_mbuf **pkts,
-		    uint16_t nb_pkts)
-{
-	volatile struct ci_tx_desc *txdp = &txq->ci_tx_ring[txq->tx_tail];
-	struct ci_tx_entry *txep = &txq->sw_ring[txq->tx_tail];
-	const int N_PER_LOOP = 4;
-	const int N_PER_LOOP_MASK = N_PER_LOOP - 1;
-	int mainpart, leftover;
-	int i, j;
-
-	/**
-	 * Process most of the packets in chunks of N pkts.  Any
-	 * leftover packets will get processed one at a time.
-	 */
-	mainpart = nb_pkts & ((uint32_t)~N_PER_LOOP_MASK);
-	leftover = nb_pkts & ((uint32_t)N_PER_LOOP_MASK);
-	for (i = 0; i < mainpart; i += N_PER_LOOP) {
-		/* Copy N mbuf pointers to the S/W ring */
-		for (j = 0; j < N_PER_LOOP; ++j)
-			(txep + i + j)->mbuf = *(pkts + i + j);
-		tx4(txdp + i, pkts + i);
-	}
-
-	if (unlikely(leftover > 0)) {
-		for (i = 0; i < leftover; ++i) {
-			(txep + mainpart + i)->mbuf = *(pkts + mainpart + i);
-			tx1(txdp + mainpart + i, pkts + mainpart + i);
-		}
-	}
-}
-
 static inline uint16_t
 tx_xmit_pkts(struct ci_tx_queue *txq,
 	     struct rte_mbuf **tx_pkts,
@@ -3371,7 +3310,7 @@ tx_xmit_pkts(struct ci_tx_queue *txq,
 	txq->nb_tx_free = (uint16_t)(txq->nb_tx_free - nb_pkts);
 	if ((txq->tx_tail + nb_pkts) > txq->nb_tx_desc) {
 		n = (uint16_t)(txq->nb_tx_desc - txq->tx_tail);
-		ice_tx_fill_hw_ring(txq, tx_pkts, n);
+		ci_tx_fill_hw_ring(txq, tx_pkts, n);
 		txr[txq->tx_next_rs].cmd_type_offset_bsz |=
 			rte_cpu_to_le_64(((uint64_t)CI_TX_DESC_CMD_RS) << CI_TXD_QW1_CMD_S);
 		txq->tx_next_rs = (uint16_t)(txq->tx_rs_thresh - 1);
@@ -3379,7 +3318,7 @@ tx_xmit_pkts(struct ci_tx_queue *txq,
 	}
 
 	/* Fill hardware descriptor ring with mbuf data */
-	ice_tx_fill_hw_ring(txq, tx_pkts + n, (uint16_t)(nb_pkts - n));
+	ci_tx_fill_hw_ring(txq, tx_pkts + n, (uint16_t)(nb_pkts - n));
 	txq->tx_tail = (uint16_t)(txq->tx_tail + (nb_pkts - n));
 
 	/* Determine if RS bit needs to be set */
@@ -3408,13 +3347,13 @@ ice_xmit_pkts_simple(void *tx_queue,
 {
 	uint16_t nb_tx = 0;
 
-	if (likely(nb_pkts <= ICE_TX_MAX_BURST))
+	if (likely(nb_pkts <= CI_TX_MAX_BURST))
 		return tx_xmit_pkts((struct ci_tx_queue *)tx_queue,
 				    tx_pkts, nb_pkts);
 
 	while (nb_pkts) {
 		uint16_t ret, num = (uint16_t)RTE_MIN(nb_pkts,
-						      ICE_TX_MAX_BURST);
+						      CI_TX_MAX_BURST);
 
 		ret = tx_xmit_pkts((struct ci_tx_queue *)tx_queue,
 				   &tx_pkts[nb_tx], num);
diff --git a/drivers/net/intel/ice/ice_rxtx.h b/drivers/net/intel/ice/ice_rxtx.h
index 7d6480b410..77ed41f9fd 100644
--- a/drivers/net/intel/ice/ice_rxtx.h
+++ b/drivers/net/intel/ice/ice_rxtx.h
@@ -46,8 +46,6 @@
 
 #define ICE_SUPPORT_CHAIN_NUM 5
 
-#define ICE_TD_CMD                      CI_TX_DESC_CMD_EOP
-
 #define ICE_VPMD_RX_BURST            CI_VPMD_RX_BURST
 #define ICE_VPMD_TX_BURST            32
 #define ICE_VPMD_RXQ_REARM_THRESH    CI_VPMD_RX_REARM_THRESH
diff --git a/drivers/net/intel/ice/ice_rxtx_vec_avx2.c b/drivers/net/intel/ice/ice_rxtx_vec_avx2.c
index 2922671158..d03f2e5b36 100644
--- a/drivers/net/intel/ice/ice_rxtx_vec_avx2.c
+++ b/drivers/net/intel/ice/ice_rxtx_vec_avx2.c
@@ -845,8 +845,8 @@ ice_xmit_fixed_burst_vec_avx2(void *tx_queue, struct rte_mbuf **tx_pkts,
 	volatile struct ci_tx_desc *txdp;
 	struct ci_tx_entry_vec *txep;
 	uint16_t n, nb_commit, tx_id;
-	uint64_t flags = ICE_TD_CMD;
-	uint64_t rs = CI_TX_DESC_CMD_RS | ICE_TD_CMD;
+	uint64_t flags = CI_TX_DESC_CMD_DEFAULT;
+	uint64_t rs = CI_TX_DESC_CMD_RS | CI_TX_DESC_CMD_DEFAULT;
 
 	/* cross rx_thresh boundary is not allowed */
 	nb_pkts = RTE_MIN(nb_pkts, txq->tx_rs_thresh);
diff --git a/drivers/net/intel/ice/ice_rxtx_vec_avx512.c b/drivers/net/intel/ice/ice_rxtx_vec_avx512.c
index e64b6e227b..004c01054a 100644
--- a/drivers/net/intel/ice/ice_rxtx_vec_avx512.c
+++ b/drivers/net/intel/ice/ice_rxtx_vec_avx512.c
@@ -909,8 +909,8 @@ ice_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pkts,
 	volatile struct ci_tx_desc *txdp;
 	struct ci_tx_entry_vec *txep;
 	uint16_t n, nb_commit, tx_id;
-	uint64_t flags = ICE_TD_CMD;
-	uint64_t rs = CI_TX_DESC_CMD_RS | ICE_TD_CMD;
+	uint64_t flags = CI_TX_DESC_CMD_DEFAULT;
+	uint64_t rs = CI_TX_DESC_CMD_RS | CI_TX_DESC_CMD_DEFAULT;
 
 	/* cross rx_thresh boundary is not allowed */
 	nb_pkts = RTE_MIN(nb_pkts, txq->tx_rs_thresh);
-- 
2.51.0



* [PATCH v5 28/35] net/intel: consolidate ice and i40e buffer free function
  2026-02-11 18:12 ` [PATCH v5 00/35] combine multiple Intel scalar Tx paths Bruce Richardson
                     ` (26 preceding siblings ...)
  2026-02-11 18:12   ` [PATCH v5 27/35] net/intel: merge ring writes in simple Tx for ice and i40e Bruce Richardson
@ 2026-02-11 18:12   ` Bruce Richardson
  2026-02-11 18:12   ` [PATCH v5 29/35] net/intel: complete merging simple Tx paths Bruce Richardson
                     ` (7 subsequent siblings)
  35 siblings, 0 replies; 274+ messages in thread
From: Bruce Richardson @ 2026-02-11 18:12 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson, Vladimir Medvedkin, Anatoly Burakov

The buffer-freeing function for the simple scalar Tx path is almost
identical in the ice and i40e drivers, except that the i40e version has
batching for the FAST_FREE case. Consolidate both functions into a
common one based on the more efficient i40e version.
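
The batching arithmetic of the consolidated free path can be illustrated
standalone (a simplified sketch with hypothetical names, not the driver
code): rs_thresh mbufs are released in full 64-entry bulk-free calls
plus one partial tail call:

```c
#include <assert.h>

#define MAX_FREE_BUF_SZ 64 /* mirrors CI_TX_MAX_FREE_BUF_SZ */

/* Split rs_thresh mbufs into full 64-entry batches plus a tail, as
 * ci_tx_free_bufs does before each rte_mbuf_raw_free_bulk() call. */
static void
split_free_batches(unsigned rs_thresh, unsigned *full_batches, unsigned *tail)
{
	*full_batches = rs_thresh / MAX_FREE_BUF_SZ; /* 64-entry bulk frees */
	*tail = rs_thresh % MAX_FREE_BUF_SZ;         /* final partial free */
}
```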

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>
---
 drivers/net/intel/common/tx.h        |  3 ++
 drivers/net/intel/common/tx_scalar.h | 58 +++++++++++++++++++++++++
 drivers/net/intel/i40e/i40e_rxtx.c   | 63 +---------------------------
 drivers/net/intel/ice/ice_rxtx.c     | 45 +-------------------
 4 files changed, 65 insertions(+), 104 deletions(-)

diff --git a/drivers/net/intel/common/tx.h b/drivers/net/intel/common/tx.h
index a5cbe070fc..f38e43f65a 100644
--- a/drivers/net/intel/common/tx.h
+++ b/drivers/net/intel/common/tx.h
@@ -73,6 +73,9 @@ enum ci_tx_l2tag1_field {
 /* Common TX maximum burst size for chunked transmission in simple paths */
 #define CI_TX_MAX_BURST 32
 
+/* Common TX maximum free buffer size for batched bulk freeing */
+#define CI_TX_MAX_FREE_BUF_SZ 64
+
 /* Common TX descriptor command flags for simple transmit */
 #define CI_TX_DESC_CMD_DEFAULT (CI_TX_DESC_CMD_ICRC | CI_TX_DESC_CMD_EOP)
 
diff --git a/drivers/net/intel/common/tx_scalar.h b/drivers/net/intel/common/tx_scalar.h
index 3c9c1f611c..a85be8bbb7 100644
--- a/drivers/net/intel/common/tx_scalar.h
+++ b/drivers/net/intel/common/tx_scalar.h
@@ -72,6 +72,64 @@ ci_tx_fill_hw_ring(struct ci_tx_queue *txq, struct rte_mbuf **pkts,
 	}
 }
 
+/* Free transmitted mbufs from descriptor ring with bulk freeing for Tx simple path */
+static __rte_always_inline int
+ci_tx_free_bufs(struct ci_tx_queue *txq)
+{
+	const uint16_t rs_thresh = txq->tx_rs_thresh;
+	const uint16_t k = RTE_ALIGN_FLOOR(rs_thresh, CI_TX_MAX_FREE_BUF_SZ);
+	const uint16_t m = rs_thresh % CI_TX_MAX_FREE_BUF_SZ;
+	struct rte_mbuf *free[CI_TX_MAX_FREE_BUF_SZ];
+	struct ci_tx_entry *txep;
+
+	if ((txq->ci_tx_ring[txq->tx_next_dd].cmd_type_offset_bsz &
+			rte_cpu_to_le_64(CI_TXD_QW1_DTYPE_M)) !=
+			rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE))
+		return 0;
+
+	txep = &txq->sw_ring[txq->tx_next_dd - (rs_thresh - 1)];
+
+	struct rte_mempool *fast_free_mp =
+			likely(txq->fast_free_mp != (void *)UINTPTR_MAX) ?
+				txq->fast_free_mp :
+				(txq->fast_free_mp = txep[0].mbuf->pool);
+
+	if (fast_free_mp) {
+		if (k) {
+			for (uint16_t j = 0; j != k; j += CI_TX_MAX_FREE_BUF_SZ) {
+				for (uint16_t i = 0; i < CI_TX_MAX_FREE_BUF_SZ; ++i, ++txep) {
+					free[i] = txep->mbuf;
+					txep->mbuf = NULL;
+				}
+				rte_mbuf_raw_free_bulk(fast_free_mp, free, CI_TX_MAX_FREE_BUF_SZ);
+			}
+		}
+
+		if (m) {
+			for (uint16_t i = 0; i < m; ++i, ++txep) {
+				free[i] = txep->mbuf;
+				txep->mbuf = NULL;
+			}
+			rte_mbuf_raw_free_bulk(fast_free_mp, free, m);
+		}
+	} else {
+		for (uint16_t i = 0; i < rs_thresh; ++i, ++txep)
+			rte_prefetch0((txep + i)->mbuf);
+
+		for (uint16_t i = 0; i < rs_thresh; ++i, ++txep) {
+			rte_pktmbuf_free_seg(txep->mbuf);
+			txep->mbuf = NULL;
+		}
+	}
+
+	txq->nb_tx_free = (uint16_t)(txq->nb_tx_free + rs_thresh);
+	txq->tx_next_dd = (uint16_t)(txq->tx_next_dd + rs_thresh);
+	if (txq->tx_next_dd >= txq->nb_tx_desc)
+		txq->tx_next_dd = (uint16_t)(rs_thresh - 1);
+
+	return rs_thresh;
+}
+
 /*
  * Common transmit descriptor cleanup function for Intel drivers.
  *
diff --git a/drivers/net/intel/i40e/i40e_rxtx.c b/drivers/net/intel/i40e/i40e_rxtx.c
index 174d517e9d..6b8d9fd70e 100644
--- a/drivers/net/intel/i40e/i40e_rxtx.c
+++ b/drivers/net/intel/i40e/i40e_rxtx.c
@@ -1010,65 +1010,6 @@ i40e_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 			get_context_desc, NULL, NULL);
 }
 
-static __rte_always_inline int
-i40e_tx_free_bufs(struct ci_tx_queue *txq)
-{
-	struct ci_tx_entry *txep;
-	const uint16_t tx_rs_thresh = txq->tx_rs_thresh;
-	uint16_t i, j;
-	struct rte_mbuf *free[I40E_TX_MAX_FREE_BUF_SZ];
-	const uint16_t k = RTE_ALIGN_FLOOR(tx_rs_thresh, I40E_TX_MAX_FREE_BUF_SZ);
-	const uint16_t m = tx_rs_thresh % I40E_TX_MAX_FREE_BUF_SZ;
-
-	if ((txq->ci_tx_ring[txq->tx_next_dd].cmd_type_offset_bsz &
-			rte_cpu_to_le_64(CI_TXD_QW1_DTYPE_M)) !=
-			rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE))
-		return 0;
-
-	txep = &txq->sw_ring[txq->tx_next_dd - (tx_rs_thresh - 1)];
-
-	struct rte_mempool *fast_free_mp =
-			likely(txq->fast_free_mp != (void *)UINTPTR_MAX) ?
-			txq->fast_free_mp :
-			(txq->fast_free_mp = txep[0].mbuf->pool);
-
-	if (fast_free_mp != NULL) {
-		if (k) {
-			for (j = 0; j != k; j += I40E_TX_MAX_FREE_BUF_SZ) {
-				for (i = 0; i < I40E_TX_MAX_FREE_BUF_SZ; ++i, ++txep) {
-					free[i] = txep->mbuf;
-					txep->mbuf = NULL;
-				}
-				rte_mbuf_raw_free_bulk(fast_free_mp, free,
-						I40E_TX_MAX_FREE_BUF_SZ);
-			}
-		}
-
-		if (m) {
-			for (i = 0; i < m; ++i, ++txep) {
-				free[i] = txep->mbuf;
-				txep->mbuf = NULL;
-			}
-			rte_mbuf_raw_free_bulk(fast_free_mp, free, m);
-		}
-	} else {
-		for (i = 0; i < tx_rs_thresh; i++)
-			rte_prefetch0((txep + i)->mbuf);
-
-		for (i = 0; i < tx_rs_thresh; ++i, ++txep) {
-			rte_pktmbuf_free_seg(txep->mbuf);
-			txep->mbuf = NULL;
-		}
-	}
-
-	txq->nb_tx_free = (uint16_t)(txq->nb_tx_free + tx_rs_thresh);
-	txq->tx_next_dd = (uint16_t)(txq->tx_next_dd + tx_rs_thresh);
-	if (txq->tx_next_dd >= txq->nb_tx_desc)
-		txq->tx_next_dd = (uint16_t)(tx_rs_thresh - 1);
-
-	return tx_rs_thresh;
-}
-
 static inline uint16_t
 tx_xmit_pkts(struct ci_tx_queue *txq,
 	     struct rte_mbuf **tx_pkts,
@@ -1083,7 +1024,7 @@ tx_xmit_pkts(struct ci_tx_queue *txq,
 	 * descriptor, free the associated buffer.
 	 */
 	if (txq->nb_tx_free < txq->tx_free_thresh)
-		i40e_tx_free_bufs(txq);
+		ci_tx_free_bufs(txq);
 
 	/* Use available descriptor only */
 	nb_pkts = (uint16_t)RTE_MIN(txq->nb_tx_free, nb_pkts);
@@ -2508,7 +2449,7 @@ i40e_tx_done_cleanup_simple(struct ci_tx_queue *txq,
 		if (txq->nb_tx_desc - txq->nb_tx_free < txq->tx_rs_thresh)
 			break;
 
-		n = i40e_tx_free_bufs(txq);
+		n = ci_tx_free_bufs(txq);
 
 		if (n == 0)
 			break;
diff --git a/drivers/net/intel/ice/ice_rxtx.c b/drivers/net/intel/ice/ice_rxtx.c
index e4fba453a9..a3a94033bf 100644
--- a/drivers/net/intel/ice/ice_rxtx.c
+++ b/drivers/net/intel/ice/ice_rxtx.c
@@ -3129,47 +3129,6 @@ ice_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 	return ci_xmit_pkts(txq, tx_pkts, nb_pkts, CI_VLAN_IN_L2TAG1, get_context_desc, NULL, NULL);
 }
 
-static __rte_always_inline int
-ice_tx_free_bufs(struct ci_tx_queue *txq)
-{
-	struct ci_tx_entry *txep;
-	uint16_t i;
-
-	if ((txq->ci_tx_ring[txq->tx_next_dd].cmd_type_offset_bsz &
-	     rte_cpu_to_le_64(CI_TXD_QW1_DTYPE_M)) !=
-	    rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE))
-		return 0;
-
-	txep = &txq->sw_ring[txq->tx_next_dd - (txq->tx_rs_thresh - 1)];
-
-	struct rte_mempool *fast_free_mp =
-			likely(txq->fast_free_mp != (void *)UINTPTR_MAX) ?
-			txq->fast_free_mp :
-			(txq->fast_free_mp = txep[0].mbuf->pool);
-
-	if (fast_free_mp != NULL) {
-		for (i = 0; i < txq->tx_rs_thresh; ++i, ++txep) {
-			rte_mempool_put(fast_free_mp, txep->mbuf);
-			txep->mbuf = NULL;
-		}
-	} else {
-		for (i = 0; i < txq->tx_rs_thresh; i++)
-			rte_prefetch0((txep + i)->mbuf);
-
-		for (i = 0; i < txq->tx_rs_thresh; ++i, ++txep) {
-			rte_pktmbuf_free_seg(txep->mbuf);
-			txep->mbuf = NULL;
-		}
-	}
-
-	txq->nb_tx_free = (uint16_t)(txq->nb_tx_free + txq->tx_rs_thresh);
-	txq->tx_next_dd = (uint16_t)(txq->tx_next_dd + txq->tx_rs_thresh);
-	if (txq->tx_next_dd >= txq->nb_tx_desc)
-		txq->tx_next_dd = (uint16_t)(txq->tx_rs_thresh - 1);
-
-	return txq->tx_rs_thresh;
-}
-
 static int
 ice_tx_done_cleanup_full(struct ci_tx_queue *txq,
 			uint32_t free_cnt)
@@ -3259,7 +3218,7 @@ ice_tx_done_cleanup_simple(struct ci_tx_queue *txq,
 		if (txq->nb_tx_desc - txq->nb_tx_free < txq->tx_rs_thresh)
 			break;
 
-		n = ice_tx_free_bufs(txq);
+		n = ci_tx_free_bufs(txq);
 
 		if (n == 0)
 			break;
@@ -3300,7 +3259,7 @@ tx_xmit_pkts(struct ci_tx_queue *txq,
 	 * descriptor, free the associated buffer.
 	 */
 	if (txq->nb_tx_free < txq->tx_free_thresh)
-		ice_tx_free_bufs(txq);
+		ci_tx_free_bufs(txq);
 
 	/* Use available descriptor only */
 	nb_pkts = (uint16_t)RTE_MIN(txq->nb_tx_free, nb_pkts);
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [PATCH v5 29/35] net/intel: complete merging simple Tx paths
  2026-02-11 18:12 ` [PATCH v5 00/35] combine multiple Intel scalar Tx paths Bruce Richardson
                     ` (27 preceding siblings ...)
  2026-02-11 18:12   ` [PATCH v5 28/35] net/intel: consolidate ice and i40e buffer free function Bruce Richardson
@ 2026-02-11 18:12   ` Bruce Richardson
  2026-02-11 18:12   ` [PATCH v5 30/35] net/intel: use non-volatile stores in simple Tx function Bruce Richardson
                     ` (6 subsequent siblings)
  35 siblings, 0 replies; 274+ messages in thread
From: Bruce Richardson @ 2026-02-11 18:12 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson, Vladimir Medvedkin, Anatoly Burakov

Complete the deduplication and merging of the ice and i40e simple scalar
Tx paths.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>
---
 drivers/net/intel/common/tx_scalar.h | 87 ++++++++++++++++++++++++++++
 drivers/net/intel/i40e/i40e_rxtx.c   | 74 +----------------------
 drivers/net/intel/ice/ice_rxtx.c     | 74 +----------------------
 3 files changed, 89 insertions(+), 146 deletions(-)

diff --git a/drivers/net/intel/common/tx_scalar.h b/drivers/net/intel/common/tx_scalar.h
index a85be8bbb7..ca25a2fc9d 100644
--- a/drivers/net/intel/common/tx_scalar.h
+++ b/drivers/net/intel/common/tx_scalar.h
@@ -130,6 +130,93 @@ ci_tx_free_bufs(struct ci_tx_queue *txq)
 	return rs_thresh;
 }
 
+/* Simple burst transmit for descriptor-based simple Tx path
+ *
+ * Transmits a burst of packets by filling hardware descriptors with mbuf
+ * data. Handles ring wrap-around and RS bit management. Performs descriptor
+ * cleanup when tx_free_thresh is reached.
+ *
+ * Returns: number of packets transmitted
+ */
+static inline uint16_t
+ci_xmit_burst_simple(struct ci_tx_queue *txq,
+		     struct rte_mbuf **tx_pkts,
+		     uint16_t nb_pkts)
+{
+	volatile struct ci_tx_desc *txr = txq->ci_tx_ring;
+	uint16_t n = 0;
+
+	/**
+	 * Begin scanning the H/W ring for done descriptors when the number
+	 * of available descriptors drops below tx_free_thresh. For each done
+	 * descriptor, free the associated buffer.
+	 */
+	if (txq->nb_tx_free < txq->tx_free_thresh)
+		ci_tx_free_bufs(txq);
+
+	/* Use available descriptor only */
+	nb_pkts = (uint16_t)RTE_MIN(txq->nb_tx_free, nb_pkts);
+	if (unlikely(!nb_pkts))
+		return 0;
+
+	txq->nb_tx_free = (uint16_t)(txq->nb_tx_free - nb_pkts);
+	if ((txq->tx_tail + nb_pkts) > txq->nb_tx_desc) {
+		n = (uint16_t)(txq->nb_tx_desc - txq->tx_tail);
+		ci_tx_fill_hw_ring(txq, tx_pkts, n);
+		txr[txq->tx_next_rs].cmd_type_offset_bsz |=
+			rte_cpu_to_le_64(((uint64_t)CI_TX_DESC_CMD_RS) <<
+					  CI_TXD_QW1_CMD_S);
+		txq->tx_next_rs = (uint16_t)(txq->tx_rs_thresh - 1);
+		txq->tx_tail = 0;
+	}
+
+	/* Fill hardware descriptor ring with mbuf data */
+	ci_tx_fill_hw_ring(txq, tx_pkts + n, (uint16_t)(nb_pkts - n));
+	txq->tx_tail = (uint16_t)(txq->tx_tail + (nb_pkts - n));
+
+	/* Determine if RS bit needs to be set */
+	if (txq->tx_tail > txq->tx_next_rs) {
+		txr[txq->tx_next_rs].cmd_type_offset_bsz |=
+			rte_cpu_to_le_64(((uint64_t)CI_TX_DESC_CMD_RS) <<
+					  CI_TXD_QW1_CMD_S);
+		txq->tx_next_rs =
+			(uint16_t)(txq->tx_next_rs + txq->tx_rs_thresh);
+		if (txq->tx_next_rs >= txq->nb_tx_desc)
+			txq->tx_next_rs = (uint16_t)(txq->tx_rs_thresh - 1);
+	}
+
+	if (txq->tx_tail >= txq->nb_tx_desc)
+		txq->tx_tail = 0;
+
+	/* Update the tx tail register */
+	rte_write32_wc((uint32_t)txq->tx_tail, txq->qtx_tail);
+
+	return nb_pkts;
+}
+
+static __rte_always_inline uint16_t
+ci_xmit_pkts_simple(struct ci_tx_queue *txq,
+		     struct rte_mbuf **tx_pkts,
+		     uint16_t nb_pkts)
+{
+	uint16_t nb_tx = 0;
+
+	if (likely(nb_pkts <= CI_TX_MAX_BURST))
+		return ci_xmit_burst_simple(txq, tx_pkts, nb_pkts);
+
+	while (nb_pkts) {
+		uint16_t ret, num = RTE_MIN(nb_pkts, CI_TX_MAX_BURST);
+
+		ret = ci_xmit_burst_simple(txq, &tx_pkts[nb_tx], num);
+		nb_tx += ret;
+		nb_pkts -= ret;
+		if (ret < num)
+			break;
+	}
+
+	return nb_tx;
+}
+
 /*
  * Common transmit descriptor cleanup function for Intel drivers.
  *
diff --git a/drivers/net/intel/i40e/i40e_rxtx.c b/drivers/net/intel/i40e/i40e_rxtx.c
index 6b8d9fd70e..bedc78b9ff 100644
--- a/drivers/net/intel/i40e/i40e_rxtx.c
+++ b/drivers/net/intel/i40e/i40e_rxtx.c
@@ -1010,84 +1010,12 @@ i40e_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 			get_context_desc, NULL, NULL);
 }
 
-static inline uint16_t
-tx_xmit_pkts(struct ci_tx_queue *txq,
-	     struct rte_mbuf **tx_pkts,
-	     uint16_t nb_pkts)
-{
-	volatile struct ci_tx_desc *txr = txq->ci_tx_ring;
-	uint16_t n = 0;
-
-	/**
-	 * Begin scanning the H/W ring for done descriptors when the number
-	 * of available descriptors drops below tx_free_thresh. For each done
-	 * descriptor, free the associated buffer.
-	 */
-	if (txq->nb_tx_free < txq->tx_free_thresh)
-		ci_tx_free_bufs(txq);
-
-	/* Use available descriptor only */
-	nb_pkts = (uint16_t)RTE_MIN(txq->nb_tx_free, nb_pkts);
-	if (unlikely(!nb_pkts))
-		return 0;
-
-	txq->nb_tx_free = (uint16_t)(txq->nb_tx_free - nb_pkts);
-	if ((txq->tx_tail + nb_pkts) > txq->nb_tx_desc) {
-		n = (uint16_t)(txq->nb_tx_desc - txq->tx_tail);
-		ci_tx_fill_hw_ring(txq, tx_pkts, n);
-		txr[txq->tx_next_rs].cmd_type_offset_bsz |=
-			rte_cpu_to_le_64(((uint64_t)CI_TX_DESC_CMD_RS) << CI_TXD_QW1_CMD_S);
-		txq->tx_next_rs = (uint16_t)(txq->tx_rs_thresh - 1);
-		txq->tx_tail = 0;
-	}
-
-	/* Fill hardware descriptor ring with mbuf data */
-	ci_tx_fill_hw_ring(txq, tx_pkts + n, (uint16_t)(nb_pkts - n));
-	txq->tx_tail = (uint16_t)(txq->tx_tail + (nb_pkts - n));
-
-	/* Determine if RS bit needs to be set */
-	if (txq->tx_tail > txq->tx_next_rs) {
-		txr[txq->tx_next_rs].cmd_type_offset_bsz |=
-			rte_cpu_to_le_64(((uint64_t)CI_TX_DESC_CMD_RS) << CI_TXD_QW1_CMD_S);
-		txq->tx_next_rs =
-			(uint16_t)(txq->tx_next_rs + txq->tx_rs_thresh);
-		if (txq->tx_next_rs >= txq->nb_tx_desc)
-			txq->tx_next_rs = (uint16_t)(txq->tx_rs_thresh - 1);
-	}
-
-	if (txq->tx_tail >= txq->nb_tx_desc)
-		txq->tx_tail = 0;
-
-	/* Update the tx tail register */
-	I40E_PCI_REG_WC_WRITE(txq->qtx_tail, txq->tx_tail);
-
-	return nb_pkts;
-}
-
 static uint16_t
 i40e_xmit_pkts_simple(void *tx_queue,
 		      struct rte_mbuf **tx_pkts,
 		      uint16_t nb_pkts)
 {
-	uint16_t nb_tx = 0;
-
-	if (likely(nb_pkts <= CI_TX_MAX_BURST))
-		return tx_xmit_pkts((struct ci_tx_queue *)tx_queue,
-						tx_pkts, nb_pkts);
-
-	while (nb_pkts) {
-		uint16_t ret, num = (uint16_t)RTE_MIN(nb_pkts,
-						CI_TX_MAX_BURST);
-
-		ret = tx_xmit_pkts((struct ci_tx_queue *)tx_queue,
-						&tx_pkts[nb_tx], num);
-		nb_tx = (uint16_t)(nb_tx + ret);
-		nb_pkts = (uint16_t)(nb_pkts - ret);
-		if (ret < num)
-			break;
-	}
-
-	return nb_tx;
+	return ci_xmit_pkts_simple(tx_queue, tx_pkts, nb_pkts);
 }
 
 #ifndef RTE_ARCH_X86
diff --git a/drivers/net/intel/ice/ice_rxtx.c b/drivers/net/intel/ice/ice_rxtx.c
index a3a94033bf..2b82a16422 100644
--- a/drivers/net/intel/ice/ice_rxtx.c
+++ b/drivers/net/intel/ice/ice_rxtx.c
@@ -3245,84 +3245,12 @@ ice_tx_done_cleanup(void *txq, uint32_t free_cnt)
 		return ice_tx_done_cleanup_full(q, free_cnt);
 }
 
-static inline uint16_t
-tx_xmit_pkts(struct ci_tx_queue *txq,
-	     struct rte_mbuf **tx_pkts,
-	     uint16_t nb_pkts)
-{
-	volatile struct ci_tx_desc *txr = txq->ci_tx_ring;
-	uint16_t n = 0;
-
-	/**
-	 * Begin scanning the H/W ring for done descriptors when the number
-	 * of available descriptors drops below tx_free_thresh. For each done
-	 * descriptor, free the associated buffer.
-	 */
-	if (txq->nb_tx_free < txq->tx_free_thresh)
-		ci_tx_free_bufs(txq);
-
-	/* Use available descriptor only */
-	nb_pkts = (uint16_t)RTE_MIN(txq->nb_tx_free, nb_pkts);
-	if (unlikely(!nb_pkts))
-		return 0;
-
-	txq->nb_tx_free = (uint16_t)(txq->nb_tx_free - nb_pkts);
-	if ((txq->tx_tail + nb_pkts) > txq->nb_tx_desc) {
-		n = (uint16_t)(txq->nb_tx_desc - txq->tx_tail);
-		ci_tx_fill_hw_ring(txq, tx_pkts, n);
-		txr[txq->tx_next_rs].cmd_type_offset_bsz |=
-			rte_cpu_to_le_64(((uint64_t)CI_TX_DESC_CMD_RS) << CI_TXD_QW1_CMD_S);
-		txq->tx_next_rs = (uint16_t)(txq->tx_rs_thresh - 1);
-		txq->tx_tail = 0;
-	}
-
-	/* Fill hardware descriptor ring with mbuf data */
-	ci_tx_fill_hw_ring(txq, tx_pkts + n, (uint16_t)(nb_pkts - n));
-	txq->tx_tail = (uint16_t)(txq->tx_tail + (nb_pkts - n));
-
-	/* Determine if RS bit needs to be set */
-	if (txq->tx_tail > txq->tx_next_rs) {
-		txr[txq->tx_next_rs].cmd_type_offset_bsz |=
-			rte_cpu_to_le_64(((uint64_t)CI_TX_DESC_CMD_RS) << CI_TXD_QW1_CMD_S);
-		txq->tx_next_rs =
-			(uint16_t)(txq->tx_next_rs + txq->tx_rs_thresh);
-		if (txq->tx_next_rs >= txq->nb_tx_desc)
-			txq->tx_next_rs = (uint16_t)(txq->tx_rs_thresh - 1);
-	}
-
-	if (txq->tx_tail >= txq->nb_tx_desc)
-		txq->tx_tail = 0;
-
-	/* Update the tx tail register */
-	ICE_PCI_REG_WC_WRITE(txq->qtx_tail, txq->tx_tail);
-
-	return nb_pkts;
-}
-
 static uint16_t
 ice_xmit_pkts_simple(void *tx_queue,
 		     struct rte_mbuf **tx_pkts,
 		     uint16_t nb_pkts)
 {
-	uint16_t nb_tx = 0;
-
-	if (likely(nb_pkts <= CI_TX_MAX_BURST))
-		return tx_xmit_pkts((struct ci_tx_queue *)tx_queue,
-				    tx_pkts, nb_pkts);
-
-	while (nb_pkts) {
-		uint16_t ret, num = (uint16_t)RTE_MIN(nb_pkts,
-						      CI_TX_MAX_BURST);
-
-		ret = tx_xmit_pkts((struct ci_tx_queue *)tx_queue,
-				   &tx_pkts[nb_tx], num);
-		nb_tx = (uint16_t)(nb_tx + ret);
-		nb_pkts = (uint16_t)(nb_pkts - ret);
-		if (ret < num)
-			break;
-	}
-
-	return nb_tx;
+	return ci_xmit_pkts_simple(tx_queue, tx_pkts, nb_pkts);
 }
 
 static const struct ci_rx_path_info ice_rx_path_infos[] = {
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [PATCH v5 30/35] net/intel: use non-volatile stores in simple Tx function
  2026-02-11 18:12 ` [PATCH v5 00/35] combine multiple Intel scalar Tx paths Bruce Richardson
                     ` (28 preceding siblings ...)
  2026-02-11 18:12   ` [PATCH v5 29/35] net/intel: complete merging simple Tx paths Bruce Richardson
@ 2026-02-11 18:12   ` Bruce Richardson
  2026-02-11 18:13   ` [PATCH v5 31/35] net/intel: align scalar simple Tx path with vector logic Bruce Richardson
                     ` (5 subsequent siblings)
  35 siblings, 0 replies; 274+ messages in thread
From: Bruce Richardson @ 2026-02-11 18:12 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson, Vladimir Medvedkin

The simple Tx code path can be reworked to use non-volatile stores - as
is the case with the full-featured Tx path - by reusing the existing
write_txd function (which just needs to be moved up in the header file).
This gives a small performance boost.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>
---
 drivers/net/intel/common/tx_scalar.h | 65 +++++++++-------------------
 1 file changed, 21 insertions(+), 44 deletions(-)

diff --git a/drivers/net/intel/common/tx_scalar.h b/drivers/net/intel/common/tx_scalar.h
index ca25a2fc9d..2c624e97e7 100644
--- a/drivers/net/intel/common/tx_scalar.h
+++ b/drivers/net/intel/common/tx_scalar.h
@@ -12,35 +12,17 @@
 /* depends on common Tx definitions. */
 #include "tx.h"
 
-/* Populate 4 descriptors with data from 4 mbufs */
 static inline void
-ci_tx_fill_hw_ring_tx4(volatile struct ci_tx_desc *txdp, struct rte_mbuf **pkts)
-{
-	uint64_t dma_addr;
-	uint32_t i;
-
-	for (i = 0; i < 4; i++, txdp++, pkts++) {
-		dma_addr = rte_mbuf_data_iova(*pkts);
-		txdp->buffer_addr = rte_cpu_to_le_64(dma_addr);
-		txdp->cmd_type_offset_bsz =
-			rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DATA |
-				((uint64_t)CI_TX_DESC_CMD_DEFAULT << CI_TXD_QW1_CMD_S) |
-				((uint64_t)(*pkts)->data_len << CI_TXD_QW1_TX_BUF_SZ_S));
-	}
-}
-
-/* Populate 1 descriptor with data from 1 mbuf */
-static inline void
-ci_tx_fill_hw_ring_tx1(volatile struct ci_tx_desc *txdp, struct rte_mbuf **pkts)
+write_txd(volatile void *txd, uint64_t qw0, uint64_t qw1)
 {
-	uint64_t dma_addr;
-
-	dma_addr = rte_mbuf_data_iova(*pkts);
-	txdp->buffer_addr = rte_cpu_to_le_64(dma_addr);
-	txdp->cmd_type_offset_bsz =
-		rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DATA |
-			((uint64_t)CI_TX_DESC_CMD_DEFAULT << CI_TXD_QW1_CMD_S) |
-			((uint64_t)(*pkts)->data_len << CI_TXD_QW1_TX_BUF_SZ_S));
+	/* we use an aligned structure and cast away the volatile to allow the compiler
+	 * to opportunistically optimize the two 64-bit writes as a single 128-bit write.
+	 */
+	__rte_aligned(16) struct txdesc {
+		uint64_t qw0, qw1;
+	} *txdesc = RTE_CAST_PTR(struct txdesc *, txd);
+	txdesc->qw0 = rte_cpu_to_le_64(qw0);
+	txdesc->qw1 = rte_cpu_to_le_64(qw1);
 }
 
 /* Fill hardware descriptor ring with mbuf data */
@@ -60,14 +42,22 @@ ci_tx_fill_hw_ring(struct ci_tx_queue *txq, struct rte_mbuf **pkts,
 	for (i = 0; i < mainpart; i += N_PER_LOOP) {
 		for (j = 0; j < N_PER_LOOP; ++j)
 			(txep + i + j)->mbuf = *(pkts + i + j);
-		ci_tx_fill_hw_ring_tx4(txdp + i, pkts + i);
+		for (j = 0; j < N_PER_LOOP; ++j)
+			write_txd(txdp + i + j, rte_mbuf_data_iova(*(pkts + i + j)),
+				CI_TX_DESC_DTYPE_DATA |
+				((uint64_t)CI_TX_DESC_CMD_DEFAULT << CI_TXD_QW1_CMD_S) |
+				((uint64_t)(*(pkts + i + j))->data_len << CI_TXD_QW1_TX_BUF_SZ_S));
 	}
 
 	if (unlikely(leftover > 0)) {
 		for (i = 0; i < leftover; ++i) {
-			(txep + mainpart + i)->mbuf = *(pkts + mainpart + i);
-			ci_tx_fill_hw_ring_tx1(txdp + mainpart + i,
-					       pkts + mainpart + i);
+			uint16_t idx = mainpart + i;
+			(txep + idx)->mbuf = *(pkts + idx);
+			write_txd(txdp + idx, rte_mbuf_data_iova(*(pkts + idx)),
+				CI_TX_DESC_DTYPE_DATA |
+				((uint64_t)CI_TX_DESC_CMD_DEFAULT << CI_TXD_QW1_CMD_S) |
+				((uint64_t)(*(pkts + idx))->data_len << CI_TXD_QW1_TX_BUF_SZ_S));
+
 		}
 	}
 }
@@ -356,19 +346,6 @@ struct ci_timestamp_queue_fns {
 	write_ts_tail_t write_ts_tail;
 };
 
-static inline void
-write_txd(volatile void *txd, uint64_t qw0, uint64_t qw1)
-{
-	/* we use an aligned structure and cast away the volatile to allow the compiler
-	 * to opportunistically optimize the two 64-bit writes as a single 128-bit write.
-	 */
-	__rte_aligned(16) struct txdesc {
-		uint64_t qw0, qw1;
-	} *txdesc = RTE_CAST_PTR(struct txdesc *, txd);
-	txdesc->qw0 = rte_cpu_to_le_64(qw0);
-	txdesc->qw1 = rte_cpu_to_le_64(qw1);
-}
-
 static inline uint16_t
 ci_xmit_pkts(struct ci_tx_queue *txq,
 	     struct rte_mbuf **tx_pkts,
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [PATCH v5 31/35] net/intel: align scalar simple Tx path with vector logic
  2026-02-11 18:12 ` [PATCH v5 00/35] combine multiple Intel scalar Tx paths Bruce Richardson
                     ` (29 preceding siblings ...)
  2026-02-11 18:12   ` [PATCH v5 30/35] net/intel: use non-volatile stores in simple Tx function Bruce Richardson
@ 2026-02-11 18:13   ` Bruce Richardson
  2026-02-11 18:13   ` [PATCH v5 32/35] net/intel: use vector SW ring entry for simple path Bruce Richardson
                     ` (4 subsequent siblings)
  35 siblings, 0 replies; 274+ messages in thread
From: Bruce Richardson @ 2026-02-11 18:13 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson, Vladimir Medvedkin

The scalar simple Tx path has the same restrictions as the vector Tx
path, so we can use the same logic flow in both, to try to ensure we
get the best performance from the scalar path.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>
---
 drivers/net/intel/common/tx_scalar.h | 54 +++++++++++++++++-----------
 1 file changed, 34 insertions(+), 20 deletions(-)

diff --git a/drivers/net/intel/common/tx_scalar.h b/drivers/net/intel/common/tx_scalar.h
index 2c624e97e7..8f87acde25 100644
--- a/drivers/net/intel/common/tx_scalar.h
+++ b/drivers/net/intel/common/tx_scalar.h
@@ -25,13 +25,11 @@ write_txd(volatile void *txd, uint64_t qw0, uint64_t qw1)
 	txdesc->qw1 = rte_cpu_to_le_64(qw1);
 }
 
-/* Fill hardware descriptor ring with mbuf data */
+/* Fill hardware descriptor ring with mbuf data (simple path) */
 static inline void
-ci_tx_fill_hw_ring(struct ci_tx_queue *txq, struct rte_mbuf **pkts,
-		   uint16_t nb_pkts)
+ci_tx_fill_hw_ring_simple(volatile struct ci_tx_desc *txdp, struct rte_mbuf **pkts,
+			  uint16_t nb_pkts)
 {
-	volatile struct ci_tx_desc *txdp = &txq->ci_tx_ring[txq->tx_tail];
-	struct ci_tx_entry *txep = &txq->sw_ring[txq->tx_tail];
 	const int N_PER_LOOP = 4;
 	const int N_PER_LOOP_MASK = N_PER_LOOP - 1;
 	int mainpart, leftover;
@@ -40,8 +38,6 @@ ci_tx_fill_hw_ring(struct ci_tx_queue *txq, struct rte_mbuf **pkts,
 	mainpart = nb_pkts & ((uint32_t)~N_PER_LOOP_MASK);
 	leftover = nb_pkts & ((uint32_t)N_PER_LOOP_MASK);
 	for (i = 0; i < mainpart; i += N_PER_LOOP) {
-		for (j = 0; j < N_PER_LOOP; ++j)
-			(txep + i + j)->mbuf = *(pkts + i + j);
 		for (j = 0; j < N_PER_LOOP; ++j)
 			write_txd(txdp + i + j, rte_mbuf_data_iova(*(pkts + i + j)),
 				CI_TX_DESC_DTYPE_DATA |
@@ -52,12 +48,10 @@ ci_tx_fill_hw_ring(struct ci_tx_queue *txq, struct rte_mbuf **pkts,
 	if (unlikely(leftover > 0)) {
 		for (i = 0; i < leftover; ++i) {
 			uint16_t idx = mainpart + i;
-			(txep + idx)->mbuf = *(pkts + idx);
 			write_txd(txdp + idx, rte_mbuf_data_iova(*(pkts + idx)),
 				CI_TX_DESC_DTYPE_DATA |
 				((uint64_t)CI_TX_DESC_CMD_DEFAULT << CI_TXD_QW1_CMD_S) |
 				((uint64_t)(*(pkts + idx))->data_len << CI_TXD_QW1_TX_BUF_SZ_S));
-
 		}
 	}
 }
@@ -134,6 +128,9 @@ ci_xmit_burst_simple(struct ci_tx_queue *txq,
 		     uint16_t nb_pkts)
 {
 	volatile struct ci_tx_desc *txr = txq->ci_tx_ring;
+	volatile struct ci_tx_desc *txdp;
+	struct ci_tx_entry *txep;
+	uint16_t tx_id;
 	uint16_t n = 0;
 
 	/**
@@ -149,23 +146,41 @@ ci_xmit_burst_simple(struct ci_tx_queue *txq,
 	if (unlikely(!nb_pkts))
 		return 0;
 
+	tx_id = txq->tx_tail;
+	txdp = &txr[tx_id];
+	txep = &txq->sw_ring[tx_id];
+
 	txq->nb_tx_free = (uint16_t)(txq->nb_tx_free - nb_pkts);
-	if ((txq->tx_tail + nb_pkts) > txq->nb_tx_desc) {
-		n = (uint16_t)(txq->nb_tx_desc - txq->tx_tail);
-		ci_tx_fill_hw_ring(txq, tx_pkts, n);
+
+	if ((tx_id + nb_pkts) > txq->nb_tx_desc) {
+		n = (uint16_t)(txq->nb_tx_desc - tx_id);
+
+		/* Store mbufs in backlog */
+		ci_tx_backlog_entry(txep, tx_pkts, n);
+
+		/* Write descriptors to HW ring */
+		ci_tx_fill_hw_ring_simple(txdp, tx_pkts, n);
+
 		txr[txq->tx_next_rs].cmd_type_offset_bsz |=
 			rte_cpu_to_le_64(((uint64_t)CI_TX_DESC_CMD_RS) <<
 					  CI_TXD_QW1_CMD_S);
 		txq->tx_next_rs = (uint16_t)(txq->tx_rs_thresh - 1);
-		txq->tx_tail = 0;
+
+		tx_id = 0;
+		txdp = &txr[tx_id];
+		txep = &txq->sw_ring[tx_id];
 	}
 
-	/* Fill hardware descriptor ring with mbuf data */
-	ci_tx_fill_hw_ring(txq, tx_pkts + n, (uint16_t)(nb_pkts - n));
-	txq->tx_tail = (uint16_t)(txq->tx_tail + (nb_pkts - n));
+	/* Store remaining mbufs in backlog */
+	ci_tx_backlog_entry(txep, tx_pkts + n, (uint16_t)(nb_pkts - n));
+
+	/* Write remaining descriptors to HW ring */
+	ci_tx_fill_hw_ring_simple(txdp, tx_pkts + n, (uint16_t)(nb_pkts - n));
+
+	tx_id = (uint16_t)(tx_id + (nb_pkts - n));
 
 	/* Determine if RS bit needs to be set */
-	if (txq->tx_tail > txq->tx_next_rs) {
+	if (tx_id > txq->tx_next_rs) {
 		txr[txq->tx_next_rs].cmd_type_offset_bsz |=
 			rte_cpu_to_le_64(((uint64_t)CI_TX_DESC_CMD_RS) <<
 					  CI_TXD_QW1_CMD_S);
@@ -175,11 +190,10 @@ ci_xmit_burst_simple(struct ci_tx_queue *txq,
 			txq->tx_next_rs = (uint16_t)(txq->tx_rs_thresh - 1);
 	}
 
-	if (txq->tx_tail >= txq->nb_tx_desc)
-		txq->tx_tail = 0;
+	txq->tx_tail = tx_id;
 
 	/* Update the tx tail register */
-	rte_write32_wc((uint32_t)txq->tx_tail, txq->qtx_tail);
+	rte_write32_wc((uint32_t)tx_id, txq->qtx_tail);
 
 	return nb_pkts;
 }
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [PATCH v5 32/35] net/intel: use vector SW ring entry for simple path
  2026-02-11 18:12 ` [PATCH v5 00/35] combine multiple Intel scalar Tx paths Bruce Richardson
                     ` (30 preceding siblings ...)
  2026-02-11 18:13   ` [PATCH v5 31/35] net/intel: align scalar simple Tx path with vector logic Bruce Richardson
@ 2026-02-11 18:13   ` Bruce Richardson
  2026-02-11 18:13   ` [PATCH v5 33/35] net/intel: use vector mbuf cleanup from simple scalar path Bruce Richardson
                     ` (3 subsequent siblings)
  35 siblings, 0 replies; 274+ messages in thread
From: Bruce Richardson @ 2026-02-11 18:13 UTC (permalink / raw)
  To: dev
  Cc: Bruce Richardson, Vladimir Medvedkin, Praveen Shetty,
	Anatoly Burakov, Jingjing Wu

The simple scalar Tx path does not need the full sw_entry structure
used by the full-featured Tx path, so rename the "vector_tx" flag to
"use_vec_entry", since its sole purpose is to flag the use of the
smaller tx_entry_vec structure. Then set this flag for the simple Tx
path, giving us a performance boost.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>
---
 drivers/net/intel/common/tx.h                    |  6 ++++--
 drivers/net/intel/common/tx_scalar.h             | 14 +++++++-------
 drivers/net/intel/cpfl/cpfl_rxtx.c               |  4 ++--
 drivers/net/intel/i40e/i40e_rxtx.c               |  2 +-
 drivers/net/intel/iavf/iavf_rxtx_vec_avx2.c      |  2 +-
 drivers/net/intel/ice/ice_rxtx.c                 |  2 +-
 drivers/net/intel/idpf/idpf_common_rxtx_avx512.c |  2 +-
 drivers/net/intel/ixgbe/ixgbe_rxtx_vec_common.c  |  2 +-
 8 files changed, 18 insertions(+), 16 deletions(-)

diff --git a/drivers/net/intel/common/tx.h b/drivers/net/intel/common/tx.h
index f38e43f65a..8fec8d7909 100644
--- a/drivers/net/intel/common/tx.h
+++ b/drivers/net/intel/common/tx.h
@@ -173,7 +173,7 @@ struct ci_tx_queue {
 	rte_iova_t tx_ring_dma;        /* TX ring DMA address */
 	bool tx_deferred_start; /* don't start this queue in dev start */
 	bool q_set;             /* indicate if tx queue has been configured */
-	bool vector_tx;         /* port is using vector TX */
+	bool use_vec_entry;     /* use sw_ring_vec (true for vector and simple paths) */
 	union {                  /* the VSI this queue belongs to */
 		struct i40e_vsi *i40e_vsi;
 		struct iavf_vsi *iavf_vsi;
@@ -361,7 +361,8 @@ ci_txq_release_all_mbufs(struct ci_tx_queue *txq, bool use_ctx)
 	if (unlikely(!txq || !txq->sw_ring))
 		return;
 
-	if (!txq->vector_tx) {
+	if (!txq->use_vec_entry) {
+		/* Regular scalar path uses sw_ring with ci_tx_entry */
 		for (uint16_t i = 0; i < txq->nb_tx_desc; i++) {
 			if (txq->sw_ring[i].mbuf != NULL) {
 				rte_pktmbuf_free_seg(txq->sw_ring[i].mbuf);
@@ -372,6 +373,7 @@ ci_txq_release_all_mbufs(struct ci_tx_queue *txq, bool use_ctx)
 	}
 
 	/**
+	 *  Vector and simple paths use sw_ring_vec (ci_tx_entry_vec).
 	 *  vPMD tx will not set sw_ring's mbuf to NULL after free,
 	 *  so determining buffers to free is a little more complex.
 	 */
diff --git a/drivers/net/intel/common/tx_scalar.h b/drivers/net/intel/common/tx_scalar.h
index 8f87acde25..c8c1a55c24 100644
--- a/drivers/net/intel/common/tx_scalar.h
+++ b/drivers/net/intel/common/tx_scalar.h
@@ -64,14 +64,14 @@ ci_tx_free_bufs(struct ci_tx_queue *txq)
 	const uint16_t k = RTE_ALIGN_FLOOR(rs_thresh, CI_TX_MAX_FREE_BUF_SZ);
 	const uint16_t m = rs_thresh % CI_TX_MAX_FREE_BUF_SZ;
 	struct rte_mbuf *free[CI_TX_MAX_FREE_BUF_SZ];
-	struct ci_tx_entry *txep;
+	struct ci_tx_entry_vec *txep;
 
 	if ((txq->ci_tx_ring[txq->tx_next_dd].cmd_type_offset_bsz &
 			rte_cpu_to_le_64(CI_TXD_QW1_DTYPE_M)) !=
 			rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE))
 		return 0;
 
-	txep = &txq->sw_ring[txq->tx_next_dd - (rs_thresh - 1)];
+	txep = &txq->sw_ring_vec[txq->tx_next_dd - (rs_thresh - 1)];
 
 	struct rte_mempool *fast_free_mp =
 			likely(txq->fast_free_mp != (void *)UINTPTR_MAX) ?
@@ -129,7 +129,7 @@ ci_xmit_burst_simple(struct ci_tx_queue *txq,
 {
 	volatile struct ci_tx_desc *txr = txq->ci_tx_ring;
 	volatile struct ci_tx_desc *txdp;
-	struct ci_tx_entry *txep;
+	struct ci_tx_entry_vec *txep;
 	uint16_t tx_id;
 	uint16_t n = 0;
 
@@ -148,7 +148,7 @@ ci_xmit_burst_simple(struct ci_tx_queue *txq,
 
 	tx_id = txq->tx_tail;
 	txdp = &txr[tx_id];
-	txep = &txq->sw_ring[tx_id];
+	txep = &txq->sw_ring_vec[tx_id];
 
 	txq->nb_tx_free = (uint16_t)(txq->nb_tx_free - nb_pkts);
 
@@ -156,7 +156,7 @@ ci_xmit_burst_simple(struct ci_tx_queue *txq,
 		n = (uint16_t)(txq->nb_tx_desc - tx_id);
 
 		/* Store mbufs in backlog */
-		ci_tx_backlog_entry(txep, tx_pkts, n);
+		ci_tx_backlog_entry_vec(txep, tx_pkts, n);
 
 		/* Write descriptors to HW ring */
 		ci_tx_fill_hw_ring_simple(txdp, tx_pkts, n);
@@ -168,11 +168,11 @@ ci_xmit_burst_simple(struct ci_tx_queue *txq,
 
 		tx_id = 0;
 		txdp = &txr[tx_id];
-		txep = &txq->sw_ring[tx_id];
+		txep = &txq->sw_ring_vec[tx_id];
 	}
 
 	/* Store remaining mbufs in backlog */
-	ci_tx_backlog_entry(txep, tx_pkts + n, (uint16_t)(nb_pkts - n));
+	ci_tx_backlog_entry_vec(txep, tx_pkts + n, (uint16_t)(nb_pkts - n));
 
 	/* Write remaining descriptors to HW ring */
 	ci_tx_fill_hw_ring_simple(txdp, tx_pkts + n, (uint16_t)(nb_pkts - n));
diff --git a/drivers/net/intel/cpfl/cpfl_rxtx.c b/drivers/net/intel/cpfl/cpfl_rxtx.c
index e7a98ed4f6..b5b9015310 100644
--- a/drivers/net/intel/cpfl/cpfl_rxtx.c
+++ b/drivers/net/intel/cpfl/cpfl_rxtx.c
@@ -329,7 +329,7 @@ cpfl_tx_queue_release(void *txq)
 		rte_free(q->complq);
 	}
 
-	ci_txq_release_all_mbufs(q, q->vector_tx);
+	ci_txq_release_all_mbufs(q, q->use_vec_entry);
 	rte_free(q->sw_ring);
 	rte_free(q->rs_last_id);
 	rte_memzone_free(q->mz);
@@ -1364,7 +1364,7 @@ cpfl_tx_queue_stop(struct rte_eth_dev *dev, uint16_t tx_queue_id)
 	}
 
 	txq = &cpfl_txq->base;
-	ci_txq_release_all_mbufs(txq, txq->vector_tx);
+	ci_txq_release_all_mbufs(txq, txq->use_vec_entry);
 	if (vport->txq_model == VIRTCHNL2_QUEUE_MODEL_SINGLE) {
 		idpf_qc_single_tx_queue_reset(txq);
 	} else {
diff --git a/drivers/net/intel/i40e/i40e_rxtx.c b/drivers/net/intel/i40e/i40e_rxtx.c
index bedc78b9ff..155eec210e 100644
--- a/drivers/net/intel/i40e/i40e_rxtx.c
+++ b/drivers/net/intel/i40e/i40e_rxtx.c
@@ -1451,7 +1451,7 @@ i40e_dev_tx_queue_start(struct rte_eth_dev *dev, uint16_t tx_queue_id)
 		PMD_DRV_LOG(WARNING, "TX queue %u is deferred start",
 			    tx_queue_id);
 
-	txq->vector_tx = ad->tx_vec_allowed;
+	txq->use_vec_entry = ad->tx_vec_allowed || ad->tx_simple_allowed;
 
 	/*
 	 * tx_queue_id is queue id application refers to, while
diff --git a/drivers/net/intel/iavf/iavf_rxtx_vec_avx2.c b/drivers/net/intel/iavf/iavf_rxtx_vec_avx2.c
index cea4ee9863..374c713a94 100644
--- a/drivers/net/intel/iavf/iavf_rxtx_vec_avx2.c
+++ b/drivers/net/intel/iavf/iavf_rxtx_vec_avx2.c
@@ -1803,7 +1803,7 @@ iavf_xmit_pkts_vec_avx2_offload(void *tx_queue, struct rte_mbuf **tx_pkts,
 int __rte_cold
 iavf_txq_vec_setup(struct ci_tx_queue *txq)
 {
-	txq->vector_tx = true;
+	txq->use_vec_entry = true;
 	return 0;
 }
 
diff --git a/drivers/net/intel/ice/ice_rxtx.c b/drivers/net/intel/ice/ice_rxtx.c
index 2b82a16422..0fc7237234 100644
--- a/drivers/net/intel/ice/ice_rxtx.c
+++ b/drivers/net/intel/ice/ice_rxtx.c
@@ -882,7 +882,7 @@ ice_tx_queue_start(struct rte_eth_dev *dev, uint16_t tx_queue_id)
 		}
 
 	/* record what kind of descriptor cleanup we need on teardown */
-	txq->vector_tx = ad->tx_vec_allowed;
+	txq->use_vec_entry = ad->tx_vec_allowed || ad->tx_simple_allowed;
 
 	if (txq->tsq != NULL && txq->tsq->ts_flag > 0) {
 		struct ice_aqc_set_txtime_qgrp *ts_elem;
diff --git a/drivers/net/intel/idpf/idpf_common_rxtx_avx512.c b/drivers/net/intel/idpf/idpf_common_rxtx_avx512.c
index 49ace35615..666ad1a4dd 100644
--- a/drivers/net/intel/idpf/idpf_common_rxtx_avx512.c
+++ b/drivers/net/intel/idpf/idpf_common_rxtx_avx512.c
@@ -1365,6 +1365,6 @@ idpf_qc_tx_vec_avx512_setup(struct ci_tx_queue *txq)
 	if (!txq)
 		return 0;
 
-	txq->vector_tx = true;
+	txq->use_vec_entry = true;
 	return 0;
 }
diff --git a/drivers/net/intel/ixgbe/ixgbe_rxtx_vec_common.c b/drivers/net/intel/ixgbe/ixgbe_rxtx_vec_common.c
index 63c7cb50d3..c42b8fc96b 100644
--- a/drivers/net/intel/ixgbe/ixgbe_rxtx_vec_common.c
+++ b/drivers/net/intel/ixgbe/ixgbe_rxtx_vec_common.c
@@ -111,7 +111,7 @@ ixgbe_txq_vec_setup(struct ci_tx_queue *txq)
 	/* leave the first one for overflow */
 	txq->sw_ring_vec = txq->sw_ring_vec + 1;
 	txq->ops = &vec_txq_ops;
-	txq->vector_tx = 1;
+	txq->use_vec_entry = true;
 
 	return 0;
 }
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [PATCH v5 33/35] net/intel: use vector mbuf cleanup from simple scalar path
  2026-02-11 18:12 ` [PATCH v5 00/35] combine multiple Intel scalar Tx paths Bruce Richardson
                     ` (31 preceding siblings ...)
  2026-02-11 18:13   ` [PATCH v5 32/35] net/intel: use vector SW ring entry for simple path Bruce Richardson
@ 2026-02-11 18:13   ` Bruce Richardson
  2026-02-11 18:13   ` [PATCH v5 34/35] net/idpf: enable simple Tx function Bruce Richardson
                     ` (2 subsequent siblings)
  35 siblings, 0 replies; 274+ messages in thread
From: Bruce Richardson @ 2026-02-11 18:13 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson, Vladimir Medvedkin, Anatoly Burakov

Since the simple scalar path now uses the vector Tx entry struct, we can
leverage the vector mbuf cleanup function from that path and avoid
having a separate cleanup function for it.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>
---
 drivers/net/intel/common/tx_scalar.h | 74 ++++++----------------------
 drivers/net/intel/i40e/i40e_rxtx.c   |  2 +-
 drivers/net/intel/ice/ice_rxtx.c     |  2 +-
 3 files changed, 17 insertions(+), 61 deletions(-)

diff --git a/drivers/net/intel/common/tx_scalar.h b/drivers/net/intel/common/tx_scalar.h
index c8c1a55c24..056e7e5fad 100644
--- a/drivers/net/intel/common/tx_scalar.h
+++ b/drivers/net/intel/common/tx_scalar.h
@@ -25,6 +25,20 @@ write_txd(volatile void *txd, uint64_t qw0, uint64_t qw1)
 	txdesc->qw1 = rte_cpu_to_le_64(qw1);
 }
 
+static __rte_always_inline int
+ci_tx_desc_done_simple(struct ci_tx_queue *txq, uint16_t idx)
+{
+	return (txq->ci_tx_ring[idx].cmd_type_offset_bsz & rte_cpu_to_le_64(CI_TXD_QW1_DTYPE_M)) ==
+			rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE);
+}
+
+/* Free transmitted mbufs using vector-style cleanup */
+static __rte_always_inline int
+ci_tx_free_bufs_simple(struct ci_tx_queue *txq)
+{
+	return ci_tx_free_bufs_vec(txq, ci_tx_desc_done_simple, false);
+}
+
 /* Fill hardware descriptor ring with mbuf data (simple path) */
 static inline void
 ci_tx_fill_hw_ring_simple(volatile struct ci_tx_desc *txdp, struct rte_mbuf **pkts,
@@ -56,64 +70,6 @@ ci_tx_fill_hw_ring_simple(volatile struct ci_tx_desc *txdp, struct rte_mbuf **pk
 	}
 }
 
-/* Free transmitted mbufs from descriptor ring with bulk freeing for Tx simple path */
-static __rte_always_inline int
-ci_tx_free_bufs(struct ci_tx_queue *txq)
-{
-	const uint16_t rs_thresh = txq->tx_rs_thresh;
-	const uint16_t k = RTE_ALIGN_FLOOR(rs_thresh, CI_TX_MAX_FREE_BUF_SZ);
-	const uint16_t m = rs_thresh % CI_TX_MAX_FREE_BUF_SZ;
-	struct rte_mbuf *free[CI_TX_MAX_FREE_BUF_SZ];
-	struct ci_tx_entry_vec *txep;
-
-	if ((txq->ci_tx_ring[txq->tx_next_dd].cmd_type_offset_bsz &
-			rte_cpu_to_le_64(CI_TXD_QW1_DTYPE_M)) !=
-			rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE))
-		return 0;
-
-	txep = &txq->sw_ring_vec[txq->tx_next_dd - (rs_thresh - 1)];
-
-	struct rte_mempool *fast_free_mp =
-			likely(txq->fast_free_mp != (void *)UINTPTR_MAX) ?
-				txq->fast_free_mp :
-				(txq->fast_free_mp = txep[0].mbuf->pool);
-
-	if (fast_free_mp) {
-		if (k) {
-			for (uint16_t j = 0; j != k; j += CI_TX_MAX_FREE_BUF_SZ) {
-				for (uint16_t i = 0; i < CI_TX_MAX_FREE_BUF_SZ; ++i, ++txep) {
-					free[i] = txep->mbuf;
-					txep->mbuf = NULL;
-				}
-				rte_mbuf_raw_free_bulk(fast_free_mp, free, CI_TX_MAX_FREE_BUF_SZ);
-			}
-		}
-
-		if (m) {
-			for (uint16_t i = 0; i < m; ++i, ++txep) {
-				free[i] = txep->mbuf;
-				txep->mbuf = NULL;
-			}
-			rte_mbuf_raw_free_bulk(fast_free_mp, free, m);
-		}
-	} else {
-		for (uint16_t i = 0; i < rs_thresh; ++i, ++txep)
-			rte_prefetch0((txep + i)->mbuf);
-
-		for (uint16_t i = 0; i < rs_thresh; ++i, ++txep) {
-			rte_pktmbuf_free_seg(txep->mbuf);
-			txep->mbuf = NULL;
-		}
-	}
-
-	txq->nb_tx_free = (uint16_t)(txq->nb_tx_free + rs_thresh);
-	txq->tx_next_dd = (uint16_t)(txq->tx_next_dd + rs_thresh);
-	if (txq->tx_next_dd >= txq->nb_tx_desc)
-		txq->tx_next_dd = (uint16_t)(rs_thresh - 1);
-
-	return rs_thresh;
-}
-
 /* Simple burst transmit for descriptor-based simple Tx path
  *
  * Transmits a burst of packets by filling hardware descriptors with mbuf
@@ -139,7 +95,7 @@ ci_xmit_burst_simple(struct ci_tx_queue *txq,
 	 * descriptor, free the associated buffer.
 	 */
 	if (txq->nb_tx_free < txq->tx_free_thresh)
-		ci_tx_free_bufs(txq);
+		ci_tx_free_bufs_simple(txq);
 
 	/* Use available descriptor only */
 	nb_pkts = (uint16_t)RTE_MIN(txq->nb_tx_free, nb_pkts);
diff --git a/drivers/net/intel/i40e/i40e_rxtx.c b/drivers/net/intel/i40e/i40e_rxtx.c
index 155eec210e..ffb303158b 100644
--- a/drivers/net/intel/i40e/i40e_rxtx.c
+++ b/drivers/net/intel/i40e/i40e_rxtx.c
@@ -2377,7 +2377,7 @@ i40e_tx_done_cleanup_simple(struct ci_tx_queue *txq,
 		if (txq->nb_tx_desc - txq->nb_tx_free < txq->tx_rs_thresh)
 			break;
 
-		n = ci_tx_free_bufs(txq);
+		n = ci_tx_free_bufs_simple(txq);
 
 		if (n == 0)
 			break;
diff --git a/drivers/net/intel/ice/ice_rxtx.c b/drivers/net/intel/ice/ice_rxtx.c
index 0fc7237234..321415d839 100644
--- a/drivers/net/intel/ice/ice_rxtx.c
+++ b/drivers/net/intel/ice/ice_rxtx.c
@@ -3218,7 +3218,7 @@ ice_tx_done_cleanup_simple(struct ci_tx_queue *txq,
 		if (txq->nb_tx_desc - txq->nb_tx_free < txq->tx_rs_thresh)
 			break;
 
-		n = ci_tx_free_bufs(txq);
+		n = ci_tx_free_bufs_simple(txq);
 
 		if (n == 0)
 			break;
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [PATCH v5 34/35] net/idpf: enable simple Tx function
  2026-02-11 18:12 ` [PATCH v5 00/35] combine multiple Intel scalar Tx paths Bruce Richardson
                     ` (32 preceding siblings ...)
  2026-02-11 18:13   ` [PATCH v5 33/35] net/intel: use vector mbuf cleanup from simple scalar path Bruce Richardson
@ 2026-02-11 18:13   ` Bruce Richardson
  2026-02-12 12:28     ` Burakov, Anatoly
  2026-02-11 18:13   ` [PATCH v5 35/35] net/cpfl: " Bruce Richardson
  2026-02-12 14:45   ` [PATCH v5 00/35] combine multiple Intel scalar Tx paths Bruce Richardson
  35 siblings, 1 reply; 274+ messages in thread
From: Bruce Richardson @ 2026-02-11 18:13 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson, Jingjing Wu, Praveen Shetty

The common "simple Tx" function - in some ways a scalar version of the
vector Tx functions - can be used by the idpf driver as well as i40e and
ice, so add support for it to the driver.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/net/intel/idpf/idpf_common_device.h   |  1 +
 drivers/net/intel/idpf/idpf_common_rxtx.c     | 19 ++++++++++
 drivers/net/intel/idpf/idpf_common_rxtx.h     |  3 ++
 .../net/intel/idpf/idpf_common_rxtx_avx512.c  |  1 -
 drivers/net/intel/idpf/idpf_rxtx.c            | 38 ++++++++++++++++---
 5 files changed, 56 insertions(+), 6 deletions(-)

diff --git a/drivers/net/intel/idpf/idpf_common_device.h b/drivers/net/intel/idpf/idpf_common_device.h
index 31915a03d4..bbc969c734 100644
--- a/drivers/net/intel/idpf/idpf_common_device.h
+++ b/drivers/net/intel/idpf/idpf_common_device.h
@@ -78,6 +78,7 @@ enum idpf_rx_func_type {
 enum idpf_tx_func_type {
 	IDPF_TX_DEFAULT,
 	IDPF_TX_SINGLEQ,
+	IDPF_TX_SINGLEQ_SIMPLE,
 	IDPF_TX_SINGLEQ_AVX2,
 	IDPF_TX_AVX512,
 	IDPF_TX_SINGLEQ_AVX512,
diff --git a/drivers/net/intel/idpf/idpf_common_rxtx.c b/drivers/net/intel/idpf/idpf_common_rxtx.c
index f14a20d6ec..b8f6418d4a 100644
--- a/drivers/net/intel/idpf/idpf_common_rxtx.c
+++ b/drivers/net/intel/idpf/idpf_common_rxtx.c
@@ -1347,6 +1347,15 @@ idpf_dp_singleq_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 			idpf_set_tso_ctx, NULL, NULL);
 }
 
+RTE_EXPORT_INTERNAL_SYMBOL(idpf_dp_singleq_xmit_pkts_simple)
+uint16_t
+idpf_dp_singleq_xmit_pkts_simple(void *tx_queue, struct rte_mbuf **tx_pkts,
+				   uint16_t nb_pkts)
+{
+	return ci_xmit_pkts_simple(tx_queue, tx_pkts, nb_pkts);
+}
+
+
 /* TX prep functions */
 RTE_EXPORT_INTERNAL_SYMBOL(idpf_dp_prep_pkts)
 uint16_t
@@ -1532,6 +1541,16 @@ const struct ci_tx_path_info idpf_tx_path_infos[] = {
 			.single_queue = true
 		}
 	},
+	[IDPF_TX_SINGLEQ_SIMPLE] = {
+		.pkt_burst = idpf_dp_singleq_xmit_pkts_simple,
+		.info = "Single Queue Scalar Simple",
+		.features = {
+			.tx_offloads = IDPF_TX_VECTOR_OFFLOADS,
+			.single_queue = true,
+			.simple_tx = true,
+		}
+	},
+
 #ifdef RTE_ARCH_X86
 	[IDPF_TX_SINGLEQ_AVX2] = {
 		.pkt_burst = idpf_dp_singleq_xmit_pkts_avx2,
diff --git a/drivers/net/intel/idpf/idpf_common_rxtx.h b/drivers/net/intel/idpf/idpf_common_rxtx.h
index fe7094d434..914cab0f25 100644
--- a/drivers/net/intel/idpf/idpf_common_rxtx.h
+++ b/drivers/net/intel/idpf/idpf_common_rxtx.h
@@ -221,6 +221,9 @@ __rte_internal
 uint16_t idpf_dp_singleq_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 				   uint16_t nb_pkts);
 __rte_internal
+uint16_t idpf_dp_singleq_xmit_pkts_simple(void *tx_queue, struct rte_mbuf **tx_pkts,
+				   uint16_t nb_pkts);
+__rte_internal
 uint16_t idpf_dp_prep_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 			   uint16_t nb_pkts);
 __rte_internal
diff --git a/drivers/net/intel/idpf/idpf_common_rxtx_avx512.c b/drivers/net/intel/idpf/idpf_common_rxtx_avx512.c
index 666ad1a4dd..c5f2018924 100644
--- a/drivers/net/intel/idpf/idpf_common_rxtx_avx512.c
+++ b/drivers/net/intel/idpf/idpf_common_rxtx_avx512.c
@@ -1365,6 +1365,5 @@ idpf_qc_tx_vec_avx512_setup(struct ci_tx_queue *txq)
 	if (!txq)
 		return 0;
 
-	txq->use_vec_entry = true;
 	return 0;
 }
diff --git a/drivers/net/intel/idpf/idpf_rxtx.c b/drivers/net/intel/idpf/idpf_rxtx.c
index 9420200f6d..6317112353 100644
--- a/drivers/net/intel/idpf/idpf_rxtx.c
+++ b/drivers/net/intel/idpf/idpf_rxtx.c
@@ -833,21 +833,39 @@ idpf_set_rx_function(struct rte_eth_dev *dev)
 
 }
 
+static bool
+idpf_tx_simple_allowed(struct rte_eth_dev *dev)
+{
+	struct idpf_vport *vport = dev->data->dev_private;
+	struct ci_tx_queue *txq;
+
+	if (vport->txq_model != VIRTCHNL2_QUEUE_MODEL_SINGLE)
+		return false;
+
+	for (int i = 0; i < dev->data->nb_tx_queues; i++) {
+		txq = dev->data->tx_queues[i];
+		if (txq == NULL)
+			continue;
+		if (txq->offloads != (txq->offloads & RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE) ||
+				txq->tx_rs_thresh < IDPF_VPMD_TX_MAX_BURST)
+			return false;
+	}
+	return true;
+}
+
 void
 idpf_set_tx_function(struct rte_eth_dev *dev)
 {
 	struct idpf_vport *vport = dev->data->dev_private;
-#ifdef RTE_ARCH_X86
-#ifdef CC_AVX512_SUPPORT
 	struct ci_tx_queue *txq;
 	int i;
-#endif /* CC_AVX512_SUPPORT */
-#endif /* RTE_ARCH_X86 */
 	struct idpf_adapter *ad = vport->adapter;
+	bool simple_allowed = idpf_tx_simple_allowed(dev);
 	struct ci_tx_path_features req_features = {
 		.tx_offloads = dev->data->dev_conf.txmode.offloads,
 		.simd_width = RTE_VECT_SIMD_DISABLED,
-		.single_queue = (vport->txq_model == VIRTCHNL2_QUEUE_MODEL_SINGLE)
+		.single_queue = (vport->txq_model == VIRTCHNL2_QUEUE_MODEL_SINGLE),
+		.simple_tx = simple_allowed
 	};
 
 	/* The primary process selects the tx path for all processes. */
@@ -864,6 +882,16 @@ idpf_set_tx_function(struct rte_eth_dev *dev)
 					IDPF_TX_MAX,
 					IDPF_TX_DEFAULT);
 
+	/* Set use_vec_entry for single queue mode - only IDPF_TX_SINGLEQ uses regular entries */
+	if (vport->txq_model == VIRTCHNL2_QUEUE_MODEL_SINGLE) {
+		for (i = 0; i < dev->data->nb_tx_queues; i++) {
+			txq = dev->data->tx_queues[i];
+			if (txq == NULL)
+				continue;
+			txq->use_vec_entry = (ad->tx_func_type != IDPF_TX_SINGLEQ);
+		}
+	}
+
 out:
 	dev->tx_pkt_burst = idpf_tx_path_infos[ad->tx_func_type].pkt_burst;
 	dev->tx_pkt_prepare = idpf_dp_prep_pkts;
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* [PATCH v5 35/35] net/cpfl: enable simple Tx function
  2026-02-11 18:12 ` [PATCH v5 00/35] combine multiple Intel scalar Tx paths Bruce Richardson
                     ` (33 preceding siblings ...)
  2026-02-11 18:13   ` [PATCH v5 34/35] net/idpf: enable simple Tx function Bruce Richardson
@ 2026-02-11 18:13   ` Bruce Richardson
  2026-02-12 12:30     ` Burakov, Anatoly
  2026-02-12 14:45   ` [PATCH v5 00/35] combine multiple Intel scalar Tx paths Bruce Richardson
  35 siblings, 1 reply; 274+ messages in thread
From: Bruce Richardson @ 2026-02-11 18:13 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson, Praveen Shetty

The common "simple Tx" function - in some ways a scalar version of the
vector Tx functions - can be used by the cpfl driver in the same way as
it is by the idpf driver.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/net/intel/cpfl/cpfl_rxtx.c | 39 ++++++++++++++++++++++++++----
 1 file changed, 34 insertions(+), 5 deletions(-)

diff --git a/drivers/net/intel/cpfl/cpfl_rxtx.c b/drivers/net/intel/cpfl/cpfl_rxtx.c
index b5b9015310..11451e1666 100644
--- a/drivers/net/intel/cpfl/cpfl_rxtx.c
+++ b/drivers/net/intel/cpfl/cpfl_rxtx.c
@@ -1485,22 +1485,41 @@ cpfl_set_rx_function(struct rte_eth_dev *dev)
 
 }
 
+static bool
+cpfl_tx_simple_allowed(struct rte_eth_dev *dev)
+{
+	struct cpfl_vport *cpfl_vport = dev->data->dev_private;
+	struct idpf_vport *vport = &cpfl_vport->base;
+	struct ci_tx_queue *txq;
+
+	if (vport->txq_model != VIRTCHNL2_QUEUE_MODEL_SINGLE)
+		return false;
+
+	for (int i = 0; i < dev->data->nb_tx_queues; i++) {
+		txq = dev->data->tx_queues[i];
+		if (txq == NULL)
+			continue;
+		if (txq->offloads != (txq->offloads & RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE) ||
+				txq->tx_rs_thresh < IDPF_VPMD_TX_MAX_BURST)
+			return false;
+	}
+	return true;
+}
+
 void
 cpfl_set_tx_function(struct rte_eth_dev *dev)
 {
 	struct cpfl_vport *cpfl_vport = dev->data->dev_private;
 	struct idpf_vport *vport = &cpfl_vport->base;
-#ifdef RTE_ARCH_X86
-#ifdef CC_AVX512_SUPPORT
 	struct ci_tx_queue *txq;
 	int i;
-#endif /* CC_AVX512_SUPPORT */
-#endif /* RTE_ARCH_X86 */
 	struct idpf_adapter *ad = vport->adapter;
+	bool simple_allowed = cpfl_tx_simple_allowed(dev);
 	struct ci_tx_path_features req_features = {
 		.tx_offloads = dev->data->dev_conf.txmode.offloads,
 		.simd_width = RTE_VECT_SIMD_DISABLED,
-		.single_queue = (vport->txq_model == VIRTCHNL2_QUEUE_MODEL_SINGLE)
+		.single_queue = (vport->txq_model == VIRTCHNL2_QUEUE_MODEL_SINGLE),
+		.simple_tx = simple_allowed
 	};
 
 	/* The primary process selects the tx path for all processes. */
@@ -1517,6 +1536,16 @@ cpfl_set_tx_function(struct rte_eth_dev *dev)
 					IDPF_TX_MAX,
 					IDPF_TX_DEFAULT);
 
+	/* Set use_vec_entry for single queue mode - only IDPF_TX_SINGLEQ uses regular entries */
+	if (vport->txq_model == VIRTCHNL2_QUEUE_MODEL_SINGLE) {
+		for (i = 0; i < dev->data->nb_tx_queues; i++) {
+			txq = dev->data->tx_queues[i];
+			if (txq == NULL)
+				continue;
+			txq->use_vec_entry = (ad->tx_func_type != IDPF_TX_SINGLEQ);
+		}
+	}
+
 out:
 	dev->tx_pkt_burst = idpf_tx_path_infos[ad->tx_func_type].pkt_burst;
 	dev->tx_pkt_prepare = idpf_dp_prep_pkts;
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 274+ messages in thread

* RE: [PATCH v5 20/35] net/intel: write descriptors using non-volatile pointers
  2026-02-11 18:12   ` [PATCH v5 20/35] net/intel: write descriptors using non-volatile pointers Bruce Richardson
@ 2026-02-11 21:14     ` Morten Brørup
  2026-02-12  8:43       ` Bruce Richardson
  0 siblings, 1 reply; 274+ messages in thread
From: Morten Brørup @ 2026-02-11 21:14 UTC (permalink / raw)
  To: Bruce Richardson, dev

> +static inline void
> +write_txd(volatile void *txd, uint64_t qw0, uint64_t qw1)
> +{
> +	/* we use an aligned structure and cast away the volatile to
> allow the compiler
> +	 * to opportunistically optimize the two 64-bit writes as a
> single 128-bit write.
> +	 */
> +	__rte_aligned(16) struct txdesc {
> +		uint64_t qw0, qw1;

The documentation for __rte_aligned() says [1] it must be between the "struct" keyword and the name of the structure.
I.e. it should be:
struct __rte_aligned(16) txdesc {

[1]: https://elixir.bootlin.com/dpdk/v25.11/source/lib/eal/include/rte_common.h#L109

> +	} *txdesc = RTE_CAST_PTR(struct txdesc *, txd);
> +	txdesc->qw0 = rte_cpu_to_le_64(qw0);
> +	txdesc->qw1 = rte_cpu_to_le_64(qw1);
> +}



^ permalink raw reply	[flat|nested] 274+ messages in thread

* RE: [PATCH v5 23/35] net/intel: use separate array for desc status tracking
  2026-02-11 18:12   ` [PATCH v5 23/35] net/intel: use separate array for desc status tracking Bruce Richardson
@ 2026-02-11 21:51     ` Morten Brørup
  2026-02-12  9:15       ` Bruce Richardson
  0 siblings, 1 reply; 274+ messages in thread
From: Morten Brørup @ 2026-02-11 21:51 UTC (permalink / raw)
  To: Bruce Richardson, dev
  Cc: Anatoly Burakov, Praveen Shetty, Vladimir Medvedkin, Jingjing Wu

> Rather than writing a last_id for each individual descriptor, we can
> write one only for places where the "report status" (RS) bit is set,
> i.e. the descriptors which will be written back when done. The method
> used for marking which descriptors are free is also changed in the
> process: even if the last descriptor with the "done" bits set is past
> the expected point, we only track up to the expected point and leave
> the rest to be counted as freed next time. This means that we always
> have the RS/DD bits set at fixed intervals, and we always track free
> slots in units of the same tx_free_thresh intervals.

I'm not saying it's good or bad, I'm simply trying to understand the performance tradeoff...
I'm wondering if spreading fields over two separate arrays is beneficial when considering cache misses.

This patch introduces a separate array, uint16_t txq->rs_last_id[], which is not in the same cache line as the txe array.

So now, two separate cache lines must be updated, rs_last_id and txe.

Previously, only txe needed updating.

Assuming both rings are cold, how many cache misses would a burst of 32 (single-segment) packets cause...
Number of cache misses in the txe ring (before this patch, and after)?
Number of cache misses in the rs_last_id ring (after this patch)?


> 
> Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
> ---
>  drivers/net/intel/common/tx.h             |  4 ++
>  drivers/net/intel/common/tx_scalar.h      | 66 +++++++++++------------
>  drivers/net/intel/cpfl/cpfl_rxtx.c        | 16 ++++++
>  drivers/net/intel/i40e/i40e_rxtx.c        | 20 +++++++
>  drivers/net/intel/iavf/iavf_rxtx.c        | 19 +++++++
>  drivers/net/intel/ice/ice_rxtx.c          | 20 +++++++
>  drivers/net/intel/idpf/idpf_common_rxtx.c |  7 +++
>  drivers/net/intel/idpf/idpf_rxtx.c        | 12 +++++
>  8 files changed, 130 insertions(+), 34 deletions(-)
> 
> diff --git a/drivers/net/intel/common/tx.h
> b/drivers/net/intel/common/tx.h
> index 5da6c7c15d..acd362dca3 100644
> --- a/drivers/net/intel/common/tx.h
> +++ b/drivers/net/intel/common/tx.h
> @@ -134,6 +134,8 @@ struct ci_tx_queue {
>  		struct ci_tx_entry *sw_ring; /* virtual address of SW ring
> */
>  		struct ci_tx_entry_vec *sw_ring_vec;
>  	};
> +	/* Scalar TX path: Array tracking last_id at each RS threshold
> boundary */
> +	uint16_t *rs_last_id;
>  	uint16_t nb_tx_desc;           /* number of TX descriptors */
>  	uint16_t tx_tail; /* current value of tail register */
>  	uint16_t nb_tx_used; /* number of TX desc used since RS bit set
> */
> @@ -147,6 +149,8 @@ struct ci_tx_queue {
>  	uint16_t tx_free_thresh;
>  	/* Number of TX descriptors to use before RS bit is set. */
>  	uint16_t tx_rs_thresh;
> +	/* Scalar TX path: log2 of tx_rs_thresh for efficient bit
> operations */
> +	uint8_t log2_rs_thresh;
>  	uint16_t port_id;  /* Device port identifier. */
>  	uint16_t queue_id; /* TX queue index. */
>  	uint16_t reg_idx;
> diff --git a/drivers/net/intel/common/tx_scalar.h
> b/drivers/net/intel/common/tx_scalar.h
> index 0bc2956dcf..7499e5ed20 100644
> --- a/drivers/net/intel/common/tx_scalar.h
> +++ b/drivers/net/intel/common/tx_scalar.h
> @@ -22,33 +22,25 @@
>  static __rte_always_inline int
>  ci_tx_xmit_cleanup(struct ci_tx_queue *txq)
>  {
> -	struct ci_tx_entry *sw_ring = txq->sw_ring;
>  	volatile struct ci_tx_desc *txd = txq->ci_tx_ring;
> -	uint16_t last_desc_cleaned = txq->last_desc_cleaned;
> -	uint16_t nb_tx_desc = txq->nb_tx_desc;
> -	uint16_t desc_to_clean_to;
> -	uint16_t nb_tx_to_clean;
> -
> -	/* Determine the last descriptor needing to be cleaned */
> -	desc_to_clean_to = (uint16_t)(last_desc_cleaned + txq-
> >tx_rs_thresh);
> -	if (desc_to_clean_to >= nb_tx_desc)
> -		desc_to_clean_to = (uint16_t)(desc_to_clean_to -
> nb_tx_desc);
> -
> -	/* Check if descriptor is done */
> -	desc_to_clean_to = sw_ring[desc_to_clean_to].last_id;
> -	if ((txd[desc_to_clean_to].cmd_type_offset_bsz &
> rte_cpu_to_le_64(CI_TXD_QW1_DTYPE_M)) !=
> -			rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE))
> +	const uint16_t last_desc_cleaned = txq->last_desc_cleaned;
> +	const uint16_t nb_tx_desc = txq->nb_tx_desc;
> +
> +	/* Calculate where the next descriptor write-back will occur */
> +	const uint16_t rs_idx = (last_desc_cleaned == nb_tx_desc - 1) ?
> +			0 :
> +			(last_desc_cleaned + 1) >> txq->log2_rs_thresh;
> +	uint16_t desc_to_clean_to = (rs_idx << txq->log2_rs_thresh) +
> (txq->tx_rs_thresh - 1);
> +
> +	/* Check if descriptor is done  */
> +	if ((txd[txq->rs_last_id[rs_idx]].cmd_type_offset_bsz &
> +			rte_cpu_to_le_64(CI_TXD_QW1_DTYPE_M)) !=
> +				rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE))
>  		return -1;
> 
> -	/* Figure out how many descriptors will be cleaned */
> -	if (last_desc_cleaned > desc_to_clean_to)
> -		nb_tx_to_clean = (uint16_t)((nb_tx_desc -
> last_desc_cleaned) + desc_to_clean_to);
> -	else
> -		nb_tx_to_clean = (uint16_t)(desc_to_clean_to -
> last_desc_cleaned);
> -
>  	/* Update the txq to reflect the last descriptor that was cleaned
> */
>  	txq->last_desc_cleaned = desc_to_clean_to;
> -	txq->nb_tx_free = (uint16_t)(txq->nb_tx_free + nb_tx_to_clean);
> +	txq->nb_tx_free += txq->tx_rs_thresh;
> 
>  	return 0;
>  }
> @@ -219,6 +211,7 @@ ci_xmit_pkts(struct ci_tx_queue *txq,
>  		uint16_t nb_ipsec = 0;
>  		uint64_t ipsec_qw0 = 0, ipsec_qw1 = 0;
>  		uint64_t cd_qw0 = 0, cd_qw1 = 0;
> +		uint16_t pkt_rs_idx;
>  		tx_pkt = *tx_pkts++;
> 
>  		ol_flags = tx_pkt->ol_flags;
> @@ -262,6 +255,9 @@ ci_xmit_pkts(struct ci_tx_queue *txq,
>  		if (tx_last >= txq->nb_tx_desc)
>  			tx_last = (uint16_t)(tx_last - txq->nb_tx_desc);
> 
> +		/* Track the RS threshold bucket at packet start */
> +		pkt_rs_idx = (uint16_t)(tx_id >> txq->log2_rs_thresh);
> +
>  		if (nb_used > txq->nb_tx_free) {
>  			if (ci_tx_xmit_cleanup(txq) != 0) {
>  				if (nb_tx == 0)
> @@ -302,10 +298,7 @@ ci_xmit_pkts(struct ci_tx_queue *txq,
> 
>  			if (txe->mbuf)
>  				rte_pktmbuf_free_seg(txe->mbuf);
> -			*txe = (struct ci_tx_entry){
> -				.mbuf = tx_pkt, .last_id = tx_last, .next_id =
> tx_id
> -			};
> -
> +			txe->mbuf = tx_pkt;
>  			/* Setup TX Descriptor */
>  			td_cmd |= CI_TX_DESC_CMD_EOP;
>  			const uint64_t cmd_type_offset_bsz =
> CI_TX_DESC_DTYPE_DATA |
> @@ -332,7 +325,6 @@ ci_xmit_pkts(struct ci_tx_queue *txq,
> 
>  			write_txd(ctx_txd, cd_qw0, cd_qw1);
> 
> -			txe->last_id = tx_last;
>  			tx_id = txe->next_id;
>  			txe = txn;
>  		}
> @@ -351,7 +343,6 @@ ci_xmit_pkts(struct ci_tx_queue *txq,
>  			ipsec_txd[0] = ipsec_qw0;
>  			ipsec_txd[1] = ipsec_qw1;
> 
> -			txe->last_id = tx_last;
>  			tx_id = txe->next_id;
>  			txe = txn;
>  		}
> @@ -387,7 +378,6 @@ ci_xmit_pkts(struct ci_tx_queue *txq,
>  				buf_dma_addr += CI_MAX_DATA_PER_TXD;
>  				slen -= CI_MAX_DATA_PER_TXD;
> 
> -				txe->last_id = tx_last;
>  				tx_id = txe->next_id;
>  				txe = txn;
>  				txd = &ci_tx_ring[tx_id];
> @@ -405,7 +395,6 @@ ci_xmit_pkts(struct ci_tx_queue *txq,
>  				((uint64_t)td_tag << CI_TXD_QW1_L2TAG1_S);
>  			write_txd(txd, buf_dma_addr, cmd_type_offset_bsz);
> 
> -			txe->last_id = tx_last;
>  			tx_id = txe->next_id;
>  			txe = txn;
>  			m_seg = m_seg->next;
> @@ -414,13 +403,22 @@ ci_xmit_pkts(struct ci_tx_queue *txq,
>  		txq->nb_tx_used = (uint16_t)(txq->nb_tx_used + nb_used);
>  		txq->nb_tx_free = (uint16_t)(txq->nb_tx_free - nb_used);
> 
> -		/* set RS bit on the last descriptor of one packet */
> -		if (txq->nb_tx_used >= txq->tx_rs_thresh) {
> +		/* Check if packet crosses into a new RS threshold bucket.
> +		 * The RS bit is set on the last descriptor when we move
> from one bucket to another.
> +		 * For example, with tx_rs_thresh=32 and a 5-descriptor
> packet using slots 30-34:
> +		 *   - pkt_rs_idx = 30 >> 5 = 0 (started in bucket 0)
> +		 *   - tx_last = 34, so 35 >> 5 = 1 (next packet is in
> bucket 1)
> +		 *   - Since 0 != 1, set RS bit on descriptor 34, and
> record rs_last_id[0] = 34
> +		 */
> +		uint16_t next_rs_idx = ((tx_last + 1) >> txq-
> >log2_rs_thresh);
> +
> +		if (next_rs_idx != pkt_rs_idx) {
> +			/* Packet crossed into a new bucket - set RS bit on
> last descriptor */
>  			txd->cmd_type_offset_bsz |=
>  					rte_cpu_to_le_64(CI_TX_DESC_CMD_RS <<
> CI_TXD_QW1_CMD_S);
> 
> -			/* Update txq RS bit counters */
> -			txq->nb_tx_used = 0;
> +			/* Record the last descriptor ID for the bucket we're
> leaving */
> +			txq->rs_last_id[pkt_rs_idx] = tx_last;
>  		}
> 
>  		if (ts_fns != NULL)
> diff --git a/drivers/net/intel/cpfl/cpfl_rxtx.c
> b/drivers/net/intel/cpfl/cpfl_rxtx.c
> index a4d15b7f9c..e7a98ed4f6 100644
> --- a/drivers/net/intel/cpfl/cpfl_rxtx.c
> +++ b/drivers/net/intel/cpfl/cpfl_rxtx.c
> @@ -5,6 +5,7 @@
>  #include <ethdev_driver.h>
>  #include <rte_net.h>
>  #include <rte_vect.h>
> +#include <rte_bitops.h>
> 
>  #include "cpfl_ethdev.h"
>  #include "cpfl_rxtx.h"
> @@ -330,6 +331,7 @@ cpfl_tx_queue_release(void *txq)
> 
>  	ci_txq_release_all_mbufs(q, q->vector_tx);
>  	rte_free(q->sw_ring);
> +	rte_free(q->rs_last_id);
>  	rte_memzone_free(q->mz);
>  	rte_free(cpfl_txq);
>  }
> @@ -572,6 +574,7 @@ cpfl_tx_queue_setup(struct rte_eth_dev *dev,
> uint16_t queue_idx,
> 
>  	txq->nb_tx_desc = nb_desc;
>  	txq->tx_rs_thresh = tx_rs_thresh;
> +	txq->log2_rs_thresh = rte_log2_u32(tx_rs_thresh);
>  	txq->tx_free_thresh = tx_free_thresh;
>  	txq->queue_id = vport->chunks_info.tx_start_qid + queue_idx;
>  	txq->port_id = dev->data->port_id;
> @@ -605,6 +608,17 @@ cpfl_tx_queue_setup(struct rte_eth_dev *dev,
> uint16_t queue_idx,
>  		goto err_sw_ring_alloc;
>  	}
> 
> +	/* Allocate RS last_id tracking array */
> +	uint16_t num_rs_buckets = nb_desc / tx_rs_thresh;
> +	txq->rs_last_id = rte_zmalloc_socket("cpfl tx rs_last_id",
> +			sizeof(txq->rs_last_id[0]) * num_rs_buckets,
> +			RTE_CACHE_LINE_SIZE, socket_id);
> +	if (txq->rs_last_id == NULL) {
> +		PMD_INIT_LOG(ERR, "Failed to allocate memory for RS last_id
> array");
> +		ret = -ENOMEM;
> +		goto err_rs_last_id_alloc;
> +	}
> +
>  	if (!is_splitq) {
>  		txq->ci_tx_ring = mz->addr;
>  		idpf_qc_single_tx_queue_reset(txq);
> @@ -628,6 +642,8 @@ cpfl_tx_queue_setup(struct rte_eth_dev *dev,
> uint16_t queue_idx,
>  	return 0;
> 
>  err_complq_setup:
> +	rte_free(txq->rs_last_id);
> +err_rs_last_id_alloc:
>  	rte_free(txq->sw_ring);
>  err_sw_ring_alloc:
>  	cpfl_dma_zone_release(mz);
> diff --git a/drivers/net/intel/i40e/i40e_rxtx.c
> b/drivers/net/intel/i40e/i40e_rxtx.c
> index dfd2213020..b554bc6c31 100644
> --- a/drivers/net/intel/i40e/i40e_rxtx.c
> +++ b/drivers/net/intel/i40e/i40e_rxtx.c
> @@ -24,6 +24,7 @@
>  #include <rte_ip.h>
>  #include <rte_net.h>
>  #include <rte_vect.h>
> +#include <rte_bitops.h>
> 
>  #include "i40e_logs.h"
>  #include "base/i40e_prototype.h"
> @@ -2280,6 +2281,13 @@ i40e_dev_tx_queue_setup(struct rte_eth_dev *dev,
>  			     (int)queue_idx);
>  		return I40E_ERR_PARAM;
>  	}
> +	if (!rte_is_power_of_2(tx_rs_thresh)) {
> +		PMD_INIT_LOG(ERR, "tx_rs_thresh must be a power of 2.
> (tx_rs_thresh=%u port=%d queue=%d)",
> +			     (unsigned int)tx_rs_thresh,
> +			     (int)dev->data->port_id,
> +			     (int)queue_idx);
> +		return I40E_ERR_PARAM;
> +	}
>  	if ((tx_rs_thresh > 1) && (tx_conf->tx_thresh.wthresh != 0)) {
>  		PMD_INIT_LOG(ERR, "TX WTHRESH must be set to 0 if "
>  			     "tx_rs_thresh is greater than 1. "
> @@ -2321,6 +2329,7 @@ i40e_dev_tx_queue_setup(struct rte_eth_dev *dev,
>  	txq->mz = tz;
>  	txq->nb_tx_desc = nb_desc;
>  	txq->tx_rs_thresh = tx_rs_thresh;
> +	txq->log2_rs_thresh = rte_log2_u32(tx_rs_thresh);
>  	txq->tx_free_thresh = tx_free_thresh;
>  	txq->queue_id = queue_idx;
>  	txq->reg_idx = reg_idx;
> @@ -2346,6 +2355,16 @@ i40e_dev_tx_queue_setup(struct rte_eth_dev *dev,
>  		return -ENOMEM;
>  	}
> 
> +	/* Allocate RS last_id tracking array */
> +	uint16_t num_rs_buckets = nb_desc / tx_rs_thresh;
> +	txq->rs_last_id = rte_zmalloc_socket(NULL, sizeof(txq-
> >rs_last_id[0]) * num_rs_buckets,
> +			RTE_CACHE_LINE_SIZE, socket_id);
> +	if (txq->rs_last_id == NULL) {
> +		i40e_tx_queue_release(txq);
> +		PMD_DRV_LOG(ERR, "Failed to allocate memory for RS last_id
> array");
> +		return -ENOMEM;
> +	}
> +
>  	i40e_reset_tx_queue(txq);
>  	txq->q_set = TRUE;
> 
> @@ -2391,6 +2410,7 @@ i40e_tx_queue_release(void *txq)
> 
>  	ci_txq_release_all_mbufs(q, false);
>  	rte_free(q->sw_ring);
> +	rte_free(q->rs_last_id);
>  	rte_memzone_free(q->mz);
>  	rte_free(q);
>  }
> diff --git a/drivers/net/intel/iavf/iavf_rxtx.c
> b/drivers/net/intel/iavf/iavf_rxtx.c
> index 67906841da..d63590d660 100644
> --- a/drivers/net/intel/iavf/iavf_rxtx.c
> +++ b/drivers/net/intel/iavf/iavf_rxtx.c
> @@ -25,6 +25,7 @@
>  #include <rte_ip.h>
>  #include <rte_net.h>
>  #include <rte_vect.h>
> +#include <rte_bitops.h>
>  #include <rte_vxlan.h>
>  #include <rte_gtp.h>
>  #include <rte_geneve.h>
> @@ -194,6 +195,11 @@ check_tx_thresh(uint16_t nb_desc, uint16_t
> tx_rs_thresh,
>  			     tx_rs_thresh, nb_desc);
>  		return -EINVAL;
>  	}
> +	if (!rte_is_power_of_2(tx_rs_thresh)) {
> +		PMD_INIT_LOG(ERR, "tx_rs_thresh must be a power of 2.
> (tx_rs_thresh=%u)",
> +			     tx_rs_thresh);
> +		return -EINVAL;
> +	}
> 
>  	return 0;
>  }
> @@ -801,6 +807,7 @@ iavf_dev_tx_queue_setup(struct rte_eth_dev *dev,
> 
>  	txq->nb_tx_desc = nb_desc;
>  	txq->tx_rs_thresh = tx_rs_thresh;
> +	txq->log2_rs_thresh = rte_log2_u32(tx_rs_thresh);
>  	txq->tx_free_thresh = tx_free_thresh;
>  	txq->queue_id = queue_idx;
>  	txq->port_id = dev->data->port_id;
> @@ -826,6 +833,17 @@ iavf_dev_tx_queue_setup(struct rte_eth_dev *dev,
>  		return -ENOMEM;
>  	}
> 
> +	/* Allocate RS last_id tracking array */
> +	uint16_t num_rs_buckets = nb_desc / tx_rs_thresh;
> +	txq->rs_last_id = rte_zmalloc_socket(NULL, sizeof(txq-
> >rs_last_id[0]) * num_rs_buckets,
> +			RTE_CACHE_LINE_SIZE, socket_id);
> +	if (txq->rs_last_id == NULL) {
> +		PMD_INIT_LOG(ERR, "Failed to allocate memory for RS last_id
> array");
> +		rte_free(txq->sw_ring);
> +		rte_free(txq);
> +		return -ENOMEM;
> +	}
> +
>  	/* Allocate TX hardware ring descriptors. */
>  	ring_size = sizeof(struct ci_tx_desc) * IAVF_MAX_RING_DESC;
>  	ring_size = RTE_ALIGN(ring_size, IAVF_DMA_MEM_ALIGN);
> @@ -1050,6 +1068,7 @@ iavf_dev_tx_queue_release(struct rte_eth_dev
> *dev, uint16_t qid)
> 
>  	ci_txq_release_all_mbufs(q, q->use_ctx);
>  	rte_free(q->sw_ring);
> +	rte_free(q->rs_last_id);
>  	rte_memzone_free(q->mz);
>  	rte_free(q);
>  }
> diff --git a/drivers/net/intel/ice/ice_rxtx.c
> b/drivers/net/intel/ice/ice_rxtx.c
> index 111cb5e37f..2915223397 100644
> --- a/drivers/net/intel/ice/ice_rxtx.c
> +++ b/drivers/net/intel/ice/ice_rxtx.c
> @@ -5,6 +5,7 @@
>  #include <ethdev_driver.h>
>  #include <rte_net.h>
>  #include <rte_vect.h>
> +#include <rte_bitops.h>
> 
>  #include "ice_rxtx.h"
>  #include "ice_rxtx_vec_common.h"
> @@ -1589,6 +1590,13 @@ ice_tx_queue_setup(struct rte_eth_dev *dev,
>  			     (int)queue_idx);
>  		return -EINVAL;
>  	}
> +	if (!rte_is_power_of_2(tx_rs_thresh)) {
> +		PMD_INIT_LOG(ERR, "tx_rs_thresh must be a power of 2.
> (tx_rs_thresh=%u port=%d queue=%d)",
> +			     (unsigned int)tx_rs_thresh,
> +			     (int)dev->data->port_id,
> +			     (int)queue_idx);
> +		return -EINVAL;
> +	}
>  	if (tx_rs_thresh > 1 && tx_conf->tx_thresh.wthresh != 0) {
>  		PMD_INIT_LOG(ERR, "TX WTHRESH must be set to 0 if "
>  			     "tx_rs_thresh is greater than 1. "
> @@ -1631,6 +1639,7 @@ ice_tx_queue_setup(struct rte_eth_dev *dev,
>  	txq->mz = tz;
>  	txq->nb_tx_desc = nb_desc;
>  	txq->tx_rs_thresh = tx_rs_thresh;
> +	txq->log2_rs_thresh = rte_log2_u32(tx_rs_thresh);
>  	txq->tx_free_thresh = tx_free_thresh;
>  	txq->queue_id = queue_idx;
> 
> @@ -1657,6 +1666,16 @@ ice_tx_queue_setup(struct rte_eth_dev *dev,
>  		return -ENOMEM;
>  	}
> 
> +	/* Allocate RS last_id tracking array */
> +	uint16_t num_rs_buckets = nb_desc / tx_rs_thresh;
> +	txq->rs_last_id = rte_zmalloc_socket(NULL, sizeof(txq-
> >rs_last_id[0]) * num_rs_buckets,
> +			RTE_CACHE_LINE_SIZE, socket_id);
> +	if (txq->rs_last_id == NULL) {
> +		ice_tx_queue_release(txq);
> +		PMD_INIT_LOG(ERR, "Failed to allocate memory for RS last_id
> array");
> +		return -ENOMEM;
> +	}
> +
>  	if (vsi->type == ICE_VSI_PF && (offloads &
> RTE_ETH_TX_OFFLOAD_SEND_ON_TIMESTAMP)) {
>  		if (hw->phy_model != ICE_PHY_E830) {
>  			ice_tx_queue_release(txq);
> @@ -1729,6 +1748,7 @@ ice_tx_queue_release(void *txq)
> 
>  	ci_txq_release_all_mbufs(q, false);
>  	rte_free(q->sw_ring);
> +	rte_free(q->rs_last_id);
>  	if (q->tsq) {
>  		rte_memzone_free(q->tsq->ts_mz);
>  		rte_free(q->tsq);
> diff --git a/drivers/net/intel/idpf/idpf_common_rxtx.c
> b/drivers/net/intel/idpf/idpf_common_rxtx.c
> index 77f4099f2b..04db8823eb 100644
> --- a/drivers/net/intel/idpf/idpf_common_rxtx.c
> +++ b/drivers/net/intel/idpf/idpf_common_rxtx.c
> @@ -5,6 +5,7 @@
>  #include <eal_export.h>
>  #include <rte_mbuf_dyn.h>
>  #include <rte_errno.h>
> +#include <rte_bitops.h>
> 
>  #include "idpf_common_rxtx.h"
>  #include "idpf_common_device.h"
> @@ -73,6 +74,11 @@ idpf_qc_tx_thresh_check(uint16_t nb_desc, uint16_t
> tx_rs_thresh,
>  			tx_rs_thresh, nb_desc);
>  		return -EINVAL;
>  	}
> +	if (!rte_is_power_of_2(tx_rs_thresh)) {
> +		DRV_LOG(ERR, "tx_rs_thresh must be a power of 2.
> (tx_rs_thresh=%u)",
> +			tx_rs_thresh);
> +		return -EINVAL;
> +	}
> 
>  	return 0;
>  }
> @@ -333,6 +339,7 @@ idpf_qc_tx_queue_release(void *txq)
>  	}
> 
>  	ci_txq_release_all_mbufs(q, false);
> +	rte_free(q->rs_last_id);
>  	rte_free(q->sw_ring);
>  	rte_memzone_free(q->mz);
>  	rte_free(q);
> diff --git a/drivers/net/intel/idpf/idpf_rxtx.c
> b/drivers/net/intel/idpf/idpf_rxtx.c
> index 7d9c885458..9420200f6d 100644
> --- a/drivers/net/intel/idpf/idpf_rxtx.c
> +++ b/drivers/net/intel/idpf/idpf_rxtx.c
> @@ -447,6 +447,7 @@ idpf_tx_queue_setup(struct rte_eth_dev *dev,
> uint16_t queue_idx,
> 
>  	txq->nb_tx_desc = nb_desc;
>  	txq->tx_rs_thresh = tx_rs_thresh;
> +	txq->log2_rs_thresh = rte_log2_u32(tx_rs_thresh);
>  	txq->tx_free_thresh = tx_free_thresh;
>  	txq->queue_id = vport->chunks_info.tx_start_qid + queue_idx;
>  	txq->port_id = dev->data->port_id;
> @@ -480,6 +481,15 @@ idpf_tx_queue_setup(struct rte_eth_dev *dev,
> uint16_t queue_idx,
>  		goto err_sw_ring_alloc;
>  	}
> 
> +	txq->rs_last_id = rte_zmalloc_socket("idpf tx rs_last_id",
> +			sizeof(txq->rs_last_id[0]) * (nb_desc >> txq-
> >log2_rs_thresh),
> +			RTE_CACHE_LINE_SIZE, socket_id);
> +	if (txq->rs_last_id == NULL) {
> +		PMD_INIT_LOG(ERR, "Failed to allocate memory for TX RS
> tracking");
> +		ret = -ENOMEM;
> +		goto err_rs_last_id_alloc;
> +	}
> +
>  	if (!is_splitq) {
>  		txq->ci_tx_ring = mz->addr;
>  		idpf_qc_single_tx_queue_reset(txq);
> @@ -502,6 +512,8 @@ idpf_tx_queue_setup(struct rte_eth_dev *dev,
> uint16_t queue_idx,
>  	return 0;
> 
>  err_complq_setup:
> +	rte_free(txq->rs_last_id);
> +err_rs_last_id_alloc:
>  	rte_free(txq->sw_ring);
>  err_sw_ring_alloc:
>  	idpf_dma_zone_release(mz);
> --
> 2.51.0


^ permalink raw reply	[flat|nested] 274+ messages in thread

* Re: [PATCH v5 20/35] net/intel: write descriptors using non-volatile pointers
  2026-02-11 21:14     ` Morten Brørup
@ 2026-02-12  8:43       ` Bruce Richardson
  0 siblings, 0 replies; 274+ messages in thread
From: Bruce Richardson @ 2026-02-12  8:43 UTC (permalink / raw)
  To: Morten Brørup; +Cc: dev

On Wed, Feb 11, 2026 at 10:14:20PM +0100, Morten Brørup wrote:
> > +static inline void
> > +write_txd(volatile void *txd, uint64_t qw0, uint64_t qw1)
> > +{
> > +	/* we use an aligned structure and cast away the volatile to
> > allow the compiler
> > +	 * to opportunistically optimize the two 64-bit writes as a
> > single 128-bit write.
> > +	 */
> > +	__rte_aligned(16) struct txdesc {
> > +		uint64_t qw0, qw1;
> 
> The documentation for __rte_aligned() says [1] it must be between the "struct" keyword and the name of the structure.
> I.e. it should be:
> struct __rte_aligned(16) txdesc {
> 
Thanks. I should read the docs more! If no other patches need rework, I'll
just fix this on apply.

/Bruce


* Re: [PATCH v5 23/35] net/intel: use separate array for desc status tracking
  2026-02-11 21:51     ` Morten Brørup
@ 2026-02-12  9:15       ` Bruce Richardson
  2026-02-12 12:38         ` Morten Brørup
  0 siblings, 1 reply; 274+ messages in thread
From: Bruce Richardson @ 2026-02-12  9:15 UTC (permalink / raw)
  To: Morten Brørup
  Cc: dev, Anatoly Burakov, Praveen Shetty, Vladimir Medvedkin,
	Jingjing Wu

On Wed, Feb 11, 2026 at 10:51:56PM +0100, Morten Brørup wrote:
> > Rather than writing a last_id for each individual descriptor, we can
> > write one only for places where the "report status" (RS) bit is set,
> > i.e. the descriptors which will be written back when done. The method
> > used for marking what descriptors are free is also changed in the
> > process, even if the last descriptor with the "done" bits set is past
> > the expected point, we only track up to the expected point, and leave
> > the rest to be counted as freed next time. This means that we always
> > have the RS/DD bits set at fixed intervals, and we always track free
> > slots in units of the same tx_free_thresh intervals.
> 
> I'm not saying it's good or bad, I'm simply trying to understand the performance tradeoff...
> I'm wondering if spreading fields over two separate arrays is beneficial when considering cache misses.
> 
> This patch introduces a separate array, uint16_t txq->rs_last_id[], which is not in the same cache line as the txe array.
> 
> So now, two separate cache lines must be updated, rs_last_id and txe.
> 
> Previously, only txe needed updating.
> 
> Assuming both rings are cold, how many cache misses would a burst of 32 (single-segment) packets cause...
> Number of cache misses in the txe ring (before this patch, and after)?
> Number of cache misses in the rs_last_id ring (after this patch)?
> 

The main txe ring has 4 elements per cacheline both before and after this
patch. As an aside, 6 bytes of each 16 is wasted, and I have tried a couple of
times to get us down to an 8 byte ring element and each time performance
has dropped, presumably due to extra computation and branching for ring
wraparound when we remove the precomputed next index values.

Anyway, for 32 single-segment packets, we obviously touch 8 cachelines both
before and after this patch in the txe array. In the rs_last_id array, we
should just read a single 16-bit index, so the overall array has a
fairly small cache footprint: 2 bytes per 32 ring entries. [Or one
cacheline for a 1024-entry ring]

However, the performance improvements given by the rework of the cleanup
handling are very noticeable, and come not so much from memory/cache
footprint as from doing less work and less branching. For example:

* before this patch, on transmit we stored the index of the last descriptor
  for each segment of each packet. Those stores were to a hot cacheline, but
  were still (if no coalescing) potentially 32 stores per burst. After this
  patch, we instead store only the index of the last descriptor of the packet
  crossing the next multiple-of-32 boundary, going from potentially 32 stores
  down to 1.

* when doing cleanup post-transmit, we still have an array lookup to
  determine the location of the NIC write-back indicating descriptor done.
  However, while before this lookup was done in the txe array - which was
  written (ring_size - 32) packets ago - it is now done by reading the new,
  tiny rs_last_id array, which is far more likely to be in cache.

* finally, when doing the actual buffer freeing, before we would have an
  arbitrary number of buffers - though normally 32 - to be freed from the
  ring starting at an arbitrary location. After this patch, we always have
  exactly 32 (or more correctly rs_thresh, which must divide evenly into
  ring_size) buffers to be freed starting at an index which is a multiple of
  32, and which therefore, most importantly, guarantees that we do not have
  any wraparound as part of the buffer free process. This means no branches
  for idx >= ring_size etc., and that we are free to treat the buffers to
  be freed as being stored in a linear array.

> 
> > 
> > Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> > Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
> > ---
> >  drivers/net/intel/common/tx.h             |  4 ++
> >  drivers/net/intel/common/tx_scalar.h      | 66 +++++++++++------------
> >  drivers/net/intel/cpfl/cpfl_rxtx.c        | 16 ++++++
> >  drivers/net/intel/i40e/i40e_rxtx.c        | 20 +++++++
> >  drivers/net/intel/iavf/iavf_rxtx.c        | 19 +++++++
> >  drivers/net/intel/ice/ice_rxtx.c          | 20 +++++++
> >  drivers/net/intel/idpf/idpf_common_rxtx.c |  7 +++
> >  drivers/net/intel/idpf/idpf_rxtx.c        | 12 +++++
> >  8 files changed, 130 insertions(+), 34 deletions(-)
> > 
> > diff --git a/drivers/net/intel/common/tx.h
> > b/drivers/net/intel/common/tx.h
> > index 5da6c7c15d..acd362dca3 100644
> > --- a/drivers/net/intel/common/tx.h
> > +++ b/drivers/net/intel/common/tx.h
> > @@ -134,6 +134,8 @@ struct ci_tx_queue {
> >  		struct ci_tx_entry *sw_ring; /* virtual address of SW ring
> > */
> >  		struct ci_tx_entry_vec *sw_ring_vec;
> >  	};
> > +	/* Scalar TX path: Array tracking last_id at each RS threshold
> > boundary */
> > +	uint16_t *rs_last_id;
> >  	uint16_t nb_tx_desc;           /* number of TX descriptors */
> >  	uint16_t tx_tail; /* current value of tail register */
> >  	uint16_t nb_tx_used; /* number of TX desc used since RS bit set
> > */
> > @@ -147,6 +149,8 @@ struct ci_tx_queue {
> >  	uint16_t tx_free_thresh;
> >  	/* Number of TX descriptors to use before RS bit is set. */
> >  	uint16_t tx_rs_thresh;
> > +	/* Scalar TX path: log2 of tx_rs_thresh for efficient bit
> > operations */
> > +	uint8_t log2_rs_thresh;
> >  	uint16_t port_id;  /* Device port identifier. */
> >  	uint16_t queue_id; /* TX queue index. */
> >  	uint16_t reg_idx;
> > diff --git a/drivers/net/intel/common/tx_scalar.h
> > b/drivers/net/intel/common/tx_scalar.h
> > index 0bc2956dcf..7499e5ed20 100644
> > --- a/drivers/net/intel/common/tx_scalar.h
> > +++ b/drivers/net/intel/common/tx_scalar.h
> > @@ -22,33 +22,25 @@
> >  static __rte_always_inline int
> >  ci_tx_xmit_cleanup(struct ci_tx_queue *txq)
> >  {
> > -	struct ci_tx_entry *sw_ring = txq->sw_ring;
> >  	volatile struct ci_tx_desc *txd = txq->ci_tx_ring;
> > -	uint16_t last_desc_cleaned = txq->last_desc_cleaned;
> > -	uint16_t nb_tx_desc = txq->nb_tx_desc;
> > -	uint16_t desc_to_clean_to;
> > -	uint16_t nb_tx_to_clean;
> > -
> > -	/* Determine the last descriptor needing to be cleaned */
> > -	desc_to_clean_to = (uint16_t)(last_desc_cleaned + txq-
> > >tx_rs_thresh);
> > -	if (desc_to_clean_to >= nb_tx_desc)
> > -		desc_to_clean_to = (uint16_t)(desc_to_clean_to -
> > nb_tx_desc);
> > -
> > -	/* Check if descriptor is done */
> > -	desc_to_clean_to = sw_ring[desc_to_clean_to].last_id;
> > -	if ((txd[desc_to_clean_to].cmd_type_offset_bsz &
> > rte_cpu_to_le_64(CI_TXD_QW1_DTYPE_M)) !=
> > -			rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE))
> > +	const uint16_t last_desc_cleaned = txq->last_desc_cleaned;
> > +	const uint16_t nb_tx_desc = txq->nb_tx_desc;
> > +
> > +	/* Calculate where the next descriptor write-back will occur */
> > +	const uint16_t rs_idx = (last_desc_cleaned == nb_tx_desc - 1) ?
> > +			0 :
> > +			(last_desc_cleaned + 1) >> txq->log2_rs_thresh;
> > +	uint16_t desc_to_clean_to = (rs_idx << txq->log2_rs_thresh) +
> > (txq->tx_rs_thresh - 1);
> > +
> > +	/* Check if descriptor is done  */
> > +	if ((txd[txq->rs_last_id[rs_idx]].cmd_type_offset_bsz &
> > +			rte_cpu_to_le_64(CI_TXD_QW1_DTYPE_M)) !=
> > +				rte_cpu_to_le_64(CI_TX_DESC_DTYPE_DESC_DONE))
> >  		return -1;
> > 
> > -	/* Figure out how many descriptors will be cleaned */
> > -	if (last_desc_cleaned > desc_to_clean_to)
> > -		nb_tx_to_clean = (uint16_t)((nb_tx_desc -
> > last_desc_cleaned) + desc_to_clean_to);
> > -	else
> > -		nb_tx_to_clean = (uint16_t)(desc_to_clean_to -
> > last_desc_cleaned);
> > -
> >  	/* Update the txq to reflect the last descriptor that was cleaned
> > */
> >  	txq->last_desc_cleaned = desc_to_clean_to;
> > -	txq->nb_tx_free = (uint16_t)(txq->nb_tx_free + nb_tx_to_clean);
> > +	txq->nb_tx_free += txq->tx_rs_thresh;
> > 
> >  	return 0;
> >  }
> > @@ -219,6 +211,7 @@ ci_xmit_pkts(struct ci_tx_queue *txq,
> >  		uint16_t nb_ipsec = 0;
> >  		uint64_t ipsec_qw0 = 0, ipsec_qw1 = 0;
> >  		uint64_t cd_qw0 = 0, cd_qw1 = 0;
> > +		uint16_t pkt_rs_idx;
> >  		tx_pkt = *tx_pkts++;
> > 
> >  		ol_flags = tx_pkt->ol_flags;
> > @@ -262,6 +255,9 @@ ci_xmit_pkts(struct ci_tx_queue *txq,
> >  		if (tx_last >= txq->nb_tx_desc)
> >  			tx_last = (uint16_t)(tx_last - txq->nb_tx_desc);
> > 
> > +		/* Track the RS threshold bucket at packet start */
> > +		pkt_rs_idx = (uint16_t)(tx_id >> txq->log2_rs_thresh);
> > +
> >  		if (nb_used > txq->nb_tx_free) {
> >  			if (ci_tx_xmit_cleanup(txq) != 0) {
> >  				if (nb_tx == 0)
> > @@ -302,10 +298,7 @@ ci_xmit_pkts(struct ci_tx_queue *txq,
> > 
> >  			if (txe->mbuf)
> >  				rte_pktmbuf_free_seg(txe->mbuf);
> > -			*txe = (struct ci_tx_entry){
> > -				.mbuf = tx_pkt, .last_id = tx_last, .next_id =
> > tx_id
> > -			};
> > -
> > +			txe->mbuf = tx_pkt;
> >  			/* Setup TX Descriptor */
> >  			td_cmd |= CI_TX_DESC_CMD_EOP;
> >  			const uint64_t cmd_type_offset_bsz =
> > CI_TX_DESC_DTYPE_DATA |
> > @@ -332,7 +325,6 @@ ci_xmit_pkts(struct ci_tx_queue *txq,
> > 
> >  			write_txd(ctx_txd, cd_qw0, cd_qw1);
> > 
> > -			txe->last_id = tx_last;
> >  			tx_id = txe->next_id;
> >  			txe = txn;
> >  		}
> > @@ -351,7 +343,6 @@ ci_xmit_pkts(struct ci_tx_queue *txq,
> >  			ipsec_txd[0] = ipsec_qw0;
> >  			ipsec_txd[1] = ipsec_qw1;
> > 
> > -			txe->last_id = tx_last;
> >  			tx_id = txe->next_id;
> >  			txe = txn;
> >  		}
> > @@ -387,7 +378,6 @@ ci_xmit_pkts(struct ci_tx_queue *txq,
> >  				buf_dma_addr += CI_MAX_DATA_PER_TXD;
> >  				slen -= CI_MAX_DATA_PER_TXD;
> > 
> > -				txe->last_id = tx_last;
> >  				tx_id = txe->next_id;
> >  				txe = txn;
> >  				txd = &ci_tx_ring[tx_id];
> > @@ -405,7 +395,6 @@ ci_xmit_pkts(struct ci_tx_queue *txq,
> >  				((uint64_t)td_tag << CI_TXD_QW1_L2TAG1_S);
> >  			write_txd(txd, buf_dma_addr, cmd_type_offset_bsz);
> > 
> > -			txe->last_id = tx_last;
> >  			tx_id = txe->next_id;
> >  			txe = txn;
> >  			m_seg = m_seg->next;
> > @@ -414,13 +403,22 @@ ci_xmit_pkts(struct ci_tx_queue *txq,
> >  		txq->nb_tx_used = (uint16_t)(txq->nb_tx_used + nb_used);
> >  		txq->nb_tx_free = (uint16_t)(txq->nb_tx_free - nb_used);
> > 
> > -		/* set RS bit on the last descriptor of one packet */
> > -		if (txq->nb_tx_used >= txq->tx_rs_thresh) {
> > +		/* Check if packet crosses into a new RS threshold bucket.
> > +		 * The RS bit is set on the last descriptor when we move
> > from one bucket to another.
> > +		 * For example, with tx_rs_thresh=32 and a 5-descriptor
> > packet using slots 30-34:
> > +		 *   - pkt_rs_idx = 30 >> 5 = 0 (started in bucket 0)
> > +		 *   - tx_last = 34, so 35 >> 5 = 1 (next packet is in
> > bucket 1)
> > +		 *   - Since 0 != 1, set RS bit on descriptor 34, and
> > record rs_last_id[0] = 34
> > +		 */
> > +		uint16_t next_rs_idx = ((tx_last + 1) >> txq-
> > >log2_rs_thresh);
> > +
> > +		if (next_rs_idx != pkt_rs_idx) {
> > +			/* Packet crossed into a new bucket - set RS bit on
> > last descriptor */
> >  			txd->cmd_type_offset_bsz |=
> >  					rte_cpu_to_le_64(CI_TX_DESC_CMD_RS <<
> > CI_TXD_QW1_CMD_S);
> > 
> > -			/* Update txq RS bit counters */
> > -			txq->nb_tx_used = 0;
> > +			/* Record the last descriptor ID for the bucket we're
> > leaving */
> > +			txq->rs_last_id[pkt_rs_idx] = tx_last;
> >  		}
> > 
> >  		if (ts_fns != NULL)
> > diff --git a/drivers/net/intel/cpfl/cpfl_rxtx.c
> > b/drivers/net/intel/cpfl/cpfl_rxtx.c
> > index a4d15b7f9c..e7a98ed4f6 100644
> > --- a/drivers/net/intel/cpfl/cpfl_rxtx.c
> > +++ b/drivers/net/intel/cpfl/cpfl_rxtx.c
> > @@ -5,6 +5,7 @@
> >  #include <ethdev_driver.h>
> >  #include <rte_net.h>
> >  #include <rte_vect.h>
> > +#include <rte_bitops.h>
> > 
> >  #include "cpfl_ethdev.h"
> >  #include "cpfl_rxtx.h"
> > @@ -330,6 +331,7 @@ cpfl_tx_queue_release(void *txq)
> > 
> >  	ci_txq_release_all_mbufs(q, q->vector_tx);
> >  	rte_free(q->sw_ring);
> > +	rte_free(q->rs_last_id);
> >  	rte_memzone_free(q->mz);
> >  	rte_free(cpfl_txq);
> >  }
> > @@ -572,6 +574,7 @@ cpfl_tx_queue_setup(struct rte_eth_dev *dev,
> > uint16_t queue_idx,
> > 
> >  	txq->nb_tx_desc = nb_desc;
> >  	txq->tx_rs_thresh = tx_rs_thresh;
> > +	txq->log2_rs_thresh = rte_log2_u32(tx_rs_thresh);
> >  	txq->tx_free_thresh = tx_free_thresh;
> >  	txq->queue_id = vport->chunks_info.tx_start_qid + queue_idx;
> >  	txq->port_id = dev->data->port_id;
> > @@ -605,6 +608,17 @@ cpfl_tx_queue_setup(struct rte_eth_dev *dev,
> > uint16_t queue_idx,
> >  		goto err_sw_ring_alloc;
> >  	}
> > 
> > +	/* Allocate RS last_id tracking array */
> > +	uint16_t num_rs_buckets = nb_desc / tx_rs_thresh;
> > +	txq->rs_last_id = rte_zmalloc_socket("cpfl tx rs_last_id",
> > +			sizeof(txq->rs_last_id[0]) * num_rs_buckets,
> > +			RTE_CACHE_LINE_SIZE, socket_id);
> > +	if (txq->rs_last_id == NULL) {
> > +		PMD_INIT_LOG(ERR, "Failed to allocate memory for RS last_id
> > array");
> > +		ret = -ENOMEM;
> > +		goto err_rs_last_id_alloc;
> > +	}
> > +
> >  	if (!is_splitq) {
> >  		txq->ci_tx_ring = mz->addr;
> >  		idpf_qc_single_tx_queue_reset(txq);
> > @@ -628,6 +642,8 @@ cpfl_tx_queue_setup(struct rte_eth_dev *dev,
> > uint16_t queue_idx,
> >  	return 0;
> > 
> >  err_complq_setup:
> > +	rte_free(txq->rs_last_id);
> > +err_rs_last_id_alloc:
> >  	rte_free(txq->sw_ring);
> >  err_sw_ring_alloc:
> >  	cpfl_dma_zone_release(mz);
> > diff --git a/drivers/net/intel/i40e/i40e_rxtx.c
> > b/drivers/net/intel/i40e/i40e_rxtx.c
> > index dfd2213020..b554bc6c31 100644
> > --- a/drivers/net/intel/i40e/i40e_rxtx.c
> > +++ b/drivers/net/intel/i40e/i40e_rxtx.c
> > @@ -24,6 +24,7 @@
> >  #include <rte_ip.h>
> >  #include <rte_net.h>
> >  #include <rte_vect.h>
> > +#include <rte_bitops.h>
> > 
> >  #include "i40e_logs.h"
> >  #include "base/i40e_prototype.h"
> > @@ -2280,6 +2281,13 @@ i40e_dev_tx_queue_setup(struct rte_eth_dev *dev,
> >  			     (int)queue_idx);
> >  		return I40E_ERR_PARAM;
> >  	}
> > +	if (!rte_is_power_of_2(tx_rs_thresh)) {
> > +		PMD_INIT_LOG(ERR, "tx_rs_thresh must be a power of 2.
> > (tx_rs_thresh=%u port=%d queue=%d)",
> > +			     (unsigned int)tx_rs_thresh,
> > +			     (int)dev->data->port_id,
> > +			     (int)queue_idx);
> > +		return I40E_ERR_PARAM;
> > +	}
> >  	if ((tx_rs_thresh > 1) && (tx_conf->tx_thresh.wthresh != 0)) {
> >  		PMD_INIT_LOG(ERR, "TX WTHRESH must be set to 0 if "
> >  			     "tx_rs_thresh is greater than 1. "
> > @@ -2321,6 +2329,7 @@ i40e_dev_tx_queue_setup(struct rte_eth_dev *dev,
> >  	txq->mz = tz;
> >  	txq->nb_tx_desc = nb_desc;
> >  	txq->tx_rs_thresh = tx_rs_thresh;
> > +	txq->log2_rs_thresh = rte_log2_u32(tx_rs_thresh);
> >  	txq->tx_free_thresh = tx_free_thresh;
> >  	txq->queue_id = queue_idx;
> >  	txq->reg_idx = reg_idx;
> > @@ -2346,6 +2355,16 @@ i40e_dev_tx_queue_setup(struct rte_eth_dev *dev,
> >  		return -ENOMEM;
> >  	}
> > 
> > +	/* Allocate RS last_id tracking array */
> > +	uint16_t num_rs_buckets = nb_desc / tx_rs_thresh;
> > +	txq->rs_last_id = rte_zmalloc_socket(NULL, sizeof(txq-
> > >rs_last_id[0]) * num_rs_buckets,
> > +			RTE_CACHE_LINE_SIZE, socket_id);
> > +	if (txq->rs_last_id == NULL) {
> > +		i40e_tx_queue_release(txq);
> > +		PMD_DRV_LOG(ERR, "Failed to allocate memory for RS last_id
> > array");
> > +		return -ENOMEM;
> > +	}
> > +
> >  	i40e_reset_tx_queue(txq);
> >  	txq->q_set = TRUE;
> > 
> > @@ -2391,6 +2410,7 @@ i40e_tx_queue_release(void *txq)
> > 
> >  	ci_txq_release_all_mbufs(q, false);
> >  	rte_free(q->sw_ring);
> > +	rte_free(q->rs_last_id);
> >  	rte_memzone_free(q->mz);
> >  	rte_free(q);
> >  }
> > diff --git a/drivers/net/intel/iavf/iavf_rxtx.c
> > b/drivers/net/intel/iavf/iavf_rxtx.c
> > index 67906841da..d63590d660 100644
> > --- a/drivers/net/intel/iavf/iavf_rxtx.c
> > +++ b/drivers/net/intel/iavf/iavf_rxtx.c
> > @@ -25,6 +25,7 @@
> >  #include <rte_ip.h>
> >  #include <rte_net.h>
> >  #include <rte_vect.h>
> > +#include <rte_bitops.h>
> >  #include <rte_vxlan.h>
> >  #include <rte_gtp.h>
> >  #include <rte_geneve.h>
> > @@ -194,6 +195,11 @@ check_tx_thresh(uint16_t nb_desc, uint16_t
> > tx_rs_thresh,
> >  			     tx_rs_thresh, nb_desc);
> >  		return -EINVAL;
> >  	}
> > +	if (!rte_is_power_of_2(tx_rs_thresh)) {
> > +		PMD_INIT_LOG(ERR, "tx_rs_thresh must be a power of 2.
> > (tx_rs_thresh=%u)",
> > +			     tx_rs_thresh);
> > +		return -EINVAL;
> > +	}
> > 
> >  	return 0;
> >  }
> > @@ -801,6 +807,7 @@ iavf_dev_tx_queue_setup(struct rte_eth_dev *dev,
> > 
> >  	txq->nb_tx_desc = nb_desc;
> >  	txq->tx_rs_thresh = tx_rs_thresh;
> > +	txq->log2_rs_thresh = rte_log2_u32(tx_rs_thresh);
> >  	txq->tx_free_thresh = tx_free_thresh;
> >  	txq->queue_id = queue_idx;
> >  	txq->port_id = dev->data->port_id;
> > @@ -826,6 +833,17 @@ iavf_dev_tx_queue_setup(struct rte_eth_dev *dev,
> >  		return -ENOMEM;
> >  	}
> > 
> > +	/* Allocate RS last_id tracking array */
> > +	uint16_t num_rs_buckets = nb_desc / tx_rs_thresh;
> > +	txq->rs_last_id = rte_zmalloc_socket(NULL, sizeof(txq-
> > >rs_last_id[0]) * num_rs_buckets,
> > +			RTE_CACHE_LINE_SIZE, socket_id);
> > +	if (txq->rs_last_id == NULL) {
> > +		PMD_INIT_LOG(ERR, "Failed to allocate memory for RS last_id
> > array");
> > +		rte_free(txq->sw_ring);
> > +		rte_free(txq);
> > +		return -ENOMEM;
> > +	}
> > +
> >  	/* Allocate TX hardware ring descriptors. */
> >  	ring_size = sizeof(struct ci_tx_desc) * IAVF_MAX_RING_DESC;
> >  	ring_size = RTE_ALIGN(ring_size, IAVF_DMA_MEM_ALIGN);
> > @@ -1050,6 +1068,7 @@ iavf_dev_tx_queue_release(struct rte_eth_dev
> > *dev, uint16_t qid)
> > 
> >  	ci_txq_release_all_mbufs(q, q->use_ctx);
> >  	rte_free(q->sw_ring);
> > +	rte_free(q->rs_last_id);
> >  	rte_memzone_free(q->mz);
> >  	rte_free(q);
> >  }
> > diff --git a/drivers/net/intel/ice/ice_rxtx.c
> > b/drivers/net/intel/ice/ice_rxtx.c
> > index 111cb5e37f..2915223397 100644
> > --- a/drivers/net/intel/ice/ice_rxtx.c
> > +++ b/drivers/net/intel/ice/ice_rxtx.c
> > @@ -5,6 +5,7 @@
> >  #include <ethdev_driver.h>
> >  #include <rte_net.h>
> >  #include <rte_vect.h>
> > +#include <rte_bitops.h>
> > 
> >  #include "ice_rxtx.h"
> >  #include "ice_rxtx_vec_common.h"
> > @@ -1589,6 +1590,13 @@ ice_tx_queue_setup(struct rte_eth_dev *dev,
> >  			     (int)queue_idx);
> >  		return -EINVAL;
> >  	}
> > +	if (!rte_is_power_of_2(tx_rs_thresh)) {
> > +		PMD_INIT_LOG(ERR, "tx_rs_thresh must be a power of 2.
> > (tx_rs_thresh=%u port=%d queue=%d)",
> > +			     (unsigned int)tx_rs_thresh,
> > +			     (int)dev->data->port_id,
> > +			     (int)queue_idx);
> > +		return -EINVAL;
> > +	}
> >  	if (tx_rs_thresh > 1 && tx_conf->tx_thresh.wthresh != 0) {
> >  		PMD_INIT_LOG(ERR, "TX WTHRESH must be set to 0 if "
> >  			     "tx_rs_thresh is greater than 1. "
> > @@ -1631,6 +1639,7 @@ ice_tx_queue_setup(struct rte_eth_dev *dev,
> >  	txq->mz = tz;
> >  	txq->nb_tx_desc = nb_desc;
> >  	txq->tx_rs_thresh = tx_rs_thresh;
> > +	txq->log2_rs_thresh = rte_log2_u32(tx_rs_thresh);
> >  	txq->tx_free_thresh = tx_free_thresh;
> >  	txq->queue_id = queue_idx;
> > 
> > @@ -1657,6 +1666,16 @@ ice_tx_queue_setup(struct rte_eth_dev *dev,
> >  		return -ENOMEM;
> >  	}
> > 
> > +	/* Allocate RS last_id tracking array */
> > +	uint16_t num_rs_buckets = nb_desc / tx_rs_thresh;
> > +	txq->rs_last_id = rte_zmalloc_socket(NULL, sizeof(txq-
> > >rs_last_id[0]) * num_rs_buckets,
> > +			RTE_CACHE_LINE_SIZE, socket_id);
> > +	if (txq->rs_last_id == NULL) {
> > +		ice_tx_queue_release(txq);
> > +		PMD_INIT_LOG(ERR, "Failed to allocate memory for RS last_id
> > array");
> > +		return -ENOMEM;
> > +	}
> > +
> >  	if (vsi->type == ICE_VSI_PF && (offloads &
> > RTE_ETH_TX_OFFLOAD_SEND_ON_TIMESTAMP)) {
> >  		if (hw->phy_model != ICE_PHY_E830) {
> >  			ice_tx_queue_release(txq);
> > @@ -1729,6 +1748,7 @@ ice_tx_queue_release(void *txq)
> > 
> >  	ci_txq_release_all_mbufs(q, false);
> >  	rte_free(q->sw_ring);
> > +	rte_free(q->rs_last_id);
> >  	if (q->tsq) {
> >  		rte_memzone_free(q->tsq->ts_mz);
> >  		rte_free(q->tsq);
> > diff --git a/drivers/net/intel/idpf/idpf_common_rxtx.c
> > b/drivers/net/intel/idpf/idpf_common_rxtx.c
> > index 77f4099f2b..04db8823eb 100644
> > --- a/drivers/net/intel/idpf/idpf_common_rxtx.c
> > +++ b/drivers/net/intel/idpf/idpf_common_rxtx.c
> > @@ -5,6 +5,7 @@
> >  #include <eal_export.h>
> >  #include <rte_mbuf_dyn.h>
> >  #include <rte_errno.h>
> > +#include <rte_bitops.h>
> > 
> >  #include "idpf_common_rxtx.h"
> >  #include "idpf_common_device.h"
> > @@ -73,6 +74,11 @@ idpf_qc_tx_thresh_check(uint16_t nb_desc, uint16_t
> > tx_rs_thresh,
> >  			tx_rs_thresh, nb_desc);
> >  		return -EINVAL;
> >  	}
> > +	if (!rte_is_power_of_2(tx_rs_thresh)) {
> > +		DRV_LOG(ERR, "tx_rs_thresh must be a power of 2.
> > (tx_rs_thresh=%u)",
> > +			tx_rs_thresh);
> > +		return -EINVAL;
> > +	}
> > 
> >  	return 0;
> >  }
> > @@ -333,6 +339,7 @@ idpf_qc_tx_queue_release(void *txq)
> >  	}
> > 
> >  	ci_txq_release_all_mbufs(q, false);
> > +	rte_free(q->rs_last_id);
> >  	rte_free(q->sw_ring);
> >  	rte_memzone_free(q->mz);
> >  	rte_free(q);
> > diff --git a/drivers/net/intel/idpf/idpf_rxtx.c
> > b/drivers/net/intel/idpf/idpf_rxtx.c
> > index 7d9c885458..9420200f6d 100644
> > --- a/drivers/net/intel/idpf/idpf_rxtx.c
> > +++ b/drivers/net/intel/idpf/idpf_rxtx.c
> > @@ -447,6 +447,7 @@ idpf_tx_queue_setup(struct rte_eth_dev *dev,
> > uint16_t queue_idx,
> > 
> >  	txq->nb_tx_desc = nb_desc;
> >  	txq->tx_rs_thresh = tx_rs_thresh;
> > +	txq->log2_rs_thresh = rte_log2_u32(tx_rs_thresh);
> >  	txq->tx_free_thresh = tx_free_thresh;
> >  	txq->queue_id = vport->chunks_info.tx_start_qid + queue_idx;
> >  	txq->port_id = dev->data->port_id;
> > @@ -480,6 +481,15 @@ idpf_tx_queue_setup(struct rte_eth_dev *dev,
> > uint16_t queue_idx,
> >  		goto err_sw_ring_alloc;
> >  	}
> > 
> > +	txq->rs_last_id = rte_zmalloc_socket("idpf tx rs_last_id",
> > +			sizeof(txq->rs_last_id[0]) * (nb_desc >> txq-
> > >log2_rs_thresh),
> > +			RTE_CACHE_LINE_SIZE, socket_id);
> > +	if (txq->rs_last_id == NULL) {
> > +		PMD_INIT_LOG(ERR, "Failed to allocate memory for TX RS
> > tracking");
> > +		ret = -ENOMEM;
> > +		goto err_rs_last_id_alloc;
> > +	}
> > +
> >  	if (!is_splitq) {
> >  		txq->ci_tx_ring = mz->addr;
> >  		idpf_qc_single_tx_queue_reset(txq);
> > @@ -502,6 +512,8 @@ idpf_tx_queue_setup(struct rte_eth_dev *dev,
> > uint16_t queue_idx,
> >  	return 0;
> > 
> >  err_complq_setup:
> > +	rte_free(txq->rs_last_id);
> > +err_rs_last_id_alloc:
> >  	rte_free(txq->sw_ring);
> >  err_sw_ring_alloc:
> >  	idpf_dma_zone_release(mz);
> > --
> > 2.51.0
> 


* Re: [PATCH v5 02/35] net/intel: fix memory leak on TX queue setup failure
  2026-02-11 18:12   ` [PATCH v5 02/35] net/intel: fix memory leak on TX queue setup failure Bruce Richardson
@ 2026-02-12 12:14     ` Burakov, Anatoly
  0 siblings, 0 replies; 274+ messages in thread
From: Burakov, Anatoly @ 2026-02-12 12:14 UTC (permalink / raw)
  To: Bruce Richardson, dev
  Cc: stable, Praveen Shetty, Jingjing Wu, Mingxia Liu, Beilei Xing,
	Qi Zhang

On 2/11/2026 7:12 PM, Bruce Richardson wrote:
> When TX queue setup fails after sw_ring allocation but during
> completion queue setup, the allocated sw_ring memory is not freed,
> causing a memory leak.
> 
> This patch adds the missing rte_free() call in the error path for
> both cpfl and idpf drivers to properly clean up sw_ring before
> returning from the function.
> 
> Fixes: 6c2d333cd418 ("net/cpfl: support Tx queue setup")
> Fixes: c008a5e740bd ("common/idpf: add queue setup/release")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> ---

Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>

-- 
Thanks,
Anatoly


* Re: [PATCH v5 07/35] net/ice: refactor context descriptor handling
  2026-02-11 18:12   ` [PATCH v5 07/35] net/ice: refactor context descriptor handling Bruce Richardson
@ 2026-02-12 12:16     ` Burakov, Anatoly
  0 siblings, 0 replies; 274+ messages in thread
From: Burakov, Anatoly @ 2026-02-12 12:16 UTC (permalink / raw)
  To: Bruce Richardson, dev

On 2/11/2026 7:12 PM, Bruce Richardson wrote:
> Create a single function to manage all context descriptor handling,
> which returns either 0 or 1 depending on whether a descriptor is needed
> or not, as well as returning directly the descriptor contents if
> relevant.
> 
> Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> ---
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>

-- 
Thanks,
Anatoly


* Re: [PATCH v5 08/35] net/i40e: refactor context descriptor handling
  2026-02-11 18:12   ` [PATCH v5 08/35] net/i40e: " Bruce Richardson
@ 2026-02-12 12:19     ` Burakov, Anatoly
  0 siblings, 0 replies; 274+ messages in thread
From: Burakov, Anatoly @ 2026-02-12 12:19 UTC (permalink / raw)
  To: Bruce Richardson, dev

On 2/11/2026 7:12 PM, Bruce Richardson wrote:
> Move all context descriptor handling to a single function, as with the
> ice driver, and use the same function signature as that driver.
> 
> Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> ---

Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>

-- 
Thanks,
Anatoly


* Re: [PATCH v5 15/35] net/intel: support configurable VLAN tag insertion on Tx
  2026-02-11 18:12   ` [PATCH v5 15/35] net/intel: support configurable VLAN tag insertion on Tx Bruce Richardson
@ 2026-02-12 12:20     ` Burakov, Anatoly
  0 siblings, 0 replies; 274+ messages in thread
From: Burakov, Anatoly @ 2026-02-12 12:20 UTC (permalink / raw)
  To: Bruce Richardson, dev

On 2/11/2026 7:12 PM, Bruce Richardson wrote:
> Make the VLAN tag insertion logic configurable in the common code, as to
> where inner/outer tags get placed.
> 
> Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> ---
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>

-- 
Thanks,
Anatoly

^ permalink raw reply	[flat|nested] 274+ messages in thread

* Re: [PATCH v5 34/35] net/idpf: enable simple Tx function
  2026-02-11 18:13   ` [PATCH v5 34/35] net/idpf: enable simple Tx function Bruce Richardson
@ 2026-02-12 12:28     ` Burakov, Anatoly
  0 siblings, 0 replies; 274+ messages in thread
From: Burakov, Anatoly @ 2026-02-12 12:28 UTC (permalink / raw)
  To: Bruce Richardson, dev; +Cc: Jingjing Wu, Praveen Shetty

On 2/11/2026 7:13 PM, Bruce Richardson wrote:
> The common "simple Tx" function - in some ways a scalar version of the
> vector Tx functions - can be used by the idpf driver as well as i40e and
> ice, so add support for it to the driver.
> 
> Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> ---

Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>

-- 
Thanks,
Anatoly

^ permalink raw reply	[flat|nested] 274+ messages in thread

* Re: [PATCH v5 35/35] net/cpfl: enable simple Tx function
  2026-02-11 18:13   ` [PATCH v5 35/35] net/cpfl: " Bruce Richardson
@ 2026-02-12 12:30     ` Burakov, Anatoly
  0 siblings, 0 replies; 274+ messages in thread
From: Burakov, Anatoly @ 2026-02-12 12:30 UTC (permalink / raw)
  To: Bruce Richardson, dev; +Cc: Praveen Shetty

On 2/11/2026 7:13 PM, Bruce Richardson wrote:
> The common "simple Tx" function - in some ways a scalar version of the
> vector Tx functions - can be used by the cpfl driver in the same way as
> the idpf driver does.
> 
> Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> ---

Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>

-- 
Thanks,
Anatoly

^ permalink raw reply	[flat|nested] 274+ messages in thread

* RE: [PATCH v5 23/35] net/intel: use separate array for desc status tracking
  2026-02-12  9:15       ` Bruce Richardson
@ 2026-02-12 12:38         ` Morten Brørup
  0 siblings, 0 replies; 274+ messages in thread
From: Morten Brørup @ 2026-02-12 12:38 UTC (permalink / raw)
  To: Bruce Richardson
  Cc: dev, Anatoly Burakov, Praveen Shetty, Vladimir Medvedkin,
	Jingjing Wu

> From: Bruce Richardson [mailto:bruce.richardson@intel.com]
> Sent: Thursday, 12 February 2026 10.15
> 
> On Wed, Feb 11, 2026 at 10:51:56PM +0100, Morten Brørup wrote:
> > > Rather than writing a last_id for each individual descriptor, we
> > > can write one only for places where the "report status" (RS) bit
> > > is set, i.e. the descriptors which will be written back when done.
> > > The method used for marking what descriptors are free is also
> > > changed in the process: even if the last descriptor with the
> > > "done" bits set is past the expected point, we only track up to
> > > the expected point, and leave the rest to be counted as freed
> > > next time. This means that we always have the RS/DD bits set at
> > > fixed intervals, and we always track free slots in units of the
> > > same tx_free_thresh intervals.
> >
> > I'm not saying it's good or bad, I'm simply trying to understand
> > the performance tradeoff...
> > I'm wondering if spreading fields over two separate arrays is
> > beneficial when considering cache misses.
> >
> > This patch introduces a separate array, uint16_t txq->rs_last_id[],
> > which is not in the same cache line as the txe array.
> >
> > So now, two separate cache lines must be updated, rs_last_id and
> > txe.
> >
> > Previously, only txe needed updating.
> >
> > Assuming both rings are cold, how many cache misses would a burst
> > of 32 (single-segment) packets cause...
> > Number of cache misses in the txe ring (before this patch, and
> > after)?
> > Number of cache misses in the rs_last_id ring (after this patch)?
> >
> 
> The main txe ring has 4 elements per cacheline both before and after
> this patch. As an aside, 6 bytes of each 16 is wasted, and I have
> tried a couple of times to get us down to an 8 byte ring element and
> each time performance has dropped, presumably due to extra computation
> and branching for ring wraparound when we remove the precomputed next
> index values.
> 
> Anyway, for 32 single-segment packets, we obviously touch 8 cachelines
> both before and after this patch in the txe array. In the next_rs
> array, we should just read a single index, of 16 bits, so the overall
> array has a fairly small cache footprint, 2 bytes per 32 ring entries.
> [Or one cacheline for a 1024-sized ring]
> 
> However, the performance improvements given by the rework of the
> cleanup handling are very noticeable and not really to do with
> memory/cache footprint so much as doing less work, and less branching.
> For example:
> 
> * before this patch, on transmit we stored the index of the last
>   descriptor for each segment of each packet. It was to a hot
>   cacheline, but it was still (if no coalescing) potentially 32
>   stores per burst. After this patch, we instead only store the index
>   of the last segment for the packet crossing the next multiple of 32
>   boundary. Going from potentially 32 stores to 1.
> 
> * When doing cleanup post transmit, we still have an array lookup to
>   determine the location of the NIC write back to indicate descriptor
>   done. However, while before this was being done to the txe array -
>   which was written (ring_size - 32) packets ago - it's now done by
>   reading the new, tiny rs_next_id array which is more likely to be
>   in cache.
> 
> * finally, when doing the actual buffer freeing, before we would have
>   an arbitrary number of buffers - though normally 32 - to be freed
>   from the ring starting at an arbitrary location. After this patch,
>   we always have exactly 32 (or more correctly rs_thresh, which must
>   divide evenly into ring_size) buffers to be freed starting at an
>   index which is a multiple of 32, and which therefore, most
>   importantly, guarantees that we do not have any wraparound as part
>   of the buffer free process. This means no branches for
>   idx >= ring_size etc., and that we are free to treat the buffers to
>   be freed as being stored in a linear array.

OK.
Thank you for the detailed explanation.

Acked-by: Morten Brørup <mb@smartsharesystems.com>

> 
> >
> > >
> > > Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> > > Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
> > > ---


^ permalink raw reply	[flat|nested] 274+ messages in thread

* Re: [PATCH v5 00/35] combine multiple Intel scalar Tx paths
  2026-02-11 18:12 ` [PATCH v5 00/35] combine multiple Intel scalar Tx paths Bruce Richardson
                     ` (34 preceding siblings ...)
  2026-02-11 18:13   ` [PATCH v5 35/35] net/cpfl: " Bruce Richardson
@ 2026-02-12 14:45   ` Bruce Richardson
  35 siblings, 0 replies; 274+ messages in thread
From: Bruce Richardson @ 2026-02-12 14:45 UTC (permalink / raw)
  To: dev

On Wed, Feb 11, 2026 at 06:12:29PM +0000, Bruce Richardson wrote:
> The scalar Tx paths, with support for offloads and multiple mbufs
> per packet, are almost identical across drivers ice, i40e, iavf and
> the single-queue mode of idpf. Therefore, we can do some rework to
> combine these code paths into a single function which is parameterized
> by compile-time constants, allowing code saving to give us a single
> path to optimize and maintain - apart from edge cases like IPSec
> support in iavf.
> 
> The ixgbe driver has a number of similarities too, which we take
> advantage of where we can, but the overall descriptor format is
> sufficiently different that its main scalar code path is kept
> separate.
> 
> Once merged, we can then optimize the drivers a bit to improve
> performance, and also easily extend some drivers to use additional
> paths for better performance, e.g. add the "simple scalar" path
> to IDPF driver for better performance on platforms without AVX.
> 
> V5:
> - more updates following review including:
>   * dropped patch 19 for new EAL macro, and used struct alignment instead
>   * added extra comments for some code, e.g. reason to remove volatile
>   * dropped patch 22 marking a branch as unlikely
>   * split bugfix off patch 24
>   * corrected idpf path selection logic to not assume in-order queue setup
> 
Series applied to dpdk-next-net-intel

Thanks for all the reviews.

/Bruce

^ permalink raw reply	[flat|nested] 274+ messages in thread

end of thread, other threads:[~2026-02-12 14:46 UTC | newest]

Thread overview: 274+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-12-19 17:25 [RFC PATCH 00/27] combine multiple Intel scalar Tx paths Bruce Richardson
2025-12-19 17:25 ` [RFC PATCH 01/27] net/intel: create common Tx descriptor structure Bruce Richardson
2025-12-19 17:25 ` [RFC PATCH 02/27] net/intel: use common tx ring structure Bruce Richardson
2025-12-19 17:25 ` [RFC PATCH 03/27] net/intel: create common post-Tx cleanup function Bruce Richardson
2025-12-19 17:25 ` [RFC PATCH 04/27] net/intel: consolidate definitions for Tx desc fields Bruce Richardson
2025-12-19 17:25 ` [RFC PATCH 05/27] net/intel: create separate header for Tx scalar fns Bruce Richardson
2025-12-19 17:25 ` [RFC PATCH 06/27] net/intel: add common fn to calculate needed descriptors Bruce Richardson
2025-12-19 17:25 ` [RFC PATCH 07/27] net/ice: refactor context descriptor handling Bruce Richardson
2025-12-19 17:25 ` [RFC PATCH 08/27] net/i40e: " Bruce Richardson
2025-12-19 17:25 ` [RFC PATCH 09/27] net/idpf: " Bruce Richardson
2025-12-19 17:25 ` [RFC PATCH 10/27] net/intel: consolidate checksum mask definition Bruce Richardson
2025-12-19 17:25 ` [RFC PATCH 11/27] net/intel: create common checksum Tx offload function Bruce Richardson
2025-12-19 17:25 ` [RFC PATCH 12/27] net/intel: create a common scalar Tx function Bruce Richardson
2025-12-19 17:25 ` [RFC PATCH 13/27] net/i40e: use " Bruce Richardson
2025-12-19 17:25 ` [RFC PATCH 14/27] net/intel: add IPSec hooks to common " Bruce Richardson
2025-12-19 17:25 ` [RFC PATCH 15/27] net/intel: support configurable VLAN tag insertion on Tx Bruce Richardson
2025-12-19 17:25 ` [RFC PATCH 16/27] net/iavf: use common scalar Tx function Bruce Richardson
2025-12-19 17:25 ` [RFC PATCH 17/27] net/i40e: document requirement for QinQ support Bruce Richardson
2025-12-19 17:25 ` [RFC PATCH 18/27] net/idpf: use common scalar Tx function Bruce Richardson
2025-12-19 17:25 ` [RFC PATCH 19/27] net/intel: avoid writing the final pkt descriptor twice Bruce Richardson
2025-12-19 17:25 ` [RFC PATCH 20/27] net/intel: write descriptors using non-volatile pointers Bruce Richardson
2025-12-20  8:43   ` Morten Brørup
2025-12-22  9:50     ` Bruce Richardson
2025-12-19 17:25 ` [RFC PATCH 21/27] net/intel: remove unnecessary flag clearing Bruce Richardson
2025-12-19 17:25 ` [RFC PATCH 22/27] net/intel: mark mid-burst ring cleanup as unlikely Bruce Richardson
2025-12-19 17:25 ` [RFC PATCH 23/27] net/intel: add special handling for single desc packets Bruce Richardson
2025-12-19 17:25 ` [RFC PATCH 24/27] net/intel: use separate array for desc status tracking Bruce Richardson
2025-12-19 17:25 ` [RFC PATCH 25/27] net/ixgbe: " Bruce Richardson
2025-12-19 17:25 ` [RFC PATCH 26/27] net/intel: drop unused Tx queue used count Bruce Richardson
2025-12-19 17:25 ` [RFC PATCH 27/27] net/intel: remove index for tracking end of packet Bruce Richardson
2025-12-20  9:05   ` Morten Brørup
2026-01-13 15:14 ` [PATCH v2 00/36] combine multiple Intel scalar Tx paths Bruce Richardson
2026-01-13 15:14   ` [PATCH v2 01/36] net/intel: create common Tx descriptor structure Bruce Richardson
2026-01-13 15:14   ` [PATCH v2 02/36] net/intel: use common Tx ring structure Bruce Richardson
2026-01-13 15:14   ` [PATCH v2 03/36] net/intel: create common post-Tx cleanup function Bruce Richardson
2026-01-13 15:14   ` [PATCH v2 04/36] net/intel: consolidate definitions for Tx desc fields Bruce Richardson
2026-01-13 15:14   ` [PATCH v2 05/36] net/intel: create separate header for Tx scalar fns Bruce Richardson
2026-01-13 15:14   ` [PATCH v2 06/36] net/intel: add common fn to calculate needed descriptors Bruce Richardson
2026-01-13 15:14   ` [PATCH v2 07/36] net/ice: refactor context descriptor handling Bruce Richardson
2026-01-13 15:14   ` [PATCH v2 08/36] net/i40e: " Bruce Richardson
2026-01-13 15:14   ` [PATCH v2 09/36] net/idpf: " Bruce Richardson
2026-01-13 15:14   ` [PATCH v2 10/36] net/intel: consolidate checksum mask definition Bruce Richardson
2026-01-13 15:14   ` [PATCH v2 11/36] net/intel: create common checksum Tx offload function Bruce Richardson
2026-01-13 15:14   ` [PATCH v2 12/36] net/intel: create a common scalar Tx function Bruce Richardson
2026-01-13 15:14   ` [PATCH v2 13/36] net/i40e: use " Bruce Richardson
2026-01-13 15:14   ` [PATCH v2 14/36] net/intel: add IPsec hooks to common " Bruce Richardson
2026-01-13 15:14   ` [PATCH v2 15/36] net/intel: support configurable VLAN tag insertion on Tx Bruce Richardson
2026-01-13 15:14   ` [PATCH v2 16/36] net/iavf: use common scalar Tx function Bruce Richardson
2026-01-13 15:14   ` [PATCH v2 17/36] net/i40e: document requirement for QinQ support Bruce Richardson
2026-01-13 15:14   ` [PATCH v2 18/36] net/idpf: use common scalar Tx function Bruce Richardson
2026-01-13 15:14   ` [PATCH v2 19/36] net/intel: avoid writing the final pkt descriptor twice Bruce Richardson
2026-01-13 15:14   ` [PATCH v2 20/36] eal: add macro for marking assumed alignment Bruce Richardson
2026-01-13 15:14   ` [PATCH v2 21/36] net/intel: write descriptors using non-volatile pointers Bruce Richardson
2026-01-13 15:14   ` [PATCH v2 22/36] net/intel: remove unnecessary flag clearing Bruce Richardson
2026-01-13 15:14   ` [PATCH v2 23/36] net/intel: mark mid-burst ring cleanup as unlikely Bruce Richardson
2026-01-13 15:14   ` [PATCH v2 24/36] net/intel: add special handling for single desc packets Bruce Richardson
2026-01-13 15:14   ` [PATCH v2 25/36] net/intel: use separate array for desc status tracking Bruce Richardson
2026-01-13 15:14   ` [PATCH v2 26/36] net/ixgbe: " Bruce Richardson
2026-01-13 15:14   ` [PATCH v2 27/36] net/intel: drop unused Tx queue used count Bruce Richardson
2026-01-13 15:14   ` [PATCH v2 28/36] net/intel: remove index for tracking end of packet Bruce Richardson
2026-01-13 15:14   ` [PATCH v2 29/36] net/intel: merge ring writes in simple Tx for ice and i40e Bruce Richardson
2026-01-13 15:14   ` [PATCH v2 30/36] net/intel: consolidate ice and i40e buffer free function Bruce Richardson
2026-01-13 15:14   ` [PATCH v2 31/36] net/intel: complete merging simple Tx paths Bruce Richardson
2026-01-13 15:14   ` [PATCH v2 32/36] net/intel: use non-volatile stores in simple Tx function Bruce Richardson
2026-01-13 15:14   ` [PATCH v2 33/36] net/intel: align scalar simple Tx path with vector logic Bruce Richardson
2026-01-13 15:14   ` [PATCH v2 34/36] net/intel: use vector SW ring entry for simple path Bruce Richardson
2026-01-13 15:14   ` [PATCH v2 35/36] net/intel: use vector mbuf cleanup from simple scalar path Bruce Richardson
2026-01-13 15:15   ` [PATCH v2 36/36] net/idpf: enable simple Tx function Bruce Richardson
2026-01-13 17:17   ` [PATCH v2 00/36] combine multiple Intel scalar Tx paths Stephen Hemminger
2026-01-23  6:26   ` Stephen Hemminger
2026-01-26  9:02     ` Bruce Richardson
2026-01-30 11:41 ` [PATCH v3 " Bruce Richardson
2026-01-30 11:41   ` [PATCH v3 01/36] net/intel: create common Tx descriptor structure Bruce Richardson
2026-02-06  9:56     ` Loftus, Ciara
2026-01-30 11:41   ` [PATCH v3 02/36] net/intel: use common Tx ring structure Bruce Richardson
2026-02-06  9:59     ` Loftus, Ciara
2026-01-30 11:41   ` [PATCH v3 03/36] net/intel: create common post-Tx cleanup function Bruce Richardson
2026-02-06 10:07     ` Loftus, Ciara
2026-02-09 10:41       ` Bruce Richardson
2026-01-30 11:41   ` [PATCH v3 04/36] net/intel: consolidate definitions for Tx desc fields Bruce Richardson
2026-02-06 10:14     ` Loftus, Ciara
2026-02-09 10:43       ` Bruce Richardson
2026-01-30 11:41   ` [PATCH v3 05/36] net/intel: create separate header for Tx scalar fns Bruce Richardson
2026-02-06 10:23     ` Loftus, Ciara
2026-02-09 11:04       ` Bruce Richardson
2026-01-30 11:41   ` [PATCH v3 06/36] net/intel: add common fn to calculate needed descriptors Bruce Richardson
2026-02-06 10:25     ` Loftus, Ciara
2026-02-09 11:15       ` Bruce Richardson
2026-01-30 11:41   ` [PATCH v3 07/36] net/ice: refactor context descriptor handling Bruce Richardson
2026-02-06 10:47     ` Loftus, Ciara
2026-02-09 11:16       ` Bruce Richardson
2026-01-30 11:41   ` [PATCH v3 08/36] net/i40e: " Bruce Richardson
2026-02-06 10:54     ` Loftus, Ciara
2026-01-30 11:41   ` [PATCH v3 09/36] net/idpf: " Bruce Richardson
2026-02-06 10:59     ` Loftus, Ciara
2026-01-30 11:41   ` [PATCH v3 10/36] net/intel: consolidate checksum mask definition Bruce Richardson
2026-02-06 11:25     ` Loftus, Ciara
2026-02-09 11:40       ` Bruce Richardson
2026-02-09 15:00         ` Bruce Richardson
2026-01-30 11:41   ` [PATCH v3 11/36] net/intel: create common checksum Tx offload function Bruce Richardson
2026-02-06 11:37     ` Loftus, Ciara
2026-02-09 11:41       ` Bruce Richardson
2026-01-30 11:41   ` [PATCH v3 12/36] net/intel: create a common scalar Tx function Bruce Richardson
2026-02-06 12:01     ` Loftus, Ciara
2026-02-06 12:13       ` Bruce Richardson
2026-01-30 11:41   ` [PATCH v3 13/36] net/i40e: use " Bruce Richardson
2026-01-30 11:41   ` [PATCH v3 14/36] net/intel: add IPsec hooks to common " Bruce Richardson
2026-01-30 11:41   ` [PATCH v3 15/36] net/intel: support configurable VLAN tag insertion on Tx Bruce Richardson
2026-01-30 11:41   ` [PATCH v3 16/36] net/iavf: use common scalar Tx function Bruce Richardson
2026-01-30 11:41   ` [PATCH v3 17/36] net/i40e: document requirement for QinQ support Bruce Richardson
2026-01-30 11:41   ` [PATCH v3 18/36] net/idpf: use common scalar Tx function Bruce Richardson
2026-01-30 11:41   ` [PATCH v3 19/36] net/intel: avoid writing the final pkt descriptor twice Bruce Richardson
2026-01-30 11:41   ` [PATCH v3 20/36] eal: add macro for marking assumed alignment Bruce Richardson
2026-01-30 11:41   ` [PATCH v3 21/36] net/intel: write descriptors using non-volatile pointers Bruce Richardson
2026-01-30 11:41   ` [PATCH v3 22/36] net/intel: remove unnecessary flag clearing Bruce Richardson
2026-01-30 11:41   ` [PATCH v3 23/36] net/intel: mark mid-burst ring cleanup as unlikely Bruce Richardson
2026-01-30 11:41   ` [PATCH v3 24/36] net/intel: add special handling for single desc packets Bruce Richardson
2026-01-30 11:41   ` [PATCH v3 25/36] net/intel: use separate array for desc status tracking Bruce Richardson
2026-01-30 11:41   ` [PATCH v3 26/36] net/ixgbe: " Bruce Richardson
2026-01-30 11:41   ` [PATCH v3 27/36] net/intel: drop unused Tx queue used count Bruce Richardson
2026-01-30 11:41   ` [PATCH v3 28/36] net/intel: remove index for tracking end of packet Bruce Richardson
2026-01-30 11:41   ` [PATCH v3 29/36] net/intel: merge ring writes in simple Tx for ice and i40e Bruce Richardson
2026-01-30 11:41   ` [PATCH v3 30/36] net/intel: consolidate ice and i40e buffer free function Bruce Richardson
2026-01-30 11:41   ` [PATCH v3 31/36] net/intel: complete merging simple Tx paths Bruce Richardson
2026-01-30 11:41   ` [PATCH v3 32/36] net/intel: use non-volatile stores in simple Tx function Bruce Richardson
2026-01-30 11:42   ` [PATCH v3 33/36] net/intel: align scalar simple Tx path with vector logic Bruce Richardson
2026-01-30 11:42   ` [PATCH v3 34/36] net/intel: use vector SW ring entry for simple path Bruce Richardson
2026-01-30 11:42   ` [PATCH v3 35/36] net/intel: use vector mbuf cleanup from simple scalar path Bruce Richardson
2026-01-30 11:42   ` [PATCH v3 36/36] net/idpf: enable simple Tx function Bruce Richardson
2026-01-30 17:56     ` [REVIEW] " Stephen Hemminger
2026-02-09 16:44 ` [PATCH v4 00/35] combine multiple Intel scalar Tx paths Bruce Richardson
2026-02-09 16:44   ` [PATCH v4 01/35] net/intel: create common Tx descriptor structure Bruce Richardson
2026-02-09 16:45   ` [PATCH v4 02/35] net/intel: use common Tx ring structure Bruce Richardson
2026-02-09 16:45   ` [PATCH v4 03/35] net/intel: create common post-Tx cleanup function Bruce Richardson
2026-02-10 12:18     ` Burakov, Anatoly
2026-02-09 16:45   ` [PATCH v4 04/35] net/intel: consolidate definitions for Tx desc fields Bruce Richardson
2026-02-10 12:26     ` Burakov, Anatoly
2026-02-10 16:47       ` Bruce Richardson
2026-02-09 16:45   ` [PATCH v4 05/35] net/intel: add common fn to calculate needed descriptors Bruce Richardson
2026-02-10 12:29     ` Burakov, Anatoly
2026-02-10 14:08       ` Bruce Richardson
2026-02-10 14:17         ` Burakov, Anatoly
2026-02-10 17:25           ` Bruce Richardson
2026-02-11  9:14             ` Burakov, Anatoly
2026-02-09 16:45   ` [PATCH v4 06/35] net/ice: refactor context descriptor handling Bruce Richardson
2026-02-10 12:42     ` Burakov, Anatoly
2026-02-10 17:40       ` Bruce Richardson
2026-02-11  9:17         ` Burakov, Anatoly
2026-02-11 10:38           ` Bruce Richardson
2026-02-09 16:45   ` [PATCH v4 07/35] net/i40e: " Bruce Richardson
2026-02-10 12:48     ` Burakov, Anatoly
2026-02-10 14:10       ` Bruce Richardson
2026-02-10 14:19         ` Burakov, Anatoly
2026-02-10 17:54           ` Bruce Richardson
2026-02-11  9:20             ` Burakov, Anatoly
2026-02-11 12:04               ` Bruce Richardson
2026-02-09 16:45   ` [PATCH v4 08/35] net/idpf: " Bruce Richardson
2026-02-10 12:52     ` Burakov, Anatoly
2026-02-09 16:45   ` [PATCH v4 09/35] net/intel: consolidate checksum mask definition Bruce Richardson
2026-02-10 13:00     ` Burakov, Anatoly
2026-02-09 16:45   ` [PATCH v4 10/35] net/intel: create common checksum Tx offload function Bruce Richardson
2026-02-10 13:04     ` Burakov, Anatoly
2026-02-10 17:56       ` Bruce Richardson
2026-02-09 16:45   ` [PATCH v4 11/35] net/intel: create a common scalar Tx function Bruce Richardson
2026-02-10 13:14     ` Burakov, Anatoly
2026-02-10 18:03       ` Bruce Richardson
2026-02-11  9:26         ` Burakov, Anatoly
2026-02-09 16:45   ` [PATCH v4 12/35] net/i40e: use " Bruce Richardson
2026-02-10 13:14     ` Burakov, Anatoly
2026-02-09 16:45   ` [PATCH v4 13/35] net/intel: add IPsec hooks to common " Bruce Richardson
2026-02-10 13:16     ` Burakov, Anatoly
2026-02-09 16:45   ` [PATCH v4 14/35] net/intel: support configurable VLAN tag insertion on Tx Bruce Richardson
2026-02-10 13:21     ` Burakov, Anatoly
2026-02-10 18:20       ` Bruce Richardson
2026-02-11  9:29         ` Burakov, Anatoly
2026-02-11 14:19           ` Bruce Richardson
2026-02-09 16:45   ` [PATCH v4 15/35] net/iavf: use common scalar Tx function Bruce Richardson
2026-02-10 13:27     ` Burakov, Anatoly
2026-02-10 18:31       ` Bruce Richardson
2026-02-09 16:45   ` [PATCH v4 16/35] net/i40e: document requirement for QinQ support Bruce Richardson
2026-02-10 13:27     ` Burakov, Anatoly
2026-02-09 16:45   ` [PATCH v4 17/35] net/idpf: use common scalar Tx function Bruce Richardson
2026-02-10 13:30     ` Burakov, Anatoly
2026-02-09 16:45   ` [PATCH v4 18/35] net/intel: avoid writing the final pkt descriptor twice Bruce Richardson
2026-02-10 13:31     ` Burakov, Anatoly
2026-02-09 16:45   ` [PATCH v4 19/35] eal: add macro for marking assumed alignment Bruce Richardson
2026-02-09 22:35     ` Morten Brørup
2026-02-11 14:45       ` Bruce Richardson
2026-02-09 16:45   ` [PATCH v4 20/35] net/intel: write descriptors using non-volatile pointers Bruce Richardson
2026-02-09 23:08     ` Morten Brørup
2026-02-10  9:03       ` Bruce Richardson
2026-02-10  9:28         ` Morten Brørup
2026-02-11 14:44           ` Bruce Richardson
2026-02-11 14:44       ` Bruce Richardson
2026-02-09 16:45   ` [PATCH v4 21/35] net/intel: remove unnecessary flag clearing Bruce Richardson
2026-02-10 13:33     ` Burakov, Anatoly
2026-02-09 16:45   ` [PATCH v4 22/35] net/intel: mark mid-burst ring cleanup as unlikely Bruce Richardson
2026-02-10 13:36     ` Burakov, Anatoly
2026-02-10 14:13       ` Bruce Richardson
2026-02-11 18:12         ` Bruce Richardson
2026-02-09 16:45   ` [PATCH v4 23/35] net/intel: add special handling for single desc packets Bruce Richardson
2026-02-10 13:57     ` Burakov, Anatoly
2026-02-09 16:45   ` [PATCH v4 24/35] net/intel: use separate array for desc status tracking Bruce Richardson
2026-02-10 14:11     ` Burakov, Anatoly
2026-02-09 16:45   ` [PATCH v4 25/35] net/ixgbe: " Bruce Richardson
2026-02-10 14:12     ` Burakov, Anatoly
2026-02-09 16:45   ` [PATCH v4 26/35] net/intel: drop unused Tx queue used count Bruce Richardson
2026-02-10 14:14     ` Burakov, Anatoly
2026-02-09 16:45   ` [PATCH v4 27/35] net/intel: remove index for tracking end of packet Bruce Richardson
2026-02-10 14:15     ` Burakov, Anatoly
2026-02-09 16:45   ` [PATCH v4 28/35] net/intel: merge ring writes in simple Tx for ice and i40e Bruce Richardson
2026-02-09 23:18     ` Medvedkin, Vladimir
2026-02-09 16:45   ` [PATCH v4 29/35] net/intel: consolidate ice and i40e buffer free function Bruce Richardson
2026-02-09 23:19     ` Medvedkin, Vladimir
2026-02-09 16:45   ` [PATCH v4 30/35] net/intel: complete merging simple Tx paths Bruce Richardson
2026-02-09 23:19     ` Medvedkin, Vladimir
2026-02-09 16:45   ` [PATCH v4 31/35] net/intel: use non-volatile stores in simple Tx function Bruce Richardson
2026-02-09 23:19     ` Medvedkin, Vladimir
2026-02-09 16:45   ` [PATCH v4 32/35] net/intel: align scalar simple Tx path with vector logic Bruce Richardson
2026-02-09 23:19     ` Medvedkin, Vladimir
2026-02-09 16:45   ` [PATCH v4 33/35] net/intel: use vector SW ring entry for simple path Bruce Richardson
2026-02-09 23:19     ` Medvedkin, Vladimir
2026-02-09 16:45   ` [PATCH v4 34/35] net/intel: use vector mbuf cleanup from simple scalar path Bruce Richardson
2026-02-09 23:19     ` Medvedkin, Vladimir
2026-02-09 16:45   ` [PATCH v4 35/35] net/idpf: enable simple Tx function Bruce Richardson
2026-02-09 23:20     ` Medvedkin, Vladimir
2026-02-11 18:12 ` [PATCH v5 00/35] combine multiple Intel scalar Tx paths Bruce Richardson
2026-02-11 18:12   ` [PATCH v5 01/35] net/intel: create common Tx descriptor structure Bruce Richardson
2026-02-11 18:12   ` [PATCH v5 02/35] net/intel: fix memory leak on TX queue setup failure Bruce Richardson
2026-02-12 12:14     ` Burakov, Anatoly
2026-02-11 18:12   ` [PATCH v5 03/35] net/intel: use common Tx ring structure Bruce Richardson
2026-02-11 18:12   ` [PATCH v5 04/35] net/intel: create common post-Tx cleanup function Bruce Richardson
2026-02-11 18:12   ` [PATCH v5 05/35] net/intel: consolidate definitions for Tx desc fields Bruce Richardson
2026-02-11 18:12   ` [PATCH v5 06/35] net/intel: add common fn to calculate needed descriptors Bruce Richardson
2026-02-11 18:12   ` [PATCH v5 07/35] net/ice: refactor context descriptor handling Bruce Richardson
2026-02-12 12:16     ` Burakov, Anatoly
2026-02-11 18:12   ` [PATCH v5 08/35] net/i40e: " Bruce Richardson
2026-02-12 12:19     ` Burakov, Anatoly
2026-02-11 18:12   ` [PATCH v5 09/35] net/idpf: " Bruce Richardson
2026-02-11 18:12   ` [PATCH v5 10/35] net/intel: consolidate checksum mask definition Bruce Richardson
2026-02-11 18:12   ` [PATCH v5 11/35] net/intel: create common checksum Tx offload function Bruce Richardson
2026-02-11 18:12   ` [PATCH v5 12/35] net/intel: create a common scalar Tx function Bruce Richardson
2026-02-11 18:12   ` [PATCH v5 13/35] net/i40e: use " Bruce Richardson
2026-02-11 18:12   ` [PATCH v5 14/35] net/intel: add IPsec hooks to common " Bruce Richardson
2026-02-11 18:12   ` [PATCH v5 15/35] net/intel: support configurable VLAN tag insertion on Tx Bruce Richardson
2026-02-12 12:20     ` Burakov, Anatoly
2026-02-11 18:12   ` [PATCH v5 16/35] net/iavf: use common scalar Tx function Bruce Richardson
2026-02-11 18:12   ` [PATCH v5 17/35] net/i40e: document requirement for QinQ support Bruce Richardson
2026-02-11 18:12   ` [PATCH v5 18/35] net/idpf: use common scalar Tx function Bruce Richardson
2026-02-11 18:12   ` [PATCH v5 19/35] net/intel: avoid writing the final pkt descriptor twice Bruce Richardson
2026-02-11 18:12   ` [PATCH v5 20/35] net/intel: write descriptors using non-volatile pointers Bruce Richardson
2026-02-11 21:14     ` Morten Brørup
2026-02-12  8:43       ` Bruce Richardson
2026-02-11 18:12   ` [PATCH v5 21/35] net/intel: remove unnecessary flag clearing Bruce Richardson
2026-02-11 18:12   ` [PATCH v5 22/35] net/intel: add special handling for single desc packets Bruce Richardson
2026-02-11 18:12   ` [PATCH v5 23/35] net/intel: use separate array for desc status tracking Bruce Richardson
2026-02-11 21:51     ` Morten Brørup
2026-02-12  9:15       ` Bruce Richardson
2026-02-12 12:38         ` Morten Brørup
2026-02-11 18:12   ` [PATCH v5 24/35] net/ixgbe: " Bruce Richardson
2026-02-11 18:12   ` [PATCH v5 25/35] net/intel: drop unused Tx queue used count Bruce Richardson
2026-02-11 18:12   ` [PATCH v5 26/35] net/intel: remove index for tracking end of packet Bruce Richardson
2026-02-11 18:12   ` [PATCH v5 27/35] net/intel: merge ring writes in simple Tx for ice and i40e Bruce Richardson
2026-02-11 18:12   ` [PATCH v5 28/35] net/intel: consolidate ice and i40e buffer free function Bruce Richardson
2026-02-11 18:12   ` [PATCH v5 29/35] net/intel: complete merging simple Tx paths Bruce Richardson
2026-02-11 18:12   ` [PATCH v5 30/35] net/intel: use non-volatile stores in simple Tx function Bruce Richardson
2026-02-11 18:13   ` [PATCH v5 31/35] net/intel: align scalar simple Tx path with vector logic Bruce Richardson
2026-02-11 18:13   ` [PATCH v5 32/35] net/intel: use vector SW ring entry for simple path Bruce Richardson
2026-02-11 18:13   ` [PATCH v5 33/35] net/intel: use vector mbuf cleanup from simple scalar path Bruce Richardson
2026-02-11 18:13   ` [PATCH v5 34/35] net/idpf: enable simple Tx function Bruce Richardson
2026-02-12 12:28     ` Burakov, Anatoly
2026-02-11 18:13   ` [PATCH v5 35/35] net/cpfl: " Bruce Richardson
2026-02-12 12:30     ` Burakov, Anatoly
2026-02-12 14:45   ` [PATCH v5 00/35] combine multiple Intel scalar Tx paths Bruce Richardson

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox