DPDK-dev Archive on lore.kernel.org
* [PATCH 4/9] bus/fslmc/dpio: make the portal DQRI epoll optional
From: Maxime Leroy @ 2026-06-11 15:49 UTC
  To: hemant.agrawal, sachin.saxena; +Cc: dev, Maxime Leroy
In-Reply-To: <20260611154926.392670-1-maxime@leroys.fr>

dpaa2_dpio_intr_init() builds a private epoll instance the event PMD
sleeps on. The upcoming net rx-queue-interrupt path waits on the
application's own epoll instead, so that instance would be built but
never used.

Add a build_epoll parameter: pass true to build it (event PMD), false
to skip the epoll_create/epoll_ctl. epoll_fd is set to -1 when none is
built and closed in intr_deinit only when valid. The sole caller passes
true: no functional change.
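
A minimal sketch of the optional-epoll pattern described above, written
outside the driver. It assumes Linux epoll and eventfd; struct portal,
portal_intr_init() and portal_intr_deinit() are hypothetical stand-ins,
not the dpaa2 symbols, and the eventfd stands in for the portal's
interrupt fd.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/epoll.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Hypothetical stand-in for the portal; only epoll_fd mirrors the patch. */
struct portal {
	int event_fd;	/* interrupt eventfd (stand-in for the DQRI fd) */
	int epoll_fd;	/* private epoll instance, or -1 when none was built */
};

/* Build the private epoll instance only when the caller asks for it. */
static int portal_intr_init(struct portal *p, bool build_epoll)
{
	struct epoll_event ev = { 0 };
	int efd;

	assert(p != NULL);
	p->epoll_fd = -1;	/* "no epoll" marker checked by deinit */
	if (!build_epoll)
		return 0;

	efd = epoll_create1(0);
	if (efd < 0)
		return -1;

	ev.events = EPOLLIN | EPOLLPRI | EPOLLET;
	ev.data.fd = p->event_fd;
	if (epoll_ctl(efd, EPOLL_CTL_ADD, p->event_fd, &ev) < 0) {
		close(efd);	/* do not leak the instance on failure */
		return -1;
	}
	p->epoll_fd = efd;
	return 0;
}

/* Close the private epoll instance only if one was actually built. */
static void portal_intr_deinit(struct portal *p)
{
	if (p->epoll_fd >= 0) {
		close(p->epoll_fd);
		p->epoll_fd = -1;
	}
}
```

With build_epoll=false the sketch leaves epoll_fd at -1, so
portal_intr_deinit() has nothing to close and never calls close(-1).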

Signed-off-by: Maxime Leroy <maxime@leroys.fr>
---
 drivers/bus/fslmc/portal/dpaa2_hw_dpio.c | 44 +++++++++++++++++-------
 1 file changed, 32 insertions(+), 12 deletions(-)

diff --git a/drivers/bus/fslmc/portal/dpaa2_hw_dpio.c b/drivers/bus/fslmc/portal/dpaa2_hw_dpio.c
index 2a9e519668..3a5abb2e6d 100644
--- a/drivers/bus/fslmc/portal/dpaa2_hw_dpio.c
+++ b/drivers/bus/fslmc/portal/dpaa2_hw_dpio.c
@@ -205,13 +205,12 @@ dpaa2_affine_dpio_intr_to_respective_core(int32_t dpio_id, int cpu_id)
 	fclose(file);
 }
 
-static int dpaa2_dpio_intr_init(struct dpaa2_dpio_dev *dpio_dev)
+static int dpaa2_dpio_intr_init(struct dpaa2_dpio_dev *dpio_dev, bool build_epoll)
 {
 	struct epoll_event epoll_ev;
 	int eventfd, dpio_epoll_fd, ret;
 	int threshold = 0x3, timeout = 0xFF;
 
-	dpio_epoll_fd = epoll_create(1);
 	ret = rte_dpaa2_intr_enable(dpio_dev->intr_handle, 0);
 	if (ret) {
 		DPAA2_BUS_ERR("Interrupt registration failed");
@@ -231,16 +230,34 @@ static int dpaa2_dpio_intr_init(struct dpaa2_dpio_dev *dpio_dev)
 	qbman_swp_dqrr_thrshld_write(dpio_dev->sw_portal, threshold);
 	qbman_swp_intr_timeout_write(dpio_dev->sw_portal, timeout);
 
-	eventfd = rte_intr_fd_get(dpio_dev->intr_handle);
-	epoll_ev.events = EPOLLIN | EPOLLPRI | EPOLLET;
-	epoll_ev.data.fd = eventfd;
+	dpio_dev->epoll_fd = -1;
 
-	ret = epoll_ctl(dpio_epoll_fd, EPOLL_CTL_ADD, eventfd, &epoll_ev);
-	if (ret < 0) {
-		DPAA2_BUS_ERR("epoll_ctl failed");
-		return -1;
+	/* The event PMD dequeues by sleeping on a private epoll instance owned
+	 * by the portal, so build it here. A caller that waits on another
+	 * epoll (the net rx-queue-interrupt path uses the application's) skips
+	 * this.
+	 */
+	if (build_epoll) {
+		dpio_epoll_fd = epoll_create(1);
+		if (dpio_epoll_fd < 0) {
+			DPAA2_BUS_ERR("epoll_create failed");
+			rte_dpaa2_intr_disable(dpio_dev->intr_handle, 0);
+			return -1;
+		}
+
+		eventfd = rte_intr_fd_get(dpio_dev->intr_handle);
+		epoll_ev.events = EPOLLIN | EPOLLPRI | EPOLLET;
+		epoll_ev.data.fd = eventfd;
+
+		ret = epoll_ctl(dpio_epoll_fd, EPOLL_CTL_ADD, eventfd, &epoll_ev);
+		if (ret < 0) {
+			DPAA2_BUS_ERR("epoll_ctl failed");
+			rte_dpaa2_intr_disable(dpio_dev->intr_handle, 0);
+			close(dpio_epoll_fd);
+			return -1;
+		}
+		dpio_dev->epoll_fd = dpio_epoll_fd;
 	}
-	dpio_dev->epoll_fd = dpio_epoll_fd;
 
 	return 0;
 }
@@ -253,7 +270,10 @@ static void dpaa2_dpio_intr_deinit(struct dpaa2_dpio_dev *dpio_dev)
 	if (ret)
 		DPAA2_BUS_ERR("DPIO interrupt disable failed");
 
-	close(dpio_dev->epoll_fd);
+	if (dpio_dev->epoll_fd >= 0) {
+		close(dpio_dev->epoll_fd);
+		dpio_dev->epoll_fd = -1;
+	}
 }
 #endif
 
@@ -277,7 +297,7 @@ dpaa2_configure_stashing(struct dpaa2_dpio_dev *dpio_dev, int cpu_id)
 	}
 
 #ifdef RTE_EVENT_DPAA2
-	if (dpaa2_dpio_intr_init(dpio_dev)) {
+	if (dpaa2_dpio_intr_init(dpio_dev, true)) {
 		DPAA2_BUS_ERR("Interrupt registration failed for dpio");
 		return -1;
 	}
-- 
2.43.0



* [PATCH 5/9] net/dpaa2: support Rx queue interrupts
From: Maxime Leroy @ 2026-06-11 15:49 UTC
  To: hemant.agrawal, sachin.saxena; +Cc: dev, Maxime Leroy
In-Reply-To: <20260611154926.392670-1-maxime@leroys.fr>

Implement .rx_queue_intr_enable / .rx_queue_intr_disable so a worker
can sleep on a queue's data-availability notification instead of
busy-polling, through the generic rte_eth_dev_rx_intr_* API.

A worker wakes on its software portal's DQRI, which fires when the
portal's DQRR holds frames, so the Rx FQ must be scheduled to a channel
that portal dequeues. The natural approach, dpni_set_queue with a
notification destination, holds the global MC lock long enough to wedge
the firmware on an enabled dpni, so it must target a disabled one. But
the polling portal is only known once a worker affines, after dev_start,
so the destination cannot be the worker's portal.

Bind each Rx FQ to its own DPCON channel instead. The default Rx burst
pulls frames from the FQ with a volatile dequeue and cannot be
interrupt-driven; to wake on the DQRI the FQ's frames must be pushed to
the portal's DQRR. dev_start issues the DEST_DPCON set_queue statically on
the still-disabled dpni with no knowledge of the polling lcore; a worker
later subscribes its own ethrx portal to the channel and arms the DQRI
in rx_queue_intr_enable (a one-shot per-portal MC op plus QBMan, never
the wedging set_queue).

This pushed/DQRR consumption is how the event PMD works, but the DPCON
use differs. The event PMD uses one DPCON per worker, concentrates N
FQs onto it, and lets the QBMan scheduler load-balance events across
cores. Here affinity is static and there is no scheduling, so each FQ
gets its own DPCON (one per FQ, more channels, drawn from the shared
pool that the DPCON move to the fslmc bus now feeds), bound once at
dev_start before the lcore is known. Frames are delivered by
rte_eth_rx_burst (dpaa2_dev_rx_dqrr), not as events via
rte_event_dequeue.

rte_eth_dev_rx_intr_enable(q) subscribes the lcore portal to q's DPCON
and arms the DQRI. rte_eth_dev_rx_intr_ctl_q(q) adds q's eventfd (the
portal DQRI fd) to the thread epoll.

      wire
       |
    [ DPMAC ]
       |
    [ DPNI ]                                     (1)
       |
    TC0:  FQ0   FQ1   FQ2   FQ3                  (2)
           |     |     |     |                   (3)
        [DPCON][DPCON][DPCON][DPCON]
            \     |     |     /                  (4)
          [ DPIO A ]      [ DPIO B ]             (5)
             |               |
            DQRR            DQRR                 (6)
             |               |
            DQRI            DQRI                 (7)
             |               |
          eventfd         eventfd                (8)
             |               |
        rte_epoll_wait  rte_epoll_wait           (9)
             |               |
        dpaa2_dev_rx_dqrr                        (10)

  (1)  WRIOP picks a TC (QoS), then RSS-hashes within the TC to an FQ
  (2)  FQ0..FQ3 are the rte_eth Rx queues
  (3)  dpni_set_queue(DEST_DPCON): one DPCON per FQ
  (4)  the lcore portal subscribes to its DPCONs (push_set)
  (5)  one QBMan software portal per lcore
  (6)  QMan pushes the FDs into the portal DQRR
  (7)  DQRI is raised when the DQRR is non-empty
  (8)  a portal's queues share one fd (its DQRI eventfd)
  (9)  worker sleeps here when all its queues are idle
  (10) dpaa2_dev_rx_dqrr drains the DQRR, demuxes FDs to FQs by fqd_ctx
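
Step (10) can be illustrated with a simplified, hardware-free model. The
sketch below is not the driver code: struct frame, struct stash,
stash_push(), stash_pop() and rx_burst() are hypothetical names, a plain
array stands in for the portal DQRR, and a queue index stands in for
fqd_ctx. It only shows the drain-own-stash-first, demux-the-rest logic
and the power-of-two FIFO with free-running 16-bit indices.

```c
#include <assert.h>
#include <stdint.h>

#define STASH_SIZE 64	/* power of two, like DPAA2_NAPI_FD_STASH_SIZE */

/* Hypothetical stand-in for a frame descriptor plus its fqd_ctx. */
struct frame {
	int queue_id;	/* which Rx queue the frame belongs to */
	int payload;
};

struct stash {
	uint16_t head;	/* pop index, free-running */
	uint16_t tail;	/* push index, free-running */
	uint64_t drops;	/* frames dropped because the stash was full */
	struct frame slot[STASH_SIZE];
};

/* Park a frame for another queue; drop it if that queue's stash is full. */
static int stash_push(struct stash *s, struct frame f)
{
	if ((uint16_t)(s->tail - s->head) >= STASH_SIZE) {
		s->drops++;
		return -1;
	}
	s->slot[s->tail & (STASH_SIZE - 1)] = f;
	s->tail++;
	return 0;
}

static int stash_pop(struct stash *s, struct frame *out)
{
	if (s->head == s->tail)
		return -1;	/* empty */
	*out = s->slot[s->head & (STASH_SIZE - 1)];
	s->head++;
	return 0;
}

/* Burst for queue `self`: own parked frames first, then the shared ring.
 * Frames for other queues are parked in the owner's stash.
 */
static unsigned int
rx_burst(struct stash *stashes, int self,
	 const struct frame *ring, unsigned int ring_len,
	 unsigned int *ring_pos, struct frame *out, unsigned int max)
{
	unsigned int n = 0;

	while (n < max && stash_pop(&stashes[self], &out[n]) == 0)
		n++;

	while (n < max && *ring_pos < ring_len) {
		struct frame f = ring[(*ring_pos)++];

		if (f.queue_id == self)
			out[n++] = f;
		else
			stash_push(&stashes[f.queue_id], f);
	}
	return n;
}
```

In this model a burst on queue 0 returns queue 0's frames and leaves
queue 1's frames parked, which is why every queue sharing the portal has
to be polled after a wakeup.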

The DQRI and eventfd are portal-wide: a queue's eventfd is its portal's
DQRI fd, and the inhibit bit is refcounted by armed queues so disabling
one queue never masks a sibling. The static per-queue bind also lets a
queue be re-homed to another lcore at runtime, the new worker
reclaiming the channel, with no set_queue and no port stop.
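
The refcounting of the inhibit bit can be modelled without hardware. In
this sketch struct portal, struct queue, queue_intr_enable() and
queue_intr_disable() are hypothetical names, and a boolean stands in for
the portal's interrupt-inhibit register; it only illustrates that the
bit changes on the 0 -> 1 and 1 -> 0 edges of the armed-queue count.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a portal shared by several Rx queues. */
struct portal {
	uint16_t armed_queues;	/* queues currently requesting wakeups */
	bool inhibited;		/* true = portal interrupt masked */
};

struct queue {
	struct portal *portal;	/* portal the queue is subscribed to */
	bool armed;		/* this queue requests wakeups */
};

/* Arm one queue; unmask the portal only when it is the first armed queue. */
static void queue_intr_enable(struct queue *q)
{
	assert(q->portal != 0);
	if (q->armed)
		return;		/* idempotent: never count a queue twice */
	q->armed = true;
	if (q->portal->armed_queues++ == 0)
		q->portal->inhibited = false;
}

/* Disarm one queue; mask the portal only when no armed queue remains. */
static void queue_intr_disable(struct queue *q)
{
	assert(q->portal != 0);
	if (!q->armed)
		return;
	q->armed = false;
	if (q->portal->armed_queues > 0 && --q->portal->armed_queues == 0)
		q->portal->inhibited = true;
}
```

Disabling one of two armed queues leaves the model's portal unmasked;
only the last disarm masks it again.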

On single-core 64-byte forwarding this interrupt path runs at ~5.0 Mpps
versus ~5.86 Mpps polling: per-frame DQRR demux and consume cost about
15 percent over the polling batch dequeue.
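
The arithmetic behind those figures can be checked with a short helper.
This is only a worked calculation on the two rates quoted above;
throughput_drop_pct() and per_frame_ns() are hypothetical names, and the
per-frame times are averages derived from the rates, not measurements.

```c
#include <assert.h>

/* Percentage of the polling throughput lost when running at intr_mpps. */
static double throughput_drop_pct(double poll_mpps, double intr_mpps)
{
	assert(poll_mpps > 0.0);
	return (poll_mpps - intr_mpps) / poll_mpps * 100.0;
}

/* Average time spent per frame, in nanoseconds, at a rate given in Mpps. */
static double per_frame_ns(double mpps)
{
	assert(mpps > 0.0);
	return 1000.0 / mpps;	/* 1e9 ns/s divided by (mpps * 1e6) frames/s */
}
```

With 5.86 and 5.0 Mpps this gives a throughput drop of roughly 14.7
percent, and an average of about 171 ns versus 200 ns per frame, i.e.
close to 29 ns of extra work per frame on the interrupt path.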

Signed-off-by: Maxime Leroy <maxime@leroys.fr>
---
 doc/guides/nics/features/dpaa2.ini       |   1 +
 doc/guides/rel_notes/release_26_07.rst   |   1 +
 drivers/bus/fslmc/portal/dpaa2_hw_dpio.c |  11 +-
 drivers/bus/fslmc/portal/dpaa2_hw_dpio.h |   4 +
 drivers/bus/fslmc/portal/dpaa2_hw_pvt.h  |  27 ++-
 drivers/bus/fslmc/qbman/qbman_portal.c   |   1 +
 drivers/net/dpaa2/dpaa2_ethdev.c         | 293 ++++++++++++++++++++++-
 drivers/net/dpaa2/dpaa2_ethdev.h         |   3 +
 drivers/net/dpaa2/dpaa2_rxtx.c           | 122 ++++++++++
 9 files changed, 457 insertions(+), 6 deletions(-)

diff --git a/doc/guides/nics/features/dpaa2.ini b/doc/guides/nics/features/dpaa2.ini
index 5def653d1d..b53353eb77 100644
--- a/doc/guides/nics/features/dpaa2.ini
+++ b/doc/guides/nics/features/dpaa2.ini
@@ -7,6 +7,7 @@
 Speed capabilities   = Y
 Link status          = Y
 Link status event    = Y
+Rx interrupt         = Y
 Burst mode info      = Y
 Queue start/stop     = Y
 Scattered Rx         = Y
diff --git a/doc/guides/rel_notes/release_26_07.rst b/doc/guides/rel_notes/release_26_07.rst
index 103c4034ca..87c7c57bcc 100644
--- a/doc/guides/rel_notes/release_26_07.rst
+++ b/doc/guides/rel_notes/release_26_07.rst
@@ -129,6 +129,7 @@ New Features
 * **Updated NXP dpaa2 driver.**
 
   * Added RSS RETA query and update support.
+  * Added Rx queue interrupt support.
 
 * **Updated PCAP ethernet driver.**
 
diff --git a/drivers/bus/fslmc/portal/dpaa2_hw_dpio.c b/drivers/bus/fslmc/portal/dpaa2_hw_dpio.c
index 3a5abb2e6d..e6b4e74b3b 100644
--- a/drivers/bus/fslmc/portal/dpaa2_hw_dpio.c
+++ b/drivers/bus/fslmc/portal/dpaa2_hw_dpio.c
@@ -204,13 +204,18 @@ dpaa2_affine_dpio_intr_to_respective_core(int32_t dpio_id, int cpu_id)
 
 	fclose(file);
 }
+#endif /* RTE_EVENT_DPAA2 */
 
-static int dpaa2_dpio_intr_init(struct dpaa2_dpio_dev *dpio_dev, bool build_epoll)
+RTE_EXPORT_INTERNAL_SYMBOL(dpaa2_dpio_intr_init)
+int dpaa2_dpio_intr_init(struct dpaa2_dpio_dev *dpio_dev, bool build_epoll)
 {
 	struct epoll_event epoll_ev;
 	int eventfd, dpio_epoll_fd, ret;
 	int threshold = 0x3, timeout = 0xFF;
 
+	if (dpio_dev->intr_enabled)
+		return 0;
+
 	ret = rte_dpaa2_intr_enable(dpio_dev->intr_handle, 0);
 	if (ret) {
 		DPAA2_BUS_ERR("Interrupt registration failed");
@@ -259,9 +264,12 @@ static int dpaa2_dpio_intr_init(struct dpaa2_dpio_dev *dpio_dev, bool build_epol
 		dpio_dev->epoll_fd = dpio_epoll_fd;
 	}
 
+	dpio_dev->intr_enabled = 1;
+
 	return 0;
 }
 
+#ifdef RTE_EVENT_DPAA2
 static void dpaa2_dpio_intr_deinit(struct dpaa2_dpio_dev *dpio_dev)
 {
 	int ret;
@@ -274,6 +282,7 @@ static void dpaa2_dpio_intr_deinit(struct dpaa2_dpio_dev *dpio_dev)
 		close(dpio_dev->epoll_fd);
 		dpio_dev->epoll_fd = -1;
 	}
+	dpio_dev->intr_enabled = 0;
 }
 #endif
 
diff --git a/drivers/bus/fslmc/portal/dpaa2_hw_dpio.h b/drivers/bus/fslmc/portal/dpaa2_hw_dpio.h
index 328e1e788a..10dd968e5f 100644
--- a/drivers/bus/fslmc/portal/dpaa2_hw_dpio.h
+++ b/drivers/bus/fslmc/portal/dpaa2_hw_dpio.h
@@ -50,6 +50,10 @@ int dpaa2_affine_qbman_swp(void);
 __rte_internal
 int dpaa2_affine_qbman_ethrx_swp(void);
 
+/* set up a DPIO portal's DQRI interrupt (rx-queue interrupt mode) */
+__rte_internal
+int dpaa2_dpio_intr_init(struct dpaa2_dpio_dev *dpio_dev, bool build_epoll);
+
 /* allocate memory for FQ - dq storage */
 __rte_internal
 int
diff --git a/drivers/bus/fslmc/portal/dpaa2_hw_pvt.h b/drivers/bus/fslmc/portal/dpaa2_hw_pvt.h
index 79a2ec41e3..af75e96b27 100644
--- a/drivers/bus/fslmc/portal/dpaa2_hw_pvt.h
+++ b/drivers/bus/fslmc/portal/dpaa2_hw_pvt.h
@@ -133,6 +133,8 @@ struct dpaa2_dpio_dev {
 	struct rte_intr_handle *intr_handle; /* Interrupt related info */
 	int32_t	epoll_fd; /**< File descriptor created for interrupt polling */
 	int32_t hw_id; /**< An unique ID of this DPIO device instance */
+	uint8_t intr_enabled; /**< DQRI portal interrupt already set up */
+	uint16_t ethrx_intr_refcnt; /**< rx queues currently armed on this portal */
 	struct dpaa2_portal_dqrr dpaa2_held_bufs;
 };
 
@@ -164,6 +166,20 @@ typedef void (dpaa2_queue_cb_dqrr_t)(struct qbman_swp *swp,
 typedef void (dpaa2_queue_cb_eqresp_free_t)(uint16_t eqresp_ci,
 					struct dpaa2_queue *dpaa2_q);
 
+#define DPAA2_NAPI_FD_STASH_SIZE 64	/*!< power of 2; >= 2x rx burst so the
+					 * peer port's frames fit before HW
+					 * backpressure (2 ports/worker)
+					 */
+
+/* Lcore-local FIFO of raw FDs demuxed to this queue by another queue's burst
+ * on the same portal (see dpaa2_queue::napi_stash).
+ */
+struct dpaa2_napi_stash {
+	uint16_t head;	/*!< pop index (drain) */
+	uint16_t tail;	/*!< push index (park) */
+	struct qbman_fd fd[DPAA2_NAPI_FD_STASH_SIZE];
+};
+
 struct __rte_cache_aligned dpaa2_queue {
 	struct rte_mempool *mb_pool; /**< mbuf pool to populate RX ring. */
 	union {
@@ -176,7 +192,7 @@ struct __rte_cache_aligned dpaa2_queue {
 	uint8_t cgid;		/*! < Congestion Group id for this queue */
 	uint64_t rx_pkts;
 	uint64_t tx_pkts;
-	uint64_t err_pkts;
+	uint64_t err_pkts;	/*!< also counts NAPI stash-full drops (imissed) */
 	union {
 		/**Ingress*/
 		struct queue_storage_info_t *q_storage[RTE_MAX_LCORE];
@@ -195,6 +211,15 @@ struct __rte_cache_aligned dpaa2_queue {
 	uint64_t offloads;
 	uint64_t lpbk_cntx;
 	uint8_t data_stashing_off;
+	/* NAPI rx-interrupt: per-queue DPCON bound to this FQ at dev_start
+	 * (DEST_DPCON, static); the polling worker subscribes its ethrx portal
+	 * to the channel and arms the DQRI, rx_dqrr drains+demuxes by fqd_ctx.
+	 */
+	struct dpaa2_dpcon_dev *napi_dpcon;	/*!< notif channel, NULL = napi off */
+	RTE_ATOMIC(struct dpaa2_dpio_dev *) napi_sub_dpio;	/*!< subscribed portal or NULL */
+	uint8_t napi_channel_index;		/*!< portal-local static-dequeue idx */
+	uint8_t napi_armed;			/*!< this queue requests DQRI wakeups */
+	struct dpaa2_napi_stash napi_stash;	/*!< NAPI/DQRR demux FDs (~2 KB) */
 };
 
 struct swp_active_dqs {
diff --git a/drivers/bus/fslmc/qbman/qbman_portal.c b/drivers/bus/fslmc/qbman/qbman_portal.c
index 84853924e7..947415363a 100644
--- a/drivers/bus/fslmc/qbman/qbman_portal.c
+++ b/drivers/bus/fslmc/qbman/qbman_portal.c
@@ -448,6 +448,7 @@ int qbman_swp_interrupt_get_inhibit(struct qbman_swp *p)
 	return qbman_cinh_read(&p->sys, QBMAN_CINH_SWP_IIR);
 }
 
+RTE_EXPORT_INTERNAL_SYMBOL(qbman_swp_interrupt_set_inhibit)
 void qbman_swp_interrupt_set_inhibit(struct qbman_swp *p, int inhibit)
 {
 	qbman_cinh_write(&p->sys, QBMAN_CINH_SWP_IIR,
diff --git a/drivers/net/dpaa2/dpaa2_ethdev.c b/drivers/net/dpaa2/dpaa2_ethdev.c
index 8589398324..6407c24755 100644
--- a/drivers/net/dpaa2/dpaa2_ethdev.c
+++ b/drivers/net/dpaa2/dpaa2_ethdev.c
@@ -658,6 +658,8 @@ dpaa2_clear_queue_active_dps(struct dpaa2_queue *q, int num_lcores)
 	}
 }
 
+static void dpaa2_dev_rx_queue_intr_unbind(struct dpaa2_queue *dpaa2_q);
+
 static void
 dpaa2_free_rx_tx_queues(struct rte_eth_dev *dev)
 {
@@ -675,6 +677,12 @@ dpaa2_free_rx_tx_queues(struct rte_eth_dev *dev)
 		/* cleaning up queue storage */
 		for (i = 0; i < priv->nb_rx_queues; i++) {
 			dpaa2_q = priv->rx_vq[i];
+			if (dpaa2_q->napi_dpcon) {	/* release the rx-intr channel */
+				dpaa2_dev_rx_queue_intr_unbind(dpaa2_q);
+				rte_dpaa2_free_dpcon_dev(dpaa2_q->napi_dpcon);
+				dpaa2_q->napi_dpcon = NULL;
+				dpaa2_q->napi_sub_dpio = NULL;
+			}
 			dpaa2_clear_queue_active_dps(dpaa2_q,
 						RTE_MAX_LCORE);
 			dpaa2_queue_storage_free(dpaa2_q,
@@ -880,6 +888,21 @@ dpaa2_eth_dev_configure(struct rte_eth_dev *dev)
 		}
 	}
 
+	if (dev->data->dev_conf.intr_conf.rxq) {
+		if (!dev->intr_handle)
+			dev->intr_handle = rte_intr_instance_alloc(
+					RTE_INTR_INSTANCE_F_PRIVATE);
+		if (!dev->intr_handle ||
+		    rte_intr_vec_list_alloc(dev->intr_handle, "rxq_intr",
+				dev->data->nb_rx_queues) ||
+		    rte_intr_nb_efd_set(dev->intr_handle,
+				dev->data->nb_rx_queues) ||
+		    rte_intr_type_set(dev->intr_handle, RTE_INTR_HANDLE_EXT)) {
+			DPAA2_PMD_ERR("Failed to set up rx-queue interrupts");
+			return -rte_errno;
+		}
+	}
+
 	dpaa2_tm_init(dev);
 
 	return 0;
@@ -898,6 +921,7 @@ dpaa2_dev_rx_queue_setup(struct rte_eth_dev *dev,
 {
 	struct dpaa2_dev_priv *priv = dev->data->dev_private;
 	struct fsl_mc_io *dpni = dev->process_private;
+	bool dpcon_allocated = false;
 	struct dpaa2_queue *dpaa2_q;
 	struct dpni_queue cfg;
 	uint8_t options = 0;
@@ -938,6 +962,21 @@ dpaa2_dev_rx_queue_setup(struct rte_eth_dev *dev,
 	dpaa2_q->bp_array = rte_dpaa2_bpid_info;
 	dpaa2_q->offloads = rx_conf->offloads;
 
+	/* NAPI: grab a DPCON channel so dev_start can bind this FQ statically.
+	 * The DQRR burst replaces the poll path for every queue at once, so a
+	 * missing channel is fatal rather than a silent per-queue fallback.
+	 */
+	dpaa2_q->napi_sub_dpio = NULL;
+	if (dev->data->dev_conf.intr_conf.rxq && !dpaa2_q->napi_dpcon) {
+		dpaa2_q->napi_dpcon = rte_dpaa2_alloc_dpcon_dev();
+		if (!dpaa2_q->napi_dpcon) {
+			DPAA2_PMD_ERR("rxq %d: no DPCON for rx-queue interrupts",
+				      rx_queue_id);
+			return -ENODEV;
+		}
+		dpcon_allocated = true;
+	}
+
 	/*Get the flow id from given VQ id*/
 	flow_id = dpaa2_q->flow_id;
 	memset(&cfg, 0, sizeof(struct dpni_queue));
@@ -945,6 +984,10 @@ dpaa2_dev_rx_queue_setup(struct rte_eth_dev *dev,
 	options = options | DPNI_QUEUE_OPT_USER_CTX;
 	cfg.user_context = (size_t)(dpaa2_q);
 
+	/* clear any stale DPIO dest left scheduled by a prior rx-intr run */
+	options |= DPNI_QUEUE_OPT_DEST;
+	cfg.destination.type = DPNI_DEST_NONE;
+
 	/* check if a private cgr available. */
 	for (i = 0; i < priv->max_cgs; i++) {
 		if (!priv->cgid_in_use[i]) {
@@ -985,7 +1028,7 @@ dpaa2_dev_rx_queue_setup(struct rte_eth_dev *dev,
 			dpaa2_q->tc_index, flow_id, options, &cfg);
 	if (ret) {
 		DPAA2_PMD_ERR("Error in setting the rx flow: = %d", ret);
-		return ret;
+		goto err_free_dpcon;
 	}
 
 	dpaa2_q->nb_desc = nb_rx_desc;
@@ -1026,7 +1069,7 @@ dpaa2_dev_rx_queue_setup(struct rte_eth_dev *dev,
 		if (ret) {
 			DPAA2_PMD_ERR("Error in setting taildrop. err=(%d)",
 				ret);
-			return ret;
+			goto err_free_dpcon;
 		}
 	} else { /* Disable tail Drop */
 		struct dpni_taildrop taildrop = {0};
@@ -1046,12 +1089,22 @@ dpaa2_dev_rx_queue_setup(struct rte_eth_dev *dev,
 		if (ret) {
 			DPAA2_PMD_ERR("Error in setting taildrop. err=(%d)",
 				ret);
-			return ret;
+			goto err_free_dpcon;
 		}
 	}
 
 	dev->data->rx_queues[rx_queue_id] = dpaa2_q;
 	return 0;
+
+err_free_dpcon:
+	/* free only the DPCON this call allocated; a pre-existing one belongs to
+	 * an earlier setup and is released at dev_close
+	 */
+	if (dpcon_allocated) {
+		rte_dpaa2_free_dpcon_dev(dpaa2_q->napi_dpcon);
+		dpaa2_q->napi_dpcon = NULL;
+	}
+	return ret;
 }
 
 static int
@@ -1210,6 +1263,62 @@ dpaa2_dev_tx_queue_setup(struct rte_eth_dev *dev,
 	return 0;
 }
 
+/* Fully release a queue's rx-interrupt state: detach the FQ from its DPCON,
+ * unbind the static dequeue channel from the portal and free any stashed FDs.
+ * Teardown only: the port is stopped and the portal quiesced; not a runtime
+ * rx_queue_intr_disable() replacement. Call before freeing the DPCON.
+ */
+static void
+dpaa2_dev_rx_queue_intr_unbind(struct dpaa2_queue *dpaa2_q)
+{
+	struct dpaa2_dev_priv *priv;
+	struct dpaa2_dpio_dev *dpio;
+	struct fsl_mc_io *dpni;
+	struct dpni_queue cfg;
+	int ret;
+
+	if (!dpaa2_q || !dpaa2_q->napi_dpcon)
+		return;
+
+	/* detach the FQ from its DPCON so it no longer points at a channel
+	 * about to be returned to the pool (dpni is disabled at teardown)
+	 */
+	priv = dpaa2_q->eth_data->dev_private;
+	dpni = priv->eth_dev->process_private;
+	memset(&cfg, 0, sizeof(cfg));
+	cfg.destination.type = DPNI_DEST_NONE;
+	ret = dpni_set_queue(dpni, CMD_PRI_LOW, priv->token, DPNI_QUEUE_RX,
+			     dpaa2_q->tc_index, dpaa2_q->flow_id,
+			     DPNI_QUEUE_OPT_DEST, &cfg);
+	if (ret)
+		DPAA2_PMD_ERR("napi: DEST_NONE rxq flow %u: %d",
+			      dpaa2_q->flow_id, ret);
+
+	/* unbind the static dequeue channel from the portal it was armed on */
+	dpio = rte_atomic_load_explicit(&dpaa2_q->napi_sub_dpio,
+			rte_memory_order_acquire);
+	if (dpio) {
+		qbman_swp_push_set(dpio->sw_portal,
+				dpaa2_q->napi_channel_index, 0);
+		if (dpaa2_q->napi_armed) {
+			dpaa2_q->napi_armed = 0;
+			if (dpio->ethrx_intr_refcnt > 0 &&
+			    --dpio->ethrx_intr_refcnt == 0)
+				qbman_swp_interrupt_set_inhibit(dpio->sw_portal, 1);
+		}
+		ret = dpio_remove_static_dequeue_channel(dpio->dpio, CMD_PRI_LOW,
+				dpio->token, dpaa2_q->napi_dpcon->dpcon_id);
+		if (ret)
+			DPAA2_PMD_ERR("napi: remove DPCON %d static dequeue channel: %d",
+				      dpaa2_q->napi_dpcon->dpcon_id, ret);
+		rte_atomic_store_explicit(&dpaa2_q->napi_sub_dpio, NULL,
+				rte_memory_order_release);
+	}
+
+	/* free FDs parked for this queue but never drained by a burst */
+	dpaa2_dev_rx_queue_napi_stash_drain(dpaa2_q);
+}
+
 static void
 dpaa2_dev_rx_queue_release(struct rte_eth_dev *dev, uint16_t rx_queue_id)
 {
@@ -1239,6 +1348,12 @@ dpaa2_dev_rx_queue_release(struct rte_eth_dev *dev, uint16_t rx_queue_id)
 		priv->cgid_in_use[dpaa2_q->cgid] = 0;
 		dpaa2_q->cgid = DPAA2_INVALID_CGID;
 	}
+
+	if (dpaa2_q->napi_dpcon) {
+		dpaa2_dev_rx_queue_intr_unbind(dpaa2_q);
+		rte_dpaa2_free_dpcon_dev(dpaa2_q->napi_dpcon);
+		dpaa2_q->napi_dpcon = NULL;
+	}
 }
 
 static int
@@ -1389,6 +1504,36 @@ dpaa2_dev_start(struct rte_eth_dev *dev)
 	intr_handle = dpaa2_dev->intr_handle;
 
 	PMD_INIT_FUNC_TRACE();
+
+	/* NAPI: bind each rx FQ to its own DPCON channel while the dpni is still
+	 * disabled (a DEST set_queue on an enabled dpni wedges the shared MC).
+	 * Static, affinity-free; the polling worker subscribes its portal later.
+	 */
+	if (dev->data->dev_conf.intr_conf.rxq) {
+		for (i = 0; i < data->nb_rx_queues; i++) {
+			dpaa2_q = data->rx_queues[i];
+			if (!dpaa2_q->napi_dpcon)
+				continue;
+			memset(&cfg, 0, sizeof(cfg));
+			cfg.destination.type = DPNI_DEST_DPCON;
+			cfg.destination.id = dpaa2_q->napi_dpcon->dpcon_id;
+			cfg.user_context = (size_t)dpaa2_q;
+			ret = dpni_set_queue(dpni, CMD_PRI_LOW, priv->token,
+					DPNI_QUEUE_RX, dpaa2_q->tc_index,
+					dpaa2_q->flow_id,
+					DPNI_QUEUE_OPT_DEST | DPNI_QUEUE_OPT_USER_CTX,
+					&cfg);
+			if (ret) {
+				DPAA2_PMD_ERR("napi: DPCON bind rxq %d: %d", i, ret);
+				return ret;
+			}
+		}
+		/* DQRR burst for all queues; a queue only yields frames once
+		 * rx_queue_intr_enable() has subscribed its portal
+		 */
+		dev->rx_pkt_burst = dpaa2_dev_rx_dqrr;
+	}
+
 	ret = dpni_enable(dpni, CMD_PRI_LOW, priv->token);
 	if (ret) {
 		DPAA2_PMD_ERR("Failure in enabling dpni %d device: err=%d",
@@ -1859,6 +2004,13 @@ dpaa2_dev_stats_get(struct rte_eth_dev *dev,
 	stats->oerrors = value.page_2.egress_discarded_frames;
 	stats->imissed = value.page_2.ingress_nobuffer_discards;
 
+	/* software Rx drops (full napi stash) are not in the HW counters */
+	for (i = 0; i < priv->nb_rx_queues; i++) {
+		dpaa2_rxq = priv->rx_vq[i];
+		if (dpaa2_rxq != NULL)
+			stats->imissed += dpaa2_rxq->err_pkts;
+	}
+
 	/* Fill in per queue stats */
 	if (qstats != NULL) {
 		for (i = 0; (i < RTE_ETHDEV_QUEUE_STAT_CNTRS) &&
@@ -2172,8 +2324,10 @@ dpaa2_dev_stats_reset(struct rte_eth_dev *dev)
 	/* Reset the per queue stats in dpaa2_queue structure */
 	for (i = 0; i < priv->nb_rx_queues; i++) {
 		dpaa2_q = priv->rx_vq[i];
-		if (dpaa2_q)
+		if (dpaa2_q) {
 			dpaa2_q->rx_pkts = 0;
+			dpaa2_q->err_pkts = 0;
+		}
 	}
 
 	for (i = 0; i < priv->nb_tx_queues; i++) {
@@ -2901,6 +3055,135 @@ rte_pmd_dpaa2_thread_init(void)
 	}
 }
 
+/* Arm rx-queue interrupts on the worker lcore: subscribe its ethrx portal to
+ * the queue's DPCON channel (one-shot per-portal MC) and unmask the portal DQRI
+ * (pure QBMan).
+ *
+ * Affinity is static queue-to-lcore; a lcore may own several rx queues. The
+ * DQRI and the eventfd are portal-wide, so frames are demuxed by fqd_ctx in the
+ * burst and the portal's inhibit bit is reference-counted by the number of its
+ * queues currently armed (ethrx_intr_refcnt) -- disabling one queue must not
+ * mask wakeups still wanted by its siblings. napi_armed and ethrx_intr_refcnt
+ * are plain (not atomic): these ops run on the queue's owner lcore against its
+ * own portal (one portal per lcore), so per-portal isolation keeps them from
+ * racing, not control-plane serialization.
+ *
+ * A re-home reclaims the channel by poking the old portal, so the caller must
+ * have quiesced the previous owner and disabled the queue there; napi_armed is
+ * then 0 and only the new portal is counted.
+ */
+static int
+dpaa2_dev_rx_queue_intr_enable(struct rte_eth_dev *dev, uint16_t queue_id)
+{
+	struct dpaa2_dev_priv *priv = dev->data->dev_private;
+	struct dpaa2_queue *dpaa2_q = priv->rx_vq[queue_id];
+	struct dpaa2_dpio_dev *dpio, *old;
+	int ret;
+
+	if (!dpaa2_q->napi_dpcon)
+		return -ENOTSUP;	/* no channel -> caller keeps polling */
+
+	if (dpaa2_affine_qbman_ethrx_swp())
+		return -EIO;
+	dpio = DPAA2_PER_LCORE_ETHRX_DPIO;
+
+	/* build_epoll=false: the generic ethdev rx-intr API waits on the
+	 * application epoll, not the portal's private one (event PMD only).
+	 */
+	ret = dpaa2_dpio_intr_init(dpio, false);	/* VFIO eventfd, no MC */
+	if (ret)
+		return ret;
+
+	old = rte_atomic_load_explicit(&dpaa2_q->napi_sub_dpio, rte_memory_order_acquire);
+	if (old && old != dpio && dpaa2_q->napi_armed) {
+		DPAA2_PMD_ERR("rxq %d still armed on another portal; disable it first",
+			      queue_id);
+		return -EBUSY;
+	}
+	if (old != dpio) {
+		if (old) {	/* reclaim from old portal (quiesced; QBMan MMIO unsynced) */
+			qbman_swp_push_set(old->sw_portal,
+					dpaa2_q->napi_channel_index, 0);
+			ret = dpio_remove_static_dequeue_channel(old->dpio,
+					CMD_PRI_LOW, old->token,
+					dpaa2_q->napi_dpcon->dpcon_id);
+			/* push_set(0) above already stops the old portal from
+			 * dequeuing; a failed unbind only leaks a static-channel
+			 * slot on the old DPIO, so warn and proceed
+			 */
+			if (ret)
+				DPAA2_PMD_WARN("napi: reclaim rxq %d: %d",
+					       queue_id, ret);
+			/* on no portal until the add below succeeds */
+			rte_atomic_store_explicit(&dpaa2_q->napi_sub_dpio, NULL,
+					rte_memory_order_release);
+		}
+		ret = dpio_add_static_dequeue_channel(dpio->dpio, CMD_PRI_LOW,
+				dpio->token, dpaa2_q->napi_dpcon->dpcon_id,
+				&dpaa2_q->napi_channel_index);
+		if (ret) {
+			DPAA2_PMD_ERR("napi: subscribe rxq %d: %d", queue_id, ret);
+			return ret;
+		}
+		qbman_swp_push_set(dpio->sw_portal,
+				dpaa2_q->napi_channel_index, 1);
+		/* point this queue's eventfd at the portal's DQRI fd so the
+		 * generic rte_eth_dev_rx_intr_ctl_q epoll wakes on it
+		 */
+		if (rte_intr_vec_list_index_set(dev->intr_handle, queue_id, queue_id) ||
+		    rte_intr_efds_index_set(dev->intr_handle, queue_id,
+				rte_intr_fd_get(dpio->intr_handle))) {
+			DPAA2_PMD_ERR("napi: efd wiring rxq %d", queue_id);
+			/* unwind the half-done subscription so HW and driver
+			 * state stay consistent
+			 */
+			qbman_swp_push_set(dpio->sw_portal,
+					dpaa2_q->napi_channel_index, 0);
+			dpio_remove_static_dequeue_channel(dpio->dpio,
+					CMD_PRI_LOW, dpio->token,
+					dpaa2_q->napi_dpcon->dpcon_id);
+			return -EIO;
+		}
+		rte_atomic_store_explicit(&dpaa2_q->napi_sub_dpio, dpio, rte_memory_order_release);
+	}
+
+	/* arm this queue; the portal DQRI is unmasked only on the 0 -> 1 edge
+	 * of its armed-queue count
+	 */
+	if (!dpaa2_q->napi_armed) {
+		dpaa2_q->napi_armed = 1;
+		if (dpio->ethrx_intr_refcnt++ == 0) {
+			qbman_swp_interrupt_clear_status(dpio->sw_portal,
+					0xffffffff);
+			qbman_swp_interrupt_set_inhibit(dpio->sw_portal, 0);
+		}
+	}
+
+	return 0;
+}
+
+/* Disarm rx-queue interrupts for this queue. The portal DQRI is masked only
+ * once the last of its queues disarms; act on the portal the queue is actually
+ * subscribed to, not the caller's current portal.
+ */
+static int
+dpaa2_dev_rx_queue_intr_disable(struct rte_eth_dev *dev, uint16_t queue_id)
+{
+	struct dpaa2_dev_priv *priv = dev->data->dev_private;
+	struct dpaa2_queue *dpaa2_q = priv->rx_vq[queue_id];
+	struct dpaa2_dpio_dev *dpio;
+
+	dpio = rte_atomic_load_explicit(&dpaa2_q->napi_sub_dpio, rte_memory_order_acquire);
+	if (dpio && dpaa2_q->napi_armed) {
+		dpaa2_q->napi_armed = 0;
+		if (dpio->ethrx_intr_refcnt > 0 &&
+		    --dpio->ethrx_intr_refcnt == 0)
+			qbman_swp_interrupt_set_inhibit(dpio->sw_portal, 1);
+	}
+
+	return 0;
+}
+
 static struct eth_dev_ops dpaa2_ethdev_ops = {
 	.dev_configure	  = dpaa2_eth_dev_configure,
 	.dev_start	      = dpaa2_dev_start,
@@ -2929,6 +3212,8 @@ static struct eth_dev_ops dpaa2_ethdev_ops = {
 	.vlan_tpid_set	      = dpaa2_vlan_tpid_set,
 	.rx_queue_setup    = dpaa2_dev_rx_queue_setup,
 	.rx_queue_release  = dpaa2_dev_rx_queue_release,
+	.rx_queue_intr_enable = dpaa2_dev_rx_queue_intr_enable,
+	.rx_queue_intr_disable = dpaa2_dev_rx_queue_intr_disable,
 	.tx_queue_setup    = dpaa2_dev_tx_queue_setup,
 	.rx_burst_mode_get = dpaa2_dev_rx_burst_mode_get,
 	.tx_burst_mode_get = dpaa2_dev_tx_burst_mode_get,
diff --git a/drivers/net/dpaa2/dpaa2_ethdev.h b/drivers/net/dpaa2/dpaa2_ethdev.h
index 3f224c654e..65fb48bd27 100644
--- a/drivers/net/dpaa2/dpaa2_ethdev.h
+++ b/drivers/net/dpaa2/dpaa2_ethdev.h
@@ -500,6 +500,9 @@ uint16_t dpaa2_dev_loopback_rx(void *queue, struct rte_mbuf **bufs,
 
 uint16_t dpaa2_dev_prefetch_rx(void *queue, struct rte_mbuf **bufs,
 			       uint16_t nb_pkts);
+uint16_t dpaa2_dev_rx_dqrr(void *queue, struct rte_mbuf **bufs,
+			   uint16_t nb_pkts);
+void dpaa2_dev_rx_queue_napi_stash_drain(struct dpaa2_queue *dpaa2_q);
 void dpaa2_dev_process_parallel_event(struct qbman_swp *swp,
 				      const struct qbman_fd *fd,
 				      const struct qbman_result *dq,
diff --git a/drivers/net/dpaa2/dpaa2_rxtx.c b/drivers/net/dpaa2/dpaa2_rxtx.c
index b316e23e87..189accc1de 100644
--- a/drivers/net/dpaa2/dpaa2_rxtx.c
+++ b/drivers/net/dpaa2/dpaa2_rxtx.c
@@ -922,6 +922,128 @@ dpaa2_dev_prefetch_rx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
 	return num_rx;
 }
 
+/* Convert a DQRR'd FD (single or scatter-gather) to an mbuf and apply software
+ * VLAN strip, like the poll path.
+ */
+static inline struct rte_mbuf *
+dpaa2_dqrr_fd_to_mbuf(const struct qbman_fd *fd,
+		      struct rte_eth_dev_data *eth_data)
+{
+	struct rte_mbuf *m;
+
+	if (unlikely(DPAA2_FD_GET_FORMAT(fd) == qbman_fd_sg))
+		m = eth_sg_fd_to_mbuf(fd, eth_data->port_id);
+	else
+		m = eth_fd_to_mbuf(fd, eth_data->port_id);
+	if (eth_data->dev_conf.rxmode.offloads & RTE_ETH_RX_OFFLOAD_VLAN_STRIP)
+		rte_vlan_strip(m);
+	return m;
+}
+
+/* prefetch a DQRR'd FD's HW annotation (parse area) ahead of conversion */
+static inline void
+dpaa2_dqrr_prefetch_annot(const struct qbman_fd *fd)
+{
+	rte_prefetch0((void *)((size_t)DPAA2_IOVA_TO_VADDR(DPAA2_GET_FD_ADDR(fd))
+			       + DPAA2_FD_PTA_SIZE));
+}
+
+/* Free FDs a sibling burst parked in this queue's stash but that were never
+ * drained (queue released/freed while the lcore still held its frames).
+ */
+void
+dpaa2_dev_rx_queue_napi_stash_drain(struct dpaa2_queue *dpaa2_q)
+{
+	struct dpaa2_napi_stash *stash = &dpaa2_q->napi_stash;
+	const struct qbman_fd *fd;
+
+	while (stash->head != stash->tail) {
+		fd = &stash->fd[stash->head & (DPAA2_NAPI_FD_STASH_SIZE - 1)];
+		rte_pktmbuf_free(dpaa2_dqrr_fd_to_mbuf(fd, dpaa2_q->eth_data));
+		stash->head++;
+	}
+	stash->head = 0;
+	stash->tail = 0;
+}
+
+/* rx interrupt/DQRR path: the FQ is scheduled to a channel the lcore's ethrx
+ * portal statically dequeues -- a VDQ on a scheduled FQ never completes, so DQRR
+ * is the only model compatible with interrupt sleep. One portal serves every
+ * queue the lcore owns, so the burst demuxes by fqd_ctx: own frames are
+ * returned, foreign ones have their raw FD parked in the target queue's stash.
+ *
+ * The application must therefore poll all queues assigned to the lcore after a
+ * wakeup -- the same scheduling contract as plain DPDK polling. When a foreign
+ * queue's stash is full the FD is dropped (freed) rather than left on the shared
+ * DQRR ring, which would head-of-line block every other queue on the portal.
+ */
+uint16_t __rte_hot
+dpaa2_dev_rx_dqrr(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
+{
+	struct dpaa2_queue *dpaa2_q = queue;
+	struct rte_eth_dev_data *eth_data = dpaa2_q->eth_data;
+	struct dpaa2_napi_stash *stash = &dpaa2_q->napi_stash;
+	const struct qbman_result *dq;
+	const struct qbman_fd *fd;
+	struct dpaa2_queue *rxq;
+	struct qbman_swp *swp;
+	uint16_t num_rx = 0;
+
+	if (unlikely(!DPAA2_PER_LCORE_ETHRX_DPIO)) {
+		if (dpaa2_affine_qbman_ethrx_swp()) {
+			DPAA2_PMD_ERR("Failure in affining portal");
+			return 0;
+		}
+	}
+	swp = DPAA2_PER_LCORE_ETHRX_PORTAL;
+
+	/* our frames parked by another queue's burst -- convert now (hot) */
+	while (num_rx < nb_pkts && stash->head != stash->tail) {
+		fd = &stash->fd[stash->head & (DPAA2_NAPI_FD_STASH_SIZE - 1)];
+		if (dpaa2_svr_family != SVR_LX2160A &&
+		    (uint16_t)(stash->head + 1) != stash->tail)
+			dpaa2_dqrr_prefetch_annot(&stash->fd[(stash->head + 1) &
+					(DPAA2_NAPI_FD_STASH_SIZE - 1)]);
+		bufs[num_rx++] = dpaa2_dqrr_fd_to_mbuf(fd, eth_data);
+		stash->head++;
+	}
+
+	while (num_rx < nb_pkts) {
+		dq = qbman_swp_dqrr_next(swp);
+		if (!dq)
+			break;			/* ring momentarily empty */
+		qbman_swp_prefetch_dqrr_next(swp);
+		fd = qbman_result_DQ_fd(dq);
+		/* parse summary is in the FRC on LX2160A; annotation is HW-stashed */
+		if (dpaa2_svr_family != SVR_LX2160A)
+			dpaa2_dqrr_prefetch_annot(fd);
+		rxq = (struct dpaa2_queue *)(size_t)qbman_result_DQ_fqd_ctx(dq);
+		if (unlikely(!rxq))
+			rxq = dpaa2_q;
+		if (rxq == dpaa2_q) {
+			bufs[num_rx++] = dpaa2_dqrr_fd_to_mbuf(fd, eth_data);
+		} else {
+			struct dpaa2_napi_stash *fs = &rxq->napi_stash;
+
+			if (unlikely((uint16_t)(fs->tail - fs->head) >=
+						DPAA2_NAPI_FD_STASH_SIZE)) {
+				/* stash full: drop rather than leave it on the ring
+				 * and head-of-line block the shared portal
+				 */
+				rte_pktmbuf_free(dpaa2_dqrr_fd_to_mbuf(fd, rxq->eth_data));
+				rxq->err_pkts++;
+			} else {
+				fs->fd[fs->tail & (DPAA2_NAPI_FD_STASH_SIZE - 1)] = *fd;
+				fs->tail++;
+			}
+		}
+		qbman_swp_dqrr_consume(swp, dq);
+	}
+
+	dpaa2_q->rx_pkts += num_rx;
+	return num_rx;
+}
+
 void __rte_hot
 dpaa2_dev_process_parallel_event(struct qbman_swp *swp,
 				 const struct qbman_fd *fd,
-- 
2.43.0


^ permalink raw reply related
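
For readers following the stash logic in the patch above: the head and
tail fields are used as free-running counters that are masked by a
power-of-two size on access. A minimal standalone sketch of that indexing
scheme follows. It assumes 16-bit counters (suggested by the uint16_t
casts in the patch) and a hypothetical size of 8; the struct layout,
names and payload type here are illustrative only and not taken from the
driver.

```c
#include <stdbool.h>
#include <stdint.h>

#define STASH_SIZE 8u	/* hypothetical; must be a power of two */

struct stash {
	uint16_t head;			/* next slot to read, free-running */
	uint16_t tail;			/* next slot to write, free-running */
	uint32_t slot[STASH_SIZE];	/* stand-in for the parked FDs */
};

/* Park one entry; returns false when the stash is full (caller drops). */
static bool stash_push(struct stash *s, uint32_t v)
{
	/* the unsigned difference stays correct across counter wraparound */
	if ((uint16_t)(s->tail - s->head) >= STASH_SIZE)
		return false;
	s->slot[s->tail & (STASH_SIZE - 1)] = v;
	s->tail++;
	return true;
}

/* Take the oldest entry; returns false when the stash is empty. */
static bool stash_pop(struct stash *s, uint32_t *v)
{
	if (s->head == s->tail)
		return false;
	*v = s->slot[s->head & (STASH_SIZE - 1)];
	s->head++;
	return true;
}
```

Because only the difference tail - head is ever compared, the counters
never need an explicit reset while the queue is live.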

* [PATCH 6/9] bus/fslmc/dpio: tune DQRI interrupt coalescing holdoff
From: Maxime Leroy @ 2026-06-11 15:49 UTC (permalink / raw)
  To: hemant.agrawal, sachin.saxena; +Cc: dev, Maxime Leroy
In-Reply-To: <20260611154926.392670-1-maxime@leroys.fr>

The portal DQRI interrupt used a fixed threshold of 3 and a raw 0xFF
timeout. Parameterize dpaa2_dpio_intr_init() with (threshold, timeout) so
each mode supplies its own: the event driver keeps the legacy 3 / 0xFF
and its DPAA2_PORTAL_INTR_THRESHOLD / DPAA2_PORTAL_INTR_TIMEOUT env-var
overrides, while rx-queue interrupts default the threshold to the HW DQRR
ring depth (ring-1, =7 on QBMan >= 4.1) and use a coalescing holdoff in
microseconds, converted to ITP units from the MC-reported QBMan clock
(itp = holdoff_us * clk_MHz / 256, capped at the 12-bit field). The setup
is portal-wide and idempotent, so the first mode to arm a given portal
wins; a portal is normally driven by a single mode.
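
As a worked illustration of the conversion described above, here is a
standalone sketch of the same arithmetic. The function name and the clock
values used with it are assumptions for the example; only the formula,
the 12-bit cap and the 0xFF fallback come from the patch.

```c
#include <stdint.h>

#define ITP_MAX      0xfffu	/* 12-bit ITP field */
#define ITP_FALLBACK 0xffu	/* used when the clock is unknown */

/* Convert a holdoff in microseconds to ITP units of 256 QBMan cycles. */
static uint32_t holdoff_us_to_itp(uint32_t holdoff_us, uint32_t clk_hz)
{
	uint32_t clk_mhz = clk_hz / 1000000u;
	uint64_t itp;

	if (clk_mhz == 0)
		return ITP_FALLBACK;
	itp = ((uint64_t)holdoff_us * clk_mhz) / 256u;
	return itp > ITP_MAX ? ITP_MAX : (uint32_t)itp;
}
```

With a hypothetical 500 MHz clock, the default 100 us gives
100 * 500 / 256 = 195 units, and any holdoff above roughly 2096 us
saturates at 0xfff.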

The net/dpaa2 PMD exposes both rx-queue-interrupt knobs as per-port
devargs: drv_rx_intr_holdoff_us (default 100us) and drv_rx_intr_threshold
(default 0 = ring-1, clamped to [1, ring-1]). Also expose
dpaa2_dpio_intr_deinit() (no longer event-only), and on the intr_init
error paths close the epoll fd and disable the interrupt.
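
The threshold default and clamp can be sketched the same way. This is a
standalone restatement of the rule above, not the driver code; the
function name is hypothetical, and dqrr_size stands for the value that
qbman_swp_dqrr_size() reports (4 or 8).

```c
#include <stdint.h>

/* Resolve the devarg value: 0 means "ring depth - 1", otherwise clamp
 * the request to [1, ring depth - 1].
 */
static int resolve_threshold(uint32_t requested, uint8_t dqrr_size)
{
	int dqrr_max = dqrr_size - 1;
	int threshold = requested ? (int)requested : dqrr_max;

	if (threshold < 1)
		threshold = 1;
	else if (threshold > dqrr_max)
		threshold = dqrr_max;
	return threshold;
}
```

On an 8-entry ring this yields 7 for the default, 4 for
drv_rx_intr_threshold=4, and 7 again for any request above 7.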

Add qbman_swp_dqrr_size() to expose the ring depth.

Signed-off-by: Maxime Leroy <maxime@leroys.fr>
---
 doc/guides/nics/dpaa2.rst                     | 10 +++
 drivers/bus/fslmc/portal/dpaa2_hw_dpio.c      | 72 +++++++++++++------
 drivers/bus/fslmc/portal/dpaa2_hw_dpio.h      | 12 +++-
 .../fslmc/qbman/include/fsl_qbman_portal.h    |  9 +++
 drivers/bus/fslmc/qbman/qbman_portal.c        |  6 ++
 drivers/net/dpaa2/dpaa2_ethdev.c              | 60 +++++++++++++++-
 drivers/net/dpaa2/dpaa2_ethdev.h              |  7 ++
 7 files changed, 151 insertions(+), 25 deletions(-)

diff --git a/doc/guides/nics/dpaa2.rst b/doc/guides/nics/dpaa2.rst
index 2d70bd0ab9..47a52c9287 100644
--- a/doc/guides/nics/dpaa2.rst
+++ b/doc/guides/nics/dpaa2.rst
@@ -492,6 +492,16 @@ for details.
   packets, so that user can check what is wrong with those packets.
   e.g. ``fslmc:dpni.1,drv_error_queue=1``
 
+* Use dev arg option ``drv_rx_intr_holdoff_us=<uint32>`` to set the Rx queue
+  interrupt coalescing holdoff in microseconds (default 100). Only applies in
+  Rx queue interrupt mode.
+  e.g. ``fslmc:dpni.1,drv_rx_intr_holdoff_us=50``
+
+* Use dev arg option ``drv_rx_intr_threshold=<uint32>`` to set the Rx queue
+  interrupt coalescing frame threshold; 0 (default) means the DQRR ring depth
+  minus one.
+  e.g. ``fslmc:dpni.1,drv_rx_intr_threshold=4``
+
 Enabling logs
 -------------
 
diff --git a/drivers/bus/fslmc/portal/dpaa2_hw_dpio.c b/drivers/bus/fslmc/portal/dpaa2_hw_dpio.c
index e6b4e74b3b..c5525a94fa 100644
--- a/drivers/bus/fslmc/portal/dpaa2_hw_dpio.c
+++ b/drivers/bus/fslmc/portal/dpaa2_hw_dpio.c
@@ -206,12 +206,35 @@ dpaa2_affine_dpio_intr_to_respective_core(int32_t dpio_id, int cpu_id)
 }
 #endif /* RTE_EVENT_DPAA2 */
 
+/* holdoff (us) -> QBMan ITP units (256 cycles each), capped at the 12-bit field */
+RTE_EXPORT_INTERNAL_SYMBOL(dpaa2_dpio_holdoff_to_itp)
+int dpaa2_dpio_holdoff_to_itp(struct dpaa2_dpio_dev *dpio_dev, uint32_t holdoff_us)
+{
+	uint32_t qman_mhz = 0;
+	struct dpio_attr attr;
+	uint64_t itp;
+
+	if (dpio_get_attributes(dpio_dev->dpio, CMD_PRI_LOW, dpio_dev->token, &attr) == 0)
+		qman_mhz = attr.clk / 1000000;
+	itp = qman_mhz ? ((uint64_t)holdoff_us * qman_mhz) / 256 : 0xFF;
+	if (itp > 0xfff)	/* 12-bit ITP field */
+		itp = 0xfff;
+
+	return (int)itp;
+}
+
+/* threshold: DQRR fill raising DQRI (< ring depth); timeout: holdoff in ITP units.
+ * Per-mode values from the caller (eventdev vs rx-queue intr); no env override.
+ * The DQRI config is portal-wide and this is idempotent: the first caller to
+ * arm a portal wins, a later caller's values are ignored (a portal normally
+ * serves a single mode).
+ */
 RTE_EXPORT_INTERNAL_SYMBOL(dpaa2_dpio_intr_init)
-int dpaa2_dpio_intr_init(struct dpaa2_dpio_dev *dpio_dev, bool build_epoll)
+int dpaa2_dpio_intr_init(struct dpaa2_dpio_dev *dpio_dev, int threshold,
+			 int timeout, bool build_epoll)
 {
-	struct epoll_event epoll_ev;
 	int eventfd, dpio_epoll_fd, ret;
-	int threshold = 0x3, timeout = 0xFF;
+	struct epoll_event epoll_ev;
 
 	if (dpio_dev->intr_enabled)
 		return 0;
@@ -222,12 +245,6 @@ int dpaa2_dpio_intr_init(struct dpaa2_dpio_dev *dpio_dev, bool build_epoll)
 		return -1;
 	}
 
-	if (getenv("DPAA2_PORTAL_INTR_THRESHOLD"))
-		threshold = atoi(getenv("DPAA2_PORTAL_INTR_THRESHOLD"));
-
-	if (getenv("DPAA2_PORTAL_INTR_TIMEOUT"))
-		sscanf(getenv("DPAA2_PORTAL_INTR_TIMEOUT"), "%x", &timeout);
-
 	qbman_swp_interrupt_set_trigger(dpio_dev->sw_portal,
 					QBMAN_SWP_INTERRUPT_DQRI);
 	qbman_swp_interrupt_clear_status(dpio_dev->sw_portal, 0xffffffff);
@@ -238,9 +255,9 @@ int dpaa2_dpio_intr_init(struct dpaa2_dpio_dev *dpio_dev, bool build_epoll)
 	dpio_dev->epoll_fd = -1;
 
 	/* The event PMD dequeues by sleeping on a private epoll instance owned
-	 * by the portal, so build it here. A caller that waits on another
-	 * epoll (the net rx-queue-interrupt path uses the application's) skips
-	 * this.
+	 * by the portal, so build it here. The net rx-queue-interrupt path
+	 * exposes the raw eventfd through the generic ethdev API and waits on
+	 * the application's own epoll instead, so it skips this.
 	 */
 	if (build_epoll) {
 		dpio_epoll_fd = epoll_create(1);
@@ -269,11 +286,14 @@ int dpaa2_dpio_intr_init(struct dpaa2_dpio_dev *dpio_dev, bool build_epoll)
 	return 0;
 }
 
-#ifdef RTE_EVENT_DPAA2
-static void dpaa2_dpio_intr_deinit(struct dpaa2_dpio_dev *dpio_dev)
+RTE_EXPORT_INTERNAL_SYMBOL(dpaa2_dpio_intr_deinit)
+void dpaa2_dpio_intr_deinit(struct dpaa2_dpio_dev *dpio_dev)
 {
 	int ret;
 
+	if (!dpio_dev->intr_enabled)
+		return;
+
 	ret = rte_dpaa2_intr_disable(dpio_dev->intr_handle, 0);
 	if (ret)
 		DPAA2_BUS_ERR("DPIO interrupt disable failed");
@@ -284,7 +304,6 @@ static void dpaa2_dpio_intr_deinit(struct dpaa2_dpio_dev *dpio_dev)
 	}
 	dpio_dev->intr_enabled = 0;
 }
-#endif
 
 static int
 dpaa2_configure_stashing(struct dpaa2_dpio_dev *dpio_dev, int cpu_id)
@@ -306,9 +325,18 @@ dpaa2_configure_stashing(struct dpaa2_dpio_dev *dpio_dev, int cpu_id)
 	}
 
 #ifdef RTE_EVENT_DPAA2
-	if (dpaa2_dpio_intr_init(dpio_dev, true)) {
-		DPAA2_BUS_ERR("Interrupt registration failed for dpio");
-		return -1;
+	{
+		int threshold = 3, timeout = 0xFF;
+
+		if (getenv("DPAA2_PORTAL_INTR_THRESHOLD"))
+			threshold = atoi(getenv("DPAA2_PORTAL_INTR_THRESHOLD"));
+		if (getenv("DPAA2_PORTAL_INTR_TIMEOUT"))
+			sscanf(getenv("DPAA2_PORTAL_INTR_TIMEOUT"), "%x", &timeout);
+
+		if (dpaa2_dpio_intr_init(dpio_dev, threshold, timeout, true)) {
+			DPAA2_BUS_ERR("Interrupt registration failed for dpio");
+			return -1;
+		}
 	}
 	dpaa2_affine_dpio_intr_to_respective_core(dpio_dev->hw_id, cpu_id);
 #endif
@@ -319,9 +347,11 @@ dpaa2_configure_stashing(struct dpaa2_dpio_dev *dpio_dev, int cpu_id)
 static void dpaa2_put_qbman_swp(struct dpaa2_dpio_dev *dpio_dev)
 {
 	if (dpio_dev) {
-#ifdef RTE_EVENT_DPAA2
+		/* rx-queue interrupts (net PMD) can arm a portal without the
+		 * event driver; tear it down unconditionally. Safe when never
+		 * armed: intr_deinit returns early if intr is not enabled.
+		 */
 		dpaa2_dpio_intr_deinit(dpio_dev);
-#endif
 		rte_atomic16_clear(&dpio_dev->ref_count);
 	}
 }
@@ -512,6 +542,8 @@ dpaa2_create_dpio_device(int vdev_fd,
 		goto err;
 	}
 
+	DPAA2_BUS_DEBUG("QBMAN clk = %u Hz (%u MHz)", attr.clk, attr.clk / 1000000);
+
 	/* find the SoC type for the first time */
 	if (!dpaa2_svr_family) {
 		struct mc_soc_version mc_plat_info = {0};
diff --git a/drivers/bus/fslmc/portal/dpaa2_hw_dpio.h b/drivers/bus/fslmc/portal/dpaa2_hw_dpio.h
index 10dd968e5f..090fa14410 100644
--- a/drivers/bus/fslmc/portal/dpaa2_hw_dpio.h
+++ b/drivers/bus/fslmc/portal/dpaa2_hw_dpio.h
@@ -50,9 +50,17 @@ int dpaa2_affine_qbman_swp(void);
 __rte_internal
 int dpaa2_affine_qbman_ethrx_swp(void);
 
-/* set up a DPIO portal's DQRI interrupt (rx-queue interrupt mode) */
+/* set up / tear down a DPIO portal's DQRI interrupt (rx-queue interrupt mode) */
 __rte_internal
-int dpaa2_dpio_intr_init(struct dpaa2_dpio_dev *dpio_dev, bool build_epoll);
+int dpaa2_dpio_intr_init(struct dpaa2_dpio_dev *dpio_dev, int threshold,
+			 int timeout, bool build_epoll);
+
+__rte_internal
+void dpaa2_dpio_intr_deinit(struct dpaa2_dpio_dev *dpio_dev);
+
+/* convert a coalescing holdoff (microseconds) to QBMan ITP units */
+__rte_internal
+int dpaa2_dpio_holdoff_to_itp(struct dpaa2_dpio_dev *dpio_dev, uint32_t holdoff_us);
 
 /* allocate memory for FQ - dq storage */
 __rte_internal
diff --git a/drivers/bus/fslmc/qbman/include/fsl_qbman_portal.h b/drivers/bus/fslmc/qbman/include/fsl_qbman_portal.h
index 5375ea386d..842ef6f067 100644
--- a/drivers/bus/fslmc/qbman/include/fsl_qbman_portal.h
+++ b/drivers/bus/fslmc/qbman/include/fsl_qbman_portal.h
@@ -157,6 +157,15 @@ uint32_t qbman_swp_intr_timeout_read_status(struct qbman_swp *p);
  */
 void qbman_swp_intr_timeout_write(struct qbman_swp *p, uint32_t mask);
 
+/**
+ * qbman_swp_dqrr_size() - Get the HW DQRR ring depth of a software portal.
+ * @p: the given software portal object.
+ *
+ * Returns the number of DQRR entries (4 on QBMan < 4.1, 8 on >= 4.1). Useful
+ * as the upper bound for the DQRR interrupt coalescing threshold.
+ */
+uint8_t qbman_swp_dqrr_size(struct qbman_swp *p);
+
 /**
  * qbman_swp_interrupt_get_trigger() - Get the data in software portal
  * interrupt enable register.
diff --git a/drivers/bus/fslmc/qbman/qbman_portal.c b/drivers/bus/fslmc/qbman/qbman_portal.c
index 947415363a..81c2d87e0a 100644
--- a/drivers/bus/fslmc/qbman/qbman_portal.c
+++ b/drivers/bus/fslmc/qbman/qbman_portal.c
@@ -433,6 +433,12 @@ void qbman_swp_intr_timeout_write(struct qbman_swp *p, uint32_t mask)
 	qbman_cinh_write(&p->sys, QBMAN_CINH_SWP_ITPR, mask);
 }
 
+RTE_EXPORT_INTERNAL_SYMBOL(qbman_swp_dqrr_size)
+uint8_t qbman_swp_dqrr_size(struct qbman_swp *p)
+{
+	return p->dqrr.dqrr_size;
+}
+
 uint32_t qbman_swp_interrupt_get_trigger(struct qbman_swp *p)
 {
 	return qbman_cinh_read(&p->sys, QBMAN_CINH_SWP_IER);
diff --git a/drivers/net/dpaa2/dpaa2_ethdev.c b/drivers/net/dpaa2/dpaa2_ethdev.c
index 6407c24755..7ca454eaae 100644
--- a/drivers/net/dpaa2/dpaa2_ethdev.c
+++ b/drivers/net/dpaa2/dpaa2_ethdev.c
@@ -36,6 +36,9 @@
 #define DRIVER_ERROR_QUEUE  "drv_err_queue"
 #define DRIVER_NO_TAILDROP  "drv_no_taildrop"
 #define DRIVER_NO_DATA_STASHING "drv_no_data_stashing"
+#define DRIVER_RX_INTR_HOLDOFF_US "drv_rx_intr_holdoff_us"
+#define DPAA2_RX_INTR_HOLDOFF_US_DEF 100
+#define DRIVER_RX_INTR_THRESHOLD "drv_rx_intr_threshold"
 #define CHECK_INTERVAL         100  /* 100ms */
 #define MAX_REPEAT_TIME        90   /* 9s (90 * 100ms) in total */
 
@@ -3078,7 +3081,7 @@ dpaa2_dev_rx_queue_intr_enable(struct rte_eth_dev *dev, uint16_t queue_id)
 	struct dpaa2_dev_priv *priv = dev->data->dev_private;
 	struct dpaa2_queue *dpaa2_q = priv->rx_vq[queue_id];
 	struct dpaa2_dpio_dev *dpio, *old;
-	int ret;
+	int ret, threshold, timeout, dqrr_max;
 
 	if (!dpaa2_q->napi_dpcon)
 		return -ENOTSUP;	/* no channel -> caller keeps polling */
@@ -3087,10 +3090,22 @@ dpaa2_dev_rx_queue_intr_enable(struct rte_eth_dev *dev, uint16_t queue_id)
 		return -EIO;
 	dpio = DPAA2_PER_LCORE_ETHRX_DPIO;
 
+	/* threshold from drv_rx_intr_threshold (0 = ring-1), holdoff from
+	 * drv_rx_intr_holdoff_us. idempotent: no-op if the dpio is already
+	 * armed (e.g. event driver)
+	 */
+	dqrr_max = qbman_swp_dqrr_size(dpio->sw_portal) - 1;
+	threshold = priv->rx_intr_threshold ? (int)priv->rx_intr_threshold : dqrr_max;
+	if (threshold < 1 || threshold > dqrr_max) {
+		DPAA2_PMD_WARN("drv_rx_intr_threshold %d out of [1, %d], clamping",
+			       threshold, dqrr_max);
+		threshold = threshold < 1 ? 1 : dqrr_max;
+	}
+	timeout = dpaa2_dpio_holdoff_to_itp(dpio, priv->rx_intr_holdoff_us);
 	/* build_epoll=false: the generic ethdev rx-intr API waits on the
 	 * application epoll, not the portal's private one (event PMD only).
 	 */
-	ret = dpaa2_dpio_intr_init(dpio, false);	/* VFIO eventfd, no MC */
+	ret = dpaa2_dpio_intr_init(dpio, threshold, timeout, false);
 	if (ret)
 		return ret;
 
@@ -3346,6 +3361,35 @@ dpaa2_get_devargs(struct rte_devargs *devargs, const char *key)
 	return 1;
 }
 
+static int
+u32_devarg_handler(__rte_unused const char *key, const char *value, void *opaque)
+{
+	char *end;
+	unsigned long v = strtoul(value, &end, 0);
+
+	if (*value == '\0' || *end != '\0' || v > UINT32_MAX)
+		return -1;
+	*(uint32_t *)opaque = (uint32_t)v;
+
+	return 0;
+}
+
+/* Read a u32-valued devarg into *out, leaving *out untouched if absent. */
+static void
+dpaa2_get_devargs_u32(struct rte_devargs *devargs, const char *key, uint32_t *out)
+{
+	struct rte_kvargs *kvlist;
+
+	if (!devargs)
+		return;
+	kvlist = rte_kvargs_parse(devargs->args, NULL);
+	if (!kvlist)
+		return;
+	if (rte_kvargs_count(kvlist, key))
+		rte_kvargs_process(kvlist, key, u32_devarg_handler, out);
+	rte_kvargs_free(kvlist);
+}
+
 static int
 dpaa2_dev_init(struct rte_eth_dev *eth_dev)
 {
@@ -3373,6 +3417,14 @@ dpaa2_dev_init(struct rte_eth_dev *eth_dev)
 		DPAA2_PMD_INFO("No RX prefetch mode");
 	}
 
+	priv->rx_intr_holdoff_us = DPAA2_RX_INTR_HOLDOFF_US_DEF;
+	dpaa2_get_devargs_u32(dev->devargs, DRIVER_RX_INTR_HOLDOFF_US,
+			      &priv->rx_intr_holdoff_us);
+
+	priv->rx_intr_threshold = 0;
+	dpaa2_get_devargs_u32(dev->devargs, DRIVER_RX_INTR_THRESHOLD,
+			      &priv->rx_intr_threshold);
+
 	if (dpaa2_get_devargs(dev->devargs, DRIVER_LOOPBACK_MODE)) {
 		priv->flags |= DPAA2_RX_LOOPBACK_MODE;
 		DPAA2_PMD_INFO("Rx loopback mode");
@@ -3888,5 +3940,7 @@ RTE_PMD_REGISTER_PARAM_STRING(NET_DPAA2_PMD_DRIVER_NAME,
 		DRIVER_RX_PARSE_ERR_DROP "=<int>"
 		DRIVER_ERROR_QUEUE "=<int>"
 		DRIVER_NO_TAILDROP "=<int>"
-		DRIVER_NO_DATA_STASHING "=<int>");
+		DRIVER_NO_DATA_STASHING "=<int> "
+		DRIVER_RX_INTR_HOLDOFF_US "=<uint32> "
+		DRIVER_RX_INTR_THRESHOLD "=<uint32>");
 RTE_LOG_REGISTER_DEFAULT(dpaa2_logtype_pmd, NOTICE);
diff --git a/drivers/net/dpaa2/dpaa2_ethdev.h b/drivers/net/dpaa2/dpaa2_ethdev.h
index 65fb48bd27..d8be1f8bce 100644
--- a/drivers/net/dpaa2/dpaa2_ethdev.h
+++ b/drivers/net/dpaa2/dpaa2_ethdev.h
@@ -412,6 +412,13 @@ struct dpaa2_dev_priv {
 	uint8_t max_cgs;
 	uint8_t cgid_in_use[MAX_RX_QUEUES];
 
+	/* DQRI holdoff (us) for rx-queue interrupts (drv_rx_intr_holdoff_us) */
+	uint32_t rx_intr_holdoff_us;
+	/* DQRI threshold for rx-queue interrupts (drv_rx_intr_threshold);
+	 * 0 = auto (DQRR ring depth - 1)
+	 */
+	uint32_t rx_intr_threshold;
+
 	/* Current hash distribution size per RX TC, written by
 	 * dpaa2_setup_flow_dist_size() and read by reta_query / reta_update.
 	 * Zero means "use default" (= nb_rx_queues clamped to dist_queues).
-- 
2.43.0


^ permalink raw reply related

* [PATCH 7/9] net/dpaa2: fix Rx queue count for primary process
From: Maxime Leroy @ 2026-06-11 15:49 UTC (permalink / raw)
  To: hemant.agrawal, sachin.saxena
  Cc: dev, Maxime Leroy, stable, Ferruh Yigit, Andrew Rybchenko,
	David Marchand
In-Reply-To: <20260611154926.392670-1-maxime@leroys.fr>

The rx_queue_count callback was only assigned on the secondary process
path of dpaa2_dev_init(), leaving eth_dev->rx_queue_count NULL for the
primary process. The fast-path rte_eth_rx_queue_count() performs an
unguarded indirect call in non-debug builds, so invoking it on a
primary-process dpaa2 port dereferences a NULL function pointer and
crashes.

Assign the callback once before the process-type split so both the
primary and secondary paths set it.

Fixes: cbfc6111b557 ("ethdev: move inline device operations")
Cc: stable@dpdk.org
Signed-off-by: Maxime Leroy <maxime@leroys.fr>
---
 drivers/net/dpaa2/dpaa2_ethdev.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/dpaa2/dpaa2_ethdev.c b/drivers/net/dpaa2/dpaa2_ethdev.c
index 7ca454eaae..fb117e761f 100644
--- a/drivers/net/dpaa2/dpaa2_ethdev.c
+++ b/drivers/net/dpaa2/dpaa2_ethdev.c
@@ -3617,6 +3617,7 @@ dpaa2_dev_init(struct rte_eth_dev *eth_dev)
 	}
 
 	eth_dev->dev_ops = &dpaa2_ethdev_ops;
+	eth_dev->rx_queue_count = dpaa2_dev_rx_queue_count;
 
 	if (dpaa2_get_devargs(dev->devargs, DRIVER_LOOPBACK_MODE)) {
 		eth_dev->rx_pkt_burst = dpaa2_dev_loopback_rx;
-- 
2.43.0


^ permalink raw reply related

* [PATCH 8/9] ethdev: keep fast-path ops valid after port stop
From: Maxime Leroy @ 2026-06-11 15:49 UTC (permalink / raw)
  To: hemant.agrawal, sachin.saxena
  Cc: dev, Maxime Leroy, stable, Thomas Monjalon, Andrew Rybchenko,
	Morten Brørup, Sunil Kumar Kori
In-Reply-To: <20260611154926.392670-1-maxime@leroys.fr>

eth_dev_fp_ops_reset() restores a port's fast-path ops on stop/release
via a compound literal, so every field it omits is zeroed to NULL. It
sets only rx_pkt_burst/tx_pkt_burst (and the rxq/txq data), leaving
rx_queue_count, tx_queue_count, rx/tx_descriptor_status, tx_pkt_prepare
and the recycle callbacks NULL.

In non-debug builds these ops are reached through an unguarded indirect
call (the NULL check exists only under RTE_ETHDEV_DEBUG_RX/TX). So a
thread calling e.g. rte_eth_rx_queue_count() on a port being stopped
dereferences NULL and crashes, while the same race on rte_eth_rx_burst()
is harmless because the burst ops are reset to dummies. A poll-mode
worker re-checking rx_queue_count before arming the Rx interrupt and
sleeping hits exactly this.

Reset these ops to the same dummies eth_dev_set_dummy_fops() installs,
so a stopped port behaves like a freshly allocated one: every fast-path
op is a safe no-op, none is NULL.
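
To illustrate the C rule this fix relies on: assigning a compound literal
replaces the whole struct, and every member the literal does not name is
zero-initialized. The sketch below uses a made-up two-member ops struct
and made-up dummies, not the real rte_eth_fp_ops.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint16_t (*burst_fn)(void *q);
typedef int (*count_fn)(void *q);

/* hypothetical, cut-down stand-in for a fast-path ops table */
struct fp_ops {
	burst_fn rx_burst;
	count_fn rx_queue_count;
};

static uint16_t dummy_burst(void *q) { (void)q; return 0; }
static int dummy_count(void *q) { (void)q; return -1; /* made-up error */ }

/* Names only one member: rx_queue_count is implicitly set to NULL. */
static void reset_partial(struct fp_ops *o)
{
	*o = (struct fp_ops){ .rx_burst = dummy_burst };
}

/* Names every member: nothing is left NULL after the reset. */
static void reset_full(struct fp_ops *o)
{
	*o = (struct fp_ops){
		.rx_burst = dummy_burst,
		.rx_queue_count = dummy_count,
	};
}
```

After reset_partial() an unguarded o->rx_queue_count(q) is a call through
NULL; after reset_full() it is a harmless dummy call.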

Fixes: 066f3d9cc21c ("ethdev: remove callback checks from fast path")
Cc: stable@dpdk.org
Signed-off-by: Maxime Leroy <maxime@leroys.fr>
---
 lib/ethdev/ethdev_private.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/lib/ethdev/ethdev_private.c b/lib/ethdev/ethdev_private.c
index 72a0723846..75ea3eedff 100644
--- a/lib/ethdev/ethdev_private.c
+++ b/lib/ethdev/ethdev_private.c
@@ -263,6 +263,13 @@ eth_dev_fp_ops_reset(struct rte_eth_fp_ops *fpo)
 	*fpo = (struct rte_eth_fp_ops) {
 		.rx_pkt_burst = dummy_eth_rx_burst,
 		.tx_pkt_burst = dummy_eth_tx_burst,
+		.tx_pkt_prepare = rte_eth_tx_pkt_prepare_dummy,
+		.rx_queue_count = rte_eth_queue_count_dummy,
+		.tx_queue_count = rte_eth_queue_count_dummy,
+		.rx_descriptor_status = rte_eth_descriptor_status_dummy,
+		.tx_descriptor_status = rte_eth_descriptor_status_dummy,
+		.recycle_tx_mbufs_reuse = rte_eth_recycle_tx_mbufs_reuse_dummy,
+		.recycle_rx_descriptors_refill = rte_eth_recycle_rx_descriptors_refill_dummy,
 		.rxq = {
 			.data = (void **)&dummy_queues_array[port_id],
 			.clbk = dummy_data,
-- 
2.43.0


^ permalink raw reply related

* [PATCH 9/9] net/dpaa2: drop the fake software VLAN strip offload
From: Maxime Leroy @ 2026-06-11 15:49 UTC (permalink / raw)
  To: hemant.agrawal, sachin.saxena; +Cc: dev, Maxime Leroy
In-Reply-To: <20260611154926.392670-1-maxime@leroys.fr>

RTE_ETH_RX_OFFLOAD_VLAN_STRIP is advertised, but no hardware VLAN strip
backs it: when enabled, the Rx burst calls rte_vlan_strip() on every
frame, a software op masquerading as a hardware offload.

It saves a forwarding application nothing: the datapath reads the L2
header anyway to classify or strip. The offload does not remove that
read, it relocates it into the driver Rx burst, where it is far more
expensive.

The cost is a matter of timing. rte_vlan_strip() reaches the L2 header
through rte_pktmbuf_mtod(), which dereferences mbuf->buf_addr. On a
freshly recycled buffer that mbuf cacheline is cold. eth_fd_to_mbuf()
has just written other fields of it (data_off, ol_flags), but buf_addr
is a persistent field it does not rewrite. A write does not stall: it
posts to the store buffer while the line fills in the background, and
the rewritten fields are forwarded straight from there. buf_addr has
nothing to forward, so it must be read from the line, whose fill is
still in flight, and the read stalls. The ethertype read that follows,
on the cold payload line, stalls again. Read later by the application,
when the fill has completed, the same read hits. The offload just
performs it at the worst possible moment.

Measured on a single-core port-to-port forwarding test over two 10G
ports (one core at 2 GHz, 64-byte untagged frames):

  - throughput 4.22 -> 5.00 Mpps (+18 percent)
  - IPC 0.93 -> 1.25: the cost was memory stall, not compute
  - L3/DRAM-bound L2 refills 319M -> 200M over 10s (-37 percent)

perf confirms it: with the offload, the buf_addr load (the cold mbuf
field) and the payload load account for about 84 percent of the Rx
burst's L2 refills; removing it, those vanish and only the inherent DQRR
dequeue misses remain.

Stop advertising VLAN_STRIP and remove the rte_vlan_strip() calls from
every Rx path. This is a behavioural change: the tag is left in the
frame, so an application must strip it itself, on the L2 header it
already reads.
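
For illustration, stripping the tag in the application amounts to the
following, shown here on a raw frame buffer so the sketch stays
self-contained. The function and macro names are hypothetical; a DPDK
application would more likely call rte_vlan_strip() on the mbuf at the
point where it already parses the L2 header.

```c
#include <stdint.h>
#include <string.h>

#define ETH_ADDRS_LEN  12u	/* destination + source MAC */
#define VLAN_TAG_LEN   4u	/* TPID + TCI */
#define ETHERTYPE_VLAN 0x8100u

/* If the frame carries an 802.1Q tag, store its TCI in *tci, shorten
 * *len by the tag length and return the new start of the frame.
 * Otherwise return frame with *len and *tci untouched.
 */
static uint8_t *strip_vlan(uint8_t *frame, size_t *len, uint16_t *tci)
{
	uint16_t tpid;

	if (*len < ETH_ADDRS_LEN + VLAN_TAG_LEN + 2)
		return frame;
	tpid = (uint16_t)((frame[12] << 8) | frame[13]);
	if (tpid != ETHERTYPE_VLAN)
		return frame;
	*tci = (uint16_t)((frame[14] << 8) | frame[15]);
	/* slide both MAC addresses forward so they overwrite the tag */
	memmove(frame + VLAN_TAG_LEN, frame, ETH_ADDRS_LEN);
	*len -= VLAN_TAG_LEN;
	return frame + VLAN_TAG_LEN;
}
```

The only loads are from the L2 header the forwarding path reads anyway,
which is the point made above.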

Signed-off-by: Maxime Leroy <maxime@leroys.fr>
---
 doc/guides/rel_notes/release_26_07.rst |  3 +++
 drivers/net/dpaa2/dpaa2_ethdev.c       |  1 -
 drivers/net/dpaa2/dpaa2_rxtx.c         | 23 +++--------------------
 3 files changed, 6 insertions(+), 21 deletions(-)

diff --git a/doc/guides/rel_notes/release_26_07.rst b/doc/guides/rel_notes/release_26_07.rst
index 87c7c57bcc..9d01099dad 100644
--- a/doc/guides/rel_notes/release_26_07.rst
+++ b/doc/guides/rel_notes/release_26_07.rst
@@ -130,6 +130,9 @@ New Features
 
   * Added RSS RETA query and update support.
   * Added Rx queue interrupt support.
+  * Removed the software VLAN strip offload: ``RTE_ETH_RX_OFFLOAD_VLAN_STRIP``
+    is no longer advertised, as no hardware strip backs it. An application
+    that needs the tag removed must now strip it itself.
 
 * **Updated PCAP ethernet driver.**
 
diff --git a/drivers/net/dpaa2/dpaa2_ethdev.c b/drivers/net/dpaa2/dpaa2_ethdev.c
index fb117e761f..b3ea826db9 100644
--- a/drivers/net/dpaa2/dpaa2_ethdev.c
+++ b/drivers/net/dpaa2/dpaa2_ethdev.c
@@ -48,7 +48,6 @@ static uint64_t dev_rx_offloads_sup =
 		RTE_ETH_RX_OFFLOAD_SCTP_CKSUM |
 		RTE_ETH_RX_OFFLOAD_OUTER_IPV4_CKSUM |
 		RTE_ETH_RX_OFFLOAD_OUTER_UDP_CKSUM |
-		RTE_ETH_RX_OFFLOAD_VLAN_STRIP |
 		RTE_ETH_RX_OFFLOAD_VLAN_FILTER |
 		RTE_ETH_RX_OFFLOAD_TIMESTAMP;
 
diff --git a/drivers/net/dpaa2/dpaa2_rxtx.c b/drivers/net/dpaa2/dpaa2_rxtx.c
index 189accc1de..d16e4f8f35 100644
--- a/drivers/net/dpaa2/dpaa2_rxtx.c
+++ b/drivers/net/dpaa2/dpaa2_rxtx.c
@@ -890,10 +890,6 @@ dpaa2_dev_prefetch_rx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
 		}
 #endif
 
-		if (eth_data->dev_conf.rxmode.offloads &
-				RTE_ETH_RX_OFFLOAD_VLAN_STRIP)
-			rte_vlan_strip(bufs[num_rx]);
-
 		dq_storage++;
 		num_rx++;
 	} while (pending);
@@ -922,22 +918,14 @@ dpaa2_dev_prefetch_rx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
 	return num_rx;
 }
 
-/* Convert a DQRR'd FD (single or scatter-gather) to an mbuf and apply software
- * VLAN strip, like the poll path.
- */
+/* Convert a DQRR'd FD (single or scatter-gather) to an mbuf. */
 static inline struct rte_mbuf *
 dpaa2_dqrr_fd_to_mbuf(const struct qbman_fd *fd,
 		      struct rte_eth_dev_data *eth_data)
 {
-	struct rte_mbuf *m;
-
 	if (unlikely(DPAA2_FD_GET_FORMAT(fd) == qbman_fd_sg))
-		m = eth_sg_fd_to_mbuf(fd, eth_data->port_id);
-	else
-		m = eth_fd_to_mbuf(fd, eth_data->port_id);
-	if (eth_data->dev_conf.rxmode.offloads & RTE_ETH_RX_OFFLOAD_VLAN_STRIP)
-		rte_vlan_strip(m);
-	return m;
+		return eth_sg_fd_to_mbuf(fd, eth_data->port_id);
+	return eth_fd_to_mbuf(fd, eth_data->port_id);
 }
 
 /* prefetch a DQRR'd FD's HW annotation (parse area) ahead of conversion */
@@ -1222,11 +1210,6 @@ dpaa2_dev_rx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
 		}
 #endif
 
-		if (eth_data->dev_conf.rxmode.offloads &
-				RTE_ETH_RX_OFFLOAD_VLAN_STRIP) {
-			rte_vlan_strip(bufs[num_rx]);
-		}
-
 			dq_storage++;
 			num_rx++;
 			num_pulled++;
-- 
2.43.0


^ permalink raw reply related

* RE: [PATCH 9/9] net/dpaa2: drop the fake software VLAN strip offload
From: Morten Brørup @ 2026-06-11 15:56 UTC (permalink / raw)
  To: Maxime Leroy, hemant.agrawal, sachin.saxena; +Cc: dev
In-Reply-To: <20260611154926.392670-10-maxime@leroys.fr>

This patch is unrelated to the series.


^ permalink raw reply

* RE: [PATCH 8/9] ethdev: keep fast-path ops valid after port stop
From: Morten Brørup @ 2026-06-11 16:01 UTC (permalink / raw)
  To: Maxime Leroy, hemant.agrawal, sachin.saxena
  Cc: dev, stable, Thomas Monjalon, Andrew Rybchenko, Sunil Kumar Kori
In-Reply-To: <20260611154926.392670-9-maxime@leroys.fr>

> From: Maxime Leroy [mailto:maxime.leroys@gmail.com] On Behalf Of Maxime
> Leroy
> Sent: Thursday, 11 June 2026 17.49
> 
> eth_dev_fp_ops_reset() restores a port's fast-path ops on stop/release
> via a compound literal, so every field it omits is zeroed to NULL. It
> sets only rx_pkt_burst/tx_pkt_burst (and the rxq/txq data), leaving
> rx_queue_count, tx_queue_count, rx/tx_descriptor_status, tx_pkt_prepare
> and the recycle callbacks NULL.
> 
> In non-debug builds these ops are reached through an unguarded indirect
> call (the NULL check exists only under RTE_ETHDEV_DEBUG_RX/TX). So a
> thread calling e.g. rte_eth_rx_queue_count() on a port being stopped
> dereferences NULL and crashes, while the same race on
> rte_eth_rx_burst()
> is harmless because the burst ops are reset to dummies. A poll-mode
> worker re-checking rx_queue_count before arming the Rx interrupt and
> sleeping hits exactly this.
> 
> Reset these ops to the same dummies eth_dev_set_dummy_fops() installs,
> so a stopped port behaves like a freshly allocated one: every fast-path
> op is a safe no-op, none is NULL.
> 
> Fixes: 066f3d9cc21c ("ethdev: remove callback checks from fast path")
> Cc: stable@dpdk.org
> Signed-off-by: Maxime Leroy <maxime@leroys.fr>
> ---

Good catch.
Acked-by: Morten Brørup <mb@smartsharesystems.com>

Not related to the series, consider sending as separate patch.


^ permalink raw reply

* RE: [PATCH 9/9] net/dpaa2: drop the fake software VLAN strip offload
From: Morten Brørup @ 2026-06-11 16:13 UTC (permalink / raw)
  To: Maxime Leroy, hemant.agrawal, sachin.saxena; +Cc: dev
In-Reply-To: <98CBD80474FA8B44BF855DF32C47DC35F65908@smartserver.smartshare.dk>

> This patch is unrelated to the series.
And also,
Acked-by: Morten Brørup <mb@smartsharesystems.com>

We should take note of this for other drivers!


^ permalink raw reply

* Re: [PATCH v1 0/6] net/r8169: hardware updates, optimizations, and a bug fix
From: Stephen Hemminger @ 2026-06-11 16:46 UTC (permalink / raw)
  To: Howard Wang; +Cc: dev, pro_nic_dpdk
In-Reply-To: <20260611083521.20669-1-howard_wang@realsil.com.cn>

On Thu, 11 Jun 2026 16:28:27 +0800
Howard Wang <howard_wang@realsil.com.cn> wrote:

> This patch series primarily focuses on updating hardware configurations, 
> optimizing the datapath, and refining device behaviors for the net/r8169 PMD. 
> Additionally, it includes one bug fix for a segmentation fault encountered 
> during initialization.
> 
> Summary of the series:
> 
>   - Patch 1: Updates RX CRC drop behavior for RTL8125BP and later MAC versions
>     to align with device shutdown sequences and prevent cross-driver states.
>   - Patch 2: Optimizes the Tx datapath performance by removing redundant branch
>     checks for malformed packets, replacing them with RTE_ASSERT.
>   - Patch 3: Enhances RTL8125+ flow control by utilizing a new formula for 
>     nearfull and nearempty thresholds.
>   - Patch 4: Removes RTL9151 CSI (DBI) channel support, as firmware handling 
>     latency makes it no longer suitable for the driver.
>   - Patch 5: Updates PHY and MAC MCU configurations for RTL9151A and RTL8125BP.
>   - Patch 6: Fixes a segmentation fault during RTL8168 initialization by 
>     restricting RTL8125-specific RSS/VMQ configurations to the correct hardware.
> 
> Howard Wang (6):
>   net/r8169: disable RX CRC drop for RTL8125BP and later
>   net/r8169: optimize Tx datapath by removing redundant packet checks
>   net/r8169: improve RTL8125+ flow control
>   net/r8169: remove RTL9151 CSI (DBI) channel support
>   net/r8169: update hardware configurations for 8125
>   net/r8169: fix segmentation fault during RTL8168 initialization
> 
>  drivers/net/r8169/base/rtl8125bp_mcu.c | 15 ++--
>  drivers/net/r8169/base/rtl9151a.c      |  8 +++
>  drivers/net/r8169/base/rtl9151a_mcu.c  | 14 +++-
>  drivers/net/r8169/r8169_compat.h       |  1 +
>  drivers/net/r8169/r8169_hw.c           | 98 ++++++++++++++++++++++++--
>  drivers/net/r8169/r8169_hw.h           |  2 +-
>  drivers/net/r8169/r8169_rxtx.c         | 32 ++++-----
>  7 files changed, 137 insertions(+), 33 deletions(-)
> 

Looks good. The CI AI review complaints are noise, so I will ignore them.
Applied to next-net


^ permalink raw reply

* Re: [PATCH 9/9] net/dpaa2: drop the fake software VLAN strip offload
From: Maxime Leroy @ 2026-06-11 16:58 UTC (permalink / raw)
  To: Morten Brørup; +Cc: Hemant Agrawal, Sachin Saxena, dev
In-Reply-To: <98CBD80474FA8B44BF855DF32C47DC35F65908@smartserver.smartshare.dk>

On Thu, 11 Jun 2026 at 17:56, Morten Brørup <mb@smartsharesystems.com>
wrote:

> This patch is unrelated to the series.

Splitting this would create an ordering problem. If the NAPI series is
merged with a software VLAN strip implementation and the cleanup removing
the fake VLAN_STRIP offload is merged separately, the two can land in
either order and leave the PMD with inconsistent Rx paths.

The new NAPI/DQRR path must match the offloads reported by the PMD at the
end of the series. Since VLAN_STRIP is not a real dpaa2 hardware offload,
this series removes the advertised offload and the software
rte_vlan_strip() calls together, so all Rx paths remain consistent at each
merge point.

^ permalink raw reply

* Re: [PATCH] net/crc: add 4x folding loop for x86 SSE implementation
From: Stephen Hemminger @ 2026-06-11 17:06 UTC (permalink / raw)
  To: Shreesh Adiga; +Cc: Jasvinder Singh, Bruce Richardson, Konstantin Ananyev, dev
In-Reply-To: <20260609075712.247286-1-16567adigashreesh@gmail.com>

On Tue,  9 Jun 2026 13:27:12 +0530
Shreesh Adiga <16567adigashreesh@gmail.com> wrote:

> Add a 64-byte loop that maintains 4 fold registers and processes
> 64 bytes at a time. The 4x fold registers is then reduced to 16 byte
> single fold, similar to AVX512 implementation. This technique is
> described in the paper by Intel:
> "Fast CRC Computation for Generic Polynomials Using PCLMULQDQ Instruction"
> 
> This results in roughly 50% performance improvement due to better ILP
> for large input sizes like 1024.
> 
> Signed-off-by: Shreesh Adiga <16567adigashreesh@gmail.com>
> ---

Looks good, applied to next-net.

A couple of nits from a more detailed AI review that you might still want
to look at:

The current crc_autotest does not exercise the new 64-byte CRC16 path.
Its CRC32 vectors are 1512 and 348 bytes, so the CRC32 4x loop is
covered — but the largest CRC16 vector is 32 bytes, all three CRC16
tests being ≤32. So the new CRC16 rk1_rk2 (64-byte fold) constants ship
untested in CI. My exhaustive test confirms they're correct, but a
future regression there wouldn't be caught. Suggest adding a CRC16
vector ≥64 bytes, ideally a non-multiple of 64 (e.g. 80 or 100) so it
hits the 4x loop, the single-fold tail, and the partial-bytes path
together.
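
One way to produce the expected value for such a vector is a bit-at-a-time
reference routine. The sketch below is only an illustration: it assumes the
CRC16-CCITT path uses polynomial 0x1021 in reflected form, an initial value
of 0xFFFF and a final inversion, and both function names are hypothetical
(they are not part of rte_net_crc or the autotest).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Bit-at-a-time reference CRC16. Assumed parameters: polynomial 0x1021
 * handled in reflected form (0x8408), initial value 0xFFFF, final
 * inversion. Slow, but independent of any folding constants.
 */
static uint16_t
ref_crc16_ccitt(const uint8_t *data, size_t len)
{
	uint16_t crc = 0xFFFF;

	assert(data != NULL || len == 0);
	for (size_t i = 0; i < len; i++) {
		crc ^= data[i];
		for (int bit = 0; bit < 8; bit++) {
			if (crc & 1)
				crc = (uint16_t)((crc >> 1) ^ 0x8408);
			else
				crc = (uint16_t)(crc >> 1);
		}
	}
	return (uint16_t)~crc;
}

/* Hypothetical helper: fill a test vector with a simple byte ramp. */
static void
fill_test_vector(uint8_t *buf, size_t len)
{
	for (size_t i = 0; i < len; i++)
		buf[i] = (uint8_t)i;
}
```

Filling a 100-byte buffer with fill_test_vector() and running it through
ref_crc16_ccitt() would give a candidate expected value. It should be
cross-checked against the scalar implementation before being added to the
test, since the parameters above are an assumption.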

In partial_bytes the comment /* k = rk1 & rk2 */ is now stale: after the
patch k holds rk3_rk4 on every path reaching it. Not introduced by this
patch, but the patch is what made it wrong; worth fixing in passing.


^ permalink raw reply

* Re: [PATCH 01/17] net/cnxk: update mbuf next field for multi segment
From: Stephen Hemminger @ 2026-06-11 17:23 UTC (permalink / raw)
  To: Rahul Bhansali
  Cc: dev, Nithin Dabilpuram, Kiran Kumar K, Sunil Kumar Kori,
	Satha Rao, Harman Kalra, jerinj
In-Reply-To: <20260611073311.3129711-1-rbhansali@marvell.com>

On Thu, 11 Jun 2026 13:02:55 +0530
Rahul Bhansali <rbhansali@marvell.com> wrote:

> As per the requirement of rte_mbuf_raw_reset_bulk(), the mbuf's
> 'next' and 'nb_segs' fields are required to be reset.
> This reset these field for multi-segment mbufs on cn9k platform.
> 
> Signed-off-by: Rahul Bhansali <rbhansali@marvell.com>
> ---

Please follow code submission guidelines for DPDK and use
cover letter and threading of replies.
https://doc.dpdk.org/guides/contributing/patches.html#sending-patches

What you got wrong:
  - Revisions were posted too quickly. Please allow at least 24 hours to
    pass between posting patch revisions.
  - The cover letter explaining the patchset is missing.
  - Versions and in-reply-to were not used. These keep mail threads
    organized and help maintainers track the series in patchwork as well.


^ permalink raw reply

* Re: [PATCH 9/9] net/dpaa2: drop the fake software VLAN strip offload
From: Stephen Hemminger @ 2026-06-11 17:30 UTC (permalink / raw)
  To: Maxime Leroy; +Cc: hemant.agrawal, sachin.saxena, dev
In-Reply-To: <20260611154926.392670-10-maxime@leroys.fr>

On Thu, 11 Jun 2026 17:49:24 +0200
Maxime Leroy <maxime@leroys.fr> wrote:

> It saves a forwarding application nothing: the datapath reads the L2
> header anyway to classify or strip. The offload does not remove that
> read, it relocates it into the driver Rx burst, where it is far more
> expensive.
> 
> The cost is a matter of timing. rte_vlan_strip() reaches the L2 header
> through rte_pktmbuf_mtod(), which dereferences mbuf->buf_addr. On a
> freshly recycled buffer that mbuf cacheline is cold. eth_fd_to_mbuf()
> has just written other fields of it (data_off, ol_flags), but buf_addr
> is a persistent field it does not rewrite. A write does not stall: it
> posts to the store buffer while the line fills in the background, and
> the rewritten fields are forwarded straight from there. buf_addr has
> nothing to forward, so it must be read from the line, whose fill is
> still in flight, and the read stalls. The ethertype read that follows,
> on the cold payload line, stalls again. Read later by the application,
> when the fill has completed, the same read hits. The offload just
> performs it at the worst possible moment.
> 
> Measured on a single-core port-to-port forwarding test over two 10G
> ports (one core at 2 GHz, 64-byte untagged frames):
> 
>   - throughput 4.22 -> 5.00 Mpps (+18 percent)
>   - IPC 0.93 -> 1.25: the cost was memory stall, not compute
>   - L3/DRAM-bound L2 refills 319M -> 200M over 10s (-37 percent)
> 
> perf confirms it: with the offload, the buf_addr load (the cold mbuf
> field) and the payload load account for about 84 percent of the Rx
> burst's L2 refills; removing it, those vanish and only the inherent DQRR
> dequeue misses remain.
> 
> Stop advertising VLAN_STRIP and remove the rte_vlan_strip() calls from
> every Rx path. This is a behavioural change: the tag is left in the
> frame, so an application must strip it itself, on the L2 header it
> already reads.
> 
> Signed-off-by: Maxime Leroy <maxime@leroys.fr>
> ---

In general I agree, but you overstate the impact. Any real application
is going to look at the mbuf anyway. Relying on testpmd numbers is BS.

The NBL driver does the same thing.
So does PCAP but it has no choice, and is slow anyway.
Virtio/vhost does as well.
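
For reference, stripping the tag on the application side, on the L2 header
it already reads, could look roughly like the sketch below. It works on a
raw frame buffer rather than an mbuf so it stays self-contained; the
function name and constants are hypothetical and this is not the
rte_vlan_strip() implementation.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_ETH_ADDR_LEN 6
#define SKETCH_ETH_HDR_LEN 14
#define SKETCH_VLAN_TAG_LEN 4
#define SKETCH_ETHERTYPE_VLAN 0x8100

/*
 * Strip a single 802.1Q tag in place. On a tagged frame, the two MAC
 * addresses are moved forward by four bytes, *len shrinks by four, *tci
 * receives the tag control information, and the new frame start is
 * returned. An untagged or too-short frame is returned unchanged.
 */
static uint8_t *
strip_vlan_tag(uint8_t *frame, size_t *len, uint16_t *tci)
{
	uint16_t ethertype;

	if (*len < SKETCH_ETH_HDR_LEN + SKETCH_VLAN_TAG_LEN)
		return frame;

	ethertype = (uint16_t)((frame[12] << 8) | frame[13]);
	if (ethertype != SKETCH_ETHERTYPE_VLAN)
		return frame;

	*tci = (uint16_t)((frame[14] << 8) | frame[15]);

	/* Overlapping copy: destination and source share the buffer. */
	memmove(frame + SKETCH_VLAN_TAG_LEN, frame, 2 * SKETCH_ETH_ADDR_LEN);
	*len -= SKETCH_VLAN_TAG_LEN;
	return frame + SKETCH_VLAN_TAG_LEN;
}
```

The ethertype check is the same read a classifier performs anyway, which is
the point being argued: the strip costs little once the header line is
already in cache.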





^ permalink raw reply

* Re: [PATCH v8 00/18] Support VFIO cdev API in DPDK
From: Stephen Hemminger @ 2026-06-11 17:49 UTC (permalink / raw)
  To: Anatoly Burakov; +Cc: dev
In-Reply-To: <cover.1781190151.git.anatoly.burakov@intel.com>

On Thu, 11 Jun 2026 16:08:52 +0100
Anatoly Burakov <anatoly.burakov@intel.com> wrote:

> This patchset introduces a major refactor of the VFIO subsystem in DPDK to
> support character device (cdev) interface introduced in Linux kernel, as well as
> make the API more streamlined and useful. The goal is to simplify device
> management, improve compatibility, and clarify API responsibilities.
> 
> The following sections outline the key issues addressed by this patchset and the
> corresponding changes introduced.
> 
> 1. Only group mode is supported
> ===============================
> 
> Since kernel version 4.14.327 (LTS), VFIO supports the new character device
> (cdev)-based way of working with VFIO devices (otherwise known as IOMMUFD). This
> is a device-centric mode and does away with all the complexity regarding groups
> and IOMMU types, delegating it all to the kernel, and exposes a much simpler
> interface to userspace.
> 
> The old group interface is still around, and will need to be kept in DPDK both
> for compatibility reasons, as well as supporting special cases (FSLMC bus, NBL
> driver, no-IOMMU mode etc.).
> 
> To enable this, VFIO is heavily refactored, so that the code can support both
> modes while relying on (mostly) common infrastructure.
> 
> Note that the existing `rte_vfio_device_setup/release` model is fundamentally
> incompatible with cdev mode, because for custom container cases, the expected
> flow is that the user binds the IOMMU group (and thus, implicitly, the device
> itself) to a specific container using `rte_vfio_container_group_bind`, whereas
> this step is not needed for cdev as the device fd is assigned to the container
> straight away.
> 
> Therefore, what we do instead is introduce a new API for container device
> assignment which, semantically, will assign a device to specified container, so
> that when it is mapped using `rte_pci_map_device`, the appropriate container is
> selected. Under the hood though, we essentially transition to getting device fd
> straight away at assign stage, so that by the time the PCI bus attempts to map
> the device, it is already mapped and we just return an fd. There is no
> "unassign" API because `release_device` already performs that function.
> 
> Additionally, a new `rte_vfio_get_mode` API is added for those cases that need
> some introspection into VFIO's internals, with three new modes: group
> (old-style), no-iommu (old-style but without IOMMU), and cdev (the new mode).
> Although no-IOMMU is technically a variant of group mode, the distinction is
> largely irrelevant to the user, as all usages of noiommu checks in our codebase
> are for deciding whether to use IOVA or PA, not anything to do with managing
> groups. The current plan for kernel community is to *not* introduce no-IOMMU
> cdev implementation, and IOMMUFD's own group API compatibility layer also does
> not implement no-IOMMU mode, which is why this will be kept for compatibility
> for these use cases.
> 
> There were other users of VFIO which relied on group API but only for convenience
> purposes; no actual VFIO functionality depended on those API's. Therefore, group
> API's are removed and, where appropriate, replaced with the new API's.
> 
> List of removed API's:
> 
> * `rte_vfio_get_group_fd`
> * `rte_vfio_clear_group`
> * `rte_vfio_container_group_bind` (replaced by container assign API)
> * `rte_vfio_container_group_unbind`
> * `rte_vfio_noiommu_is_enabled` (replaced by new mode API)
> 
> 2. The API responsibilities aren't clear and bleed into each other
> ==================================================================
> 
> Some API's do multiple things at once. In particular:
> 
> * `rte_vfio_get_device_info` will setup the device
> * `rte_vfio_setup_device` will get device info
> 
> These API's have been adjusted to do one thing only.
> 
> v8:
> - Rebase
> - Fixed build errors due to variable shadowing
> - Removed duplicate fd check as kernel does not provide a way to distinguish
>   between device fd's
> 
> v7:
> - Rebase
> - Added removal of deprecation notices
> - Fixed implicit numeric comparison in patch 12
> 
> v6:
> - Fixed missing header include in vfio cdev file
> 
> v5:
> - Added back missing uapi patch
> 
> v4:
> - Fixed issues with documenting rte_vfio_mode enum
> - Separated deprecation notices into a separate patchset
> 
> v3:
> - Make API removal cleaner
> - Fix `get_group_num` usages to align with new API
> - Fix issues with function exports
> - Fix issues with `setup_device` returning old-style values in some cases
> 
> v2:
> - Make the entire API internal
> - More aggressive API pruning, complete removal of group API
> - Fixed a bug in group mode where device could not be used
> - Better documentation and deprecation notice patches
> - Moved doc patches to beginning of patchset
> 
> Anatoly Burakov (18):
>   uapi: update to v6.17 and add iommufd.h
>   vfio: make all functions internal
>   vfio: split get device info from setup
>   vfio: add container device assignment API
>   net/nbl: do not use VFIO group bind API
>   net/ntnic: use container device assignment API
>   vdpa/ifc: use container device assignment API
>   vdpa/nfp: use container device assignment API
>   vdpa/sfc: use container device assignment API
>   vhost: remove group-related API from drivers
>   vfio: remove group-based API
>   vfio: cleanup and refactor
>   bus/pci: use the new VFIO mode API
>   bus/fslmc: use the new VFIO mode API
>   net/hinic3: use the new VFIO mode API
>   net/ntnic: use the new VFIO mode API
>   vfio: remove no-IOMMU check API
>   vfio: introduce cdev mode
> 
>  config/arm/meson.build                    |    1 +
>  config/meson.build                        |    1 +
>  doc/guides/prog_guide/vhost_lib.rst       |    4 -
>  doc/guides/rel_notes/deprecation.rst      |   10 -
>  drivers/bus/cdx/cdx_vfio.c                |   25 +-
>  drivers/bus/fslmc/fslmc_bus.c             |   10 +-
>  drivers/bus/fslmc/fslmc_vfio.c            |    6 +-
>  drivers/bus/pci/linux/pci.c               |    2 +-
>  drivers/bus/pci/linux/pci_vfio.c          |   33 +-
>  drivers/bus/platform/platform.c           |    9 +-
>  drivers/crypto/bcmfs/bcmfs_vfio.c         |   14 +-
>  drivers/net/hinic3/base/hinic3_hwdev.c    |    3 +-
>  drivers/net/nbl/nbl_common/nbl_userdev.c  |   20 +-
>  drivers/net/nbl/nbl_include/nbl_include.h |    1 +
>  drivers/net/ntnic/ntnic_ethdev.c          |    2 +-
>  drivers/net/ntnic/ntnic_vfio.c            |   30 +-
>  drivers/vdpa/ifc/ifcvf_vdpa.c             |   34 +-
>  drivers/vdpa/mlx5/mlx5_vdpa.c             |    1 -
>  drivers/vdpa/nfp/nfp_vdpa.c               |   37 +-
>  drivers/vdpa/sfc/sfc_vdpa.c               |   39 +-
>  drivers/vdpa/sfc/sfc_vdpa.h               |    2 -
>  kernel/linux/uapi/linux/iommufd.h         | 1292 +++++++++++
>  kernel/linux/uapi/linux/vduse.h           |    2 +-
>  kernel/linux/uapi/linux/vfio.h            |   12 +-
>  kernel/linux/uapi/version                 |    2 +-
>  lib/eal/freebsd/eal.c                     |   98 +-
>  lib/eal/include/rte_vfio.h                |  387 ++--
>  lib/eal/linux/eal_vfio.c                  | 2437 ++++++++-------------
>  lib/eal/linux/eal_vfio.h                  |  167 +-
>  lib/eal/linux/eal_vfio_cdev.c             |  390 ++++
>  lib/eal/linux/eal_vfio_group.c            |  984 +++++++++
>  lib/eal/linux/eal_vfio_mp_sync.c          |   80 +-
>  lib/eal/linux/meson.build                 |    2 +
>  lib/eal/windows/eal.c                     |    4 +-
>  lib/vhost/vdpa_driver.h                   |    3 -
>  35 files changed, 4248 insertions(+), 1896 deletions(-)
>  create mode 100644 kernel/linux/uapi/linux/iommufd.h
>  create mode 100644 lib/eal/linux/eal_vfio_cdev.c
>  create mode 100644 lib/eal/linux/eal_vfio_group.c
> 

Big patchset, so I sent the big AI model at it...

Patch 4 (vfio: add container device assignment API)

Warning: header doc for rte_vfio_container_assign_device() says "<0 on
failure, rte_errno is set", but neither rte_vfio_get_group_num() nor
rte_vfio_container_group_bind() sets rte_errno on the Linux failure
paths at this point in the series. The rte_errno contract only becomes
true after the patch 12 rewrite. Either set rte_errno here or defer the
doc claim to patch 12.

Patch 5 (net/nbl: do not use VFIO group bind API)

Info: function definition does not follow DPDK style (return type on
its own line, blank line between declarations and statements):

	static int
	nbl_open_group_fd(int iommu_group_num)
	{
		char path[PATH_MAX];

		snprintf(path, sizeof(path), RTE_VFIO_GROUP_FMT, iommu_group_num);
		return open(path, O_RDWR);
	}

Patch 7 (vdpa/ifc: use container device assignment API)

Warning: this patch removes both the "internal->vfio_group_fd = -1"
initialization and the only assignment, but ifcvf_get_vfio_group_fd()
still returns the field until patch 10. Between patches 7 and 10 the
vdpa op returns 0 (zeroed allocation), i.e. a "valid" fd value. Nothing
in lib/vhost calls the op anymore so it is not reachable in practice,
but for bisectability either keep the -1 initialization here or move
patch 10 ahead of patches 7-9.

Patch 8 (vdpa/nfp: use container device assignment API)

Warning: same staging issue as patch 7, plus nfp_vdpa_vfio_teardown()
still calls rte_vfio_container_group_unbind(fd, device->iommu_group)
with device->iommu_group now never assigned (always 0 from calloc), so
every teardown between patches 8 and 10 issues an unbind for group 0
that fails silently. The teardown unbind removal currently in patch 10
belongs in this patch (patch 9 does this correctly for sfc, removing
the fields and all uses in one patch).

Patch 12 (vfio: cleanup and refactor) -- partial review

Warning: missing release notes. This patch (together with patches 2, 11,
17, 18) removes the public rte_vfio API, removes the group-bind API, and
changes rte_vfio_setup_device()/rte_vfio_get_group_num() return
semantics. None of the series touches the current release notes file;
the entire VFIO API removal and the new cdev mode need entries in
"Removed Items" / "New Features".

Info: rte_errno convention comment at top of eal_vfio.c says "ENOXIO";
the errno is ENXIO (code uses the correct one).

Patch 18 (vfio: introduce cdev mode)

Error: ioas_id is corrupted in secondary processes. struct container
puts vfio_group_config and vfio_cdev_config in a union, and both place
their first member at offset 0 (bool dma_setup_done / uint32_t ioas_id).
In vfio_select_mode(), the secondary path does:

	if (mode == RTE_VFIO_MODE_CDEV && vfio_cdev_sync_ioas(cfg) < 0)
		goto err;

	/* primary handles DMA setup for default containers */
	group_cfg->dma_setup_done = true;

In cdev mode the unconditional dma_setup_done store overwrites the low
byte of the ioas_id just received from the primary. The corrupted id is
then used by VFIO_DEVICE_ATTACH_IOMMUFD_PT and IOMMU_IOAS_MAP/UNMAP in
the secondary. It happens to work only when the primary's IOAS id has
low byte 1. Fix is to make the store mode-conditional:

	if (mode == RTE_VFIO_MODE_GROUP || mode == RTE_VFIO_MODE_NOIOMMU)
		group_cfg->dma_setup_done = true;
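
The overlap can be reproduced outside DPDK with a reduced model. The types
and function names below are hypothetical stand-ins for the ones named in
the review, not the real eal_vfio definitions; the sketch only shows that a
bool store through one union member changes the id read through the other,
and that the mode check avoids it.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for the reviewed structures. */
struct sketch_group_cfg { bool dma_setup_done; };
struct sketch_cdev_cfg { uint32_t ioas_id; };

enum sketch_mode { SKETCH_MODE_GROUP, SKETCH_MODE_NOIOMMU, SKETCH_MODE_CDEV };

struct sketch_container {
	union {
		struct sketch_group_cfg group; /* first member at offset 0 */
		struct sketch_cdev_cfg cdev;   /* first member at offset 0 */
	} u;
};

/* Unconditional store, mirroring the reviewed secondary path. */
static void
finish_setup_unconditional(struct sketch_container *c)
{
	c->u.group.dma_setup_done = true;
}

/* Mode-conditional store, mirroring the suggested fix. */
static void
finish_setup_by_mode(struct sketch_container *c, enum sketch_mode mode)
{
	if (mode == SKETCH_MODE_GROUP || mode == SKETCH_MODE_NOIOMMU)
		c->u.group.dma_setup_done = true;
}

/* Read the byte at offset 0 of the container without aliasing issues. */
static uint8_t
first_byte(const struct sketch_container *c)
{
	uint8_t b;

	memcpy(&b, c, sizeof(b));
	return b;
}
```

With ioas_id set to 0x1234, finish_setup_by_mode() in cdev mode leaves the
id intact, while finish_setup_unconditional() writes 1 into the first byte
of the union. Which byte of the id that is depends on endianness; on
little-endian targets it is the low byte, as described above.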

^ permalink raw reply

* Re: [PATCH v1 1/1] net/nbl: fix multicast reception in promiscuous mode
From: Stephen Hemminger @ 2026-06-11 18:04 UTC (permalink / raw)
  To: Dimon Zhao; +Cc: dev, stable, Leon Yu, Sam Chen
In-Reply-To: <20260609075143.32695-2-dimon.zhao@nebula-matrix.com>

On Tue,  9 Jun 2026 00:51:43 -0700
Dimon Zhao <dimon.zhao@nebula-matrix.com> wrote:

> When promiscuous mode is enabled on NBL PMD,
> the hardware does not forward multicast frames to the host,
> causing the driver to fail receiving multicast packets.
> This patch fixes the issue.
> 
> Fixes: 80bd3cad22c8 ("net/nbl: support promiscuous mode")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Dimon Zhao <dimon.zhao@nebula-matrix.com>
> ---
Applied to next-net

^ permalink raw reply

* Re: [PATCH v2 0/2] ethdev: fix out-of-bounds writes in rte_flow_conv()
From: Stephen Hemminger @ 2026-06-11 18:15 UTC (permalink / raw)
  To: James Raphael Tiovalen; +Cc: dev, orika, thomas, andrew.rybchenko, stable
In-Reply-To: <20260610113334.277895-1-jamestiotio@gmail.com>

On Wed, 10 Jun 2026 19:33:32 +0800
James Raphael Tiovalen <jamestiotio@gmail.com> wrote:

> rte_flow_conv() is documented to truncate output to the caller-supplied
> buffer size, but two paths handling variable-length trailing data
> ignored that contract and copied the full payload whenever the
> destination pointer was non-NULL. A caller passing a buffer just large
> enough for the fixed-size header had adjacent memory clobbered:
> 
> - GENEVE_OPT: up to option_len * 4 bytes
> - FLEX: up to 4 GiB, since src->length is a uint32_t and the API places
>   no bounds on it
> 
> Patch 1 aligns the GENEVE_OPT guard with the sibling RAW branch, which
> already gates its copy on the remaining buffer size.
> 
> Patch 2 plumbs the remaining buffer size into the flex-item desc_fn
> callback (which previously took no size argument at all) and gates the
> inner rte_memcpy() on it.
> 
> v2 fixes the merge conflict between patch 1 and the main branch.
> 
> James Raphael Tiovalen (2):
>   ethdev: fix out-of-bounds write in GENEVE option conversion
>   ethdev: fix out-of-bounds write in flex item conversion
> 
>  lib/ethdev/rte_flow.c | 11 ++++++-----
>  1 file changed, 6 insertions(+), 5 deletions(-)
> 

Applied to next-net, and added you to .mailmap
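
For readers following along, the truncation contract described in the cover
letter can be sketched as below. This is a simplified illustration, not the
rte_flow.c code: the function name and parameters are hypothetical, and it
assumes the convention that the required size is always returned while the
trailing payload is copied only when the caller's buffer can hold it.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical conversion helper: the fixed-size header occupies the
 * first hdr_len bytes of buf, followed by a variable-length payload.
 * The copy is gated on the buffer size; the return value is the number
 * of bytes needed regardless of size, so a caller can retry with a
 * larger buffer.
 */
static size_t
conv_trailing(void *buf, size_t size, size_t hdr_len,
	      const void *payload, size_t payload_len)
{
	size_t needed = hdr_len + payload_len;

	if (buf != NULL && size >= needed)
		memcpy((uint8_t *)buf + hdr_len, payload, payload_len);

	return needed;
}
```

A caller passing a buffer that only fits the header gets back the larger
required size and its memory is left untouched, which is the behaviour the
two fixes restore.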

^ permalink raw reply

* Re: [PATCH 8/9] ethdev: keep fast-path ops valid after port stop
From: Maxime Leroy @ 2026-06-11 18:39 UTC (permalink / raw)
  To: Morten Brørup
  Cc: Hemant Agrawal, Sachin Saxena, dev, stable, Thomas Monjalon,
	Andrew Rybchenko, Sunil Kumar Kori
In-Reply-To: <98CBD80474FA8B44BF855DF32C47DC35F65909@smartserver.smartshare.dk>

[-- Attachment #1: Type: text/plain, Size: 2289 bytes --]

On Thu, 11 Jun 2026 at 18:01, Morten Brørup <mb@smartsharesystems.com>
wrote:

> > From: Maxime Leroy [mailto:maxime.leroys@gmail.com] On Behalf Of Maxime
> > Leroy
> > Sent: Thursday, 11 June 2026 17.49
> >
> > eth_dev_fp_ops_reset() restores a port's fast-path ops on stop/release
> > via a compound literal, so every field it omits is zeroed to NULL. It
> > sets only rx_pkt_burst/tx_pkt_burst (and the rxq/txq data), leaving
> > rx_queue_count, tx_queue_count, rx/tx_descriptor_status, tx_pkt_prepare
> > and the recycle callbacks NULL.
> >
> > In non-debug builds these ops are reached through an unguarded indirect
> > call (the NULL check exists only under RTE_ETHDEV_DEBUG_RX/TX). So a
> > thread calling e.g. rte_eth_rx_queue_count() on a port being stopped
> > dereferences NULL and crashes, while the same race on
> > rte_eth_rx_burst()
> > is harmless because the burst ops are reset to dummies. A poll-mode
> > worker re-checking rx_queue_count before arming the Rx interrupt and
> > sleeping hits exactly this.
> >
> > Reset these ops to the same dummies eth_dev_set_dummy_fops() installs,
> > so a stopped port behaves like a freshly allocated one: every fast-path
> > op is a safe no-op, none is NULL.
> >
> > Fixes: 066f3d9cc21c ("ethdev: remove callback checks from fast path")
> > Cc: stable@dpdk.org
> > Signed-off-by: Maxime Leroy <maxime@leroys.fr>
> > ---
>
> Good catch.
> Acked-by: Morten Brørup <mb@smartsharesystems.com>
>
> Not related to the series, consider sending as separate patch.
>
Thanks for the review and Ack.

Agreed, this is a generic ethdev fix. I kept it in this series because the
NAPI user depends on it.

The current Grout NAPI loop arms RX queue interrupts and then re-checks
rte_eth_rx_queue_count() before blocking, to avoid sleeping when a packet
arrived between the last empty poll and epoll_wait.

With the current ethdev reset path, rx_burst is replaced by a dummy
callback on stop/release, but rx_queue_count becomes NULL. So if the port
is stopped concurrently, the NAPI worker dereferences a NULL function
pointer and segfaults on that recheck.

I can split it out if maintainers prefer, but then the dpaa2 NAPI series
has a real dependency on the standalone ethdev fix.
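
To make the failure mode concrete, here is a reduced model of the reset.
The struct and function names are hypothetical and much smaller than the
real rte_eth_fp_ops; the sketch only shows that members left out of a
compound literal end up NULL, and that listing every op with a dummy avoids
it. The dummy's return value is an arbitrary negative number chosen for the
sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced per-port fast-path ops table. */
struct sketch_fp_ops {
	uint16_t (*rx_burst)(void *rxq, void **pkts, uint16_t n);
	int (*rx_queue_count)(void *rxq);
};

static uint16_t
dummy_rx_burst(void *rxq, void **pkts, uint16_t n)
{
	(void)rxq;
	(void)pkts;
	(void)n;
	return 0; /* a stopped port receives nothing */
}

static int
dummy_rx_queue_count(void *rxq)
{
	(void)rxq;
	return -1; /* arbitrary error value for this sketch */
}

/* Reset that sets only the burst op: rx_queue_count becomes NULL. */
static void
reset_partial(struct sketch_fp_ops *ops)
{
	/* Members omitted from a compound literal are zero-initialized. */
	*ops = (struct sketch_fp_ops){ .rx_burst = dummy_rx_burst };
}

/* Reset that installs a dummy for every op: nothing is left NULL. */
static void
reset_full(struct sketch_fp_ops *ops)
{
	*ops = (struct sketch_fp_ops){
		.rx_burst = dummy_rx_burst,
		.rx_queue_count = dummy_rx_queue_count,
	};
}
```

After reset_partial(), an unguarded ops->rx_queue_count(rxq) call is a NULL
function pointer call; after reset_full(), the same call returns an error
that the worker can handle.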


^ permalink raw reply

* Re: [PATCH] dts: avoid Scapy MAC resolution in Rx split test
From: Stephen Hemminger @ 2026-06-11 18:54 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: dev, Luca Vizzarro, Patrick Robb
In-Reply-To: <20260610183218.751941-1-thomas@monjalon.net>

On Wed, 10 Jun 2026 20:32:18 +0200
Thomas Monjalon <thomas@monjalon.net> wrote:

> The test gets the Ethernet header length from Scapy with len(Ether()).
> 
> When building DTS API documentation, Sphinx imports the test module
> and shows this warning:
> WARNING: MAC address to reach destination not found. Using broadcast.
> 
> Use a dummy MAC address so Scapy no longer performs
> destination resolution during import.
> 
> Fixes: 01c70544cffd ("dts: add selective Rx tests")
> 
> Signed-off-by: Thomas Monjalon <thomas@monjalon.net>

Thanks, I previously reported this as:

https://bugs.dpdk.org/show_bug.cgi?id=1951

Acked-by: Stephen Hemminger <stephen@networkplumber.org>

^ permalink raw reply

* [PATCH 00/15] doc: clean up sample application guides
From: Stephen Hemminger @ 2026-06-11 21:18 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger
In-Reply-To: <20250216230903.124496-1-nandinipersad361@gmail.com>

This series revises the sample application user guides for clarity,
grammar, and formatting consistency. Changes are documentation only:
wording is simplified, command-line options are normalised, Linux and
Ethernet capitalisation is corrected, and definition lists replace
ad hoc bullet lists where a term/description structure fits.

This work started from edits by Nandini Persad and was extended and
reworked across the rest of the sample application guides.

Stephen Hemminger (15):
  doc: cleanups to bbdev sample application
  doc: cleanup cmd_line example documentation
  doc: cleanup the distribution sample application guide
  doc: improve structure and clarity of compiling guide
  doc: improve clarity and consistency in DMA sample app guide
  doc: correct capitalization and formatting in ethtool guide
  doc: improve clarity in eventdev, FIPS, and flow filtering
  doc: enhance hello_world, intro, IP frag and pipeline
  doc: improve IP reassembly, IPsec, multicast, and keep-alive
  doc: enhance L2 forwarding sample application guides
  doc: enhance multi-process, NTB, ordering, and PTP guides
  doc: improve QoS, callbacks, EFD, and service cores guides
  doc: enhance skeleton, pipeline, timer, and vhost guides
  doc: improve vhost, VM power, and VMDq sample guides
  doc: correct grammar and punctuation consistency issues

 doc/guides/sample_app_ug/bbdev_app.rst        |  73 +++--
 doc/guides/sample_app_ug/cmd_line.rst         |  37 +--
 doc/guides/sample_app_ug/compiling.rst        |  84 +++---
 doc/guides/sample_app_ug/dist_app.rst         |  52 ++--
 doc/guides/sample_app_ug/dma.rst              |  60 ++--
 doc/guides/sample_app_ug/ethtool.rst          |  18 +-
 .../sample_app_ug/eventdev_pipeline.rst       |  51 ++--
 doc/guides/sample_app_ug/fips_validation.rst  |  65 +++--
 doc/guides/sample_app_ug/flow_filtering.rst   |  51 ++--
 doc/guides/sample_app_ug/hello_world.rst      |   8 +-
 doc/guides/sample_app_ug/intro.rst            |  34 +--
 doc/guides/sample_app_ug/ip_frag.rst          |  46 +--
 doc/guides/sample_app_ug/ip_pipeline.rst      | 109 ++++----
 doc/guides/sample_app_ug/ip_reassembly.rst    |  57 ++--
 doc/guides/sample_app_ug/ipsec_secgw.rst      | 128 ++++-----
 doc/guides/sample_app_ug/ipv4_multicast.rst   |  26 +-
 doc/guides/sample_app_ug/keep_alive.rst       |  12 +-
 doc/guides/sample_app_ug/l2_forward_cat.rst   |  34 +--
 .../sample_app_ug/l2_forward_crypto.rst       |  70 +++--
 doc/guides/sample_app_ug/l2_forward_event.rst |  20 +-
 .../sample_app_ug/l2_forward_job_stats.rst    |  46 ++-
 .../sample_app_ug/l2_forward_macsec.rst       |  38 +--
 .../sample_app_ug/l2_forward_real_virtual.rst |   4 +-
 doc/guides/sample_app_ug/link_status_intr.rst |   2 +-
 doc/guides/sample_app_ug/multi_process.rst    |  49 ++--
 doc/guides/sample_app_ug/ntb.rst              |   4 +-
 doc/guides/sample_app_ug/packet_ordering.rst  |  42 +--
 doc/guides/sample_app_ug/pipeline.rst         |  26 +-
 doc/guides/sample_app_ug/ptp_tap_relay_sw.rst |   2 +-
 doc/guides/sample_app_ug/ptpclient.rst        |  51 ++--
 doc/guides/sample_app_ug/qos_metering.rst     |  11 +-
 doc/guides/sample_app_ug/qos_scheduler.rst    |  16 +-
 doc/guides/sample_app_ug/rxtx_callbacks.rst   |  11 +-
 doc/guides/sample_app_ug/server_node_efd.rst  |   4 +-
 doc/guides/sample_app_ug/service_cores.rst    |  67 ++---
 doc/guides/sample_app_ug/skeleton.rst         |   8 +-
 doc/guides/sample_app_ug/test_pipeline.rst    |  17 +-
 doc/guides/sample_app_ug/timer.rst            |  19 +-
 doc/guides/sample_app_ug/vdpa.rst             |  51 ++--
 doc/guides/sample_app_ug/vhost.rst            | 178 ++++++------
 doc/guides/sample_app_ug/vhost_blk.rst        |  66 +++--
 doc/guides/sample_app_ug/vhost_crypto.rst     |  64 ++---
 .../sample_app_ug/vm_power_management.rst     | 262 ++++++++----------
 .../sample_app_ug/vmdq_dcb_forwarding.rst     | 101 ++++---
 doc/guides/sample_app_ug/vmdq_forwarding.rst  |  38 ++-
 45 files changed, 1121 insertions(+), 1091 deletions(-)

-- 
2.53.0


^ permalink raw reply

* [PATCH 01/15] doc: cleanups to bbdev sample application
From: Stephen Hemminger @ 2026-06-11 21:18 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger, Nicolas Chautru
In-Reply-To: <20260611212119.1026721-1-stephen@networkplumber.org>

Semi-automated cleanup of wording of bbdev sample guide.

Refactored the bbdev sample application documentation for better clarity:
- Simplified the overview section with clearer flow description
- Improved formatting of command-line options using definition list
- Clarified hardware/software device requirements
- Enhanced example command explanation with bullet points
- Fixed grammatical issues and improved readability

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 doc/guides/sample_app_ug/bbdev_app.rst | 69 +++++++++++++-------------
 1 file changed, 34 insertions(+), 35 deletions(-)

diff --git a/doc/guides/sample_app_ug/bbdev_app.rst b/doc/guides/sample_app_ug/bbdev_app.rst
index a699e8a61d..00bbd1aa27 100644
--- a/doc/guides/sample_app_ug/bbdev_app.rst
+++ b/doc/guides/sample_app_ug/bbdev_app.rst
@@ -14,14 +14,13 @@ Overview
 --------
 
 The Baseband device sample application performs a loop-back operation using a
-baseband device capable of transceiving data packets.
-A packet is received on an ethernet port -> enqueued for downlink baseband
-operation -> dequeued from the downlink baseband device -> enqueued for uplink
-baseband operation -> dequeued from the baseband device -> then the received
-packet is compared with the baseband operations output. Then it's looped back to
-the ethernet port.
+baseband device capable of performing encoding and decoding operations.
+A packet is received on an Ethernet port, enqueued for downlink baseband
+operation, dequeued from the downlink baseband device, enqueued for uplink
+baseband operation, dequeued from the baseband device, compared with the
+expected output, and then transmitted back to the Ethernet port.
 
-*   The MAC header is preserved in the packet
+The MAC header is preserved in the packet throughout the loop-back operation.
 
 Limitations
 -----------
@@ -33,7 +32,7 @@ Compiling the Application
 
 DPDK needs to be built with ``baseband_turbo_sw`` PMD enabled along
 with ``FLEXRAN SDK`` Libraries. Refer to *SW Turbo Poll Mode Driver*
-documentation for more details on this.
+documentation for more details.
 
 To compile the sample application see :doc:`compiling`.
 
@@ -48,40 +47,41 @@ The application accepts a number of command line options:
     $ ./<build_dir>/examples/dpdk-bbdev [EAL options] -- [-e ENCODING_CORES] /
     [-d DECODING_CORES] [-p ETH_PORT_ID] [-b BBDEV_ID]
 
-where:
+Where:
+
+``-e ENCODING_CORES``
+   Hexadecimal bitmask specifying lcores for encoding operations (default: 0x2).
+
+``-d DECODING_CORES``
+   Hexadecimal bitmask specifying lcores for decoding operations (default: 0x4).
 
-* ``e ENCODING_CORES``: hexmask for encoding lcores (default = 0x2)
-* ``d DECODING_CORES``: hexmask for decoding lcores (default = 0x4)
-* ``p ETH_PORT_ID``: ethernet port ID (default = 0)
-* ``b BBDEV_ID``: BBDev ID (default = 0)
+``-p ETH_PORT_ID``
+   Ethernet port ID (default: 0).
 
-The application requires that baseband devices is capable of performing
-the specified baseband operation are available on application initialization.
-This means that HW baseband device/s must be bound to a DPDK driver or
-a SW baseband device/s (virtual BBdev) must be created (using --vdev).
+``-b BBDEV_ID``
+   Baseband device ID (default: 0).
 
-To run the application in linux environment with the turbo_sw baseband device
-using the allow option for pci device running on 1 encoding lcore and 1 decoding lcore
-issue the command:
+The application requires that baseband devices are capable of performing
+the specified baseband operations at initialization time. Hardware baseband
+devices must be bound to a DPDK driver, or software baseband devices (virtual
+BBdev) must be created using the ``--vdev`` option.
+
+To run the application in a Linux environment with the turbo_sw baseband device,
+using one encoding lcore and one decoding lcore:
 
 .. code-block:: console
 
     $ ./<build_dir>/examples/dpdk-bbdev --vdev='baseband_turbo_sw' -a <NIC0PCIADDR> \
     -l 3,4,5 --numa-mem=2,2 --file-prefix=bbdev -- -e 0x10 -d 0x20
 
-where, NIC0PCIADDR is the PCI address of the Rx port
-
-This command creates one virtual bbdev devices ``baseband_turbo_sw`` where the
-device gets linked to a corresponding ethernet port as allowed by
-the parameter -a.
-3 cores are allocated to the application, and assigned as:
-
- - core 3 is the main and used to print the stats live on screen,
+Where ``NIC0PCIADDR`` is the PCI address of the Ethernet port.
 
- - core 4 is the encoding lcore performing Rx and Turbo Encode operations
+This command creates one virtual BBdev device (``baseband_turbo_sw``) and
+allows access to the specified Ethernet port. Three cores are allocated:
 
- - core 5 is the downlink lcore performing Turbo Decode, validation and Tx
-   operations
+- Core 3: Main lcore, prints statistics to screen
+- Core 4: Encoding lcore, performs Rx and Turbo Encode operations
+- Core 5: Decoding lcore, performs Turbo Decode, validation, and Tx operations
 
 
 Refer to the *DPDK Getting Started Guide* for general information on running
@@ -91,9 +91,8 @@ Using Packet Generator with baseband device sample application
 --------------------------------------------------------------
 
 To allow the bbdev sample app to do the loopback, an influx of traffic is required.
-This can be done by using DPDK Pktgen to burst traffic on two ethernet ports, and
-it will print the transmitted along with the looped-back traffic on Rx ports.
-Executing the command below will generate traffic on the two allowed ethernet
+This can be done using DPDK Pktgen to generate traffic on Ethernet ports.
+Executing the command below will generate traffic on the allowed Ethernet
 ports.
 
 .. code-block:: console
@@ -111,5 +110,5 @@ where:
 * ``-P``: PROMISCUOUS mode
 
 
-Refer to *The Pktgen Application* documents for general information on running
+Refer to *The Pktgen Application* documentation for general information on running
 Pktgen with DPDK applications.
-- 
2.53.0



* [PATCH 02/15] doc: cleanup cmd_line example documentation
From: Stephen Hemminger @ 2026-06-11 21:18 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger
In-Reply-To: <20260611212119.1026721-1-stephen@networkplumber.org>

Semi-automated review of the cmd_line sample application guide.

Enhanced the command line sample application guide:
- Simplified the production code warning note for clarity
- Converted command descriptions to definition list format
- Fixed typo in "Ethernet Address Token" description
- Clarified the parsing and callback mechanism description
- Improved overall readability and consistency

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 doc/guides/sample_app_ug/cmd_line.rst | 35 +++++++++++++++------------
 1 file changed, 19 insertions(+), 16 deletions(-)

diff --git a/doc/guides/sample_app_ug/cmd_line.rst b/doc/guides/sample_app_ug/cmd_line.rst
index e038667bd5..5b192bc615 100644
--- a/doc/guides/sample_app_ug/cmd_line.rst
+++ b/doc/guides/sample_app_ug/cmd_line.rst
@@ -13,29 +13,32 @@ Overview
 The Command Line sample application is a simple application that
 demonstrates the use of the command line interface in the DPDK.
 This application is a readline-like interface that can be used
-to debug a DPDK application in a Linux* application environment.
+to debug DPDK applications in a Linux application environment.
 
 .. note::
 
     The rte_cmdline library should not be used in production code since
     it is not validated to the same standard as other DPDK libraries.
-    See also the "rte_cmdline library should not be used in production code due to limited testing" item
-    in the "Known Issues" section of the Release Notes.
+    See also the Known Issues section of the Release Notes for the item
+    regarding limited testing of the rte_cmdline library.
 
 The Command Line sample application supports some of the features of the GNU readline library
 such as completion, cut/paste and other special bindings
-that make configuration and debug faster and easier.
+that make configuration and debugging faster and easier.
 
-The application shows how the ``cmdline`` library can be extended
+The application demonstrates how the ``cmdline`` library can be extended
 to handle a list of objects.
 
 There are three simple commands:
 
-*   add obj_name IP: Add a new object with an IP/IPv6 address associated to it.
+``add obj_name IP``
+   Add a new object with an IP/IPv6 address associated with it.
 
-*   del obj_name: Delete the specified object.
+``del obj_name``
+   Delete the specified object.
 
-*   show obj_name: Show the IP associated with the specified object.
+``show obj_name``
+   Show the IP associated with the specified object.
 
 .. note::
 
@@ -63,7 +66,7 @@ and the Environment Abstraction Layer (EAL) options.
 Explanation
 -----------
 
-The following sections provide explanation of the code.
+The following sections provide an explanation of the code.
 
 EAL Initialization and cmdline Start
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -84,7 +87,7 @@ Then, a new command line object is created and starts to interact with the user
     :end-before: >8 End of creating a new command line object.
     :dedent: 1
 
-The ``cmdline_interact()`` function returns when the user types **Ctrl-d** and,
+The ``cmdline_interact()`` function returns when the user types **Ctrl-d**, and
 in this case, the application exits.
 
 Defining a cmdline Context
@@ -102,15 +105,15 @@ Each command (of type cmdline_parse_inst_t) is defined statically.
 It contains a pointer to a callback function that is executed when the command is parsed,
 an opaque pointer, a help string and a list of tokens in a NULL-terminated table.
 
-The rte_cmdline application provides a list of pre-defined token types:
+The rte_cmdline library provides a list of predefined token types:
 
-*   String Token: Match a static string, a list of static strings or any string.
+*   String Token: Match a static string, a list of static strings, or any string.
 
-*   Number Token: Match a number that can be signed or unsigned, from 8-bit to 32-bit.
+*   Number Token: Match a number that can be signed or unsigned, from 8 bits to 32 bits.
 
 *   IP Address Token: Match an IPv4 or IPv6 address or network.
 
-*   Ethernet* Address Token: Match a MAC address.
+*   Ethernet Address Token: Match a MAC address.
 
 In this example, a new token type obj_list is defined and implemented
 in the parse_obj_list.c and parse_obj_list.h files.
@@ -128,5 +131,5 @@ This command is composed of two tokens:
 
 *   The second token is an object that was previously added using the add command in the global_obj_list variable.
 
-Once the command is parsed, the rte_cmdline application fills a cmd_obj_del_show_result structure.
-A pointer to this structure is given as an argument to the callback function and can be used in the body of this function.
+Once the command is parsed, the rte_cmdline library fills a cmd_obj_del_show_result structure
+and passes a pointer to it as an argument to the callback function.
-- 
2.53.0



* [PATCH 03/15] doc: cleanup the distribution sample application guide
From: Stephen Hemminger @ 2026-06-11 21:18 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger, Nandini Persad
In-Reply-To: <20260611212119.1026721-1-stephen@networkplumber.org>

Fix punctuation, improve clarity, and remove repetition where necessary.

Signed-off-by: Nandini Persad <nandinipersad361@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 doc/guides/sample_app_ug/dist_app.rst | 52 +++++++++++++--------------
 1 file changed, 26 insertions(+), 26 deletions(-)

diff --git a/doc/guides/sample_app_ug/dist_app.rst b/doc/guides/sample_app_ug/dist_app.rst
index 30b4184d40..11496471ae 100644
--- a/doc/guides/sample_app_ug/dist_app.rst
+++ b/doc/guides/sample_app_ug/dist_app.rst
@@ -4,7 +4,7 @@
 Distributor Sample Application
 ==============================
 
-The distributor sample application is a simple example of packet distribution
+The distributor sample application is an example of packet distribution
 to cores using the Data Plane Development Kit (DPDK). It also makes use of
 Intel Speed Select Technology - Base Frequency (Intel SST-BF) to pin the
 distributor to the higher frequency core if available.
@@ -31,7 +31,7 @@ generator as shown in the figure below.
 Compiling the Application
 -------------------------
 
-To compile the sample application see :doc:`compiling`.
+To compile the sample application, see :doc:`compiling`.
 
 The application is located in the ``distributor`` sub-directory.
 
@@ -49,7 +49,7 @@ Running the Application
    *   -p PORTMASK: Hexadecimal bitmask of ports to configure
    *   -c: Combines the RX core with distribution core
 
-#. To run the application in linux environment with 10 lcores, 4 ports,
+#. To run the application in a Linux environment with 10 lcores, 4 ports,
    issue the command:
 
    ..  code-block:: console
@@ -64,19 +64,19 @@ Explanation
 
 The distributor application consists of four types of threads: a receive
 thread (``lcore_rx()``), a distributor thread (``lcore_dist()``), a set of
-worker threads (``lcore_worker()``), and a transmit thread(``lcore_tx()``).
+worker threads (``lcore_worker()``), and a transmit thread (``lcore_tx()``).
 How these threads work together is shown in :numref:`figure_dist_app` below.
-The ``main()`` function launches  threads of these four types.  Each thread
-has a while loop which will be doing processing and which is terminated
+The ``main()`` function launches threads of these four types. Each thread
+has a while loop that performs processing and is terminated
 only upon SIGINT or ctrl+C.
 
 The receive thread receives the packets using ``rte_eth_rx_burst()`` and will
-enqueue them to an rte_ring. The distributor thread will dequeue the packets
-from the ring and assign them to workers (using ``rte_distributor_process()`` API).
-This assignment is based on the tag (or flow ID) of the packet - indicated by
-the hash field in the mbuf. For IP traffic, this field is automatically filled
-by the NIC with the "usr" hash value for the packet, which works as a per-flow
-tag.  The distributor thread communicates with the worker threads using a
+enqueue them to an rte_ring. The distributor thread dequeues the packets
+from the ring and assigns them to workers using the ``rte_distributor_process()``
+API. This assignment is based on the tag (or flow ID) of the packet, indicated
+by the hash field in the mbuf. For IP traffic, this field is automatically
+filled by the NIC with the "usr" hash value for the packet, which works as a
+per-flow tag. The distributor thread communicates with the worker threads using a
 cache-line swapping mechanism, passing up to 8 mbuf pointers at a time
 (one cache line) to each worker.
 
@@ -86,11 +86,11 @@ the distributor, doing a simple XOR operation on the input port mbuf field
 (to indicate the output port which will be used later for packet transmission)
 and then finally returning the packets back to the distributor thread.
 
-The distributor thread will then call the distributor api
-``rte_distributor_returned_pkts()`` to get the processed packets, and will enqueue
-them to another rte_ring for transfer to the TX thread for transmission on the
-output port. The transmit thread will dequeue the packets from the ring and
-transmit them on the output port specified in packet mbuf.
+The distributor thread will then call the distributor API
+``rte_distributor_returned_pkts()`` to get the processed packets and enqueue
+them to another rte_ring for transfer to the TX thread. The transmit thread
+dequeues the packets from the ring and transmits them on the output port
+specified in the packet mbuf.
 
 Users who wish to terminate the running of the application have to press ctrl+C
 (or send SIGINT to the app). Upon this signal, a signal handler provided
@@ -105,29 +105,29 @@ final statistics to the user.
 
 
 Intel SST-BF Support
---------------------
+~~~~~~~~~~~~~~~~~~~~
 
 In DPDK 19.05, support was added to the power management library for
-Intel-SST-BF, a technology that allows some cores to run at a higher
+Intel SST-BF, a technology that allows some cores to run at a higher
 frequency than others. An application note for Intel SST-BF is available,
 and is entitled
 `Intel Speed Select Technology – Base Frequency - Enhancing Performance <https://builders.intel.com/docs/networkbuilders/intel-speed-select-technology-base-frequency-enhancing-performance.pdf>`_
 
 The distributor application was also enhanced to be aware of these higher
-frequency SST-BF cores, and when starting the application, if high frequency
+frequency SST-BF cores. When starting the application, if high frequency
 SST-BF cores are present in the core mask, the application will identify these
 cores and pin the workloads appropriately. The distributor core is usually
 the bottleneck, so this is given first choice of the high frequency SST-BF
-cores, followed by the rx core and the tx core.
+cores, followed by the Rx core and the Tx core.
 
 Debug Logging Support
----------------------
+~~~~~~~~~~~~~~~~~~~~~
 
 Debug logging is provided as part of the application; the user needs to uncomment
 the line "#define DEBUG" defined in start of the application in main.c to enable debug logs.
 
 Statistics
-----------
+~~~~~~~~~~
 
 The main function will print statistics on the console every second. These
 statistics include the number of packets enqueued and dequeued at each stage
@@ -135,7 +135,7 @@ in the application, and also key statistics per worker, including how many
 packets of each burst size (1-8) were sent to each worker thread.
 
 Application Initialization
---------------------------
+~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 Command line parsing is done in the same way as it is done in the L2 Forwarding Sample
 Application. See :ref:`l2_fwd_app_cmd_arguments`.
@@ -146,8 +146,8 @@ Sample Application. See :ref:`l2_fwd_app_mbuf_init`.
 Driver Initialization is done in same way as it is done in the L2 Forwarding Sample
 Application. See :ref:`l2_fwd_app_dvr_init`.
 
-RX queue initialization is done in the same way as it is done in the L2 Forwarding
+Rx queue initialization is done in the same way as it is done in the L2 Forwarding
 Sample Application. See :ref:`l2_fwd_app_rx_init`.
 
-TX queue initialization is done in the same way as it is done in the L2 Forwarding
+Tx queue initialization is done in the same way as it is done in the L2 Forwarding
 Sample Application. See :ref:`l2_fwd_app_tx_init`.
-- 
2.53.0



* [PATCH 04/15] doc: improve structure and clarity of compiling guide
From: Stephen Hemminger @ 2026-06-11 21:18 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger
In-Reply-To: <20260611212119.1026721-1-stephen@networkplumber.org>

Restructured the sample applications compiling guide:
- Added clearer section headers with proper capitalization
- Improved command block formatting and indentation consistency
- Added setup instructions for build directory creation
- Clarified the distinction between meson and make approaches
- Enhanced explanations with better context for each step
- Added note about flexible build directory naming
- Improved overall document flow and readability

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 doc/guides/sample_app_ug/compiling.rst | 84 ++++++++++++++++----------
 1 file changed, 51 insertions(+), 33 deletions(-)

diff --git a/doc/guides/sample_app_ug/compiling.rst b/doc/guides/sample_app_ug/compiling.rst
index adde775d4e..a68a9e713c 100644
--- a/doc/guides/sample_app_ug/compiling.rst
+++ b/doc/guides/sample_app_ug/compiling.rst
@@ -5,79 +5,97 @@ Compiling the Sample Applications
 =================================
 
 This section explains how to compile the DPDK sample applications.
+Sample applications are located in ``dpdk/examples/``.
 
-To compile all the sample applications
---------------------------------------
+To Compile All the Sample Applications
+---------------------------------------
 
-Go to DPDK build directory:
+Set up the build directory (if not already done):
 
-    .. code-block:: console
+.. code-block:: console
 
-       cd dpdk/<build_dir>
+   cd dpdk
+   meson setup build
 
-Enable examples compilation:
+.. note::
 
-   .. code-block:: console
+   The build directory name (``build`` in this example) can be chosen freely.
+   Replace ``<build_dir>`` in subsequent commands with your chosen directory name.
 
-      meson configure -Dexamples=all
+Go to the build directory:
 
-Build:
+.. code-block:: console
 
-   .. code-block:: console
+   cd build
 
-      ninja
+.. code-block:: console
+
+   meson configure -Dexamples=all
+
+Compile:
+
+.. code-block:: console
+
+   ninja
 
 For additional information on compiling see
 :ref:`Compiling DPDK on Linux <linux_gsg_compiling_dpdk>` or
 :ref:`Compiling DPDK on FreeBSD <building_from_source>`.
-Applications are output to: ``dpdk/<build_dir>/examples``.
 
+Compiled applications are output to ``dpdk/<build_dir>/examples``.
 
-To compile a single application
--------------------------------
 
+To Compile a Single Application
+--------------------------------
+
+A single application can be compiled using meson during the DPDK build,
+or standalone using make with an installed DPDK.
 
 Using meson
 ~~~~~~~~~~~
 
-Go to DPDK build directory:
+Go to the build directory (after ``meson setup`` as shown above):
 
-    .. code-block:: console
+.. code-block:: console
 
-       cd dpdk/<build_dir>
+   cd dpdk/build
 
 Enable example app compilation:
 
-   .. code-block:: console
+.. code-block:: console
+
+   meson configure -Dexamples=helloworld
+
+Compile:
 
-      meson configure -Dexamples=helloworld
+.. code-block:: console
 
-Build:
+   ninja
 
-   .. code-block:: console
 
-      ninja
+Using make (standalone)
+~~~~~~~~~~~~~~~~~~~~~~~
 
+To compile a sample application standalone using make, DPDK must first
+be installed on the system and pkg-config must be configured.
+See :ref:`building_app_using_installed_dpdk` for installation instructions.
 
-Using Make
-~~~~~~~~~~
+Go to the sample application directory:
 
-Pkg-config is used when building an example app standalone using make, please
-see :ref:`building_app_using_installed_dpdk` for more information.
+.. code-block:: console
 
-Go to the sample application directory. Unless otherwise specified the sample
-applications are located in ``dpdk/examples/``.
+   cd dpdk/examples/helloworld
 
 Build the application:
 
-    .. code-block:: console
+.. code-block:: console
 
-        make
+   make
 
 To build the application for debugging use the ``DEBUG`` option.
 This option adds some extra flags, disables compiler optimizations and
-sets verbose output.
+sets verbose output:
 
-    .. code-block:: console
+.. code-block:: console
 
-       make DEBUG=1
+   make DEBUG=1
-- 
2.53.0



* [PATCH 05/15] doc: improve clarity and consistency in DMA sample app guide
From: Stephen Hemminger @ 2026-06-11 21:18 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger, Chengwen Feng, Kevin Laatz, Bruce Richardson
In-Reply-To: <20260611212119.1026721-1-stephen@networkplumber.org>

Enhanced the DMA sample application documentation:
- Simplified MAC address modification description using bullet points
- Improved grammar and readability throughout
- Standardized terminology (DMAdev, Tx/Rx port formatting)
- Fixed article usage and clarified technical explanations
- Enhanced sentence structure for better flow
- Corrected minor grammatical issues and typos

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 doc/guides/sample_app_ug/dma.rst | 60 +++++++++++++++-----------------
 1 file changed, 29 insertions(+), 31 deletions(-)

diff --git a/doc/guides/sample_app_ug/dma.rst b/doc/guides/sample_app_ug/dma.rst
index 9605996c6c..484fe27d92 100644
--- a/doc/guides/sample_app_ug/dma.rst
+++ b/doc/guides/sample_app_ug/dma.rst
@@ -13,16 +13,15 @@ This sample is intended as a demonstration of the basic components of a DPDK
 forwarding application and an example of how to use the DMAdev API to make a packet
 copy application.
 
-Also, while forwarding, the MAC addresses are affected as follows:
+While forwarding, the application modifies MAC addresses:
 
-*   The source MAC address is replaced by the TX port MAC address
+*   Source MAC address: replaced with the Tx port MAC address
+*   Destination MAC address: replaced with ``02:00:00:00:00:TX_PORT_ID``
 
-*   The destination MAC address is replaced by  02:00:00:00:00:TX_PORT_ID
-
-This application can be used to compare performance of using software packet
-copy with copy done using a DMA device for different sizes of packets.
-The example will print out statistics each second. The stats shows
-received/send packets and packets dropped or failed to copy.
+This application can be used to compare the performance of using software packet
+copy versus DMA device copy for different packet sizes.
+The application prints statistics at a configurable interval.
+The statistics show received/sent packets and packets dropped or failed to copy.
 
 Compiling the Application
 -------------------------
@@ -35,7 +34,7 @@ The application is located in the ``dma`` sub-directory.
 Running the Application
 -----------------------
 
-In order to run the hardware copy application, the copying device
+To run the hardware copy application, the copying device
 needs to be bound to user-space IO driver.
 
 Refer to the :doc:`../prog_guide/dmadev` for information on using the library.
@@ -54,10 +53,10 @@ where,
 *   q NQ: Number of Rx queues used per port equivalent to DMA channels
     per port (default is 1)
 
-*   c CT: Performed packet copy type: software (sw) or hardware using
+*   c CT: Packet copy type: software (sw) or hardware using
     DMA (hw) (default is hw)
 
-*   s RS: Size of dmadev descriptor ring for hardware copy mode or rte_ring for
+*   s RS: Size of DMAdev descriptor ring for hardware copy mode or rte_ring for
     software copy mode (default is 2048)
 
 *   --[no-]mac-updating: Whether MAC address of packets should be changed
@@ -71,9 +70,9 @@ where,
 
 The application can be launched in various configurations depending on the
 provided parameters. The app can use up to 2 lcores: one of them receives
-incoming traffic and makes a copy of each packet. The second lcore then
+incoming traffic and makes a copy of each packet, and the second lcore
 updates the MAC address and sends the copy. If one lcore per port is used,
-both operations are done sequentially. For each configuration, an additional
+both operations are performed sequentially. For each configuration, an additional
 lcore is needed since the main lcore does not handle traffic but is
 responsible for configuration, statistics printing and safe shutdown of
 all ports and devices.
@@ -89,7 +88,7 @@ updating issue the command:
     $ ./<build_dir>/examples/dpdk-dma -l 0-2 -n 2 -- -p 0x1 --mac-updating -c sw
 
 To run the application in a Linux environment with 2 lcores (the main lcore,
-plus one forwarding core), 2 ports (ports 0 and 1), hardware copying and no MAC
+plus one forwarding core), 2 ports (ports 0 and 1), hardware copying, and no MAC
 updating issue the command:
 
 .. code-block:: console
@@ -146,7 +145,7 @@ The ``main()`` function also initializes the ports:
     :end-before: >8 End of initializing each port.
     :dedent: 1
 
-Each port is configured using ``port_init()`` function. The Ethernet
+Each port is configured using the ``port_init()`` function. The Ethernet
 ports are configured with local settings using the ``rte_eth_dev_configure()``
 function and the ``port_conf`` struct. The RSS is enabled so that
 multiple Rx queues could be used for packet receiving and copying by
@@ -198,7 +197,7 @@ and HW copy modes.
     :dedent: 0
 
 
-When using hardware copy each Rx queue of the port is assigned a DMA device
+When using hardware copy, each Rx queue of the port is assigned a DMA device
 (``assign_dmadevs()``) using DMAdev library API functions:
 
 .. literalinclude:: ../../../examples/dma/dmafwd.c
@@ -220,7 +219,7 @@ using ``rte_dma_start()`` function. Each of the above operations is done in
     :end-before: >8 End of configuration of device.
     :dedent: 0
 
-If initialization is successful, memory for hardware device
+If initialization is successful, memory for the hardware device
 statistics is allocated.
 
 Finally, the ``main()`` function starts all packet handling lcores and starts
@@ -228,7 +227,7 @@ printing stats in a loop on the main lcore. The application can be
 interrupted and closed using ``Ctrl-C``. The main lcore waits for
 all worker lcores to finish, deallocates resources and exits.
 
-The processing lcores launching function are described below.
+The processing lcore launching functions are described below.
 
 The Lcores Launching Functions
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -259,10 +258,10 @@ corresponding to ports and lcores configuration provided by the user.
 The Lcores Processing Functions
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-For receiving packets on each port, the ``dma_rx_port()`` function is used.
-The function receives packets on each configured Rx queue. Depending on the
+The ``dma_rx_port()`` function receives packets on each port,
+polling each configured Rx queue. Depending on the
 mode the user chose, it will enqueue packets to DMA channels and
-then invoke copy process (hardware copy), or perform software copy of each
+then invoke the copy process (hardware copy), or perform software copy of each
 packet using ``pktmbuf_sw_copy()`` function and enqueue them to an rte_ring:
 
 .. literalinclude:: ../../../examples/dma/dmafwd.c
@@ -271,13 +270,13 @@ packet using ``pktmbuf_sw_copy()`` function and enqueue them to an rte_ring:
     :end-before: >8 End of receive packets on one port and enqueue to dmadev or rte_ring.
     :dedent: 0
 
-The packets are received in burst mode using ``rte_eth_rx_burst()``
-function. When using hardware copy mode the packets are enqueued in the
-copying device's buffer using ``dma_enqueue_packets()`` which calls
+Packets are received in burst mode using the ``rte_eth_rx_burst()``
+function. When using hardware copy mode, packets are enqueued in the
+DMA device buffer using ``dma_enqueue_packets()``, which calls
 ``rte_dma_copy()``. When all received packets are in the
 buffer, the copy operations are started by calling ``rte_dma_submit()``.
-Function ``rte_dma_copy()`` operates on physical address of
-the packet. Structure ``rte_mbuf`` contains only physical address to
+The ``rte_dma_copy()`` function operates on the physical address of
+the packet. The ``rte_mbuf`` structure contains only the physical address of the
 start of the data buffer (``buf_iova``). Thus, the ``rte_pktmbuf_iova()`` API is
 used to get the address of the start of the data within the mbuf.
 
@@ -287,12 +286,11 @@ used to get the address of the start of the data within the mbuf.
     :end-before: >8 End of receive packets on one port and enqueue to dmadev or rte_ring.
     :dedent: 0
 
-
-Once the copies have been completed (this includes gathering the completions in
+Once the copies have been completed (which includes gathering the completions in
 HW copy mode), the copied packets are enqueued to the ``rx_to_tx_ring``, which
 is used to pass the packets to the Tx function.
 
-All completed copies are processed by ``dma_tx_port()`` function. This function
+All completed copies are processed by the ``dma_tx_port()`` function. This function
 dequeues copied packets from the ``rx_to_tx_ring``. Then, each packet MAC address is changed
 if it was enabled. After that, copies are sent in burst mode using ``rte_eth_tx_burst()``.
 
@@ -306,7 +304,7 @@ if it was enabled. After that, copies are sent in burst mode using ``rte_eth_tx_
 The Packet Copying Functions
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-In order to perform SW packet copy, there are user-defined functions to the first copy
+To perform software packet copy, the user-defined functions first copy
 the packet metadata (``pktmbuf_metadata_copy()``) and then the packet data
 (``pktmbuf_sw_copy()``):
 
@@ -319,5 +317,5 @@ the packet metadata (``pktmbuf_metadata_copy()``) and then the packet data
 The metadata in this example is copied from ``rx_descriptor_fields1`` marker of
 ``rte_mbuf`` struct up to ``buf_len`` member.
 
-In order to understand why software packet copying is done as shown
+To understand why software packet copying is performed as shown
 above, please refer to the :doc:`../prog_guide/mbuf_lib`.
-- 
2.53.0



