* [PATCH v21 17/25] net/pcap: reject non-Ethernet interfaces
From: Stephen Hemminger @ 2026-03-25 2:37 UTC (permalink / raw)
To: dev; +Cc: Stephen Hemminger, stable
In-Reply-To: <20260325024018.1275209-1-stephen@networkplumber.org>
The pcap PMD sends and receives raw Ethernet frames. If used with
an interface that has a different link type, packets will be malformed.
On FreeBSD and macOS, the loopback interface uses DLT_NULL, which expects
a 4-byte address family header instead of an Ethernet header. Sending
Ethernet frames to such interfaces causes kernel warnings like:
looutput: af=-1 unexpected
Add a check after pcap_activate() to verify the interface uses
DLT_EN10MB (Ethernet) link type and reject others with a clear error.
Fixes: 4c173302c307 ("pcap: add new driver")
Cc: stable@dpdk.org
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
drivers/net/pcap/pcap_ethdev.c | 11 +++++++++++
1 file changed, 11 insertions(+)
diff --git a/drivers/net/pcap/pcap_ethdev.c b/drivers/net/pcap/pcap_ethdev.c
index 86a7d22cc6..dd640a82c4 100644
--- a/drivers/net/pcap/pcap_ethdev.c
+++ b/drivers/net/pcap/pcap_ethdev.c
@@ -638,6 +638,17 @@ open_iface_live(const char *iface, pcap_t **pcap)
goto error;
}
+ /*
+ * Verify interface supports Ethernet link type.
+ * Loopback on FreeBSD/macOS uses DLT_NULL which expects a 4-byte
+ * address family header instead of Ethernet, causing kernel warnings.
+ */
+ if (pcap_datalink(pc) != DLT_EN10MB) {
+ PMD_LOG(ERR, "%s: not Ethernet (link type %d)",
+ iface, pcap_datalink(pc));
+ goto error;
+ }
+
if (pcap_setnonblock(pc, 1, errbuf)) {
PMD_LOG(ERR, "Couldn't set non-blocking on %s: %s", iface, errbuf);
goto error;
--
2.53.0
^ permalink raw reply related
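
For readers unfamiliar with DLT_NULL, here is a minimal sketch (not part of the patch) of what happens when an Ethernet frame is handed to such an interface. The helper name dlt_null_family() is hypothetical, and reading the leading 4 bytes as a host-order integer is an assumption based on the description above. A broadcast frame starts with ff:ff:ff:ff:ff:ff, so those bytes decode to -1, which would be consistent with the "af=-1" in the warning.

```c
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: interpret the first 4 bytes of a frame the way
 * a DLT_NULL interface is assumed to, i.e. as a host-order address
 * family value rather than as part of an Ethernet destination MAC.
 */
static int32_t
dlt_null_family(const uint8_t *frame)
{
	int32_t af;

	/* memcpy avoids unaligned access and aliasing problems */
	memcpy(&af, frame, sizeof(af));
	return af;
}
```

Any Ethernet destination address would decode to some value here, and few of them are valid address families, which is why rejecting non-DLT_EN10MB interfaces up front is simpler than trying to translate headers.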
* [PATCH v21 18/25] net/pcap: reduce scope of file-level variables
From: Stephen Hemminger @ 2026-03-25 2:37 UTC (permalink / raw)
To: dev; +Cc: Stephen Hemminger, Marat Khalili, Bruce Richardson
In-Reply-To: <20260325024018.1275209-1-stephen@networkplumber.org>
Move errbuf from file scope to local variables in the two functions that
use it (open_iface_live and open_single_rx_pcap). This avoids potential
issues if these functions were called concurrently, since each call now
has its own error buffer. Move iface_idx to a static local variable
within pmd_init_internals(), the only function that uses it. The
variable remains static to preserve the MAC address uniqueness counter
across calls.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Marat Khalili <marat.khalili@huawei.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
---
drivers/net/pcap/pcap_ethdev.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/drivers/net/pcap/pcap_ethdev.c b/drivers/net/pcap/pcap_ethdev.c
index dd640a82c4..0ac8b90ce3 100644
--- a/drivers/net/pcap/pcap_ethdev.c
+++ b/drivers/net/pcap/pcap_ethdev.c
@@ -46,11 +46,9 @@
#define RTE_PMD_PCAP_MAX_QUEUES 16
-static char errbuf[PCAP_ERRBUF_SIZE];
static struct timespec start_time;
static uint64_t start_cycles;
static uint64_t hz;
-static uint8_t iface_idx;
static uint64_t timestamp_rx_dynflag;
static int timestamp_dynfield_offset = -1;
@@ -596,6 +594,7 @@ eth_pcap_tx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
static inline int
open_iface_live(const char *iface, pcap_t **pcap)
{
+ char errbuf[PCAP_ERRBUF_SIZE];
pcap_t *pc;
int status;
@@ -707,6 +706,8 @@ open_single_tx_pcap(const char *pcap_filename, pcap_dumper_t **dumper)
static int
open_single_rx_pcap(const char *pcap_filename, pcap_t **pcap)
{
+ char errbuf[PCAP_ERRBUF_SIZE];
+
*pcap = pcap_open_offline_with_tstamp_precision(pcap_filename,
PCAP_TSTAMP_PRECISION_NANO, errbuf);
if (*pcap == NULL) {
@@ -1464,6 +1465,7 @@ pmd_init_internals(struct rte_vdev_device *vdev,
* derived from: 'locally administered':'p':'c':'a':'p':'iface_idx'
* where the middle 4 characters are converted to hex.
*/
+ static uint8_t iface_idx;
(*internals)->eth_addr = (struct rte_ether_addr) {
.addr_bytes = { 0x02, 0x70, 0x63, 0x61, 0x70, iface_idx++ }
};
--
2.53.0
^ permalink raw reply related
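
The static-local counter described above can be sketched in isolation as follows. This is an illustration only: struct mac_addr and next_pcap_mac() are hypothetical stand-ins for struct rte_ether_addr and the code inside pmd_init_internals().

```c
#include <stdint.h>

/* Hypothetical stand-in for struct rte_ether_addr */
struct mac_addr {
	uint8_t bytes[6];
};

/*
 * Build a locally administered MAC of the form 02:'p':'c':'a':'p':idx.
 * The counter is static, so it is zero-initialised once and keeps its
 * value across calls, but it is visible only inside this function.
 */
static struct mac_addr
next_pcap_mac(void)
{
	static uint8_t iface_idx;
	struct mac_addr mac = {
		.bytes = { 0x02, 0x70, 0x63, 0x61, 0x70, iface_idx++ }
	};

	return mac;
}
```

Note that a static local is still shared state, so this keeps the counter's behaviour unchanged; only its visibility is narrowed.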
* [PATCH v21 19/25] net/pcap: clarify maximum received packet
From: Stephen Hemminger @ 2026-03-25 2:37 UTC (permalink / raw)
To: dev; +Cc: Stephen Hemminger, Bruce Richardson
In-Reply-To: <20260325024018.1275209-1-stephen@networkplumber.org>
The driver defines the constant RTE_ETH_PCAP_SNAPSHOT_LEN, which is set
to the largest value the pcap library will return, so that should
also be the largest receive buffer.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
---
drivers/net/pcap/pcap_ethdev.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/net/pcap/pcap_ethdev.c b/drivers/net/pcap/pcap_ethdev.c
index 0ac8b90ce3..0da3062d2f 100644
--- a/drivers/net/pcap/pcap_ethdev.c
+++ b/drivers/net/pcap/pcap_ethdev.c
@@ -890,10 +890,11 @@ eth_dev_info(struct rte_eth_dev *dev,
dev_info->if_index = internals->if_index;
dev_info->max_mac_addrs = 1;
- dev_info->max_rx_pktlen = (uint32_t) -1;
+ dev_info->max_rx_pktlen = RTE_ETH_PCAP_SNAPSHOT_LEN;
dev_info->max_rx_queues = dev->data->nb_rx_queues;
dev_info->max_tx_queues = dev->data->nb_tx_queues;
dev_info->min_rx_bufsize = 0;
+ dev_info->max_mtu = RTE_ETH_PCAP_SNAPSHOT_LEN - RTE_ETHER_HDR_LEN;
dev_info->tx_offload_capa = RTE_ETH_TX_OFFLOAD_MULTI_SEGS |
RTE_ETH_TX_OFFLOAD_VLAN_INSERT;
dev_info->rx_offload_capa = RTE_ETH_RX_OFFLOAD_VLAN_STRIP |
--
2.53.0
^ permalink raw reply related
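
As a worked example of the arithmetic in this patch, a small sketch with locally defined constants (SKETCH_ETHER_HDR_LEN mirrors the 14-byte Ethernet header; the function name is hypothetical):

```c
#include <stdint.h>

#define SKETCH_ETHER_HDR_LEN 14u	/* dst MAC + src MAC + ethertype */

/* Largest MTU that still fits in a capture of snaplen bytes. */
static uint32_t
max_mtu_for_snaplen(uint32_t snaplen)
{
	return snaplen - SKETCH_ETHER_HDR_LEN;
}
```

With the driver's snapshot length of 65535 this gives a max_mtu of 65521, replacing the previous max_rx_pktlen of (uint32_t)-1, which no pcap capture could actually deliver.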
* [PATCH v21 20/25] eal/windows: add wrapper for access function
From: Stephen Hemminger @ 2026-03-25 2:37 UTC (permalink / raw)
To: dev; +Cc: Stephen Hemminger, Bruce Richardson, Dmitry Kozlyuk
In-Reply-To: <20260325024018.1275209-1-stephen@networkplumber.org>
As with other POSIX functions in unistd.h, add a wrapper
that uses the Windows equivalent.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
---
lib/eal/windows/include/rte_os_shim.h | 1 +
lib/eal/windows/include/unistd.h | 7 +++++++
2 files changed, 8 insertions(+)
diff --git a/lib/eal/windows/include/rte_os_shim.h b/lib/eal/windows/include/rte_os_shim.h
index f16b2230c8..44664a5062 100644
--- a/lib/eal/windows/include/rte_os_shim.h
+++ b/lib/eal/windows/include/rte_os_shim.h
@@ -33,6 +33,7 @@
#define unlink(path) _unlink(path)
#define fileno(f) _fileno(f)
#define isatty(fd) _isatty(fd)
+#define access(path, mode) _access(path, mode)
#define IPVERSION 4
diff --git a/lib/eal/windows/include/unistd.h b/lib/eal/windows/include/unistd.h
index 78150c6480..f95888f4e1 100644
--- a/lib/eal/windows/include/unistd.h
+++ b/lib/eal/windows/include/unistd.h
@@ -23,4 +23,11 @@
#define STDERR_FILENO _fileno(stderr)
#endif
+/* Mode values for the _access() function. */
+#ifndef F_OK
+#define F_OK 0 /* test for existence of file */
+#define W_OK 0x02 /* test for write permission */
+#define R_OK 0x04 /* test for read permission */
+#endif
+
#endif /* _UNISTD_H_ */
--
2.53.0
^ permalink raw reply related
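
A minimal sketch of how a caller might use the wrapper portably. This is standalone code, not DPDK code: the fallback macros below only mimic what the shim headers in this patch provide, and file_is_readable() is a hypothetical helper.

```c
#include <stdbool.h>

#ifdef _WIN32
#include <io.h>
/* Mimic the shim: map the POSIX name onto the Windows CRT function */
#define access(path, mode) _access(path, mode)
#ifndef F_OK
#define F_OK 0		/* test for existence of file */
#define W_OK 0x02	/* test for write permission */
#define R_OK 0x04	/* test for read permission */
#endif
#else
#include <unistd.h>
#endif

/* Return true if the path exists and can be opened for reading. */
static bool
file_is_readable(const char *path)
{
	return access(path, R_OK) == 0;
}
```

There is no X_OK in the Windows definitions above, matching the patch, so callers that need to stay portable should limit themselves to F_OK, R_OK and W_OK.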
* [PATCH v21 21/25] net/pcap: add snapshot length devarg
From: Stephen Hemminger @ 2026-03-25 2:37 UTC (permalink / raw)
To: dev; +Cc: Stephen Hemminger, Bruce Richardson
In-Reply-To: <20260325024018.1275209-1-stephen@networkplumber.org>
Add a new devarg 'snaplen' to configure the pcap snapshot length,
which controls the maximum packet size for capture and output.
The snapshot length affects:
- The pcap_set_snaplen() call when capturing from interfaces
- The pcap_open_dead() snapshot parameter for output files
- The reported max_rx_pktlen in device info
- The reported max_mtu in device info (snaplen - ethernet header)
The default value is 65535 bytes, preserving backward compatibility
with previous driver behavior.
The snaplen argument is parsed before interface and file arguments
so that its value is available when pcap handles are opened during
device creation.
Example usage:
--vdev 'net_pcap0,snaplen=1518,iface=eth0'
--vdev 'net_pcap0,snaplen=9000,rx_pcap=in.pcap,tx_pcap=out.pcap'
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
---
doc/guides/nics/pcap.rst | 17 ++
doc/guides/rel_notes/release_26_03.rst | 1 +
drivers/net/pcap/pcap_ethdev.c | 208 +++++++++++++++++--------
3 files changed, 157 insertions(+), 69 deletions(-)
diff --git a/doc/guides/nics/pcap.rst b/doc/guides/nics/pcap.rst
index 2709c6d017..2754e205c7 100644
--- a/doc/guides/nics/pcap.rst
+++ b/doc/guides/nics/pcap.rst
@@ -15,6 +15,23 @@ For more information about the pcap library, see the
The pcap-based PMD requires the libpcap development files to be installed.
This applies to all supported operating systems: Linux, FreeBSD, and Windows.
+* Set the snapshot length for packet capture
+
+ The snapshot length controls the maximum number of bytes captured per packet.
+ This affects both interface capture and pcap file output. The default value is
+ 65535 bytes, which captures complete packets up to the maximum Ethernet jumbo
+ frame size. Reducing this value can improve performance when only packet headers
+ are needed.
+
+ The ``snaplen`` argument is used when opening capture handles, so it should
+ be specified before the interface or file arguments. Example::
+
+ --vdev 'net_pcap0,snaplen=1518,iface=eth0'
+ --vdev 'net_pcap0,snaplen=9000,rx_pcap=in.pcap,tx_pcap=out.pcap'
+
+ The snapshot length also determines the reported ``max_rx_pktlen``
+ and ``max_mtu`` in device info.
+
Using the Driver from the EAL Command Line
------------------------------------------
diff --git a/doc/guides/rel_notes/release_26_03.rst b/doc/guides/rel_notes/release_26_03.rst
index b46402064f..869084d4cd 100644
--- a/doc/guides/rel_notes/release_26_03.rst
+++ b/doc/guides/rel_notes/release_26_03.rst
@@ -141,6 +141,7 @@ New Features
* Added support for VLAN insertion and stripping.
* Added support for reporting link state in ``iface`` mode.
* Receive timestamps support nanosecond precision.
+ * Added ``snaplen`` devarg to configure packet capture snapshot length.
Removed Items
diff --git a/drivers/net/pcap/pcap_ethdev.c b/drivers/net/pcap/pcap_ethdev.c
index 0da3062d2f..2de9c85124 100644
--- a/drivers/net/pcap/pcap_ethdev.c
+++ b/drivers/net/pcap/pcap_ethdev.c
@@ -13,6 +13,8 @@
#include <errno.h>
#include <sys/time.h>
#include <sys/types.h>
+#include <unistd.h>
+
#include <pcap.h>
#include <rte_cycles.h>
@@ -31,8 +33,6 @@
#include "pcap_osdep.h"
-#define RTE_ETH_PCAP_SNAPSHOT_LEN 65535
-
#define ETH_PCAP_RX_PCAP_ARG "rx_pcap"
#define ETH_PCAP_TX_PCAP_ARG "tx_pcap"
#define ETH_PCAP_RX_IFACE_ARG "rx_iface"
@@ -41,6 +41,12 @@
#define ETH_PCAP_IFACE_ARG "iface"
#define ETH_PCAP_PHY_MAC_ARG "phy_mac"
#define ETH_PCAP_INFINITE_RX_ARG "infinite_rx"
+#define ETH_PCAP_SNAPSHOT_LEN_ARG "snaplen"
+
+#define ETH_PCAP_SNAPSHOT_LEN_DEFAULT 65535
+
+/* This is defined in libpcap but not exposed in headers */
+#define ETH_PCAP_MAXIMUM_SNAPLEN 262144
#define ETH_PCAP_ARG_MAXLEN 64
@@ -101,6 +107,7 @@ struct pmd_internals {
char devargs[ETH_PCAP_ARG_MAXLEN];
struct rte_ether_addr eth_addr;
int if_index;
+ uint32_t snapshot_len;
bool single_iface;
bool phy_mac;
bool infinite_rx;
@@ -119,15 +126,18 @@ struct pmd_devargs {
bool phy_mac;
struct devargs_queue {
pcap_dumper_t *dumper;
+ /* pcap and name/type fields... */
pcap_t *pcap;
const char *name;
const char *type;
} queue[RTE_PMD_PCAP_MAX_QUEUES];
+ uint32_t snapshot_len;
};
struct pmd_devargs_all {
struct pmd_devargs rx_queues;
struct pmd_devargs tx_queues;
+ uint32_t snapshot_len;
bool single_iface;
bool is_tx_pcap;
bool is_tx_iface;
@@ -145,6 +155,7 @@ static const char *valid_arguments[] = {
ETH_PCAP_IFACE_ARG,
ETH_PCAP_PHY_MAC_ARG,
ETH_PCAP_INFINITE_RX_ARG,
+ ETH_PCAP_SNAPSHOT_LEN_ARG,
NULL
};
@@ -447,20 +458,19 @@ eth_pcap_tx_prepare(void *queue __rte_unused, struct rte_mbuf **tx_pkts, uint16_
static uint16_t
eth_pcap_tx_dumper(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
{
- unsigned int i;
- struct pmd_process_private *pp;
struct pcap_tx_queue *dumper_q = queue;
+ struct rte_eth_dev *dev = &rte_eth_devices[dumper_q->port_id];
+ struct pmd_internals *internals = dev->data->dev_private;
+ struct pmd_process_private *pp = dev->process_private;
+ pcap_dumper_t *dumper = pp->tx_dumper[dumper_q->queue_id];
+ unsigned char *temp_data = dumper_q->bounce_buf;
+ uint32_t snaplen = internals->snapshot_len;
uint16_t num_tx = 0;
uint32_t tx_bytes = 0;
struct pcap_pkthdr header;
- pcap_dumper_t *dumper;
- unsigned char *temp_data;
-
- pp = rte_eth_devices[dumper_q->port_id].process_private;
- dumper = pp->tx_dumper[dumper_q->queue_id];
- temp_data = dumper_q->bounce_buf;
+ unsigned int i;
- if (dumper == NULL || nb_pkts == 0)
+ if (unlikely(dumper == NULL))
return 0;
/* all packets in burst have same timestamp */
@@ -473,7 +483,7 @@ eth_pcap_tx_dumper(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
const uint8_t *data;
len = rte_pktmbuf_pkt_len(mbuf);
- caplen = RTE_MIN(len, RTE_ETH_PCAP_SNAPSHOT_LEN);
+ caplen = RTE_MIN(len, snaplen);
header.len = len;
header.caplen = caplen;
@@ -539,19 +549,18 @@ eth_tx_drop(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
static uint16_t
eth_pcap_tx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
{
- unsigned int i;
- struct pmd_process_private *pp;
struct pcap_tx_queue *tx_queue = queue;
+ struct rte_eth_dev *dev = &rte_eth_devices[tx_queue->port_id];
+ struct pmd_process_private *pp = dev->process_private;
+ struct pmd_internals *internals = dev->data->dev_private;
+ uint32_t snaplen = internals->snapshot_len;
+ pcap_t *pcap = pp->tx_pcap[tx_queue->queue_id];
+ unsigned char *temp_data = tx_queue->bounce_buf;
uint16_t num_tx = 0;
uint32_t tx_bytes = 0;
- pcap_t *pcap;
- unsigned char *temp_data;
-
- pp = rte_eth_devices[tx_queue->port_id].process_private;
- pcap = pp->tx_pcap[tx_queue->queue_id];
- temp_data = tx_queue->bounce_buf;
+ unsigned int i;
- if (unlikely(nb_pkts == 0 || pcap == NULL))
+ if (unlikely(pcap == NULL))
return 0;
for (i = 0; i < nb_pkts; i++) {
@@ -559,13 +568,16 @@ eth_pcap_tx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
uint32_t len = rte_pktmbuf_pkt_len(mbuf);
const uint8_t *data;
- if (unlikely(!rte_pktmbuf_is_contiguous(mbuf) &&
- len > RTE_ETH_PCAP_SNAPSHOT_LEN)) {
+ /*
+ * multi-segment transmit that has to go through bounce buffer.
+ * Make sure it fits; don't want to truncate the packet.
+ */
+ if (unlikely(!rte_pktmbuf_is_contiguous(mbuf) && len > snaplen)) {
PMD_TX_LOG(ERR,
- "Dropping multi segment PCAP packet. Size (%u) > max size (%u).",
- len, RTE_ETH_PCAP_SNAPSHOT_LEN);
- tx_queue->tx_stat.err_pkts++;
+ "Multi segment len (%u) > snaplen (%u)",
+ len, snaplen);
rte_pktmbuf_free(mbuf);
+ tx_queue->tx_stat.err_pkts++;
continue;
}
@@ -592,7 +604,7 @@ eth_pcap_tx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
* pcap_open_live wrapper function
*/
static inline int
-open_iface_live(const char *iface, pcap_t **pcap)
+open_iface_live(const char *iface, pcap_t **pcap, uint32_t snaplen)
{
char errbuf[PCAP_ERRBUF_SIZE];
pcap_t *pc;
@@ -609,6 +621,9 @@ open_iface_live(const char *iface, pcap_t **pcap)
PMD_LOG(ERR, "%s: Could not set to ns precision: %s",
iface, pcap_statustostr(status));
goto error;
+ } else if (status > 0) {
+ /* Warning condition - log but continue */
+ PMD_LOG(WARNING, "%s: %s", iface, pcap_statustostr(status));
}
status = pcap_set_immediate_mode(pc, 1);
@@ -621,7 +636,7 @@ open_iface_live(const char *iface, pcap_t **pcap)
PMD_LOG(WARNING, "%s: Could not set to promiscuous: %s",
iface, pcap_statustostr(status));
- status = pcap_set_snaplen(pc, RTE_ETH_PCAP_SNAPSHOT_LEN);
+ status = pcap_set_snaplen(pc, snaplen);
if (status != 0)
PMD_LOG(WARNING, "%s: Could not set snapshot length: %s",
iface, pcap_statustostr(status));
@@ -635,6 +650,9 @@ open_iface_live(const char *iface, pcap_t **pcap)
else
PMD_LOG(ERR, "%s: %s (%s)", iface, pcap_statustostr(status), cp);
goto error;
+ } else if (status > 0) {
+ /* Warning condition - log but continue */
+ PMD_LOG(WARNING, "%s: %s", iface, pcap_statustostr(status));
}
/*
@@ -663,9 +681,9 @@ open_iface_live(const char *iface, pcap_t **pcap)
}
static int
-open_single_iface(const char *iface, pcap_t **pcap)
+open_single_iface(const char *iface, pcap_t **pcap, uint32_t snaplen)
{
- if (open_iface_live(iface, pcap) < 0) {
+ if (open_iface_live(iface, pcap, snaplen) < 0) {
PMD_LOG(ERR, "Couldn't open interface %s", iface);
return -1;
}
@@ -674,7 +692,8 @@ open_single_iface(const char *iface, pcap_t **pcap)
}
static int
-open_single_tx_pcap(const char *pcap_filename, pcap_dumper_t **dumper)
+open_single_tx_pcap(const char *pcap_filename, pcap_dumper_t **dumper,
+ uint32_t snaplen)
{
pcap_t *tx_pcap;
@@ -684,7 +703,7 @@ open_single_tx_pcap(const char *pcap_filename, pcap_dumper_t **dumper)
* pcap holder.
*/
tx_pcap = pcap_open_dead_with_tstamp_precision(DLT_EN10MB,
- RTE_ETH_PCAP_SNAPSHOT_LEN, PCAP_TSTAMP_PRECISION_NANO);
+ snaplen, PCAP_TSTAMP_PRECISION_NANO);
if (tx_pcap == NULL) {
PMD_LOG(ERR, "Couldn't create dead pcap");
return -1;
@@ -693,9 +712,9 @@ open_single_tx_pcap(const char *pcap_filename, pcap_dumper_t **dumper)
/* The dumper is created using the previous pcap_t reference */
*dumper = pcap_dump_open(tx_pcap, pcap_filename);
if (*dumper == NULL) {
+ PMD_LOG(ERR, "Couldn't open %s for writing: %s",
+ pcap_filename, pcap_geterr(tx_pcap));
pcap_close(tx_pcap);
- PMD_LOG(ERR, "Couldn't open %s for writing.",
- pcap_filename);
return -1;
}
@@ -737,6 +756,21 @@ count_packets_in_pcap(pcap_t **pcap, struct pcap_rx_queue *pcap_q)
return pcap_pkt_count;
}
+static int
+set_iface_direction(const char *iface, pcap_t *pcap,
+ pcap_direction_t direction)
+{
+ const char *direction_str = (direction == PCAP_D_IN) ? "IN" : "OUT";
+ if (pcap_setdirection(pcap, direction) < 0) {
+ PMD_LOG(ERR, "Setting %s pcap direction %s failed - %s",
+ iface, direction_str, pcap_geterr(pcap));
+ return -1;
+ }
+ PMD_LOG(INFO, "Setting %s pcap direction %s",
+ iface, direction_str);
+ return 0;
+}
+
static int
eth_dev_start(struct rte_eth_dev *dev)
{
@@ -745,15 +779,15 @@ eth_dev_start(struct rte_eth_dev *dev)
struct pmd_process_private *pp = dev->process_private;
struct pcap_tx_queue *tx;
struct pcap_rx_queue *rx;
+ uint32_t snaplen = internals->snapshot_len;
/* Special iface case. Single pcap is open and shared between tx/rx. */
if (internals->single_iface) {
tx = &internals->tx_queue[0];
rx = &internals->rx_queue[0];
- if (!pp->tx_pcap[0] &&
- strcmp(tx->type, ETH_PCAP_IFACE_ARG) == 0) {
- if (open_single_iface(tx->name, &pp->tx_pcap[0]) < 0)
+ if (!pp->tx_pcap[0] && strcmp(tx->type, ETH_PCAP_IFACE_ARG) == 0) {
+ if (open_single_iface(tx->name, &pp->tx_pcap[0], snaplen) < 0)
return -1;
pp->rx_pcap[0] = pp->tx_pcap[0];
}
@@ -765,14 +799,11 @@ eth_dev_start(struct rte_eth_dev *dev)
for (i = 0; i < dev->data->nb_tx_queues; i++) {
tx = &internals->tx_queue[i];
- if (!pp->tx_dumper[i] &&
- strcmp(tx->type, ETH_PCAP_TX_PCAP_ARG) == 0) {
- if (open_single_tx_pcap(tx->name,
- &pp->tx_dumper[i]) < 0)
+ if (!pp->tx_dumper[i] && strcmp(tx->type, ETH_PCAP_TX_PCAP_ARG) == 0) {
+ if (open_single_tx_pcap(tx->name, &pp->tx_dumper[i], snaplen) < 0)
return -1;
- } else if (!pp->tx_pcap[i] &&
- strcmp(tx->type, ETH_PCAP_TX_IFACE_ARG) == 0) {
- if (open_single_iface(tx->name, &pp->tx_pcap[i]) < 0)
+ } else if (!pp->tx_pcap[i] && strcmp(tx->type, ETH_PCAP_TX_IFACE_ARG) == 0) {
+ if (open_single_iface(tx->name, &pp->tx_pcap[i], snaplen) < 0)
return -1;
}
}
@@ -787,9 +818,14 @@ eth_dev_start(struct rte_eth_dev *dev)
if (strcmp(rx->type, ETH_PCAP_RX_PCAP_ARG) == 0) {
if (open_single_rx_pcap(rx->name, &pp->rx_pcap[i]) < 0)
return -1;
- } else if (strcmp(rx->type, ETH_PCAP_RX_IFACE_ARG) == 0) {
- if (open_single_iface(rx->name, &pp->rx_pcap[i]) < 0)
+ } else if (strcmp(rx->type, ETH_PCAP_RX_IFACE_ARG) == 0 ||
+ strcmp(rx->type, ETH_PCAP_RX_IFACE_IN_ARG) == 0) {
+ if (open_single_iface(rx->name, &pp->rx_pcap[i], snaplen) < 0)
return -1;
+ /* Set direction for rx_iface_in */
+ if (strcmp(rx->type, ETH_PCAP_RX_IFACE_IN_ARG) == 0)
+ set_iface_direction(rx->name, pp->rx_pcap[i],
+ PCAP_D_IN);
}
}
@@ -890,11 +926,11 @@ eth_dev_info(struct rte_eth_dev *dev,
dev_info->if_index = internals->if_index;
dev_info->max_mac_addrs = 1;
- dev_info->max_rx_pktlen = RTE_ETH_PCAP_SNAPSHOT_LEN;
+ dev_info->max_rx_pktlen = internals->snapshot_len;
dev_info->max_rx_queues = dev->data->nb_rx_queues;
dev_info->max_tx_queues = dev->data->nb_tx_queues;
- dev_info->min_rx_bufsize = 0;
- dev_info->max_mtu = RTE_ETH_PCAP_SNAPSHOT_LEN - RTE_ETHER_HDR_LEN;
+ dev_info->min_rx_bufsize = RTE_ETHER_MIN_LEN;
+ dev_info->max_mtu = internals->snapshot_len - RTE_ETHER_HDR_LEN;
dev_info->tx_offload_capa = RTE_ETH_TX_OFFLOAD_MULTI_SEGS |
RTE_ETH_TX_OFFLOAD_VLAN_INSERT;
dev_info->rx_offload_capa = RTE_ETH_RX_OFFLOAD_VLAN_STRIP |
@@ -1152,7 +1188,7 @@ eth_tx_queue_setup(struct rte_eth_dev *dev,
pcap_q->port_id = dev->data->port_id;
pcap_q->queue_id = tx_queue_id;
- pcap_q->bounce_buf = rte_malloc_socket(NULL, RTE_ETH_PCAP_SNAPSHOT_LEN,
+ pcap_q->bounce_buf = rte_malloc_socket(NULL, internals->snapshot_len,
RTE_CACHE_LINE_SIZE, socket_id);
if (pcap_q->bounce_buf == NULL)
return -ENOMEM;
@@ -1304,7 +1340,8 @@ open_tx_pcap(const char *key, const char *value, void *extra_args)
struct pmd_devargs *dumpers = extra_args;
pcap_dumper_t *dumper;
- if (open_single_tx_pcap(pcap_filename, &dumper) < 0)
+ if (open_single_tx_pcap(pcap_filename, &dumper,
+ dumpers->snapshot_len) < 0)
return -1;
if (add_queue(dumpers, pcap_filename, key, NULL, dumper) < 0) {
@@ -1325,7 +1362,7 @@ open_rx_tx_iface(const char *key, const char *value, void *extra_args)
struct pmd_devargs *tx = extra_args;
pcap_t *pcap = NULL;
- if (open_single_iface(iface, &pcap) < 0)
+ if (open_single_iface(iface, &pcap, tx->snapshot_len) < 0)
return -1;
tx->queue[0].pcap = pcap;
@@ -1335,21 +1372,6 @@ open_rx_tx_iface(const char *key, const char *value, void *extra_args)
return 0;
}
-static inline int
-set_iface_direction(const char *iface, pcap_t *pcap,
- pcap_direction_t direction)
-{
- const char *direction_str = (direction == PCAP_D_IN) ? "IN" : "OUT";
- if (pcap_setdirection(pcap, direction) < 0) {
- PMD_LOG(ERR, "Setting %s pcap direction %s failed - %s",
- iface, direction_str, pcap_geterr(pcap));
- return -1;
- }
- PMD_LOG(INFO, "Setting %s pcap direction %s",
- iface, direction_str);
- return 0;
-}
-
static inline int
open_iface(const char *key, const char *value, void *extra_args)
{
@@ -1357,7 +1379,7 @@ open_iface(const char *key, const char *value, void *extra_args)
struct pmd_devargs *pmd = extra_args;
pcap_t *pcap = NULL;
- if (open_single_iface(iface, &pcap) < 0)
+ if (open_single_iface(iface, &pcap, pmd->snapshot_len) < 0)
return -1;
if (add_queue(pmd, iface, key, pcap, NULL) < 0) {
pcap_close(pcap);
@@ -1425,6 +1447,31 @@ process_bool_flag(const char *key, const char *value, void *extra_args)
return 0;
}
+static int
+process_snapshot_len(const char *key, const char *value, void *extra_args)
+{
+ uint32_t *snaplen = extra_args;
+ unsigned long val;
+ char *endptr;
+
+ if (value == NULL || *value == '\0') {
+ PMD_LOG(ERR, "Argument '%s' requires a value", key);
+ return -1;
+ }
+
+ errno = 0;
+ val = strtoul(value, &endptr, 10);
+ if (errno != 0 || *endptr != '\0' ||
+ val < RTE_ETHER_HDR_LEN ||
+ val > ETH_PCAP_MAXIMUM_SNAPLEN) {
+ PMD_LOG(ERR, "Invalid '%s' value '%s'", key, value);
+ return -1;
+ }
+
+ *snaplen = (uint32_t)val;
+ return 0;
+}
+
static int
pmd_init_internals(struct rte_vdev_device *vdev,
const unsigned int nb_rx_queues,
@@ -1588,6 +1635,8 @@ eth_from_pcaps(struct rte_vdev_device *vdev,
}
internals->infinite_rx = infinite_rx;
+ internals->snapshot_len = devargs_all->snapshot_len;
+
/* Assign rx ops. */
if (infinite_rx)
eth_dev->rx_pkt_burst = eth_pcap_rx_infinite;
@@ -1649,6 +1698,7 @@ pmd_pcap_probe(struct rte_vdev_device *dev)
int ret = 0;
struct pmd_devargs_all devargs_all = {
+ .snapshot_len = ETH_PCAP_SNAPSHOT_LEN_DEFAULT,
.single_iface = 0,
.is_tx_pcap = 0,
.is_tx_iface = 0,
@@ -1688,6 +1738,25 @@ pmd_pcap_probe(struct rte_vdev_device *dev)
return -1;
}
+ /*
+ * Process optional snapshot length argument first, so the value
+ * is available when opening pcap handles for files and interfaces.
+ */
+ if (rte_kvargs_count(kvlist, ETH_PCAP_SNAPSHOT_LEN_ARG) == 1) {
+ ret = rte_kvargs_process(kvlist, ETH_PCAP_SNAPSHOT_LEN_ARG,
+ &process_snapshot_len,
+ &devargs_all.snapshot_len);
+ if (ret < 0)
+ goto free_kvlist;
+ }
+
+ /*
+ * Propagate snapshot length to per-queue devargs so that
+ * the open callbacks can access it.
+ */
+ devargs_all.rx_queues.snapshot_len = devargs_all.snapshot_len;
+ devargs_all.tx_queues.snapshot_len = devargs_all.snapshot_len;
+
/*
* If iface argument is passed we open the NICs and use them for
* reading / writing
@@ -1893,4 +1962,5 @@ RTE_PMD_REGISTER_PARAM_STRING(net_pcap,
ETH_PCAP_TX_IFACE_ARG "=<ifc> "
ETH_PCAP_IFACE_ARG "=<ifc> "
ETH_PCAP_PHY_MAC_ARG "=<0|1> "
- ETH_PCAP_INFINITE_RX_ARG "=<0|1>");
+ ETH_PCAP_INFINITE_RX_ARG "=<0|1> "
+ ETH_PCAP_SNAPSHOT_LEN_ARG "=<int>");
--
2.53.0
^ permalink raw reply related
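
The validation done by process_snapshot_len() can be tried outside DPDK with a sketch like the one below. The SKETCH_ constants copy the values used in the patch (14-byte Ethernet header, 262144 maximum); parse_snaplen() is a hypothetical name and omits the logging.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_ETHER_HDR_LEN	14
#define SKETCH_MAXIMUM_SNAPLEN	262144

/*
 * Parse a decimal snaplen string.
 * Returns 0 and stores the value on success, -1 on any error:
 * empty input, trailing garbage, overflow, or out-of-range value.
 */
static int
parse_snaplen(const char *value, uint32_t *snaplen)
{
	unsigned long val;
	char *endptr;

	if (value == NULL || *value == '\0')
		return -1;

	errno = 0;
	val = strtoul(value, &endptr, 10);
	if (errno != 0 || *endptr != '\0' ||
	    val < SKETCH_ETHER_HDR_LEN ||
	    val > SKETCH_MAXIMUM_SNAPLEN)
		return -1;

	*snaplen = (uint32_t)val;
	return 0;
}
```

The lower bound means a capture always holds at least a full Ethernet header, and the upper bound keeps the value within what libpcap is expected to accept. A negative string such as "-1" is also rejected, because strtoul() converts it to a very large unsigned value that fails the range check.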
* [PATCH v21 22/25] net/pcap: add Rx scatter offload
From: Stephen Hemminger @ 2026-03-25 2:37 UTC (permalink / raw)
To: dev; +Cc: Stephen Hemminger, Bruce Richardson
In-Reply-To: <20260325024018.1275209-1-stephen@networkplumber.org>
Add RTE_ETH_RX_OFFLOAD_SCATTER to the advertised receive offload
capabilities if not using infinite_rx mode.
Validate in rx_queue_setup that the mbuf pool data room is
large enough when scatter is not enabled, following the
same pattern as the virtio driver.
Gate the multi-segment receive path on the scatter offload flag
and drop oversized packets when scatter is disabled.
Reject scatter with infinite_rx mode since the ring-based replay
path does not support multi-segment mbufs.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
---
drivers/net/pcap/pcap_ethdev.c | 47 ++++++++++++++++++++++++++++++++--
1 file changed, 45 insertions(+), 2 deletions(-)
diff --git a/drivers/net/pcap/pcap_ethdev.c b/drivers/net/pcap/pcap_ethdev.c
index 2de9c85124..9b1fbdba3d 100644
--- a/drivers/net/pcap/pcap_ethdev.c
+++ b/drivers/net/pcap/pcap_ethdev.c
@@ -79,6 +79,7 @@ struct pcap_rx_queue {
uint16_t port_id;
uint16_t queue_id;
bool vlan_strip;
+ bool rx_scatter;
bool timestamp_offloading;
struct rte_mempool *mb_pool;
struct queue_stat rx_stat;
@@ -112,6 +113,7 @@ struct pmd_internals {
bool phy_mac;
bool infinite_rx;
bool vlan_strip;
+ bool rx_scatter;
bool timestamp_offloading;
};
@@ -342,14 +344,19 @@ eth_pcap_rx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
/* pcap packet will fit in the mbuf, can copy it */
rte_memcpy(rte_pktmbuf_mtod(mbuf, void *), packet, len);
mbuf->data_len = len;
- } else {
- /* Try read jumbo frame into multi mbufs. */
+ } else if (pcap_q->rx_scatter) {
+ /* Scatter into multi-segment mbufs. */
if (unlikely(eth_pcap_rx_jumbo(pcap_q->mb_pool,
mbuf, packet, len) == -1)) {
pcap_q->rx_stat.err_pkts++;
rte_pktmbuf_free(mbuf);
break;
}
+ } else {
+ /* Packet too large and scatter not enabled, drop it. */
+ pcap_q->rx_stat.err_pkts++;
+ rte_pktmbuf_free(mbuf);
+ continue;
}
mbuf->pkt_len = len;
@@ -904,6 +911,7 @@ eth_dev_configure(struct rte_eth_dev *dev)
const struct rte_eth_rxmode *rxmode = &dev_conf->rxmode;
internals->vlan_strip = !!(rxmode->offloads & RTE_ETH_RX_OFFLOAD_VLAN_STRIP);
+ internals->rx_scatter = !!(rxmode->offloads & RTE_ETH_RX_OFFLOAD_SCATTER);
internals->timestamp_offloading = !!(rxmode->offloads & RTE_ETH_RX_OFFLOAD_TIMESTAMP);
if (internals->timestamp_offloading && timestamp_rx_dynflag == 0) {
@@ -936,6 +944,9 @@ eth_dev_info(struct rte_eth_dev *dev,
dev_info->rx_offload_capa = RTE_ETH_RX_OFFLOAD_VLAN_STRIP |
RTE_ETH_RX_OFFLOAD_TIMESTAMP;
+ if (!internals->infinite_rx)
+ dev_info->rx_offload_capa |= RTE_ETH_RX_OFFLOAD_SCATTER;
+
return 0;
}
@@ -1095,11 +1106,37 @@ eth_rx_queue_setup(struct rte_eth_dev *dev,
{
struct pmd_internals *internals = dev->data->dev_private;
struct pcap_rx_queue *pcap_q = &internals->rx_queue[rx_queue_id];
+ uint16_t buf_size;
+ bool rx_scatter;
+
+ buf_size = rte_pktmbuf_data_room_size(mb_pool) - RTE_PKTMBUF_HEADROOM;
+ rx_scatter = !!(dev->data->dev_conf.rxmode.offloads &
+ RTE_ETH_RX_OFFLOAD_SCATTER);
+
+ /*
+ * If Rx scatter is not enabled, verify that the mbuf data room
+ * can hold the largest received packet in a single segment.
+ * Use the MTU-derived frame size as the expected maximum, not
+ * snapshot_len which is a capture truncation limit rather than
+ * an expected packet size.
+ */
+ if (!rx_scatter) {
+ uint32_t max_rx_pktlen = dev->data->mtu + RTE_ETHER_HDR_LEN;
+
+ if (max_rx_pktlen > buf_size) {
+ PMD_LOG(ERR,
+ "Rx scatter is disabled and RxQ mbuf pool object size is too small "
+ "(buf_size=%u, max_rx_pkt_len=%u)",
+ buf_size, max_rx_pktlen);
+ return -EINVAL;
+ }
+ }
pcap_q->mb_pool = mb_pool;
pcap_q->port_id = dev->data->port_id;
pcap_q->queue_id = rx_queue_id;
pcap_q->vlan_strip = internals->vlan_strip;
+ pcap_q->rx_scatter = rx_scatter;
dev->data->rx_queues[rx_queue_id] = pcap_q;
pcap_q->timestamp_offloading = internals->timestamp_offloading;
@@ -1112,6 +1149,12 @@ eth_rx_queue_setup(struct rte_eth_dev *dev,
pcap_t **pcap;
bool save_vlan_strip;
+ if (rx_scatter) {
+ PMD_LOG(ERR,
+ "Rx scatter is not supported with infinite_rx mode");
+ return -EINVAL;
+ }
+
pp = rte_eth_devices[pcap_q->port_id].process_private;
pcap = &pp->rx_pcap[pcap_q->queue_id];
--
2.53.0
^ permalink raw reply related
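
The two decisions this patch introduces can be summarised in a standalone sketch. The names below are hypothetical, and the "fits" test is assumed to compare the packet length with the space available in the first mbuf; the real receive path works on rte_mbuf structures.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_ETHER_HDR_LEN 14u

enum rx_action {
	RX_COPY_SINGLE,	/* packet fits in one buffer */
	RX_SCATTER,	/* chain several buffers */
	RX_DROP		/* too large and scatter disabled */
};

/* Per-packet decision in the receive loop. */
static enum rx_action
rx_classify(uint32_t pkt_len, uint32_t buf_room, bool rx_scatter)
{
	if (pkt_len <= buf_room)
		return RX_COPY_SINGLE;
	return rx_scatter ? RX_SCATTER : RX_DROP;
}

/*
 * Queue setup check: without scatter, one buffer must hold a frame
 * of MTU + Ethernet header bytes.
 */
static bool
rx_bufsize_ok(uint16_t mtu, uint16_t buf_size, bool rx_scatter)
{
	uint32_t max_rx_pktlen = (uint32_t)mtu + SKETCH_ETHER_HDR_LEN;

	return rx_scatter || max_rx_pktlen <= buf_size;
}
```

The setup check is based on the MTU rather than snapshot_len, so a default snaplen of 65535 does not force applications to allocate 64 KB mbufs. Frames larger than the MTU can still arrive from a capture, which is why the per-packet drop path is needed as well.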
* [PATCH v21 23/25] net/pcap: add link status change support for iface mode
From: Stephen Hemminger @ 2026-03-25 2:37 UTC (permalink / raw)
To: dev; +Cc: Stephen Hemminger
In-Reply-To: <20260325024018.1275209-1-stephen@networkplumber.org>
Add LSC interrupt support for pass-through (iface=) mode so
applications can receive link state change notifications via
the standard ethdev callback mechanism.
Use alarm-based polling to periodically check the underlying
interface state via osdep_iface_link_get(). The LSC flag is
advertised only for iface mode devices, and polling is gated
on the application enabling intr_conf.lsc in port configuration.
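
The polling logic can be sketched without the EAL alarm API as a single tick function. This is illustrative only: struct lsc_poller and lsc_poll_tick() are hypothetical, the event counter stands in for the ethdev callback, and the return value stands in for the decision to re-arm the alarm.

```c
#include <stdbool.h>

struct lsc_poller {
	bool link_up;		/* last observed link status */
	bool lsc_active;	/* cleared on stop so polling ends */
	unsigned int events;	/* notifications raised so far */
};

/*
 * One polling tick: record the new status, raise a notification only
 * on a change, and report whether the timer should be re-armed.
 */
static bool
lsc_poll_tick(struct lsc_poller *p, bool link_up_now)
{
	if (p->link_up != link_up_now) {
		p->link_up = link_up_now;
		p->events++;
	}
	return p->lsc_active;
}
```

Because the tick re-arms itself only while lsc_active is set, stopping the device needs to clear the flag and cancel any pending alarm.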
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
doc/guides/nics/features/pcap.ini | 1 +
doc/guides/nics/pcap.rst | 5 +++
doc/guides/rel_notes/release_26_03.rst | 1 +
drivers/net/pcap/pcap_ethdev.c | 45 ++++++++++++++++++++++++++
4 files changed, 52 insertions(+)
diff --git a/doc/guides/nics/features/pcap.ini b/doc/guides/nics/features/pcap.ini
index 9f234aa7b9..38dd298603 100644
--- a/doc/guides/nics/features/pcap.ini
+++ b/doc/guides/nics/features/pcap.ini
@@ -5,6 +5,7 @@
;
[Features]
Link status = Y
+Link status event = Y
Queue start/stop = Y
Timestamp offload = P
Basic stats = Y
diff --git a/doc/guides/nics/pcap.rst b/doc/guides/nics/pcap.rst
index 2754e205c7..72f2250790 100644
--- a/doc/guides/nics/pcap.rst
+++ b/doc/guides/nics/pcap.rst
@@ -278,3 +278,8 @@ Features and Limitations
* The PMD will insert the pcap header packet timestamp with nanoseconds resolution and
UNIX origin, i.e. time since 1-JAN-1970 UTC, if ``RTE_ETH_RX_OFFLOAD_TIMESTAMP`` is enabled.
+
+* In ``iface`` mode, the PMD supports link status change (LSC) notifications.
+ When the application enables ``intr_conf.lsc`` in the port configuration,
+ the driver polls the underlying network interface once per second and generates an
+ ``RTE_ETH_EVENT_INTR_LSC`` callback when the link state changes.
diff --git a/doc/guides/rel_notes/release_26_03.rst b/doc/guides/rel_notes/release_26_03.rst
index 869084d4cd..feb080aa3f 100644
--- a/doc/guides/rel_notes/release_26_03.rst
+++ b/doc/guides/rel_notes/release_26_03.rst
@@ -142,6 +142,7 @@ New Features
* Added support for reporting link state in ``iface`` mode.
* Receive timestamps support nanosecond precision.
* Added ``snaplen`` devarg to configure packet capture snapshot length.
+ * Added support for Link State interrupt in ``iface`` mode.
Removed Items
diff --git a/drivers/net/pcap/pcap_ethdev.c b/drivers/net/pcap/pcap_ethdev.c
index 9b1fbdba3d..aca9ff189c 100644
--- a/drivers/net/pcap/pcap_ethdev.c
+++ b/drivers/net/pcap/pcap_ethdev.c
@@ -17,6 +17,7 @@
#include <pcap.h>
+#include <rte_alarm.h>
#include <rte_cycles.h>
#include <rte_ring.h>
#include <rte_ethdev.h>
@@ -48,6 +49,8 @@
/* This is defined in libpcap but not exposed in headers */
#define ETH_PCAP_MAXIMUM_SNAPLEN 262144
+#define ETH_PCAP_LSC_POLL_INTERVAL_US (1000 * 1000) /* 1 second */
+
#define ETH_PCAP_ARG_MAXLEN 64
#define RTE_PMD_PCAP_MAX_QUEUES 16
@@ -115,6 +118,7 @@ struct pmd_internals {
bool vlan_strip;
bool rx_scatter;
bool timestamp_offloading;
+ bool lsc_active;
};
struct pmd_process_private {
@@ -163,6 +167,9 @@ static const char *valid_arguments[] = {
RTE_LOG_REGISTER_DEFAULT(eth_pcap_logtype, NOTICE);
+/* Forward declarations */
+static int eth_link_update(struct rte_eth_dev *dev, int wait_to_complete);
+
static struct queue_missed_stat*
queue_missed_stat_update(struct rte_eth_dev *dev, unsigned int qid)
{
@@ -763,6 +770,28 @@ count_packets_in_pcap(pcap_t **pcap, struct pcap_rx_queue *pcap_q)
return pcap_pkt_count;
}
+/*
+ * Periodic alarm to poll link state.
+ * Enabled when link state interrupt is enabled in single_iface mode.
+ */
+static void
+eth_pcap_lsc_alarm(void *arg)
+{
+ struct rte_eth_dev *dev = arg;
+ struct pmd_internals *internals = dev->data->dev_private;
+ struct rte_eth_link old_link, new_link;
+
+ rte_eth_linkstatus_get(dev, &old_link);
+ eth_link_update(dev, 0);
+ rte_eth_linkstatus_get(dev, &new_link);
+
+ if (old_link.link_status != new_link.link_status)
+ rte_eth_dev_callback_process(dev, RTE_ETH_EVENT_INTR_LSC, NULL);
+
+ if (internals->lsc_active)
+ rte_eal_alarm_set(ETH_PCAP_LSC_POLL_INTERVAL_US, eth_pcap_lsc_alarm, dev);
+}
+
static int
set_iface_direction(const char *iface, pcap_t *pcap,
pcap_direction_t direction)
@@ -845,6 +874,13 @@ eth_dev_start(struct rte_eth_dev *dev)
dev->data->dev_link.link_status = RTE_ETH_LINK_UP;
+ /* Start LSC polling for iface mode if application requested it */
+ if (internals->single_iface && dev->data->dev_conf.intr_conf.lsc) {
+ internals->lsc_active = true;
+ rte_eal_alarm_set(ETH_PCAP_LSC_POLL_INTERVAL_US,
+ eth_pcap_lsc_alarm, dev);
+ }
+
return 0;
}
@@ -862,6 +898,12 @@ eth_dev_stop(struct rte_eth_dev *dev)
/* Special iface case. Single pcap is open and shared between tx/rx. */
if (internals->single_iface) {
+ /* Cancel LSC polling before closing pcap handles */
+ if (internals->lsc_active) {
+ internals->lsc_active = false;
+ rte_eal_alarm_cancel(eth_pcap_lsc_alarm, dev);
+ }
+
queue_missed_stat_on_stop_update(dev, 0);
if (pp->tx_pcap[0] != NULL) {
pcap_close(pp->tx_pcap[0]);
@@ -1669,6 +1711,9 @@ eth_from_pcaps(struct rte_vdev_device *vdev,
internals->if_index =
osdep_iface_index_get(rx_queues->queue[0].name);
+ /* Enable LSC interrupt support for iface mode */
+ eth_dev->data->dev_flags |= RTE_ETH_DEV_INTR_LSC;
+
/* phy_mac arg is applied only if "iface" devarg is provided */
if (rx_queues->phy_mac) {
if (eth_pcap_update_mac(rx_queues->queue[0].name,
--
2.53.0
* [PATCH v21 24/25] net/pcap: add EOF notification via link status change
From: Stephen Hemminger @ 2026-03-25 2:37 UTC
To: dev; +Cc: Stephen Hemminger
In-Reply-To: <20260325024018.1275209-1-stephen@networkplumber.org>
Add an "eof" devarg for rx_pcap mode that signals end-of-file by
setting link down and generating an LSC event. This allows
applications to detect when a pcap file has been fully consumed
using the standard ethdev callback mechanism.
The eof and infinite_rx options are mutually exclusive. On device
restart, the EOF state is reset so the file can be replayed.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
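Note: the EOF handling relies on a one-shot flag. Several RX queues may
reach end of file, but only the first should trigger the notification;
start re-arms the flag and stop clears it. The following is a minimal
standalone sketch of that pattern using C11 atomics rather than the
DPDK rte_atomic wrappers. The names eof_state, eof_state_start,
eof_state_try_signal and eof_state_stop are hypothetical and do not
appear in the driver.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the EOF state; field names are illustrative. */
struct eof_state {
	bool eof_enabled;          /* corresponds to the "eof" devarg */
	atomic_bool eof_signaled;  /* set at most once per start/stop cycle */
};

/* Called on start: re-arm so a replayed file can signal EOF again. */
static void
eof_state_start(struct eof_state *s)
{
	assert(s != NULL);
	atomic_store_explicit(&s->eof_signaled, false, memory_order_relaxed);
}

/*
 * Called by any RX path that hits end of file.
 * Returns true only for the single caller that flips false -> true.
 */
static bool
eof_state_try_signal(struct eof_state *s)
{
	bool expected = false;

	assert(s != NULL);
	if (!s->eof_enabled)
		return false;
	return atomic_compare_exchange_strong_explicit(&s->eof_signaled,
			&expected, true,
			memory_order_relaxed, memory_order_relaxed);
}

/*
 * Called on stop. Returns true if EOF had been signaled, meaning a
 * deferred notification may still be pending and should be cancelled.
 */
static bool
eof_state_stop(struct eof_state *s)
{
	bool expected = true;

	assert(s != NULL);
	return atomic_compare_exchange_strong_explicit(&s->eof_signaled,
			&expected, false,
			memory_order_relaxed, memory_order_relaxed);
}
```

In this model the caller that gets true from eof_state_try_signal() is
the one that would update link state and schedule the deferred
callback. A true return from eof_state_stop() is the cue to cancel that
pending callback.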
doc/guides/nics/pcap.rst | 12 ++++
doc/guides/rel_notes/release_26_03.rst | 2 +
drivers/net/pcap/pcap_ethdev.c | 80 +++++++++++++++++++++++++-
3 files changed, 91 insertions(+), 3 deletions(-)
diff --git a/doc/guides/nics/pcap.rst b/doc/guides/nics/pcap.rst
index 72f2250790..18a9a04652 100644
--- a/doc/guides/nics/pcap.rst
+++ b/doc/guides/nics/pcap.rst
@@ -161,6 +161,18 @@ Runtime Config Options
so all queues on a device will either have this enabled or disabled.
This option should only be provided once per device.
+* Signal end-of-file via link status change
+
+ When ``rx_pcap=`` is configured, the application may want to be notified when
+ all packets in the pcap file have been read. This can be done with the ``eof``
+ devarg, for example::
+
+ --vdev 'net_pcap0,rx_pcap=file_rx.pcap,eof=1'
+
+ When enabled, the driver sets link down and generates an LSC event at end of file.
+ If the device is stopped and restarted, the EOF state is reset.
+ This option cannot be combined with ``infinite_rx``.
+
* Drop all packets on transmit
To drop all packets on transmit for a device,
diff --git a/doc/guides/rel_notes/release_26_03.rst b/doc/guides/rel_notes/release_26_03.rst
index feb080aa3f..6752cf599a 100644
--- a/doc/guides/rel_notes/release_26_03.rst
+++ b/doc/guides/rel_notes/release_26_03.rst
@@ -143,6 +143,8 @@ New Features
* Receive timestamps support nanosecond precision.
* Added ``snaplen`` devarg to configure packet capture snapshot length.
* Added support for Link State interrupt in ``iface`` mode.
+ * Added ``eof`` devarg to use link state to signal end of receive
+ file input.
Removed Items
diff --git a/drivers/net/pcap/pcap_ethdev.c b/drivers/net/pcap/pcap_ethdev.c
index aca9ff189c..a9b0ed2f60 100644
--- a/drivers/net/pcap/pcap_ethdev.c
+++ b/drivers/net/pcap/pcap_ethdev.c
@@ -42,6 +42,7 @@
#define ETH_PCAP_IFACE_ARG "iface"
#define ETH_PCAP_PHY_MAC_ARG "phy_mac"
#define ETH_PCAP_INFINITE_RX_ARG "infinite_rx"
+#define ETH_PCAP_EOF_ARG "eof"
#define ETH_PCAP_SNAPSHOT_LEN_ARG "snaplen"
#define ETH_PCAP_SNAPSHOT_LEN_DEFAULT 65535
@@ -115,6 +116,8 @@ struct pmd_internals {
bool single_iface;
bool phy_mac;
bool infinite_rx;
+ bool eof;
+ RTE_ATOMIC(bool) eof_signaled;
bool vlan_strip;
bool rx_scatter;
bool timestamp_offloading;
@@ -150,6 +153,7 @@ struct pmd_devargs_all {
bool is_rx_pcap;
bool is_rx_iface;
bool infinite_rx;
+ bool eof;
};
static const char *valid_arguments[] = {
@@ -161,6 +165,7 @@ static const char *valid_arguments[] = {
ETH_PCAP_IFACE_ARG,
ETH_PCAP_PHY_MAC_ARG,
ETH_PCAP_INFINITE_RX_ARG,
+ ETH_PCAP_EOF_ARG,
ETH_PCAP_SNAPSHOT_LEN_ARG,
NULL
};
@@ -308,15 +313,33 @@ eth_pcap_rx_infinite(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
return i;
}
+/*
+ * Deferred EOF alarm callback.
+ *
+ * Scheduled from the RX burst path when end-of-file is reached,
+ * so that rte_eth_dev_callback_process() runs outside the datapath.
+ * This avoids holding any locks that the application callback
+ * might also need, preventing potential deadlocks.
+ */
+static void
+eth_pcap_eof_alarm(void *arg)
+{
+ struct rte_eth_dev *dev = arg;
+
+ rte_eth_dev_callback_process(dev, RTE_ETH_EVENT_INTR_LSC, NULL);
+}
+
static uint16_t
eth_pcap_rx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
{
+ struct pcap_rx_queue *pcap_q = queue;
+ struct rte_eth_dev *dev = &rte_eth_devices[pcap_q->port_id];
+ struct pmd_internals *internals = dev->data->dev_private;
unsigned int i;
struct pcap_pkthdr *header;
struct pmd_process_private *pp;
const u_char *packet;
struct rte_mbuf *mbuf;
- struct pcap_rx_queue *pcap_q = queue;
uint16_t num_rx = 0;
uint32_t rx_bytes = 0;
pcap_t *pcap;
@@ -337,6 +360,23 @@ eth_pcap_rx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
if (ret == PCAP_ERROR)
pcap_q->rx_stat.err_pkts++;
+ /*
+ * EOF: if eof mode is enabled, set link down and
+ * defer notification via alarm to avoid calling
+ * rte_eth_dev_callback_process() from the datapath.
+ */
+ else if (ret == PCAP_ERROR_BREAK) {
+ bool expected = false;
+
+ if (internals->eof &&
+ rte_atomic_compare_exchange_strong_explicit(
+ &internals->eof_signaled, &expected, true,
+ rte_memory_order_relaxed, rte_memory_order_relaxed)) {
+ eth_link_update(dev, 0);
+ rte_eal_alarm_set(1, eth_pcap_eof_alarm, dev);
+ }
+ }
+
break;
}
@@ -872,6 +912,7 @@ eth_dev_start(struct rte_eth_dev *dev)
for (i = 0; i < dev->data->nb_tx_queues; i++)
dev->data->tx_queue_state[i] = RTE_ETH_QUEUE_STATE_STARTED;
+ rte_atomic_store_explicit(&internals->eof_signaled, false, rte_memory_order_relaxed);
dev->data->dev_link.link_status = RTE_ETH_LINK_UP;
/* Start LSC polling for iface mode if application requested it */
@@ -895,6 +936,7 @@ eth_dev_stop(struct rte_eth_dev *dev)
unsigned int i;
struct pmd_internals *internals = dev->data->dev_private;
struct pmd_process_private *pp = dev->process_private;
+ bool expected;
/* Special iface case. Single pcap is open and shared between tx/rx. */
if (internals->single_iface) {
@@ -934,6 +976,13 @@ eth_dev_stop(struct rte_eth_dev *dev)
}
status_down:
+ /* Cancel any pending EOF alarm */
+ expected = true;
+ if (rte_atomic_compare_exchange_strong_explicit(
+ &internals->eof_signaled, &expected, false,
+ rte_memory_order_relaxed, rte_memory_order_relaxed))
+ rte_eal_alarm_cancel(eth_pcap_eof_alarm, dev);
+
for (i = 0; i < dev->data->nb_rx_queues; i++)
dev->data->rx_queue_state[i] = RTE_ETH_QUEUE_STATE_STOPPED;
@@ -1130,9 +1179,10 @@ eth_link_update(struct rte_eth_dev *dev, int wait_to_complete __rte_unused)
if (internals->single_iface) {
link.link_status = (osdep_iface_link_status(internals->rx_queue[0].name) > 0) ?
RTE_ETH_LINK_UP : RTE_ETH_LINK_DOWN;
+ } else if (rte_atomic_load_explicit(&internals->eof_signaled, rte_memory_order_relaxed)) {
+ link.link_status = RTE_ETH_LINK_DOWN;
} else {
- link.link_status = dev->data->dev_started ?
- RTE_ETH_LINK_UP : RTE_ETH_LINK_DOWN;
+ link.link_status = dev->data->dev_started ? RTE_ETH_LINK_UP : RTE_ETH_LINK_DOWN;
}
return rte_eth_linkstatus_set(dev, &link);
@@ -1723,8 +1773,13 @@ eth_from_pcaps(struct rte_vdev_device *vdev,
}
internals->infinite_rx = infinite_rx;
+ internals->eof = devargs_all->eof;
internals->snapshot_len = devargs_all->snapshot_len;
+ /* Enable LSC for eof mode (already set above for single_iface) */
+ if (internals->eof)
+ eth_dev->data->dev_flags |= RTE_ETH_DEV_INTR_LSC;
+
/* Assign rx ops. */
if (infinite_rx)
eth_dev->rx_pkt_burst = eth_pcap_rx_infinite;
@@ -1913,6 +1968,24 @@ pmd_pcap_probe(struct rte_vdev_device *dev)
"for %s", name);
}
+ /*
+ * Check whether to signal EOF via link status change.
+ */
+ if (rte_kvargs_count(kvlist, ETH_PCAP_EOF_ARG) == 1) {
+ ret = rte_kvargs_process(kvlist, ETH_PCAP_EOF_ARG,
+ &process_bool_flag,
+ &devargs_all.eof);
+ if (ret < 0)
+ goto free_kvlist;
+ }
+
+ if (devargs_all.infinite_rx && devargs_all.eof) {
+ PMD_LOG(ERR, "Cannot use both infinite_rx and eof for %s",
+ name);
+ ret = -EINVAL;
+ goto free_kvlist;
+ }
+
ret = rte_kvargs_process(kvlist, ETH_PCAP_RX_PCAP_ARG,
&open_rx_pcap, &pcaps);
} else if (devargs_all.is_rx_iface) {
@@ -2051,4 +2124,5 @@ RTE_PMD_REGISTER_PARAM_STRING(net_pcap,
ETH_PCAP_IFACE_ARG "=<ifc> "
ETH_PCAP_PHY_MAC_ARG "=<0|1> "
ETH_PCAP_INFINITE_RX_ARG "=<0|1> "
+ ETH_PCAP_EOF_ARG "=<0|1> "
ETH_PCAP_SNAPSHOT_LEN_ARG "=<int>");
--
2.53.0
* [PATCH v21 25/25] test: add comprehensive test suite for pcap PMD
From: Stephen Hemminger @ 2026-03-25 2:37 UTC
To: dev; +Cc: Stephen Hemminger
In-Reply-To: <20260325024018.1275209-1-stephen@networkplumber.org>
Add unit tests for the pcap PMD covering file and interface modes.
Tests include:
- basic TX to file and RX from file
- varied packet sizes and jumbo frames
- infinite RX mode
- TX drop mode
- statistics (ipackets, opackets, ibytes, obytes)
- interface (iface=) pass-through mode
- asymmetric rx_iface/tx_iface mode
- rx_iface_in direction filtering
- link status reporting for file and iface modes
- link status change (LSC) with interface toggle
- EOF notification via LSC
- RX timestamps and timestamp with infinite RX
- multiple TX/RX queues
- VLAN strip, insert, and runtime offload configuration
- snapshot length (snaplen) and truncation
- scatter RX and oversized packet drop
- per-queue start/stop
- imissed statistic (pcap_stats kernel drops)
Cross-platform helpers handle temp file creation, interface
discovery, and VLAN packet generation.
The LSC link toggle test requires a pre-created dummy interface
(Linux: dummy0, FreeBSD: disc0) and is skipped if unavailable.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
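Note: the VLAN helpers in the test build and check the 16-bit tag
control information (TCI) as PCP in the top three bits and the VLAN ID
in the low twelve bits, with the DEI bit left at zero. The following is
a minimal standalone sketch of that packing in plain C. The names
vlan_tci_pack, vlan_tci_id, vlan_tci_pcp, VLAN_ID_MASK and
VLAN_PCP_SHIFT are hypothetical and are not taken from the test file.

```c
#include <assert.h>
#include <stdint.h>

#define VLAN_ID_MASK   0x0FFFu  /* low 12 bits: VLAN identifier */
#define VLAN_PCP_SHIFT 13       /* top 3 bits: priority code point */

/* Build a host-order TCI from VLAN ID and PCP; DEI (bit 12) stays 0. */
static uint16_t
vlan_tci_pack(uint16_t vlan_id, uint8_t pcp)
{
	assert(vlan_id <= VLAN_ID_MASK && pcp <= 7);
	return (uint16_t)(((unsigned int)pcp << VLAN_PCP_SHIFT) | vlan_id);
}

/* Extract the VLAN ID from a host-order TCI. */
static uint16_t
vlan_tci_id(uint16_t tci)
{
	return (uint16_t)(tci & VLAN_ID_MASK);
}

/* Extract the PCP from a host-order TCI. */
static uint8_t
vlan_tci_pcp(uint16_t tci)
{
	return (uint8_t)(tci >> VLAN_PCP_SHIFT);
}
```

With the test parameters VLAN ID 100 and PCP 3, vlan_tci_pack() yields
0x6064. The value is in host byte order and still has to be converted
to network byte order before it is written into a packet header.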
app/test/meson.build | 2 +
app/test/test_pmd_pcap.c | 4122 ++++++++++++++++++++++++
doc/guides/rel_notes/release_26_03.rst | 1 +
3 files changed, 4125 insertions(+)
create mode 100644 app/test/test_pmd_pcap.c
diff --git a/app/test/meson.build b/app/test/meson.build
index 7d458f9c07..7a4daeb0c7 100644
--- a/app/test/meson.build
+++ b/app/test/meson.build
@@ -141,6 +141,7 @@ source_file_deps = {
'test_per_lcore.c': [],
'test_pflock.c': [],
'test_pie.c': ['sched'],
+ 'test_pmd_pcap.c': ['net_pcap', 'ethdev', 'bus_vdev'] + packet_burst_generator_deps,
'test_pmd_perf.c': ['ethdev', 'net'] + packet_burst_generator_deps,
'test_pmd_ring.c': ['net_ring', 'ethdev', 'bus_vdev'],
'test_pmd_ring_perf.c': ['ethdev', 'net_ring', 'bus_vdev'],
@@ -217,6 +218,7 @@ source_file_deps = {
source_file_ext_deps = {
'test_compressdev.c': ['zlib'],
'test_pcapng.c': ['pcap'],
+ 'test_pmd_pcap.c': ['pcap'],
}
# the NULL ethdev is used by a number of tests, in some cases as an optional dependency.
diff --git a/app/test/test_pmd_pcap.c b/app/test/test_pmd_pcap.c
new file mode 100644
index 0000000000..1a19ad70c3
--- /dev/null
+++ b/app/test/test_pmd_pcap.c
@@ -0,0 +1,4122 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2026 Stephen Hemminger
+ */
+
+#include "test.h"
+
+#include "packet_burst_generator.h"
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <stdint.h>
+
+#ifdef RTE_EXEC_ENV_WINDOWS
+#include <io.h>
+#include <windows.h>
+#define F_OK 0
+#define usleep(us) Sleep((us) / 1000 ? (us) / 1000 : 1)
+#else
+#include <unistd.h>
+#include <sys/ioctl.h>
+#include <sys/socket.h>
+#include <net/if.h>
+#endif
+
+#include <pcap/pcap.h>
+
+#include <rte_ethdev.h>
+#include <rte_bus_vdev.h>
+#include <rte_mbuf.h>
+#include <rte_mbuf_dyn.h>
+#include <rte_mempool.h>
+#include <rte_ether.h>
+#include <rte_string_fns.h>
+#include <rte_cycles.h>
+#include <rte_ip.h>
+#include <rte_udp.h>
+
+#define SOCKET0 0
+#define RING_SIZE 256
+#define NB_MBUF 1024
+#define NUM_PACKETS 64
+#define MAX_PKT_BURST 32
+#define PCAP_SNAPLEN 65535
+
+/* Packet sizes to test */
+#define PKT_SIZE_MIN 60
+#define PKT_SIZE_SMALL 128
+#define PKT_SIZE_MEDIUM 512
+#define PKT_SIZE_LARGE 1024
+#define PKT_SIZE_MTU 1500
+#define PKT_SIZE_JUMBO 9000
+
+static struct rte_mempool *mp;
+
+/* Timestamp dynamic field access */
+static int timestamp_dynfield_offset = -1;
+static uint64_t timestamp_rx_dynflag;
+
+/* Temporary file paths shared between tests */
+static char tx_pcap_path[PATH_MAX]; /* test_tx_to_file -> test_rx_from_file */
+static char vlan_rx_pcap_path[PATH_MAX]; /* test_vlan_strip_rx -> test_vlan_no_strip_rx */
+
+/* Constants for multi-queue tests */
+#define MULTI_QUEUE_NUM_QUEUES 4U
+#define MULTI_QUEUE_NUM_PACKETS 100U
+#define MULTI_QUEUE_BURST_SIZE 32U
+
+/* Test VLAN parameters */
+#define TEST_VLAN_ID 100
+#define TEST_VLAN_PCP 3
+
+/* MAC addresses for packet generation */
+static struct rte_ether_addr src_mac;
+static struct rte_ether_addr dst_mac = {
+ .addr_bytes = { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff }
+};
+
+/* Sample Ethernet/IPv4/UDP packet for testing */
+static const uint8_t test_packet[] = {
+ /* Ethernet header */
+ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, /* dst MAC (broadcast) */
+ 0x00, 0x11, 0x22, 0x33, 0x44, 0x55, /* src MAC */
+ 0x08, 0x00, /* EtherType: IPv4 */
+ /* IPv4 header */
+ 0x45, 0x00, 0x00, 0x2e, /* ver, ihl, tos, len */
+ 0x00, 0x01, 0x00, 0x00, /* id, flags, frag */
+ 0x40, 0x11, 0x00, 0x00, /* ttl, proto(UDP), csum */
+ 0x0a, 0x00, 0x00, 0x01, /* src: 10.0.0.1 */
+ 0x0a, 0x00, 0x00, 0x02, /* dst: 10.0.0.2 */
+ /* UDP header */
+ 0x04, 0xd2, 0x04, 0xd2, /* sport, dport (1234) */
+ 0x00, 0x1a, 0x00, 0x00, /* len, csum */
+ /* Payload: "Test packet!" */
+ 0x54, 0x65, 0x73, 0x74, 0x20, 0x70,
+ 0x61, 0x63, 0x6b, 0x65, 0x74, 0x21
+};
+
+/* Helper: Get timestamp from mbuf using dynamic field */
+static inline rte_mbuf_timestamp_t
+mbuf_timestamp_get(const struct rte_mbuf *mbuf)
+{
+ return *RTE_MBUF_DYNFIELD(mbuf, timestamp_dynfield_offset, rte_mbuf_timestamp_t *);
+}
+
+/* Helper: Check if mbuf has valid timestamp */
+static inline int
+mbuf_has_timestamp(const struct rte_mbuf *mbuf)
+{
+ return (mbuf->ol_flags & timestamp_rx_dynflag) != 0;
+}
+
+/* Helper: Initialize timestamp dynamic field access */
+static int
+timestamp_init(void)
+{
+ int offset;
+
+ offset = rte_mbuf_dynfield_lookup(RTE_MBUF_DYNFIELD_TIMESTAMP_NAME, NULL);
+ if (offset < 0) {
+ printf("Timestamp dynfield not registered\n");
+ return -1;
+ }
+ timestamp_dynfield_offset = offset;
+
+ offset = rte_mbuf_dynflag_lookup(RTE_MBUF_DYNFLAG_RX_TIMESTAMP_NAME, NULL);
+ if (offset < 0) {
+ printf("Timestamp dynflag not registered\n");
+ return -1;
+ }
+ timestamp_rx_dynflag = RTE_BIT64(offset);
+ return 0;
+}
+
+#ifdef RTE_EXEC_ENV_WINDOWS
+
+/*
+ * Helper: Create a unique temporary file path (Windows version)
+ */
+static int
+create_temp_path(char *buf, size_t buflen, const char *prefix)
+{
+ char temp_dir[MAX_PATH];
+ char temp_file[MAX_PATH];
+ DWORD ret;
+
+ ret = GetTempPathA(sizeof(temp_dir), temp_dir);
+ if (ret == 0 || ret > sizeof(temp_dir))
+ return -1;
+
+ if (GetTempFileNameA(temp_dir, prefix, 0, temp_file) == 0)
+ return -1;
+
+ ret = snprintf(buf, buflen, "%s.pcap", temp_file);
+ if (ret >= buflen) {
+ DeleteFileA(temp_file);
+ return -1;
+ }
+
+ if (MoveFileA(temp_file, buf) == 0) {
+ DeleteFileA(temp_file);
+ return -1;
+ }
+
+ return 0;
+}
+
+/*
+ * Helper: Remove temporary file (Windows version)
+ */
+static inline void
+remove_temp_file(const char *path)
+{
+ if (path[0] != '\0')
+ DeleteFileA(path);
+}
+
+#else /* POSIX */
+
+/*
+ * Helper: Create a unique temporary file path (POSIX version)
+ */
+static int
+create_temp_path(char *buf, size_t buflen, const char *prefix)
+{
+ int fd;
+
+ snprintf(buf, buflen, "/tmp/%s_XXXXXX.pcap", prefix);
+ fd = mkstemps(buf, 5); /* 5 = strlen(".pcap") */
+ if (fd < 0)
+ return -1;
+ close(fd);
+ return 0;
+}
+
+/*
+ * Helper: Remove temporary file (POSIX version)
+ */
+static inline void
+remove_temp_file(const char *path)
+{
+ if (path[0] != '\0')
+ unlink(path);
+}
+
+#endif /* RTE_EXEC_ENV_WINDOWS */
+
+/*
+ * Helper: Create a pcap file with test packets using libpcap
+ */
+static int
+create_test_pcap(const char *path, unsigned int num_pkts)
+{
+ pcap_t *pd;
+ pcap_dumper_t *dumper;
+ struct pcap_pkthdr hdr;
+ unsigned int i;
+
+ pd = pcap_open_dead(DLT_EN10MB, PCAP_SNAPLEN);
+ if (pd == NULL) {
+ printf("pcap_open_dead failed\n");
+ return -1;
+ }
+
+ dumper = pcap_dump_open(pd, path);
+ if (dumper == NULL) {
+ printf("pcap_dump_open failed: %s\n", pcap_geterr(pd));
+ pcap_close(pd);
+ return -1;
+ }
+
+ memset(&hdr, 0, sizeof(hdr));
+ hdr.caplen = sizeof(test_packet);
+ hdr.len = sizeof(test_packet);
+
+ for (i = 0; i < num_pkts; i++) {
+ hdr.ts.tv_sec = i;
+ hdr.ts.tv_usec = 0;
+ pcap_dump((u_char *)dumper, &hdr, test_packet);
+ }
+
+ pcap_dump_close(dumper);
+ pcap_close(pd);
+ return 0;
+}
+
+/*
+ * Helper: Create pcap file with packets of specified size
+ */
+static int
+create_sized_pcap(const char *path, unsigned int num_pkts, uint16_t pkt_size)
+{
+ pcap_t *pd;
+ pcap_dumper_t *dumper;
+ struct pcap_pkthdr hdr;
+ uint8_t *pkt_data;
+ unsigned int i;
+
+ /* Minimum valid ethernet frame */
+ if (pkt_size < 60)
+ pkt_size = 60;
+
+ pkt_data = calloc(1, pkt_size);
+ if (pkt_data == NULL)
+ return -1;
+
+ /* Build ethernet header */
+ struct rte_ether_hdr *eth_hdr = (struct rte_ether_hdr *)pkt_data;
+ rte_ether_addr_copy(&src_mac, &eth_hdr->src_addr);
+ rte_ether_addr_copy(&dst_mac, &eth_hdr->dst_addr);
+ eth_hdr->ether_type = rte_cpu_to_be_16(RTE_ETHER_TYPE_IPV4);
+
+ /* Build IP header */
+ struct rte_ipv4_hdr *ip_hdr = (struct rte_ipv4_hdr *)(eth_hdr + 1);
+ uint16_t ip_len = pkt_size - sizeof(struct rte_ether_hdr);
+ ip_hdr->version_ihl = RTE_IPV4_VHL_DEF;
+ ip_hdr->total_length = rte_cpu_to_be_16(ip_len);
+ ip_hdr->time_to_live = 64;
+ ip_hdr->next_proto_id = IPPROTO_UDP;
+ ip_hdr->src_addr = rte_cpu_to_be_32(IPV4_ADDR(10, 0, 0, 1));
+ ip_hdr->dst_addr = rte_cpu_to_be_32(IPV4_ADDR(10, 0, 0, 2));
+ ip_hdr->hdr_checksum = 0;
+ ip_hdr->hdr_checksum = rte_ipv4_cksum(ip_hdr);
+
+ /* Build UDP header */
+ struct rte_udp_hdr *udp_hdr = (struct rte_udp_hdr *)(ip_hdr + 1);
+ uint16_t udp_len = ip_len - sizeof(struct rte_ipv4_hdr);
+ udp_hdr->src_port = rte_cpu_to_be_16(1234);
+ udp_hdr->dst_port = rte_cpu_to_be_16(1234);
+ udp_hdr->dgram_len = rte_cpu_to_be_16(udp_len);
+ udp_hdr->dgram_cksum = 0;
+
+ /* Fill payload with pattern */
+ uint8_t *payload = (uint8_t *)(udp_hdr + 1);
+ uint16_t payload_len = udp_len - sizeof(struct rte_udp_hdr);
+ for (uint16_t j = 0; j < payload_len; j++)
+ payload[j] = (uint8_t)(j & 0xFF);
+
+ pd = pcap_open_dead(DLT_EN10MB, PCAP_SNAPLEN);
+ if (pd == NULL) {
+ free(pkt_data);
+ return -1;
+ }
+
+ dumper = pcap_dump_open(pd, path);
+ if (dumper == NULL) {
+ pcap_close(pd);
+ free(pkt_data);
+ return -1;
+ }
+
+ memset(&hdr, 0, sizeof(hdr));
+ hdr.caplen = pkt_size;
+ hdr.len = pkt_size;
+
+ for (i = 0; i < num_pkts; i++) {
+ hdr.ts.tv_sec = i;
+ hdr.ts.tv_usec = 0;
+ /* Vary sequence byte in payload */
+ payload[0] = (uint8_t)(i & 0xFF);
+ pcap_dump((u_char *)dumper, &hdr, pkt_data);
+ }
+
+ pcap_dump_close(dumper);
+ pcap_close(pd);
+ free(pkt_data);
+ return 0;
+}
+
+/*
+ * Helper: Create pcap file with varied packet sizes
+ */
+static int
+create_varied_pcap(const char *path, unsigned int num_pkts)
+{
+ static const uint16_t sizes[] = {
+ PKT_SIZE_MIN, PKT_SIZE_SMALL, PKT_SIZE_MEDIUM,
+ PKT_SIZE_LARGE, PKT_SIZE_MTU
+ };
+ pcap_t *pd;
+ pcap_dumper_t *dumper;
+ struct pcap_pkthdr hdr;
+ uint8_t *pkt_data;
+ unsigned int i;
+
+ pkt_data = calloc(1, PKT_SIZE_MTU);
+ if (pkt_data == NULL)
+ return -1;
+
+ pd = pcap_open_dead(DLT_EN10MB, PCAP_SNAPLEN);
+ if (pd == NULL) {
+ free(pkt_data);
+ return -1;
+ }
+
+ dumper = pcap_dump_open(pd, path);
+ if (dumper == NULL) {
+ pcap_close(pd);
+ free(pkt_data);
+ return -1;
+ }
+
+ for (i = 0; i < num_pkts; i++) {
+ uint16_t pkt_size = sizes[i % RTE_DIM(sizes)];
+
+ memset(pkt_data, 0, pkt_size);
+
+ /* Build ethernet header */
+ struct rte_ether_hdr *eth_hdr = (struct rte_ether_hdr *)pkt_data;
+ rte_ether_addr_copy(&src_mac, &eth_hdr->src_addr);
+ rte_ether_addr_copy(&dst_mac, &eth_hdr->dst_addr);
+ eth_hdr->ether_type = rte_cpu_to_be_16(RTE_ETHER_TYPE_IPV4);
+
+ /* Build IP header */
+ struct rte_ipv4_hdr *ip_hdr = (struct rte_ipv4_hdr *)(eth_hdr + 1);
+ uint16_t ip_len = pkt_size - sizeof(struct rte_ether_hdr);
+ ip_hdr->version_ihl = RTE_IPV4_VHL_DEF;
+ ip_hdr->total_length = rte_cpu_to_be_16(ip_len);
+ ip_hdr->time_to_live = 64;
+ ip_hdr->next_proto_id = IPPROTO_UDP;
+ ip_hdr->src_addr = rte_cpu_to_be_32(IPV4_ADDR(10, 0, 0, 1));
+ ip_hdr->dst_addr = rte_cpu_to_be_32(IPV4_ADDR(10, 0, 0, 2));
+ ip_hdr->hdr_checksum = 0;
+ ip_hdr->hdr_checksum = rte_ipv4_cksum(ip_hdr);
+
+ /* Build UDP header */
+ struct rte_udp_hdr *udp_hdr = (struct rte_udp_hdr *)(ip_hdr + 1);
+ uint16_t udp_len = ip_len - sizeof(struct rte_ipv4_hdr);
+ udp_hdr->src_port = rte_cpu_to_be_16(1234);
+ udp_hdr->dst_port = rte_cpu_to_be_16(1234);
+ udp_hdr->dgram_len = rte_cpu_to_be_16(udp_len);
+
+ memset(&hdr, 0, sizeof(hdr));
+ hdr.ts.tv_sec = i;
+ hdr.caplen = pkt_size;
+ hdr.len = pkt_size;
+
+ pcap_dump((u_char *)dumper, &hdr, pkt_data);
+ }
+
+ pcap_dump_close(dumper);
+ pcap_close(pd);
+ free(pkt_data);
+ return 0;
+}
+
+/*
+ * Helper: Create pcap file with specific timestamps for testing
+ */
+static int
+create_timestamped_pcap(const char *path, unsigned int num_pkts,
+ uint32_t base_sec, uint32_t usec_increment)
+{
+ pcap_t *pd;
+ pcap_dumper_t *dumper;
+ struct pcap_pkthdr hdr;
+ unsigned int i;
+
+ pd = pcap_open_dead_with_tstamp_precision(DLT_EN10MB, PCAP_SNAPLEN,
+ PCAP_TSTAMP_PRECISION_MICRO);
+ if (pd == NULL)
+ return -1;
+
+ dumper = pcap_dump_open(pd, path);
+ if (dumper == NULL) {
+ pcap_close(pd);
+ return -1;
+ }
+
+ memset(&hdr, 0, sizeof(hdr));
+ hdr.caplen = sizeof(test_packet);
+ hdr.len = sizeof(test_packet);
+
+ for (i = 0; i < num_pkts; i++) {
+ uint64_t total_usec = (uint64_t)i * usec_increment;
+ hdr.ts.tv_sec = base_sec + total_usec / 1000000;
+ hdr.ts.tv_usec = total_usec % 1000000;
+ pcap_dump((u_char *)dumper, &hdr, test_packet);
+ }
+
+ pcap_dump_close(dumper);
+ pcap_close(pd);
+ return 0;
+}
+
+/*
+ * Helper: Count packets in a pcap file using libpcap
+ */
+static int
+count_pcap_packets(const char *path)
+{
+ pcap_t *pd;
+ char errbuf[PCAP_ERRBUF_SIZE];
+ struct pcap_pkthdr *hdr;
+ const u_char *data;
+ int count = 0;
+
+ pd = pcap_open_offline(path, errbuf);
+ if (pd == NULL)
+ return -1;
+
+ while (pcap_next_ex(pd, &hdr, &data) == 1)
+ count++;
+
+ pcap_close(pd);
+ return count;
+}
+
+/*
+ * Helper: Get packet sizes from pcap file
+ */
+static int
+get_pcap_packet_sizes(const char *path, uint16_t *sizes, unsigned int max_pkts)
+{
+ pcap_t *pd;
+ char errbuf[PCAP_ERRBUF_SIZE];
+ struct pcap_pkthdr *hdr;
+ const u_char *data;
+ unsigned int count = 0;
+
+ pd = pcap_open_offline(path, errbuf);
+ if (pd == NULL)
+ return -1;
+
+ while (pcap_next_ex(pd, &hdr, &data) == 1 && count < max_pkts) {
+ sizes[count] = hdr->caplen;
+ count++;
+ }
+
+ pcap_close(pd);
+ return count;
+}
+
+/*
+ * Helper: Verify packets in pcap file are truncated correctly
+ * Returns 0 if all packets have caplen == expected_caplen and len == expected_len
+ */
+static int
+verify_pcap_truncation(const char *path, uint32_t expected_caplen,
+ uint32_t expected_len, unsigned int *pkt_count)
+{
+ pcap_t *pd;
+ char errbuf[PCAP_ERRBUF_SIZE];
+ struct pcap_pkthdr *hdr;
+ const u_char *data;
+ unsigned int count = 0;
+
+ pd = pcap_open_offline(path, errbuf);
+ if (pd == NULL)
+ return -1;
+
+ while (pcap_next_ex(pd, &hdr, &data) == 1) {
+ if (hdr->caplen != expected_caplen || hdr->len != expected_len) {
+ printf("Packet %u: caplen=%u (expected %u), len=%u (expected %u)\n",
+ count, hdr->caplen, expected_caplen,
+ hdr->len, expected_len);
+ pcap_close(pd);
+ return -1;
+ }
+ count++;
+ }
+
+ pcap_close(pd);
+ if (pkt_count)
+ *pkt_count = count;
+ return 0;
+}
+
+/*
+ * Helper: Configure and start a pcap ethdev port with custom config
+ */
+static int
+setup_pcap_port_conf(uint16_t port, const struct rte_eth_conf *conf)
+{
+ int ret;
+
+ ret = rte_eth_dev_configure(port, 1, 1, conf);
+ TEST_ASSERT(ret == 0, "Failed to configure port %u: %s",
+ port, rte_strerror(-ret));
+
+ ret = rte_eth_rx_queue_setup(port, 0, RING_SIZE, SOCKET0, NULL, mp);
+ TEST_ASSERT(ret == 0, "Failed to setup RX queue on port %u: %s",
+ port, rte_strerror(-ret));
+
+ ret = rte_eth_tx_queue_setup(port, 0, RING_SIZE, SOCKET0, NULL);
+ TEST_ASSERT(ret == 0, "Failed to setup TX queue on port %u: %s",
+ port, rte_strerror(-ret));
+
+ ret = rte_eth_dev_start(port);
+ TEST_ASSERT(ret == 0, "Failed to start port %u: %s",
+ port, rte_strerror(-ret));
+
+ return 0;
+}
+
+/*
+ * Helper: Configure and start a pcap ethdev port (default: timestamp offload)
+ */
+static int
+setup_pcap_port(uint16_t port)
+{
+ struct rte_eth_conf port_conf = {
+ .rxmode.offloads = RTE_ETH_RX_OFFLOAD_TIMESTAMP,
+ };
+
+ return setup_pcap_port_conf(port, &port_conf);
+}
+
+/*
+ * Helper: Create a pcap vdev and return its port ID
+ */
+static int
+create_pcap_vdev(const char *name, const char *devargs, uint16_t *port_id)
+{
+ int ret;
+
+ ret = rte_vdev_init(name, devargs);
+ TEST_ASSERT(ret == 0, "Failed to create vdev %s: %s",
+ name, rte_strerror(-ret));
+
+ ret = rte_eth_dev_get_port_by_name(name, port_id);
+ TEST_ASSERT(ret == 0, "Failed to get port ID for %s", name);
+
+ return 0;
+}
+
+/*
+ * Helper: Cleanup a pcap vdev
+ */
+static void
+cleanup_pcap_vdev(const char *name, uint16_t port_id)
+{
+ rte_eth_dev_stop(port_id);
+ rte_vdev_uninit(name);
+}
+
+/*
+ * Helper: Create a pcap file with VLAN-tagged packets
+ */
+static int
+create_vlan_tagged_pcap(const char *path, unsigned int num_pkts,
+ uint16_t vlan_id, uint8_t pcp)
+{
+ pcap_t *pd;
+ pcap_dumper_t *dumper;
+ struct pcap_pkthdr hdr;
+ uint8_t pkt_data[128] = { 0 };
+ unsigned int i;
+ size_t pkt_len;
+
+ /* Build VLAN-tagged packet */
+ struct rte_ether_hdr *eth_hdr = (struct rte_ether_hdr *)pkt_data;
+ struct rte_vlan_hdr *vlan_hdr;
+ struct rte_ipv4_hdr *ip_hdr;
+ struct rte_udp_hdr *udp_hdr;
+
+ rte_ether_addr_copy(&src_mac, &eth_hdr->src_addr);
+ rte_ether_addr_copy(&dst_mac, &eth_hdr->dst_addr);
+ eth_hdr->ether_type = rte_cpu_to_be_16(RTE_ETHER_TYPE_VLAN);
+
+ vlan_hdr = (struct rte_vlan_hdr *)(eth_hdr + 1);
+ vlan_hdr->vlan_tci = rte_cpu_to_be_16((pcp << 13) | vlan_id);
+ vlan_hdr->eth_proto = rte_cpu_to_be_16(RTE_ETHER_TYPE_IPV4);
+
+ ip_hdr = (struct rte_ipv4_hdr *)(vlan_hdr + 1);
+ ip_hdr->version_ihl = RTE_IPV4_VHL_DEF;
+ ip_hdr->total_length = rte_cpu_to_be_16(46); /* 20 IP + 8 UDP + 18 payload */
+ ip_hdr->time_to_live = 64;
+ ip_hdr->next_proto_id = IPPROTO_UDP;
+ ip_hdr->src_addr = rte_cpu_to_be_32(IPV4_ADDR(10, 0, 0, 1));
+ ip_hdr->dst_addr = rte_cpu_to_be_32(IPV4_ADDR(10, 0, 0, 2));
+ ip_hdr->hdr_checksum = 0;
+ ip_hdr->hdr_checksum = rte_ipv4_cksum(ip_hdr);
+
+ udp_hdr = (struct rte_udp_hdr *)(ip_hdr + 1);
+ udp_hdr->src_port = rte_cpu_to_be_16(1234);
+ udp_hdr->dst_port = rte_cpu_to_be_16(1234);
+ udp_hdr->dgram_len = rte_cpu_to_be_16(26); /* 8 UDP + 18 payload */
+ udp_hdr->dgram_cksum = 0;
+
+ /* Add payload pattern */
+ uint8_t *payload = (uint8_t *)(udp_hdr + 1);
+ for (int j = 0; j < 18; j++)
+ payload[j] = (uint8_t)(j & 0xFF);
+
+ pkt_len = sizeof(struct rte_ether_hdr) + sizeof(struct rte_vlan_hdr) +
+ sizeof(struct rte_ipv4_hdr) + sizeof(struct rte_udp_hdr) + 18;
+
+ pd = pcap_open_dead(DLT_EN10MB, PCAP_SNAPLEN);
+ if (pd == NULL)
+ return -1;
+
+ dumper = pcap_dump_open(pd, path);
+ if (dumper == NULL) {
+ pcap_close(pd);
+ return -1;
+ }
+
+ memset(&hdr, 0, sizeof(hdr));
+ hdr.caplen = pkt_len;
+ hdr.len = pkt_len;
+
+ for (i = 0; i < num_pkts; i++) {
+ hdr.ts.tv_sec = i;
+ hdr.ts.tv_usec = 0;
+ /* Vary sequence byte in payload */
+ payload[0] = (uint8_t)(i & 0xFF);
+ pcap_dump((u_char *)dumper, &hdr, pkt_data);
+ }
+
+ pcap_dump_close(dumper);
+ pcap_close(pd);
+ return 0;
+}
+
+/*
+ * Helper: Verify packet has VLAN tag with expected values
+ */
+static int
+verify_vlan_tag(struct rte_mbuf *mbuf, uint16_t expected_vlan_id, uint8_t expected_pcp)
+{
+ struct rte_ether_hdr *eth_hdr;
+ struct rte_vlan_hdr *vlan_hdr;
+ uint16_t tci;
+
+ eth_hdr = rte_pktmbuf_mtod(mbuf, struct rte_ether_hdr *);
+
+ /* Check for VLAN ethertype */
+ if (rte_be_to_cpu_16(eth_hdr->ether_type) != RTE_ETHER_TYPE_VLAN) {
+ printf(" Error: Expected VLAN ethertype 0x%04x, got 0x%04x\n",
+ RTE_ETHER_TYPE_VLAN, rte_be_to_cpu_16(eth_hdr->ether_type));
+ return -1;
+ }
+
+ vlan_hdr = (struct rte_vlan_hdr *)(eth_hdr + 1);
+ tci = rte_be_to_cpu_16(vlan_hdr->vlan_tci);
+
+ if ((tci & 0x0FFF) != expected_vlan_id) {
+ printf(" Error: Expected VLAN ID %u, got %u\n",
+ expected_vlan_id, tci & 0x0FFF);
+ return -1;
+ }
+
+ if ((tci >> 13) != expected_pcp) {
+ printf(" Error: Expected PCP %u, got %u\n",
+ expected_pcp, tci >> 13);
+ return -1;
+ }
+
+ return 0;
+}
+
+/*
+ * Helper: Verify packet has NO VLAN tag (plain ethernet)
+ */
+static int
+verify_no_vlan_tag(struct rte_mbuf *mbuf)
+{
+ struct rte_ether_hdr *eth_hdr;
+
+ eth_hdr = rte_pktmbuf_mtod(mbuf, struct rte_ether_hdr *);
+
+ if (rte_be_to_cpu_16(eth_hdr->ether_type) == RTE_ETHER_TYPE_VLAN) {
+ printf(" Error: Packet still has VLAN tag (ethertype 0x8100)\n");
+ return -1;
+ }
+
+ return 0;
+}
+
+/*
+ * Helper: Count packets in pcap and verify VLAN tags
+ */
+static int
+count_vlan_packets_in_pcap(const char *path, uint16_t expected_vlan_id,
+ int expect_vlan_tag)
+{
+ pcap_t *pd;
+ char errbuf[PCAP_ERRBUF_SIZE];
+ struct pcap_pkthdr *hdr;
+ const u_char *data;
+ int count = 0;
+ int errors = 0;
+
+ pd = pcap_open_offline(path, errbuf);
+ if (pd == NULL)
+ return -1;
+
+ while (pcap_next_ex(pd, &hdr, &data) == 1) {
+ const struct rte_ether_hdr *eth = (const struct rte_ether_hdr *)data;
+ uint16_t etype = rte_be_to_cpu_16(eth->ether_type);
+
+ if (expect_vlan_tag) {
+ if (etype != RTE_ETHER_TYPE_VLAN) {
+ printf(" Packet %d: expected VLAN tag, got ethertype 0x%04x\n",
+ count, etype);
+ errors++;
+ } else {
+ const struct rte_vlan_hdr *vlan =
+ (const struct rte_vlan_hdr *)(eth + 1);
+ uint16_t tci = rte_be_to_cpu_16(vlan->vlan_tci);
+ if ((tci & 0x0FFF) != expected_vlan_id) {
+ printf(" Packet %d: VLAN ID %u != expected %u\n",
+ count, tci & 0x0FFF, expected_vlan_id);
+ errors++;
+ }
+ }
+ } else {
+ if (etype == RTE_ETHER_TYPE_VLAN) {
+ printf(" Packet %d: unexpected VLAN tag present\n", count);
+ errors++;
+ }
+ }
+ count++;
+ }
+
+ pcap_close(pd);
+
+ if (errors > 0)
+ return -errors;
+
+ return count;
+}
+
+/*
+ * Helper: Configure port with VLAN strip offload enabled
+ */
+static int
+setup_pcap_port_vlan_strip(uint16_t port)
+{
+ struct rte_eth_conf port_conf = {
+ .rxmode.offloads = RTE_ETH_RX_OFFLOAD_VLAN_STRIP,
+ };
+
+ return setup_pcap_port_conf(port, &port_conf);
+}
+
+/*
+ * Helper: Allocate mbufs with VLAN TX offload info set
+ */
+static int
+alloc_vlan_tx_mbufs(struct rte_mbuf **mbufs, unsigned int count,
+ uint16_t vlan_id, uint8_t pcp)
+{
+ unsigned int i;
+ int ret;
+
+ ret = rte_pktmbuf_alloc_bulk(mp, mbufs, count);
+ if (ret != 0)
+ return -1;
+
+ for (i = 0; i < count; i++) {
+ /* Copy untagged test packet */
+ memcpy(rte_pktmbuf_mtod(mbufs[i], void *),
+ test_packet, sizeof(test_packet));
+ mbufs[i]->data_len = sizeof(test_packet);
+ mbufs[i]->pkt_len = sizeof(test_packet);
+
+ /* Set VLAN TX offload flags */
+ mbufs[i]->ol_flags |= RTE_MBUF_F_TX_VLAN;
+ mbufs[i]->vlan_tci = (pcp << 13) | vlan_id;
+ }
+
+ return 0;
+}
+
+/*
+ * Helper: Generate test packets using packet_burst_generator
+ */
+static int
+generate_test_packets(struct rte_mempool *pool, struct rte_mbuf **mbufs,
+ unsigned int count, uint16_t pkt_len)
+{
+ struct rte_ether_hdr eth_hdr;
+ struct rte_ipv4_hdr ip_hdr;
+ struct rte_udp_hdr udp_hdr;
+ uint16_t ip_pkt_data_len;
+ int nb_pkt;
+
+ /* Initialize ethernet header */
+ initialize_eth_header(&eth_hdr, &src_mac, &dst_mac,
+ RTE_ETHER_TYPE_IPV4, 0, 0);
+
+ /* Calculate IP payload length (total - eth - ip headers) */
+ ip_pkt_data_len = pkt_len - sizeof(struct rte_ether_hdr) -
+ sizeof(struct rte_ipv4_hdr);
+
+ /* Initialize UDP header */
+ initialize_udp_header(&udp_hdr, 1234, 1234,
+ ip_pkt_data_len - sizeof(struct rte_udp_hdr));
+
+ /* Initialize IPv4 header */
+ initialize_ipv4_header(&ip_hdr, IPV4_ADDR(10, 0, 0, 1),
+ IPV4_ADDR(10, 0, 0, 2), ip_pkt_data_len);
+
+ /* Generate packet burst */
+ nb_pkt = generate_packet_burst(pool, mbufs, &eth_hdr, 0,
+ &ip_hdr, 1, &udp_hdr,
+ count, pkt_len, 1);
+
+ return nb_pkt;
+}
+
+/*
+ * Helper: Allocate mbufs and fill with test packet data (legacy method)
+ */
+static int
+alloc_test_mbufs(struct rte_mbuf **mbufs, unsigned int count)
+{
+ unsigned int i;
+ int ret;
+
+ ret = rte_pktmbuf_alloc_bulk(mp, mbufs, count);
+ if (ret != 0)
+ return -1;
+
+ for (i = 0; i < count; i++) {
+ memcpy(rte_pktmbuf_mtod(mbufs[i], void *),
+ test_packet, sizeof(test_packet));
+ mbufs[i]->data_len = sizeof(test_packet);
+ mbufs[i]->pkt_len = sizeof(test_packet);
+ }
+ return 0;
+}
+
+/*
+ * Helper: Allocate a multi-segment mbuf for jumbo frames
+ * Returns the head mbuf with chained segments, or NULL on failure
+ */
+static struct rte_mbuf *
+alloc_jumbo_mbuf(uint32_t pkt_len, uint8_t fill_byte)
+{
+ struct rte_mbuf *head = NULL;
+ struct rte_mbuf **prev = &head;
+ uint32_t remaining = pkt_len;
+ uint16_t nb_segs = 0;
+
+ while (remaining > 0) {
+ struct rte_mbuf *seg = rte_pktmbuf_alloc(mp);
+ uint16_t seg_size;
+
+ if (seg == NULL) {
+ rte_pktmbuf_free(head);
+ return NULL;
+ }
+
+ seg_size = RTE_MIN(remaining, rte_pktmbuf_tailroom(seg));
+ seg->data_len = seg_size;
+
+ /* Fill segment with pattern */
+ memset(rte_pktmbuf_mtod(seg, void *), fill_byte, seg_size);
+
+ *prev = seg;
+ prev = &seg->next;
+ remaining -= seg_size;
+ nb_segs++;
+ }
+
+ if (head != NULL) {
+ head->pkt_len = pkt_len;
+ head->nb_segs = nb_segs;
+ }
+
+ return head;
+}
+
+/*
+ * Helper: Allocate a multi-segment mbuf with controlled segment size.
+ *
+ * Unlike alloc_jumbo_mbuf which fills segments to tailroom capacity,
+ * this limits each segment to seg_size bytes, guaranteeing that the
+ * resulting mbuf chain has multiple segments even for moderate pkt_len.
+ */
+static struct rte_mbuf *
+alloc_multiseg_mbuf(uint32_t pkt_len, uint16_t seg_size, uint8_t fill_byte)
+{
+ struct rte_mbuf *head = NULL;
+ struct rte_mbuf **prev = &head;
+ uint32_t remaining = pkt_len;
+ uint16_t nb_segs = 0;
+
+ while (remaining > 0) {
+ struct rte_mbuf *seg = rte_pktmbuf_alloc(mp);
+ uint16_t this_len;
+
+ if (seg == NULL) {
+ rte_pktmbuf_free(head);
+ return NULL;
+ }
+
+ this_len = RTE_MIN(remaining, seg_size);
+ this_len = RTE_MIN(this_len, rte_pktmbuf_tailroom(seg));
+ seg->data_len = this_len;
+
+ memset(rte_pktmbuf_mtod(seg, void *), fill_byte, this_len);
+
+ *prev = seg;
+ prev = &seg->next;
+ remaining -= this_len;
+ nb_segs++;
+ }
+
+ if (head != NULL) {
+ head->pkt_len = pkt_len;
+ head->nb_segs = nb_segs;
+ }
+
+ return head;
+}
+
+/*
+ * Helper: Receive packets from port (no retry needed for file-based RX)
+ */
+static int
+receive_packets(uint16_t port, struct rte_mbuf **mbufs,
+ unsigned int max_pkts, unsigned int *received)
+{
+ unsigned int total = 0;
+
+ while (total < max_pkts) {
+ uint16_t nb_rx = rte_eth_rx_burst(port, 0, &mbufs[total], max_pkts - total);
+ if (nb_rx == 0)
+ break;
+ total += nb_rx;
+ }
+ *received = total;
+ return 0;
+}
+
+/*
+ * Helper: Verify mbuf contains expected test packet
+ */
+static int
+verify_packet(struct rte_mbuf *mbuf)
+{
+ TEST_ASSERT_EQUAL(rte_pktmbuf_data_len(mbuf), sizeof(test_packet),
+ "Packet length mismatch");
+ TEST_ASSERT_BUFFERS_ARE_EQUAL(rte_pktmbuf_mtod(mbuf, void *),
+ test_packet, sizeof(test_packet),
+ "Packet data mismatch");
+ return 0;
+}
+
+/*
+ * Helper: Check if interface supports Ethernet (DLT_EN10MB)
+ *
+ * The pcap PMD only works with Ethernet interfaces. On FreeBSD/macOS,
+ * the loopback interface uses DLT_NULL which is incompatible.
+ */
+static int
+iface_is_ethernet(const char *name)
+{
+ char errbuf[PCAP_ERRBUF_SIZE];
+ pcap_t *pcap;
+ int datalink;
+
+ pcap = pcap_open_live(name, 256, 0, 0, errbuf);
+ if (pcap == NULL)
+ return 0;
+
+ datalink = pcap_datalink(pcap);
+ pcap_close(pcap);
+
+ return datalink == DLT_EN10MB;
+}
+
+/*
+ * Helper: Find a usable test interface using pcap_findalldevs
+ *
+ * Uses libpcap's portable interface enumeration which works on
+ * Linux, FreeBSD, macOS, and Windows.
+ *
+ * Only selects interfaces that support Ethernet link type (DLT_EN10MB).
+ * This excludes loopback on FreeBSD/macOS which uses DLT_NULL.
+ *
+ * Preference order:
+ * 1. Loopback interface (if Ethernet - Linux only)
+ * 2. Any interface that is UP and RUNNING
+ * 3. Any available Ethernet interface
+ *
+ * Returns static buffer with interface name, or NULL if none found.
+ */
+static const char *
+find_test_iface(void)
+{
+ static char iface_name[256];
+ pcap_if_t *alldevs, *dev;
+ char errbuf[PCAP_ERRBUF_SIZE];
+ const char *loopback = NULL;
+ const char *any_up = NULL;
+ const char *any_ether = NULL;
+
+ if (pcap_findalldevs(&alldevs, errbuf) != 0) {
+ printf("pcap_findalldevs failed: %s\n", errbuf);
+ return NULL;
+ }
+
+ if (alldevs == NULL) {
+ printf("No interfaces found\n");
+ return NULL;
+ }
+
+ for (dev = alldevs; dev != NULL; dev = dev->next) {
+ if (dev->name == NULL)
+ continue;
+
+ /* Only consider Ethernet interfaces */
+ if (!iface_is_ethernet(dev->name))
+ continue;
+
+ if (any_ether == NULL)
+ any_ether = dev->name;
+
+ /* Prefer loopback for safety (Linux lo supports DLT_EN10MB) */
+ if ((dev->flags & PCAP_IF_LOOPBACK) && loopback == NULL) {
+ loopback = dev->name;
+ continue;
+ }
+
+#ifdef PCAP_IF_UP
+ if ((dev->flags & PCAP_IF_UP) &&
+ (dev->flags & PCAP_IF_RUNNING) &&
+ any_up == NULL)
+ any_up = dev->name;
+#else
+ if (any_up == NULL)
+ any_up = dev->name;
+#endif
+ }
+
+ /* Select best available interface */
+ const char *selected = NULL;
+ if (loopback != NULL)
+ selected = loopback;
+ else if (any_up != NULL)
+ selected = any_up;
+ else if (any_ether != NULL)
+ selected = any_ether;
+
+ if (selected != NULL)
+ strlcpy(iface_name, selected, sizeof(iface_name));
+
+ pcap_freealldevs(alldevs);
+ return selected ? iface_name : NULL;
+}
+
+/*
+ * Test: Transmit packets to pcap file
+ */
+static int
+test_tx_to_file(void)
+{
+ struct rte_mbuf *mbufs[NUM_PACKETS];
+ char devargs[256];
+ uint16_t port_id;
+ int nb_tx, pkt_count;
+ int ret;
+
+ printf("Testing TX to pcap file\n");
+
+ TEST_ASSERT(create_temp_path(tx_pcap_path, sizeof(tx_pcap_path),
+ "pcap_tx") == 0,
+ "Failed to create temp file path");
+
+ ret = snprintf(devargs, sizeof(devargs), "tx_pcap=%s", tx_pcap_path);
+ TEST_ASSERT(ret > 0 && ret < (int)sizeof(devargs),
+ "devargs string truncated");
+ TEST_ASSERT(create_pcap_vdev("net_pcap_tx", devargs, &port_id) == 0,
+ "Failed to create TX vdev");
+ TEST_ASSERT(setup_pcap_port(port_id) == 0,
+ "Failed to setup TX port");
+ TEST_ASSERT(alloc_test_mbufs(mbufs, NUM_PACKETS) == 0,
+ "Failed to allocate mbufs");
+
+ nb_tx = rte_eth_tx_burst(port_id, 0, mbufs, NUM_PACKETS);
+ TEST_ASSERT_EQUAL(nb_tx, NUM_PACKETS,
+ "TX burst failed: sent %d/%d", nb_tx, NUM_PACKETS);
+
+ cleanup_pcap_vdev("net_pcap_tx", port_id);
+
+ pkt_count = count_pcap_packets(tx_pcap_path);
+ TEST_ASSERT_EQUAL(pkt_count, NUM_PACKETS,
+ "Pcap file has %d packets, expected %d",
+ pkt_count, NUM_PACKETS);
+
+ printf("TX to file PASSED: %d packets written\n", NUM_PACKETS);
+ return TEST_SUCCESS;
+}
+
+/*
+ * Test: Receive packets from pcap file
+ * Uses output from TX test as input
+ */
+static int
+test_rx_from_file(void)
+{
+ struct rte_mbuf *mbufs[NUM_PACKETS];
+ char devargs[256];
+ uint16_t port_id;
+ unsigned int received, i;
+ int ret;
+
+ printf("Testing RX from pcap file\n");
+
+ /* Create input file if TX test didn't run */
+ if (access(tx_pcap_path, F_OK) != 0) {
+ TEST_ASSERT(create_temp_path(tx_pcap_path, sizeof(tx_pcap_path),
+ "pcap_rx_input") == 0,
+ "Failed to create temp path");
+ TEST_ASSERT(create_test_pcap(tx_pcap_path, NUM_PACKETS) == 0,
+ "Failed to create input pcap");
+ }
+
+ ret = snprintf(devargs, sizeof(devargs), "rx_pcap=%s", tx_pcap_path);
+ TEST_ASSERT(ret > 0 && ret < (int)sizeof(devargs),
+ "devargs string truncated");
+ TEST_ASSERT(create_pcap_vdev("net_pcap_rx", devargs, &port_id) == 0,
+ "Failed to create RX vdev");
+ TEST_ASSERT(setup_pcap_port(port_id) == 0,
+ "Failed to setup RX port");
+
+ receive_packets(port_id, mbufs, NUM_PACKETS, &received);
+ TEST_ASSERT_EQUAL(received, NUM_PACKETS,
+ "Received %u packets, expected %d", received, NUM_PACKETS);
+
+ for (i = 0; i < received; i++) {
+ TEST_ASSERT(verify_packet(mbufs[i]) == 0,
+ "Packet %u verification failed", i);
+ }
+ rte_pktmbuf_free_bulk(mbufs, received);
+
+ cleanup_pcap_vdev("net_pcap_rx", port_id);
+
+ printf("RX from file PASSED: %u packets verified\n", received);
+ return TEST_SUCCESS;
+}
+
+/*
+ * Test: TX with varied packet sizes using packet_burst_generator
+ */
+static int
+test_tx_varied_sizes(void)
+{
+ static const uint16_t test_sizes[] = {
+ PKT_SIZE_MIN, PKT_SIZE_SMALL, 200
+ };
+ char tx_path[PATH_MAX];
+ char devargs[256];
+ uint16_t port_id;
+ unsigned int i;
+ int ret;
+
+ printf("Testing TX with varied packet sizes\n");
+
+ TEST_ASSERT(create_temp_path(tx_path, sizeof(tx_path),
+ "pcap_tx_varied") == 0,
+ "Failed to create temp file path");
+
+ ret = snprintf(devargs, sizeof(devargs), "tx_pcap=%s", tx_path);
+ TEST_ASSERT(ret > 0 && ret < (int)sizeof(devargs),
+ "devargs string truncated");
+ TEST_ASSERT(create_pcap_vdev("net_pcap_tx_var", devargs, &port_id) == 0,
+ "Failed to create TX vdev");
+ TEST_ASSERT(setup_pcap_port(port_id) == 0,
+ "Failed to setup TX port");
+
+ unsigned int total_tx = 0;
+
+ for (i = 0; i < RTE_DIM(test_sizes); i++) {
+ struct rte_mbuf *mbufs[MAX_PKT_BURST];
+ int nb_pkt, nb_tx;
+
+ nb_pkt = generate_test_packets(mp, mbufs, MAX_PKT_BURST,
+ test_sizes[i]);
+ TEST_ASSERT(nb_pkt > 0,
+ "Failed to generate packets of size %u",
+ test_sizes[i]);
+
+ nb_tx = rte_eth_tx_burst(port_id, 0, mbufs, nb_pkt);
+ if (nb_tx < nb_pkt)
+ rte_pktmbuf_free_bulk(&mbufs[nb_tx], nb_pkt - nb_tx);
+
+ printf(" Size %u: generated %d, transmitted %d\n",
+ test_sizes[i], nb_pkt, nb_tx);
+ TEST_ASSERT(nb_tx > 0, "Failed to TX packets of size %u",
+ test_sizes[i]);
+ total_tx += nb_tx;
+ }
+
+ cleanup_pcap_vdev("net_pcap_tx_var", port_id);
+
+ /* Read back pcap file and verify packet sizes match what was sent */
+ {
+ uint16_t sizes[MAX_PKT_BURST * RTE_DIM(test_sizes)];
+ int pkt_count;
+ unsigned int idx = 0;
+
+ pkt_count = get_pcap_packet_sizes(tx_path, sizes,
+ RTE_DIM(sizes));
+ TEST_ASSERT_EQUAL((unsigned int)pkt_count, total_tx,
+ "Pcap has %d packets, expected %u",
+ pkt_count, total_tx);
+
+ for (i = 0; i < RTE_DIM(test_sizes); i++) {
+ unsigned int j;
+ /* Each size produced MAX_PKT_BURST (or fewer) packets */
+ for (j = 0; j < MAX_PKT_BURST && idx < (unsigned int)pkt_count; j++, idx++) {
+ TEST_ASSERT_EQUAL(sizes[idx], test_sizes[i],
+ "Packet %u: size %u, expected %u",
+ idx, sizes[idx], test_sizes[i]);
+ }
+ }
+ }
+
+ remove_temp_file(tx_path);
+
+ printf("TX varied sizes PASSED: %u packets verified\n", total_tx);
+ return TEST_SUCCESS;
+}
+
+/*
+ * Test: RX with varied packet sizes
+ */
+static int
+test_rx_varied_sizes(void)
+{
+ static const uint16_t expected_sizes[] = {
+ PKT_SIZE_MIN, PKT_SIZE_SMALL, PKT_SIZE_MEDIUM,
+ PKT_SIZE_LARGE, PKT_SIZE_MTU
+ };
+ struct rte_mbuf *mbufs[NUM_PACKETS];
+ uint16_t rx_sizes[NUM_PACKETS];
+ char varied_pcap_path[PATH_MAX];
+ char devargs[256];
+ uint16_t port_id;
+ unsigned int received, i;
+ int ret;
+
+ printf("Testing RX with varied packet sizes\n");
+
+ TEST_ASSERT(create_temp_path(varied_pcap_path, sizeof(varied_pcap_path),
+ "pcap_varied") == 0,
+ "Failed to create temp path");
+ TEST_ASSERT(create_varied_pcap(varied_pcap_path, NUM_PACKETS) == 0,
+ "Failed to create varied pcap");
+
+ ret = snprintf(devargs, sizeof(devargs), "rx_pcap=%s", varied_pcap_path);
+ TEST_ASSERT(ret > 0 && ret < (int)sizeof(devargs),
+ "devargs string truncated");
+ TEST_ASSERT(create_pcap_vdev("net_pcap_var", devargs, &port_id) == 0,
+ "Failed to create varied RX vdev");
+ TEST_ASSERT(setup_pcap_port(port_id) == 0,
+ "Failed to setup varied RX port");
+
+ receive_packets(port_id, mbufs, NUM_PACKETS, &received);
+ TEST_ASSERT_EQUAL(received, NUM_PACKETS,
+ "Received %u packets, expected %d", received, NUM_PACKETS);
+
+ /* Verify packet sizes match expected pattern */
+ for (i = 0; i < received; i++) {
+ uint16_t expected = expected_sizes[i % RTE_DIM(expected_sizes)];
+ rx_sizes[i] = rte_pktmbuf_pkt_len(mbufs[i]);
+ TEST_ASSERT_EQUAL(rx_sizes[i], expected,
+ "Packet %u: size %u, expected %u",
+ i, rx_sizes[i], expected);
+ }
+
+ rte_pktmbuf_free_bulk(mbufs, received);
+ cleanup_pcap_vdev("net_pcap_var", port_id);
+ remove_temp_file(varied_pcap_path);
+
+ printf("RX varied sizes PASSED: %u packets with correct sizes\n", received);
+ return TEST_SUCCESS;
+}
+
+/*
+ * Test: Infinite RX mode - loops through pcap file continuously
+ */
+static int
+test_infinite_rx(void)
+{
+ struct rte_mbuf *mbufs[MAX_PKT_BURST];
+ char infinite_pcap_path[PATH_MAX];
+ char devargs[256];
+ uint16_t port_id;
+ unsigned int total_rx = 0;
+ int iter, attempts;
+ int ret;
+
+ printf("Testing infinite RX mode\n");
+
+ TEST_ASSERT(create_temp_path(infinite_pcap_path, sizeof(infinite_pcap_path),
+ "pcap_inf") == 0,
+ "Failed to create temp path");
+ TEST_ASSERT(create_test_pcap(infinite_pcap_path, NUM_PACKETS) == 0,
+ "Failed to create input pcap");
+
+ ret = snprintf(devargs, sizeof(devargs),
+ "rx_pcap=%s,infinite_rx=1", infinite_pcap_path);
+ TEST_ASSERT(ret > 0 && ret < (int)sizeof(devargs),
+ "devargs string truncated");
+ TEST_ASSERT(create_pcap_vdev("net_pcap_inf", devargs, &port_id) == 0,
+ "Failed to create infinite RX vdev");
+ TEST_ASSERT(setup_pcap_port(port_id) == 0,
+ "Failed to setup infinite RX port");
+
+ /* Read more packets than file contains to verify looping */
+ for (iter = 0; iter < 3 && total_rx < NUM_PACKETS * 2; iter++) {
+ for (attempts = 0; attempts < 100 && total_rx < NUM_PACKETS * 2;
+ attempts++) {
+ uint16_t nb_rx = rte_eth_rx_burst(port_id, 0, mbufs,
+ MAX_PKT_BURST);
+ if (nb_rx > 0)
+ rte_pktmbuf_free_bulk(mbufs, nb_rx);
+ total_rx += nb_rx;
+ if (nb_rx == 0)
+ usleep(100);
+ }
+ }
+
+ cleanup_pcap_vdev("net_pcap_inf", port_id);
+ remove_temp_file(infinite_pcap_path);
+
+ TEST_ASSERT(total_rx >= NUM_PACKETS * 2,
+ "Infinite RX: got %u packets, need >= %d",
+ total_rx, NUM_PACKETS * 2);
+
+ printf("Infinite RX PASSED: %u packets (file has %d)\n",
+ total_rx, NUM_PACKETS);
+ return TEST_SUCCESS;
+}
+
+/*
+ * Test: TX drop mode - packets dropped when no tx_pcap specified
+ */
+static int
+test_tx_drop(void)
+{
+ struct rte_mbuf *mbufs[NUM_PACKETS];
+ struct rte_eth_stats stats;
+ char rx_pcap_path[PATH_MAX];
+ char devargs[256];
+ uint16_t port_id;
+ int nb_tx;
+ int ret;
+
+ printf("Testing TX drop mode\n");
+
+ TEST_ASSERT(create_temp_path(rx_pcap_path, sizeof(rx_pcap_path), "pcap_drop") == 0,
+ "Failed to create temp path");
+ TEST_ASSERT(create_test_pcap(rx_pcap_path, NUM_PACKETS) == 0,
+ "Failed to create input pcap");
+
+ /* Only rx_pcap - TX should silently drop */
+ ret = snprintf(devargs, sizeof(devargs), "rx_pcap=%s", rx_pcap_path);
+ TEST_ASSERT(ret > 0 && ret < (int)sizeof(devargs),
+ "devargs string truncated");
+ TEST_ASSERT(create_pcap_vdev("net_pcap_drop", devargs, &port_id) == 0,
+ "Failed to create drop vdev");
+ TEST_ASSERT(setup_pcap_port(port_id) == 0,
+ "Failed to setup drop port");
+ TEST_ASSERT(alloc_test_mbufs(mbufs, NUM_PACKETS) == 0,
+ "Failed to allocate mbufs");
+
+ TEST_ASSERT(rte_eth_stats_reset(port_id) == 0,
+ "Failed to reset stats");
+ nb_tx = rte_eth_tx_burst(port_id, 0, mbufs, NUM_PACKETS);
+
+ /* Packets should be accepted even in drop mode */
+ TEST_ASSERT_EQUAL(nb_tx, NUM_PACKETS,
+ "Drop mode TX: %d/%d accepted", nb_tx, NUM_PACKETS);
+
+ TEST_ASSERT(rte_eth_stats_get(port_id, &stats) == 0,
+ "Failed to get stats");
+ cleanup_pcap_vdev("net_pcap_drop", port_id);
+ remove_temp_file(rx_pcap_path);
+
+ printf("TX drop PASSED: %d packets dropped, opackets=%" PRIu64"\n",
+ nb_tx, stats.opackets);
+ return TEST_SUCCESS;
+}
+
+/*
+ * Test: Statistics accuracy and reset
+ */
+static int
+test_stats(void)
+{
+ struct rte_mbuf *mbufs[NUM_PACKETS];
+ struct rte_eth_stats stats;
+ char rx_pcap_path[PATH_MAX];
+ char devargs[256];
+ char stats_tx_path[PATH_MAX];
+ uint16_t port_id;
+ unsigned int received;
+ int nb_tx;
+ int ret;
+
+ printf("Testing statistics accuracy\n");
+
+ TEST_ASSERT(create_temp_path(rx_pcap_path, sizeof(rx_pcap_path), "pcap_stats_rx") == 0,
+ "Failed to create RX temp path");
+ TEST_ASSERT(create_temp_path(stats_tx_path, sizeof(stats_tx_path), "pcap_stats_tx") == 0,
+ "Failed to create TX temp path");
+ TEST_ASSERT(create_test_pcap(rx_pcap_path, NUM_PACKETS) == 0,
+ "Failed to create input pcap");
+
+ ret = snprintf(devargs, sizeof(devargs),
+ "rx_pcap=%s,tx_pcap=%s", rx_pcap_path, stats_tx_path);
+ TEST_ASSERT(ret > 0 && ret < (int)sizeof(devargs),
+ "devargs string truncated");
+ TEST_ASSERT(create_pcap_vdev("net_pcap_stats", devargs, &port_id) == 0,
+ "Failed to create stats vdev");
+ TEST_ASSERT(setup_pcap_port(port_id) == 0,
+ "Failed to setup stats port");
+
+ /* Verify stats start at zero */
+ TEST_ASSERT(rte_eth_stats_reset(port_id) == 0,
+ "Failed to reset stats");
+ TEST_ASSERT(rte_eth_stats_get(port_id, &stats) == 0,
+ "Failed to get stats");
+ TEST_ASSERT(stats.ipackets == 0 && stats.opackets == 0 &&
+ stats.ibytes == 0 && stats.obytes == 0,
+ "Initial stats not zero");
+
+ /* RX and verify stats */
+ receive_packets(port_id, mbufs, NUM_PACKETS, &received);
+ TEST_ASSERT(rte_eth_stats_get(port_id, &stats) == 0,
+ "Failed to get stats after RX");
+ TEST_ASSERT_EQUAL(stats.ipackets, received,
+ "RX stats: ipackets=%"PRIu64", received=%u",
+ stats.ipackets, received);
+ TEST_ASSERT(stats.ibytes > 0,
+ "RX stats: ibytes should be > 0");
+ TEST_ASSERT_EQUAL(stats.ibytes, (uint64_t)received * sizeof(test_packet),
+ "RX stats: ibytes=%"PRIu64", expected=%"PRIu64,
+ stats.ibytes,
+ (uint64_t)received * sizeof(test_packet));
+
+ /* TX and verify stats */
+ nb_tx = rte_eth_tx_burst(port_id, 0, mbufs, received);
+ TEST_ASSERT(rte_eth_stats_get(port_id, &stats) == 0,
+ "Failed to get stats after TX");
+ TEST_ASSERT_EQUAL(stats.opackets, (uint64_t)nb_tx,
+ "TX stats: opackets=%"PRIu64", sent=%d",
+ stats.opackets, nb_tx);
+ TEST_ASSERT(stats.obytes > 0,
+ "TX stats: obytes should be > 0");
+ TEST_ASSERT_EQUAL(stats.obytes, (uint64_t)nb_tx * sizeof(test_packet),
+ "TX stats: obytes=%"PRIu64", expected=%"PRIu64,
+ stats.obytes,
+ (uint64_t)nb_tx * sizeof(test_packet));
+
+ /* Verify stats reset */
+ TEST_ASSERT(rte_eth_stats_reset(port_id) == 0,
+ "Failed to reset stats");
+ TEST_ASSERT(rte_eth_stats_get(port_id, &stats) == 0,
+ "Failed to get stats after reset");
+ TEST_ASSERT(stats.ipackets == 0 && stats.opackets == 0,
+ "Stats not reset to zero");
+
+ cleanup_pcap_vdev("net_pcap_stats", port_id);
+ remove_temp_file(rx_pcap_path);
+ remove_temp_file(stats_tx_path);
+
+ printf("Statistics PASSED: RX=%u, TX=%d\n", received, nb_tx);
+ return TEST_SUCCESS;
+}
+
+/*
+ * Test: Jumbo frame RX (multi-segment mbufs)
+ */
+static int
+test_jumbo_rx(void)
+{
+ struct rte_mbuf *mbufs[NUM_PACKETS];
+ char jumbo_pcap_path[PATH_MAX];
+ char devargs[256];
+ uint16_t port_id;
+ unsigned int received, i;
+ int ret;
+ const unsigned int num_jumbo = 16;
+
+ printf("Testing jumbo frame RX (%u byte packets, multi-segment)\n",
+ PKT_SIZE_JUMBO);
+
+ TEST_ASSERT(create_temp_path(jumbo_pcap_path, sizeof(jumbo_pcap_path), "pcap_jumbo") == 0,
+ "Failed to create temp path");
+ TEST_ASSERT(create_sized_pcap(jumbo_pcap_path, num_jumbo,
+ PKT_SIZE_JUMBO) == 0,
+ "Failed to create jumbo pcap");
+
+ ret = snprintf(devargs, sizeof(devargs), "rx_pcap=%s", jumbo_pcap_path);
+ TEST_ASSERT(ret > 0 && ret < (int)sizeof(devargs),
+ "devargs string truncated");
+ TEST_ASSERT(create_pcap_vdev("net_pcap_jumbo", devargs, &port_id) == 0,
+ "Failed to create jumbo RX vdev");
+
+ /* Jumbo frames require scatter to receive into multi-segment mbufs */
+ struct rte_eth_conf jumbo_conf = {
+ .rxmode.offloads = RTE_ETH_RX_OFFLOAD_SCATTER |
+ RTE_ETH_RX_OFFLOAD_TIMESTAMP,
+ };
+ TEST_ASSERT(setup_pcap_port_conf(port_id, &jumbo_conf) == 0,
+ "Failed to setup jumbo RX port");
+
+ receive_packets(port_id, mbufs, num_jumbo, &received);
+ TEST_ASSERT_EQUAL(received, num_jumbo,
+ "Received %u packets, expected %u", received, num_jumbo);
+
+ /* Verify all packets are jumbo size (may be multi-segment) */
+ for (i = 0; i < received; i++) {
+ uint32_t pkt_len = rte_pktmbuf_pkt_len(mbufs[i]);
+ uint16_t nb_segs = mbufs[i]->nb_segs;
+
+ TEST_ASSERT_EQUAL(pkt_len, PKT_SIZE_JUMBO,
+ "Packet %u: size %u, expected %u",
+ i, pkt_len, PKT_SIZE_JUMBO);
+
+ /* Jumbo frames should use multiple segments */
+ if (nb_segs > 1)
+ printf(" Packet %u: %u segments\n", i, nb_segs);
+ }
+
+ rte_pktmbuf_free_bulk(mbufs, received);
+ cleanup_pcap_vdev("net_pcap_jumbo", port_id);
+ remove_temp_file(jumbo_pcap_path);
+
+ printf("Jumbo RX PASSED: %u jumbo packets received\n", received);
+ return TEST_SUCCESS;
+}
+
+/*
+ * Test: Jumbo frame TX (multi-segment mbufs)
+ */
+static int
+test_jumbo_tx(void)
+{
+ struct rte_mbuf *mbufs[MAX_PKT_BURST];
+ char tx_path[PATH_MAX];
+ char devargs[256];
+ uint16_t port_id;
+ uint16_t sizes[MAX_PKT_BURST];
+ int nb_tx, pkt_count;
+ unsigned int i;
+ int ret;
+ const unsigned int num_jumbo = 8;
+
+ printf("Testing jumbo frame TX (multi-segment mbufs)\n");
+
+ TEST_ASSERT(create_temp_path(tx_path, sizeof(tx_path),
+ "pcap_jumbo_tx") == 0,
+ "Failed to create temp file path");
+
+ ret = snprintf(devargs, sizeof(devargs), "tx_pcap=%s", tx_path);
+ TEST_ASSERT(ret > 0 && ret < (int)sizeof(devargs),
+ "devargs string truncated");
+ TEST_ASSERT(create_pcap_vdev("net_pcap_jumbo_tx", devargs, &port_id) == 0,
+ "Failed to create TX vdev");
+ TEST_ASSERT(setup_pcap_port(port_id) == 0,
+ "Failed to setup TX port");
+
+ /* Allocate multi-segment mbufs for jumbo frames */
+ for (i = 0; i < num_jumbo; i++) {
+ mbufs[i] = alloc_jumbo_mbuf(PKT_SIZE_JUMBO, (uint8_t)(i & 0xFF));
+ if (mbufs[i] == NULL) {
+ /* Free already allocated mbufs */
+ while (i > 0)
+ rte_pktmbuf_free(mbufs[--i]);
+ cleanup_pcap_vdev("net_pcap_jumbo_tx", port_id);
+ remove_temp_file(tx_path);
+ return TEST_FAILED;
+ }
+ printf(" Packet %u: %u segments for %u bytes\n",
+ i, mbufs[i]->nb_segs, PKT_SIZE_JUMBO);
+ }
+
+ nb_tx = rte_eth_tx_burst(port_id, 0, mbufs, num_jumbo);
+ /* Free any unsent mbufs */
+ for (i = nb_tx; i < num_jumbo; i++)
+ rte_pktmbuf_free(mbufs[i]);
+
+ TEST_ASSERT_EQUAL(nb_tx, (int)num_jumbo,
+ "TX burst failed: sent %d/%u", nb_tx, num_jumbo);
+
+ cleanup_pcap_vdev("net_pcap_jumbo_tx", port_id);
+
+ /* Verify pcap file has correct packet count and sizes */
+ pkt_count = get_pcap_packet_sizes(tx_path, sizes, MAX_PKT_BURST);
+ TEST_ASSERT_EQUAL(pkt_count, (int)num_jumbo,
+ "Pcap file has %d packets, expected %u",
+ pkt_count, num_jumbo);
+
+ for (i = 0; i < (unsigned int)pkt_count; i++) {
+ TEST_ASSERT_EQUAL(sizes[i], PKT_SIZE_JUMBO,
+ "Packet %u: size %u, expected %u",
+ i, sizes[i], PKT_SIZE_JUMBO);
+ }
+
+ remove_temp_file(tx_path);
+
+ printf("Jumbo TX PASSED: %d jumbo packets written\n", nb_tx);
+ return TEST_SUCCESS;
+}
+
+/*
+ * Test: Layering on Linux network interface
+ */
+static int
+test_iface(void)
+{
+ struct rte_mbuf *mbufs[MAX_PKT_BURST];
+ struct rte_eth_dev_info dev_info;
+ char devargs[256];
+ uint16_t port_id;
+ const char *iface;
+ int ret, nb_tx, nb_pkt;
+
+ printf("Testing pcap on network interface\n");
+
+ iface = find_test_iface();
+ if (iface == NULL) {
+ printf("No suitable interface, skipping\n");
+ return TEST_SKIPPED;
+ }
+ printf("Using interface: %s\n", iface);
+
+ ret = snprintf(devargs, sizeof(devargs), "iface=%s", iface);
+ TEST_ASSERT(ret > 0 && ret < (int)sizeof(devargs),
+ "devargs string truncated");
+ if (rte_vdev_init("net_pcap_iface", devargs) < 0) {
+ printf("Cannot create iface vdev (needs root?), skipping\n");
+ return TEST_SKIPPED;
+ }
+
+ TEST_ASSERT(rte_eth_dev_get_port_by_name("net_pcap_iface",
+ &port_id) == 0,
+ "Failed to get iface port ID");
+ TEST_ASSERT(setup_pcap_port(port_id) == 0,
+ "Failed to setup iface port");
+
+ ret = rte_eth_dev_info_get(port_id, &dev_info);
+ TEST_ASSERT(ret == 0, "Failed to get dev info: %s", rte_strerror(-ret));
+
+ printf("Driver: %s, max_rx_queues=%u, max_tx_queues=%u\n",
+ dev_info.driver_name, dev_info.max_rx_queues,
+ dev_info.max_tx_queues);
+
+ /* Use packet_burst_generator for interface test */
+ nb_pkt = generate_test_packets(mp, mbufs, MAX_PKT_BURST,
+ PACKET_BURST_GEN_PKT_LEN);
+ TEST_ASSERT(nb_pkt > 0, "Failed to generate packets");
+
+ nb_tx = rte_eth_tx_burst(port_id, 0, mbufs, nb_pkt);
+ if (nb_tx < nb_pkt)
+ rte_pktmbuf_free_bulk(&mbufs[nb_tx], nb_pkt - nb_tx);
+
+ cleanup_pcap_vdev("net_pcap_iface", port_id);
+
+ printf("Interface test PASSED: sent %d packets\n", nb_tx);
+ return TEST_SUCCESS;
+}
+
+/*
+ * Test: Link status and speed reporting
+ *
+ * This test verifies that:
+ * 1. In interface (pass-through) mode, link state reflects the real interface
+ * 2. In file mode, link status follows device started/stopped state
+ * 3. Link speed values are properly reported
+ */
+static int
+test_link_status(void)
+{
+ struct rte_eth_link link;
+ char rx_pcap_path[PATH_MAX];
+ char devargs[256];
+ uint16_t port_id;
+ const char *iface;
+ int ret;
+
+ printf("Testing link status reporting\n");
+
+ /*
+ * Test 1: Interface (pass-through) mode
+ * Link state should reflect the underlying interface
+ */
+ iface = find_test_iface();
+ if (iface != NULL) {
+ printf(" Testing interface mode with: %s\n", iface);
+
+ ret = snprintf(devargs, sizeof(devargs), "iface=%s", iface);
+ TEST_ASSERT(ret > 0 && ret < (int)sizeof(devargs),
+ "devargs string truncated");
+ if (rte_vdev_init("net_pcap_link_iface", devargs) == 0) {
+ ret = rte_eth_dev_get_port_by_name("net_pcap_link_iface", &port_id);
+ TEST_ASSERT(ret == 0, "Failed to get port ID");
+
+ ret = setup_pcap_port(port_id);
+ TEST_ASSERT(ret == 0, "Failed to setup port");
+
+ /* Get link status */
+ ret = rte_eth_link_get_nowait(port_id, &link);
+ TEST_ASSERT(ret == 0, "Failed to get link: %s", rte_strerror(-ret));
+
+ printf(" Link status: %s\n",
+ link.link_status ? "UP" : "DOWN");
+ printf(" Link speed: %u Mbps\n", link.link_speed);
+ printf(" Link duplex: %s\n",
+ link.link_duplex ? "full" : "half");
+ printf(" Link autoneg: %s\n",
+ link.link_autoneg ? "enabled" : "disabled");
+
+ /*
+ * For loopback interface, link should be up.
+ * Speed may be 0 or undefined for virtual interfaces.
+ */
+ if (strcmp(iface, "lo") == 0) {
+ TEST_ASSERT(link.link_status == RTE_ETH_LINK_UP,
+ "Loopback should report link UP");
+ }
+
+ /*
+ * Verify link_get returns consistent results
+ */
+ struct rte_eth_link link2;
+ ret = rte_eth_link_get(port_id, &link2);
+ TEST_ASSERT(ret == 0, "Second link_get failed");
+ TEST_ASSERT(link.link_status == link2.link_status,
+ "Link status inconsistent between calls");
+
+ cleanup_pcap_vdev("net_pcap_link_iface", port_id);
+ printf(" Interface mode link test PASSED\n");
+ } else {
+ printf(" Cannot create iface vdev (needs root?), skipping iface test\n");
+ }
+ } else {
+ printf(" No suitable interface found, skipping iface test\n");
+ }
+
+ /*
+ * Test 2: File mode
+ * Link status should be DOWN before start, UP after start
+ */
+ printf(" Testing file mode link status\n");
+
+ /* Create a simple pcap file for testing */
+ TEST_ASSERT(create_temp_path(rx_pcap_path, sizeof(rx_pcap_path), "pcap_link") == 0,
+ "Failed to create temp path");
+ TEST_ASSERT(create_test_pcap(rx_pcap_path, 1) == 0,
+ "Failed to create test pcap");
+
+ ret = snprintf(devargs, sizeof(devargs), "rx_pcap=%s", rx_pcap_path);
+ TEST_ASSERT(ret > 0 && ret < (int)sizeof(devargs),
+ "devargs string truncated");
+ TEST_ASSERT(create_pcap_vdev("net_pcap_link_file", devargs, &port_id) == 0,
+ "Failed to create file vdev");
+
+ /* Before starting: configure but don't start */
+ struct rte_eth_conf port_conf = { 0 };
+ ret = rte_eth_dev_configure(port_id, 1, 1, &port_conf);
+ TEST_ASSERT(ret == 0, "Failed to configure port");
+
+ ret = rte_eth_rx_queue_setup(port_id, 0, RING_SIZE, SOCKET0, NULL, mp);
+ TEST_ASSERT(ret == 0, "Failed to setup RX queue");
+
+ ret = rte_eth_tx_queue_setup(port_id, 0, RING_SIZE, SOCKET0, NULL);
+ TEST_ASSERT(ret == 0, "Failed to setup TX queue");
+
+ /* Check link before start - should be DOWN */
+ ret = rte_eth_link_get_nowait(port_id, &link);
+ TEST_ASSERT(ret == 0, "Failed to get link before start");
+ printf(" Before start: link %s, speed %u Mbps\n",
+ link.link_status ? "UP" : "DOWN", link.link_speed);
+ TEST_ASSERT(link.link_status == RTE_ETH_LINK_DOWN,
+ "Link should be DOWN before start");
+
+ /* Start the port */
+ ret = rte_eth_dev_start(port_id);
+ TEST_ASSERT(ret == 0, "Failed to start port");
+
+ /* Check link after start - should be UP */
+ ret = rte_eth_link_get_nowait(port_id, &link);
+ TEST_ASSERT(ret == 0, "Failed to get link after start");
+ printf(" After start: link %s, speed %u Mbps\n",
+ link.link_status ? "UP" : "DOWN", link.link_speed);
+ TEST_ASSERT(link.link_status == RTE_ETH_LINK_UP,
+ "Link should be UP after start");
+
+ /* Stop the port */
+ ret = rte_eth_dev_stop(port_id);
+ TEST_ASSERT(ret == 0, "Failed to stop port");
+
+ /* Check link after stop - should be DOWN again */
+ ret = rte_eth_link_get_nowait(port_id, &link);
+ TEST_ASSERT(ret == 0, "Failed to get link after stop");
+ printf(" After stop: link %s\n",
+ link.link_status ? "UP" : "DOWN");
+ TEST_ASSERT(link.link_status == RTE_ETH_LINK_DOWN,
+ "Link should be DOWN after stop");
+
+ rte_vdev_uninit("net_pcap_link_file");
+ remove_temp_file(rx_pcap_path);
+
+ printf("Link status test PASSED\n");
+ return TEST_SUCCESS;
+}
+
+#ifdef RTE_EXEC_ENV_WINDOWS
+static int
+test_lsc_iface(void)
+{
+ printf(" Link toggle test not supported on Windows, skipping\n");
+ return TEST_SKIPPED;
+}
+#else
+/*
+ * Test: Link Status Change (LSC) interrupt support
+ *
+ * Verifies that:
+ * 1. LSC capability is NOT advertised for file mode
+ * 2. LSC capability IS advertised for iface mode
+ * 3. LSC callback fires when the underlying interface goes down/up
+ *
+ * Requires a toggleable Ethernet interface created before running:
+ * Linux: ip link add dummy0 type dummy && ip link set dummy0 up
+ * FreeBSD: ifconfig disc0 create && ifconfig disc0 up
+ *
+ * Skipped if no suitable interface is found or on Windows.
+ */
+
+/* Callback counter for LSC test */
+static volatile int lsc_callback_count;
+
+static int
+test_lsc_callback(uint16_t port_id __rte_unused,
+ enum rte_eth_event_type event __rte_unused,
+ void *cb_arg __rte_unused, void *ret_param __rte_unused)
+{
+ lsc_callback_count++;
+ return 0;
+}
+
+/*
+ * Helper: Set interface link up or down via ioctl.
+ * Returns 0 on success, -errno on failure.
+ */
+static int
+set_iface_up_down(const char *ifname, int up)
+{
+ struct ifreq ifr;
+ int fd, ret;
+
+ fd = socket(AF_INET, SOCK_DGRAM, 0);
+ if (fd < 0)
+ return -errno;
+
+ memset(&ifr, 0, sizeof(ifr));
+ strlcpy(ifr.ifr_name, ifname, IFNAMSIZ);
+
+ ret = ioctl(fd, SIOCGIFFLAGS, &ifr);
+ if (ret < 0) {
+ ret = -errno;
+ close(fd);
+ return ret;
+ }
+
+ if (up)
+ ifr.ifr_flags |= IFF_UP;
+ else
+ ifr.ifr_flags &= ~IFF_UP;
+
+ ret = ioctl(fd, SIOCSIFFLAGS, &ifr);
+ if (ret < 0)
+ ret = -errno;
+ else
+ ret = 0;
+
+ close(fd);
+ return ret;
+}
+
+/*
+ * Helper: Find a toggleable test interface for LSC testing.
+ *
+ * Looks for well-known interfaces that are safe to bring up/down:
+ * Linux: dummy0 (ip link add dummy0 type dummy)
+ * FreeBSD: disc0 (ifconfig disc0 create)
+ *
+ * Returns interface name or NULL if none found.
+ */
+static const char *
+find_lsc_test_iface(void)
+{
+ static const char * const candidates[] = { "dummy0", "disc0" };
+ unsigned int i;
+
+ for (i = 0; i < RTE_DIM(candidates); i++) {
+ if (iface_is_ethernet(candidates[i]))
+ return candidates[i];
+ }
+ return NULL;
+}
+
+static int
+test_lsc_iface(void)
+{
+ struct rte_eth_dev_info dev_info;
+ char devargs[256];
+ int ret;
+
+ printf("Testing Link Status Change (LSC) support\n");
+
+ /*
+ * Test 1: Verify LSC is NOT advertised for file mode
+ */
+ printf(" Testing file mode does not advertise LSC\n");
+ {
+ char lsc_pcap_path[PATH_MAX];
+ uint16_t file_port_id;
+
+ TEST_ASSERT(create_temp_path(lsc_pcap_path, sizeof(lsc_pcap_path),
+ "pcap_lsc") == 0,
+ "Failed to create temp path");
+ TEST_ASSERT(create_test_pcap(lsc_pcap_path, 1) == 0,
+ "Failed to create test pcap");
+
+ ret = snprintf(devargs, sizeof(devargs), "rx_pcap=%s", lsc_pcap_path);
+ TEST_ASSERT(ret > 0 && ret < (int)sizeof(devargs),
+ "devargs string truncated");
+ TEST_ASSERT(create_pcap_vdev("net_pcap_lsc_file", devargs,
+ &file_port_id) == 0,
+ "Failed to create file vdev");
+
+ ret = rte_eth_dev_info_get(file_port_id, &dev_info);
+ TEST_ASSERT(ret == 0, "Failed to get dev info");
+
+ TEST_ASSERT((*dev_info.dev_flags & RTE_ETH_DEV_INTR_LSC) == 0,
+ "File mode should NOT advertise LSC capability");
+
+ rte_vdev_uninit("net_pcap_lsc_file");
+ remove_temp_file(lsc_pcap_path);
+ printf(" File mode LSC check PASSED\n");
+ }
+
+ struct rte_eth_link link;
+ struct rte_eth_conf port_conf = {
+ .intr_conf.lsc = 1,
+ };
+ uint16_t port_id;
+
+ /*
+ * Test 2: Use a toggleable interface to test link change events.
+ * Skip if not present.
+ */
+ const char *lsc_iface = find_lsc_test_iface();
+ if (lsc_iface == NULL) {
+ printf(" No toggleable interface found, skipping link change test\n");
+ printf(" Linux: ip link add dummy0 type dummy && ip link set dummy0 up\n");
+ printf(" FreeBSD: ifconfig disc0 create && ifconfig disc0 up\n");
+ return TEST_SKIPPED;
+ }
+
+ printf(" Testing iface mode LSC with: %s\n", lsc_iface);
+
+ /* Ensure interface is up before we start */
+ ret = set_iface_up_down(lsc_iface, 1);
+ if (ret != 0) {
+ printf(" Cannot set %s up (%s), skipping\n",
+ lsc_iface, strerror(-ret));
+ return TEST_SKIPPED;
+ }
+
+ ret = snprintf(devargs, sizeof(devargs), "iface=%s", lsc_iface);
+ TEST_ASSERT(ret > 0 && ret < (int)sizeof(devargs),
+ "devargs string truncated");
+ ret = rte_vdev_init("net_pcap_lsc", devargs);
+ if (ret < 0) {
+ printf(" Cannot create iface vdev for %s, skipping\n", lsc_iface);
+ return TEST_SKIPPED;
+ }
+
+ ret = rte_eth_dev_get_port_by_name("net_pcap_lsc", &port_id);
+ TEST_ASSERT(ret == 0, "Failed to get port ID");
+
+ /* Verify LSC capability is advertised */
+ ret = rte_eth_dev_info_get(port_id, &dev_info);
+ TEST_ASSERT(ret == 0, "Failed to get dev info");
+ TEST_ASSERT(*dev_info.dev_flags & RTE_ETH_DEV_INTR_LSC,
+ "Iface mode should advertise LSC capability");
+ printf(" LSC capability advertised: yes\n");
+
+ /* Register LSC callback */
+ lsc_callback_count = 0;
+ ret = rte_eth_dev_callback_register(port_id, RTE_ETH_EVENT_INTR_LSC,
+ test_lsc_callback, NULL);
+ TEST_ASSERT(ret == 0, "Failed to register LSC callback");
+
+ /* Configure with LSC enabled and start */
+ TEST_ASSERT(setup_pcap_port_conf(port_id, &port_conf) == 0,
+ "Failed to setup port with LSC");
+
+ /* Verify link is up initially */
+ ret = rte_eth_link_get_nowait(port_id, &link);
+ TEST_ASSERT(ret == 0, "Failed to get link status");
+ TEST_ASSERT(link.link_status == RTE_ETH_LINK_UP,
+ "Link should be UP after start");
+ printf(" Link after start: UP\n");
+
+ /* Bring interface down - should trigger LSC */
+ lsc_callback_count = 0;
+ ret = set_iface_up_down(lsc_iface, 0);
+ TEST_ASSERT(ret == 0, "Failed to set %s down: %s",
+ lsc_iface, strerror(-ret));
+
+ /* Poll for link state change (1 second poll interval in driver) */
+ {
+ int poll;
+ for (poll = 0; poll < 30; poll++) {
+ ret = rte_eth_link_get_nowait(port_id, &link);
+ if (ret == 0 && link.link_status == RTE_ETH_LINK_DOWN &&
+ lsc_callback_count >= 1)
+ break;
+ usleep(100 * 1000);
+ }
+ }
+
+ ret = rte_eth_link_get_nowait(port_id, &link);
+ TEST_ASSERT(ret == 0, "Failed to get link after down");
+ TEST_ASSERT(link.link_status == RTE_ETH_LINK_DOWN,
+ "Link should be DOWN after interface down");
+ TEST_ASSERT(lsc_callback_count >= 1,
+ "LSC callback should have fired, count=%d",
+ lsc_callback_count);
+ printf(" Interface down: link DOWN, callbacks=%d\n",
+ lsc_callback_count);
+
+ /* Bring it back up - should trigger another LSC */
+ lsc_callback_count = 0;
+ ret = set_iface_up_down(lsc_iface, 1);
+ TEST_ASSERT(ret == 0, "Failed to set %s up: %s",
+ lsc_iface, strerror(-ret));
+
+ /* Poll for link state change back to UP */
+ {
+ int poll;
+ for (poll = 0; poll < 30; poll++) {
+ ret = rte_eth_link_get_nowait(port_id, &link);
+ if (ret == 0 && link.link_status == RTE_ETH_LINK_UP &&
+ lsc_callback_count >= 1)
+ break;
+ usleep(100 * 1000);
+ }
+ }
+
+ ret = rte_eth_link_get_nowait(port_id, &link);
+ TEST_ASSERT(ret == 0, "Failed to get link after up");
+ TEST_ASSERT(link.link_status == RTE_ETH_LINK_UP,
+ "Link should be UP after interface up");
+ TEST_ASSERT(lsc_callback_count >= 1,
+ "LSC callback should have fired on link restore, count=%d",
+ lsc_callback_count);
+ printf(" Interface up: link UP, callbacks=%d\n",
+ lsc_callback_count);
+
+ /* Cleanup */
+ rte_eth_dev_stop(port_id);
+ rte_eth_dev_callback_unregister(port_id, RTE_ETH_EVENT_INTR_LSC,
+ test_lsc_callback, NULL);
+ rte_vdev_uninit("net_pcap_lsc");
+
+ printf("LSC test PASSED\n");
+ return TEST_SUCCESS;
+}
+#endif /* RTE_EXEC_ENV_WINDOWS */
+
+/*
+ * Test: EOF notification via link status change
+ *
+ * Verifies that:
+ * 1. The eof devarg causes link down + LSC event at end of pcap file
+ * 2. link_get reports DOWN after EOF
+ * 3. Stop/start resets the EOF state and replays the file
+ * 4. The eof and infinite_rx options are mutually exclusive
+ */
+
+static volatile int eof_callback_count;
+
+static int
+test_eof_callback(uint16_t port_id __rte_unused,
+ enum rte_eth_event_type event __rte_unused,
+ void *cb_arg __rte_unused, void *ret_param __rte_unused)
+{
+ eof_callback_count++;
+ return 0;
+}
+
+static int
+test_eof_rx(void)
+{
+ struct rte_mbuf *mbufs[MAX_PKT_BURST];
+ struct rte_eth_conf port_conf = {
+ .intr_conf.lsc = 1,
+ };
+ struct rte_eth_link link;
+ char eof_pcap_path[PATH_MAX];
+ char devargs[256];
+ uint16_t port_id;
+ unsigned int total_rx;
+ int ret;
+
+ printf("Testing EOF notification via link status change\n");
+
+ /* Create pcap file with known number of packets */
+ TEST_ASSERT(create_temp_path(eof_pcap_path, sizeof(eof_pcap_path),
+ "pcap_eof") == 0,
+ "Failed to create temp file path");
+ TEST_ASSERT(create_test_pcap(eof_pcap_path, NUM_PACKETS) == 0,
+ "Failed to create test pcap file");
+
+ /*
+ * Test 1: EOF triggers link down and LSC callback
+ */
+ printf(" Testing EOF triggers link down and LSC event\n");
+
+ ret = snprintf(devargs, sizeof(devargs), "rx_pcap=%s,eof=1",
+ eof_pcap_path);
+ TEST_ASSERT(ret > 0 && ret < (int)sizeof(devargs),
+ "devargs string truncated");
+ ret = rte_vdev_init("net_pcap_eof", devargs);
+ TEST_ASSERT(ret == 0, "Failed to create eof vdev: %s",
+ rte_strerror(-ret));
+
+ ret = rte_eth_dev_get_port_by_name("net_pcap_eof", &port_id);
+ TEST_ASSERT(ret == 0, "Failed to get port ID");
+
+ /* Verify LSC capability is advertised */
+ struct rte_eth_dev_info dev_info;
+ ret = rte_eth_dev_info_get(port_id, &dev_info);
+ TEST_ASSERT(ret == 0, "Failed to get dev info");
+ TEST_ASSERT(*dev_info.dev_flags & RTE_ETH_DEV_INTR_LSC,
+ "EOF mode should advertise LSC capability");
+
+ /* Register LSC callback */
+ eof_callback_count = 0;
+ ret = rte_eth_dev_callback_register(port_id, RTE_ETH_EVENT_INTR_LSC,
+ test_eof_callback, NULL);
+ TEST_ASSERT(ret == 0, "Failed to register LSC callback");
+
+ /* Configure with LSC enabled and start */
+ TEST_ASSERT(setup_pcap_port_conf(port_id, &port_conf) == 0,
+ "Failed to setup port with LSC");
+
+ /* Verify link is up initially */
+ ret = rte_eth_link_get_nowait(port_id, &link);
+ TEST_ASSERT(ret == 0, "Failed to get link");
+ TEST_ASSERT(link.link_status == RTE_ETH_LINK_UP,
+ "Link should be UP after start");
+
+ /* Drain all packets from the pcap file */
+ total_rx = 0;
+ for (int attempts = 0; attempts < 200; attempts++) {
+ uint16_t nb_rx = rte_eth_rx_burst(port_id, 0, mbufs,
+ MAX_PKT_BURST);
+ if (nb_rx > 0) {
+ rte_pktmbuf_free_bulk(mbufs, nb_rx);
+ total_rx += nb_rx;
+ } else if (total_rx >= NUM_PACKETS) {
+ /* Got all packets and rx returned 0: EOF hit */
+ break;
+ }
+ }
+
+ printf(" Received %u packets (expected %d)\n", total_rx, NUM_PACKETS);
+ TEST_ASSERT_EQUAL(total_rx, NUM_PACKETS,
+ "Should receive exactly %d packets", NUM_PACKETS);
+
+ /* Verify link went down */
+ ret = rte_eth_link_get_nowait(port_id, &link);
+ TEST_ASSERT(ret == 0, "Failed to get link after EOF");
+ TEST_ASSERT(link.link_status == RTE_ETH_LINK_DOWN,
+ "Link should be DOWN after EOF");
+
+ /* Poll for the deferred EOF alarm to fire on the interrupt thread */
+ {
+ int poll;
+ for (poll = 0; poll < 20; poll++) {
+ if (eof_callback_count >= 1)
+ break;
+ usleep(100 * 1000);
+ }
+ }
+
+ /* Verify callback fired exactly once */
+ TEST_ASSERT_EQUAL(eof_callback_count, 1,
+ "LSC callback should fire once, fired %d times",
+ eof_callback_count);
+ printf(" EOF signaled: link DOWN, callback fired\n");
+
+ /*
+ * Test 2: Stop/start resets EOF and replays the file
+ */
+ printf(" Testing restart replays pcap file\n");
+
+ ret = rte_eth_dev_stop(port_id);
+ TEST_ASSERT(ret == 0, "Failed to stop port");
+
+ eof_callback_count = 0;
+
+ ret = rte_eth_dev_start(port_id);
+ TEST_ASSERT(ret == 0, "Failed to restart port");
+
+ /* Verify link is up again */
+ ret = rte_eth_link_get_nowait(port_id, &link);
+ TEST_ASSERT(ret == 0, "Failed to get link after restart");
+ TEST_ASSERT(link.link_status == RTE_ETH_LINK_UP,
+ "Link should be UP after restart");
+
+ /* Read packets again */
+ total_rx = 0;
+ for (int attempts = 0; attempts < 200; attempts++) {
+ uint16_t nb_rx = rte_eth_rx_burst(port_id, 0, mbufs,
+ MAX_PKT_BURST);
+ if (nb_rx > 0) {
+ rte_pktmbuf_free_bulk(mbufs, nb_rx);
+ total_rx += nb_rx;
+ } else if (total_rx >= NUM_PACKETS) {
+ break;
+ }
+ }
+
+ TEST_ASSERT_EQUAL(total_rx, NUM_PACKETS,
+ "Restart: should receive %d packets, got %u",
+ NUM_PACKETS, total_rx);
+
+ /* Poll for the deferred EOF alarm to fire on the interrupt thread */
+ {
+ int poll;
+ for (poll = 0; poll < 20; poll++) {
+ if (eof_callback_count >= 1)
+ break;
+ usleep(100 * 1000);
+ }
+ }
+
+ TEST_ASSERT_EQUAL(eof_callback_count, 1,
+ "Restart: callback should fire once, fired %d times",
+ eof_callback_count);
+ printf(" Restart replay: %u packets, EOF signaled again\n", total_rx);
+
+ /* Cleanup */
+ rte_eth_dev_callback_unregister(port_id, RTE_ETH_EVENT_INTR_LSC,
+ test_eof_callback, NULL);
+ rte_eth_dev_stop(port_id);
+ rte_vdev_uninit("net_pcap_eof");
+
+ /*
+ * Test 3: eof + infinite_rx is rejected
+ */
+ printf(" Testing eof + infinite_rx mutual exclusion\n");
+
+ ret = snprintf(devargs, sizeof(devargs),
+ "rx_pcap=%s,eof=1,infinite_rx=1", eof_pcap_path);
+ TEST_ASSERT(ret > 0 && ret < (int)sizeof(devargs),
+ "devargs string truncated");
+ ret = rte_vdev_init("net_pcap_eof_bad", devargs);
+ TEST_ASSERT(ret != 0, "eof + infinite_rx should be rejected");
+ printf(" Mutual exclusion check PASSED\n");
+
+ remove_temp_file(eof_pcap_path);
+
+ printf("EOF test PASSED\n");
+ return TEST_SUCCESS;
+}
+
+/*
+ * Test: Verify receive timestamps from pcap file
+ */
+static int
+test_rx_timestamp(void)
+{
+ struct rte_mbuf *mbufs[NUM_PACKETS];
+ char timestamp_pcap_path[PATH_MAX];
+ char devargs[256];
+ uint16_t port_id;
+ unsigned int received, i;
+ int ret;
+ const uint32_t base_sec = 1000;
+ const uint32_t usec_increment = 10000; /* 10ms between packets */
+ rte_mbuf_timestamp_t prev_ts = 0;
+
+ printf("Testing RX timestamp accuracy\n");
+
+ TEST_ASSERT(create_temp_path(timestamp_pcap_path, sizeof(timestamp_pcap_path),
+ "pcap_ts") == 0,
+ "Failed to create temp path");
+ TEST_ASSERT(create_timestamped_pcap(timestamp_pcap_path, NUM_PACKETS,
+ base_sec, usec_increment) == 0,
+ "Failed to create timestamped pcap");
+
+ ret = snprintf(devargs, sizeof(devargs), "rx_pcap=%s", timestamp_pcap_path);
+ TEST_ASSERT(ret > 0 && ret < (int)sizeof(devargs),
+ "devargs string truncated");
+ TEST_ASSERT(create_pcap_vdev("net_pcap_ts", devargs, &port_id) == 0,
+ "Failed to create timestamp vdev");
+ TEST_ASSERT(setup_pcap_port(port_id) == 0,
+ "Failed to setup timestamp port");
+
+ /* Initialize timestamp dynamic field access; the test fails if unavailable */
+ TEST_ASSERT(timestamp_init() == 0, "Timestamp dynfield not available");
+
+ receive_packets(port_id, mbufs, NUM_PACKETS, &received);
+ TEST_ASSERT_EQUAL(received, NUM_PACKETS,
+ "Received %u packets, expected %d", received, NUM_PACKETS);
+
+ /* Check if first packet has timestamp flag set */
+ if (!mbuf_has_timestamp(mbufs[0])) {
+ printf("Timestamps not enabled in mbufs, skipping validation\n");
+ rte_pktmbuf_free_bulk(mbufs, received);
+ cleanup_pcap_vdev("net_pcap_ts", port_id);
+ remove_temp_file(timestamp_pcap_path);
+ return TEST_SKIPPED;
+ }
+
+ for (i = 0; i < received; i++) {
+ struct rte_mbuf *m = mbufs[i];
+
+ TEST_ASSERT(mbuf_has_timestamp(m),
+ "Packet %u missing timestamp flag", i);
+
+ /* PCAP PMD stores timestamp in nanoseconds */
+ rte_mbuf_timestamp_t ts = mbuf_timestamp_get(mbufs[i]);
+ uint64_t expected = (uint64_t)base_sec * NS_PER_S
+ + (uint64_t)i * usec_increment * 1000;
+
+ TEST_ASSERT_EQUAL(ts, expected,
+ "Packet %u: timestamp mismatch, expected=%"PRIu64
+ " actual=%"PRIu64, i, expected, ts);
+
+ /* Verify monotonically increasing timestamps */
+ if (i > 0) {
+ TEST_ASSERT(ts >= prev_ts,
+ "Packet %u: timestamp not monotonic %"PRIu64" > %"PRIu64,
+ i, prev_ts, ts);
+ }
+ prev_ts = ts;
+ }
+
+ rte_pktmbuf_free_bulk(mbufs, received);
+ cleanup_pcap_vdev("net_pcap_ts", port_id);
+ remove_temp_file(timestamp_pcap_path);
+
+ printf("RX timestamp PASSED: %u packets with valid timestamps\n", received);
+ return TEST_SUCCESS;
+}
+
+/* Helper: Generate packets for multi-queue tests */
+static int
+generate_mq_test_packets(struct rte_mbuf **pkts, unsigned int nb_pkts, uint16_t queue_id)
+{
+ struct rte_ether_hdr eth_hdr;
+ struct rte_ipv4_hdr ip_hdr;
+ struct rte_udp_hdr udp_hdr;
+ uint16_t pkt_data_len;
+ unsigned int i;
+
+ initialize_eth_header(&eth_hdr, &src_mac, &dst_mac, RTE_ETHER_TYPE_IPV4, 0, 0);
+ pkt_data_len = sizeof(struct rte_udp_hdr);
+ initialize_udp_header(&udp_hdr, 1234, 1234, pkt_data_len);
+ initialize_ipv4_header(&ip_hdr, IPV4_ADDR(192, 168, 1, 1), IPV4_ADDR(192, 168, 1, 2),
+ pkt_data_len + sizeof(struct rte_udp_hdr));
+
+ for (i = 0; i < nb_pkts; i++) {
+ pkts[i] = rte_pktmbuf_alloc(mp);
+ if (pkts[i] == NULL) {
+ printf("Failed to allocate mbuf\n");
+ while (i > 0)
+ rte_pktmbuf_free(pkts[--i]);
+ return -1;
+ }
+
+ char *pkt_data = rte_pktmbuf_append(pkts[i], PACKET_BURST_GEN_PKT_LEN);
+ if (pkt_data == NULL) {
+ printf("Failed to append data to mbuf\n");
+ rte_pktmbuf_free(pkts[i]);
+ while (i > 0)
+ rte_pktmbuf_free(pkts[--i]);
+ return -1;
+ }
+
+ size_t offset = 0;
+ memcpy(pkt_data + offset, &eth_hdr, sizeof(eth_hdr));
+ offset += sizeof(eth_hdr);
+
+ /* Mark packet with queue ID in IP packet_id field for tracing */
+ ip_hdr.packet_id = rte_cpu_to_be_16((queue_id << 8) | (i & 0xFF));
+ ip_hdr.hdr_checksum = 0;
+ ip_hdr.hdr_checksum = rte_ipv4_cksum(&ip_hdr);
+
+ memcpy(pkt_data + offset, &ip_hdr, sizeof(ip_hdr));
+ offset += sizeof(ip_hdr);
+ memcpy(pkt_data + offset, &udp_hdr, sizeof(udp_hdr));
+ }
+ return (int)nb_pkts;
+}
+
+/* Helper: Validate pcap file structure using libpcap */
+static int
+validate_pcap_file(const char *filename)
+{
+ pcap_t *pcap;
+ char errbuf[PCAP_ERRBUF_SIZE];
+
+ pcap = pcap_open_offline(filename, errbuf);
+ if (pcap == NULL) {
+ printf("Failed to validate pcap file %s: %s\n", filename, errbuf);
+ return -1;
+ }
+ if (pcap_datalink(pcap) != DLT_EN10MB) {
+ printf("Unexpected datalink type: %d\n", pcap_datalink(pcap));
+ pcap_close(pcap);
+ return -1;
+ }
+ pcap_close(pcap);
+ return 0;
+}
+
+/*
+ * Test: Multiple TX queues writing to separate pcap files
+ *
+ * This test creates a pcap PMD with multiple TX queues, each configured
+ * to write to its own output file. We verify that:
+ * 1. All packets from all queues are written
+ * 2. Each resulting pcap file is valid
+ * 3. Each file has the expected packet count
+ */
+static int
+test_multi_tx_queue(void)
+{
+ char multi_tx_pcap_paths[MULTI_QUEUE_NUM_QUEUES][PATH_MAX];
+ char devargs[512];
+ uint16_t port_id;
+ struct rte_eth_conf port_conf;
+ struct rte_eth_txconf tx_conf;
+ struct rte_mbuf *pkts[MULTI_QUEUE_BURST_SIZE];
+ uint16_t q;
+ int ret;
+ unsigned int total_tx = 0;
+ unsigned int tx_per_queue[MULTI_QUEUE_NUM_QUEUES] = {0};
+
+ printf("Testing multiple TX queues to separate files\n");
+
+ /* Create temp paths for each TX queue */
+ for (q = 0; q < MULTI_QUEUE_NUM_QUEUES; q++) {
+ char prefix[32];
+ snprintf(prefix, sizeof(prefix), "pcap_multi_tx%u", q);
+ TEST_ASSERT(create_temp_path(multi_tx_pcap_paths[q],
+ sizeof(multi_tx_pcap_paths[q]), prefix) == 0,
+ "Failed to create temp path for queue %u", q);
+ }
+
+ /* Create the pcap PMD with multiple TX queues to separate files */
+ ret = snprintf(devargs, sizeof(devargs), "tx_pcap=%s,tx_pcap=%s,tx_pcap=%s,tx_pcap=%s",
+ multi_tx_pcap_paths[0], multi_tx_pcap_paths[1],
+ multi_tx_pcap_paths[2], multi_tx_pcap_paths[3]);
+ TEST_ASSERT(ret > 0 && ret < (int)sizeof(devargs),
+ "devargs string truncated");
+
+ ret = rte_vdev_init("net_pcap_multi_tx", devargs);
+ TEST_ASSERT_SUCCESS(ret, "Failed to create pcap PMD: %s", rte_strerror(-ret));
+
+ ret = rte_eth_dev_get_port_by_name("net_pcap_multi_tx", &port_id);
+ TEST_ASSERT_SUCCESS(ret, "Cannot find added pcap device");
+
+ memset(&port_conf, 0, sizeof(port_conf));
+ ret = rte_eth_dev_configure(port_id, 0, MULTI_QUEUE_NUM_QUEUES, &port_conf);
+ TEST_ASSERT_SUCCESS(ret, "Failed to configure device: %s", rte_strerror(-ret));
+
+ memset(&tx_conf, 0, sizeof(tx_conf));
+ for (q = 0; q < MULTI_QUEUE_NUM_QUEUES; q++) {
+ ret = rte_eth_tx_queue_setup(port_id, q, RING_SIZE,
+ rte_eth_dev_socket_id(port_id), &tx_conf);
+ TEST_ASSERT_SUCCESS(ret, "Failed to setup TX queue %u: %s", q, rte_strerror(-ret));
+ }
+
+ ret = rte_eth_dev_start(port_id);
+ TEST_ASSERT_SUCCESS(ret, "Failed to start device: %s", rte_strerror(-ret));
+
+ /* Transmit packets from each queue */
+ for (q = 0; q < MULTI_QUEUE_NUM_QUEUES; q++) {
+ unsigned int pkts_to_send = MULTI_QUEUE_NUM_PACKETS / MULTI_QUEUE_NUM_QUEUES;
+
+ while (tx_per_queue[q] < pkts_to_send) {
+ unsigned int burst = RTE_MIN(MULTI_QUEUE_BURST_SIZE,
+ pkts_to_send - tx_per_queue[q]);
+
+ ret = generate_mq_test_packets(pkts, burst, q);
+ TEST_ASSERT(ret >= 0, "Failed to generate packets for queue %u", q);
+
+ uint16_t nb_tx = rte_eth_tx_burst(port_id, q, pkts, burst);
+ for (unsigned int i = nb_tx; i < burst; i++)
+ rte_pktmbuf_free(pkts[i]);
+
+ tx_per_queue[q] += nb_tx;
+ total_tx += nb_tx;
+
+ if (nb_tx == 0) {
+ printf("TX stall on queue %u\n", q);
+ break;
+ }
+ }
+ printf(" Queue %u: transmitted %u packets\n", q, tx_per_queue[q]);
+ }
+
+ rte_eth_dev_stop(port_id);
+ rte_vdev_uninit("net_pcap_multi_tx");
+ rte_delay_ms(100);
+
+ /* Validate each pcap file */
+ unsigned int total_in_files = 0;
+ for (q = 0; q < MULTI_QUEUE_NUM_QUEUES; q++) {
+ ret = validate_pcap_file(multi_tx_pcap_paths[q]);
+ TEST_ASSERT_SUCCESS(ret, "pcap file for queue %u is invalid", q);
+
+ int pkt_count = count_pcap_packets(multi_tx_pcap_paths[q]);
+ TEST_ASSERT(pkt_count >= 0, "Could not count packets in pcap file for queue %u", q);
+
+ printf(" Queue %u file: %d packets\n", q, pkt_count);
+ TEST_ASSERT_EQUAL((unsigned int)pkt_count, tx_per_queue[q],
+ "Queue %u: file has %d packets, expected %u",
+ q, pkt_count, tx_per_queue[q]);
+ total_in_files += pkt_count;
+ }
+
+ printf(" Total packets transmitted: %u\n", total_tx);
+ printf(" Total packets in all files: %u\n", total_in_files);
+
+ TEST_ASSERT_EQUAL(total_in_files, total_tx,
+ "Total packet count mismatch: expected %u, got %u",
+ total_tx, total_in_files);
+
+ for (q = 0; q < MULTI_QUEUE_NUM_QUEUES; q++)
+ remove_temp_file(multi_tx_pcap_paths[q]);
+
+ printf("Multi-TX queue PASSED\n");
+ return TEST_SUCCESS;
+}
+
+/*
+ * Test: Multiple RX queues reading from the same pcap file
+ *
+ * This test creates a pcap PMD with multiple RX queues all configured
+ * to read from the same input file. We verify that:
+ * 1. Each queue can read packets
+ * 2. Each queue reads the whole file, so the total equals file packets times queue count
+ */
+static int
+test_multi_rx_queue_same_file(void)
+{
+ char multi_rx_pcap_path[PATH_MAX];
+ char devargs[512];
+ uint16_t port_id;
+ struct rte_eth_conf port_conf;
+ struct rte_eth_rxconf rx_conf;
+ struct rte_mbuf *pkts[MULTI_QUEUE_BURST_SIZE];
+ uint16_t q;
+ int ret;
+ unsigned int total_rx = 0;
+ unsigned int rx_per_queue[MULTI_QUEUE_NUM_QUEUES] = {0};
+ unsigned int seed_packets = MULTI_QUEUE_NUM_PACKETS;
+ unsigned int expected_total;
+
+ printf("Testing multiple RX queues from same file\n");
+
+ TEST_ASSERT(create_temp_path(multi_rx_pcap_path, sizeof(multi_rx_pcap_path),
+ "pcap_multi_rx") == 0,
+ "Failed to create temp path");
+
+ ret = create_test_pcap(multi_rx_pcap_path, seed_packets);
+ TEST_ASSERT_SUCCESS(ret, "Failed to create seed pcap file");
+ printf(" Created seed pcap file with %u packets\n", seed_packets);
+
+ ret = snprintf(devargs, sizeof(devargs), "rx_pcap=%s,rx_pcap=%s,rx_pcap=%s,rx_pcap=%s",
+ multi_rx_pcap_path, multi_rx_pcap_path, multi_rx_pcap_path, multi_rx_pcap_path);
+ TEST_ASSERT(ret > 0 && ret < (int)sizeof(devargs),
+ "devargs string truncated");
+
+ ret = rte_vdev_init("net_pcap_multi_rx", devargs);
+ TEST_ASSERT_SUCCESS(ret, "Failed to create pcap PMD: %s", rte_strerror(-ret));
+
+ ret = rte_eth_dev_get_port_by_name("net_pcap_multi_rx", &port_id);
+ TEST_ASSERT_SUCCESS(ret, "Cannot find added pcap device");
+
+ memset(&port_conf, 0, sizeof(port_conf));
+ ret = rte_eth_dev_configure(port_id, MULTI_QUEUE_NUM_QUEUES, 0, &port_conf);
+ TEST_ASSERT_SUCCESS(ret, "Failed to configure device: %s", rte_strerror(-ret));
+
+ memset(&rx_conf, 0, sizeof(rx_conf));
+ for (q = 0; q < MULTI_QUEUE_NUM_QUEUES; q++) {
+ ret = rte_eth_rx_queue_setup(port_id, q, RING_SIZE,
+ rte_eth_dev_socket_id(port_id), &rx_conf, mp);
+ TEST_ASSERT_SUCCESS(ret, "Failed to setup RX queue %u: %s", q, rte_strerror(-ret));
+ }
+
+ ret = rte_eth_dev_start(port_id);
+ TEST_ASSERT_SUCCESS(ret, "Failed to start device: %s", rte_strerror(-ret));
+
+ /* Receive packets from all queues. Each queue has its own file handle. */
+ int empty_rounds = 0;
+ while (empty_rounds < 10) {
+ int received_this_round = 0;
+ for (q = 0; q < MULTI_QUEUE_NUM_QUEUES; q++) {
+ uint16_t nb_rx = rte_eth_rx_burst(port_id, q, pkts, MULTI_QUEUE_BURST_SIZE);
+ if (nb_rx > 0) {
+ rx_per_queue[q] += nb_rx;
+ total_rx += nb_rx;
+ received_this_round += nb_rx;
+ rte_pktmbuf_free_bulk(pkts, nb_rx);
+ }
+ }
+ if (received_this_round == 0)
+ empty_rounds++;
+ else
+ empty_rounds = 0;
+ }
+
+ printf(" RX Results:\n");
+ for (q = 0; q < MULTI_QUEUE_NUM_QUEUES; q++)
+ printf(" Queue %u: received %u packets\n", q, rx_per_queue[q]);
+ printf(" Total received: %u packets\n", total_rx);
+
+ /* Each RX queue opens its own file handle, so each reads all packets */
+ expected_total = seed_packets * MULTI_QUEUE_NUM_QUEUES;
+ printf(" Expected total (each queue reads all): %u packets\n", expected_total);
+
+ rte_eth_dev_stop(port_id);
+ rte_vdev_uninit("net_pcap_multi_rx");
+
+ TEST_ASSERT(total_rx > 0, "No packets received at all");
+ for (q = 0; q < MULTI_QUEUE_NUM_QUEUES; q++) {
+ TEST_ASSERT(rx_per_queue[q] > 0, "Queue %u received no packets", q);
+ TEST_ASSERT_EQUAL(rx_per_queue[q], seed_packets,
+ "Queue %u received %u packets, expected %u",
+ q, rx_per_queue[q], seed_packets);
+ }
+ TEST_ASSERT_EQUAL(total_rx, expected_total,
+ "Total RX mismatch: expected %u, got %u", expected_total, total_rx);
+
+ remove_temp_file(multi_rx_pcap_path);
+
+ printf("Multi-RX queue PASSED\n");
+ return TEST_SUCCESS;
+}
+
+/*
+ * Test: Device info reports correct queue counts and MTU limits
+ *
+ * This test verifies that rte_eth_dev_info_get() returns correct values:
+ * 1. max_rx_queues matches the number of rx_pcap files passed
+ * 2. max_tx_queues matches the number of tx_pcap files passed
+ * 3. max_rx_pktlen and max_mtu are based on default snapshot length
+ */
+static int
+test_dev_info(void)
+{
+ struct rte_eth_dev_info dev_info;
+ char devargs[512];
+ char rx_paths[3][PATH_MAX];
+ char tx_paths[2][PATH_MAX];
+ uint16_t port_id;
+ int ret;
+ unsigned int i;
+ /* Default snapshot length is 65535 */
+ const uint32_t default_snaplen = 65535;
+ const uint32_t expected_max_mtu = default_snaplen - RTE_ETHER_HDR_LEN;
+
+ printf("Testing device info reporting\n");
+
+ /* Create temp RX pcap files (3 queues) */
+ for (i = 0; i < 3; i++) {
+ char prefix[32];
+ snprintf(prefix, sizeof(prefix), "pcap_devinfo_rx%u", i);
+ TEST_ASSERT(create_temp_path(rx_paths[i], sizeof(rx_paths[i]), prefix) == 0,
+ "Failed to create RX temp path %u", i);
+ TEST_ASSERT(create_test_pcap(rx_paths[i], 1) == 0,
+ "Failed to create RX pcap %u", i);
+ }
+
+ /* Create temp TX pcap files (2 queues) */
+ for (i = 0; i < 2; i++) {
+ char prefix[32];
+ snprintf(prefix, sizeof(prefix), "pcap_devinfo_tx%u", i);
+ TEST_ASSERT(create_temp_path(tx_paths[i], sizeof(tx_paths[i]), prefix) == 0,
+ "Failed to create TX temp path %u", i);
+ }
+
+ /* Create device with 3 RX queues and 2 TX queues */
+ ret = snprintf(devargs, sizeof(devargs),
+ "rx_pcap=%s,rx_pcap=%s,rx_pcap=%s,tx_pcap=%s,tx_pcap=%s",
+ rx_paths[0], rx_paths[1], rx_paths[2], tx_paths[0], tx_paths[1]);
+ TEST_ASSERT(ret > 0 && ret < (int)sizeof(devargs),
+ "devargs string truncated");
+
+ ret = rte_vdev_init("net_pcap_devinfo", devargs);
+ TEST_ASSERT_SUCCESS(ret, "Failed to create pcap PMD: %s", rte_strerror(-ret));
+
+ ret = rte_eth_dev_get_port_by_name("net_pcap_devinfo", &port_id);
+ TEST_ASSERT_SUCCESS(ret, "Cannot find added pcap device");
+
+ ret = rte_eth_dev_info_get(port_id, &dev_info);
+ TEST_ASSERT_SUCCESS(ret, "Failed to get device info: %s", rte_strerror(-ret));
+
+ printf(" Device info:\n");
+ printf(" driver_name: %s\n", dev_info.driver_name);
+ printf(" max_rx_queues: %u (expected: 3)\n", dev_info.max_rx_queues);
+ printf(" max_tx_queues: %u (expected: 2)\n", dev_info.max_tx_queues);
+ printf(" max_rx_pktlen: %u (expected: %u)\n", dev_info.max_rx_pktlen, default_snaplen);
+ printf(" max_mtu: %u (expected: %u)\n", dev_info.max_mtu, expected_max_mtu);
+
+ /* Verify queue counts match number of pcap files */
+ TEST_ASSERT_EQUAL(dev_info.max_rx_queues, 3U,
+ "max_rx_queues mismatch: expected 3, got %u", dev_info.max_rx_queues);
+ TEST_ASSERT_EQUAL(dev_info.max_tx_queues, 2U,
+ "max_tx_queues mismatch: expected 2, got %u", dev_info.max_tx_queues);
+
+ /* Verify max_rx_pktlen equals default snapshot length */
+ TEST_ASSERT_EQUAL(dev_info.max_rx_pktlen, default_snaplen,
+ "max_rx_pktlen mismatch: expected %u, got %u",
+ default_snaplen, dev_info.max_rx_pktlen);
+
+ /* Verify max_mtu is snapshot_len minus ethernet header */
+ TEST_ASSERT_EQUAL(dev_info.max_mtu, expected_max_mtu,
+ "max_mtu mismatch: expected %u, got %u",
+ expected_max_mtu, dev_info.max_mtu);
+
+ rte_vdev_uninit("net_pcap_devinfo");
+
+ /* Cleanup temp files */
+ for (i = 0; i < 3; i++)
+ remove_temp_file(rx_paths[i]);
+ for (i = 0; i < 2; i++)
+ remove_temp_file(tx_paths[i]);
+
+ printf("Device info PASSED\n");
+ return TEST_SUCCESS;
+}
+
+/*
+ * Test: Custom snapshot length (snaplen) parameter
+ *
+ * This test verifies that the snaplen devarg works correctly:
+ * 1. max_rx_pktlen reflects the custom snapshot length
+ * 2. max_mtu is calculated as snaplen - ethernet header
+ */
+static int
+test_snaplen(void)
+{
+ struct rte_eth_dev_info dev_info;
+ char devargs[512];
+ char rx_path[PATH_MAX];
+ char tx_path[PATH_MAX];
+ uint16_t port_id;
+ int ret;
+ const uint32_t custom_snaplen = 9000;
+ const uint32_t expected_max_mtu = custom_snaplen - RTE_ETHER_HDR_LEN;
+
+ printf("Testing custom snapshot length parameter\n");
+
+ /* Create temp files */
+ TEST_ASSERT(create_temp_path(rx_path, sizeof(rx_path), "pcap_snaplen_rx") == 0,
+ "Failed to create RX temp path");
+ TEST_ASSERT(create_test_pcap(rx_path, 1) == 0,
+ "Failed to create RX pcap");
+ TEST_ASSERT(create_temp_path(tx_path, sizeof(tx_path), "pcap_snaplen_tx") == 0,
+ "Failed to create TX temp path");
+
+ /* Create device with custom snaplen */
+ ret = snprintf(devargs, sizeof(devargs), "rx_pcap=%s,tx_pcap=%s,snaplen=%u",
+ rx_path, tx_path, custom_snaplen);
+ TEST_ASSERT(ret > 0 && ret < (int)sizeof(devargs),
+ "devargs string truncated");
+
+ ret = rte_vdev_init("net_pcap_snaplen", devargs);
+ TEST_ASSERT_SUCCESS(ret, "Failed to create pcap PMD: %s", rte_strerror(-ret));
+
+ ret = rte_eth_dev_get_port_by_name("net_pcap_snaplen", &port_id);
+ TEST_ASSERT_SUCCESS(ret, "Cannot find added pcap device");
+
+ ret = rte_eth_dev_info_get(port_id, &dev_info);
+ TEST_ASSERT_SUCCESS(ret, "Failed to get device info: %s", rte_strerror(-ret));
+
+ printf(" Custom snaplen: %u\n", custom_snaplen);
+ printf(" max_rx_pktlen: %u (expected: %u)\n", dev_info.max_rx_pktlen, custom_snaplen);
+ printf(" max_mtu: %u (expected: %u)\n", dev_info.max_mtu, expected_max_mtu);
+
+ /* Verify max_rx_pktlen equals custom snapshot length */
+ TEST_ASSERT_EQUAL(dev_info.max_rx_pktlen, custom_snaplen,
+ "max_rx_pktlen mismatch: expected %u, got %u",
+ custom_snaplen, dev_info.max_rx_pktlen);
+
+ /* Verify max_mtu is snaplen minus the Ethernet header length */
+ TEST_ASSERT_EQUAL(dev_info.max_mtu, expected_max_mtu,
+ "max_mtu mismatch: expected %u, got %u",
+ expected_max_mtu, dev_info.max_mtu);
+
+ rte_vdev_uninit("net_pcap_snaplen");
+
+ /* Clean up temp files */
+ remove_temp_file(rx_path);
+ remove_temp_file(tx_path);
+
+ printf("Snapshot length test PASSED\n");
+ return TEST_SUCCESS;
+}
+
+/*
+ * Test: Snapshot length truncation behavior
+ *
+ * This test verifies that packets larger than snaplen are properly truncated
+ * when written to pcap files:
+ * 1. caplen in pcap header is limited to snaplen
+ * 2. len in pcap header preserves original packet length
+ * 3. Only snaplen bytes of data are written
+ */
+static int
+test_snaplen_truncation(void)
+{
+ struct rte_mbuf *mbufs[NUM_PACKETS];
+ char devargs[512];
+ char tx_path[PATH_MAX];
+ uint16_t port_id;
+ int ret, nb_tx, nb_gen;
+ unsigned int pkt_count;
+ const uint32_t test_snaplen = 100;
+ const uint8_t pkt_size = 200;
+
+ printf("Testing snaplen truncation behavior\n");
+
+ /* Create temp TX file */
+ TEST_ASSERT(create_temp_path(tx_path, sizeof(tx_path), "pcap_trunc_tx") == 0,
+ "Failed to create TX temp path");
+
+ /* Create device with small snaplen */
+ ret = snprintf(devargs, sizeof(devargs), "tx_pcap=%s,snaplen=%u",
+ tx_path, test_snaplen);
+ TEST_ASSERT(ret > 0 && ret < (int)sizeof(devargs),
+ "devargs string truncated");
+
+ ret = rte_vdev_init("net_pcap_trunc", devargs);
+ TEST_ASSERT_SUCCESS(ret, "Failed to create pcap PMD: %s", rte_strerror(-ret));
+
+ ret = rte_eth_dev_get_port_by_name("net_pcap_trunc", &port_id);
+ TEST_ASSERT_SUCCESS(ret, "Cannot find added pcap device");
+
+ TEST_ASSERT(setup_pcap_port(port_id) == 0, "Failed to setup port");
+
+ /* Generate packets larger than snaplen */
+ nb_gen = generate_test_packets(mp, mbufs, NUM_PACKETS, pkt_size);
+ TEST_ASSERT_EQUAL(nb_gen, NUM_PACKETS,
+ "Failed to generate packets: got %d, expected %d",
+ nb_gen, NUM_PACKETS);
+
+ printf(" Sending %d packets of size %u with snaplen=%u\n",
+ NUM_PACKETS, pkt_size, test_snaplen);
+
+ /* Transmit packets */
+ nb_tx = rte_eth_tx_burst(port_id, 0, mbufs, NUM_PACKETS);
+ TEST_ASSERT_EQUAL(nb_tx, NUM_PACKETS,
+ "TX burst failed: sent %d/%d", nb_tx, NUM_PACKETS);
+
+ cleanup_pcap_vdev("net_pcap_trunc", port_id);
+
+ /* Verify truncation in output file */
+ ret = verify_pcap_truncation(tx_path, test_snaplen, pkt_size, &pkt_count);
+ TEST_ASSERT_SUCCESS(ret, "Truncation verification failed");
+ TEST_ASSERT_EQUAL(pkt_count, (unsigned int)NUM_PACKETS,
+ "Packet count mismatch: got %u, expected %d",
+ pkt_count, NUM_PACKETS);
+
+ printf(" Verified %u packets: caplen=%u, len=%u\n",
+ pkt_count, test_snaplen, pkt_size);
+
+ /* Cleanup */
+ remove_temp_file(tx_path);
+
+ printf("Snaplen truncation test PASSED\n");
+ return TEST_SUCCESS;
+}
+
+/*
+ * Test: Snapshot length truncation with multi-segment mbufs
+ *
+ * This test verifies that the dumper path correctly truncates
+ * non-contiguous (multi-segment) mbufs when the total packet length
+ * exceeds the configured snaplen. It exercises the RTE_MIN(len, snaplen)
+ * cap in the TX dumper by ensuring:
+ *
+ * 1. caplen in the pcap header equals snaplen (not pkt_len)
+ * 2. len in the pcap header preserves the original packet length
+ * 3. Truncation works when the snaplen boundary falls mid-chain
+ */
+static int
+test_snaplen_truncation_multiseg(void)
+{
+ struct rte_mbuf *mbufs[MAX_PKT_BURST];
+ char devargs[512];
+ char tx_path[PATH_MAX];
+ uint16_t port_id;
+ int ret, nb_tx;
+ unsigned int i, pkt_count;
+ const uint32_t test_snaplen = 100;
+ const uint32_t pkt_size = 300;
+ const uint16_t seg_size = 64;
+ const unsigned int num_pkts = 8;
+
+ printf("Testing snaplen truncation with multi-segment mbufs\n");
+
+ /* Create temp TX file */
+ TEST_ASSERT(create_temp_path(tx_path, sizeof(tx_path),
+ "pcap_trunc_ms") == 0,
+ "Failed to create TX temp path");
+
+ /* Create device with small snaplen */
+ ret = snprintf(devargs, sizeof(devargs), "tx_pcap=%s,snaplen=%u",
+ tx_path, test_snaplen);
+ TEST_ASSERT(ret > 0 && ret < (int)sizeof(devargs),
+ "devargs string truncated");
+
+ TEST_ASSERT(create_pcap_vdev("net_pcap_trunc_ms", devargs,
+ &port_id) == 0,
+ "Failed to create TX vdev");
+ TEST_ASSERT(setup_pcap_port(port_id) == 0, "Failed to setup port");
+
+ /*
+ * Allocate multi-segment mbufs. With seg_size=64 and pkt_size=300,
+ * each mbuf will have 5 segments (4×64 + 1×44). The snaplen of 100
+ * falls partway through the second segment, forcing the dumper to
+ * stop writing in the middle of the chain.
+ */
+ for (i = 0; i < num_pkts; i++) {
+ mbufs[i] = alloc_multiseg_mbuf(pkt_size, seg_size,
+ (uint8_t)(0xA0 + i));
+ if (mbufs[i] == NULL) {
+ while (i > 0)
+ rte_pktmbuf_free(mbufs[--i]);
+ cleanup_pcap_vdev("net_pcap_trunc_ms", port_id);
+ remove_temp_file(tx_path);
+ return TEST_FAILED;
+ }
+ }
+
+ printf(" Sending %u packets: pkt_len=%u, seg_size=%u (%u segs), snaplen=%u\n",
+ num_pkts, pkt_size, seg_size, mbufs[0]->nb_segs, test_snaplen);
+
+ /* Verify mbufs are actually multi-segment */
+ TEST_ASSERT(mbufs[0]->nb_segs > 1,
+ "Expected multi-segment mbufs, got %u segment(s)",
+ mbufs[0]->nb_segs);
+
+ /* Transmit packets */
+ nb_tx = rte_eth_tx_burst(port_id, 0, mbufs, num_pkts);
+
+ /* Free any unsent mbufs */
+ for (i = nb_tx; i < num_pkts; i++)
+ rte_pktmbuf_free(mbufs[i]);
+
+ TEST_ASSERT_EQUAL(nb_tx, (int)num_pkts,
+ "TX burst failed: sent %d/%u", nb_tx, num_pkts);
+
+ cleanup_pcap_vdev("net_pcap_trunc_ms", port_id);
+
+ /* Verify truncation in output file */
+ ret = verify_pcap_truncation(tx_path, test_snaplen, pkt_size,
+ &pkt_count);
+ TEST_ASSERT_SUCCESS(ret, "Truncation verification failed");
+ TEST_ASSERT_EQUAL(pkt_count, num_pkts,
+ "Packet count mismatch: got %u, expected %u",
+ pkt_count, num_pkts);
+
+ printf(" Verified %u packets: caplen=%u, len=%u\n",
+ pkt_count, test_snaplen, pkt_size);
+
+ remove_temp_file(tx_path);
+
+ printf("Snaplen truncation multi-segment test PASSED\n");
+ return TEST_SUCCESS;
+}
+
+/*
+ * Test: VLAN Strip on RX
+ *
+ * This test verifies that when VLAN strip offload is enabled:
+ * 1. VLAN-tagged packets from pcap file have tags removed
+ * 2. VLAN info is stored in mbuf metadata (vlan_tci, ol_flags)
+ * 3. Packet data no longer contains the 4-byte VLAN tag
+ */
+static int
+test_vlan_strip_rx(void)
+{
+ struct rte_mbuf *mbufs[NUM_PACKETS];
+ char devargs[256];
+ uint16_t port_id;
+ unsigned int received, i;
+ size_t expected_len;
+ int ret;
+
+ printf("Testing VLAN strip on RX\n");
+
+ /* Create pcap file with VLAN-tagged packets */
+ TEST_ASSERT(create_temp_path(vlan_rx_pcap_path, sizeof(vlan_rx_pcap_path),
+ "pcap_vlan_rx") == 0,
+ "Failed to create temp file path");
+
+ TEST_ASSERT(create_vlan_tagged_pcap(vlan_rx_pcap_path, NUM_PACKETS,
+ TEST_VLAN_ID, TEST_VLAN_PCP) == 0,
+ "Failed to create VLAN-tagged pcap file");
+
+ printf(" Created VLAN-tagged pcap with %d packets (VLAN ID=%u, PCP=%u)\n",
+ NUM_PACKETS, TEST_VLAN_ID, TEST_VLAN_PCP);
+
+ /* Create vdev and configure with VLAN strip enabled */
+ ret = snprintf(devargs, sizeof(devargs), "rx_pcap=%s", vlan_rx_pcap_path);
+ TEST_ASSERT(ret > 0 && ret < (int)sizeof(devargs),
+ "devargs string truncated");
+ TEST_ASSERT(create_pcap_vdev("net_pcap_vlan_rx", devargs, &port_id) == 0,
+ "Failed to create RX vdev");
+ TEST_ASSERT(setup_pcap_port_vlan_strip(port_id) == 0,
+ "Failed to setup port with VLAN strip");
+
+ /* Receive packets */
+ receive_packets(port_id, mbufs, NUM_PACKETS, &received);
+ TEST_ASSERT_EQUAL(received, (unsigned int)NUM_PACKETS,
+ "Received %u packets, expected %d", received, NUM_PACKETS);
+
+ /* Expected length after VLAN strip (original - 4 bytes VLAN header) */
+ expected_len = sizeof(struct rte_ether_hdr) + sizeof(struct rte_ipv4_hdr) +
+ sizeof(struct rte_udp_hdr) + 18; /* 18 bytes payload */
+
+ /* Verify VLAN was stripped from each packet */
+ for (i = 0; i < received; i++) {
+ /* Check packet no longer has VLAN tag in data */
+ TEST_ASSERT(verify_no_vlan_tag(mbufs[i]) == 0,
+ "Packet %u still has VLAN tag after strip", i);
+
+ /* Check packet length decreased by 4 (VLAN header size) */
+ TEST_ASSERT_EQUAL(rte_pktmbuf_pkt_len(mbufs[i]), expected_len,
+ "Packet %u length %u != expected %zu after strip",
+ i, rte_pktmbuf_pkt_len(mbufs[i]), expected_len);
+
+ /* Check VLAN info stored in mbuf metadata */
+ TEST_ASSERT(mbufs[i]->ol_flags & RTE_MBUF_F_RX_VLAN,
+ "Packet %u: RX_VLAN flag not set", i);
+ TEST_ASSERT(mbufs[i]->ol_flags & RTE_MBUF_F_RX_VLAN_STRIPPED,
+ "Packet %u: RX_VLAN_STRIPPED flag not set", i);
+
+ /* Verify the stored VLAN TCI contains expected values */
+ uint16_t expected_tci = (TEST_VLAN_PCP << 13) | TEST_VLAN_ID;
+ TEST_ASSERT_EQUAL(mbufs[i]->vlan_tci, expected_tci,
+ "Packet %u: vlan_tci %u != expected %u",
+ i, mbufs[i]->vlan_tci, expected_tci);
+ }
+
+ rte_pktmbuf_free_bulk(mbufs, received);
+ cleanup_pcap_vdev("net_pcap_vlan_rx", port_id);
+
+ printf("VLAN strip RX PASSED: %u packets verified\n", received);
+ return TEST_SUCCESS;
+}
+
+/*
+ * Test: VLAN Insert on TX
+ *
+ * This test verifies that when TX VLAN insert offload is used:
+ * 1. Untagged packets with RTE_MBUF_F_TX_VLAN flag get VLAN tag inserted
+ * 2. The written pcap file contains properly VLAN-tagged packets
+ * 3. VLAN ID and PCP from mbuf vlan_tci are correctly inserted
+ */
+static int
+test_vlan_insert_tx(void)
+{
+ struct rte_mbuf *mbufs[NUM_PACKETS];
+ char vlan_tx_pcap_path[PATH_MAX];
+ char devargs[256];
+ uint16_t port_id;
+ uint16_t nb_prep;
+ int nb_tx, pkt_count;
+ int ret;
+
+ printf("Testing VLAN insert on TX\n");
+
+ /* Create temp file for TX output */
+ TEST_ASSERT(create_temp_path(vlan_tx_pcap_path, sizeof(vlan_tx_pcap_path),
+ "pcap_vlan_tx") == 0,
+ "Failed to create temp file path");
+
+ /* Create vdev */
+ ret = snprintf(devargs, sizeof(devargs), "tx_pcap=%s", vlan_tx_pcap_path);
+ TEST_ASSERT(ret > 0 && ret < (int)sizeof(devargs),
+ "devargs string truncated");
+ TEST_ASSERT(create_pcap_vdev("net_pcap_vlan_tx", devargs, &port_id) == 0,
+ "Failed to create TX vdev");
+ TEST_ASSERT(setup_pcap_port(port_id) == 0,
+ "Failed to setup TX port");
+
+ /* Allocate mbufs with VLAN TX offload configured */
+ TEST_ASSERT(alloc_vlan_tx_mbufs(mbufs, NUM_PACKETS,
+ TEST_VLAN_ID, TEST_VLAN_PCP) == 0,
+ "Failed to allocate VLAN TX mbufs");
+
+ printf(" Transmitting %d untagged packets with TX_VLAN offload "
+ "(VLAN ID=%u, PCP=%u)\n", NUM_PACKETS, TEST_VLAN_ID, TEST_VLAN_PCP);
+
+ /* tx_prepare handles VLAN tag insertion */
+ nb_prep = rte_eth_tx_prepare(port_id, 0, mbufs, NUM_PACKETS);
+ TEST_ASSERT_EQUAL(nb_prep, NUM_PACKETS,
+ "tx_prepare failed: prepared %u/%d", nb_prep, NUM_PACKETS);
+
+ /* Transmit the prepared packets */
+ nb_tx = rte_eth_tx_burst(port_id, 0, mbufs, nb_prep);
+ TEST_ASSERT_EQUAL(nb_tx, NUM_PACKETS,
+ "TX burst failed: sent %d/%d", nb_tx, NUM_PACKETS);
+
+ cleanup_pcap_vdev("net_pcap_vlan_tx", port_id);
+
+ /* Verify the output pcap file contains VLAN-tagged packets */
+ pkt_count = count_vlan_packets_in_pcap(vlan_tx_pcap_path, TEST_VLAN_ID, 1);
+ TEST_ASSERT(pkt_count >= 0, "Error verifying VLAN tags in output pcap");
+ TEST_ASSERT_EQUAL(pkt_count, NUM_PACKETS,
+ "Pcap file has %d packets, expected %d",
+ pkt_count, NUM_PACKETS);
+
+ remove_temp_file(vlan_tx_pcap_path);
+
+ printf("VLAN insert TX PASSED: %d VLAN-tagged packets written\n", NUM_PACKETS);
+ return TEST_SUCCESS;
+}
+
+/*
+ * Test: VLAN Strip disabled (packets should remain tagged)
+ *
+ * This test verifies that when VLAN strip is NOT enabled:
+ * 1. VLAN-tagged packets from pcap file keep their tags
+ * 2. Packet data still contains the 4-byte VLAN header
+ */
+static int
+test_vlan_no_strip_rx(void)
+{
+ struct rte_mbuf *mbufs[NUM_PACKETS];
+ char devargs[256];
+ uint16_t port_id;
+ unsigned int received, i;
+ size_t expected_len;
+ int ret;
+
+ printf("Testing VLAN packets without strip (offload disabled)\n");
+
+ /* Create pcap file with VLAN-tagged packets if not already created */
+ if (access(vlan_rx_pcap_path, F_OK) != 0) {
+ TEST_ASSERT(create_temp_path(vlan_rx_pcap_path, sizeof(vlan_rx_pcap_path),
+ "pcap_vlan_nostrip") == 0,
+ "Failed to create temp file path");
+ TEST_ASSERT(create_vlan_tagged_pcap(vlan_rx_pcap_path, NUM_PACKETS,
+ TEST_VLAN_ID, TEST_VLAN_PCP) == 0,
+ "Failed to create VLAN-tagged pcap file");
+ }
+
+ /* Create vdev and configure WITHOUT VLAN strip */
+ ret = snprintf(devargs, sizeof(devargs), "rx_pcap=%s", vlan_rx_pcap_path);
+ TEST_ASSERT(ret > 0 && ret < (int)sizeof(devargs),
+ "devargs string truncated");
+ TEST_ASSERT(create_pcap_vdev("net_pcap_vlan_nostrip", devargs, &port_id) == 0,
+ "Failed to create RX vdev");
+ /* Use standard setup which does NOT enable VLAN strip */
+ TEST_ASSERT(setup_pcap_port(port_id) == 0,
+ "Failed to setup port");
+
+ /* Receive packets */
+ receive_packets(port_id, mbufs, NUM_PACKETS, &received);
+ TEST_ASSERT_EQUAL(received, (unsigned int)NUM_PACKETS,
+ "Received %u packets, expected %d", received, NUM_PACKETS);
+
+ /* Expected length with VLAN tag still present */
+ expected_len = sizeof(struct rte_ether_hdr) + sizeof(struct rte_vlan_hdr) +
+ sizeof(struct rte_ipv4_hdr) + sizeof(struct rte_udp_hdr) + 18;
+
+ /* Verify VLAN tag is still present in each packet */
+ for (i = 0; i < received; i++) {
+ /* Check packet still has VLAN tag */
+ TEST_ASSERT(verify_vlan_tag(mbufs[i], TEST_VLAN_ID, TEST_VLAN_PCP) == 0,
+ "Packet %u: VLAN tag verification failed", i);
+
+ /* Check packet length unchanged */
+ TEST_ASSERT_EQUAL(rte_pktmbuf_pkt_len(mbufs[i]), expected_len,
+ "Packet %u length %u != expected %zu",
+ i, rte_pktmbuf_pkt_len(mbufs[i]), expected_len);
+
+ /* RX_VLAN_STRIPPED flag should NOT be set */
+ TEST_ASSERT(!(mbufs[i]->ol_flags & RTE_MBUF_F_RX_VLAN_STRIPPED),
+ "Packet %u: RX_VLAN_STRIPPED flag set unexpectedly", i);
+ }
+
+ rte_pktmbuf_free_bulk(mbufs, received);
+ cleanup_pcap_vdev("net_pcap_vlan_nostrip", port_id);
+
+ printf("VLAN no-strip RX PASSED: %u packets verified with tags intact\n", received);
+ return TEST_SUCCESS;
+}
+
+/*
+ * Test: Runtime VLAN offload configuration via rte_eth_dev_set_vlan_offload
+ *
+ * This test verifies that VLAN strip can be enabled/disabled at runtime
+ * using the standard ethdev API rather than only at configure time.
+ * Uses infinite_rx mode so the same packets can be read in each phase.
+ */
+static int
+test_vlan_offload_set(void)
+{
+ struct rte_mbuf *mbufs[MAX_PKT_BURST];
+ char vlan_set_pcap_path[PATH_MAX];
+ char devargs[512];
+ uint16_t port_id;
+ struct rte_eth_conf port_conf = { 0 };
+ unsigned int i;
+ uint16_t nb_rx;
+ int ret, current_offload;
+ size_t tagged_len, untagged_len;
+
+ printf("Testing runtime VLAN offload configuration\n");
+
+ /* Create pcap file with VLAN-tagged packets */
+ TEST_ASSERT(create_temp_path(vlan_set_pcap_path, sizeof(vlan_set_pcap_path),
+ "pcap_vlan_set") == 0,
+ "Failed to create temp file path");
+
+ TEST_ASSERT(create_vlan_tagged_pcap(vlan_set_pcap_path, NUM_PACKETS,
+ TEST_VLAN_ID, TEST_VLAN_PCP) == 0,
+ "Failed to create VLAN-tagged pcap file");
+
+ /* Use infinite_rx so packets are always available */
+ ret = snprintf(devargs, sizeof(devargs),
+ "rx_pcap=%s,infinite_rx=1", vlan_set_pcap_path);
+ TEST_ASSERT(ret > 0 && ret < (int)sizeof(devargs),
+ "devargs string truncated");
+ TEST_ASSERT(create_pcap_vdev("net_pcap_vlan_set", devargs, &port_id) == 0,
+ "Failed to create RX vdev");
+
+ /* Configure WITHOUT VLAN strip initially and start */
+ TEST_ASSERT(setup_pcap_port_conf(port_id, &port_conf) == 0,
+ "Failed to setup port");
+
+ /* Expected lengths */
+ tagged_len = sizeof(struct rte_ether_hdr) + sizeof(struct rte_vlan_hdr) +
+ sizeof(struct rte_ipv4_hdr) + sizeof(struct rte_udp_hdr) + 18;
+ untagged_len = sizeof(struct rte_ether_hdr) + sizeof(struct rte_ipv4_hdr) +
+ sizeof(struct rte_udp_hdr) + 18;
+
+ /* Verify VLAN strip is initially disabled */
+ current_offload = rte_eth_dev_get_vlan_offload(port_id);
+ TEST_ASSERT(!(current_offload & RTE_ETH_VLAN_STRIP_OFFLOAD),
+ "VLAN strip should be disabled initially");
+
+ /*
+ * Phase 1: VLAN strip disabled - packets should have tags
+ */
+ printf(" Phase 1: VLAN strip disabled\n");
+ nb_rx = rte_eth_rx_burst(port_id, 0, mbufs, MAX_PKT_BURST);
+ TEST_ASSERT(nb_rx > 0, "No packets received in phase 1");
+
+ for (i = 0; i < nb_rx; i++) {
+ TEST_ASSERT_EQUAL(rte_pktmbuf_pkt_len(mbufs[i]), tagged_len,
+ "Phase 1 packet %u: expected tagged length %zu, got %u",
+ i, tagged_len, rte_pktmbuf_pkt_len(mbufs[i]));
+ }
+ rte_pktmbuf_free_bulk(mbufs, nb_rx);
+ printf(" Received %u packets with VLAN tags intact\n", nb_rx);
+
+ /*
+ * Phase 2: Enable VLAN strip at runtime - packets should be stripped
+ */
+ printf(" Phase 2: Enabling VLAN strip via rte_eth_dev_set_vlan_offload\n");
+ ret = rte_eth_dev_set_vlan_offload(port_id, RTE_ETH_VLAN_STRIP_OFFLOAD);
+ TEST_ASSERT(ret == 0, "Failed to enable VLAN strip: %s", rte_strerror(-ret));
+
+ current_offload = rte_eth_dev_get_vlan_offload(port_id);
+ TEST_ASSERT(current_offload & RTE_ETH_VLAN_STRIP_OFFLOAD,
+ "VLAN strip should be enabled after set_vlan_offload");
+
+ nb_rx = rte_eth_rx_burst(port_id, 0, mbufs, MAX_PKT_BURST);
+ TEST_ASSERT(nb_rx > 0, "No packets received in phase 2");
+
+ for (i = 0; i < nb_rx; i++) {
+ TEST_ASSERT_EQUAL(rte_pktmbuf_pkt_len(mbufs[i]), untagged_len,
+ "Phase 2 packet %u: expected untagged length %zu, got %u",
+ i, untagged_len, rte_pktmbuf_pkt_len(mbufs[i]));
+ TEST_ASSERT(mbufs[i]->ol_flags & RTE_MBUF_F_RX_VLAN_STRIPPED,
+ "Phase 2 packet %u: VLAN_STRIPPED flag not set", i);
+ }
+ rte_pktmbuf_free_bulk(mbufs, nb_rx);
+ printf(" Received %u packets with VLAN tags stripped\n", nb_rx);
+
+ /*
+ * Phase 3: Disable VLAN strip - packets should have tags again
+ */
+ printf(" Phase 3: Disabling VLAN strip\n");
+ ret = rte_eth_dev_set_vlan_offload(port_id, 0);
+ TEST_ASSERT(ret == 0, "Failed to disable VLAN strip: %s", rte_strerror(-ret));
+
+ current_offload = rte_eth_dev_get_vlan_offload(port_id);
+ TEST_ASSERT(!(current_offload & RTE_ETH_VLAN_STRIP_OFFLOAD),
+ "VLAN strip should be disabled after clearing");
+
+ nb_rx = rte_eth_rx_burst(port_id, 0, mbufs, MAX_PKT_BURST);
+ TEST_ASSERT(nb_rx > 0, "No packets received in phase 3");
+
+ for (i = 0; i < nb_rx; i++) {
+ TEST_ASSERT_EQUAL(rte_pktmbuf_pkt_len(mbufs[i]), tagged_len,
+ "Phase 3 packet %u: expected tagged length %zu, got %u",
+ i, tagged_len, rte_pktmbuf_pkt_len(mbufs[i]));
+ }
+ rte_pktmbuf_free_bulk(mbufs, nb_rx);
+ printf(" Received %u packets with VLAN tags intact again\n", nb_rx);
+
+ cleanup_pcap_vdev("net_pcap_vlan_set", port_id);
+ remove_temp_file(vlan_set_pcap_path);
+
+ printf("Runtime VLAN offload PASSED\n");
+ return TEST_SUCCESS;
+}
+
+/*
+ * Test: VLAN Strip in infinite RX mode
+ *
+ * This test verifies that VLAN strip offload works correctly when combined
+ * with infinite_rx mode, which uses a different RX path (eth_pcap_rx_infinite).
+ */
+static int
+test_vlan_strip_infinite_rx(void)
+{
+ struct rte_mbuf *mbufs[MAX_PKT_BURST];
+ struct rte_eth_conf port_conf = {
+ .rxmode.offloads = RTE_ETH_RX_OFFLOAD_VLAN_STRIP,
+ };
+ char vlan_inf_pcap_path[PATH_MAX];
+ char devargs[256];
+ uint16_t port_id;
+ unsigned int total_rx = 0;
+ unsigned int stripped_count = 0;
+ int iter, attempts, ret;
+ size_t expected_len;
+
+ printf("Testing VLAN strip with infinite RX mode\n");
+
+ /* Create pcap file with VLAN-tagged packets */
+ TEST_ASSERT(create_temp_path(vlan_inf_pcap_path, sizeof(vlan_inf_pcap_path),
+ "pcap_vlan_inf") == 0,
+ "Failed to create temp file path");
+
+ TEST_ASSERT(create_vlan_tagged_pcap(vlan_inf_pcap_path, NUM_PACKETS,
+ TEST_VLAN_ID, TEST_VLAN_PCP) == 0,
+ "Failed to create VLAN-tagged pcap file");
+
+ printf(" Created VLAN-tagged pcap with %d packets for infinite RX\n", NUM_PACKETS);
+
+ /* Create vdev with infinite_rx enabled */
+ ret = snprintf(devargs, sizeof(devargs), "rx_pcap=%s,infinite_rx=1", vlan_inf_pcap_path);
+ TEST_ASSERT(ret > 0 && ret < (int)sizeof(devargs),
+ "devargs string truncated");
+ ret = rte_vdev_init("net_pcap_vlan_inf", devargs);
+ TEST_ASSERT(ret == 0, "Failed to create infinite RX vdev: %s", rte_strerror(-ret));
+
+ ret = rte_eth_dev_get_port_by_name("net_pcap_vlan_inf", &port_id);
+ TEST_ASSERT(ret == 0, "Failed to get port ID");
+
+ /* Configure with VLAN strip enabled and start */
+ TEST_ASSERT(setup_pcap_port_conf(port_id, &port_conf) == 0,
+ "Failed to setup port with VLAN strip");
+
+ /* Expected length after VLAN strip */
+ expected_len = sizeof(struct rte_ether_hdr) + sizeof(struct rte_ipv4_hdr) +
+ sizeof(struct rte_udp_hdr) + 18;
+
+ /* Read packets - need more than file contains to verify infinite looping */
+ for (iter = 0; iter < 3 && total_rx < NUM_PACKETS * 2; iter++) {
+ for (attempts = 0; attempts < 100 && total_rx < NUM_PACKETS * 2; attempts++) {
+ uint16_t nb_rx = rte_eth_rx_burst(port_id, 0, mbufs, MAX_PKT_BURST);
+
+ for (uint16_t i = 0; i < nb_rx; i++) {
+ /* Verify VLAN was stripped */
+ if (verify_no_vlan_tag(mbufs[i]) == 0 &&
+ rte_pktmbuf_pkt_len(mbufs[i]) == expected_len &&
+ (mbufs[i]->ol_flags & RTE_MBUF_F_RX_VLAN_STRIPPED))
+ stripped_count++;
+ }
+
+ if (nb_rx > 0)
+ rte_pktmbuf_free_bulk(mbufs, nb_rx);
+ total_rx += nb_rx;
+
+ if (nb_rx == 0)
+ usleep(100);
+ }
+ }
+
+ rte_eth_dev_stop(port_id);
+ rte_vdev_uninit("net_pcap_vlan_inf");
+ remove_temp_file(vlan_inf_pcap_path);
+
+ TEST_ASSERT(total_rx >= NUM_PACKETS * 2,
+ "Infinite RX: got %u packets, need >= %d", total_rx, NUM_PACKETS * 2);
+
+ TEST_ASSERT_EQUAL(stripped_count, total_rx,
+ "VLAN strip failed: only %u/%u packets stripped correctly",
+ stripped_count, total_rx);
+
+ printf("VLAN strip infinite RX PASSED: %u packets stripped (file has %d)\n",
+ total_rx, NUM_PACKETS);
+ return TEST_SUCCESS;
+}
+
+/*
+ * Test: Timestamps in infinite RX mode
+ *
+ * This test verifies that timestamp offload works correctly when combined
+ * with infinite_rx mode. Since infinite_rx generates packets on the fly,
+ * timestamps should reflect the current time rather than pcap file timestamps.
+ */
+static int
+test_timestamp_infinite_rx(void)
+{
+ struct rte_mbuf *mbufs[MAX_PKT_BURST];
+ struct rte_eth_conf port_conf = {
+ .rxmode.offloads = RTE_ETH_RX_OFFLOAD_TIMESTAMP,
+ };
+ char ts_inf_pcap_path[PATH_MAX];
+ char devargs[256];
+ uint16_t port_id;
+ unsigned int total_rx = 0;
+ unsigned int ts_count = 0;
+ int iter, attempts, ret;
+ rte_mbuf_timestamp_t first_ts = 0;
+ rte_mbuf_timestamp_t last_ts = 0;
+
+ printf("Testing timestamps with infinite RX mode\n");
+
+ /* Initialize timestamp dynamic field access */
+ if (timestamp_dynfield_offset < 0) {
+ ret = timestamp_init();
+ if (ret != 0) {
+ printf("Timestamp dynfield not available, skipping\n");
+ return TEST_SKIPPED;
+ }
+ }
+
+ /* Create simple pcap file */
+ TEST_ASSERT(create_temp_path(ts_inf_pcap_path, sizeof(ts_inf_pcap_path),
+ "pcap_ts_inf") == 0,
+ "Failed to create temp file path");
+
+ TEST_ASSERT(create_test_pcap(ts_inf_pcap_path, NUM_PACKETS) == 0,
+ "Failed to create test pcap file");
+
+ /* Create vdev with infinite_rx enabled */
+ ret = snprintf(devargs, sizeof(devargs), "rx_pcap=%s,infinite_rx=1", ts_inf_pcap_path);
+ TEST_ASSERT(ret > 0 && ret < (int)sizeof(devargs),
+ "devargs string truncated");
+ ret = rte_vdev_init("net_pcap_ts_inf", devargs);
+ TEST_ASSERT(ret == 0, "Failed to create infinite RX vdev: %s", rte_strerror(-ret));
+
+ ret = rte_eth_dev_get_port_by_name("net_pcap_ts_inf", &port_id);
+ TEST_ASSERT(ret == 0, "Failed to get port ID");
+
+ /* Configure with timestamp offload enabled and start */
+ TEST_ASSERT(setup_pcap_port_conf(port_id, &port_conf) == 0,
+ "Failed to setup port with timestamps");
+
+ /* Read packets */
+ for (iter = 0; iter < 3 && total_rx < NUM_PACKETS * 2; iter++) {
+ for (attempts = 0; attempts < 100 && total_rx < NUM_PACKETS * 2; attempts++) {
+ uint16_t nb_rx = rte_eth_rx_burst(port_id, 0, mbufs, MAX_PKT_BURST);
+
+ for (uint16_t i = 0; i < nb_rx; i++) {
+ if (mbuf_has_timestamp(mbufs[i])) {
+ rte_mbuf_timestamp_t ts = mbuf_timestamp_get(mbufs[i]);
+
+ if (ts_count == 0)
+ first_ts = ts;
+ last_ts = ts;
+ ts_count++;
+ }
+ }
+
+ if (nb_rx > 0)
+ rte_pktmbuf_free_bulk(mbufs, nb_rx);
+ total_rx += nb_rx;
+
+ if (nb_rx == 0)
+ usleep(100);
+ }
+ }
+
+ rte_eth_dev_stop(port_id);
+ rte_vdev_uninit("net_pcap_ts_inf");
+ remove_temp_file(ts_inf_pcap_path);
+
+ TEST_ASSERT(total_rx >= NUM_PACKETS * 2,
+ "Infinite RX: got %u packets, need >= %d", total_rx, NUM_PACKETS * 2);
+
+ TEST_ASSERT_EQUAL(ts_count, total_rx,
+ "Timestamp missing: only %u/%u packets have timestamps",
+ ts_count, total_rx);
+
+ /* Timestamps should be non-decreasing (current time): last must not precede first */
+ TEST_ASSERT(last_ts >= first_ts,
+ "Timestamps not monotonic: first=%" PRIu64 " last=%" PRIu64,
+ first_ts, last_ts);
+
+ printf("Timestamp infinite RX PASSED: %u packets with valid timestamps\n", total_rx);
+ return TEST_SUCCESS;
+}
+
+/*
+ * Test suite setup
+ */
+static int
+test_setup(void)
+{
+ /* Generate random source MAC address */
+ rte_eth_random_addr(src_mac.addr_bytes);
+
+ mp = rte_pktmbuf_pool_create("pcap_test_pool", NB_MBUF, 32, 0,
+ RTE_MBUF_DEFAULT_BUF_SIZE,
+ rte_socket_id());
+ TEST_ASSERT_NOT_NULL(mp, "Failed to create mempool");
+
+ return 0;
+}
+
+
+/*
+ * Test: Oversized packets are dropped when scatter is disabled
+ *
+ * Use the default mempool (buf_size ~2048) without scatter enabled.
+ * Read a pcap file containing packets larger than the mbuf data room.
+ * Verify that oversized packets are dropped and counted in ierrors.
+ */
+static int
+test_scatter_drop_oversized(void)
+{
+ struct rte_eth_conf port_conf;
+ struct rte_eth_stats stats;
+ struct rte_mbuf *mbufs[NUM_PACKETS];
+ char rx_pcap_path[PATH_MAX];
+ char devargs[256];
+ uint16_t port_id;
+ unsigned int received;
+ int ret;
+ const unsigned int num_pkts = 16;
+
+ printf("Testing scatter: oversized packets dropped without scatter\n");
+
+ TEST_ASSERT(create_temp_path(rx_pcap_path, sizeof(rx_pcap_path),
+ "pcap_scat_drop") == 0,
+ "Failed to create temp file path");
+
+ /*
+ * Create pcap with jumbo packets (9000 bytes) that exceed the
+ * default mbuf data room (~2048 bytes). Without scatter enabled,
+ * these should be dropped at receive time.
+ */
+ TEST_ASSERT(create_sized_pcap(rx_pcap_path, num_pkts,
+ PKT_SIZE_JUMBO) == 0,
+ "Failed to create jumbo pcap file");
+
+ ret = snprintf(devargs, sizeof(devargs), "rx_pcap=%s", rx_pcap_path);
+ TEST_ASSERT(ret > 0 && ret < (int)sizeof(devargs),
+ "devargs string truncated");
+ ret = rte_vdev_init("net_pcap_scat_drop", devargs);
+ TEST_ASSERT(ret == 0, "Failed to create vdev: %s", rte_strerror(-ret));
+
+ ret = rte_eth_dev_get_port_by_name("net_pcap_scat_drop", &port_id);
+ TEST_ASSERT(ret == 0, "Failed to get port ID");
+
+ /* Configure without scatter - MTU check passes (1514 < 2048) */
+ memset(&port_conf, 0, sizeof(port_conf));
+ ret = rte_eth_dev_configure(port_id, 1, 1, &port_conf);
+ TEST_ASSERT(ret == 0, "Failed to configure port: %s", rte_strerror(-ret));
+
+ ret = rte_eth_rx_queue_setup(port_id, 0, RING_SIZE, SOCKET0, NULL, mp);
+ TEST_ASSERT(ret == 0, "rx_queue_setup failed: %s", rte_strerror(-ret));
+
+ ret = rte_eth_tx_queue_setup(port_id, 0, RING_SIZE, SOCKET0, NULL);
+ TEST_ASSERT(ret == 0, "tx_queue_setup failed: %s", rte_strerror(-ret));
+
+ ret = rte_eth_dev_start(port_id);
+ TEST_ASSERT(ret == 0, "Failed to start port: %s", rte_strerror(-ret));
+
+ ret = rte_eth_stats_reset(port_id);
+ TEST_ASSERT(ret == 0, "Failed to reset stats");
+
+ /*
+ * Read all packets. The pcap file has 9000-byte packets but
+ * scatter is not enabled. They should all be dropped.
+ */
+ receive_packets(port_id, mbufs, num_pkts, &received);
+ rte_pktmbuf_free_bulk(mbufs, received);
+
+ ret = rte_eth_stats_get(port_id, &stats);
+ TEST_ASSERT(ret == 0, "Failed to get stats");
+
+ printf(" Received %u packets, errors=%" PRIu64 "\n",
+ received, stats.ierrors);
+
+ TEST_ASSERT_EQUAL(received, 0U,
+ "Expected 0 received packets without scatter, got %u",
+ received);
+ TEST_ASSERT_EQUAL(stats.ierrors, (uint64_t)num_pkts,
+ "Expected %u errors for oversized packets, got %" PRIu64,
+ num_pkts, stats.ierrors);
+
+ cleanup_pcap_vdev("net_pcap_scat_drop", port_id);
+ remove_temp_file(rx_pcap_path);
+
+ printf("Scatter drop oversized PASSED: %" PRIu64 " packets dropped\n",
+ stats.ierrors);
+ return TEST_SUCCESS;
+}
+
+/*
+ * Test: Jumbo packets are scattered when scatter is enabled
+ *
+ * With scatter enabled and a normal mempool, read jumbo-sized packets
+ * from a pcap file. Verify they arrive as multi-segment mbufs with
+ * correct total length.
+ */
+static int
+test_scatter_jumbo_rx(void)
+{
+ struct rte_mbuf *mbufs[NUM_PACKETS];
+ struct rte_eth_conf port_conf;
+ char rx_pcap_path[PATH_MAX];
+ char devargs[256];
+ uint16_t port_id;
+ unsigned int received, i;
+ unsigned int multiseg_count = 0;
+ int ret;
+ const unsigned int num_pkts = 16;
+
+ printf("Testing scatter: jumbo RX with scatter enabled\n");
+
+ TEST_ASSERT(create_temp_path(rx_pcap_path, sizeof(rx_pcap_path),
+ "pcap_scat_jumbo") == 0,
+ "Failed to create temp file path");
+ TEST_ASSERT(create_sized_pcap(rx_pcap_path, num_pkts,
+ PKT_SIZE_JUMBO) == 0,
+ "Failed to create jumbo pcap file");
+
+ ret = snprintf(devargs, sizeof(devargs), "rx_pcap=%s", rx_pcap_path);
+ TEST_ASSERT(ret > 0 && ret < (int)sizeof(devargs),
+ "devargs string truncated");
+ ret = rte_vdev_init("net_pcap_scat_jumbo", devargs);
+ TEST_ASSERT(ret == 0, "Failed to create vdev: %s", rte_strerror(-ret));
+
+ ret = rte_eth_dev_get_port_by_name("net_pcap_scat_jumbo", &port_id);
+ TEST_ASSERT(ret == 0, "Failed to get port ID");
+
+ /* Configure WITH scatter enabled */
+ memset(&port_conf, 0, sizeof(port_conf));
+ port_conf.rxmode.offloads = RTE_ETH_RX_OFFLOAD_SCATTER;
+ ret = rte_eth_dev_configure(port_id, 1, 1, &port_conf);
+ TEST_ASSERT(ret == 0, "Failed to configure port: %s", rte_strerror(-ret));
+
+ ret = rte_eth_rx_queue_setup(port_id, 0, RING_SIZE, SOCKET0, NULL, mp);
+ TEST_ASSERT(ret == 0, "rx_queue_setup failed: %s", rte_strerror(-ret));
+
+ ret = rte_eth_tx_queue_setup(port_id, 0, RING_SIZE, SOCKET0, NULL);
+ TEST_ASSERT(ret == 0, "tx_queue_setup failed: %s", rte_strerror(-ret));
+
+ ret = rte_eth_dev_start(port_id);
+ TEST_ASSERT(ret == 0, "Failed to start port: %s", rte_strerror(-ret));
+
+ receive_packets(port_id, mbufs, num_pkts, &received);
+ TEST_ASSERT_EQUAL(received, num_pkts,
+ "Received %u packets, expected %u", received, num_pkts);
+
+ for (i = 0; i < received; i++) {
+ uint32_t pkt_len = rte_pktmbuf_pkt_len(mbufs[i]);
+ uint16_t nb_segs = mbufs[i]->nb_segs;
+
+ TEST_ASSERT_EQUAL(pkt_len, PKT_SIZE_JUMBO,
+ "Packet %u: size %u, expected %u",
+ i, pkt_len, PKT_SIZE_JUMBO);
+
+ if (nb_segs > 1)
+ multiseg_count++;
+ }
+
+ rte_pktmbuf_free_bulk(mbufs, received);
+ cleanup_pcap_vdev("net_pcap_scat_jumbo", port_id);
+ remove_temp_file(rx_pcap_path);
+
+ /* At least one jumbo packet must have arrived as a multi-segment mbuf */
+ TEST_ASSERT(multiseg_count > 0,
+ "Expected multi-segment mbufs for jumbo packets");
+
+ printf("Scatter jumbo RX PASSED: %u/%u packets were multi-segment\n",
+ multiseg_count, received);
+ return TEST_SUCCESS;
+}
+
+/*
+ * Test: Asymmetric rx_iface/tx_iface mode
+ *
+ * Verifies that the rx_iface= and tx_iface= devargs work when
+ * specified separately, which exercises a distinct code path from
+ * the symmetric iface= mode.
+ */
+static int
+test_rx_tx_iface(void)
+{
+ struct rte_mbuf *mbufs[MAX_PKT_BURST];
+ char devargs[256];
+ uint16_t port_id;
+ const char *iface;
+ int ret, nb_tx, nb_pkt;
+
+ printf("Testing asymmetric rx_iface/tx_iface mode\n");
+
+ iface = find_test_iface();
+ if (iface == NULL) {
+ printf("No suitable interface, skipping\n");
+ return TEST_SKIPPED;
+ }
+ printf("Using interface: %s\n", iface);
+
+ ret = snprintf(devargs, sizeof(devargs),
+ "rx_iface=%s,tx_iface=%s", iface, iface);
+ TEST_ASSERT(ret > 0 && ret < (int)sizeof(devargs),
+ "devargs string truncated");
+ if (rte_vdev_init("net_pcap_rxtx_iface", devargs) < 0) {
+ printf("Cannot create rx_iface/tx_iface vdev (needs root?), skipping\n");
+ return TEST_SKIPPED;
+ }
+
+ ret = rte_eth_dev_get_port_by_name("net_pcap_rxtx_iface", &port_id);
+ TEST_ASSERT(ret == 0, "Failed to get port ID");
+ TEST_ASSERT(setup_pcap_port(port_id) == 0,
+ "Failed to setup port");
+
+ /* Transmit some packets to verify TX path works */
+ nb_pkt = generate_test_packets(mp, mbufs, MAX_PKT_BURST,
+ PACKET_BURST_GEN_PKT_LEN);
+ TEST_ASSERT(nb_pkt > 0, "Failed to generate packets");
+
+ nb_tx = rte_eth_tx_burst(port_id, 0, mbufs, nb_pkt);
+ if (nb_tx < nb_pkt)
+ rte_pktmbuf_free_bulk(&mbufs[nb_tx], nb_pkt - nb_tx);
+
+ /* RX burst to exercise the receive path (may or may not get packets) */
+ {
+ uint16_t nb_rx = rte_eth_rx_burst(port_id, 0, mbufs, MAX_PKT_BURST);
+ if (nb_rx > 0)
+ rte_pktmbuf_free_bulk(mbufs, nb_rx);
+ }
+
+ cleanup_pcap_vdev("net_pcap_rxtx_iface", port_id);
+
+ printf("rx_iface/tx_iface PASSED: sent %d packets\n", nb_tx);
+ return TEST_SUCCESS;
+}
+
+/*
+ * Test: rx_iface_in direction filtering
+ *
+ * Verifies that rx_iface_in= sets pcap direction to PCAP_D_IN,
+ * which filters out outgoing packets. This exercises the
+ * set_iface_direction() code path and PCAP_D_IN filtering.
+ */
+static int
+test_rx_iface_in(void)
+{
+ struct rte_mbuf *mbufs[MAX_PKT_BURST];
+ char devargs[256];
+ uint16_t port_id;
+ const char *iface;
+ int ret, nb_pkt, nb_tx;
+
+ printf("Testing rx_iface_in direction filtering\n");
+
+ iface = find_test_iface();
+ if (iface == NULL) {
+ printf("No suitable interface, skipping\n");
+ return TEST_SKIPPED;
+ }
+ printf("Using interface: %s\n", iface);
+
+ ret = snprintf(devargs, sizeof(devargs),
+ "rx_iface_in=%s,tx_iface=%s", iface, iface);
+ TEST_ASSERT(ret > 0 && ret < (int)sizeof(devargs),
+ "devargs string truncated");
+ if (rte_vdev_init("net_pcap_iface_in", devargs) < 0) {
+ printf("Cannot create rx_iface_in vdev (needs root?), skipping\n");
+ return TEST_SKIPPED;
+ }
+
+ ret = rte_eth_dev_get_port_by_name("net_pcap_iface_in", &port_id);
+ TEST_ASSERT(ret == 0, "Failed to get port ID");
+ TEST_ASSERT(setup_pcap_port(port_id) == 0,
+ "Failed to setup port");
+
+ /*
+ * Send packets on the TX side. With rx_iface_in (PCAP_D_IN),
+ * our own transmitted packets should NOT appear on the RX side
+ * because they are outgoing, not incoming.
+ */
+ nb_pkt = generate_test_packets(mp, mbufs, MAX_PKT_BURST,
+ PACKET_BURST_GEN_PKT_LEN);
+ TEST_ASSERT(nb_pkt > 0, "Failed to generate packets");
+
+ nb_tx = rte_eth_tx_burst(port_id, 0, mbufs, nb_pkt);
+ if (nb_tx < nb_pkt)
+ rte_pktmbuf_free_bulk(&mbufs[nb_tx], nb_pkt - nb_tx);
+
+ /* Small delay then try to receive - should get 0 (our own TX filtered) */
+ usleep(50 * 1000);
+ {
+ uint16_t nb_rx = rte_eth_rx_burst(port_id, 0, mbufs, MAX_PKT_BURST);
+ if (nb_rx > 0) {
+ printf(" Note: received %u packets (external traffic present)\n",
+ nb_rx);
+ rte_pktmbuf_free_bulk(mbufs, nb_rx);
+ } else {
+ printf(" No packets received (TX correctly filtered by PCAP_D_IN)\n");
+ }
+ }
+
+ cleanup_pcap_vdev("net_pcap_iface_in", port_id);
+
+ printf("rx_iface_in PASSED: direction filtering exercised\n");
+ return TEST_SUCCESS;
+}
+
+/*
+ * Test: Per-queue start and stop
+ *
+ * Verifies that rx_queue_start/stop and tx_queue_start/stop API calls
+ * succeed and return correct status. Note: the pcap PMD burst functions
+ * do not check queue state in the fast path, so this test validates
+ * the API plumbing rather than burst-level gating.
+ */
+static int
+test_queue_start_stop(void)
+{
+ char rx_pcap_path[PATH_MAX];
+ char tx_pcap_path_local[PATH_MAX];
+ char devargs[256];
+ uint16_t port_id;
+ int ret;
+
+ printf("Testing per-queue start/stop\n");
+
+ TEST_ASSERT(create_temp_path(rx_pcap_path, sizeof(rx_pcap_path),
+ "pcap_qss_rx") == 0,
+ "Failed to create RX temp path");
+ TEST_ASSERT(create_temp_path(tx_pcap_path_local, sizeof(tx_pcap_path_local),
+ "pcap_qss_tx") == 0,
+ "Failed to create TX temp path");
+ TEST_ASSERT(create_test_pcap(rx_pcap_path, NUM_PACKETS) == 0,
+ "Failed to create test pcap");
+
+ ret = snprintf(devargs, sizeof(devargs),
+ "rx_pcap=%s,tx_pcap=%s", rx_pcap_path, tx_pcap_path_local);
+ TEST_ASSERT(ret > 0 && ret < (int)sizeof(devargs),
+ "devargs string truncated");
+ TEST_ASSERT(create_pcap_vdev("net_pcap_qss", devargs, &port_id) == 0,
+ "Failed to create vdev");
+ TEST_ASSERT(setup_pcap_port(port_id) == 0,
+ "Failed to setup port");
+
+ /* Stop RX queue */
+ ret = rte_eth_dev_rx_queue_stop(port_id, 0);
+ TEST_ASSERT(ret == 0, "Failed to stop RX queue: %s", rte_strerror(-ret));
+
+ /* Restart RX queue */
+ ret = rte_eth_dev_rx_queue_start(port_id, 0);
+ TEST_ASSERT(ret == 0, "Failed to start RX queue: %s", rte_strerror(-ret));
+
+ /* Stop TX queue */
+ ret = rte_eth_dev_tx_queue_stop(port_id, 0);
+ TEST_ASSERT(ret == 0, "Failed to stop TX queue: %s", rte_strerror(-ret));
+
+ /* Restart TX queue */
+ ret = rte_eth_dev_tx_queue_start(port_id, 0);
+ TEST_ASSERT(ret == 0, "Failed to start TX queue: %s", rte_strerror(-ret));
+
+ /* Verify burst still works after stop/start cycle */
+ {
+ struct rte_mbuf *mbufs[MAX_PKT_BURST];
+ unsigned int received;
+
+ receive_packets(port_id, mbufs, NUM_PACKETS, &received);
+ TEST_ASSERT(received > 0,
+ "Expected packets after queue stop/start cycle, got 0");
+ rte_pktmbuf_free_bulk(mbufs, received);
+ }
+
+ {
+ struct rte_mbuf *mbufs[1];
+
+ TEST_ASSERT(alloc_test_mbufs(mbufs, 1) == 0,
+ "Failed to allocate mbufs");
+ int nb_tx = rte_eth_tx_burst(port_id, 0, mbufs, 1);
+ TEST_ASSERT_EQUAL(nb_tx, 1,
+ "TX after queue stop/start cycle failed: sent %d/1",
+ nb_tx);
+ }
+
+ cleanup_pcap_vdev("net_pcap_qss", port_id);
+ remove_temp_file(rx_pcap_path);
+ remove_temp_file(tx_pcap_path_local);
+
+ printf("Per-queue start/stop PASSED\n");
+ return TEST_SUCCESS;
+}
+
+/*
+ * Test: imissed statistic in interface mode
+ *
+ * Verifies that the imissed counter (based on pcap_stats kernel drops)
+ * is accessible and starts at zero after stats reset. Kernel-level
+ * drops are hard to trigger deterministically, so this test validates
+ * the counter plumbing rather than forcing drops.
+ */
+static int
+test_imissed_stat(void)
+{
+ struct rte_eth_stats stats;
+ char devargs[256];
+ uint16_t port_id;
+ const char *iface;
+ int ret;
+
+ printf("Testing imissed statistic\n");
+
+ iface = find_test_iface();
+ if (iface == NULL) {
+ printf("No suitable interface, skipping\n");
+ return TEST_SKIPPED;
+ }
+ printf("Using interface: %s\n", iface);
+
+ ret = snprintf(devargs, sizeof(devargs), "iface=%s", iface);
+ TEST_ASSERT(ret > 0 && ret < (int)sizeof(devargs),
+ "devargs string truncated");
+ if (rte_vdev_init("net_pcap_imissed", devargs) < 0) {
+ printf("Cannot create iface vdev (needs root?), skipping\n");
+ return TEST_SKIPPED;
+ }
+
+ ret = rte_eth_dev_get_port_by_name("net_pcap_imissed", &port_id);
+ TEST_ASSERT(ret == 0, "Failed to get port ID");
+ TEST_ASSERT(setup_pcap_port(port_id) == 0,
+ "Failed to setup port");
+
+ /* Reset stats and verify imissed starts at zero */
+ ret = rte_eth_stats_reset(port_id);
+ TEST_ASSERT(ret == 0, "Failed to reset stats");
+
+ ret = rte_eth_stats_get(port_id, &stats);
+ TEST_ASSERT(ret == 0, "Failed to get stats");
+
+ TEST_ASSERT_EQUAL(stats.imissed, 0U,
+ "imissed should be 0 after reset, got %"PRIu64,
+ stats.imissed);
+
+ /* Do some RX bursts to exercise the pcap_stats path */
+ {
+ struct rte_mbuf *mbufs[MAX_PKT_BURST];
+ int attempts;
+
+ for (attempts = 0; attempts < 5; attempts++) {
+ uint16_t nb_rx = rte_eth_rx_burst(port_id, 0,
+ mbufs, MAX_PKT_BURST);
+ if (nb_rx > 0)
+ rte_pktmbuf_free_bulk(mbufs, nb_rx);
+ }
+ }
+
+ /* Query stats again - imissed should still be queryable */
+ ret = rte_eth_stats_get(port_id, &stats);
+ TEST_ASSERT(ret == 0, "Failed to get stats after RX");
+ printf(" imissed=%"PRIu64" (kernel drops via pcap_stats)\n",
+ stats.imissed);
+
+ cleanup_pcap_vdev("net_pcap_imissed", port_id);
+
+ printf("imissed stat PASSED\n");
+ return TEST_SUCCESS;
+}
+
+/*
+ * Test suite teardown
+ */
+static void
+test_teardown(void)
+{
+ /* Cleanup shared temp files */
+ remove_temp_file(tx_pcap_path);
+ remove_temp_file(vlan_rx_pcap_path);
+
+ rte_mempool_free(mp);
+ mp = NULL;
+}
+
+static struct unit_test_suite test_pmd_pcap_suite = {
+ .setup = test_setup,
+ .teardown = test_teardown,
+ .suite_name = "PCAP PMD Unit Test Suite",
+ .unit_test_cases = {
+ TEST_CASE(test_dev_info),
+ TEST_CASE(test_tx_to_file),
+ TEST_CASE(test_rx_from_file),
+ TEST_CASE(test_tx_varied_sizes),
+ TEST_CASE(test_rx_varied_sizes),
+ TEST_CASE(test_jumbo_rx),
+ TEST_CASE(test_jumbo_tx),
+ TEST_CASE(test_infinite_rx),
+ TEST_CASE(test_tx_drop),
+ TEST_CASE(test_stats),
+ TEST_CASE(test_iface),
+ TEST_CASE(test_link_status),
+ TEST_CASE(test_lsc_iface),
+ TEST_CASE(test_eof_rx),
+ TEST_CASE(test_rx_timestamp),
+ TEST_CASE(test_multi_tx_queue),
+ TEST_CASE(test_multi_rx_queue_same_file),
+ TEST_CASE(test_vlan_strip_rx),
+ TEST_CASE(test_vlan_insert_tx),
+ TEST_CASE(test_vlan_no_strip_rx),
+ TEST_CASE(test_vlan_offload_set),
+ TEST_CASE(test_vlan_strip_infinite_rx),
+ TEST_CASE(test_timestamp_infinite_rx),
+ TEST_CASE(test_snaplen),
+ TEST_CASE(test_snaplen_truncation),
+ TEST_CASE(test_snaplen_truncation_multiseg),
+ TEST_CASE(test_scatter_drop_oversized),
+ TEST_CASE(test_scatter_jumbo_rx),
+ TEST_CASE(test_rx_tx_iface),
+ TEST_CASE(test_rx_iface_in),
+ TEST_CASE(test_queue_start_stop),
+ TEST_CASE(test_imissed_stat),
+
+ TEST_CASES_END()
+ }
+};
+
+static int
+test_pmd_pcap(void)
+{
+ return unit_test_suite_runner(&test_pmd_pcap_suite);
+}
+
+REGISTER_FAST_TEST(pcap_pmd_autotest, NOHUGE_OK, ASAN_OK, test_pmd_pcap);
diff --git a/doc/guides/rel_notes/release_26_03.rst b/doc/guides/rel_notes/release_26_03.rst
index 6752cf599a..8199149792 100644
--- a/doc/guides/rel_notes/release_26_03.rst
+++ b/doc/guides/rel_notes/release_26_03.rst
@@ -145,6 +145,7 @@ New Features
* Added support for Link State interrupt in ``iface`` mode.
* Added ``eof`` devarg to use link state to signal end of receive
file input.
+ * Added unit test suite.
Removed Items
--
2.53.0
^ permalink raw reply related
* Re: [PATCH 2/2] eal: add meson options for hotplug MP message buffer sizes
From: Stephen Hemminger @ 2026-03-25 2:44 UTC (permalink / raw)
To: Long Li; +Cc: dev, bruce.richardson
In-Reply-To: <20260325014506.1866374-2-longli@microsoft.com>
On Tue, 24 Mar 2026 18:45:06 -0700
Long Li <longli@microsoft.com> wrote:
> Add meson build options to allow increasing the multi-process hotplug
> message buffer limits at build time for deployments with many NICs:
> - 'dev_mp_devargs_max_len' (default 128): max device args length
> - 'mp_max_param_len' (default 256): max MP IPC message param length
>
> Example: meson setup build -Ddev_mp_devargs_max_len=256 -Dmp_max_param_len=512
>
> Guard the existing #defines with #ifndef so the meson-generated values
> from rte_build_config.h take precedence when overridden.
>
> Add a static_assert to ensure eal_dev_mp_req fits within the MP message
> param buffer, catching misconfiguration at compile time.
>
> Note: all primary and secondary processes must be built with the same
> values, as these sizes affect shared IPC message struct layouts.
>
> Signed-off-by: Long Li <longli@microsoft.com>
The whole mp API needs some work on sizing.
Ideally the message would be variable size and not include all the
file descriptors if not needed.
Even better it should be TLV encoded instead of fixed structure.
But doing this probably has to wait until 26.11.
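As a rough illustration of the TLV idea, here is a minimal sketch in plain C. The type codes, the 1-byte type / 2-byte length layout, and the helper names tlv_put() and tlv_get() are all made up for this example; none of them are part of the EAL mp API.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical type codes, for illustration only. */
enum { TLV_DEVARGS = 1, TLV_BUS_NAME = 2 };

/*
 * Append one type-length-value record to buf.
 * Layout: 1 byte type, 2 bytes length (big endian), then the value.
 * Returns the new offset, or 0 if the record does not fit.
 */
static size_t
tlv_put(uint8_t *buf, size_t cap, size_t off, uint8_t type,
	const void *val, uint16_t len)
{
	if (off > cap || cap - off < 3u + len)
		return 0;
	buf[off] = type;
	buf[off + 1] = (uint8_t)(len >> 8);
	buf[off + 2] = (uint8_t)(len & 0xff);
	memcpy(buf + off + 3, val, len);
	return off + 3 + len;
}

/*
 * Find the first record of the given type in the first 'used' bytes.
 * Returns a pointer to the value and stores its length, or NULL if
 * the type is absent or a record is truncated.
 */
static const uint8_t *
tlv_get(const uint8_t *buf, size_t used, uint8_t type, uint16_t *len)
{
	size_t off = 0;

	while (used - off >= 3) {
		uint16_t l = (uint16_t)((buf[off + 1] << 8) | buf[off + 2]);

		if (used - off - 3 < l)
			return NULL;	/* truncated record */
		if (buf[off] == type) {
			*len = l;
			return buf + off + 3;
		}
		off += 3 + (size_t)l;
	}
	return NULL;
}
```

With an encoding along these lines, a message carries only the fields it needs, and a receiver can skip record types it does not recognise.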
^ permalink raw reply
* [PATCH] common/cnxk: allow typecasting to CN20K NPA structuress
From: Nawal Kishor @ 2026-03-25 5:17 UTC (permalink / raw)
To: dev, Nithin Dabilpuram, Kiran Kumar K, Sunil Kumar Kori,
Satha Rao, Harman Kalra
Cc: jerinj, asekhar, Nawal Kishor
Add __attribute__((may_alias)) to the CN20K-specific NPA structures
(npa_cn20k_aura_s, npa_cn20k_pool_s, and npa_cn20k_halo_s) to allow
safe type punning when casting between these structures and their
base types (npa_aura_s and npa_pool_s).
This attribute tells the compiler that these structures may alias
with other types, which is necessary when casting pointers between
compatible hardware register structures that share the same memory
layout. Without this attribute, such casts violate strict aliasing
rules and can lead to incorrect compiler optimizations.
Signed-off-by: Nawal Kishor <nkishor@marvell.com>
---
drivers/common/cnxk/hw/npa.h | 8 +++++---
drivers/common/cnxk/roc_platform.h | 1 +
2 files changed, 6 insertions(+), 3 deletions(-)
diff --git a/drivers/common/cnxk/hw/npa.h b/drivers/common/cnxk/hw/npa.h
index 8d6b6bbe8b..aea24b69da 100644
--- a/drivers/common/cnxk/hw/npa.h
+++ b/drivers/common/cnxk/hw/npa.h
@@ -5,6 +5,8 @@
#ifndef __NPA_HW_H__
#define __NPA_HW_H__
+#include "roc_platform.h"
+
/* Register offsets */
#define NPA_AF_BLK_RST (0x0ull)
@@ -389,7 +391,7 @@ struct npa_cn20k_aura_s {
uint64_t stream_ctx : 1;
uint64_t unified_ctx : 1;
uint64_t rsvd_511_448 : 64; /* W7 */
-};
+} __plt_may_alias;
/* NPA pool context structure [CN20K] */
struct npa_cn20k_pool_s {
@@ -465,7 +467,7 @@ struct npa_cn20k_pool_s {
uint64_t rsvd_895_832 : 64; /* W13 */
uint64_t rsvd_959_896 : 64; /* W14 */
uint64_t rsvd_1023_960 : 64; /* W15 */
-};
+} __plt_may_alias;
/* NPA halo context structure [CN20K] */
struct npa_cn20k_halo_s {
@@ -545,7 +547,7 @@ struct npa_cn20k_halo_s {
uint64_t reserved_895_832 : 64; /* W13 */
uint64_t reserved_959_896 : 64; /* W14 */
uint64_t reserved_1023_960 : 64; /* W15 */
-};
+} __plt_may_alias;
/* NPA queue interrupt context hardware structure */
struct npa_qint_hw_s {
diff --git a/drivers/common/cnxk/roc_platform.h b/drivers/common/cnxk/roc_platform.h
index e22a50d47a..73cc12e567 100644
--- a/drivers/common/cnxk/roc_platform.h
+++ b/drivers/common/cnxk/roc_platform.h
@@ -100,6 +100,7 @@
#define __plt_packed_begin __rte_packed_begin
#define __plt_packed_end __rte_packed_end
#define __plt_unused __rte_unused
+#define __plt_may_alias __rte_may_alias
#define __roc_api __rte_internal
#define plt_iova_t rte_iova_t
--
2.48.1
^ permalink raw reply related
* [PATCH v2] common/cnxk: allow typecasting to CN20K NPA structures
From: Nawal Kishor @ 2026-03-25 6:21 UTC (permalink / raw)
To: dev, Nithin Dabilpuram, Kiran Kumar K, Sunil Kumar Kori,
Satha Rao, Harman Kalra
Cc: jerinj, asekhar, Nawal Kishor
In-Reply-To: <20260325051756.2588380-1-nkishor@marvell.com>
Add __attribute__((may_alias)) to the CN20K-specific NPA structures
(npa_cn20k_aura_s, npa_cn20k_pool_s, and npa_cn20k_halo_s) to allow
safe type punning when casting between these structures and their
base types (npa_aura_s and npa_pool_s).
This attribute tells the compiler that these structures may alias
with other types, which is necessary when casting pointers between
compatible hardware register structures that share the same memory
layout. Without this attribute, such casts violate strict aliasing
rules and can lead to incorrect compiler optimizations.
Signed-off-by: Nawal Kishor <nkishor@marvell.com>
---
v2:
* Fixed commit message typo
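
For reference, a minimal standalone sketch of what the attribute permits. The struct and function names below are invented stand-ins, not the real NPA contexts, and the example assumes a GCC- or Clang-compatible compiler.

```c
#include <stdint.h>

/* Stand-in for a base context layout (not the real npa_aura_s). */
struct base_ctx {
	uint64_t w0;
	uint64_t w1;
};

/*
 * Stand-in for a chip-specific view of the same memory.
 * may_alias (GCC/Clang) exempts accesses through this type from
 * type-based alias analysis, much like accesses through char.
 */
struct chip_ctx {
	uint64_t w0;
	uint64_t w1;
} __attribute__((may_alias));

/* Write through the chip-specific view of a base context. */
static void
chip_ctx_set_w1(struct base_ctx *base, uint64_t val)
{
	struct chip_ctx *chip = (struct chip_ctx *)base;

	chip->w1 = val;
}
```
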
drivers/common/cnxk/hw/npa.h | 8 +++++---
drivers/common/cnxk/roc_platform.h | 1 +
2 files changed, 6 insertions(+), 3 deletions(-)
diff --git a/drivers/common/cnxk/hw/npa.h b/drivers/common/cnxk/hw/npa.h
index 8d6b6bbe8b..aea24b69da 100644
--- a/drivers/common/cnxk/hw/npa.h
+++ b/drivers/common/cnxk/hw/npa.h
@@ -5,6 +5,8 @@
#ifndef __NPA_HW_H__
#define __NPA_HW_H__
+#include "roc_platform.h"
+
/* Register offsets */
#define NPA_AF_BLK_RST (0x0ull)
@@ -389,7 +391,7 @@ struct npa_cn20k_aura_s {
uint64_t stream_ctx : 1;
uint64_t unified_ctx : 1;
uint64_t rsvd_511_448 : 64; /* W7 */
-};
+} __plt_may_alias;
/* NPA pool context structure [CN20K] */
struct npa_cn20k_pool_s {
@@ -465,7 +467,7 @@ struct npa_cn20k_pool_s {
uint64_t rsvd_895_832 : 64; /* W13 */
uint64_t rsvd_959_896 : 64; /* W14 */
uint64_t rsvd_1023_960 : 64; /* W15 */
-};
+} __plt_may_alias;
/* NPA halo context structure [CN20K] */
struct npa_cn20k_halo_s {
@@ -545,7 +547,7 @@ struct npa_cn20k_halo_s {
uint64_t reserved_895_832 : 64; /* W13 */
uint64_t reserved_959_896 : 64; /* W14 */
uint64_t reserved_1023_960 : 64; /* W15 */
-};
+} __plt_may_alias;
/* NPA queue interrupt context hardware structure */
struct npa_qint_hw_s {
diff --git a/drivers/common/cnxk/roc_platform.h b/drivers/common/cnxk/roc_platform.h
index e22a50d47a..73cc12e567 100644
--- a/drivers/common/cnxk/roc_platform.h
+++ b/drivers/common/cnxk/roc_platform.h
@@ -100,6 +100,7 @@
#define __plt_packed_begin __rte_packed_begin
#define __plt_packed_end __rte_packed_end
#define __plt_unused __rte_unused
+#define __plt_may_alias __rte_may_alias
#define __roc_api __rte_internal
#define plt_iova_t rte_iova_t
--
2.48.1
^ permalink raw reply related
* RE: [PATCH v20 25/25] app/pdump: preserve VLAN tags in captured packets
From: Morten Brørup @ 2026-03-25 7:41 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: Bruce Richardson, dev, Reshma Pattan
In-Reply-To: <20260324101209.04ffae54@phoenix.local>
> From: Stephen Hemminger [mailto:stephen@networkplumber.org]
> Sent: Tuesday, 24 March 2026 18.12
>
> On Mon, 16 Mar 2026 16:55:29 +0100
> Morten Brørup <mb@smartsharesystems.com> wrote:
>
> > >
> > > This is an example of something I previously flagged. Like with real
> > > hardware, I think the PMD should be inserting the VLAN tag into the
> > > packet as part of the Tx function, not the prepare function.
> >
> > Agree with Bruce on this.
> > For simple stuff like VLAN offload, applications should not be
> > required to call tx_prep first.
> >
> > However, the Tx function is supposed to not modify the packets;
> > relevant when refcnt > 1.
> >
> > Instead of modifying the packet data to insert/strip the VLAN tag,
> > perhaps the driver can split the write/read operation into multiple
> > write/read operations:
> > 1. the Ethernet header
> > 2. the VLAN tag
> > 3. the remaining packet data
> >
> > I haven't really followed the pcap driver, so maybe my suggestion
> > doesn't make sense.
>
> The prepare code and VLAN was copied from virtio.
> I assume virtio is widely used already.
OK, that makes it harder to object to.
<rant>
I checked out the virtio code.
virtio_xmit_pkts_prepare() calls rte_vlan_insert().
And rte_vlan_insert() doesn't support mbuf refcnt > 1.
So basically, these drivers don't support simple VLAN tagging when mbuf refcnt > 1.
This seems like a major limitation, which should be prominently documented.
A decade ago, when we started using DPDK for our projects, one of the things I loved about it was its documentation.
But after a while, I noticed that a lot of the documented mbuf library features weren't fully implemented.
These limitations should be documented.
E.g. if an important driver like virtio doesn't support VLAN tagging of indirect/cloned mbufs, this should be prominently highlighted in documentation!
</rant>
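A rough sketch of the split-write idea quoted above, in plain C. It describes a tagged frame as three POSIX struct iovec segments (as one would pass to writev()) and never modifies the original frame bytes, so it would not matter whether the buffer is shared. The helper name vlan_frame_to_iov() is hypothetical, and whether the pcap PMD could actually use a gather write is an assumption, not something established in this thread. Note that the 802.1Q tag sits after the two MAC addresses and before the original EtherType, so the first segment is 12 bytes rather than the full 14-byte header.

```c
#include <stdint.h>
#include <stddef.h>
#include <sys/uio.h>

#define ETH_ADDRS_LEN 12	/* destination + source MAC */
#define VLAN_TAG_LEN 4		/* TPID + TCI */

/*
 * Describe a VLAN-tagged frame as three segments without touching
 * the original frame bytes:
 *   1. the two MAC addresses
 *   2. the 802.1Q tag, built in caller-provided scratch space
 *   3. the rest of the frame, starting at the original EtherType
 * Returns the number of iovec entries filled, or -1 if the frame
 * is too short to hold an Ethernet header.
 */
static int
vlan_frame_to_iov(const uint8_t *frame, size_t len, uint16_t tci,
		  uint8_t tag[VLAN_TAG_LEN], struct iovec iov[3])
{
	if (len < ETH_ADDRS_LEN + 2)
		return -1;

	tag[0] = 0x81;	/* TPID 0x8100, network byte order */
	tag[1] = 0x00;
	tag[2] = (uint8_t)(tci >> 8);
	tag[3] = (uint8_t)(tci & 0xff);

	/* iov_base is not const-qualified, so the casts drop const. */
	iov[0].iov_base = (void *)(uintptr_t)frame;
	iov[0].iov_len = ETH_ADDRS_LEN;
	iov[1].iov_base = tag;
	iov[1].iov_len = VLAN_TAG_LEN;
	iov[2].iov_base = (void *)(uintptr_t)(frame + ETH_ADDRS_LEN);
	iov[2].iov_len = len - ETH_ADDRS_LEN;
	return 3;
}
```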
^ permalink raw reply
* Re: Request for Review of Fixes Applied for DPDK 24.11 and 25.11 Compilation Errors
From: Thomas Monjalon @ 2026-03-25 8:17 UTC (permalink / raw)
To: Reema Sharma
Cc: dev@dpdk.org, gakhil@marvell.com, david.marchand@redhat.com,
stephen@networkplumber.org, anatoly.burakov@intel.com,
jack.bond-preston@foss.arm.com, gakhil@marvell.com,
fanzhang.oss@gmail.com, kai.ji@intel.com, Prakash Durgapal,
Pratap Rana, Vijay Kumar Mahto, Jaydipkumar Dhameliya, Liu1, Kai,
Zhang, Liheng, Xiong, Tanghong, Mattias Rönnblom
In-Reply-To: <DS0PR08MB9465E1596F77A744A34FECD68649A@DS0PR08MB9465.namprd08.prod.outlook.com>
Hello,
If I understand correctly, you hit two issues while compiling a C++ application linked with DPDK?
Could you share the exact version of your compiler?
For rte_bitops.h, if disabling these macros is OK for you, go with it for now.
We will need to understand what exactly happens with the __RTE_BIT_OVERLOAD macros.
For rte_cryptodev.h, it may be hiding an issue somewhere else.
Could you please share the exact error message?
25/03/2026 07:55, Reema Sharma:
> Hi Team,
>
> Could you please review the attached DPDK patch (dpdk-24.11_patch_for_crypto.patch) and confirm whether the applied fixes are acceptable from the DPDK perspective?
> Your guidance on the correct fix, if any changes are needed, would help us proceed with CU compilation.
> Kindly share an update at your earliest convenience.
> Thanks & Regards,
> Reema Sharma
> ________________________________
> From: Xiong, Tanghong <tanghong.xiong@intel.com>
> Sent: Friday, March 6, 2026 3:26 PM
> To: Zhang, Liheng <liheng.zhang@intel.com>; Reema Sharma <Reema.Sharma@radisys.com>; Liu1, Kai <kai.liu1@intel.com>
> Cc: Prakash Durgapal <Prakash.Durgapal@radisys.com>; Pratap Rana <Pratap.Rana@radisys.com>; Vijay Kumar Mahto <Vijay.Mahto@radisys.com>; Jaydipkumar Dhameliya <Jaydipkumar.Dhameliya@radisys.com>
> Subject: RE: Request for Review of Fixes Applied for DPDK 24.11 and 25.11 Compilation Errors
>
>
> Thanks Liheng,
>
> I just learned that you can contact DPDK directly at dev@dpdk.org, but the response may be slow.
>
>
>
> BRs,
>
> Tanghong
>
>
>
> From: Zhang, Liheng <liheng.zhang@intel.com>
> Sent: Friday, March 6, 2026 5:51 PM
> To: Reema Sharma <Reema.Sharma@radisys.com>; Xiong, Tanghong <tanghong.xiong@intel.com>; Liu1, Kai <kai.liu1@intel.com>
> Cc: Durgapal, Prakash <prakash.durgapal@radisys.com>; Pratap Rana <Pratap.Rana@radisys.com>; Vijay Kumar Mahto <Vijay.Mahto@radisys.com>; Jaydipkumar Dhameliya <Jaydipkumar.Dhameliya@radisys.com>
> Subject: RE: Request for Review of Fixes Applied for DPDK 24.11 and 25.11 Compilation Errors
>
>
>
> Yes, there is a “MAINTAINERS” file in the DPDK root directory.
>
> It lists the contacts for all DPDK libraries.
>
> You can search it and contact the person responsible for crypto issues.
>
> If you can’t find it, please let me know and I can forward it to you.
>
>
>
> From: Reema Sharma <Reema.Sharma@radisys.com>
> Sent: Friday, March 6, 2026 5:45 PM
> To: Zhang, Liheng <liheng.zhang@intel.com>; Xiong, Tanghong <tanghong.xiong@intel.com>; Liu1, Kai <kai.liu1@intel.com>
> Cc: Durgapal, Prakash <prakash.durgapal@radisys.com>; Pratap Rana <Pratap.Rana@radisys.com>; Vijay Kumar Mahto <Vijay.Mahto@radisys.com>; Jaydipkumar Dhameliya <Jaydipkumar.Dhameliya@radisys.com>
> Subject: Re: Request for Review of Fixes Applied for DPDK 24.11 and 25.11 Compilation Errors
>
>
>
> Hi Zhang, Liheng,
>
> We have verified the behaviour on the CU side and confirmed that this is not caused by our code. The issue appears to be related to a DPDK bug.
>
> At the moment, we do not have any contact information for the DPDK maintainers.
> Could you please check internally and share the relevant maintainer details or forward this issue on behalf of Radisys?
>
> Thanks & Regards,
>
> Reema Sharma
>
> ________________________________
>
> From: Zhang, Liheng <liheng.zhang@intel.com>
> Sent: Friday, March 6, 2026 2:49 PM
> To: Xiong, Tanghong <tanghong.xiong@intel.com>; Reema Sharma <Reema.Sharma@radisys.com>; Liu1, Kai <kai.liu1@intel.com>
> Cc: Prakash Durgapal <Prakash.Durgapal@radisys.com>; Pratap Rana <Pratap.Rana@radisys.com>; Vijay Kumar Mahto <Vijay.Mahto@radisys.com>; Jaydipkumar Dhameliya <Jaydipkumar.Dhameliya@radisys.com>
> Subject: RE: Request for Review of Fixes Applied for DPDK 24.11 and 25.11 Compilation Errors
>
>
>
> Hi Reema
>
> The changes look OK, but you need to verify them by compiling your code.
>
> As far as I know, the recommended procedure for a DPDK bug fix is:
>
> 1. First confirm that the errors are not related to your own code before changing DPDK code.
> 2. If it is a real DPDK bug, contact the relevant DPDK maintainer to fix it.
> 3. The DPDK maintainer will then provide an official patch.
>
>
>
> From: Xiong, Tanghong <tanghong.xiong@intel.com>
> Sent: Friday, March 6, 2026 3:52 PM
> To: Reema Sharma <Reema.Sharma@radisys.com>; Liu1, Kai <kai.liu1@intel.com>; Zhang, Liheng <liheng.zhang@intel.com>
> Cc: Durgapal, Prakash <prakash.durgapal@radisys.com>; Pratap Rana <Pratap.Rana@radisys.com>; Vijay Kumar Mahto <Vijay.Mahto@radisys.com>; Jaydipkumar Dhameliya <Jaydipkumar.Dhameliya@radisys.com>
> Subject: RE: Request for Review of Fixes Applied for DPDK 24.11 and 25.11 Compilation Errors
>
>
>
> Hello Reema,
>
>
>
> Sorry for the late reply. Copying @Zhang, Liheng here for comment; thanks in advance, Liheng.
>
>
>
> BRs,
>
> Tanghong
>
>
>
> From: Reema Sharma <Reema.Sharma@radisys.com>
> Sent: Friday, March 6, 2026 1:41 PM
> To: Xiong, Tanghong <tanghong.xiong@intel.com>; Liu1, Kai <kai.liu1@intel.com>
> Cc: Durgapal, Prakash <prakash.durgapal@radisys.com>; Pratap Rana <Pratap.Rana@radisys.com>; Vijay Kumar Mahto <Vijay.Mahto@radisys.com>; Jaydipkumar Dhameliya <Jaydipkumar.Dhameliya@radisys.com>
> Subject: Re: Request for Review of Fixes Applied for DPDK 24.11 and 25.11 Compilation Errors
>
>
>
> Hi Xiong, Tanghong / Liu1, Kai,
>
> Hope you are doing well.
>
> I am writing to check if there is any update on the pending DPDK issue raised in this mail.
> Could you please share the latest status, or let me know if any additional inputs are required from my side to help move this forward?
>
> Looking forward to your response.
>
> Thanks & Regards,
>
> Reema Sharma
>
> ________________________________
>
> From: Reema Sharma
> Sent: Wednesday, February 11, 2026 11:27 AM
> To: Xiong, Tanghong <tanghong.xiong@intel.com>; Liu1, Kai <kai.liu1@intel.com>
> Cc: Prakash Durgapal <Prakash.Durgapal@radisys.com>; Pratap Rana <Pratap.Rana@radisys.com>; Vijay Kumar Mahto <Vijay.Mahto@radisys.com>; Jaydipkumar Dhameliya <Jaydipkumar.Dhameliya@radisys.com>
> Subject: Request for Review of Fixes Applied for DPDK 24.11 and 25.11 Compilation Errors
>
>
>
> Hi Xiong, Tanghong / Liu1, Kai,
>
> While compiling the CU with DPDK 24.11 and DPDK 25.11, we encountered the following two errors:
>
> 1. Conflicting type definitions in the DPDK header file rte_bitops.h
>
> [inline image: image001.png]
>
> 2. “template with C linkage” error in the DPDK header file rte_cryptodev.h
>
> To proceed with our CU compilation, we applied the required fixes in the respective DPDK include files and generated a patch named dpdk-24.11_patch_for_crypto.patch. The updated changes are included in the attached patch file for your review.
>
> Could you please review these changes from the DPDK side and confirm whether they are acceptable, or advise on the correct fix if modifications are required?
>
> Thanks & Regards,
> Reema Sharma
>
^ permalink raw reply
* [PATCH 26.07 1/5] common/mlx5: query vport VHCA ID
From: Dariusz Sosnowski @ 2026-03-25 9:07 UTC (permalink / raw)
To: Viacheslav Ovsiienko, Bing Zhao, Ori Kam, Suanming Mou,
Matan Azrad; +Cc: dev
In-Reply-To: <20260325090758.42403-1-dsosnowski@nvidia.com>
Extend port info returned by mlx5_glue_devx_port_query()
with VHCA ID of the device related to the IB port.
This ID will be later used to implement source vport matching
without E-Switch vport metadata enabled.
Signed-off-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
---
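
For readers skimming the series, a small standalone sketch of how a caller might consume the new field. The struct and flag values are local copies of the definitions in the diff below; the helper port_info_vhca_id() is hypothetical and does not exist in the mlx5 code.

```c
#include <stdint.h>
#include <stdbool.h>

/* Local copies of the definitions in the diff below. */
#define MLX5_PORT_QUERY_VPORT (1u << 0)
#define MLX5_PORT_QUERY_VPORT_VHCA_ID (1u << 3)

struct mlx5_port_info {
	uint16_t query_flags;
	uint16_t vport_id;
	uint16_t esw_owner_vhca_id;
	uint16_t vport_vhca_id;
	uint32_t vport_meta_tag;
	uint32_t vport_meta_mask;
};

/*
 * Hypothetical helper: fetch the vport VHCA ID only if the query
 * reported it. Returns false when the field was not filled in.
 */
static bool
port_info_vhca_id(const struct mlx5_port_info *info, uint16_t *vhca_id)
{
	if (!(info->query_flags & MLX5_PORT_QUERY_VPORT_VHCA_ID))
		return false;
	*vhca_id = info->vport_vhca_id;
	return true;
}
```
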
drivers/common/mlx5/linux/mlx5_glue.c | 4 ++++
drivers/common/mlx5/linux/mlx5_glue.h | 2 ++
2 files changed, 6 insertions(+)
diff --git a/drivers/common/mlx5/linux/mlx5_glue.c b/drivers/common/mlx5/linux/mlx5_glue.c
index a91eaa429d..56eaedf0a2 100644
--- a/drivers/common/mlx5/linux/mlx5_glue.c
+++ b/drivers/common/mlx5/linux/mlx5_glue.c
@@ -1263,6 +1263,10 @@ mlx5_glue_devx_port_query(struct ibv_context *ctx,
info->vport_id = devx_port.vport;
info->query_flags |= MLX5_PORT_QUERY_VPORT;
}
+ if (devx_port.flags & MLX5DV_QUERY_PORT_VPORT_VHCA_ID) {
+ info->vport_vhca_id = devx_port.vport_vhca_id;
+ info->query_flags |= MLX5_PORT_QUERY_VPORT_VHCA_ID;
+ }
if (devx_port.flags & MLX5DV_QUERY_PORT_ESW_OWNER_VHCA_ID) {
info->esw_owner_vhca_id = devx_port.esw_owner_vhca_id;
info->query_flags |= MLX5_PORT_QUERY_ESW_OWNER_VHCA_ID;
diff --git a/drivers/common/mlx5/linux/mlx5_glue.h b/drivers/common/mlx5/linux/mlx5_glue.h
index 81d6b0aaf9..0610e7778e 100644
--- a/drivers/common/mlx5/linux/mlx5_glue.h
+++ b/drivers/common/mlx5/linux/mlx5_glue.h
@@ -92,11 +92,13 @@ struct mlx5dv_port;
#define MLX5_PORT_QUERY_VPORT (1u << 0)
#define MLX5_PORT_QUERY_REG_C0 (1u << 1)
#define MLX5_PORT_QUERY_ESW_OWNER_VHCA_ID (1u << 2)
+#define MLX5_PORT_QUERY_VPORT_VHCA_ID (1u << 3)
struct mlx5_port_info {
uint16_t query_flags;
uint16_t vport_id; /* Associated VF vport index (if any). */
uint16_t esw_owner_vhca_id; /* Associated the esw_owner that this VF belongs to. */
+ uint16_t vport_vhca_id; /* VHCA ID of the function associated with the vport. */
uint32_t vport_meta_tag; /* Used for vport index match ove VF LAG. */
uint32_t vport_meta_mask; /* Used for vport index field match mask. */
};
--
2.47.3
^ permalink raw reply related
* [PATCH 26.07 0/5] net/mlx5: legacy vport match support with HWS
From: Dariusz Sosnowski @ 2026-03-25 9:07 UTC (permalink / raw)
To: Viacheslav Ovsiienko, Bing Zhao, Ori Kam, Suanming Mou,
Matan Azrad; +Cc: dev
NVIDIA NICs and DPUs have an embedded switch (E-Switch)
to which all associated functions (PFs, VFs, SFs) and physical ports
are connected through virtual ports (vports).
mlx5 PMD supports devices with E-Switch enabled and
also exposes flow rule offloading to the E-Switch
through transfer flow rules.
Flow rules matching on vports (with flow item REPRESENTED_PORT)
can match on a specific vport using one of two internal mechanisms:
- vport metadata - E-Switch internally tags all packets with a metadata value
associated with originating vport.
mlx5 PMD can match on that metadata through a specialized HW register in each flow rule.
vport metadata enables features such as VF-LAG, Multiport E-Switch and
others mentioned in [1].
- legacy match - A static value known as "source vport" is assigned to each vport.
These values are not globally unique, because they are statically assigned per E-Switch
(e.g., VF0 on each of the PFs will have the same source_vport number).
Users can select the vport matching mode through devlink [1].
vport metadata matching mode is the default and is sufficient for most use cases.
However, internally tagging all packets with metadata values
increases packet latency in the E-Switch. As described in the linked kernel docs,
disabling vport metadata matching can increase packet rate by up to 20%.
If the features provided by vport metadata matching are not required,
it can be disabled to increase the E-Switch's throughput.
mlx5 PMD with HW Steering flow engine enabled, only supported
vport metadata mode when running on a device with enabled E-Switch.
Goal of this patchset is to enable support of devices with
disabled vport metadata matching. This is purely an internal change in mlx5 PMD.
No changes in DPDK applications should be required.
- Patches 1-2 - Extend information queried from the device by mlx5 PMD
to include information necessary to implement legacy vport match.
This data is always available, regardless of selected E-Switch matching mode.
- Patch 3 - Adjusts internal translation of DPDK port ID to flow rule matching data,
so that returned data is always valid even if vport metadata matching is disabled.
Also adds detection of vport matching mode to HWS layer.
- Patch 4 - Adds support for REPRESENTED_PORT item in HWS layer
whenever vport metadata matching mode is disabled.
This involves correctly translating DPDK port index to "source vport".
- Patch 5 - Removes all validation checks from mlx5 PMD which prevented probing
on devices with disabled vport metadata matching mode.
Adjusts internal PMD logic to work with "source vport" whenever needed.
[1]: https://docs.kernel.org/networking/devlink/mlx5.html
Dariusz Sosnowski (5):
common/mlx5: query vport VHCA ID
net/mlx5: store port VHCA ID
net/mlx5: return port info regardless of register mask
net/mlx5/hws: add source vport match in HWS
net/mlx5: allow legacy source vport match
drivers/common/mlx5/linux/mlx5_glue.c | 4 ++
drivers/common/mlx5/linux/mlx5_glue.h | 2 +
drivers/net/mlx5/hws/mlx5dr_cmd.c | 13 +++-
drivers/net/mlx5/hws/mlx5dr_cmd.h | 1 +
drivers/net/mlx5/hws/mlx5dr_definer.c | 99 ++++++++++++++++++++++-----
drivers/net/mlx5/hws/mlx5dr_definer.h | 2 +
drivers/net/mlx5/hws/mlx5dr_table.c | 6 ++
drivers/net/mlx5/linux/mlx5_os.c | 38 +++-------
drivers/net/mlx5/mlx5.h | 4 ++
drivers/net/mlx5/mlx5_flow.h | 3 +-
drivers/net/mlx5/mlx5_flow_dv.c | 3 +
drivers/net/mlx5/mlx5_flow_hw.c | 34 ++++-----
12 files changed, 140 insertions(+), 69 deletions(-)
--
2.47.3
* [PATCH 26.07 2/5] net/mlx5: store port VHCA ID
From: Dariusz Sosnowski @ 2026-03-25 9:07 UTC (permalink / raw)
To: Viacheslav Ovsiienko, Bing Zhao, Ori Kam, Suanming Mou,
Matan Azrad; +Cc: dev
In-Reply-To: <20260325090758.42403-1-dsosnowski@nvidia.com>
Save VHCA ID (if available) of each mlx5 port and
of E-Switch Manager, and update port info cache
(mlx5_flow_hw_port_infos array) used for translating
DPDK port ID into HW matching value and mask,
when matching on source vport.
This data will be later used to implement source vport matching
without E-Switch vport metadata enabled.
Signed-off-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
---
drivers/net/mlx5/linux/mlx5_os.c | 3 +++
drivers/net/mlx5/mlx5.h | 2 ++
drivers/net/mlx5/mlx5_flow_hw.c | 5 ++---
3 files changed, 7 insertions(+), 3 deletions(-)
diff --git a/drivers/net/mlx5/linux/mlx5_os.c b/drivers/net/mlx5/linux/mlx5_os.c
index a717191002..3a9b019601 100644
--- a/drivers/net/mlx5/linux/mlx5_os.c
+++ b/drivers/net/mlx5/linux/mlx5_os.c
@@ -377,6 +377,7 @@ mlx5_os_capabilities_prepare(struct mlx5_dev_ctx_shared *sh)
hca_attr->scatter_fcs_w_decap_disable;
sh->dev_cap.rq_delay_drop_en = hca_attr->rq_delay_drop;
mlx5_rt_timestamp_config(sh, hca_attr);
+ sh->dev_cap.esw_info.vhca_id = hca_attr->vhca_id;
#ifdef HAVE_IBV_DEVICE_ATTR_ESW_MGR_REG_C0
if (dv_attr.comp_mask & MLX5DV_CONTEXT_MASK_REG_C0) {
sh->dev_cap.esw_info.regc_value = dv_attr.reg_c0.value;
@@ -1524,6 +1525,8 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
goto error;
}
}
+ if (vport_info.query_flags & MLX5_PORT_QUERY_VPORT_VHCA_ID)
+ priv->vport_vhca_id = vport_info.vport_vhca_id;
if (vport_info.query_flags & MLX5_PORT_QUERY_VPORT) {
priv->vport_id = vport_info.vport_id;
} else if (spawn->pf_bond >= 0 && sh->esw_mode) {
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 4da184eb47..7ae9129e46 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -152,6 +152,7 @@ struct mlx5_flow_cb_ctx {
struct flow_hw_port_info {
uint32_t regc_mask;
uint32_t regc_value;
+ uint32_t vhca_id;
uint32_t is_wire:1;
uint32_t direction:2;
};
@@ -2003,6 +2004,7 @@ struct mlx5_priv {
uint32_t jump_fdb_rx_en:1; /* Jump from FDB Tx to FDB Rx flag per port. */
uint16_t domain_id; /* Switch domain identifier. */
uint16_t vport_id; /* Associated VF vport index (if any). */
+ uint16_t vport_vhca_id; /* VHCA ID of the associated vport (if any). */
uint32_t vport_meta_tag; /* Used for vport index match ove VF LAG. */
uint32_t vport_meta_mask; /* Used for vport index field match mask. */
uint16_t representor_id; /* UINT16_MAX if not a representor. */
diff --git a/drivers/net/mlx5/mlx5_flow_hw.c b/drivers/net/mlx5/mlx5_flow_hw.c
index bca5b2769e..adbd4f33b0 100644
--- a/drivers/net/mlx5/mlx5_flow_hw.c
+++ b/drivers/net/mlx5/mlx5_flow_hw.c
@@ -12305,6 +12305,7 @@ mlx5_flow_hw_set_port_info(struct rte_eth_dev *dev)
info = &mlx5_flow_hw_port_infos[port_id];
info->regc_mask = priv->vport_meta_mask;
info->regc_value = priv->vport_meta_tag;
+ info->vhca_id = priv->vport_vhca_id;
info->is_wire = mlx5_is_port_on_mpesw_device(priv) ? priv->mpesw_uplink : priv->master;
}
@@ -12317,9 +12318,7 @@ mlx5_flow_hw_clear_port_info(struct rte_eth_dev *dev)
MLX5_ASSERT(port_id < RTE_MAX_ETHPORTS);
info = &mlx5_flow_hw_port_infos[port_id];
- info->regc_mask = 0;
- info->regc_value = 0;
- info->is_wire = 0;
+ memset(info, 0, sizeof(*info));
}
static int
--
2.47.3
* [PATCH 26.07 3/5] net/mlx5: return port info regardless of register mask
From: Dariusz Sosnowski @ 2026-03-25 9:07 UTC (permalink / raw)
To: Viacheslav Ovsiienko, Bing Zhao, Ori Kam, Suanming Mou,
Matan Azrad; +Cc: dev
In-Reply-To: <20260325090758.42403-1-dsosnowski@nvidia.com>
Previously if the port info did not have REG_C mask set,
flow_hw_conv_port_id() returned NULL.
In order to allow for support of legacy source vport match,
this patch changes that logic.
Now flow port info will be returned for all ports,
regardless of REG_C mask value.
HWS layer, depending on REG_C mask value (zero or non-zero)
will decide which matching mode should be used for
REPRESENTED_PORT items.
If REG_C mask is available, metadata matching will be used.
Otherwise, legacy source vport match will be used.
Definer handling for legacy source vport match
will be added in the follow-up commit.
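The decision described above can be sketched in isolation as follows.
This is a minimal illustration, not the PMD code: struct port_info_sketch,
port_info_lookup() and select_match_mode() are hypothetical names, and
the explicit "unknown" state is an assumption added for the example
(the patch itself only stores a boolean capability).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for struct flow_hw_port_info. */
struct port_info_sketch {
	uint32_t regc_mask;  /* REG_C_0 mask; zero when vport metadata is disabled. */
	uint32_t regc_value; /* REG_C_0 value tagged on packets from this vport. */
	uint32_t is_set:1;   /* Entry was populated for this port. */
};

enum vport_match_mode {
	VPORT_MATCH_UNKNOWN,  /* Wire port info could not be found (assumption). */
	VPORT_MATCH_METADATA, /* Match on REG_C_0 metadata. */
	VPORT_MATCH_LEGACY,   /* Match on source vport. */
};

/* Lookup succeeds for every populated entry, regardless of regc_mask. */
static const struct port_info_sketch *
port_info_lookup(const struct port_info_sketch *table, size_t n, size_t port_id)
{
	if (port_id >= n)
		return NULL;
	return table[port_id].is_set ? &table[port_id] : NULL;
}

/* A non-zero REG_C mask on the wire port selects metadata matching. */
static enum vport_match_mode
select_match_mode(const struct port_info_sketch *wire)
{
	if (wire == NULL)
		return VPORT_MATCH_UNKNOWN;
	return wire->regc_mask != 0 ? VPORT_MATCH_METADATA : VPORT_MATCH_LEGACY;
}
```

The point of the sketch is that the lookup and the mode selection are
now two separate questions: a port with a zero mask is still a valid port.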
Signed-off-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
---
drivers/net/mlx5/hws/mlx5dr_cmd.c | 13 +++++++--
drivers/net/mlx5/hws/mlx5dr_cmd.h | 1 +
drivers/net/mlx5/hws/mlx5dr_definer.c | 38 ++++++++++++++++-----------
drivers/net/mlx5/linux/mlx5_os.c | 1 +
drivers/net/mlx5/mlx5.h | 1 +
drivers/net/mlx5/mlx5_flow.h | 3 ++-
drivers/net/mlx5/mlx5_flow_hw.c | 1 +
7 files changed, 40 insertions(+), 18 deletions(-)
diff --git a/drivers/net/mlx5/hws/mlx5dr_cmd.c b/drivers/net/mlx5/hws/mlx5dr_cmd.c
index 47e6a1fd49..668e409988 100644
--- a/drivers/net/mlx5/hws/mlx5dr_cmd.c
+++ b/drivers/net/mlx5/hws/mlx5dr_cmd.c
@@ -1363,10 +1363,19 @@ int mlx5dr_cmd_query_caps(struct ibv_context *ctx,
strlcpy(caps->fw_ver, attr_ex.orig_attr.fw_ver, sizeof(caps->fw_ver));
port_info = flow_hw_get_wire_port(ctx);
- if (port_info)
+ if (port_info) {
caps->wire_regc_mask = port_info->regc_mask;
- else
+ /*
+ * If REG_C_0 vport mask is available, then we can assume that vport metadata is
+ * enabled on the switchdev.
+ */
+ caps->vport_metadata_match = !!port_info->regc_mask;
+ DR_LOG(DEBUG, "ibdev %s vport metadata match is %sabled",
+ ctx->device->name,
+ caps->vport_metadata_match ? "en" : "dis");
+ } else {
DR_LOG(INFO, "Failed to query wire port regc value");
+ }
return ret;
}
diff --git a/drivers/net/mlx5/hws/mlx5dr_cmd.h b/drivers/net/mlx5/hws/mlx5dr_cmd.h
index eb9643c555..3ed7c6ecb7 100644
--- a/drivers/net/mlx5/hws/mlx5dr_cmd.h
+++ b/drivers/net/mlx5/hws/mlx5dr_cmd.h
@@ -207,6 +207,7 @@ struct mlx5dr_cmd_generate_wqe_attr {
struct mlx5dr_cmd_query_caps {
uint32_t wire_regc_mask;
+ bool vport_metadata_match;
uint32_t flex_protocols;
uint8_t wqe_based_update;
uint8_t rtc_reparse_mode;
diff --git a/drivers/net/mlx5/hws/mlx5dr_definer.c b/drivers/net/mlx5/hws/mlx5dr_definer.c
index 6a016de78e..3ba69c1001 100644
--- a/drivers/net/mlx5/hws/mlx5dr_definer.c
+++ b/drivers/net/mlx5/hws/mlx5dr_definer.c
@@ -777,12 +777,13 @@ mlx5dr_definer_vport_set(struct mlx5dr_definer_fc *fc,
const struct flow_hw_port_info *port_info = NULL;
uint32_t regc_value;
- if (v)
+ if (v) {
port_info = flow_hw_conv_port_id(fc->dr_ctx, v->port_id);
- if (unlikely(!port_info))
- regc_value = BAD_PORT;
- else
+ assert(port_info != NULL);
regc_value = port_info->regc_value >> fc->bit_off;
+ } else {
+ regc_value = BAD_PORT;
+ }
/* Bit offset is set to 0 to since regc value is 32bit */
DR_SET(tag, regc_value, fc->byte_off, fc->bit_off, fc->bit_mask);
@@ -1593,20 +1594,27 @@ mlx5dr_definer_conv_item_port(struct mlx5dr_definer_conv_data *cd,
struct mlx5dr_definer_fc *fc;
if (port_id) {
- if (!caps->wire_regc_mask) {
- DR_LOG(ERR, "Port ID item not supported, missing wire REGC mask");
+ if (caps->vport_metadata_match) {
+ if (!caps->wire_regc_mask) {
+ DR_LOG(ERR, "Port ID item not supported, missing wire REGC mask");
+ rte_errno = ENOTSUP;
+ return rte_errno;
+ }
+
+ fc = &cd->fc[MLX5DR_DEFINER_FNAME_VPORT_REG_C_0];
+ fc->item_idx = item_idx;
+ fc->tag_set = &mlx5dr_definer_vport_set;
+ fc->tag_mask_set = &mlx5dr_definer_ones_set;
+ DR_CALC_SET_HDR(fc, registers, register_c_0);
+ fc->bit_off = rte_ctz32(caps->wire_regc_mask);
+ fc->bit_mask = caps->wire_regc_mask >> fc->bit_off;
+ fc->dr_ctx = cd->ctx;
+ } else {
+ /* TODO */
+ DR_LOG(ERR, "Port ID item with legacy vport match is not implemented");
rte_errno = ENOTSUP;
return rte_errno;
}
-
- fc = &cd->fc[MLX5DR_DEFINER_FNAME_VPORT_REG_C_0];
- fc->item_idx = item_idx;
- fc->tag_set = &mlx5dr_definer_vport_set;
- fc->tag_mask_set = &mlx5dr_definer_ones_set;
- DR_CALC_SET_HDR(fc, registers, register_c_0);
- fc->bit_off = rte_ctz32(caps->wire_regc_mask);
- fc->bit_mask = caps->wire_regc_mask >> fc->bit_off;
- fc->dr_ctx = cd->ctx;
}
return 0;
diff --git a/drivers/net/mlx5/linux/mlx5_os.c b/drivers/net/mlx5/linux/mlx5_os.c
index 3a9b019601..75a2936023 100644
--- a/drivers/net/mlx5/linux/mlx5_os.c
+++ b/drivers/net/mlx5/linux/mlx5_os.c
@@ -387,6 +387,7 @@ mlx5_os_capabilities_prepare(struct mlx5_dev_ctx_shared *sh)
sh->dev_cap.esw_info.regc_value = 0;
sh->dev_cap.esw_info.regc_mask = 0;
#endif
+ sh->dev_cap.esw_info.is_set = 1;
return 0;
}
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 7ae9129e46..8cd6562633 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -153,6 +153,7 @@ struct flow_hw_port_info {
uint32_t regc_mask;
uint32_t regc_value;
uint32_t vhca_id;
+ uint32_t is_set:1;
uint32_t is_wire:1;
uint32_t direction:2;
};
diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index 4c56e638ab..c9e72a33d6 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -2156,8 +2156,9 @@ flow_hw_conv_port_id(void *ctx, const uint16_t port_id)
if (port_id >= RTE_MAX_ETHPORTS)
return NULL;
+
port_info = &mlx5_flow_hw_port_infos[port_id];
- return !!port_info->regc_mask ? port_info : NULL;
+ return port_info->is_set ? port_info : NULL;
}
#ifdef HAVE_IBV_FLOW_DV_SUPPORT
diff --git a/drivers/net/mlx5/mlx5_flow_hw.c b/drivers/net/mlx5/mlx5_flow_hw.c
index adbd4f33b0..4871594c35 100644
--- a/drivers/net/mlx5/mlx5_flow_hw.c
+++ b/drivers/net/mlx5/mlx5_flow_hw.c
@@ -12307,6 +12307,7 @@ mlx5_flow_hw_set_port_info(struct rte_eth_dev *dev)
info->regc_value = priv->vport_meta_tag;
info->vhca_id = priv->vport_vhca_id;
info->is_wire = mlx5_is_port_on_mpesw_device(priv) ? priv->mpesw_uplink : priv->master;
+ info->is_set = 1;
}
/* Clears vport tag and mask used for HWS rules. */
--
2.47.3
* [PATCH 26.07 4/5] net/mlx5/hws: add source vport match in HWS
From: Dariusz Sosnowski @ 2026-03-25 9:07 UTC (permalink / raw)
To: Viacheslav Ovsiienko, Bing Zhao, Ori Kam, Suanming Mou,
Matan Azrad; +Cc: dev
In-Reply-To: <20260325090758.42403-1-dsosnowski@nvidia.com>
Matching source vport without having vport metadata available
requires matching on 2 fields:
- source_gvmi - equal to VHCA ID property of the port,
- functional_lb
There are the following cases:
- If packet comes from PF/VF/SF, then it originated on some Tx queue on
the host. In this case, source_gvmi will be populated with
the ID of the originating function.
- If packet comes from the wire, then source_gvmi will be set to 0.
There is one edge case - discriminating packets coming from PF0 and
from wire. In this case, both packets will have source_gvmi set to 0.
Distinguishing them requires additional match on functional_lb.
If packet comes from PF, functional_lb will be set to 1.
This only happens when packet was sent from PF to FDB, and
then moved to PF again.
Because of all of the above:
- Unified FDB must be disabled when vport metadata is disabled,
because packet from PF0 and from wire will not have
correct functional_lb set yet when flow rules in FDB are processed.
- Without unified FDB, when separate FDB_RX and FDB_TX tables are used
internally, match on functional_lb is not needed.
Table type already defines the direction.
- NIC_TX tables belong to a single GVMI and as a result
vport matching is not needed there.
As a result, functional_lb match is only required on NIC_RX.
This patch adds support for source_gvmi and functional_lb matching
in HWS layer.
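The rules above can be summarized with a small standalone sketch.
It is only an illustration under simplifying assumptions:
struct legacy_match, enum table_type_sketch and legacy_vport_match()
are hypothetical names and do not exist in the HWS layer.

```c
#include <stdbool.h>
#include <stdint.h>

#define WIRE_GVMI 0 /* Packets coming from the wire carry source_gvmi 0. */

/* Hypothetical table types, mirroring the cases listed above. */
enum table_type_sketch {
	TBL_NIC_RX,
	TBL_NIC_TX,
	TBL_FDB_RX,
	TBL_FDB_TX,
};

/* Hypothetical result: which values a rule for a given port would match. */
struct legacy_match {
	uint16_t source_gvmi;      /* Value matched in source_gvmi. */
	bool match_functional_lb;  /* True if functional_lb is matched too. */
	uint8_t functional_lb;     /* Value matched, valid only if flag is set. */
};

static struct legacy_match
legacy_vport_match(bool is_wire, uint16_t vhca_id, enum table_type_sketch type)
{
	struct legacy_match m = { 0 };

	/* Wire traffic has no originating function, so GVMI is 0. */
	m.source_gvmi = is_wire ? WIRE_GVMI : vhca_id;
	/* Only NIC_RX needs functional_lb to tell PF0 and wire apart. */
	if (type == TBL_NIC_RX) {
		m.match_functional_lb = true;
		m.functional_lb = is_wire ? 0 : 1;
	}
	return m;
}
```

For example, PF0 (VHCA ID 0) and the wire port produce the same
source_gvmi on NIC_RX and differ only in functional_lb.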
Signed-off-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
---
drivers/net/mlx5/hws/mlx5dr_definer.c | 65 +++++++++++++++++++++++++--
drivers/net/mlx5/hws/mlx5dr_definer.h | 2 +
drivers/net/mlx5/hws/mlx5dr_table.c | 6 +++
3 files changed, 69 insertions(+), 4 deletions(-)
diff --git a/drivers/net/mlx5/hws/mlx5dr_definer.c b/drivers/net/mlx5/hws/mlx5dr_definer.c
index 3ba69c1001..7400b8f252 100644
--- a/drivers/net/mlx5/hws/mlx5dr_definer.c
+++ b/drivers/net/mlx5/hws/mlx5dr_definer.c
@@ -6,6 +6,8 @@
#include "mlx5dr_internal.h"
+#define WIRE_GVMI 0
+#define BAD_GVMI 0xFFFF
#define GTP_PDU_SC 0x85
#define BAD_PORT 0xBAD
#define BAD_SQN 0xBAD
@@ -789,6 +791,49 @@ mlx5dr_definer_vport_set(struct mlx5dr_definer_fc *fc,
DR_SET(tag, regc_value, fc->byte_off, fc->bit_off, fc->bit_mask);
}
+static void
+mlx5dr_definer_source_gvmi_set(struct mlx5dr_definer_fc *fc,
+ const void *item_spec,
+ uint8_t *tag)
+{
+ const struct rte_flow_item_ethdev *v = item_spec;
+ const struct flow_hw_port_info *port_info;
+ uint32_t source_gvmi;
+
+ if (v) {
+ port_info = flow_hw_conv_port_id(fc->dr_ctx, v->port_id);
+ assert(port_info != NULL);
+ if (port_info->is_wire)
+ source_gvmi = WIRE_GVMI;
+ else
+ source_gvmi = port_info->vhca_id;
+ } else {
+ source_gvmi = BAD_GVMI;
+ }
+
+ DR_SET(tag, source_gvmi, fc->byte_off, fc->bit_off, fc->bit_mask);
+}
+
+static void
+mlx5dr_definer_functional_lb_set(struct mlx5dr_definer_fc *fc,
+ const void *item_spec,
+ uint8_t *tag)
+{
+ const struct rte_flow_item_ethdev *v = item_spec;
+ const struct flow_hw_port_info *port_info;
+ uint32_t functional_lb;
+
+ if (v) {
+ port_info = flow_hw_conv_port_id(fc->dr_ctx, v->port_id);
+ assert(port_info != NULL);
+ functional_lb = !port_info->is_wire;
+ } else {
+ functional_lb = 0;
+ }
+
+ DR_SET(tag, functional_lb, fc->byte_off, fc->bit_off, fc->bit_mask);
+}
+
static struct mlx5dr_definer_fc *
mlx5dr_definer_get_mpls_fc(struct mlx5dr_definer_conv_data *cd, bool inner)
{
@@ -1610,10 +1655,22 @@ mlx5dr_definer_conv_item_port(struct mlx5dr_definer_conv_data *cd,
fc->bit_mask = caps->wire_regc_mask >> fc->bit_off;
fc->dr_ctx = cd->ctx;
} else {
- /* TODO */
- DR_LOG(ERR, "Port ID item with legacy vport match is not implemented");
- rte_errno = ENOTSUP;
- return rte_errno;
+ fc = &cd->fc[MLX5DR_DEFINER_FNAME_SOURCE_GVMI];
+ fc->item_idx = item_idx;
+ fc->tag_set = &mlx5dr_definer_source_gvmi_set;
+ fc->tag_mask_set = &mlx5dr_definer_ones_set;
+ DR_CALC_SET_HDR(fc, source_qp_gvmi, source_gvmi);
+ fc->dr_ctx = cd->ctx;
+
+ if (cd->table_type != MLX5DR_TABLE_TYPE_NIC_RX)
+ return 0;
+
+ fc = &cd->fc[MLX5DR_DEFINER_FNAME_FUNCTIONAL_LB];
+ fc->item_idx = item_idx;
+ fc->tag_set = &mlx5dr_definer_functional_lb_set;
+ fc->tag_mask_set = &mlx5dr_definer_ones_set;
+ DR_CALC_SET_HDR(fc, source_qp_gvmi, functional_lb);
+ fc->dr_ctx = cd->ctx;
}
}
diff --git a/drivers/net/mlx5/hws/mlx5dr_definer.h b/drivers/net/mlx5/hws/mlx5dr_definer.h
index d0c99399ae..f5d6cce887 100644
--- a/drivers/net/mlx5/hws/mlx5dr_definer.h
+++ b/drivers/net/mlx5/hws/mlx5dr_definer.h
@@ -214,6 +214,8 @@ enum mlx5dr_definer_fname {
MLX5DR_DEFINER_FNAME_PTYPE_FRAG_O,
MLX5DR_DEFINER_FNAME_PTYPE_FRAG_I,
MLX5DR_DEFINER_FNAME_RANDOM_NUM,
+ MLX5DR_DEFINER_FNAME_SOURCE_GVMI,
+ MLX5DR_DEFINER_FNAME_FUNCTIONAL_LB,
MLX5DR_DEFINER_FNAME_MAX,
};
diff --git a/drivers/net/mlx5/hws/mlx5dr_table.c b/drivers/net/mlx5/hws/mlx5dr_table.c
index 41ffaa19e3..14e983a363 100644
--- a/drivers/net/mlx5/hws/mlx5dr_table.c
+++ b/drivers/net/mlx5/hws/mlx5dr_table.c
@@ -468,6 +468,12 @@ struct mlx5dr_table *mlx5dr_table_create(struct mlx5dr_context *ctx,
return NULL;
}
+ if (attr->type == MLX5DR_TABLE_TYPE_FDB_UNIFIED && !ctx->caps->vport_metadata_match) {
+ DR_LOG(ERR, "Table type %d requires vport metadata to be enabled", attr->type);
+ rte_errno = ENOTSUP;
+ return NULL;
+ }
+
if ((mlx5dr_table_is_fdb_any(attr->type) && attr->type != MLX5DR_TABLE_TYPE_FDB) &&
!attr->level) {
DR_LOG(ERR, "Table type %d not supported by root table", attr->type);
--
2.47.3
* [PATCH 26.07 5/5] net/mlx5: allow legacy source vport match
From: Dariusz Sosnowski @ 2026-03-25 9:07 UTC (permalink / raw)
To: Viacheslav Ovsiienko, Bing Zhao, Ori Kam, Suanming Mou,
Matan Azrad; +Cc: dev
In-Reply-To: <20260325090758.42403-1-dsosnowski@nvidia.com>
Allow running mlx5 PMD on top of a device with switchdev enabled,
where vport metadata is disabled (esw_port_metadata devlink parameter
is set to false). This requires:
- Preceding patches introducing source vport match capabilities
in HWS layer.
- Removing the check for vport metadata during port probing
(it previously was one of the requirements).
- Modifying the Tx representor matching flow rule logic when vport metadata
is not available - instead of metadata, match on vport ID.
- vport ID is enough, because any shared FDB use case
requires vport metadata to be enabled.
- Disabling internal usage of unified FDB when vport metadata
is not available.
- Forcing internal usage of source_vport match on root flow rules
when vport metadata is not available.
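The Tx representor tag selection can be sketched as below.
This is a simplified standalone version of the fallback, assuming
that the mask and value are computed together; struct tx_tag,
tx_tag_select() and ctz32() are hypothetical names used only here.

```c
#include <stdint.h>

/* Hypothetical pair of mask and value used for Tx representor matching. */
struct tx_tag {
	uint32_t mask;
	uint32_t value;
};

/* Count trailing zero bits; the caller guarantees v != 0. */
static unsigned int
ctz32(uint32_t v)
{
	unsigned int n = 0;

	while ((v & 1) == 0) {
		v >>= 1;
		n++;
	}
	return n;
}

static struct tx_tag
tx_tag_select(uint32_t vport_meta_mask, uint32_t vport_meta_tag,
	      uint32_t regc0_mask, uint16_t vport_id, uint16_t dev_vhca_id)
{
	struct tx_tag t;

	if (vport_meta_mask != 0) {
		/* vport metadata available: use the tag shifted down to bit 0. */
		t.mask = regc0_mask;
		t.value = vport_meta_tag >> ctz32(vport_meta_mask);
	} else {
		/* No metadata: full mask, vport ID combined with device VHCA ID. */
		t.mask = UINT32_MAX;
		t.value = (uint32_t)vport_id | ((uint32_t)dev_vhca_id << 16);
	}
	return t;
}
```

With vport ID 2 and device VHCA ID 1, the fallback value in this
sketch is 0x00010002 with a full 32-bit mask.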
Signed-off-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
---
drivers/net/mlx5/linux/mlx5_os.c | 34 +++++++-------------------------
drivers/net/mlx5/mlx5.h | 1 +
drivers/net/mlx5/mlx5_flow_dv.c | 3 +++
drivers/net/mlx5/mlx5_flow_hw.c | 28 +++++++++-----------------
4 files changed, 20 insertions(+), 46 deletions(-)
diff --git a/drivers/net/mlx5/linux/mlx5_os.c b/drivers/net/mlx5/linux/mlx5_os.c
index 75a2936023..a9dd0be055 100644
--- a/drivers/net/mlx5/linux/mlx5_os.c
+++ b/drivers/net/mlx5/linux/mlx5_os.c
@@ -1866,7 +1866,9 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
* 3. with unsupported FW
* 4. all representors in HWS
*/
- priv->unified_fdb_en = !!priv->master && sh->cdev->config.hca_attr.fdb_unified_en;
+ priv->unified_fdb_en = sh->cdev->config.hca_attr.fdb_unified_en &&
+ priv->master &&
+ priv->vport_meta_mask != 0;
/* Jump FDB Rx works only with unified FDB enabled. */
if (priv->unified_fdb_en)
priv->jump_fdb_rx_en = sh->cdev->config.hca_attr.jump_fdb_rx_en;
@@ -1874,32 +1876,10 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
eth_dev->data->port_id,
priv->unified_fdb_en ? "is" : "isn't",
priv->jump_fdb_rx_en ? "is" : "isn't");
- if (priv->sh->config.dv_esw_en) {
- uint32_t usable_bits;
- uint32_t required_bits;
-
- if (priv->sh->dv_regc0_mask == UINT32_MAX) {
- DRV_LOG(ERR, "E-Switch port metadata is required when using HWS "
- "but it is disabled (configure it through devlink)");
- err = ENOTSUP;
- goto error;
- }
- if (priv->sh->dv_regc0_mask == 0) {
- DRV_LOG(ERR, "E-Switch with HWS is not supported "
- "(no available bits in reg_c[0])");
- err = ENOTSUP;
- goto error;
- }
- usable_bits = rte_popcount32(priv->sh->dv_regc0_mask);
- required_bits = rte_popcount32(priv->vport_meta_mask);
- if (usable_bits < required_bits) {
- DRV_LOG(ERR, "Not enough bits available in reg_c[0] to provide "
- "representor matching.");
- err = ENOTSUP;
- goto error;
- }
- }
- if (priv->vport_meta_mask)
+ /* Without vport metadata, PMD must rely on source_vport match. */
+ if (priv->sh->config.dv_esw_en && priv->vport_meta_mask == 0)
+ priv->vport_match = 1;
+ if (priv->sh->config.dv_esw_en)
mlx5_flow_hw_set_port_info(eth_dev);
if (priv->sh->config.dv_esw_en &&
priv->sh->config.dv_xmeta_en != MLX5_XMETA_MODE_LEGACY &&
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 8cd6562633..7e8ef1d467 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -2003,6 +2003,7 @@ struct mlx5_priv {
uint32_t tunnel_enabled:1; /* If tunnel offloading is enabled on rxqs. */
uint32_t unified_fdb_en:1; /* Unified FDB flag per port. */
uint32_t jump_fdb_rx_en:1; /* Jump from FDB Tx to FDB Rx flag per port. */
+ uint32_t vport_match:1; /* True if source_vport match is used instead of metadata */
uint16_t domain_id; /* Switch domain identifier. */
uint16_t vport_id; /* Associated VF vport index (if any). */
uint16_t vport_vhca_id; /* VHCA ID of the associated vport (if any). */
diff --git a/drivers/net/mlx5/mlx5_flow_dv.c b/drivers/net/mlx5/mlx5_flow_dv.c
index 32e75b063f..c2a2874913 100644
--- a/drivers/net/mlx5/mlx5_flow_dv.c
+++ b/drivers/net/mlx5/mlx5_flow_dv.c
@@ -11050,6 +11050,9 @@ flow_dv_translate_item_represented_port(struct rte_eth_dev *dev, void *key,
#ifndef HAVE_IBV_DEVICE_ATTR_ESW_MGR_REG_C0
if (priv->sh->config.dv_flow_en == 2)
vport_match = true;
+#else
+ if (priv->sh->config.dv_flow_en == 2)
+ vport_match = !!priv->vport_match;
#endif
if (!pid_m && !pid_v)
return 0;
diff --git a/drivers/net/mlx5/mlx5_flow_hw.c b/drivers/net/mlx5/mlx5_flow_hw.c
index 4871594c35..b6bb9f12a6 100644
--- a/drivers/net/mlx5/mlx5_flow_hw.c
+++ b/drivers/net/mlx5/mlx5_flow_hw.c
@@ -9939,33 +9939,23 @@ static __rte_always_inline uint32_t
flow_hw_tx_tag_regc_mask(struct rte_eth_dev *dev)
{
struct mlx5_priv *priv = dev->data->dev_private;
- uint32_t mask = priv->sh->dv_regc0_mask;
- /* Mask is verified during device initialization. Sanity checking here. */
- MLX5_ASSERT(mask != 0);
- /*
- * Availability of sufficient number of bits in REG_C_0 is verified on initialization.
- * Sanity checking here.
- */
- MLX5_ASSERT(rte_popcount32(mask) >= rte_popcount32(priv->vport_meta_mask));
- return mask;
+ if (priv->vport_meta_mask != 0)
+ return priv->sh->dv_regc0_mask;
+ else
+ return UINT32_MAX;
}
static __rte_always_inline uint32_t
flow_hw_tx_tag_regc_value(struct rte_eth_dev *dev)
{
struct mlx5_priv *priv = dev->data->dev_private;
- uint32_t tag;
- /* Mask is verified during device initialization. Sanity checking here. */
- MLX5_ASSERT(priv->vport_meta_mask != 0);
- tag = priv->vport_meta_tag >> (rte_bsf32(priv->vport_meta_mask));
- /*
- * Availability of sufficient number of bits in REG_C_0 is verified on initialization.
- * Sanity checking here.
- */
- MLX5_ASSERT((tag & priv->sh->dv_regc0_mask) == tag);
- return tag;
+ if (priv->vport_meta_mask != 0)
+ return priv->vport_meta_tag >> (rte_bsf32(priv->vport_meta_mask));
+
+ /* Without REG_C match value available, resort to matching vport ID. */
+ return priv->vport_id | (priv->sh->cdev->config.hca_attr.vhca_id << 16);
}
static void
--
2.47.3
* Re: [PATCH v20 25/25] app/pdump: preserve VLAN tags in captured packets
From: Bruce Richardson @ 2026-03-25 9:12 UTC (permalink / raw)
To: Morten Brørup; +Cc: Stephen Hemminger, dev, Reshma Pattan
In-Reply-To: <98CBD80474FA8B44BF855DF32C47DC35F657C2@smartserver.smartshare.dk>
On Wed, Mar 25, 2026 at 08:41:39AM +0100, Morten Brørup wrote:
> > From: Stephen Hemminger [mailto:stephen@networkplumber.org]
> > Sent: Tuesday, 24 March 2026 18.12
> >
> > On Mon, 16 Mar 2026 16:55:29 +0100
> > Morten Brørup <mb@smartsharesystems.com> wrote:
> >
> > > >
> > > > This is an example of something I previously flagged. Like with
> > real
> > > > hardware, I think the PMD should be inserting the VLAN tag into the
> > > > packet
> > > > as part of the Tx function, not the prepare function.
> > >
> > > Agree with Bruce on this.
> > > For simple stuff like VLAN offload, applications should not be
> > required to call tx_prep first.
> > >
> > > However, the Tx function is supposed to not modify the packets;
> > relevant when refcnt > 1.
> > >
> > > Instead of modifying the packet data to insert/strip the VLAN tag,
> > > perhaps the driver can split the write/read operation into multiple
> > write/read operations:
> > > 1. the Ethernet header
> > > 2. the VLAN tag
> > > 3. the remaining packet data
> > >
> > > I haven't really followed the pcap driver, so maybe my suggestion
> > doesn't make sense.
> >
> > The prepare code and VLAN was copied from virtio.
> > I assume virtio is widely used already.
>
> OK, that makes it harder to object to.
>
Yes, but I also believe that the topic was not previously discussed and
that the virtio driver may be wrong in how it behaves.
I still think, for consistency with HW drivers, SW drivers should do the
tagging in the Tx function.
I also think that we should provide a DPDK-lib level helper function (be it in
ethdev or elsewhere) for doing this sort of thing for all drivers. That way
we can put in the necessary copying of packets with refcnt > 1 and have it
apply globally.
/Bruce
* RE: [PATCH v20 25/25] app/pdump: preserve VLAN tags in captured packets
From: Morten Brørup @ 2026-03-25 9:36 UTC (permalink / raw)
To: Bruce Richardson; +Cc: Stephen Hemminger, dev, Reshma Pattan
In-Reply-To: <acOnBrH5VGJNz5CC@bricha3-mobl1.ger.corp.intel.com>
> From: Bruce Richardson [mailto:bruce.richardson@intel.com]
> Sent: Wednesday, 25 March 2026 10.13
>
> On Wed, Mar 25, 2026 at 08:41:39AM +0100, Morten Brørup wrote:
> > > From: Stephen Hemminger [mailto:stephen@networkplumber.org]
> > > Sent: Tuesday, 24 March 2026 18.12
> > >
> > > On Mon, 16 Mar 2026 16:55:29 +0100
> > > Morten Brørup <mb@smartsharesystems.com> wrote:
> > >
> > > > >
> > > > > This is an example of something I previously flagged. Like with
> > > real
> > > > > hardware, I think the PMD should be inserting the VLAN tag into
> the
> > > > > packet
> > > > > as part of the Tx function, not the prepare function.
> > > >
> > > > Agree with Bruce on this.
> > > > For simple stuff like VLAN offload, applications should not be
> > > required to call tx_prep first.
> > > >
> > > > However, the Tx function is supposed to not modify the packets;
> > > relevant when refcnt > 1.
> > > >
> > > > Instead of modifying the packet data to insert/strip the VLAN
> tag,
> > > > perhaps the driver can split the write/read operation into
> multiple
> > > write/read operations:
> > > > 1. the Ethernet header
> > > > 2. the VLAN tag
> > > > 3. the remaining packet data
> > > >
> > > > I haven't really followed the pcap driver, so maybe my suggestion
> > > doesn't make sense.
> > >
> > > The prepare code and VLAN was copied from virtio.
> > > I assume virtio is widely used already.
> >
> > OK, that makes it harder to object to.
> >
>
> Yes, but I also believe that the topic was not previously discussed and
> that the virtio driver may be wrong in how it behaves.
>
> I still think, for consistency with HW drivers, SW drivers should do
> the
> tagging in the Tx function.
+1
>
> I also think that we should provide a DPDK-lib level helper function
> (be it in
> ethdev or elsewhere) for doing this sort of thing for all drivers. That
> way
> we can put in the necessary copying of packets with refcnt > 1 and have
> it
> apply globally.
If an application clones packets instead of copying them, it is probably for performance reasons.
If the drivers start copying those clones, it may defeat the performance purpose.
<brainstorming>
Maybe segmentation can be used instead of copying the full packet:
Make the "copy" packet of two (or more) segments, where the header is copied into a new mbuf (where the VLAN tag is added), and the remaining part of the packet uses an indirect mbuf referring to the "original" packet at the offset after the header.
</brainstorming>
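To make the idea concrete, here is a rough sketch in plain C. It does not use the mbuf API: struct seg, vlan_insert_segmented() and seg_flatten() are hypothetical stand-ins, where the first segment plays the role of the new header mbuf and the second the role of an indirect mbuf pointing into the original packet.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_ADDRS_LEN 12 /* Destination + source MAC addresses. */
#define VLAN_TAG_LEN 4   /* TPID (0x8100) + TCI. */

/* Hypothetical segment descriptor; a stand-in for an mbuf chain. */
struct seg {
	const uint8_t *data;
	size_t len;
	struct seg *next;
};

/*
 * Build a two-segment view of the tagged frame.
 * hdr_buf receives a private copy of the MAC addresses plus the VLAN tag.
 * The tail segment points into the original frame, which is not modified.
 * Returns 0 on success, -1 if the frame is shorter than an Ethernet header.
 */
static int
vlan_insert_segmented(const uint8_t *frame, size_t frame_len, uint16_t tci,
		      uint8_t hdr_buf[ETH_ADDRS_LEN + VLAN_TAG_LEN],
		      struct seg *head, struct seg *tail)
{
	if (frame_len < ETH_ADDRS_LEN + 2)
		return -1;
	memcpy(hdr_buf, frame, ETH_ADDRS_LEN);
	hdr_buf[12] = 0x81;
	hdr_buf[13] = 0x00;
	hdr_buf[14] = (uint8_t)(tci >> 8);
	hdr_buf[15] = (uint8_t)(tci & 0xff);
	head->data = hdr_buf;
	head->len = ETH_ADDRS_LEN + VLAN_TAG_LEN;
	head->next = tail;
	/* The rest starts at the original EtherType. */
	tail->data = frame + ETH_ADDRS_LEN;
	tail->len = frame_len - ETH_ADDRS_LEN;
	tail->next = NULL;
	return 0;
}

/* Linearize a chain into out; returns bytes written, 0 if it does not fit. */
static size_t
seg_flatten(const struct seg *s, uint8_t *out, size_t cap)
{
	size_t off = 0;

	for (; s != NULL; s = s->next) {
		if (s->len > cap - off)
			return 0;
		memcpy(out + off, s->data, s->len);
		off += s->len;
	}
	return off;
}
```

Only 16 bytes are copied per packet here, regardless of packet size, and the original data is left untouched for other holders of a reference.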
Furthermore...
If drivers start copying packets in the Tx function, the Tx queue should have its own mbuf pool to allocate these mbufs from.
Drivers should not steal mbufs from the pools used by the packets being transmitted.
E.g. if a segmented packet has a small mbuf for the first few bytes, followed by a large mbuf (from another pool) for the remaining bytes.
Or if the "original" mbuf comes from a mempool allocated on a different CPU socket, the "copy" would too.
>
> /Bruce
* Re: [PATCH v20 25/25] app/pdump: preserve VLAN tags in captured packets
From: Bruce Richardson @ 2026-03-25 9:42 UTC (permalink / raw)
To: Morten Brørup; +Cc: Stephen Hemminger, dev, Reshma Pattan
In-Reply-To: <98CBD80474FA8B44BF855DF32C47DC35F657C3@smartserver.smartshare.dk>
On Wed, Mar 25, 2026 at 10:36:56AM +0100, Morten Brørup wrote:
> > From: Bruce Richardson [mailto:bruce.richardson@intel.com]
> > Sent: Wednesday, 25 March 2026 10.13
> >
> > On Wed, Mar 25, 2026 at 08:41:39AM +0100, Morten Brørup wrote:
> > > > From: Stephen Hemminger [mailto:stephen@networkplumber.org]
> > > > Sent: Tuesday, 24 March 2026 18.12
> > > >
> > > > On Mon, 16 Mar 2026 16:55:29 +0100
> > > > Morten Brørup <mb@smartsharesystems.com> wrote:
> > > >
> > > > > >
> > > > > > This is an example of something I previously flagged. Like with
> > > > real
> > > > > > hardware, I think the PMD should be inserting the VLAN tag into
> > the
> > > > > > packet
> > > > > > as part of the Tx function, not the prepare function.
> > > > >
> > > > > Agree with Bruce on this.
> > > > > For simple stuff like VLAN offload, applications should not be
> > > > required to call tx_prep first.
> > > > >
> > > > > However, the Tx function is supposed to not modify the packets;
> > > > relevant when refcnt > 1.
> > > > >
> > > > > Instead of modifying the packet data to insert/strip the VLAN
> > tag,
> > > > > perhaps the driver can split the write/read operation into
> > multiple
> > > > write/read operations:
> > > > > 1. the Ethernet header
> > > > > 2. the VLAN tag
> > > > > 3. the remaining packet data
> > > > >
> > > > > I haven't really followed the pcap driver, so maybe my suggestion
> > > > doesn't make sense.
> > > >
> > > > The prepare code and VLAN was copied from virtio.
> > > > I assume virtio is widely used already.
> > >
> > > OK, that makes it harder to object to.
> > >
> >
> > Yes, but I also believe that the topic was not previously discussed and
> > that the virtio driver may be wrong in how it behaves.
> >
> > I still think, for consistency with HW drivers, SW drivers should do
> > the
> > tagging in the Tx function.
>
> +1
>
> >
> > I also think that we should provide a DPDK-lib level helper function
> > (be it in
> > ethdev or elsewhere) for doing this sort of thing for all drivers. That
> > way
> > we can put in the necessary copying of packets with refcnt > 1 and have
> > it
> > apply globally.
>
> If an application clones packets instead of copying them, it is probably for performance reasons.
> If the drivers start copying those clones, it may defeat the performance purpose.
>
> <brainstorming>
> Maybe segmentation can be used instead of copying the full packet:
> Make the "copy" packet of two (or more) segments, where the header is copied into a new mbuf (where the VLAN tag is added), and the remaining part of the packet uses an indirect mbuf referring to the "original" packet at the offset after the header.
> </brainstorming>
>
> Furthermore...
> If drivers start copying packets in the Tx function, the Tx queue should have its own mbuf pool to allocate these mbufs from.
> Drivers should not steal mbufs from the pools used by the packets being transmitted.
> E.g. if a segmented packet has a small mbuf for the first few bytes, followed by a large mbuf (from another pool) for the remaining bytes.
> Or if the "original" mbuf comes from a mempool allocated on a different CPU socket, the "copy" would too.
>
Yes, agree on using a chain of buffers for Tx in this case. Not bothered
much either way about the separate mempool.
/Bruce
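To make the "chain of buffers" idea from this thread concrete, here is a minimal sketch in plain C. It is not DPDK or pcap PMD code. It describes a VLAN-tagged frame as three pieces (MAC addresses, tag, remainder) without writing to the original frame. The names vlan_gather() and gather_copy() are hypothetical, and the sketch assumes the tag sits between the source MAC and the original EtherType, so the "Ethernet header" piece is the 12 address bytes only.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/uio.h>

#define ETH_ADDRS_LEN 12 /* destination + source MAC addresses */
#define VLAN_TAG_LEN   4 /* TPID (0x8100) + TCI */

/*
 * Describe a VLAN-tagged version of `frame` as three pieces without
 * writing to the frame itself. `tag` is caller-provided storage for the
 * 4-byte tag. Returns 3 (entries filled) or -1 if the frame is too short.
 */
static int
vlan_gather(const uint8_t *frame, size_t len, uint16_t tci,
	    uint8_t tag[VLAN_TAG_LEN], struct iovec iov[3])
{
	if (len < ETH_ADDRS_LEN + 2) /* need MACs plus EtherType */
		return -1;

	tag[0] = 0x81;
	tag[1] = 0x00;
	tag[2] = (uint8_t)(tci >> 8);
	tag[3] = (uint8_t)(tci & 0xff);

	/* 1. MAC addresses, read in place from the original frame */
	iov[0].iov_base = (void *)(uintptr_t)frame;
	iov[0].iov_len = ETH_ADDRS_LEN;
	/* 2. the VLAN tag, held in separate storage */
	iov[1].iov_base = tag;
	iov[1].iov_len = VLAN_TAG_LEN;
	/* 3. original EtherType and payload, again read in place */
	iov[2].iov_base = (void *)(uintptr_t)(frame + ETH_ADDRS_LEN);
	iov[2].iov_len = len - ETH_ADDRS_LEN;
	return 3;
}

/* Flatten the pieces into `out`; stands in for a gathering write. */
static size_t
gather_copy(const struct iovec *iov, int cnt, uint8_t *out, size_t cap)
{
	size_t off = 0;

	for (int i = 0; i < cnt; i++) {
		if (off + iov[i].iov_len > cap)
			return 0; /* output buffer too small */
		memcpy(out + off, iov[i].iov_base, iov[i].iov_len);
		off += iov[i].iov_len;
	}
	return off;
}
```

The original frame is only read, so the same approach would be safe for an mbuf with refcnt > 1. Whether a given output path can consume such a list directly, or must flatten it first as gather_copy() does, depends on the backend and is not settled in this thread.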
^ permalink raw reply
* [wishes/any Bug 1912] free mbufs should not have next and nb_segs initialized
From: bugzilla @ 2026-03-25 9:44 UTC (permalink / raw)
To: dev
http://bugs.dpdk.org/show_bug.cgi?id=1912
Bug ID: 1912
Summary: free mbufs should not have next and nb_segs initialized
Product: wishes
Version: unspecified
Hardware: All
OS: All
Status: UNCONFIRMED
Severity: normal
Priority: Normal
Component: any
Assignee: dev@dpdk.org
Reporter: mb@smartsharesystems.com
Target Milestone: ---
Group: wishes-managers
The invariant about which mbuf fields are initialized for free mbufs (i.e.
mbufs held in a mempool) should only cover the refcnt and the constant fields:
buf_addr, buf_iova, buf_len, pool, priv_size.
The "next" and "nb_segs" fields are the only remaining non-constant fields (in
addition to the refcnt) still covered, and they should not be.
When an mbuf has been allocated and the driver fills the mbuf's fields during
Rx, the driver might as well also set the "next" and "nb_segs" fields (if it
doesn't already).
The performance cost of setting these two fields along with the other fields is
near zero.
Not having to initialize any mbuf fields when freeing an mbuf would reduce the
performance cost of freeing mbufs.
And in the FAST_FREE case (where the refcnt is known to be 1), the driver can
free the mbufs without touching them at all.
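A rough illustration of the proposed invariant, using a toy model rather than the real struct rte_mbuf or mempool API. All names here (toy_mbuf, toy_pool, toy_free, toy_alloc, toy_rx_fill) are hypothetical. The free path models only the FAST_FREE case, where refcnt is known to be 1, and the Rx fill path sets "next" and "nb_segs" together with the fields it writes anyway.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for an mbuf; NOT the real struct rte_mbuf layout. */
struct toy_mbuf {
	/* constant fields, set once when the pool is populated */
	void *buf_addr;
	uint16_t buf_len;
	/* non-constant fields */
	uint16_t refcnt;
	uint16_t data_len;
	uint32_t pkt_len;
	uint16_t nb_segs;
	struct toy_mbuf *next;
};

#define TOY_POOL_SIZE 8

/* Toy LIFO pool holding pointers to free mbufs. */
struct toy_pool {
	struct toy_mbuf *slots[TOY_POOL_SIZE];
	unsigned int count;
};

/*
 * FAST_FREE-style free under the proposed invariant: the mbuf is
 * returned to the pool without being read or written.
 * Returns 0 on success, -1 if the pool is full.
 */
static int
toy_free(struct toy_pool *p, struct toy_mbuf *m)
{
	if (p->count >= TOY_POOL_SIZE)
		return -1;
	p->slots[p->count++] = m; /* next and nb_segs are left stale */
	return 0;
}

/* Take one mbuf from the pool, or NULL if it is empty. */
static struct toy_mbuf *
toy_alloc(struct toy_pool *p)
{
	return p->count ? p->slots[--p->count] : NULL;
}

/* Rx fill: sets next and nb_segs along with the length fields. */
static void
toy_rx_fill(struct toy_mbuf *m, uint16_t len)
{
	m->data_len = len;
	m->pkt_len = len;
	m->nb_segs = 1;
	m->next = NULL;
}
```

In this model a freed mbuf may carry stale "next" and "nb_segs" values while it sits in the pool, and they become valid again only once the Rx path has filled the mbuf.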
^ permalink raw reply
* [PATCH] net/mlx5: fix NAT64 HW registers calculation
From: Bing Zhao @ 2026-03-25 9:58 UTC (permalink / raw)
To: viacheslavo, dev, rasland
Cc: orika, dsosnowski, suanmingm, matan, thomas, stable
The mlx5 PMD needs to select a set of 3 HW registers
to implement the NAT64 flow action.
For compatibility reasons, one of these registers has to be REG_C_6.
The offending patch introduced a bug in the register selection logic:
if REG_C_6 was not available for use,
no registers were selected for NAT64.
As a result, the registers used as temporary storage for
NAT64 header information were left uninitialized.
This patch adds the missing logic to use the last 3 available tag registers
in this case, allowing the NAT64 flow action to be used.
Fixes: f15535128617 ("net/mlx5: fix NAT64 register selection")
Cc: dsosnowski@nvidia.com
Cc: stable@dpdk.org
Signed-off-by: Bing Zhao <bingz@nvidia.com>
Acked-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
---
drivers/net/mlx5/mlx5.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index d9bc5ee197..70f52df78a 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -1679,6 +1679,12 @@ mlx5_init_hws_flow_tags_registers(struct mlx5_dev_ctx_shared *sh)
reg->nat64_regs[0] = REG_C_6;
reg->nat64_regs[1] = reg->hw_avl_tags[j - 2];
reg->nat64_regs[2] = reg->hw_avl_tags[j - 1];
+ } else {
+ if (j >= MLX5_FLOW_NAT64_REGS_MAX) {
+ reg->nat64_regs[0] = reg->hw_avl_tags[j - 3];
+ reg->nat64_regs[1] = reg->hw_avl_tags[j - 2];
+ reg->nat64_regs[2] = reg->hw_avl_tags[j - 1];
+ }
}
}
--
2.34.1
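For readers without the surrounding function at hand, the selection rule described in the commit message can be sketched as standalone C. This is an illustration only, not the mlx5 code: the enum values, TOY_NAT64_REGS_MAX (assumed to be 3, matching the "set of 3 HW registers" above), and the reg_c6_usable flag are hypothetical stand-ins, because the condition guarding the first branch is outside the hunk shown.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical register IDs; the real enum lives in the mlx5 PMD. */
enum toy_reg {
	TOY_REG_NON = 0,
	TOY_REG_C_6 = 6,
	TOY_REG_C_8 = 8,
	TOY_REG_C_9 = 9,
	TOY_REG_C_10 = 10,
	TOY_REG_C_11 = 11,
};

#define TOY_NAT64_REGS_MAX 3 /* assumed value */

/*
 * Pick three registers for NAT64 from `avl`, the list of `n` available
 * tag registers. `reg_c6_usable` stands in for whatever condition the
 * driver uses to decide that REG_C_6 can be taken.
 * Returns true if `out` was filled, false if nothing could be selected.
 */
static bool
toy_select_nat64_regs(const enum toy_reg *avl, size_t n, bool reg_c6_usable,
		      enum toy_reg out[TOY_NAT64_REGS_MAX])
{
	if (reg_c6_usable && n >= 2) {
		/* preferred layout: REG_C_6 plus the last two tag registers */
		out[0] = TOY_REG_C_6;
		out[1] = avl[n - 2];
		out[2] = avl[n - 1];
		return true;
	}
	if (!reg_c6_usable && n >= TOY_NAT64_REGS_MAX) {
		/* fallback added by the patch: last three tag registers */
		out[0] = avl[n - 3];
		out[1] = avl[n - 2];
		out[2] = avl[n - 1];
		return true;
	}
	return false; /* too few registers: NAT64 stays unavailable */
}
```

The n >= 2 guard in the first branch is an addition made for this sketch so the array indexing is always valid; the hunk above does not show how the driver bounds j in that branch.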
^ permalink raw reply related