* [PATCH 1/7] net/iavf: increase max ring descriptors to hardware limit
2026-06-08 16:40 [PATCH 0/7] intel network and pcapng updates Dawid Wesierski
@ 2026-06-08 16:40 ` Dawid Wesierski
2026-06-08 16:40 ` [PATCH 2/7] net/iavf: allow runtime queue rate limit configuration Dawid Wesierski
` (8 subsequent siblings)
9 siblings, 0 replies; 27+ messages in thread
From: Dawid Wesierski @ 2026-06-08 16:40 UTC (permalink / raw)
To: dev
Cc: thomas, david.marchand, vladimir.medvedkin, bruce.richardson,
anatoly.burakov, reshma.pattan, stephen, Marek Kasiewicz,
Dawid Wesierski
From: Marek Kasiewicz <marek.kasiewicz@intel.com>
The Intel E810 hardware supports up to 8160 (8K - 32) descriptors per
TX/RX ring, but IAVF_MAX_RING_DESC caps it at 4096. Applications that
need deep descriptor rings for hardware rate-limited pacing (e.g.,
ST2110 video with thousands of packets per frame) cannot queue enough
packets before the pacing epoch begins.
Increase IAVF_MAX_RING_DESC to the hardware maximum of 8160 to allow
full utilization of the ring depth on E810 VFs.
Signed-off-by: Marek Kasiewicz <marek.kasiewicz@intel.com>
Signed-off-by: Dawid Wesierski <dawid.wesierski@intel.com>
---
drivers/net/intel/iavf/iavf_rxtx.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/intel/iavf/iavf_rxtx.h b/drivers/net/intel/iavf/iavf_rxtx.h
index 8449236d4d..22ea415f44 100644
--- a/drivers/net/intel/iavf/iavf_rxtx.h
+++ b/drivers/net/intel/iavf/iavf_rxtx.h
@@ -16,7 +16,7 @@
/* In QLEN must be whole number of 32 descriptors. */
#define IAVF_ALIGN_RING_DESC 32
#define IAVF_MIN_RING_DESC 64
-#define IAVF_MAX_RING_DESC 4096
+#define IAVF_MAX_RING_DESC (8192 - 32)
#define IAVF_DMA_MEM_ALIGN 4096
/* Base address of the HW descriptor ring should be 128B aligned. */
#define IAVF_RING_BASE_ALIGN 128
--
2.47.3
---------------------------------------------------------------------
Intel Technology Poland sp. z o.o.
ul. Slowackiego 173 | 80-298 Gdansk | Sad Rejonowy Gdansk Polnoc | VII Wydzial Gospodarczy Krajowego Rejestru Sadowego - KRS 101882 | NIP 957-07-52-316 | Kapital zakladowy 200.000 PLN.
Spolka oswiadcza, ze posiada status duzego przedsiebiorcy w rozumieniu ustawy z dnia 8 marca 2013 r. o przeciwdzialaniu nadmiernym opoznieniom w transakcjach handlowych.
Ta wiadomosc wraz z zalacznikami jest przeznaczona dla okreslonego adresata i moze zawierac informacje poufne. W razie przypadkowego otrzymania tej wiadomosci, prosimy o powiadomienie nadawcy oraz trwale jej usuniecie; jakiekolwiek przegladanie lub rozpowszechnianie jest zabronione.
This e-mail and any attachments may contain confidential material for the sole use of the intended recipient(s). If you are not the intended recipient, please contact the sender and delete all copies; any review or distribution by others is strictly prohibited.
^ permalink raw reply related [flat|nested] 27+ messages in thread* [PATCH 2/7] net/iavf: allow runtime queue rate limit configuration
2026-06-08 16:40 [PATCH 0/7] intel network and pcapng updates Dawid Wesierski
2026-06-08 16:40 ` [PATCH 1/7] net/iavf: increase max ring descriptors to hardware limit Dawid Wesierski
@ 2026-06-08 16:40 ` Dawid Wesierski
2026-06-08 16:40 ` [PATCH 3/7] net/ice/base: reduce default scheduler burst size Dawid Wesierski
` (7 subsequent siblings)
9 siblings, 0 replies; 27+ messages in thread
From: Dawid Wesierski @ 2026-06-08 16:40 UTC (permalink / raw)
To: dev
Cc: thomas, david.marchand, vladimir.medvedkin, bruce.richardson,
anatoly.burakov, reshma.pattan, stephen, Marek Kasiewicz,
Dawid Wesierski
From: Marek Kasiewicz <marek.kasiewicz@intel.com>
Allow per-queue bandwidth rate limiting to be configured without
stopping the port when only a single TC node and single QoS element
are involved. This enables dynamic session management where individual
queue pacing rates can be changed while other queues continue
transmitting.
Also fix the queue ID assignment in the bandwidth configuration to
use the actual TM node ID rather than a sequential counter index, and
only mark the TM hierarchy as committed when the port is stopped to
permit subsequent reconfiguration.
Signed-off-by: Marek Kasiewicz <marek.kasiewicz@intel.com>
Signed-off-by: Dawid Wesierski <dawid.wesierski@intel.com>
---
drivers/net/intel/iavf/iavf_tm.c | 11 +++++++----
1 file changed, 7 insertions(+), 4 deletions(-)
diff --git a/drivers/net/intel/iavf/iavf_tm.c b/drivers/net/intel/iavf/iavf_tm.c
index 1cf7bfb106..43d7a44337 100644
--- a/drivers/net/intel/iavf/iavf_tm.c
+++ b/drivers/net/intel/iavf/iavf_tm.c
@@ -804,8 +804,10 @@ static int iavf_hierarchy_commit(struct rte_eth_dev *dev,
int index = 0, node_committed = 0;
int i, ret_val = IAVF_SUCCESS;
- /* check if port is stopped */
- if (adapter->stopped != 1) {
+ /* check if port is stopped, except for setting queue bandwidth */
+ if (vf->tm_conf.nb_tc_node != 1 &&
+ vf->qos_cap->num_elem != 1 &&
+ adapter->stopped != 1) {
PMD_DRV_LOG(ERR, "Please stop port first");
ret_val = IAVF_ERR_NOT_READY;
goto err;
@@ -856,7 +858,7 @@ static int iavf_hierarchy_commit(struct rte_eth_dev *dev,
q_tc_mapping->tc[tm_node->tc].req.queue_count++;
if (tm_node->shaper_profile) {
- q_bw->cfg[node_committed].queue_id = node_committed;
+ q_bw->cfg[node_committed].queue_id = tm_node->id;
q_bw->cfg[node_committed].shaper.peak =
tm_node->shaper_profile->profile.peak.rate /
1000 * IAVF_BITS_PER_BYTE;
@@ -900,7 +902,8 @@ static int iavf_hierarchy_commit(struct rte_eth_dev *dev,
goto fail_clear;
vf->qtc_map = qtc_map;
- vf->tm_conf.committed = true;
+ if (adapter->stopped == 1)
+ vf->tm_conf.committed = true;
return ret_val;
fail_clear:
--
2.47.3
---------------------------------------------------------------------
Intel Technology Poland sp. z o.o.
ul. Slowackiego 173 | 80-298 Gdansk | Sad Rejonowy Gdansk Polnoc | VII Wydzial Gospodarczy Krajowego Rejestru Sadowego - KRS 101882 | NIP 957-07-52-316 | Kapital zakladowy 200.000 PLN.
Spolka oswiadcza, ze posiada status duzego przedsiebiorcy w rozumieniu ustawy z dnia 8 marca 2013 r. o przeciwdzialaniu nadmiernym opoznieniom w transakcjach handlowych.
Ta wiadomosc wraz z zalacznikami jest przeznaczona dla okreslonego adresata i moze zawierac informacje poufne. W razie przypadkowego otrzymania tej wiadomosci, prosimy o powiadomienie nadawcy oraz trwale jej usuniecie; jakiekolwiek przegladanie lub rozpowszechnianie jest zabronione.
This e-mail and any attachments may contain confidential material for the sole use of the intended recipient(s). If you are not the intended recipient, please contact the sender and delete all copies; any review or distribution by others is strictly prohibited.
^ permalink raw reply related [flat|nested] 27+ messages in thread* [PATCH 3/7] net/ice/base: reduce default scheduler burst size
2026-06-08 16:40 [PATCH 0/7] intel network and pcapng updates Dawid Wesierski
2026-06-08 16:40 ` [PATCH 1/7] net/iavf: increase max ring descriptors to hardware limit Dawid Wesierski
2026-06-08 16:40 ` [PATCH 2/7] net/iavf: allow runtime queue rate limit configuration Dawid Wesierski
@ 2026-06-08 16:40 ` Dawid Wesierski
2026-06-08 16:40 ` [PATCH 4/7] net/ice: timestamp all received packets when PTP is enabled Dawid Wesierski
` (6 subsequent siblings)
9 siblings, 0 replies; 27+ messages in thread
From: Dawid Wesierski @ 2026-06-08 16:40 UTC (permalink / raw)
To: dev
Cc: thomas, david.marchand, vladimir.medvedkin, bruce.richardson,
anatoly.burakov, reshma.pattan, stephen, Marek Kasiewicz,
Dawid Wesierski
From: Marek Kasiewicz <marek.kasiewicz@intel.com>
Reduce ICE_SCHED_DFLT_BURST_SIZE from 15 KB to 2 KB to improve
TX rate limiter granularity. The E810 TX scheduler uses a token
bucket algorithm where the burst size controls the maximum bytes
sent in a single burst before the rate limiter throttles.
A 15 KB burst allows micro-bursts of ~10 max-size frames, which
violates tight inter-packet spacing requirements in time-sensitive
networking applications such as SMPTE ST 2110-21 narrow-sender
compliance. Reducing to 2 KB forces near-constant-rate output
matching the configured shaper profile.
Signed-off-by: Marek Kasiewicz <marek.kasiewicz@intel.com>
Signed-off-by: Dawid Wesierski <dawid.wesierski@intel.com>
---
drivers/net/intel/ice/base/ice_type.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/intel/ice/base/ice_type.h b/drivers/net/intel/ice/base/ice_type.h
index 6d8c187689..39569ff3e3 100644
--- a/drivers/net/intel/ice/base/ice_type.h
+++ b/drivers/net/intel/ice/base/ice_type.h
@@ -1100,7 +1100,7 @@ enum ice_rl_type {
#define ICE_SCHED_NO_SHARED_RL_PROF_ID 0xFFFF
#define ICE_SCHED_DFLT_BW_WT 4
#define ICE_SCHED_INVAL_PROF_ID 0xFFFF
-#define ICE_SCHED_DFLT_BURST_SIZE (15 * 1024) /* in bytes (15k) */
+#define ICE_SCHED_DFLT_BURST_SIZE (2 * 1024) /* in bytes (2k) */
/* Access Macros for Tx Sched RL Profile data */
#define ICE_TXSCHED_GET_RL_PROF_ID(p) LE16_TO_CPU((p)->info.profile_id)
--
2.47.3
---------------------------------------------------------------------
Intel Technology Poland sp. z o.o.
ul. Slowackiego 173 | 80-298 Gdansk | Sad Rejonowy Gdansk Polnoc | VII Wydzial Gospodarczy Krajowego Rejestru Sadowego - KRS 101882 | NIP 957-07-52-316 | Kapital zakladowy 200.000 PLN.
Spolka oswiadcza, ze posiada status duzego przedsiebiorcy w rozumieniu ustawy z dnia 8 marca 2013 r. o przeciwdzialaniu nadmiernym opoznieniom w transakcjach handlowych.
Ta wiadomosc wraz z zalacznikami jest przeznaczona dla okreslonego adresata i moze zawierac informacje poufne. W razie przypadkowego otrzymania tej wiadomosci, prosimy o powiadomienie nadawcy oraz trwale jej usuniecie; jakiekolwiek przegladanie lub rozpowszechnianie jest zabronione.
This e-mail and any attachments may contain confidential material for the sole use of the intended recipient(s). If you are not the intended recipient, please contact the sender and delete all copies; any review or distribution by others is strictly prohibited.
^ permalink raw reply related [flat|nested] 27+ messages in thread* [PATCH 4/7] net/ice: timestamp all received packets when PTP is enabled
2026-06-08 16:40 [PATCH 0/7] intel network and pcapng updates Dawid Wesierski
` (2 preceding siblings ...)
2026-06-08 16:40 ` [PATCH 3/7] net/ice/base: reduce default scheduler burst size Dawid Wesierski
@ 2026-06-08 16:40 ` Dawid Wesierski
2026-06-08 16:40 ` [PATCH 5/7] net/iavf: disable runtime queue setup capability Dawid Wesierski
` (5 subsequent siblings)
9 siblings, 0 replies; 27+ messages in thread
From: Dawid Wesierski @ 2026-06-08 16:40 UTC (permalink / raw)
To: dev
Cc: thomas, david.marchand, vladimir.medvedkin, bruce.richardson,
anatoly.burakov, reshma.pattan, stephen, Marek Kasiewicz,
Dawid Wesierski
From: Marek Kasiewicz <marek.kasiewicz@intel.com>
When PTP is enabled on the ICE PMD, hardware RX timestamps are only
applied to packets classified as IEEE 1588 (Ethertype 0x88F7). This
prevents applications from obtaining hardware timestamps on regular
UDP/IP traffic.
Remove the TIMESYNC packet type filter so that all received packets
get hardware timestamps when PTP is enabled. This is required for
time-sensitive networking applications that need per-packet arrival
timing on media traffic, such as ST 2110-21 receiver compliance
monitoring.
The change affects all three RX paths: scan, scattered, and single
packet receive functions.
Signed-off-by: Marek Kasiewicz <marek.kasiewicz@intel.com>
Signed-off-by: Dawid Wesierski <dawid.wesierski@intel.com>
---
drivers/net/intel/ice/ice_rxtx.c | 9 +++------
1 file changed, 3 insertions(+), 6 deletions(-)
diff --git a/drivers/net/intel/ice/ice_rxtx.c b/drivers/net/intel/ice/ice_rxtx.c
index c4b5454c53..8d709125f7 100644
--- a/drivers/net/intel/ice/ice_rxtx.c
+++ b/drivers/net/intel/ice/ice_rxtx.c
@@ -2023,8 +2023,7 @@ ice_rx_scan_hw_ring(struct ci_rx_queue *rxq)
pkt_flags |= rxq->ts_flag;
}
- if (ad->ptp_ena && ((mb->packet_type &
- RTE_PTYPE_L2_MASK) == RTE_PTYPE_L2_ETHER_TIMESYNC)) {
+ if (ad->ptp_ena) {
rxq->time_high =
rte_le_to_cpu_32(rxdp[j].wb.flex_ts.ts_high);
mb->timesync = rxq->queue_id;
@@ -2390,8 +2389,7 @@ ice_recv_scattered_pkts(void *rx_queue,
pkt_flags |= rxq->ts_flag;
}
- if (ad->ptp_ena && ((first_seg->packet_type & RTE_PTYPE_L2_MASK)
- == RTE_PTYPE_L2_ETHER_TIMESYNC)) {
+ if (ad->ptp_ena) {
rxq->time_high =
rte_le_to_cpu_32(rxd.wb.flex_ts.ts_high);
first_seg->timesync = rxq->queue_id;
@@ -2881,8 +2879,7 @@ ice_recv_pkts(void *rx_queue,
pkt_flags |= rxq->ts_flag;
}
- if (ad->ptp_ena && ((rxm->packet_type & RTE_PTYPE_L2_MASK) ==
- RTE_PTYPE_L2_ETHER_TIMESYNC)) {
+ if (ad->ptp_ena) {
rxq->time_high =
rte_le_to_cpu_32(rxd.wb.flex_ts.ts_high);
rxm->timesync = rxq->queue_id;
--
2.47.3
---------------------------------------------------------------------
Intel Technology Poland sp. z o.o.
ul. Slowackiego 173 | 80-298 Gdansk | Sad Rejonowy Gdansk Polnoc | VII Wydzial Gospodarczy Krajowego Rejestru Sadowego - KRS 101882 | NIP 957-07-52-316 | Kapital zakladowy 200.000 PLN.
Spolka oswiadcza, ze posiada status duzego przedsiebiorcy w rozumieniu ustawy z dnia 8 marca 2013 r. o przeciwdzialaniu nadmiernym opoznieniom w transakcjach handlowych.
Ta wiadomosc wraz z zalacznikami jest przeznaczona dla okreslonego adresata i moze zawierac informacje poufne. W razie przypadkowego otrzymania tej wiadomosci, prosimy o powiadomienie nadawcy oraz trwale jej usuniecie; jakiekolwiek przegladanie lub rozpowszechnianie jest zabronione.
This e-mail and any attachments may contain confidential material for the sole use of the intended recipient(s). If you are not the intended recipient, please contact the sender and delete all copies; any review or distribution by others is strictly prohibited.
^ permalink raw reply related [flat|nested] 27+ messages in thread* [PATCH 5/7] net/iavf: disable runtime queue setup capability
2026-06-08 16:40 [PATCH 0/7] intel network and pcapng updates Dawid Wesierski
` (3 preceding siblings ...)
2026-06-08 16:40 ` [PATCH 4/7] net/ice: timestamp all received packets when PTP is enabled Dawid Wesierski
@ 2026-06-08 16:40 ` Dawid Wesierski
2026-06-08 16:40 ` [PATCH 6/7] pcapng: add user-supplied timestamp support Dawid Wesierski
` (4 subsequent siblings)
9 siblings, 0 replies; 27+ messages in thread
From: Dawid Wesierski @ 2026-06-08 16:40 UTC (permalink / raw)
To: dev
Cc: thomas, david.marchand, vladimir.medvedkin, bruce.richardson,
anatoly.burakov, reshma.pattan, stephen, Marek Kasiewicz,
Dawid Wesierski
From: Marek Kasiewicz <marek.kasiewicz@intel.com>
Remove the advertisement of RTE_ETH_DEV_CAPA_RUNTIME_RX_QUEUE_SETUP
and RTE_ETH_DEV_CAPA_RUNTIME_TX_QUEUE_SETUP capabilities from the
iavf VF driver.
Runtime queue setup on E810 VFs causes queue state corruption when
queues are dynamically reconfigured while the hardware rate limiter
is actively pacing TX queues. Queue configuration messages to the PF
via virtchnl can race with ongoing TX operations, leading to undefined
behavior.
By not advertising these capabilities, all queues are configured at
port start and remain stable throughout the port lifecycle.
Signed-off-by: Marek Kasiewicz <marek.kasiewicz@intel.com>
Signed-off-by: Dawid Wesierski <dawid.wesierski@intel.com>
---
drivers/net/intel/iavf/iavf_ethdev.c | 3 ---
1 file changed, 3 deletions(-)
diff --git a/drivers/net/intel/iavf/iavf_ethdev.c b/drivers/net/intel/iavf/iavf_ethdev.c
index a8031e23a5..4f6325ef78 100644
--- a/drivers/net/intel/iavf/iavf_ethdev.c
+++ b/drivers/net/intel/iavf/iavf_ethdev.c
@@ -1159,9 +1159,6 @@ iavf_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
dev_info->reta_size = vf->vf_res->rss_lut_size;
dev_info->flow_type_rss_offloads = IAVF_RSS_OFFLOAD_ALL;
dev_info->max_mac_addrs = IAVF_NUM_MACADDR_MAX;
- dev_info->dev_capa =
- RTE_ETH_DEV_CAPA_RUNTIME_RX_QUEUE_SETUP |
- RTE_ETH_DEV_CAPA_RUNTIME_TX_QUEUE_SETUP;
dev_info->rx_offload_capa =
RTE_ETH_RX_OFFLOAD_VLAN_STRIP |
RTE_ETH_RX_OFFLOAD_QINQ_STRIP |
--
2.47.3
---------------------------------------------------------------------
Intel Technology Poland sp. z o.o.
ul. Slowackiego 173 | 80-298 Gdansk | Sad Rejonowy Gdansk Polnoc | VII Wydzial Gospodarczy Krajowego Rejestru Sadowego - KRS 101882 | NIP 957-07-52-316 | Kapital zakladowy 200.000 PLN.
Spolka oswiadcza, ze posiada status duzego przedsiebiorcy w rozumieniu ustawy z dnia 8 marca 2013 r. o przeciwdzialaniu nadmiernym opoznieniom w transakcjach handlowych.
Ta wiadomosc wraz z zalacznikami jest przeznaczona dla okreslonego adresata i moze zawierac informacje poufne. W razie przypadkowego otrzymania tej wiadomosci, prosimy o powiadomienie nadawcy oraz trwale jej usuniecie; jakiekolwiek przegladanie lub rozpowszechnianie jest zabronione.
This e-mail and any attachments may contain confidential material for the sole use of the intended recipient(s). If you are not the intended recipient, please contact the sender and delete all copies; any review or distribution by others is strictly prohibited.
^ permalink raw reply related [flat|nested] 27+ messages in thread* [PATCH 6/7] pcapng: add user-supplied timestamp support
2026-06-08 16:40 [PATCH 0/7] intel network and pcapng updates Dawid Wesierski
` (4 preceding siblings ...)
2026-06-08 16:40 ` [PATCH 5/7] net/iavf: disable runtime queue setup capability Dawid Wesierski
@ 2026-06-08 16:40 ` Dawid Wesierski
2026-06-08 17:09 ` Stephen Hemminger
2026-06-08 16:40 ` [PATCH 7/7] net/ice: add header split mbuf callback support Dawid Wesierski
` (3 subsequent siblings)
9 siblings, 1 reply; 27+ messages in thread
From: Dawid Wesierski @ 2026-06-08 16:40 UTC (permalink / raw)
To: dev
Cc: thomas, david.marchand, vladimir.medvedkin, bruce.richardson,
anatoly.burakov, reshma.pattan, stephen, Marek Kasiewicz,
Dawid Wesierski
From: Marek Kasiewicz <marek.kasiewicz@intel.com>
Add rte_pcapng_copy_ts() which accepts an optional timestamp parameter
in nanoseconds. When the timestamp is non-zero, it is used directly
instead of reading the TSC. This allows applications to provide
hardware PTP timestamps from the NIC, enabling accurate packet capture
with PTP-domain timing rather than host-local TSC values.
The existing rte_pcapng_copy() function is preserved as a static inline
wrapper that passes zero for backward compatibility.
The TSC-to-epoch conversion in the write path is removed since callers
providing hardware timestamps have already performed the conversion.
Signed-off-by: Marek Kasiewicz <marek.kasiewicz@intel.com>
Signed-off-by: Dawid Wesierski <dawid.wesierski@intel.com>
---
lib/pcapng/rte_pcapng.c | 19 ++++---------------
lib/pcapng/rte_pcapng.h | 41 +++++++++++++++++++++++++++++++++++++++--
2 files changed, 43 insertions(+), 17 deletions(-)
diff --git a/lib/pcapng/rte_pcapng.c b/lib/pcapng/rte_pcapng.c
index b5d1026891..96b3aafeb6 100644
--- a/lib/pcapng/rte_pcapng.c
+++ b/lib/pcapng/rte_pcapng.c
@@ -546,14 +546,14 @@ pcapng_vlan_insert(struct rte_mbuf *m, uint16_t ether_type, uint16_t tci)
*/
/* Make a copy of original mbuf with pcapng header and options */
-RTE_EXPORT_SYMBOL(rte_pcapng_copy)
+RTE_EXPORT_SYMBOL(rte_pcapng_copy_ts)
struct rte_mbuf *
-rte_pcapng_copy(uint16_t port_id, uint32_t queue,
+rte_pcapng_copy_ts(uint16_t port_id, uint32_t queue,
const struct rte_mbuf *md,
struct rte_mempool *mp,
uint32_t length,
enum rte_pcapng_direction direction,
- const char *comment)
+ const char *comment, uint64_t ts)
{
struct pcapng_enhance_packet_block *epb;
uint32_t orig_len, pkt_len, padding, flags;
@@ -691,7 +691,7 @@ rte_pcapng_copy(uint16_t port_id, uint32_t queue,
mc->port = port_id;
/* Put timestamp in cycles here - adjust in packet write */
- timestamp = rte_get_tsc_cycles();
+ timestamp = ts ? ts : rte_get_tsc_cycles();
epb->timestamp_hi = timestamp >> 32;
epb->timestamp_lo = (uint32_t)timestamp;
epb->capture_length = pkt_len;
@@ -720,7 +720,6 @@ rte_pcapng_write_packets(rte_pcapng_t *self,
for (i = 0; i < nb_pkts; i++) {
struct rte_mbuf *m = pkts[i];
struct pcapng_enhance_packet_block *epb;
- uint64_t cycles, timestamp;
/* sanity check that is really a pcapng mbuf */
epb = rte_pktmbuf_mtod(m, struct pcapng_enhance_packet_block *);
@@ -737,16 +736,6 @@ rte_pcapng_write_packets(rte_pcapng_t *self,
return -1;
}
- /*
- * When data is captured by pcapng_copy the current TSC is stored.
- * Adjust the value recorded in file to PCAP epoch units.
- */
- cycles = (uint64_t)epb->timestamp_hi << 32;
- cycles += epb->timestamp_lo;
- timestamp = tsc_to_ns_epoch(&self->clock, cycles);
- epb->timestamp_hi = timestamp >> 32;
- epb->timestamp_lo = (uint32_t)timestamp;
-
/*
* Handle case of highly fragmented and large burst size
* Note: this assumes that max segments per mbuf < IOV_MAX
diff --git a/lib/pcapng/rte_pcapng.h b/lib/pcapng/rte_pcapng.h
index d8d328f710..3d735e4ebe 100644
--- a/lib/pcapng/rte_pcapng.h
+++ b/lib/pcapng/rte_pcapng.h
@@ -109,7 +109,7 @@ enum rte_pcapng_direction {
};
/**
- * Format an mbuf for writing to file.
+ * Format an mbuf with time stamp for writing to file.
*
* @param port_id
* The Ethernet port on which packet was received
@@ -129,16 +129,53 @@ enum rte_pcapng_direction {
* @param comment
* Optional per packet comment.
* Truncated to UINT16_MAX characters.
+ * @param ts
+ * Optional timestamp in nanoseconds. If zero, the current TSC is used.
*
* @return
* - The pointer to the new mbuf formatted for pcapng_write
* - NULL on error such as invalid port or out of memory.
*/
struct rte_mbuf *
+rte_pcapng_copy_ts(uint16_t port_id, uint32_t queue,
+ const struct rte_mbuf *m, struct rte_mempool *mp,
+ uint32_t length,
+ enum rte_pcapng_direction direction, const char *comment, uint64_t ts);
+
+/**
+ * Format an mbuf for writing to file.
+ *
+ * @param port_id
+ * The Ethernet port on which packet was received
+ * or is going to be transmitted.
+ * @param queue
+ * The queue on the Ethernet port where packet was received
+ * or is going to be transmitted.
+ * @param mp
+ * The mempool from which the "clone" mbufs are allocated.
+ * @param m
+ * The mbuf to copy
+ * @param length
+ * The upper limit on bytes to copy. Passing UINT32_MAX
+ * means all data (after offset).
+ * @param direction
+ * The direction of the packer: receive, transmit or unknown.
+ * @param comment
+ * Packet comment.
+ *
+ * @return
+ * - The pointer to the new mbuf formatted for pcapng_write
+ * - NULL if allocation fails.
+ */
+static inline struct rte_mbuf *
rte_pcapng_copy(uint16_t port_id, uint32_t queue,
const struct rte_mbuf *m, struct rte_mempool *mp,
uint32_t length,
- enum rte_pcapng_direction direction, const char *comment);
+ enum rte_pcapng_direction direction, const char *comment)
+{
+ return rte_pcapng_copy_ts(port_id, queue, m, mp, length, direction,
+ comment, 0);
+}
/**
--
2.47.3
---------------------------------------------------------------------
Intel Technology Poland sp. z o.o.
ul. Slowackiego 173 | 80-298 Gdansk | Sad Rejonowy Gdansk Polnoc | VII Wydzial Gospodarczy Krajowego Rejestru Sadowego - KRS 101882 | NIP 957-07-52-316 | Kapital zakladowy 200.000 PLN.
Spolka oswiadcza, ze posiada status duzego przedsiebiorcy w rozumieniu ustawy z dnia 8 marca 2013 r. o przeciwdzialaniu nadmiernym opoznieniom w transakcjach handlowych.
Ta wiadomosc wraz z zalacznikami jest przeznaczona dla okreslonego adresata i moze zawierac informacje poufne. W razie przypadkowego otrzymania tej wiadomosci, prosimy o powiadomienie nadawcy oraz trwale jej usuniecie; jakiekolwiek przegladanie lub rozpowszechnianie jest zabronione.
This e-mail and any attachments may contain confidential material for the sole use of the intended recipient(s). If you are not the intended recipient, please contact the sender and delete all copies; any review or distribution by others is strictly prohibited.
^ permalink raw reply related [flat|nested] 27+ messages in thread* Re: [PATCH 6/7] pcapng: add user-supplied timestamp support
2026-06-08 16:40 ` [PATCH 6/7] pcapng: add user-supplied timestamp support Dawid Wesierski
@ 2026-06-08 17:09 ` Stephen Hemminger
0 siblings, 0 replies; 27+ messages in thread
From: Stephen Hemminger @ 2026-06-08 17:09 UTC (permalink / raw)
To: Dawid Wesierski
Cc: dev, thomas, david.marchand, vladimir.medvedkin, bruce.richardson,
anatoly.burakov, reshma.pattan, Marek Kasiewicz
On Mon, 8 Jun 2026 12:40:58 -0400
Dawid Wesierski <dawid.wesierski@intel.com> wrote:
> @@ -737,16 +736,6 @@ rte_pcapng_write_packets(rte_pcapng_t *self,
> return -1;
> }
>
> - /*
> - * When data is captured by pcapng_copy the current TSC is stored.
> - * Adjust the value recorded in file to PCAP epoch units.
> - */
> - cycles = (uint64_t)epb->timestamp_hi << 32;
> - cycles += epb->timestamp_lo;
> - timestamp = tsc_to_ns_epoch(&self->clock, cycles);
> - epb->timestamp_hi = timestamp >> 32;
> - epb->timestamp_lo = (uint32_t)timestamp;
> -
You can't generate valid pcapng timestamps without this.
^ permalink raw reply [flat|nested] 27+ messages in thread
* [PATCH 7/7] net/ice: add header split mbuf callback support
2026-06-08 16:40 [PATCH 0/7] intel network and pcapng updates Dawid Wesierski
` (5 preceding siblings ...)
2026-06-08 16:40 ` [PATCH 6/7] pcapng: add user-supplied timestamp support Dawid Wesierski
@ 2026-06-08 16:40 ` Dawid Wesierski
2026-06-08 16:59 ` [PATCH 0/7] intel network and pcapng updates Thomas Monjalon
` (2 subsequent siblings)
9 siblings, 0 replies; 27+ messages in thread
From: Dawid Wesierski @ 2026-06-08 16:40 UTC (permalink / raw)
To: dev
Cc: thomas, david.marchand, vladimir.medvedkin, bruce.richardson,
anatoly.burakov, reshma.pattan, stephen, Marek Kasiewicz,
Dawid Wesierski
From: Marek Kasiewicz <marek.kasiewicz@intel.com>
Add an ethdev API rte_eth_hdrs_set_mbuf_callback() that allows
applications to register a callback providing custom payload mbufs
for header split RX mode. When registered, the ICE PMD calls this
callback at mbuf allocation points to obtain user-provided payload
buffers instead of allocating from the mempool.
This enables zero-copy RX for header split: the NIC DMAs the payload
directly into application-managed buffers (e.g., mapped frame buffers
with known IOVA), bypassing an extra memcpy from the mempool mbuf.
The callback is invoked at three allocation points in the ICE driver:
initial queue setup, bulk buffer allocation, and single-packet
receive path.
Signed-off-by: Marek Kasiewicz <marek.kasiewicz@intel.com>
Signed-off-by: Dawid Wesierski <dawid.wesierski@intel.com>
---
drivers/net/intel/common/rx.h | 2 +
drivers/net/intel/ice/ice_ethdev.c | 1 +
drivers/net/intel/ice/ice_rxtx.c | 63 ++++++++++++++++++++++++++++++
drivers/net/intel/ice/ice_rxtx.h | 2 +
lib/ethdev/ethdev_driver.h | 10 +++++
lib/ethdev/rte_ethdev.c | 17 ++++++++
lib/ethdev/rte_ethdev.h | 46 ++++++++++++++++++++++
7 files changed, 141 insertions(+)
diff --git a/drivers/net/intel/common/rx.h b/drivers/net/intel/common/rx.h
index e0bf520ebd..8abb2a3ce9 100644
--- a/drivers/net/intel/common/rx.h
+++ b/drivers/net/intel/common/rx.h
@@ -113,6 +113,8 @@ struct ci_rx_queue {
uint32_t hw_time_low; /* low 32 bits of timestamp */
int ts_offset; /* dynamic mbuf timestamp field offset */
uint64_t ts_flag; /* dynamic mbuf timestamp flag */
+ rte_eth_hdrs_mbuf_callback_fn hdrs_mbuf_cb; /* hdr split mbuf cb */
+ void *hdrs_mbuf_cb_priv; /* hdr split mbuf cb priv */
};
struct { /* iavf specific values */
const struct iavf_rxq_ops *ops; /**< queue ops */
diff --git a/drivers/net/intel/ice/ice_ethdev.c b/drivers/net/intel/ice/ice_ethdev.c
index b7cea3bfc1..fb15438dbc 100644
--- a/drivers/net/intel/ice/ice_ethdev.c
+++ b/drivers/net/intel/ice/ice_ethdev.c
@@ -282,6 +282,7 @@ static const struct eth_dev_ops ice_eth_dev_ops = {
.dev_set_link_down = ice_dev_set_link_down,
.dev_led_on = ice_dev_led_on,
.dev_led_off = ice_dev_led_off,
+ .hdrs_mbuf_set_cb = ice_hdrs_mbuf_set_cb,
.rx_queue_start = ice_rx_queue_start,
.rx_queue_stop = ice_rx_queue_stop,
.tx_queue_start = ice_tx_queue_start,
diff --git a/drivers/net/intel/ice/ice_rxtx.c b/drivers/net/intel/ice/ice_rxtx.c
index 8d709125f7..867f595291 100644
--- a/drivers/net/intel/ice/ice_rxtx.c
+++ b/drivers/net/intel/ice/ice_rxtx.c
@@ -487,6 +487,17 @@ ice_alloc_rx_queue_mbufs(struct ci_rx_queue *rxq)
return -ENOMEM;
}
+ if (rxq->hdrs_mbuf_cb) {
+ struct rte_eth_hdrs_mbuf hdrs_mbuf = {0};
+ int ret = rxq->hdrs_mbuf_cb(rxq->hdrs_mbuf_cb_priv,
+ &hdrs_mbuf);
+
+ if (ret >= 0) {
+ mbuf_pay->buf_addr = hdrs_mbuf.buf_addr;
+ mbuf_pay->buf_iova = hdrs_mbuf.buf_iova;
+ }
+ }
+
mbuf_pay->next = NULL;
mbuf_pay->data_off = RTE_PKTMBUF_HEADROOM;
mbuf_pay->nb_segs = 1;
@@ -2126,6 +2137,16 @@ ice_rx_alloc_bufs(struct ci_rx_queue *rxq)
rxdp[i].read.pkt_addr = dma_addr;
} else {
mb->next = rxq->sw_split_buf[i].mbuf;
+ if (rxq->hdrs_mbuf_cb && mb->next) {
+ struct rte_eth_hdrs_mbuf hdrs_mbuf = {0};
+ int ret = rxq->hdrs_mbuf_cb(rxq->hdrs_mbuf_cb_priv,
+ &hdrs_mbuf);
+
+ if (ret >= 0) {
+ mb->next->buf_addr = hdrs_mbuf.buf_addr;
+ mb->next->buf_iova = hdrs_mbuf.buf_iova;
+ }
+ }
pay_addr = rte_cpu_to_le_64(rte_mbuf_data_iova_default(mb->next));
rxdp[i].read.hdr_addr = dma_addr;
rxdp[i].read.pkt_addr = pay_addr;
@@ -2810,6 +2831,17 @@ ice_recv_pkts(void *rx_queue,
break;
}
+ if (rxq->hdrs_mbuf_cb) {
+ struct rte_eth_hdrs_mbuf hdrs_mbuf = {0};
+ int ret = rxq->hdrs_mbuf_cb(rxq->hdrs_mbuf_cb_priv,
+ &hdrs_mbuf);
+
+ if (ret >= 0) {
+ nmb_pay->buf_addr = hdrs_mbuf.buf_addr;
+ nmb_pay->buf_iova = hdrs_mbuf.buf_iova;
+ }
+ }
+
nmb->next = nmb_pay;
nmb_pay->next = NULL;
@@ -4533,3 +4565,34 @@ ice_fdir_programming(struct ice_pf *pf, struct ice_fltr_desc *fdir_desc)
}
+
+int
+ice_hdrs_mbuf_set_cb(struct rte_eth_dev *dev, uint16_t rx_queue_id,
+ void *priv, rte_eth_hdrs_mbuf_callback_fn cb)
+{
+ struct ci_rx_queue *rxq;
+
+ if (rx_queue_id >= dev->data->nb_rx_queues) {
+ PMD_DRV_LOG(ERR, "RX queue %u out of range", rx_queue_id);
+ return -EINVAL;
+ }
+
+ rxq = dev->data->rx_queues[rx_queue_id];
+ if (rxq == NULL) {
+ PMD_DRV_LOG(ERR, "RX queue %u not available or setup", rx_queue_id);
+ return -EINVAL;
+ }
+
+ if (rxq->hdrs_mbuf_cb) {
+ PMD_DRV_LOG(ERR, "RX queue %u has hdrs mbuf cb already",
+ rx_queue_id);
+ return -EEXIST;
+ }
+
+ rxq->hdrs_mbuf_cb_priv = priv;
+ rxq->hdrs_mbuf_cb = cb;
+ PMD_DRV_LOG(NOTICE, "RX queue %u register hdrs mbuf cb at %p",
+ rx_queue_id, cb);
+
+ return 0;
+}
diff --git a/drivers/net/intel/ice/ice_rxtx.h b/drivers/net/intel/ice/ice_rxtx.h
index 999b6b30d6..7ed114ee94 100644
--- a/drivers/net/intel/ice/ice_rxtx.h
+++ b/drivers/net/intel/ice/ice_rxtx.h
@@ -303,6 +303,8 @@ uint16_t ice_xmit_pkts_vec_avx512_offload(void *tx_queue,
int ice_fdir_programming(struct ice_pf *pf, struct ice_fltr_desc *fdir_desc);
int ice_tx_done_cleanup(void *txq, uint32_t free_cnt);
int ice_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc);
+int ice_hdrs_mbuf_set_cb(struct rte_eth_dev *dev, uint16_t rx_queue_id,
+ void *priv, rte_eth_hdrs_mbuf_callback_fn cb);
enum rte_vect_max_simd ice_get_max_simd_bitwidth(void);
#define FDIR_PARSING_ENABLE_PER_QUEUE(ad, on) do { \
diff --git a/lib/ethdev/ethdev_driver.h b/lib/ethdev/ethdev_driver.h
index 0f336f9567..b48681268c 100644
--- a/lib/ethdev/ethdev_driver.h
+++ b/lib/ethdev/ethdev_driver.h
@@ -1292,6 +1292,13 @@ typedef int (*eth_cman_config_set_t)(struct rte_eth_dev *dev,
typedef int (*eth_cman_config_get_t)(struct rte_eth_dev *dev,
struct rte_eth_cman_config *config);
+/** @internal
+ * Set header split payload mbuf callback for a receive queue.
+ */
+typedef int (*eth_hdrs_mbuf_set_cb_t)(struct rte_eth_dev *dev,
+ uint16_t rx_queue_id, void *priv,
+ rte_eth_hdrs_mbuf_callback_fn cb);
+
/**
* @internal
* Dump Rx descriptor info to a file.
@@ -1652,6 +1659,9 @@ struct eth_dev_ops {
/** Dump Tx descriptor info */
eth_tx_descriptor_dump_t eth_tx_descriptor_dump;
+ /** Set header split mbuf callback */
+ eth_hdrs_mbuf_set_cb_t hdrs_mbuf_set_cb;
+
/** Get congestion management information */
eth_cman_info_get_t cman_info_get;
/** Initialize congestion management structure with default values */
diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
index 9efeaf77cb..d5820ccd22 100644
--- a/lib/ethdev/rte_ethdev.c
+++ b/lib/ethdev/rte_ethdev.c
@@ -7316,6 +7316,23 @@ rte_eth_ip_reassembly_conf_set(uint16_t port_id,
return ret;
}
+RTE_EXPORT_EXPERIMENTAL_SYMBOL(rte_eth_hdrs_set_mbuf_callback, 26.07)
+int
+rte_eth_hdrs_set_mbuf_callback(uint16_t port_id, uint16_t rx_queue_id,
+ void *priv, rte_eth_hdrs_mbuf_callback_fn cb)
+{
+ struct rte_eth_dev *dev;
+
+ RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
+ dev = &rte_eth_devices[port_id];
+
+ if (dev->dev_ops->hdrs_mbuf_set_cb == NULL)
+ return -ENOTSUP;
+
+ return eth_err(port_id,
+ dev->dev_ops->hdrs_mbuf_set_cb(dev, rx_queue_id, priv, cb));
+}
+
RTE_EXPORT_EXPERIMENTAL_SYMBOL(rte_eth_dev_priv_dump, 22.03)
int
rte_eth_dev_priv_dump(uint16_t port_id, FILE *file)
diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
index ee400b386f..dbf2c23a35 100644
--- a/lib/ethdev/rte_ethdev.h
+++ b/lib/ethdev/rte_ethdev.h
@@ -6985,6 +6985,52 @@ rte_eth_tx_buffer(uint16_t port_id, uint16_t queue_id,
return rte_eth_tx_buffer_flush(port_id, queue_id, buffer);
}
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice.
+ *
+ * Buffer descriptor for header split payload mbuf callback.
+ */
+struct rte_eth_hdrs_mbuf {
+ void *buf_addr; /**< Virtual address of payload buffer. */
+ rte_iova_t buf_iova; /**< IOVA of payload buffer. */
+};
+
+/**
+ * Callback function type for providing custom payload mbufs
+ * in header split mode.
+ *
+ * @param priv
+ * User-provided private context.
+ * @param mbuf
+ * Pointer to buffer descriptor to be filled by the callback.
+ * @return
+ * 0 on success, negative errno on failure.
+ */
+typedef int (*rte_eth_hdrs_mbuf_callback_fn)(void *priv,
+ struct rte_eth_hdrs_mbuf *mbuf);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice.
+ *
+ * Register a callback to provide custom payload mbufs for header split RX.
+ *
+ * @param port_id
+ * The port identifier of the Ethernet device.
+ * @param rx_queue_id
+ * The index of the receive queue.
+ * @param priv
+ * User-provided private context passed to the callback.
+ * @param cb
+ * Callback function that provides payload buffer descriptors.
+ * @return
+ * 0 on success, negative errno on failure.
+ */
+__rte_experimental
+int rte_eth_hdrs_set_mbuf_callback(uint16_t port_id, uint16_t rx_queue_id,
+ void *priv, rte_eth_hdrs_mbuf_callback_fn cb);
+
/**
* @warning
* @b EXPERIMENTAL: this API may change, or be removed, without prior notice
--
2.47.3
---------------------------------------------------------------------
Intel Technology Poland sp. z o.o.
ul. Slowackiego 173 | 80-298 Gdansk | Sad Rejonowy Gdansk Polnoc | VII Wydzial Gospodarczy Krajowego Rejestru Sadowego - KRS 101882 | NIP 957-07-52-316 | Kapital zakladowy 200.000 PLN.
Spolka oswiadcza, ze posiada status duzego przedsiebiorcy w rozumieniu ustawy z dnia 8 marca 2013 r. o przeciwdzialaniu nadmiernym opoznieniom w transakcjach handlowych.
Ta wiadomosc wraz z zalacznikami jest przeznaczona dla okreslonego adresata i moze zawierac informacje poufne. W razie przypadkowego otrzymania tej wiadomosci, prosimy o powiadomienie nadawcy oraz trwale jej usuniecie; jakiekolwiek przegladanie lub rozpowszechnianie jest zabronione.
This e-mail and any attachments may contain confidential material for the sole use of the intended recipient(s). If you are not the intended recipient, please contact the sender and delete all copies; any review or distribution by others is strictly prohibited.
^ permalink raw reply related [flat|nested] 27+ messages in thread* Re: [PATCH 0/7] intel network and pcapng updates
2026-06-08 16:40 [PATCH 0/7] intel network and pcapng updates Dawid Wesierski
` (6 preceding siblings ...)
2026-06-08 16:40 ` [PATCH 7/7] net/ice: add header split mbuf callback support Dawid Wesierski
@ 2026-06-08 16:59 ` Thomas Monjalon
2026-06-18 14:44 ` [PATCH v3 1/1] pcapng: add user-supplied timestamp support Dawid Wesierski
2026-06-18 14:44 ` [PATCH v2 0/7] Intel network drivers enhancements Dawid Wesierski
9 siblings, 0 replies; 27+ messages in thread
From: Thomas Monjalon @ 2026-06-08 16:59 UTC (permalink / raw)
To: Dawid Wesierski
Cc: dev, david.marchand, vladimir.medvedkin, bruce.richardson,
anatoly.burakov, reshma.pattan, stephen, Wesierski, Dawid
08/06/2026 18:40, Dawid Wesierski:
> From: "Wesierski, Dawid" <dawid.wesierski@intel.com>
>
> These patches provide various updates for Intel iavf/ice drivers and pcapng.
Please would you mind sending the pcapng changes in a separate series?
Thank you
^ permalink raw reply [flat|nested] 27+ messages in thread* [PATCH v3 1/1] pcapng: add user-supplied timestamp support
2026-06-08 16:40 [PATCH 0/7] intel network and pcapng updates Dawid Wesierski
` (7 preceding siblings ...)
2026-06-08 16:59 ` [PATCH 0/7] intel network and pcapng updates Thomas Monjalon
@ 2026-06-18 14:44 ` Dawid Wesierski
2026-06-18 15:20 ` Stephen Hemminger
2026-06-18 14:44 ` [PATCH v2 0/7] Intel network drivers enhancements Dawid Wesierski
9 siblings, 1 reply; 27+ messages in thread
From: Dawid Wesierski @ 2026-06-18 14:44 UTC (permalink / raw)
To: dev; +Cc: stephen, thomas, Marek Kasiewicz, Dawid Wesierski
From: Marek Kasiewicz <marek.kasiewicz@intel.com>
Add rte_pcapng_copy_ts() which accepts a timestamp parameter in
nanoseconds since the Unix epoch. When non-zero, the supplied value is
used directly. This allows applications to provide hardware PTP
timestamps from the NIC, enabling accurate packet capture with
PTP-domain timing rather than host-local TSC values.
The existing rte_pcapng_copy() function is preserved as a static
inline wrapper that passes zero, keeping the original TSC-based
behaviour for callers that do not have a hardware timestamp.
To support both timestamp sources, the per-mbuf timestamp now carries
a sentinel bit: when rte_pcapng_copy_ts() is called with ts == 0 it
stores the current TSC with bit 63 set. rte_pcapng_write_packets()
detects the sentinel, clears it and converts TSC -> epoch ns using
the per-file clock before writing. A timestamp supplied by the caller
has bit 63 clear and is written unchanged. The sentinel space is safe
because the TSC counter does not reach bit 63 for centuries and
epoch-ns values stay below bit 63 until the year 2554.
Signed-off-by: Marek Kasiewicz <marek.kasiewicz@intel.com>
Signed-off-by: Dawid Wesierski <dawid.wesierski@intel.com>
---
.mailmap | 2 ++
lib/pcapng/rte_pcapng.c | 42 +++++++++++++++++++++++++++-------------
lib/pcapng/rte_pcapng.h | 43 +++++++++++++++++++++++++++++++++++++++--
3 files changed, 72 insertions(+), 15 deletions(-)
diff --git a/.mailmap b/.mailmap
index 4001e5fb0e..a7d97a631e 100644
--- a/.mailmap
+++ b/.mailmap
@@ -366,6 +366,7 @@ David Zeng <zengxhsh@cn.ibm.com>
Davide Caratti <dcaratti@redhat.com>
Dawid Gorecki <dgr@semihalf.com>
Dawid Jurczak <dawid_jurek@vp.pl>
+Dawid Wesierski <dawid.wesierski@intel.com> Wesierski, Dawid <dawid.wesierski@intel.com>
Dawid Zielinski <dawid.zielinski@intel.com>
Dawid Łukwiński <dawid.lukwinski@intel.com>
Daxue Gao <daxuex.gao@intel.com>
@@ -1014,6 +1015,7 @@ Marcin Wilk <marcin.wilk@caviumnetworks.com>
Marcin Wojtas <mw@semihalf.com>
Marcin Zapolski <marcinx.a.zapolski@intel.com>
Marco Varlese <mvarlese@suse.de>
+Marek Kasiewicz <marek.kasiewicz@intel.com>
Marek Mical <marekx.mical@intel.com>
Marek Zalfresso-jundzillo <marekx.zalfresso-jundzillo@intel.com>
Maria Lingemark <maria.lingemark@ericsson.com>
diff --git a/lib/pcapng/rte_pcapng.c b/lib/pcapng/rte_pcapng.c
index b5d1026891..29090a2ae4 100644
--- a/lib/pcapng/rte_pcapng.c
+++ b/lib/pcapng/rte_pcapng.c
@@ -546,14 +546,14 @@ pcapng_vlan_insert(struct rte_mbuf *m, uint16_t ether_type, uint16_t tci)
*/
/* Make a copy of original mbuf with pcapng header and options */
-RTE_EXPORT_SYMBOL(rte_pcapng_copy)
+RTE_EXPORT_SYMBOL(rte_pcapng_copy_ts)
struct rte_mbuf *
-rte_pcapng_copy(uint16_t port_id, uint32_t queue,
+rte_pcapng_copy_ts(uint16_t port_id, uint32_t queue,
const struct rte_mbuf *md,
struct rte_mempool *mp,
uint32_t length,
enum rte_pcapng_direction direction,
- const char *comment)
+ const char *comment, uint64_t ts)
{
struct pcapng_enhance_packet_block *epb;
uint32_t orig_len, pkt_len, padding, flags;
@@ -690,8 +690,20 @@ rte_pcapng_copy(uint16_t port_id, uint32_t queue,
/* Interface index is filled in later during write */
mc->port = port_id;
- /* Put timestamp in cycles here - adjust in packet write */
- timestamp = rte_get_tsc_cycles();
+ /*
+ * Timestamp handling:
+ * - If the caller supplied an explicit timestamp (ts != 0), it is
+ * already in nanoseconds since the Unix epoch, so store it as-is.
+ * - If the caller did not (ts == 0), store the current TSC and set
+ * the high bit as a sentinel so rte_pcapng_write_packets() knows
+ * it must convert TSC -> epoch ns at write time. The TSC counter
+ * will not reach bit 63 for centuries, and epoch-ns values stay
+ * below bit 63 until the year 2554, so the bit is safe to use.
+ */
+ if (ts != 0)
+ timestamp = ts;
+ else
+ timestamp = rte_get_tsc_cycles() | (UINT64_C(1) << 63);
epb->timestamp_hi = timestamp >> 32;
epb->timestamp_lo = (uint32_t)timestamp;
epb->capture_length = pkt_len;
@@ -720,7 +732,7 @@ rte_pcapng_write_packets(rte_pcapng_t *self,
for (i = 0; i < nb_pkts; i++) {
struct rte_mbuf *m = pkts[i];
struct pcapng_enhance_packet_block *epb;
- uint64_t cycles, timestamp;
+ uint64_t timestamp;
/* sanity check that is really a pcapng mbuf */
epb = rte_pktmbuf_mtod(m, struct pcapng_enhance_packet_block *);
@@ -738,14 +750,18 @@ rte_pcapng_write_packets(rte_pcapng_t *self,
}
/*
- * When data is captured by pcapng_copy the current TSC is stored.
- * Adjust the value recorded in file to PCAP epoch units.
+ * If rte_pcapng_copy[_ts]() stored a TSC value (high bit set
+ * as sentinel), convert it to nanoseconds since the Unix epoch
+ * using the per-file clock. Otherwise the timestamp is already
+ * in epoch ns and is written unchanged.
*/
- cycles = (uint64_t)epb->timestamp_hi << 32;
- cycles += epb->timestamp_lo;
- timestamp = tsc_to_ns_epoch(&self->clock, cycles);
- epb->timestamp_hi = timestamp >> 32;
- epb->timestamp_lo = (uint32_t)timestamp;
+ timestamp = ((uint64_t)epb->timestamp_hi << 32) | epb->timestamp_lo;
+ if (timestamp & (UINT64_C(1) << 63)) {
+ timestamp &= ~(UINT64_C(1) << 63);
+ timestamp = tsc_to_ns_epoch(&self->clock, timestamp);
+ epb->timestamp_hi = timestamp >> 32;
+ epb->timestamp_lo = (uint32_t)timestamp;
+ }
/*
* Handle case of highly fragmented and large burst size
diff --git a/lib/pcapng/rte_pcapng.h b/lib/pcapng/rte_pcapng.h
index d8d328f710..975e7996f0 100644
--- a/lib/pcapng/rte_pcapng.h
+++ b/lib/pcapng/rte_pcapng.h
@@ -109,7 +109,7 @@ enum rte_pcapng_direction {
};
/**
- * Format an mbuf for writing to file.
+ * Format an mbuf with time stamp for writing to file.
*
* @param port_id
* The Ethernet port on which packet was received
@@ -129,16 +129,55 @@ enum rte_pcapng_direction {
* @param comment
* Optional per packet comment.
* Truncated to UINT16_MAX characters.
+ * @param ts
+ * Packet timestamp in nanoseconds since the Unix epoch. If zero, the
+ * current TSC is captured and converted to epoch ns by
+ * rte_pcapng_write_packets() when the packet is written.
*
* @return
* - The pointer to the new mbuf formatted for pcapng_write
* - NULL on error such as invalid port or out of memory.
*/
struct rte_mbuf *
+rte_pcapng_copy_ts(uint16_t port_id, uint32_t queue,
+ const struct rte_mbuf *m, struct rte_mempool *mp,
+ uint32_t length,
+ enum rte_pcapng_direction direction, const char *comment, uint64_t ts);
+
+/**
+ * Format an mbuf for writing to file.
+ *
+ * @param port_id
+ * The Ethernet port on which packet was received
+ * or is going to be transmitted.
+ * @param queue
+ * The queue on the Ethernet port where packet was received
+ * or is going to be transmitted.
+ * @param mp
+ * The mempool from which the "clone" mbufs are allocated.
+ * @param m
+ * The mbuf to copy
+ * @param length
+ * The upper limit on bytes to copy. Passing UINT32_MAX
+ * means all data (after offset).
+ * @param direction
+ * The direction of the packer: receive, transmit or unknown.
+ * @param comment
+ * Packet comment.
+ *
+ * @return
+ * - The pointer to the new mbuf formatted for pcapng_write
+ * - NULL if allocation fails.
+ */
+static inline struct rte_mbuf *
rte_pcapng_copy(uint16_t port_id, uint32_t queue,
const struct rte_mbuf *m, struct rte_mempool *mp,
uint32_t length,
- enum rte_pcapng_direction direction, const char *comment);
+ enum rte_pcapng_direction direction, const char *comment)
+{
+ return rte_pcapng_copy_ts(port_id, queue, m, mp, length, direction,
+ comment, 0);
+}
/**
--
2.47.3
---------------------------------------------------------------------
Intel Technology Poland sp. z o.o.
ul. Slowackiego 173 | 80-298 Gdansk | Sad Rejonowy Gdansk Polnoc | VII Wydzial Gospodarczy Krajowego Rejestru Sadowego - KRS 101882 | NIP 957-07-52-316 | Kapital zakladowy 200.000 PLN.
Spolka oswiadcza, ze posiada status duzego przedsiebiorcy w rozumieniu ustawy z dnia 8 marca 2013 r. o przeciwdzialaniu nadmiernym opoznieniom w transakcjach handlowych.
Ta wiadomosc wraz z zalacznikami jest przeznaczona dla okreslonego adresata i moze zawierac informacje poufne. W razie przypadkowego otrzymania tej wiadomosci, prosimy o powiadomienie nadawcy oraz trwale jej usuniecie; jakiekolwiek przegladanie lub rozpowszechnianie jest zabronione.
This e-mail and any attachments may contain confidential material for the sole use of the intended recipient(s). If you are not the intended recipient, please contact the sender and delete all copies; any review or distribution by others is strictly prohibited.
^ permalink raw reply related [flat|nested] 27+ messages in thread* Re: [PATCH v3 1/1] pcapng: add user-supplied timestamp support
2026-06-18 14:44 ` [PATCH v3 1/1] pcapng: add user-supplied timestamp support Dawid Wesierski
@ 2026-06-18 15:20 ` Stephen Hemminger
0 siblings, 0 replies; 27+ messages in thread
From: Stephen Hemminger @ 2026-06-18 15:20 UTC (permalink / raw)
To: Dawid Wesierski; +Cc: dev, thomas, Marek Kasiewicz
On Thu, 18 Jun 2026 10:44:29 -0400
Dawid Wesierski <dawid.wesierski@intel.com> wrote:
> +static inline struct rte_mbuf *
> rte_pcapng_copy(uint16_t port_id, uint32_t queue,
> const struct rte_mbuf *m, struct rte_mempool *mp,
> uint32_t length,
> - enum rte_pcapng_direction direction, const char *comment);
> + enum rte_pcapng_direction direction, const char *comment)
> +{
> + return rte_pcapng_copy_ts(port_id, queue, m, mp, length, direction,
> + comment, 0);
> +}
>
Switching from function to inline in header would cause ABI breakage.
New build would not have old function to runtime linking.
In this case, please just keep the old function name but add
a parameter using function versioning.
^ permalink raw reply [flat|nested] 27+ messages in thread
* [PATCH v2 0/7] Intel network drivers enhancements
2026-06-08 16:40 [PATCH 0/7] intel network and pcapng updates Dawid Wesierski
` (8 preceding siblings ...)
2026-06-18 14:44 ` [PATCH v3 1/1] pcapng: add user-supplied timestamp support Dawid Wesierski
@ 2026-06-18 14:44 ` Dawid Wesierski
2026-06-18 14:44 ` [PATCH v2 1/7] ethdev: add header split mbuf callback API Dawid Wesierski
` (8 more replies)
9 siblings, 9 replies; 27+ messages in thread
From: Dawid Wesierski @ 2026-06-18 14:44 UTC (permalink / raw)
To: dev; +Cc: thomas, david.marchand, Marek Kasiewicz
From: Marek Kasiewicz <marek.kasiewicz@intel.com>
This series introduces several improvements to Intel iavf and ice
drivers, including a new ethdev API for header split mbuf callbacks,
increased ring descriptors, and improved PTP timestamping.
Marek Kasiewicz (7):
ethdev: add header split mbuf callback API
net/iavf: increase max ring descriptors to hardware limit
net/iavf: allow runtime queue rate limit configuration
net/ice/base: reduce default scheduler burst size
net/ice: timestamp all received packets when PTP is enabled
net/iavf: disable runtime queue setup capability
net/intel: support header split mbuf callback
drivers/net/intel/common/rx.h | 2 +
drivers/net/intel/iavf/iavf_ethdev.c | 3 --
drivers/net/intel/iavf/iavf_rxtx.h | 2 +-
drivers/net/intel/iavf/iavf_tm.c | 11 ++--
drivers/net/intel/ice/base/ice_type.h | 2 +-
drivers/net/intel/ice/ice_ethdev.c | 1 +
drivers/net/intel/ice/ice_rxtx.c | 72 ++++++++++++++++++++++++---
drivers/net/intel/ice/ice_rxtx.h | 2 +
lib/ethdev/ethdev_driver.h | 15 +++++++
lib/ethdev/rte_ethdev.c | 51 ++++++++++++++++++++++
lib/ethdev/rte_ethdev.h | 7 +++
11 files changed, 153 insertions(+), 15 deletions(-)
--
2.47.3
---------------------------------------------------------------------
Intel Technology Poland sp. z o.o.
ul. Slowackiego 173 | 80-298 Gdansk | Sad Rejonowy Gdansk Polnoc | VII Wydzial Gospodarczy Krajowego Rejestru Sadowego - KRS 101882 | NIP 957-07-52-316 | Kapital zakladowy 200.000 PLN.
Spolka oswiadcza, ze posiada status duzego przedsiebiorcy w rozumieniu ustawy z dnia 8 marca 2013 r. o przeciwdzialaniu nadmiernym opoznieniom w transakcjach handlowych.
Ta wiadomosc wraz z zalacznikami jest przeznaczona dla okreslonego adresata i moze zawierac informacje poufne. W razie przypadkowego otrzymania tej wiadomosci, prosimy o powiadomienie nadawcy oraz trwale jej usuniecie; jakiekolwiek przegladanie lub rozpowszechnianie jest zabronione.
This e-mail and any attachments may contain confidential material for the sole use of the intended recipient(s). If you are not the intended recipient, please contact the sender and delete all copies; any review or distribution by others is strictly prohibited.
^ permalink raw reply [flat|nested] 27+ messages in thread* [PATCH v2 1/7] ethdev: add header split mbuf callback API
2026-06-18 14:44 ` [PATCH v2 0/7] Intel network drivers enhancements Dawid Wesierski
@ 2026-06-18 14:44 ` Dawid Wesierski
2026-06-18 16:26 ` Thomas Monjalon
2026-06-18 14:44 ` [PATCH v2 2/7] net/iavf: increase max ring descriptors to hardware limit Dawid Wesierski
` (7 subsequent siblings)
8 siblings, 1 reply; 27+ messages in thread
From: Dawid Wesierski @ 2026-06-18 14:44 UTC (permalink / raw)
To: dev; +Cc: thomas, david.marchand, Marek Kasiewicz, Dawid Wesierski
From: Marek Kasiewicz <marek.kasiewicz@intel.com>
Add rte_eth_hdrs_set_mbuf_callback() that allows applications to
register a callback providing custom payload mbufs for header split RX
mode. When registered, a PMD that supports header split is expected to
call this callback at mbuf allocation points to obtain user-provided
payload buffers instead of allocating from the mempool.
This enables zero-copy RX for header split: the NIC DMAs the payload
directly into application-managed buffers (e.g., mapped frame buffers
with known IOVA), bypassing an extra memcpy from the mempool mbuf.
A new struct rte_eth_hdrs_mbuf describes the payload buffer (virtual
address and IOVA), and the new dev_ops hook hdrs_mbuf_set_cb lets each
PMD wire the callback to its receive queue state.
The API is marked experimental and exported with version 26.07.
Signed-off-by: Marek Kasiewicz <marek.kasiewicz@intel.com>
Signed-off-by: Dawid Wesierski <dawid.wesierski@intel.com>
---
lib/ethdev/ethdev_driver.h | 10 +++++++++
lib/ethdev/rte_ethdev.c | 17 ++++++++++++++
lib/ethdev/rte_ethdev.h | 46 ++++++++++++++++++++++++++++++++++++++
3 files changed, 73 insertions(+)
diff --git a/lib/ethdev/ethdev_driver.h b/lib/ethdev/ethdev_driver.h
index 0f336f9567..b48681268c 100644
--- a/lib/ethdev/ethdev_driver.h
+++ b/lib/ethdev/ethdev_driver.h
@@ -1292,6 +1292,13 @@ typedef int (*eth_cman_config_set_t)(struct rte_eth_dev *dev,
typedef int (*eth_cman_config_get_t)(struct rte_eth_dev *dev,
struct rte_eth_cman_config *config);
+/** @internal
+ * Set header split payload mbuf callback for a receive queue.
+ */
+typedef int (*eth_hdrs_mbuf_set_cb_t)(struct rte_eth_dev *dev,
+ uint16_t rx_queue_id, void *priv,
+ rte_eth_hdrs_mbuf_callback_fn cb);
+
/**
* @internal
* Dump Rx descriptor info to a file.
@@ -1652,6 +1659,9 @@ struct eth_dev_ops {
/** Dump Tx descriptor info */
eth_tx_descriptor_dump_t eth_tx_descriptor_dump;
+ /** Set header split mbuf callback */
+ eth_hdrs_mbuf_set_cb_t hdrs_mbuf_set_cb;
+
/** Get congestion management information */
eth_cman_info_get_t cman_info_get;
/** Initialize congestion management structure with default values */
diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
index 9efeaf77cb..d5820ccd22 100644
--- a/lib/ethdev/rte_ethdev.c
+++ b/lib/ethdev/rte_ethdev.c
@@ -7316,6 +7316,23 @@ rte_eth_ip_reassembly_conf_set(uint16_t port_id,
return ret;
}
+RTE_EXPORT_EXPERIMENTAL_SYMBOL(rte_eth_hdrs_set_mbuf_callback, 26.07)
+int
+rte_eth_hdrs_set_mbuf_callback(uint16_t port_id, uint16_t rx_queue_id,
+ void *priv, rte_eth_hdrs_mbuf_callback_fn cb)
+{
+ struct rte_eth_dev *dev;
+
+ RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
+ dev = &rte_eth_devices[port_id];
+
+ if (dev->dev_ops->hdrs_mbuf_set_cb == NULL)
+ return -ENOTSUP;
+
+ return eth_err(port_id,
+ dev->dev_ops->hdrs_mbuf_set_cb(dev, rx_queue_id, priv, cb));
+}
+
RTE_EXPORT_EXPERIMENTAL_SYMBOL(rte_eth_dev_priv_dump, 22.03)
int
rte_eth_dev_priv_dump(uint16_t port_id, FILE *file)
diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
index ee400b386f..dbf2c23a35 100644
--- a/lib/ethdev/rte_ethdev.h
+++ b/lib/ethdev/rte_ethdev.h
@@ -6985,6 +6985,52 @@ rte_eth_tx_buffer(uint16_t port_id, uint16_t queue_id,
return rte_eth_tx_buffer_flush(port_id, queue_id, buffer);
}
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice.
+ *
+ * Buffer descriptor for header split payload mbuf callback.
+ */
+struct rte_eth_hdrs_mbuf {
+ void *buf_addr; /**< Virtual address of payload buffer. */
+ rte_iova_t buf_iova; /**< IOVA of payload buffer. */
+};
+
+/**
+ * Callback function type for providing custom payload mbufs
+ * in header split mode.
+ *
+ * @param priv
+ * User-provided private context.
+ * @param mbuf
+ * Pointer to buffer descriptor to be filled by the callback.
+ * @return
+ * 0 on success, negative errno on failure.
+ */
+typedef int (*rte_eth_hdrs_mbuf_callback_fn)(void *priv,
+ struct rte_eth_hdrs_mbuf *mbuf);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice.
+ *
+ * Register a callback to provide custom payload mbufs for header split RX.
+ *
+ * @param port_id
+ * The port identifier of the Ethernet device.
+ * @param rx_queue_id
+ * The index of the receive queue.
+ * @param priv
+ * User-provided private context passed to the callback.
+ * @param cb
+ * Callback function that provides payload buffer descriptors.
+ * @return
+ * 0 on success, negative errno on failure.
+ */
+__rte_experimental
+int rte_eth_hdrs_set_mbuf_callback(uint16_t port_id, uint16_t rx_queue_id,
+ void *priv, rte_eth_hdrs_mbuf_callback_fn cb);
+
/**
* @warning
* @b EXPERIMENTAL: this API may change, or be removed, without prior notice
--
2.47.3
---------------------------------------------------------------------
Intel Technology Poland sp. z o.o.
ul. Slowackiego 173 | 80-298 Gdansk | Sad Rejonowy Gdansk Polnoc | VII Wydzial Gospodarczy Krajowego Rejestru Sadowego - KRS 101882 | NIP 957-07-52-316 | Kapital zakladowy 200.000 PLN.
Spolka oswiadcza, ze posiada status duzego przedsiebiorcy w rozumieniu ustawy z dnia 8 marca 2013 r. o przeciwdzialaniu nadmiernym opoznieniom w transakcjach handlowych.
Ta wiadomosc wraz z zalacznikami jest przeznaczona dla okreslonego adresata i moze zawierac informacje poufne. W razie przypadkowego otrzymania tej wiadomosci, prosimy o powiadomienie nadawcy oraz trwale jej usuniecie; jakiekolwiek przegladanie lub rozpowszechnianie jest zabronione.
This e-mail and any attachments may contain confidential material for the sole use of the intended recipient(s). If you are not the intended recipient, please contact the sender and delete all copies; any review or distribution by others is strictly prohibited.
^ permalink raw reply related [flat|nested] 27+ messages in thread* Re: [PATCH v2 1/7] ethdev: add header split mbuf callback API
2026-06-18 14:44 ` [PATCH v2 1/7] ethdev: add header split mbuf callback API Dawid Wesierski
@ 2026-06-18 16:26 ` Thomas Monjalon
0 siblings, 0 replies; 27+ messages in thread
From: Thomas Monjalon @ 2026-06-18 16:26 UTC (permalink / raw)
To: Marek Kasiewicz, Dawid Wesierski
Cc: dev, david.marchand, Stephen Hemminger, andrew.rybchenko
Hello,
Please use --cc-cmd devtools/get-maintainer.sh so maintainers are Cc'ed.
18/06/2026 16:44, Dawid Wesierski:
> From: Marek Kasiewicz <marek.kasiewicz@intel.com>
>
> Add rte_eth_hdrs_set_mbuf_callback() that allows applications to
> register a callback providing custom payload mbufs for header split RX
> mode. When registered, a PMD that supports header split is expected to
> call this callback at mbuf allocation points to obtain user-provided
> payload buffers instead of allocating from the mempool.
What is not working with the current API?
You can already use an external buffer.
> This enables zero-copy RX for header split: the NIC DMAs the payload
> directly into application-managed buffers (e.g., mapped frame buffers
> with known IOVA), bypassing an extra memcpy from the mempool mbuf.
Do you know the flag RTE_PKTMBUF_POOL_F_PINNED_EXT_BUF?
> A new struct rte_eth_hdrs_mbuf describes the payload buffer (virtual
> address and IOVA), and the new dev_ops hook hdrs_mbuf_set_cb lets each
> PMD wire the callback to its receive queue state.
>
> The API is marked experimental and exported with version 26.07.
I think this API is not necessary.
^ permalink raw reply [flat|nested] 27+ messages in thread
* [PATCH v2 2/7] net/iavf: increase max ring descriptors to hardware limit
2026-06-18 14:44 ` [PATCH v2 0/7] Intel network drivers enhancements Dawid Wesierski
2026-06-18 14:44 ` [PATCH v2 1/7] ethdev: add header split mbuf callback API Dawid Wesierski
@ 2026-06-18 14:44 ` Dawid Wesierski
2026-06-18 14:44 ` [PATCH v2 3/7] net/iavf: allow runtime queue rate limit configuration Dawid Wesierski
` (6 subsequent siblings)
8 siblings, 0 replies; 27+ messages in thread
From: Dawid Wesierski @ 2026-06-18 14:44 UTC (permalink / raw)
To: dev; +Cc: thomas, david.marchand, Marek Kasiewicz, Dawid Wesierski
From: Marek Kasiewicz <marek.kasiewicz@intel.com>
The Intel E810 hardware supports up to 8160 (8K - 32) descriptors per
TX/RX ring, but IAVF_MAX_RING_DESC caps it at 4096. Applications that
need deep descriptor rings for hardware rate-limited pacing (e.g.,
ST2110 video with thousands of packets per frame) cannot queue enough
packets before the pacing epoch begins.
Increase IAVF_MAX_RING_DESC to the hardware maximum of 8160 to allow
full utilization of the ring depth on E810 VFs.
Signed-off-by: Marek Kasiewicz <marek.kasiewicz@intel.com>
Signed-off-by: Dawid Wesierski <dawid.wesierski@intel.com>
---
drivers/net/intel/iavf/iavf_rxtx.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/intel/iavf/iavf_rxtx.h b/drivers/net/intel/iavf/iavf_rxtx.h
index 8449236d4d..22ea415f44 100644
--- a/drivers/net/intel/iavf/iavf_rxtx.h
+++ b/drivers/net/intel/iavf/iavf_rxtx.h
@@ -16,7 +16,7 @@
/* In QLEN must be whole number of 32 descriptors. */
#define IAVF_ALIGN_RING_DESC 32
#define IAVF_MIN_RING_DESC 64
-#define IAVF_MAX_RING_DESC 4096
+#define IAVF_MAX_RING_DESC (8192 - 32)
#define IAVF_DMA_MEM_ALIGN 4096
/* Base address of the HW descriptor ring should be 128B aligned. */
#define IAVF_RING_BASE_ALIGN 128
--
2.47.3
---------------------------------------------------------------------
Intel Technology Poland sp. z o.o.
ul. Slowackiego 173 | 80-298 Gdansk | Sad Rejonowy Gdansk Polnoc | VII Wydzial Gospodarczy Krajowego Rejestru Sadowego - KRS 101882 | NIP 957-07-52-316 | Kapital zakladowy 200.000 PLN.
Spolka oswiadcza, ze posiada status duzego przedsiebiorcy w rozumieniu ustawy z dnia 8 marca 2013 r. o przeciwdzialaniu nadmiernym opoznieniom w transakcjach handlowych.
Ta wiadomosc wraz z zalacznikami jest przeznaczona dla okreslonego adresata i moze zawierac informacje poufne. W razie przypadkowego otrzymania tej wiadomosci, prosimy o powiadomienie nadawcy oraz trwale jej usuniecie; jakiekolwiek przegladanie lub rozpowszechnianie jest zabronione.
This e-mail and any attachments may contain confidential material for the sole use of the intended recipient(s). If you are not the intended recipient, please contact the sender and delete all copies; any review or distribution by others is strictly prohibited.
^ permalink raw reply related [flat|nested] 27+ messages in thread* [PATCH v2 3/7] net/iavf: allow runtime queue rate limit configuration
2026-06-18 14:44 ` [PATCH v2 0/7] Intel network drivers enhancements Dawid Wesierski
2026-06-18 14:44 ` [PATCH v2 1/7] ethdev: add header split mbuf callback API Dawid Wesierski
2026-06-18 14:44 ` [PATCH v2 2/7] net/iavf: increase max ring descriptors to hardware limit Dawid Wesierski
@ 2026-06-18 14:44 ` Dawid Wesierski
2026-06-18 14:44 ` [PATCH v2 4/7] net/ice/base: reduce default scheduler burst size Dawid Wesierski
` (5 subsequent siblings)
8 siblings, 0 replies; 27+ messages in thread
From: Dawid Wesierski @ 2026-06-18 14:44 UTC (permalink / raw)
To: dev; +Cc: thomas, david.marchand, Marek Kasiewicz, Dawid Wesierski
From: Marek Kasiewicz <marek.kasiewicz@intel.com>
Allow per-queue bandwidth rate limiting to be configured without
stopping the port when only a single TC node and single QoS element
are involved. This enables dynamic session management where individual
queue pacing rates can be changed while other queues continue
transmitting.
Also fix the queue ID assignment in the bandwidth configuration to
use the actual TM node ID rather than a sequential counter index, and
only mark the TM hierarchy as committed when the port is stopped to
permit subsequent reconfiguration.
Signed-off-by: Marek Kasiewicz <marek.kasiewicz@intel.com>
Signed-off-by: Dawid Wesierski <dawid.wesierski@intel.com>
---
drivers/net/intel/iavf/iavf_tm.c | 11 +++++++----
1 file changed, 7 insertions(+), 4 deletions(-)
diff --git a/drivers/net/intel/iavf/iavf_tm.c b/drivers/net/intel/iavf/iavf_tm.c
index 1cf7bfb106..43d7a44337 100644
--- a/drivers/net/intel/iavf/iavf_tm.c
+++ b/drivers/net/intel/iavf/iavf_tm.c
@@ -804,8 +804,10 @@ static int iavf_hierarchy_commit(struct rte_eth_dev *dev,
int index = 0, node_committed = 0;
int i, ret_val = IAVF_SUCCESS;
- /* check if port is stopped */
- if (adapter->stopped != 1) {
+ /* check if port is stopped, except for setting queue bandwidth */
+ if (vf->tm_conf.nb_tc_node != 1 &&
+ vf->qos_cap->num_elem != 1 &&
+ adapter->stopped != 1) {
PMD_DRV_LOG(ERR, "Please stop port first");
ret_val = IAVF_ERR_NOT_READY;
goto err;
@@ -856,7 +858,7 @@ static int iavf_hierarchy_commit(struct rte_eth_dev *dev,
q_tc_mapping->tc[tm_node->tc].req.queue_count++;
if (tm_node->shaper_profile) {
- q_bw->cfg[node_committed].queue_id = node_committed;
+ q_bw->cfg[node_committed].queue_id = tm_node->id;
q_bw->cfg[node_committed].shaper.peak =
tm_node->shaper_profile->profile.peak.rate /
1000 * IAVF_BITS_PER_BYTE;
@@ -900,7 +902,8 @@ static int iavf_hierarchy_commit(struct rte_eth_dev *dev,
goto fail_clear;
vf->qtc_map = qtc_map;
- vf->tm_conf.committed = true;
+ if (adapter->stopped == 1)
+ vf->tm_conf.committed = true;
return ret_val;
fail_clear:
--
2.47.3
---------------------------------------------------------------------
Intel Technology Poland sp. z o.o.
ul. Slowackiego 173 | 80-298 Gdansk | Sad Rejonowy Gdansk Polnoc | VII Wydzial Gospodarczy Krajowego Rejestru Sadowego - KRS 101882 | NIP 957-07-52-316 | Kapital zakladowy 200.000 PLN.
Spolka oswiadcza, ze posiada status duzego przedsiebiorcy w rozumieniu ustawy z dnia 8 marca 2013 r. o przeciwdzialaniu nadmiernym opoznieniom w transakcjach handlowych.
Ta wiadomosc wraz z zalacznikami jest przeznaczona dla okreslonego adresata i moze zawierac informacje poufne. W razie przypadkowego otrzymania tej wiadomosci, prosimy o powiadomienie nadawcy oraz trwale jej usuniecie; jakiekolwiek przegladanie lub rozpowszechnianie jest zabronione.
This e-mail and any attachments may contain confidential material for the sole use of the intended recipient(s). If you are not the intended recipient, please contact the sender and delete all copies; any review or distribution by others is strictly prohibited.
^ permalink raw reply related [flat|nested] 27+ messages in thread* [PATCH v2 4/7] net/ice/base: reduce default scheduler burst size
2026-06-18 14:44 ` [PATCH v2 0/7] Intel network drivers enhancements Dawid Wesierski
` (2 preceding siblings ...)
2026-06-18 14:44 ` [PATCH v2 3/7] net/iavf: allow runtime queue rate limit configuration Dawid Wesierski
@ 2026-06-18 14:44 ` Dawid Wesierski
2026-06-18 14:44 ` [PATCH v2 5/7] net/ice: timestamp all received packets when PTP is enabled Dawid Wesierski
` (4 subsequent siblings)
8 siblings, 0 replies; 27+ messages in thread
From: Dawid Wesierski @ 2026-06-18 14:44 UTC (permalink / raw)
To: dev; +Cc: thomas, david.marchand, Marek Kasiewicz, Dawid Wesierski
From: Marek Kasiewicz <marek.kasiewicz@intel.com>
Reduce ICE_SCHED_DFLT_BURST_SIZE from 15 KB to 2 KB to improve
TX rate limiter granularity. The E810 TX scheduler uses a token
bucket algorithm where the burst size controls the maximum bytes
sent in a single burst before the rate limiter throttles.
A 15 KB burst allows micro-bursts of ~10 max-size frames, which
violates tight inter-packet spacing requirements in time-sensitive
networking applications such as SMPTE ST 2110-21 narrow-sender
compliance. Reducing to 2 KB forces near-constant-rate output
matching the configured shaper profile.
Signed-off-by: Marek Kasiewicz <marek.kasiewicz@intel.com>
Signed-off-by: Dawid Wesierski <dawid.wesierski@intel.com>
---
drivers/net/intel/ice/base/ice_type.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/intel/ice/base/ice_type.h b/drivers/net/intel/ice/base/ice_type.h
index 6d8c187689..39569ff3e3 100644
--- a/drivers/net/intel/ice/base/ice_type.h
+++ b/drivers/net/intel/ice/base/ice_type.h
@@ -1100,7 +1100,7 @@ enum ice_rl_type {
#define ICE_SCHED_NO_SHARED_RL_PROF_ID 0xFFFF
#define ICE_SCHED_DFLT_BW_WT 4
#define ICE_SCHED_INVAL_PROF_ID 0xFFFF
-#define ICE_SCHED_DFLT_BURST_SIZE (15 * 1024) /* in bytes (15k) */
+#define ICE_SCHED_DFLT_BURST_SIZE (2 * 1024) /* in bytes (2k) */
/* Access Macros for Tx Sched RL Profile data */
#define ICE_TXSCHED_GET_RL_PROF_ID(p) LE16_TO_CPU((p)->info.profile_id)
--
2.47.3
---------------------------------------------------------------------
Intel Technology Poland sp. z o.o.
ul. Slowackiego 173 | 80-298 Gdansk | Sad Rejonowy Gdansk Polnoc | VII Wydzial Gospodarczy Krajowego Rejestru Sadowego - KRS 101882 | NIP 957-07-52-316 | Kapital zakladowy 200.000 PLN.
Spolka oswiadcza, ze posiada status duzego przedsiebiorcy w rozumieniu ustawy z dnia 8 marca 2013 r. o przeciwdzialaniu nadmiernym opoznieniom w transakcjach handlowych.
Ta wiadomosc wraz z zalacznikami jest przeznaczona dla okreslonego adresata i moze zawierac informacje poufne. W razie przypadkowego otrzymania tej wiadomosci, prosimy o powiadomienie nadawcy oraz trwale jej usuniecie; jakiekolwiek przegladanie lub rozpowszechnianie jest zabronione.
This e-mail and any attachments may contain confidential material for the sole use of the intended recipient(s). If you are not the intended recipient, please contact the sender and delete all copies; any review or distribution by others is strictly prohibited.
^ permalink raw reply related [flat|nested] 27+ messages in thread* [PATCH v2 5/7] net/ice: timestamp all received packets when PTP is enabled
2026-06-18 14:44 ` [PATCH v2 0/7] Intel network drivers enhancements Dawid Wesierski
` (3 preceding siblings ...)
2026-06-18 14:44 ` [PATCH v2 4/7] net/ice/base: reduce default scheduler burst size Dawid Wesierski
@ 2026-06-18 14:44 ` Dawid Wesierski
2026-06-18 14:44 ` [PATCH v2 6/7] net/iavf: disable runtime queue setup capability Dawid Wesierski
` (3 subsequent siblings)
8 siblings, 0 replies; 27+ messages in thread
From: Dawid Wesierski @ 2026-06-18 14:44 UTC (permalink / raw)
To: dev; +Cc: thomas, david.marchand, Marek Kasiewicz, Dawid Wesierski
From: Marek Kasiewicz <marek.kasiewicz@intel.com>
When PTP is enabled on the ICE PMD, hardware RX timestamps are only
applied to packets classified as IEEE 1588 (Ethertype 0x88F7). This
prevents applications from obtaining hardware timestamps on regular
UDP/IP traffic.
Remove the TIMESYNC packet type filter so that all received packets
get hardware timestamps when PTP is enabled. This is required for
time-sensitive networking applications that need per-packet arrival
timing on media traffic, such as ST 2110-21 receiver compliance
monitoring.
The change affects all three RX paths: scan, scattered, and single
packet receive functions.
Signed-off-by: Marek Kasiewicz <marek.kasiewicz@intel.com>
Signed-off-by: Dawid Wesierski <dawid.wesierski@intel.com>
---
drivers/net/intel/ice/ice_rxtx.c | 9 +++------
1 file changed, 3 insertions(+), 6 deletions(-)
diff --git a/drivers/net/intel/ice/ice_rxtx.c b/drivers/net/intel/ice/ice_rxtx.c
index c4b5454c53..8d709125f7 100644
--- a/drivers/net/intel/ice/ice_rxtx.c
+++ b/drivers/net/intel/ice/ice_rxtx.c
@@ -2023,8 +2023,7 @@ ice_rx_scan_hw_ring(struct ci_rx_queue *rxq)
pkt_flags |= rxq->ts_flag;
}
- if (ad->ptp_ena && ((mb->packet_type &
- RTE_PTYPE_L2_MASK) == RTE_PTYPE_L2_ETHER_TIMESYNC)) {
+ if (ad->ptp_ena) {
rxq->time_high =
rte_le_to_cpu_32(rxdp[j].wb.flex_ts.ts_high);
mb->timesync = rxq->queue_id;
@@ -2390,8 +2389,7 @@ ice_recv_scattered_pkts(void *rx_queue,
pkt_flags |= rxq->ts_flag;
}
- if (ad->ptp_ena && ((first_seg->packet_type & RTE_PTYPE_L2_MASK)
- == RTE_PTYPE_L2_ETHER_TIMESYNC)) {
+ if (ad->ptp_ena) {
rxq->time_high =
rte_le_to_cpu_32(rxd.wb.flex_ts.ts_high);
first_seg->timesync = rxq->queue_id;
@@ -2881,8 +2879,7 @@ ice_recv_pkts(void *rx_queue,
pkt_flags |= rxq->ts_flag;
}
- if (ad->ptp_ena && ((rxm->packet_type & RTE_PTYPE_L2_MASK) ==
- RTE_PTYPE_L2_ETHER_TIMESYNC)) {
+ if (ad->ptp_ena) {
rxq->time_high =
rte_le_to_cpu_32(rxd.wb.flex_ts.ts_high);
rxm->timesync = rxq->queue_id;
--
2.47.3
---------------------------------------------------------------------
Intel Technology Poland sp. z o.o.
ul. Slowackiego 173 | 80-298 Gdansk | Sad Rejonowy Gdansk Polnoc | VII Wydzial Gospodarczy Krajowego Rejestru Sadowego - KRS 101882 | NIP 957-07-52-316 | Kapital zakladowy 200.000 PLN.
Spolka oswiadcza, ze posiada status duzego przedsiebiorcy w rozumieniu ustawy z dnia 8 marca 2013 r. o przeciwdzialaniu nadmiernym opoznieniom w transakcjach handlowych.
Ta wiadomosc wraz z zalacznikami jest przeznaczona dla okreslonego adresata i moze zawierac informacje poufne. W razie przypadkowego otrzymania tej wiadomosci, prosimy o powiadomienie nadawcy oraz trwale jej usuniecie; jakiekolwiek przegladanie lub rozpowszechnianie jest zabronione.
This e-mail and any attachments may contain confidential material for the sole use of the intended recipient(s). If you are not the intended recipient, please contact the sender and delete all copies; any review or distribution by others is strictly prohibited.
^ permalink raw reply related [flat|nested] 27+ messages in thread* [PATCH v2 6/7] net/iavf: disable runtime queue setup capability
2026-06-18 14:44 ` [PATCH v2 0/7] Intel network drivers enhancements Dawid Wesierski
` (4 preceding siblings ...)
2026-06-18 14:44 ` [PATCH v2 5/7] net/ice: timestamp all received packets when PTP is enabled Dawid Wesierski
@ 2026-06-18 14:44 ` Dawid Wesierski
2026-06-18 14:44 ` [PATCH v2 7/7] net/intel: support header split mbuf callback Dawid Wesierski
` (2 subsequent siblings)
8 siblings, 0 replies; 27+ messages in thread
From: Dawid Wesierski @ 2026-06-18 14:44 UTC (permalink / raw)
To: dev; +Cc: thomas, david.marchand, Marek Kasiewicz, Dawid Wesierski
From: Marek Kasiewicz <marek.kasiewicz@intel.com>
Remove the advertisement of RTE_ETH_DEV_CAPA_RUNTIME_RX_QUEUE_SETUP
and RTE_ETH_DEV_CAPA_RUNTIME_TX_QUEUE_SETUP capabilities from the
iavf VF driver.
Runtime queue setup on E810 VFs causes queue state corruption when
queues are dynamically reconfigured while the hardware rate limiter
is actively pacing TX queues. Queue configuration messages to the PF
via virtchnl can race with ongoing TX operations, leading to undefined
behavior.
By not advertising these capabilities, all queues are configured at
port start and remain stable throughout the port lifecycle.
Signed-off-by: Marek Kasiewicz <marek.kasiewicz@intel.com>
Signed-off-by: Dawid Wesierski <dawid.wesierski@intel.com>
---
drivers/net/intel/iavf/iavf_ethdev.c | 3 ---
1 file changed, 3 deletions(-)
diff --git a/drivers/net/intel/iavf/iavf_ethdev.c b/drivers/net/intel/iavf/iavf_ethdev.c
index ec1ad02826..ab223e6afd 100644
--- a/drivers/net/intel/iavf/iavf_ethdev.c
+++ b/drivers/net/intel/iavf/iavf_ethdev.c
@@ -1160,9 +1160,6 @@ iavf_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
dev_info->reta_size = vf->vf_res->rss_lut_size;
dev_info->flow_type_rss_offloads = IAVF_RSS_OFFLOAD_ALL;
dev_info->max_mac_addrs = IAVF_NUM_MACADDR_MAX;
- dev_info->dev_capa =
- RTE_ETH_DEV_CAPA_RUNTIME_RX_QUEUE_SETUP |
- RTE_ETH_DEV_CAPA_RUNTIME_TX_QUEUE_SETUP;
dev_info->rx_offload_capa =
RTE_ETH_RX_OFFLOAD_VLAN_STRIP |
RTE_ETH_RX_OFFLOAD_QINQ_STRIP |
--
2.47.3
---------------------------------------------------------------------
Intel Technology Poland sp. z o.o.
ul. Slowackiego 173 | 80-298 Gdansk | Sad Rejonowy Gdansk Polnoc | VII Wydzial Gospodarczy Krajowego Rejestru Sadowego - KRS 101882 | NIP 957-07-52-316 | Kapital zakladowy 200.000 PLN.
Spolka oswiadcza, ze posiada status duzego przedsiebiorcy w rozumieniu ustawy z dnia 8 marca 2013 r. o przeciwdzialaniu nadmiernym opoznieniom w transakcjach handlowych.
Ta wiadomosc wraz z zalacznikami jest przeznaczona dla okreslonego adresata i moze zawierac informacje poufne. W razie przypadkowego otrzymania tej wiadomosci, prosimy o powiadomienie nadawcy oraz trwale jej usuniecie; jakiekolwiek przegladanie lub rozpowszechnianie jest zabronione.
This e-mail and any attachments may contain confidential material for the sole use of the intended recipient(s). If you are not the intended recipient, please contact the sender and delete all copies; any review or distribution by others is strictly prohibited.
^ permalink raw reply related [flat|nested] 27+ messages in thread* [PATCH v2 7/7] net/intel: support header split mbuf callback
2026-06-18 14:44 ` [PATCH v2 0/7] Intel network drivers enhancements Dawid Wesierski
` (5 preceding siblings ...)
2026-06-18 14:44 ` [PATCH v2 6/7] net/iavf: disable runtime queue setup capability Dawid Wesierski
@ 2026-06-18 14:44 ` Dawid Wesierski
2026-06-18 15:45 ` [PATCH v2 0/7] Intel network drivers enhancements Stephen Hemminger
2026-06-18 15:46 ` Stephen Hemminger
8 siblings, 0 replies; 27+ messages in thread
From: Dawid Wesierski @ 2026-06-18 14:44 UTC (permalink / raw)
To: dev; +Cc: thomas, david.marchand, Marek Kasiewicz, Dawid Wesierski
From: Marek Kasiewicz <marek.kasiewicz@intel.com>
Wire the new ethdev header split mbuf callback API into the ICE PMD.
A new dev_ops hook, hdrs_mbuf_set_cb, lets applications register a
callback (and private context) on a receive queue; the callback returns
a payload buffer (virtual address and IOVA) that overrides the default
mempool-backed payload mbuf for header split RX.
The callback is invoked at three allocation points in the ICE driver:
- initial queue setup (ice_alloc_rx_queue_mbufs),
- bulk buffer allocation (ice_rx_alloc_bufs),
- single-packet receive path (ice_recv_pkts).
This enables zero-copy RX for header split: the NIC DMAs the payload
directly into application-managed buffers (e.g., mapped frame buffers
with known IOVA), bypassing an extra memcpy from the mempool mbuf.
Depends on: "ethdev: add header split mbuf callback API"
Signed-off-by: Marek Kasiewicz <marek.kasiewicz@intel.com>
Signed-off-by: Dawid Wesierski <dawid.wesierski@intel.com>
---
drivers/net/intel/common/rx.h | 2 +
drivers/net/intel/ice/ice_ethdev.c | 1 +
drivers/net/intel/ice/ice_rxtx.c | 63 ++++++++++++++++++++++++++++++
drivers/net/intel/ice/ice_rxtx.h | 2 +
4 files changed, 68 insertions(+)
diff --git a/drivers/net/intel/common/rx.h b/drivers/net/intel/common/rx.h
index e0bf520ebd..8abb2a3ce9 100644
--- a/drivers/net/intel/common/rx.h
+++ b/drivers/net/intel/common/rx.h
@@ -113,6 +113,8 @@ struct ci_rx_queue {
uint32_t hw_time_low; /* low 32 bits of timestamp */
int ts_offset; /* dynamic mbuf timestamp field offset */
uint64_t ts_flag; /* dynamic mbuf timestamp flag */
+ rte_eth_hdrs_mbuf_callback_fn hdrs_mbuf_cb; /* hdr split mbuf cb */
+ void *hdrs_mbuf_cb_priv; /* hdr split mbuf cb priv */
};
struct { /* iavf specific values */
const struct iavf_rxq_ops *ops; /**< queue ops */
diff --git a/drivers/net/intel/ice/ice_ethdev.c b/drivers/net/intel/ice/ice_ethdev.c
index ad9c49b339..353da8f2bd 100644
--- a/drivers/net/intel/ice/ice_ethdev.c
+++ b/drivers/net/intel/ice/ice_ethdev.c
@@ -282,6 +282,7 @@ static const struct eth_dev_ops ice_eth_dev_ops = {
.dev_set_link_down = ice_dev_set_link_down,
.dev_led_on = ice_dev_led_on,
.dev_led_off = ice_dev_led_off,
+ .hdrs_mbuf_set_cb = ice_hdrs_mbuf_set_cb,
.rx_queue_start = ice_rx_queue_start,
.rx_queue_stop = ice_rx_queue_stop,
.tx_queue_start = ice_tx_queue_start,
diff --git a/drivers/net/intel/ice/ice_rxtx.c b/drivers/net/intel/ice/ice_rxtx.c
index 8d709125f7..867f595291 100644
--- a/drivers/net/intel/ice/ice_rxtx.c
+++ b/drivers/net/intel/ice/ice_rxtx.c
@@ -487,6 +487,17 @@ ice_alloc_rx_queue_mbufs(struct ci_rx_queue *rxq)
return -ENOMEM;
}
+ if (rxq->hdrs_mbuf_cb) {
+ struct rte_eth_hdrs_mbuf hdrs_mbuf = {0};
+ int ret = rxq->hdrs_mbuf_cb(rxq->hdrs_mbuf_cb_priv,
+ &hdrs_mbuf);
+
+ if (ret >= 0) {
+ mbuf_pay->buf_addr = hdrs_mbuf.buf_addr;
+ mbuf_pay->buf_iova = hdrs_mbuf.buf_iova;
+ }
+ }
+
mbuf_pay->next = NULL;
mbuf_pay->data_off = RTE_PKTMBUF_HEADROOM;
mbuf_pay->nb_segs = 1;
@@ -2126,6 +2137,16 @@ ice_rx_alloc_bufs(struct ci_rx_queue *rxq)
rxdp[i].read.pkt_addr = dma_addr;
} else {
mb->next = rxq->sw_split_buf[i].mbuf;
+ if (rxq->hdrs_mbuf_cb && mb->next) {
+ struct rte_eth_hdrs_mbuf hdrs_mbuf = {0};
+ int ret = rxq->hdrs_mbuf_cb(rxq->hdrs_mbuf_cb_priv,
+ &hdrs_mbuf);
+
+ if (ret >= 0) {
+ mb->next->buf_addr = hdrs_mbuf.buf_addr;
+ mb->next->buf_iova = hdrs_mbuf.buf_iova;
+ }
+ }
pay_addr = rte_cpu_to_le_64(rte_mbuf_data_iova_default(mb->next));
rxdp[i].read.hdr_addr = dma_addr;
rxdp[i].read.pkt_addr = pay_addr;
@@ -2810,6 +2831,17 @@ ice_recv_pkts(void *rx_queue,
break;
}
+ if (rxq->hdrs_mbuf_cb) {
+ struct rte_eth_hdrs_mbuf hdrs_mbuf = {0};
+ int ret = rxq->hdrs_mbuf_cb(rxq->hdrs_mbuf_cb_priv,
+ &hdrs_mbuf);
+
+ if (ret >= 0) {
+ nmb_pay->buf_addr = hdrs_mbuf.buf_addr;
+ nmb_pay->buf_iova = hdrs_mbuf.buf_iova;
+ }
+ }
+
nmb->next = nmb_pay;
nmb_pay->next = NULL;
@@ -4533,3 +4565,34 @@ ice_fdir_programming(struct ice_pf *pf, struct ice_fltr_desc *fdir_desc)
}
+
+int
+ice_hdrs_mbuf_set_cb(struct rte_eth_dev *dev, uint16_t rx_queue_id,
+ void *priv, rte_eth_hdrs_mbuf_callback_fn cb)
+{
+ struct ci_rx_queue *rxq;
+
+ if (rx_queue_id >= dev->data->nb_rx_queues) {
+ PMD_DRV_LOG(ERR, "RX queue %u out of range", rx_queue_id);
+ return -EINVAL;
+ }
+
+ rxq = dev->data->rx_queues[rx_queue_id];
+ if (rxq == NULL) {
+ PMD_DRV_LOG(ERR, "RX queue %u not available or setup", rx_queue_id);
+ return -EINVAL;
+ }
+
+ if (rxq->hdrs_mbuf_cb) {
+ PMD_DRV_LOG(ERR, "RX queue %u has hdrs mbuf cb already",
+ rx_queue_id);
+ return -EEXIST;
+ }
+
+ rxq->hdrs_mbuf_cb_priv = priv;
+ rxq->hdrs_mbuf_cb = cb;
+ PMD_DRV_LOG(NOTICE, "RX queue %u register hdrs mbuf cb at %p",
+ rx_queue_id, cb);
+
+ return 0;
+}
diff --git a/drivers/net/intel/ice/ice_rxtx.h b/drivers/net/intel/ice/ice_rxtx.h
index 999b6b30d6..7ed114ee94 100644
--- a/drivers/net/intel/ice/ice_rxtx.h
+++ b/drivers/net/intel/ice/ice_rxtx.h
@@ -303,6 +303,8 @@ uint16_t ice_xmit_pkts_vec_avx512_offload(void *tx_queue,
int ice_fdir_programming(struct ice_pf *pf, struct ice_fltr_desc *fdir_desc);
int ice_tx_done_cleanup(void *txq, uint32_t free_cnt);
int ice_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc);
+int ice_hdrs_mbuf_set_cb(struct rte_eth_dev *dev, uint16_t rx_queue_id,
+ void *priv, rte_eth_hdrs_mbuf_callback_fn cb);
enum rte_vect_max_simd ice_get_max_simd_bitwidth(void);
#define FDIR_PARSING_ENABLE_PER_QUEUE(ad, on) do { \
--
2.47.3
---------------------------------------------------------------------
Intel Technology Poland sp. z o.o.
ul. Slowackiego 173 | 80-298 Gdansk | Sad Rejonowy Gdansk Polnoc | VII Wydzial Gospodarczy Krajowego Rejestru Sadowego - KRS 101882 | NIP 957-07-52-316 | Kapital zakladowy 200.000 PLN.
Spolka oswiadcza, ze posiada status duzego przedsiebiorcy w rozumieniu ustawy z dnia 8 marca 2013 r. o przeciwdzialaniu nadmiernym opoznieniom w transakcjach handlowych.
Ta wiadomosc wraz z zalacznikami jest przeznaczona dla okreslonego adresata i moze zawierac informacje poufne. W razie przypadkowego otrzymania tej wiadomosci, prosimy o powiadomienie nadawcy oraz trwale jej usuniecie; jakiekolwiek przegladanie lub rozpowszechnianie jest zabronione.
This e-mail and any attachments may contain confidential material for the sole use of the intended recipient(s). If you are not the intended recipient, please contact the sender and delete all copies; any review or distribution by others is strictly prohibited.
^ permalink raw reply related [flat|nested] 27+ messages in thread* Re: [PATCH v2 0/7] Intel network drivers enhancements
2026-06-18 14:44 ` [PATCH v2 0/7] Intel network drivers enhancements Dawid Wesierski
` (6 preceding siblings ...)
2026-06-18 14:44 ` [PATCH v2 7/7] net/intel: support header split mbuf callback Dawid Wesierski
@ 2026-06-18 15:45 ` Stephen Hemminger
2026-06-18 15:46 ` Stephen Hemminger
8 siblings, 0 replies; 27+ messages in thread
From: Stephen Hemminger @ 2026-06-18 15:45 UTC (permalink / raw)
To: Dawid Wesierski; +Cc: dev, thomas, david.marchand, Marek Kasiewicz
On Thu, 18 Jun 2026 10:44:35 -0400
Dawid Wesierski <dawid.wesierski@intel.com> wrote:
> From: Marek Kasiewicz <marek.kasiewicz@intel.com>
>
> This series introduces several improvements to Intel iavf and ice
> drivers, including a new ethdev API for header split mbuf callbacks,
> increased ring descriptors, and improved PTP timestamping.
>
> Marek Kasiewicz (7):
> ethdev: add header split mbuf callback API
> net/iavf: increase max ring descriptors to hardware limit
> net/iavf: allow runtime queue rate limit configuration
> net/ice/base: reduce default scheduler burst size
> net/ice: timestamp all received packets when PTP is enabled
> net/iavf: disable runtime queue setup capability
> net/intel: support header split mbuf callback
>
> drivers/net/intel/common/rx.h | 2 +
> drivers/net/intel/iavf/iavf_ethdev.c | 3 --
> drivers/net/intel/iavf/iavf_rxtx.h | 2 +-
> drivers/net/intel/iavf/iavf_tm.c | 11 ++--
> drivers/net/intel/ice/base/ice_type.h | 2 +-
> drivers/net/intel/ice/ice_ethdev.c | 1 +
> drivers/net/intel/ice/ice_rxtx.c | 72 ++++++++++++++++++++++++---
> drivers/net/intel/ice/ice_rxtx.h | 2 +
> lib/ethdev/ethdev_driver.h | 15 +++++++
> lib/ethdev/rte_ethdev.c | 51 ++++++++++++++++++++++
> lib/ethdev/rte_ethdev.h | 7 +++
> 11 files changed, 153 insertions(+), 15 deletions(-)
>
This looks interesting but I don't understand the motivation
or use case for the new code. There is no documentation, examples
or test integration. At this point the patch is in pure RFC state.
More wordy answer from AI...
Documentation and motivation
This adds a new public ethdev RX mechanism but ships no prose
rationale, no documentation, and no worked example. For a new API
the series needs to make the case before the code can be evaluated.
Missing documentation:
- No prog_guide section. A new RX model -- header split where the
application supplies the payload buffer -- needs a write-up (e.g.
in doc/guides/prog_guide) covering the buffer-lifetime contract,
the IOVA/headroom semantics, which offloads must be enabled
(RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT), and what happens on a PMD that
does not implement the hook.
- No example or testpmd integration. There is no end-to-end flow a
reviewer can read or run: register the callback, supply buffers,
receive, consume, recycle. Without one there is no demonstrated
use case and no way to validate the design (the mempool-corruption
and headroom issues in 7/7 would likely have surfaced in a working
example).
Missing motivation:
- The actual use case is not named. "Zero-copy RX into mapped frame
buffers with known IOVA" describes a mechanism, not a user. Who
needs this -- an AF_XDP-style external umem, a GPU/DMA target
(gpudev), something else? State it.
- The benefit is unquantified. The claim is avoiding "an extra
memcpy from the mempool mbuf," but the patch replaces that with a
per-allocation indirect callback on the Rx fast path
(ice_recv_pkts, ice_rx_alloc_bufs). The rationale should show the
memcpy actually exists in the path being optimized and that the
indirect call per buffer is a net win.
- It does not explain why existing mechanisms are insufficient. DPDK
already lets an application own payload memory via
rte_pktmbuf_attach_extbuf() and via external-buffer mempools
(pinned external memory passed to the Rx mempool). The series must
argue why a new callback API is needed instead of those -- this is
the same alternative raised against the 7/7 implementation, so the
motivation and the correctness fix point at the same question.
- Constraints are undocumented: PMD support scope, fast-path
threading and reentrancy of the callback, and interaction with the
buffer-split configuration.
Until there is a documented model and at least one example showing
the buffer lifecycle, it is hard to tell whether the callback API is
the right shape -- or whether attach_extbuf / an external-mempool
already covers the use case.
^ permalink raw reply [flat|nested] 27+ messages in thread* Re: [PATCH v2 0/7] Intel network drivers enhancements
2026-06-18 14:44 ` [PATCH v2 0/7] Intel network drivers enhancements Dawid Wesierski
` (7 preceding siblings ...)
2026-06-18 15:45 ` [PATCH v2 0/7] Intel network drivers enhancements Stephen Hemminger
@ 2026-06-18 15:46 ` Stephen Hemminger
8 siblings, 0 replies; 27+ messages in thread
From: Stephen Hemminger @ 2026-06-18 15:46 UTC (permalink / raw)
To: Dawid Wesierski; +Cc: dev, thomas, david.marchand, Marek Kasiewicz
On Thu, 18 Jun 2026 10:44:35 -0400
Dawid Wesierski <dawid.wesierski@intel.com> wrote:
> From: Marek Kasiewicz <marek.kasiewicz@intel.com>
>
> This series introduces several improvements to Intel iavf and ice
> drivers, including a new ethdev API for header split mbuf callbacks,
> increased ring descriptors, and improved PTP timestamping.
>
> Marek Kasiewicz (7):
> ethdev: add header split mbuf callback API
> net/iavf: increase max ring descriptors to hardware limit
> net/iavf: allow runtime queue rate limit configuration
> net/ice/base: reduce default scheduler burst size
> net/ice: timestamp all received packets when PTP is enabled
> net/iavf: disable runtime queue setup capability
> net/intel: support header split mbuf callback
>
> drivers/net/intel/common/rx.h | 2 +
> drivers/net/intel/iavf/iavf_ethdev.c | 3 --
> drivers/net/intel/iavf/iavf_rxtx.h | 2 +-
> drivers/net/intel/iavf/iavf_tm.c | 11 ++--
> drivers/net/intel/ice/base/ice_type.h | 2 +-
> drivers/net/intel/ice/ice_ethdev.c | 1 +
> drivers/net/intel/ice/ice_rxtx.c | 72 ++++++++++++++++++++++++---
> drivers/net/intel/ice/ice_rxtx.h | 2 +
> lib/ethdev/ethdev_driver.h | 15 +++++++
> lib/ethdev/rte_ethdev.c | 51 ++++++++++++++++++++++
> lib/ethdev/rte_ethdev.h | 7 +++
> 11 files changed, 153 insertions(+), 15 deletions(-)
>
AI review found lots of problems with the implementation as well.
Series composition
This v2 series bundles seven patches under one cover letter, but only
1/7 (ethdev API) and 7/7 (net/intel) implement the header-split mbuf
callback. Patches 2/7 (iavf max ring desc), 3/7 (iavf queue rate
limit), 4/7 (ice base burst size), 5/7 (ice PTP timestamp), and 6/7
(iavf runtime queue setup) are unrelated driver changes. Please split
these into a separate series so each can be reviewed and merged on its
own merits; mixing them obscures the dependency and complicates bisect.
[PATCH v2 1/7] ethdev: add header split mbuf callback API
Warning: struct rte_eth_hdrs_mbuf carries only buf_addr and buf_iova,
no length. The callback therefore cannot tell the PMD how large the
application buffer is. In buffer-split mode the NIC DMAs up to the
queue's configured payload buffer size; if the application buffer is
smaller, the hardware writes past it. Add a length field and have the
PMD (and/or ethdev) validate it against the configured payload size.
Warning: the buffer lifetime and ownership are undocumented. The
Doxygen does not state who owns the buffer after the payload mbuf is
freed, whether the address is consumed as-is or with headroom applied
(the 7/7 implementation adds RTE_PKTMBUF_HEADROOM to the IOVA -- see
below), or that the callback must return a distinct buffer per call.
These are contract details an application cannot guess.
Warning: new experimental API with no testpmd hook and no functional
test in app/test. Both are required for new ethdev API.
Warning: new experimental public API but no release notes entry
(doc/guides/rel_notes/release_26_07.rst). The series touches only the
three code files.
Info: rte_eth_hdrs_set_mbuf_callback() validates port_id and the
hdrs_mbuf_set_cb hook, but passes rx_queue_id to the PMD without
checking it against dev->data->nb_rx_queues. Sibling per-queue ethdev
calls (e.g. rte_eth_rx_queue_setup) validate the queue id at the
ethdev layer. The ICE PMD does check it in 7/7, but validating in the
wrapper would be consistent and would protect future PMDs that forget.
[PATCH v2 7/7] net/intel: support header split mbuf callback
Error: overwriting buf_addr/buf_iova on a pool mbuf corrupts the
payload mempool. At all three sites (ice_alloc_rx_queue_mbufs,
ice_rx_alloc_bufs, ice_recv_pkts) the payload mbuf is allocated from
rxq->rxseg[1].mp and then has its buf_addr/buf_iova rewritten to point
at application memory:
if (ret >= 0) {
mbuf_pay->buf_addr = hdrs_mbuf.buf_addr;
mbuf_pay->buf_iova = hdrs_mbuf.buf_iova;
}
DPDK never restores buf_addr/buf_iova on free. When these mbufs are
returned to rxseg[1].mp (via rte_pktmbuf_free of the received chain,
or _ice_rx_queue_release_mbufs on queue stop), the pool objects retain
the foreign buffer pointer, and buf_len still describes the original
mempool buffer. The next rte_mbuf_raw_alloc()/raw_alloc_bulk() from
that pool hands back an mbuf pointing at application memory. The
correct mechanism for pointing an mbuf at non-pool memory is
rte_pktmbuf_attach_extbuf() (sets buf_addr/buf_iova/buf_len, the
EXTERNAL flag, and a free callback), or a dedicated external-buffer
mempool. As written the payload pool is progressively poisoned.
Error: the callback's failure return is swallowed. On ret < 0 the code
skips the address update but still programs the descriptor from the
mbuf's current buf_iova:
rxdp->read.pkt_addr =
rte_cpu_to_le_64(rte_mbuf_data_iova_default(nmb_pay));
Because of the pool corruption above, a recycled mbuf's buf_iova may
point at a buffer the application already reclaimed, so a failed
callback silently arms the NIC to DMA into stale/foreign memory. A
callback failure should abort the refill (or fall back to a known-good
pool buffer), not fall through. Separately, the documented contract is
"0 on success, negative on failure"; the test should be ret == 0, not
ret >= 0.
Warning: RTE_PKTMBUF_HEADROOM is silently applied to the supplied
IOVA. The descriptor is programmed with rte_mbuf_data_iova_default(),
i.e. buf_iova + data_off (128), but the API documents buf_iova as "the
IOVA of the payload buffer". The NIC therefore writes 128 bytes into
the application buffer, and the tail of a buffer sized to the payload
overruns by the headroom. Either document that the headroom offset is
applied, or program pkt_addr from the raw buf_iova for callback-
provided buffers.
^ permalink raw reply [flat|nested] 27+ messages in thread