* [PATCH 1/2] ethdev: support selective Rx data
@ 2026-02-02 16:09 Gregory Etelson
2026-02-02 16:09 ` [PATCH 2/2] app/testpmd: " Gregory Etelson
2026-02-02 18:17 ` [PATCH 1/2] ethdev: " Stephen Hemminger
0 siblings, 2 replies; 4+ messages in thread
From: Gregory Etelson @ 2026-02-02 16:09 UTC (permalink / raw)
To: dev; +Cc: getelson, mkashani, Thomas Monjalon, Andrew Rybchenko
In some cases, an application does not need to receive the entire
packet from the port hardware.
If the application could receive only the required Rx data and safely
discard the rest, port performance could improve through reduced PCI
bandwidth and application memory consumption.
Selective Rx data allows an application to receive
only pre-configured packet segments and discard the rest.
For example:
- Deliver the first N bytes only.
- Deliver the last N bytes only.
- Deliver N1 bytes from offset Off1 and N2 bytes from offset Off2.
Selective Rx data is implemented on top of the existing Rx
BUFFER_SPLIT functionality:
- rte_eth_rxseg_split uses a NULL mempool for data segments
  that should be discarded.
- The PMD does not create mbuf segments for data that was not read.
For example: Deliver Ethernet header only
Rx queue segment configuration:
struct rte_eth_rxseg_split split[2] = {
	{
		.mp = <some mempool>,
		.length = sizeof(struct rte_ether_hdr)
	},
	{
		.mp = NULL, /* discard data */
		.length = <MTU>
	}
};
Received MBUF configuration:
mbuf[0].pkt_len = sizeof(struct rte_ether_hdr);
mbuf[0].data_len = sizeof(struct rte_ether_hdr);
mbuf[0].next = NULL; /* The next segment did not deliver data */
After selective Rx, the mbuf packet length reflects only the data
that was actually received, and can be less than the original wire
packet length.
A PMD advertises the selective Rx data capability by setting the
rte_eth_rxseg_capa.selective_read bit.
Signed-off-by: Gregory Etelson <getelson@nvidia.com>
---
lib/ethdev/rte_ethdev.c | 10 +++++++---
lib/ethdev/rte_ethdev.h | 8 +++++++-
2 files changed, 14 insertions(+), 4 deletions(-)
diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
index c6fe0d5165..68a51c97c5 100644
--- a/lib/ethdev/rte_ethdev.c
+++ b/lib/ethdev/rte_ethdev.c
@@ -2161,9 +2161,11 @@ rte_eth_rx_queue_check_split(uint16_t port_id,
uint32_t proto_hdr = rx_seg[seg_idx].proto_hdr;
if (mpl == NULL) {
- RTE_ETHDEV_LOG_LINE(ERR, "null mempool pointer");
- ret = -EINVAL;
- goto out;
+ if (dev_info->rx_seg_capa.selective_read == 0) {
+ RTE_ETHDEV_LOG_LINE(ERR, "null mempool pointer");
+ ret = -EINVAL;
+ goto out;
+ }
}
if (seg_idx != 0 && mp_first != mpl &&
seg_capa->multi_pools == 0) {
@@ -2185,6 +2187,8 @@ rte_eth_rx_queue_check_split(uint16_t port_id,
goto out;
}
}
+ if (mpl == NULL)
+ goto out;
offset += seg_idx != 0 ? 0 : RTE_PKTMBUF_HEADROOM;
*mbp_buf_size = rte_pktmbuf_data_room_size(mpl);
diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
index a66c2abbdb..173c773e72 100644
--- a/lib/ethdev/rte_ethdev.h
+++ b/lib/ethdev/rte_ethdev.h
@@ -1121,7 +1121,12 @@ struct rte_eth_txmode {
* The rest will be put into the last valid pool.
*/
struct rte_eth_rxseg_split {
- struct rte_mempool *mp; /**< Memory pool to allocate segment from. */
+ /**
+ * Memory pool to allocate segment from.
+ * NULL means skipped segment in selective Rx data. @see selective_read.
+ * Skipped Rx segment length is not reflected in mbuf packet length.
+ */
+ struct rte_mempool *mp;
uint16_t length; /**< Segment data length, configures split point. */
uint16_t offset; /**< Data offset from beginning of mbuf data buffer. */
/**
@@ -1758,6 +1763,7 @@ struct rte_eth_rxseg_capa {
uint32_t multi_pools:1; /**< Supports receiving to multiple pools.*/
uint32_t offset_allowed:1; /**< Supports buffer offsets. */
uint32_t offset_align_log2:4; /**< Required offset alignment. */
+ uint32_t selective_read:1; /**< Supports selective read. */
uint16_t max_nseg; /**< Maximum amount of segments to split. */
uint16_t reserved; /**< Reserved field. */
};
--
2.51.0
* [PATCH 2/2] app/testpmd: support selective Rx data
2026-02-02 16:09 [PATCH 1/2] ethdev: support selective Rx data Gregory Etelson
@ 2026-02-02 16:09 ` Gregory Etelson
2026-02-02 17:37 ` Stephen Hemminger
2026-02-02 18:17 ` [PATCH 1/2] ethdev: " Stephen Hemminger
1 sibling, 1 reply; 4+ messages in thread
From: Gregory Etelson @ 2026-02-02 16:09 UTC (permalink / raw)
To: dev; +Cc: getelson, mkashani, Claude Sonnet 4.5, Aman Singh
Add support for selective Rx data using the existing rxoffs and rxpkts
command line parameters.
When both rxoffs and rxpkts are specified on a PMD that supports
selective Rx data (the selective_read capability), testpmd automatically:
1. Inserts segments with a NULL mempool to cover the gaps between
configured segments, discarding the unwanted data.
2. Adds a trailing segment with a NULL mempool to cover any remaining
data up to the MTU.
Example usage to receive only Ethernet header and a segment at
offset 128:
--rxoffs=0,128 --rxpkts=14,64
This creates segments:
- [0-13]: 14 bytes with mempool (received)
- [14-127]: 114 bytes with NULL mempool (discarded)
- [128-191]: 64 bytes with mempool (received)
- [192-MTU]: remaining bytes with NULL mempool (discarded)
Signed-off-by: Gregory Etelson <getelson@nvidia.com>
Signed-off-by: Claude Sonnet 4.5 <noreply@anthropic.com>
---
app/test-pmd/testpmd.c | 74 +++++++++++++++++++++++++--
doc/guides/testpmd_app_ug/run_app.rst | 19 +++++++
2 files changed, 88 insertions(+), 5 deletions(-)
diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index 1fe41d852a..62129f0d28 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -2676,11 +2676,58 @@ rx_queue_setup(uint16_t port_id, uint16_t rx_queue_id,
int ret;
- if ((rx_pkt_nb_segs > 1) &&
+ if ((rx_pkt_nb_segs > 1 || rx_pkt_nb_offs > 0) &&
(rx_conf->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT)) {
+ struct rte_eth_dev_info dev_info;
+ uint16_t seg_idx = 0;
+ uint16_t next_offset = 0;
+ uint16_t mtu = 0;
+ bool selective_rx;
+
+ ret = rte_eth_dev_info_get(port_id, &dev_info);
+ if (ret != 0)
+ return ret;
+
+ selective_rx = rx_pkt_nb_offs > 0 &&
+ dev_info.rx_seg_capa.selective_read != 0;
+
+ if (selective_rx) {
+ ret = rte_eth_dev_get_mtu(port_id, &mtu);
+ if (ret != 0)
+ return ret;
+ }
+
/* multi-segment configuration */
for (i = 0; i < rx_pkt_nb_segs; i++) {
- struct rte_eth_rxseg_split *rx_seg = &rx_useg[i].split;
+ struct rte_eth_rxseg_split *rx_seg;
+ uint16_t seg_offset;
+
+ seg_offset = i < rx_pkt_nb_offs ?
+ rx_pkt_seg_offsets[i] : next_offset;
+
+ /* Insert gap segment if selective Rx and there's a gap */
+ if (selective_rx && seg_offset > next_offset) {
+ if (seg_idx >= MAX_SEGS_BUFFER_SPLIT) {
+ fprintf(stderr,
+ "Too many segments (max %u)\n",
+ MAX_SEGS_BUFFER_SPLIT);
+ return -EINVAL;
+ }
+ rx_seg = &rx_useg[seg_idx++].split;
+ rx_seg->offset = next_offset;
+ rx_seg->length = seg_offset - next_offset;
+ rx_seg->mp = NULL; /* Discard gap data */
+ next_offset = seg_offset;
+ }
+
+ /* Add the actual data segment */
+ if (seg_idx >= MAX_SEGS_BUFFER_SPLIT) {
+ fprintf(stderr,
+ "Too many segments (max %u)\n",
+ MAX_SEGS_BUFFER_SPLIT);
+ return -EINVAL;
+ }
+ rx_seg = &rx_useg[seg_idx++].split;
/*
* Use last valid pool for the segments with number
* exceeding the pool index.
@@ -2688,8 +2735,7 @@ rx_queue_setup(uint16_t port_id, uint16_t rx_queue_id,
mp_n = (i >= mbuf_data_size_n) ? mbuf_data_size_n - 1 : i;
mpx = mbuf_pool_find(socket_id, mp_n);
/* Handle zero as mbuf data buffer size. */
- rx_seg->offset = i < rx_pkt_nb_offs ?
- rx_pkt_seg_offsets[i] : 0;
+ rx_seg->offset = seg_offset;
rx_seg->mp = mpx ? mpx : mp;
if (rx_pkt_hdr_protos[i] != 0 && rx_pkt_seg_lengths[i] == 0) {
rx_seg->proto_hdr = rx_pkt_hdr_protos[i] & ~prev_hdrs;
@@ -2699,8 +2745,26 @@ rx_queue_setup(uint16_t port_id, uint16_t rx_queue_id,
rx_pkt_seg_lengths[i] :
mbuf_data_size[mp_n];
}
+
+ if (selective_rx)
+ next_offset = seg_offset + rx_seg->length;
}
- rx_conf->rx_nseg = rx_pkt_nb_segs;
+
+ /* Add trailing segment to MTU if selective Rx enabled */
+ if (selective_rx && next_offset < mtu) {
+ if (seg_idx >= MAX_SEGS_BUFFER_SPLIT) {
+ fprintf(stderr,
+ "Too many segments (max %u)\n",
+ MAX_SEGS_BUFFER_SPLIT);
+ return -EINVAL;
+ }
+ rx_useg[seg_idx].split.offset = next_offset;
+ rx_useg[seg_idx].split.length = mtu - next_offset;
+ rx_useg[seg_idx].split.mp = NULL; /* Discard trailing data */
+ seg_idx++;
+ }
+
+ rx_conf->rx_nseg = seg_idx;
rx_conf->rx_seg = rx_useg;
rx_conf->rx_mempools = NULL;
rx_conf->rx_nmempool = 0;
diff --git a/doc/guides/testpmd_app_ug/run_app.rst b/doc/guides/testpmd_app_ug/run_app.rst
index 97d6c75716..638c0b0eb3 100644
--- a/doc/guides/testpmd_app_ug/run_app.rst
+++ b/doc/guides/testpmd_app_ug/run_app.rst
@@ -364,6 +364,11 @@ The command line options are:
feature is engaged. Affects only the queues configured
with split offloads (currently BUFFER_SPLIT is supported only).
+ When used with ``--rxpkts`` on PMDs supporting selective Rx data,
+ enables receiving only specific packet segments and discarding the rest.
+ Gaps between configured segments and any trailing data up to MTU are
+ automatically filled with NULL mempool segments (data is discarded).
+
* ``--rxpkts=X[,Y]``
Set the length of segments to scatter packets on receiving if split
@@ -373,6 +378,20 @@ The command line options are:
command line parameter and the mbufs to receive will be allocated
sequentially from these extra memory pools.
+ **Selective Rx Data Example:**
+
+ To receive only the Ethernet header (14 bytes at offset 0) and
+ a 64-byte segment starting at offset 128, while discarding the rest::
+
+ --rxoffs=0,128 --rxpkts=14,64
+
+ This configuration will:
+
+ * Receive 14 bytes at offset 0 (Ethernet header)
+ * Discard bytes 14-127 (inserted NULL mempool segment)
+ * Receive 64 bytes at offset 128
+ * Discard bytes 192-MTU (inserted NULL mempool segment)
+
* ``--txpkts=X[,Y]``
Set TX segment sizes or total packet length. Valid for ``tx-only``
--
2.51.0
* Re: [PATCH 2/2] app/testpmd: support selective Rx data
2026-02-02 16:09 ` [PATCH 2/2] app/testpmd: " Gregory Etelson
@ 2026-02-02 17:37 ` Stephen Hemminger
0 siblings, 0 replies; 4+ messages in thread
From: Stephen Hemminger @ 2026-02-02 17:37 UTC (permalink / raw)
To: Gregory Etelson; +Cc: dev, mkashani, Claude Sonnet 4.5, Aman Singh
On Mon, 2 Feb 2026 18:09:03 +0200
Gregory Etelson <getelson@nvidia.com> wrote:
> diff --git a/doc/guides/testpmd_app_ug/run_app.rst b/doc/guides/testpmd_app_ug/run_app.rst
> index 97d6c75716..638c0b0eb3 100644
> --- a/doc/guides/testpmd_app_ug/run_app.rst
> +++ b/doc/guides/testpmd_app_ug/run_app.rst
> @@ -364,6 +364,11 @@ The command line options are:
> feature is engaged. Affects only the queues configured
> with split offloads (currently BUFFER_SPLIT is supported only).
>
> + When used with ``--rxpkts`` on PMDs supporting selective Rx data,
> + enables receiving only specific packet segments and discarding the rest.
> + Gaps between configured segments and any trailing data up to MTU are
> + automatically filled with NULL mempool segments (data is discarded).
I don't see driver support for this. Seems like you are putting out
something that is not testable yet.
* Re: [PATCH 1/2] ethdev: support selective Rx data
2026-02-02 16:09 [PATCH 1/2] ethdev: support selective Rx data Gregory Etelson
2026-02-02 16:09 ` [PATCH 2/2] app/testpmd: " Gregory Etelson
@ 2026-02-02 18:17 ` Stephen Hemminger
1 sibling, 0 replies; 4+ messages in thread
From: Stephen Hemminger @ 2026-02-02 18:17 UTC (permalink / raw)
To: Gregory Etelson; +Cc: dev, mkashani, Thomas Monjalon, Andrew Rybchenko
On Mon, 2 Feb 2026 18:09:02 +0200
Gregory Etelson <getelson@nvidia.com> wrote:
> In some cases, an application does not need to receive the entire
> packet from the port hardware.
> If the application could receive only the required Rx data and safely
> discard the rest, port performance could improve through reduced PCI
> bandwidth and application memory consumption.
>
> Selective Rx data allows an application to receive
> only pre-configured packet segments and discard the rest.
> For example:
> - Deliver the first N bytes only.
> - Deliver the last N bytes only.
> - Deliver N1 bytes from offset Off1 and N2 bytes from offset Off2.
>
> Selective Rx data is implemented on top of the existing Rx
> BUFFER_SPLIT functionality:
> - rte_eth_rxseg_split uses a NULL mempool for data segments
>   that should be discarded.
> - The PMD does not create mbuf segments for data that was not read.
>
> For example: Deliver Ethernet header only
>
> Rx queue segment configuration:
> struct rte_eth_rxseg_split split[2] = {
> 	{
> 		.mp = <some mempool>,
> 		.length = sizeof(struct rte_ether_hdr)
> 	},
> 	{
> 		.mp = NULL, /* discard data */
> 		.length = <MTU>
> 	}
> };
>
> Received MBUF configuration:
> mbuf[0].pkt_len = sizeof(struct rte_ether_hdr);
> mbuf[0].data_len = sizeof(struct rte_ether_hdr);
> mbuf[0].next = NULL; /* The next segment did not deliver data */
And nb_segs should be 1?
And mbuf must still pass sanity check?
>
> After selective Rx, the mbuf packet length reflects only the data
> that was actually received, and can be less than the original wire
> packet length.
>
> A PMD advertises the selective Rx data capability by setting the
> rte_eth_rxseg_capa.selective_read bit.
>
> Signed-off-by: Gregory Etelson <getelson@nvidia.com>
> ---
Need documentation updates as well.
At a minimum:
- entry in nic/guides/features/default.ini
- update to the ethdev documentation
- release note
It would also be good if one or more examples used split but
that can wait.