* [PATCH] net/idpf: handle Tx of mbuf segments larger than 16k
From: Bruce Richardson @ 2026-03-03 15:00 UTC (permalink / raw)
To: dev
Cc: Bruce Richardson, stable, Jingjing Wu, Praveen Shetty, Xiaoyun Li,
Beilei Xing, Junfeng Guo
Recent rework of the Tx single-queue path in idpf aligned that path with
that of other drivers, meaning it now supports segments of size greater
than 16k. Rework the split-queue path to similarly support those large
segments.
Fixes: 770f4dfe0f79 ("net/idpf: support basic Tx data path")
Cc: stable@dpdk.org
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
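Note for reviewers: the descriptor count for oversize segments comes from
the common ci_calc_pkt_desc() helper, which charges one descriptor per
CI_MAX_DATA_PER_TXD-sized chunk of each mbuf segment. A rough sketch of
that arithmetic (the actual common-code body may differ):

static inline uint16_t
calc_pkt_desc_sketch(const struct rte_mbuf *pkt)
{
	uint16_t count = 0;

	/* one descriptor per CI_MAX_DATA_PER_TXD chunk, so a single
	 * segment larger than the ~16k hw limit takes several slots
	 */
	while (pkt != NULL) {
		count += (pkt->data_len + CI_MAX_DATA_PER_TXD - 1) /
				CI_MAX_DATA_PER_TXD;
		pkt = pkt->next;
	}
	return count;
}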
drivers/net/intel/idpf/idpf_common_rxtx.c | 98 ++++++++++++++---------
1 file changed, 60 insertions(+), 38 deletions(-)
diff --git a/drivers/net/intel/idpf/idpf_common_rxtx.c b/drivers/net/intel/idpf/idpf_common_rxtx.c
index b8f6418d4a..981b4b8eee 100644
--- a/drivers/net/intel/idpf/idpf_common_rxtx.c
+++ b/drivers/net/intel/idpf/idpf_common_rxtx.c
@@ -890,7 +890,7 @@ idpf_dp_splitq_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
volatile struct idpf_flex_tx_sched_desc *txd;
struct ci_tx_entry *sw_ring;
union ci_tx_offload tx_offload = {0};
- struct ci_tx_entry *txe, *txn;
+ struct ci_tx_entry *txe;
uint16_t nb_used, tx_id, sw_id;
struct rte_mbuf *tx_pkt;
uint16_t nb_to_clean;
@@ -911,44 +911,44 @@ idpf_dp_splitq_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
for (nb_tx = 0; nb_tx < nb_pkts; nb_tx++) {
tx_pkt = tx_pkts[nb_tx];
- if (txq->nb_tx_free <= txq->tx_free_thresh) {
- /* TODO: Need to refine
- * 1. free and clean: Better to decide a clean destination instead of
- * loop times. And don't free mbuf when RS got immediately, free when
- * transmit or according to the clean destination.
- * Now, just ignore the RE write back, free mbuf when get RS
- * 2. out-of-order rewrite back haven't be supported, SW head and HW head
- * need to be separated.
- **/
- nb_to_clean = 2 * txq->tx_rs_thresh;
- while (nb_to_clean--)
- idpf_split_tx_free(txq->complq);
- }
-
- if (txq->nb_tx_free < tx_pkt->nb_segs)
- break;
-
cmd_dtype = 0;
ol_flags = tx_pkt->ol_flags;
tx_offload.l2_len = tx_pkt->l2_len;
tx_offload.l3_len = tx_pkt->l3_len;
tx_offload.l4_len = tx_pkt->l4_len;
tx_offload.tso_segsz = tx_pkt->tso_segsz;
+
/* Calculate the number of context descriptors needed. */
uint64_t cd_qw0 = 0, cd_qw1 = 0;
nb_ctx = idpf_set_tso_ctx(ol_flags, tx_pkt, &tx_offload, txq,
&cd_qw0, &cd_qw1);
- /* Calculate the number of TX descriptors needed for
- * each packet. For TSO packets, use ci_calc_pkt_desc as
- * the mbuf data size might exceed max data size that hw allows
- * per tx desc.
+ /* Calculate the number of TX descriptors needed for each packet.
+ * For TSO packets, use ci_calc_pkt_desc as the mbuf data size
+ * might exceed the max data size that hw allows per tx desc.
*/
- if (ol_flags & RTE_MBUF_F_TX_TCP_SEG)
+ if (ol_flags & (RTE_MBUF_F_TX_TCP_SEG | RTE_MBUF_F_TX_UDP_SEG))
nb_used = ci_calc_pkt_desc(tx_pkt) + nb_ctx;
else
nb_used = tx_pkt->nb_segs + nb_ctx;
+ if (txq->nb_tx_free <= txq->tx_free_thresh) {
+ /* TODO: Need to refine
+ * 1. free and clean: Better to decide a clean destination instead of
+ * loop times. And don't free mbuf when RS got immediately, free when
+ * transmit or according to the clean destination.
+ * Now, just ignore the RE write back, free mbuf when get RS
+ * 2. out-of-order rewrite back haven't be supported, SW head and HW head
+ * need to be separated.
+ **/
+ nb_to_clean = 2 * txq->tx_rs_thresh;
+ while (nb_to_clean--)
+ idpf_split_tx_free(txq->complq);
+ }
+
+ if (txq->nb_tx_free < nb_used)
+ break;
+
if (ol_flags & CI_TX_CKSUM_OFFLOAD_MASK)
cmd_dtype = IDPF_TXD_FLEX_FLOW_CMD_CS_EN;
@@ -959,30 +959,52 @@ idpf_dp_splitq_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
ctx_desc[0] = cd_qw0;
ctx_desc[1] = cd_qw1;
- tx_id++;
- if (tx_id == txq->nb_tx_desc)
+ if (++tx_id == txq->nb_tx_desc)
tx_id = 0;
}
+ cmd_dtype |= IDPF_TX_DESC_DTYPE_FLEX_FLOW_SCHE;
+ struct rte_mbuf *m_seg = tx_pkt;
do {
- txd = &txr[tx_id];
- txn = &sw_ring[txe->next_id];
- txe->mbuf = tx_pkt;
+ uint64_t buf_dma_addr = rte_mbuf_data_iova(m_seg);
+ uint16_t slen = m_seg->data_len;
+
+ txe->mbuf = m_seg;
+
+ /* For TSO, split large segments that exceed the
+ * per-descriptor data limit, matching the behaviour of
+ * ci_xmit_pkts() on the singleq path.
+ */
+ while ((ol_flags & (RTE_MBUF_F_TX_TCP_SEG |
+ RTE_MBUF_F_TX_UDP_SEG)) &&
+ unlikely(slen > CI_MAX_DATA_PER_TXD)) {
+ txd = &txr[tx_id];
+ txd->buf_addr = rte_cpu_to_le_64(buf_dma_addr);
+ txd->qw1.cmd_dtype = cmd_dtype;
+ txd->qw1.rxr_bufsize = CI_MAX_DATA_PER_TXD;
+ txd->qw1.compl_tag = sw_id;
+ buf_dma_addr += CI_MAX_DATA_PER_TXD;
+ slen -= CI_MAX_DATA_PER_TXD;
+ if (++tx_id == txq->nb_tx_desc)
+ tx_id = 0;
+ sw_id = txe->next_id;
+ txe = &sw_ring[sw_id];
+ /* sub-descriptor slots do not own the mbuf */
+ txe->mbuf = NULL;
+ }
- /* Setup TX descriptor */
- txd->buf_addr =
- rte_cpu_to_le_64(rte_mbuf_data_iova(tx_pkt));
- cmd_dtype |= IDPF_TX_DESC_DTYPE_FLEX_FLOW_SCHE;
+ /* Write the final (or only) descriptor for this segment */
+ txd = &txr[tx_id];
+ txd->buf_addr = rte_cpu_to_le_64(buf_dma_addr);
txd->qw1.cmd_dtype = cmd_dtype;
- txd->qw1.rxr_bufsize = tx_pkt->data_len;
+ txd->qw1.rxr_bufsize = slen;
txd->qw1.compl_tag = sw_id;
- tx_id++;
- if (tx_id == txq->nb_tx_desc)
+ if (++tx_id == txq->nb_tx_desc)
tx_id = 0;
sw_id = txe->next_id;
- txe = txn;
- tx_pkt = tx_pkt->next;
- } while (tx_pkt);
+ txe = &sw_ring[sw_id];
+ m_seg = m_seg->next;
+ } while (m_seg);
/* fill the last descriptor with End of Packet (EOP) bit */
txd->qw1.cmd_dtype |= IDPF_TXD_FLEX_FLOW_CMD_EOP;
--
2.51.0
* Re: [PATCH] net/idpf: handle Tx of mbuf segments larger than 16k
From: Bruce Richardson @ 2026-03-04 9:53 UTC (permalink / raw)
To: dev
On Tue, Mar 03, 2026 at 03:00:26PM +0000, Bruce Richardson wrote:
> Recent rework of the Tx single-queue path in idpf aligned that path with
> that of other drivers, meaning it now supports segments of size greater
> than 16k. Rework the split-queue path to similarly support those large
> segments.
>
> Fixes: 770f4dfe0f79 ("net/idpf: support basic Tx data path")
> Cc: stable@dpdk.org
>
> Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> ---
Recheck-request: iol-unit-amd64-testing
* Re: [PATCH] net/idpf: handle Tx of mbuf segments larger than 16k
From: Burakov, Anatoly @ 2026-03-06 13:45 UTC (permalink / raw)
To: Bruce Richardson, dev
Cc: stable, Jingjing Wu, Praveen Shetty, Xiaoyun Li, Beilei Xing,
Junfeng Guo
On 3/3/2026 4:00 PM, Bruce Richardson wrote:
> Recent rework of the Tx single-queue path in idpf aligned that path with
> that of other drivers, meaning it now supports segments of size greater
> than 16k. Rework the split-queue path to similarly support those large
> segments.
>
> Fixes: 770f4dfe0f79 ("net/idpf: support basic Tx data path")
> Cc: stable@dpdk.org
>
> Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> ---
<snip>
> uint64_t cd_qw0 = 0, cd_qw1 = 0;
> nb_ctx = idpf_set_tso_ctx(ol_flags, tx_pkt, &tx_offload, txq,
> &cd_qw0, &cd_qw1);
>
> - /* Calculate the number of TX descriptors needed for
> - * each packet. For TSO packets, use ci_calc_pkt_desc as
> - * the mbuf data size might exceed max data size that hw allows
> - * per tx desc.
> + /* Calculate the number of TX descriptors needed for each packet.
> + * For TSO packets, use ci_calc_pkt_desc as the mbuf data size
> + * might exceed the max data size that hw allows per tx desc.
> */
> - if (ol_flags & RTE_MBUF_F_TX_TCP_SEG)
> + if (ol_flags & (RTE_MBUF_F_TX_TCP_SEG | RTE_MBUF_F_TX_UDP_SEG))
This looks like a drive-by fix for an unrelated issue. That particular
code was introduced here:
2904020f8313 ("net/intel: add common function to calculate needed descs")
There are other drivers that check TSO flags but only look at TCP_SEG,
not UDP_SEG - should they all check for both? Perhaps this should be
looked at and fixed across all our PMDs that support TSO.
(to be clear, this is a general question, I'm not implying these changes
must be part of this patchset)
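As a sketch, a shared segmentation-offload mask in the common code would
keep all drivers consistent (CI_TX_SEG_OFFLOAD_MASK is a hypothetical
name here, not an existing define):

#define CI_TX_SEG_OFFLOAD_MASK \
	(RTE_MBUF_F_TX_TCP_SEG | RTE_MBUF_F_TX_UDP_SEG)

	if (ol_flags & CI_TX_SEG_OFFLOAD_MASK)
		nb_used = ci_calc_pkt_desc(tx_pkt) + nb_ctx;
	else
		nb_used = tx_pkt->nb_segs + nb_ctx;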
> nb_used = ci_calc_pkt_desc(tx_pkt) + nb_ctx;
> else
> nb_used = tx_pkt->nb_segs + nb_ctx;
>
> + if (txq->nb_tx_free <= txq->tx_free_thresh) {
> + /* TODO: Need to refine
> + * 1. free and clean: Better to decide a clean destination instead of
> + * loop times. And don't free mbuf when RS got immediately, free when
> + * transmit or according to the clean destination.
> + * Now, just ignore the RE write back, free mbuf when get RS
> + * 2. out-of-order rewrite back haven't be supported, SW head and HW head
> + * need to be separated.
> + **/
> + nb_to_clean = 2 * txq->tx_rs_thresh;
> + while (nb_to_clean--)
> + idpf_split_tx_free(txq->complq);
> + }
> +
> + if (txq->nb_tx_free < nb_used)
> + break;
> +
> if (ol_flags & CI_TX_CKSUM_OFFLOAD_MASK)
> cmd_dtype = IDPF_TXD_FLEX_FLOW_CMD_CS_EN;
>
> @@ -959,30 +959,52 @@ idpf_dp_splitq_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
> ctx_desc[0] = cd_qw0;
> ctx_desc[1] = cd_qw1;
>
> - tx_id++;
> - if (tx_id == txq->nb_tx_desc)
> + if (++tx_id == txq->nb_tx_desc)
> tx_id = 0;
> }
>
> + cmd_dtype |= IDPF_TX_DESC_DTYPE_FLEX_FLOW_SCHE;
> + struct rte_mbuf *m_seg = tx_pkt;
> do {
> - txd = &txr[tx_id];
> - txn = &sw_ring[txe->next_id];
> - txe->mbuf = tx_pkt;
> + uint64_t buf_dma_addr = rte_mbuf_data_iova(m_seg);
> + uint16_t slen = m_seg->data_len;
> +
> + txe->mbuf = m_seg;
CodeRabbit picked up on something here, and I think it's worth highlighting.
When we're splitting segments, we assign txe->mbuf to the first segment...
<snip>
> + txe = &sw_ring[sw_id];
> + /* sub-descriptor slots do not own the mbuf */
> + txe->mbuf = NULL;
...then set subsequent segments to NULL...
> + }
>
> - /* Setup TX descriptor */
> - txd->buf_addr =
> - rte_cpu_to_le_64(rte_mbuf_data_iova(tx_pkt));
> - cmd_dtype |= IDPF_TX_DESC_DTYPE_FLEX_FLOW_SCHE;
> + /* Write the final (or only) descriptor for this segment */
> + txd = &txr[tx_id];
> + txd->buf_addr = rte_cpu_to_le_64(buf_dma_addr);
> txd->qw1.cmd_dtype = cmd_dtype;
> - txd->qw1.rxr_bufsize = tx_pkt->data_len;
> + txd->qw1.rxr_bufsize = slen;
> txd->qw1.compl_tag = sw_id;
...and we're supposed to write the final descriptor here, but we've
stored the mbuf pointer in the *first* descriptor's sw_ring slot, not in
the *last* one, which means that when completion processing reaches this
descriptor, the mbuf pointer in its slot will be NULL? Is that intended?
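To illustrate - assuming completion processing frees whatever mbuf sits
in the sw_ring slot named by the completion tag (a hypothetical helper,
not the actual idpf completion code):

static void
complq_free_sketch(struct ci_tx_entry *sw_ring, uint16_t compl_tag)
{
	struct ci_tx_entry *txe = &sw_ring[compl_tag];

	/* if the tag in the packet's last descriptor points at a slot
	 * whose mbuf was NULLed as a sub-descriptor slot, nothing is
	 * freed here and the chain leaks
	 */
	if (txe->mbuf != NULL) {
		rte_pktmbuf_free_seg(txe->mbuf);
		txe->mbuf = NULL;
	}
}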
> - tx_id++;
> - if (tx_id == txq->nb_tx_desc)
> + if (++tx_id == txq->nb_tx_desc)
> tx_id = 0;
> sw_id = txe->next_id;
> - txe = txn;
> - tx_pkt = tx_pkt->next;
> - } while (tx_pkt);
> + txe = &sw_ring[sw_id];
> + m_seg = m_seg->next;
> + } while (m_seg);
>
> /* fill the last descriptor with End of Packet (EOP) bit */
> txd->qw1.cmd_dtype |= IDPF_TXD_FLEX_FLOW_CMD_EOP;
--
Thanks,
Anatoly
* Re: [PATCH] net/idpf: handle Tx of mbuf segments larger than 16k
From: Burakov, Anatoly @ 2026-03-06 14:03 UTC (permalink / raw)
To: Bruce Richardson, dev
Cc: stable, Jingjing Wu, Praveen Shetty, Xiaoyun Li, Beilei Xing,
Junfeng Guo
> CodeRabbit picked up on something here, and I think it's worth
> highlighting.
>
> When we're splitting segments, we assign txe->mbuf to the first segment...
>
> <snip>
>
>> + txe = &sw_ring[sw_id];
>> + /* sub-descriptor slots do not own the mbuf */
>> + txe->mbuf = NULL;
>
> ...then set subsequent segments to NULL...
>
>> + }
>> - /* Setup TX descriptor */
>> - txd->buf_addr =
>> - rte_cpu_to_le_64(rte_mbuf_data_iova(tx_pkt));
>> - cmd_dtype |= IDPF_TX_DESC_DTYPE_FLEX_FLOW_SCHE;
>> + /* Write the final (or only) descriptor for this segment */
>> + txd = &txr[tx_id];
>> + txd->buf_addr = rte_cpu_to_le_64(buf_dma_addr);
>> txd->qw1.cmd_dtype = cmd_dtype;
>> - txd->qw1.rxr_bufsize = tx_pkt->data_len;
>> + txd->qw1.rxr_bufsize = slen;
>> txd->qw1.compl_tag = sw_id;
>
> ...and we're supposed to write the final descriptor here, but we've
> stored the mbuf pointer in the *first* descriptor's sw_ring slot, not in
> the *last* one, which means that when completion processing reaches this
> descriptor, the mbuf pointer in its slot will be NULL? Is that intended?
Actually, digging in, I don't see where we free mbufs at all in the
splitq path. Am I missing something here?
--
Thanks,
Anatoly
* Re: [PATCH] net/idpf: handle Tx of mbuf segments larger than 16k
From: Burakov, Anatoly @ 2026-03-06 14:16 UTC (permalink / raw)
To: Bruce Richardson, dev
Cc: stable, Jingjing Wu, Praveen Shetty, Xiaoyun Li, Beilei Xing,
Junfeng Guo
On 3/6/2026 3:03 PM, Burakov, Anatoly wrote:
>
>> CodeRabbit picked up on something here, and I think it's worth
>> highlighting.
>>
>> When we're splitting segments, we assign txe->mbuf to the first
>> segment...
>>
>> <snip>
>>
>>> + txe = &sw_ring[sw_id];
>>> + /* sub-descriptor slots do not own the mbuf */
>>> + txe->mbuf = NULL;
>>
>> ...then set subsequent segments to NULL...
>>
>>> + }
>>> - /* Setup TX descriptor */
>>> - txd->buf_addr =
>>> - rte_cpu_to_le_64(rte_mbuf_data_iova(tx_pkt));
>>> - cmd_dtype |= IDPF_TX_DESC_DTYPE_FLEX_FLOW_SCHE;
>>> + /* Write the final (or only) descriptor for this segment */
>>> + txd = &txr[tx_id];
>>> + txd->buf_addr = rte_cpu_to_le_64(buf_dma_addr);
>>> txd->qw1.cmd_dtype = cmd_dtype;
>>> - txd->qw1.rxr_bufsize = tx_pkt->data_len;
>>> + txd->qw1.rxr_bufsize = slen;
>>> txd->qw1.compl_tag = sw_id;
>>
>> ...and we're supposed to write the final descriptor here, but we've
>> stored the mbuf pointer in the *first* descriptor's sw_ring slot, not in
>> the *last* one, which means that when completion processing reaches this
>> descriptor, the mbuf pointer in its slot will be NULL? Is that intended?
>
> Actually, digging in, I don't see where we free mbufs at all in the
> splitq path. Am I missing something here?
>
Yes I am - the RS bit is set by the hardware. I guess the question then
is: does the hardware set the RS bit for *all* segments, or just the
ones we marked with EOP? If it's the latter, then it's definitely a bug;
if all segments get the RS bit set, then it's not.
--
Thanks,
Anatoly