Re: [PATCH v8 0/4] net/zxdh: optimize Rx/Tx path performance

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Stephen Hemminger <stephen@networkplumber.org>
To: Junlong Wang <wang.junlong1@zte.com.cn>
Cc: dev@dpdk.org
Subject: Re: [PATCH v8 0/4] net/zxdh: optimize Rx/Tx path performance
Date: Thu, 25 Jun 2026 15:42:03 -0700	[thread overview]
Message-ID: <20260625154203.1d16769b@phoenix.local> (raw)
In-Reply-To: <20260625120317.211780-1-wang.junlong1@zte.com.cn>

On Thu, 25 Jun 2026 20:03:12 +0800
Junlong Wang <wang.junlong1@zte.com.cn> wrote:

> v8:
>   - Add checked the size of ZXDH_DL_NET_HDR_SIZE and RTE_PKTMBUF_HEADROOM in
>     zxdh_xmit_pkts_simple() before submitting. Add static_assert to reject builds with insufficient
>     default headroom at compile time.
> 
> v7:
>   - Add a new xmit prepare func for xmit_pkts_simple, which will checked the size of
>     ZXDH_DL_NET_HDR_SIZE and RTE_PKTMBUF_HEADROOM.
> 
> v6:
>   - Remove unnecessary error checking code in submit_to_backend_simple() and
>     pkt_padding(). Since as the max dl_net_hdr_len is always less than
>     RTE_PKTMBUF_HEADROOM, rte_pktmbuf_prepend() cannot fail in the
>     simple path (single-segment mbufs).
> v5:
>   - Reorganize patch series, placing interrupt fix as the first patch
>     and fix condition check to properly enable interrupts.
>   - Fix zxdh_recv_single_pkts() not compacting rcv_pkts[] on failure,
>     which could cause use-after-free and mbuf leak.
>   - Fix tx_bunch() and tx1() missing store barrier before setting AVAIL flag,
>     preventing data race on weakly-ordered architectures.
>   - Fix submit_to_backend_simple() writing descriptors for packets that
>     failed pkt_padding(), causing mbuf leak.
> v4:
>   - fix some AI review issues.
>   - fix queue enable intr bug.
> v3:
>   - remove unnecessary NULL check in zxdh_init_queue.
>   - Split Ring: Bit[31] is unused and reserved, zxdh_queue_notify(): removing the
>     zxdh_pci_with_feature(hw, ZXDH_F_RING_PACKED) check;
>   - remove unnecessary double-free in in zxdh_recv_single_pkts();
>   - used rte_pktmbuf_mtod();
>   - remove rxq_get_vq(q) macro, use q->vq and apply it consistently;
>   - Refactoring scatter and mtu check logic in zxdh_dev_mtu_set();
>   - set txdp->id = avail_idx + i in tx_bunch/tx1.
>   - add comment documenting zxdh_xmit_enqueue_append() now sets dxp->cookie = NULL for
>     the head slot and stores cookies per descriptor via dep[idx].cookie.
>   - add one-line comment noting tx_bunch() is the simple path handles single-segment.
>   - remove unnecessary Extra initialization and the uint32_t cast.
> v2:
>   - zxdh_rxtx.c, pkt_padding(): modifyed the return value of pkt_padding();
>   - zxdh_rxtx.c, zxdh_recv_single_pkts(): modifyed When zxdh_init_mbuf() fails
>     the loop does "continue" and free mbufs;
>   - zxdh_rxtx.c, refill_desc_unwrap(): Add rte_io_wmb() before writing flags
>     in the refill_que_descs();
>   - zxdh_queue.h, zxdh_queue_enable_intr(): Remove unnecessary function of zxdh_queue_enable_intr;
>   - zxdh_ethdev.c, zxdh_init_queue(): changed the hdr_mz NULL check logic;
>   - zxdh_rxtx.c, zxdh_xmit_pkts_simple()、zxdh_recv_single_pkts(): add stats.bytes count;
>   - zxdh_rxtx.c, zxdh_init_mbuf():remove  rte_pktmbuf_dump(stdout, rxm, 40);
>   - zxdh_ethdev.c, zxdh_dev_free_mbufs(): using rte_pktmbuf_free() to free mbufs;
>   - Splitting into separate patches, structure reorganization and sw_ring removal、
>     RX recv optimize、Tx xmit optimize、Tx;
> v1:
>   This patch optimizes the ZXDH PMD's receive and transmit path for better
>   performance through several improvements:
> - Add simple TX/RX burst functions (zxdh_xmit_pkts_simple and
>   zxdh_recv_single_pkts) for single-segment packet scenarios.
> - Remove RX software ring (sw_ring) to reduce memory allocation and
>   copy.
> - Optimize descriptor management with prefetching and simplified
>   cleanup.
> - Reorganize structure fields for better cache locality.
> 
>   These changes reduce CPU cycles and memory bandwidth consumption,
>   resulting in improved packet processing throughput.
> 
> Junlong Wang (4):
>   net/zxdh: fix queue enable intr issues
>   net/zxdh: optimize queue structure to improve performance
>   net/zxdh: optimize Rx recv pkts performance
>   net/zxdh: optimize Tx xmit pkts performance
> 
>  drivers/net/zxdh/zxdh_ethdev.c     |  83 ++--
>  drivers/net/zxdh/zxdh_ethdev_ops.c |  23 +-
>  drivers/net/zxdh/zxdh_ethdev_ops.h |   4 +
>  drivers/net/zxdh/zxdh_pci.c        |   2 +-
>  drivers/net/zxdh/zxdh_queue.c      |  11 +-
>  drivers/net/zxdh/zxdh_queue.h      | 122 +++---
>  drivers/net/zxdh/zxdh_rxtx.c       | 591 +++++++++++++++++++++++------
>  drivers/net/zxdh/zxdh_rxtx.h       |  29 +-
>  8 files changed, 604 insertions(+), 261 deletions(-)
> 

AI spotted one bug but it under valued how serious the issue is.
AI also gave bad advice about this earlier. Here is my analysis
followed by buggy AI review, then what it said after I pointed out
its mistake. Sorry, I need to not trust AI; its good until it isn't

The semantics of how DPDK drivers are supposed to handle bad packets
in transmit is not well documented. The correct way to handle a packet
that can not be sent is for the driver to consume (free in tx_burst)
and increment an error counter, and keep going. Suppose application
gave a single buggy mbuf (index 3) in a burst of 16 packets (assuming
there was space in the transmit ring).
The driver should return with 16 (all packets consumed) but the
statistics should reflect what happened: 15 packets transmitted, 1 tx error.

The problem with should return is that the application can not tell
the difference between backpressure (transmit ring full) and a bad packet.
An application may think it see back pressure and retransmit the bad packet.

--- AI review (buggy) ---

Series review: net/zxdh Rx/Tx optimization (v8)

The v7 issue is resolved. The simple Tx burst no longer depends on the
optional tx_prepare for memory safety: zxdh_xmit_pkts_simple() now
filters mbufs with data_off < hw->dl_net_hdr_len before any packet
reaches pkt_padding(), stopping at the first short-headroom packet and
returning the count enqueued so far, so the caller keeps ownership of
the rest. The threshold matches exactly what pkt_padding() prepends,
and submit_to_backend_simple()/pkt_padding() are only reachable through
that filter, so the open-coded prepend can no longer underflow data_off
or write before buf_addr. The build-time backstop

	static_assert(RTE_PKTMBUF_HEADROOM >= ZXDH_DL_NET_HDR_SIZE, ...)

is also in place (static_assert resolves via rte_common.h).

Patches 1-3 are unchanged from v7 and remain correct: the enable_intr
fix is first in the series against the original field name so it
backports cleanly; the Tx AVAIL flag is published through
zxdh_queue_store_flags_packed() (rte_io_wmb); multi-segment Tx frees
each segment once via free_seg with the head descriptor's NULL cookie
holding the hardware header; and zxdh_recv_single_pkts() compacts
surviving mbufs.

Info (non-blocking, no need to respin):

In the new headroom filter, the whole returned tail is counted as
errors:

	if (unlikely(tx_pkts[i]->data_off < hdr_len)) {
		txvq->stats.errors += nb_pkts - i;
		nb_pkts = i;
		break;
	}

Only tx_pkts[i] is known to be short; the packets after it are not
examined, just deferred, and the application will resubmit them on the
next burst. Counting the entire tail as errors inflates the error stat
(and a packet that is eventually sent fine can be counted as an error
on an earlier burst where it sat behind a short-headroom packet).
Counting only the rejected packet, or tracking the deferred tail
separately from errors, would be more accurate. Not worth a respin on
its own.

-- AI review (corrected) ---

Series review: net/zxdh Rx/Tx optimization (v8)

The v6/v7 out-of-bounds write is fixed: zxdh_xmit_pkts_simple() now
checks data_off < hw->dl_net_hdr_len before any packet reaches the
open-coded prepend in pkt_padding(), and the static_assert backstop is
in place. Memory safety is no longer the issue. How the unsendable
packet is handled is.

Patches 1-3 are unchanged from v7 and remain correct.

[PATCH v8 4/4] net/zxdh: optimize Tx xmit pkts performance

Error: the simple Tx burst signals a bad packet with a short return,
which the application cannot distinguish from backpressure.

	for (i = 0; i < nb_pkts; i++) {
		rte_prefetch0(tx_pkts[i]);
		if (unlikely(tx_pkts[i]->data_off < hdr_len)) {
			txvq->stats.errors += nb_pkts - i;
			nb_pkts = i;
			break;
		}
	}

A short return from tx_burst is the backpressure signal (transmit ring
full, retry later). Here it is also used to mean "packet i is bad",
and the bad mbuf is left owned by the caller. The application has no
way to tell the two apart: the usual

	for (sent = 0; sent < n; )
		sent += rte_eth_tx_burst(port, q, &pkts[sent], n - sent);

loop treats the short return as backpressure and resubmits pkts[i],
which fails again every time -- head-of-line blocking, and the good
packets after i (which had ring space) never go out.

A packet that cannot be sent must be consumed by the driver, not
handed back. Free it in tx_burst, increment the tx error counter, and
continue with the rest of the burst. For a burst of 16 where only
index 3 is bad and the ring has room, tx_burst should return 16, with
stats showing 15 transmitted and 1 tx error. A short return is then
reserved for the one case the application is entitled to retry: ring
full.

Concretely for the simple path: walk the burst, and for a packet with
data_off < hdr_len free it and bump the error counter rather than
breaking; compact the survivors (as zxdh_recv_single_pkts() already
does on Rx) and submit the compacted run. Reserve the short return for
the vq_free_cnt limit.

Better still, avoid dropping the packet at all: route a short-headroom
mbuf through the append path the packed burst already uses
(zxdh_xmit_enqueue_append copies the header into the reserved txr
region) instead of prepending in place. Then nothing is lost and the
simple path matches the packed path's handling.

next prev parent reply	other threads:[~2026-06-25 22:42 UTC|newest]

Thread overview: 48+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-03-26  2:28 [PATCH v1] net/zxdh: optimize Rx/Tx path performance Junlong Wang
2026-03-26  3:27 ` Stephen Hemminger
2026-04-06  4:26 ` Stephen Hemminger
2026-04-23  1:18 ` [PATCH v2 0/3] " Junlong Wang
2026-04-23  1:18   ` [PATCH v2 1/3] net/zxdh: optimize queue structure to improve performance Junlong Wang
2026-04-23 18:57     ` Stephen Hemminger
2026-04-23  1:18   ` [PATCH v2 2/3] net/zxdh: optimize Rx recv pkts performance Junlong Wang
2026-04-23 18:54     ` Stephen Hemminger
2026-04-23 23:39     ` Stephen Hemminger
2026-04-23  1:18   ` [PATCH v2 3/3] net/zxdh: optimize Tx xmit " Junlong Wang
2026-04-23 19:23   ` [PATCH v2 0/3] net/zxdh: optimize Rx/Tx path performance Stephen Hemminger
2026-05-09  6:29   ` [PATCH v3 " Junlong Wang
2026-05-09  6:29     ` [PATCH v3 1/3] net/zxdh: optimize queue structure to improve performance Junlong Wang
2026-05-18  2:20       ` Stephen Hemminger
2026-05-09  6:29     ` [PATCH v3 2/3] net/zxdh: optimize Rx recv pkts performance Junlong Wang
2026-05-09  6:29     ` [PATCH v3 3/3] net/zxdh: optimize Tx xmit " Junlong Wang
2026-05-18  2:22       ` Stephen Hemminger
2026-06-06  6:32     ` [PATCH v4 0/4] net/zxdh: optimize Rx/Tx path performance Junlong Wang
2026-06-06  6:32       ` [PATCH v4 1/4] net/zxdh: optimize queue structure to improve performance Junlong Wang
2026-06-06  6:32       ` [PATCH v4 2/4] net/zxdh: optimize Rx recv pkts performance Junlong Wang
2026-06-06  6:32       ` [PATCH v4 3/4] net/zxdh: optimize Tx xmit " Junlong Wang
2026-06-06  6:32       ` [PATCH v4 4/4] net/zxdh: fix queue enable intr issues Junlong Wang
2026-06-07 18:00       ` [PATCH v4 0/4] net/zxdh: optimize Rx/Tx path performance Stephen Hemminger
2026-06-15  1:19       ` [PATCH v5 " Junlong Wang
2026-06-15  1:19         ` [PATCH v5 1/4] net/zxdh: fix queue enable intr issues Junlong Wang
2026-06-15  1:19         ` [PATCH v5 2/4] net/zxdh: optimize queue structure to improve performance Junlong Wang
2026-06-15  1:19         ` [PATCH v5 3/4] net/zxdh: optimize Rx recv pkts performance Junlong Wang
2026-06-15  1:19         ` [PATCH v5 4/4] net/zxdh: optimize Tx xmit " Junlong Wang
2026-06-15 18:38           ` Stephen Hemminger
2026-06-17  8:28       ` [PATCH v6 0/4] net/zxdh: optimize Rx/Tx path performance Junlong Wang
2026-06-17  8:28         ` [PATCH v6 1/4] net/zxdh: fix queue enable intr issues Junlong Wang
2026-06-17  8:28         ` [PATCH v6 2/4] net/zxdh: optimize queue structure to improve performance Junlong Wang
2026-06-17  8:28         ` [PATCH v6 3/4] net/zxdh: optimize Rx recv pkts performance Junlong Wang
2026-06-17  8:28         ` [PATCH v6 4/4] net/zxdh: optimize Tx xmit " Junlong Wang
2026-06-17 15:21         ` [PATCH v6 0/4] net/zxdh: optimize Rx/Tx path performance Stephen Hemminger
2026-06-23  6:09         ` [PATCH v7 " Junlong Wang
2026-06-23  6:09           ` [PATCH v7 1/4] net/zxdh: fix queue enable intr issues Junlong Wang
2026-06-23  6:09           ` [PATCH v7 2/4] net/zxdh: optimize queue structure to improve performance Junlong Wang
2026-06-23  6:09           ` [PATCH v7 3/4] net/zxdh: optimize Rx recv pkts performance Junlong Wang
2026-06-23  6:09           ` [PATCH v7 4/4] net/zxdh: optimize Tx xmit " Junlong Wang
2026-06-23 15:54           ` [PATCH v7 0/4] net/zxdh: optimize Rx/Tx path performance Stephen Hemminger
2026-06-25 12:03           ` [PATCH v8 " Junlong Wang
2026-06-25 12:03             ` [PATCH v8 1/4] net/zxdh: fix queue enable intr issues Junlong Wang
2026-06-25 12:03             ` [PATCH v8 2/4] net/zxdh: optimize queue structure to improve performance Junlong Wang
2026-06-25 12:03             ` [PATCH v8 3/4] net/zxdh: optimize Rx recv pkts performance Junlong Wang
2026-06-25 12:03             ` [PATCH v8 4/4] net/zxdh: optimize Tx xmit " Junlong Wang
2026-06-25 22:42             ` Stephen Hemminger [this message]
2026-06-26  3:10             ` [v8,0/4] net/zxdh: optimize Rx/Tx path performance Junlong Wang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260625154203.1d16769b@phoenix.local \
    --to=stephen@networkplumber.org \
    --cc=dev@dpdk.org \
    --cc=wang.junlong1@zte.com.cn \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.