From: Stephen Hemminger <stephen@networkplumber.org>
To: Junlong Wang <wang.junlong1@zte.com.cn>
Cc: dev@dpdk.org
Subject: Re: [PATCH v4 0/4] net/zxdh: optimize Rx/Tx path performance
Date: Sun, 7 Jun 2026 11:00:01 -0700 [thread overview]
Message-ID: <20260607110001.170d66f9@phoenix.local> (raw)
In-Reply-To: <20260606063226.491848-1-wang.junlong1@zte.com.cn>
On Sat, 6 Jun 2026 14:32:21 +0800
Junlong Wang <wang.junlong1@zte.com.cn> wrote:
> v4:
> - fix some AI review issues.
> - fix queue enable intr bug.
>
> v3:
> - remove unnecessary NULL check in zxdh_init_queue.
> - Split Ring: Bit[31] is unused and reserved, zxdh_queue_notify(): removing the
> zxdh_pci_with_feature(hw, ZXDH_F_RING_PACKED) check;
> - remove unnecessary double-free in in zxdh_recv_single_pkts();
> - used rte_pktmbuf_mtod();
> - remove rxq_get_vq(q) macro, use q->vq and apply it consistently;
> - Refactoring scatter and mtu check logic in zxdh_dev_mtu_set();
> - set txdp->id = avail_idx + i in tx_bunch/tx1.
> - add comment documenting zxdh_xmit_enqueue_append() now sets dxp->cookie = NULL for
> the head slot and stores cookies per descriptor via dep[idx].cookie.
> - add one-line comment noting tx_bunch() is the simple path handles single-segment.
> - remove unnecessary Extra initialization and the uint32_t cast.
>
> v2:
> - zxdh_rxtx.c, pkt_padding(): modifyed the return value of pkt_padding();
> - zxdh_rxtx.c, zxdh_recv_single_pkts(): modifyed When zxdh_init_mbuf() fails
> the loop does "continue" and free mbufs;
> - zxdh_rxtx.c, refill_desc_unwrap(): Add rte_io_wmb() before writing flags
> in the refill_que_descs();
> - zxdh_queue.h, zxdh_queue_enable_intr(): Remove unnecessary function of zxdh_queue_enable_intr;
> - zxdh_ethdev.c, zxdh_init_queue(): changed the hdr_mz NULL check logic;
>
> - zxdh_rxtx.c, zxdh_xmit_pkts_simple()、zxdh_recv_single_pkts(): add stats.bytes count;
> - zxdh_rxtx.c, zxdh_init_mbuf():remove rte_pktmbuf_dump(stdout, rxm, 40);
> - zxdh_ethdev.c, zxdh_dev_free_mbufs(): using rte_pktmbuf_free() to free mbufs;
> - Splitting into separate patches, structure reorganization and sw_ring removal、
> RX recv optimize、Tx xmit optimize、Tx;
>
> v1:
> This patch optimizes the ZXDH PMD's receive and transmit path for better
> performance through several improvements:
>
> - Add simple TX/RX burst functions (zxdh_xmit_pkts_simple and
> zxdh_recv_single_pkts) for single-segment packet scenarios.
> - Remove RX software ring (sw_ring) to reduce memory allocation and
> copy.
> - Optimize descriptor management with prefetching and simplified
> cleanup.
> - Reorganize structure fields for better cache locality.
>
> These changes reduce CPU cycles and memory bandwidth consumption,
> resulting in improved packet processing throughput.
>
> Junlong Wang (4):
> net/zxdh: optimize queue structure to improve performance
> net/zxdh: optimize Rx recv pkts performance
> net/zxdh: optimize Tx xmit pkts performance
> net/zxdh: fix queue enable intr issues
>
> drivers/net/zxdh/zxdh_ethdev.c | 81 ++---
> drivers/net/zxdh/zxdh_ethdev_ops.c | 23 +-
> drivers/net/zxdh/zxdh_ethdev_ops.h | 4 +
> drivers/net/zxdh/zxdh_pci.c | 2 +-
> drivers/net/zxdh/zxdh_queue.c | 11 +-
> drivers/net/zxdh/zxdh_queue.h | 120 +++----
> drivers/net/zxdh/zxdh_rxtx.c | 534 ++++++++++++++++++++++-------
> drivers/net/zxdh/zxdh_rxtx.h | 27 +-
> 8 files changed, 543 insertions(+), 259 deletions(-)
>
Multiple issues reported by AI review. I ran in Claude to get better summary.
That way it can look at progress across revisions in the series.
Series review: net/zxdh Rx/Tx optimization (v4)
The v3 issues are addressed: pkt_padding() now checks the
rte_pktmbuf_prepend() return for NULL, the simple-path statistics no
longer read past the end of tx_pkts on a ring wrap, and the
zxdh_queue_enable_intr() fix has been split into its own patch 4/4
with a Fixes: tag and Cc: stable. Remaining issues below.
[PATCH v4 2/4] net/zxdh: optimize Rx recv pkts performance
Error: zxdh_recv_single_pkts() does not compact rcv_pkts[] on a
mid-burst failure, so a freed mbuf can be returned to the application
and a valid mbuf can be leaked.
for (i = 0; i < num; i++) {
struct rte_mbuf *rxm = rcv_pkts[i];
uint16_t len = lens[i];
if (unlikely(zxdh_init_mbuf(rxm, len, hw, &vq->rxq) < 0))
continue;
zxdh_update_packet_stats(&rxvq->stats, rxm);
nb_rx++;
}
zxdh_init_mbuf() frees rxm on failure (num_buffers != 1, or
data_len != pkt_len). The loop then skips it via continue but never
writes the surviving mbufs down to rcv_pkts[nb_rx]. If packet i fails
and a later packet j > i succeeds, the valid mbuf stays at rcv_pkts[j]
while the caller is told only nb_rx packets are valid. The caller reads
rcv_pkts[0..nb_rx-1], which now includes the freed slot at index i
(use-after-free) and never sees the valid mbuf at index j (leak).
The companion zxdh_recv_pkts_packed() in this same file does it
correctly with rcv_pkts[nb_rx] = rxm. Mirror that here:
if (unlikely(zxdh_init_mbuf(rxm, len, hw, &vq->rxq) < 0))
continue;
rcv_pkts[nb_rx] = rxm;
zxdh_update_packet_stats(&rxvq->stats, rxm);
nb_rx++;
[PATCH v4 3/4] net/zxdh: optimize Tx xmit pkts performance
Error: tx_bunch() and tx1() publish the descriptor AVAIL flag with no
store barrier, so the device may observe the AVAIL bit before addr/len
are visible and DMA from a stale address.
for (i = 0; i < N_PER_LOOP; ++i, ++txdp, ++pkts) {
txdp->addr = rte_mbuf_data_iova(*pkts);
txdp->len = (*pkts)->data_len;
txdp->id = start_id + i;
txdp->flags = flags; /* AVAIL bit, no rte_io_wmb */
}
The packed ring requires the addr/len/id stores to be globally visible
before the flags store that sets the AVAIL bit. The rest of the driver
does this through zxdh_queue_store_flags_packed(), which issues
rte_io_wmb() before writing flags (used by the original
zxdh_enqueue_xmit_packed_fast() that this path replaces, and by
refill_que_descs() in patch 2/4). The direct txdp->flags = flags writes
here drop that barrier. volatile prevents only compiler reordering, not
hardware reordering on weakly-ordered architectures (arm, ppc), so this
is a real data race against the device.
Write addr/len/id for the batch, then a single rte_io_wmb(), then the
flags; or publish each flag through zxdh_queue_store_flags_packed().
Error: in submit_to_backend_simple(), a packet that fails pkt_padding()
is skipped for cookie tracking but its descriptor is still written and
published, and the mbuf leaks.
for (j = 0; j < N_PER_LOOP; ++j) {
m = *(tx_pkts + i + j);
if (unlikely(pkt_padding(m, hw) < 0)) {
vq->txq.stats.errors++;
continue; /* cookie not set, m not freed */
}
(dxp + i + j)->cookie = (void *)m;
zxdh_update_packet_stats(&vq->txq.stats, m);
}
/* writes ALL N_PER_LOOP descriptors, including the failed one */
tx_bunch(vq, txdp + i, tx_pkts + i, id + i);
When pkt_padding() returns < 0 the mbuf has no prepended downlink
header, but tx_bunch() still writes its iova/len into the descriptor and
sets the AVAIL bit, so a header-less packet is handed to the device.
The matching vq_descx[].cookie is left unset, so the mbuf is never
freed (leak) and, if the slot holds a stale cookie from a prior use,
the next flush double-frees it. vq_avail_idx and the returned count are
also advanced by the full nb_pkts, counting the dropped packet as sent.
The leftover loop handles this correctly (it calls tx1() per packet,
inside the padding check). The main loop needs the same structure:
either build the descriptor per packet after a successful pkt_padding(),
or track which slots succeeded and only publish those.
[PATCH v4 4/4] net/zxdh: fix queue enable intr issues
Error: the fix does not enable interrupts; the condition is inverted
relative to the commit message.
- if (vq->event_flags_shadow == ZXDH_RING_EVENT_FLAGS_DISABLE) {
- vq->event_flags_shadow = ZXDH_RING_EVENT_FLAGS_DISABLE;
+ if (vq->event_flags_shadow == ZXDH_RING_EVENT_FLAGS_ENABLE) {
+ vq->event_flags_shadow = ZXDH_RING_EVENT_FLAGS_ENABLE;
vq->vq_packed.ring.driver->desc_event_flags = vq->event_flags_shadow;
}
ENABLE is 0x0 and DISABLE is 0x1. After a disable the shadow holds
DISABLE, which is the state from which the application re-arms the
interrupt via zxdh_dev_rx_queue_intr_enable(). With "== ENABLE" the
body runs only when interrupts are already enabled; when the shadow is
DISABLE the test is false and nothing happens, so interrupts are still
never enabled. The body now writes ENABLE, but the guard prevents it
from ever running in the case that matters.
The commit message says "Change == to !=", which is the correct fix,
but the diff changed the constant and kept ==. Mirror
zxdh_queue_disable_intr() (which uses != DISABLE):
if (vq->event_flags_shadow != ZXDH_RING_EVENT_FLAGS_ENABLE) {
vq->event_flags_shadow = ZXDH_RING_EVENT_FLAGS_ENABLE;
vq->vq_packed.ring.driver->desc_event_flags = vq->event_flags_shadow;
}
(Keeping the original "== DISABLE" guard and only changing the body
assignment to ENABLE is also correct and is the minimal fix for the
original bug.)
Warning: this Cc: stable fix depends on patch 1/4, which moved
event_flags_shadow out of vq_packed. The diff context here is
vq->event_flags_shadow, but the stable branches still have
vq->vq_packed.event_flags_shadow, so this patch will not cherry-pick
cleanly. Consider making the fix standalone against the current field
name and placing it first in the series so it backports without the
reorganization.
prev parent reply other threads:[~2026-06-07 18:00 UTC|newest]
Thread overview: 23+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-03-26 2:28 [PATCH v1] net/zxdh: optimize Rx/Tx path performance Junlong Wang
2026-03-26 3:27 ` Stephen Hemminger
2026-04-06 4:26 ` Stephen Hemminger
2026-04-23 1:18 ` [PATCH v2 0/3] " Junlong Wang
2026-04-23 1:18 ` [PATCH v2 1/3] net/zxdh: optimize queue structure to improve performance Junlong Wang
2026-04-23 18:57 ` Stephen Hemminger
2026-04-23 1:18 ` [PATCH v2 2/3] net/zxdh: optimize Rx recv pkts performance Junlong Wang
2026-04-23 18:54 ` Stephen Hemminger
2026-04-23 23:39 ` Stephen Hemminger
2026-04-23 1:18 ` [PATCH v2 3/3] net/zxdh: optimize Tx xmit " Junlong Wang
2026-04-23 19:23 ` [PATCH v2 0/3] net/zxdh: optimize Rx/Tx path performance Stephen Hemminger
2026-05-09 6:29 ` [PATCH v3 " Junlong Wang
2026-05-09 6:29 ` [PATCH v3 1/3] net/zxdh: optimize queue structure to improve performance Junlong Wang
2026-05-18 2:20 ` Stephen Hemminger
2026-05-09 6:29 ` [PATCH v3 2/3] net/zxdh: optimize Rx recv pkts performance Junlong Wang
2026-05-09 6:29 ` [PATCH v3 3/3] net/zxdh: optimize Tx xmit " Junlong Wang
2026-05-18 2:22 ` Stephen Hemminger
2026-06-06 6:32 ` [PATCH v4 0/4] net/zxdh: optimize Rx/Tx path performance Junlong Wang
2026-06-06 6:32 ` [PATCH v4 1/4] net/zxdh: optimize queue structure to improve performance Junlong Wang
2026-06-06 6:32 ` [PATCH v4 2/4] net/zxdh: optimize Rx recv pkts performance Junlong Wang
2026-06-06 6:32 ` [PATCH v4 3/4] net/zxdh: optimize Tx xmit " Junlong Wang
2026-06-06 6:32 ` [PATCH v4 4/4] net/zxdh: fix queue enable intr issues Junlong Wang
2026-06-07 18:00 ` Stephen Hemminger [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260607110001.170d66f9@phoenix.local \
--to=stephen@networkplumber.org \
--cc=dev@dpdk.org \
--cc=wang.junlong1@zte.com.cn \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox