Re: [PATCH v2 0/3] net/zxdh: optimize Rx/Tx path performance

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Stephen Hemminger <stephen@networkplumber.org>
To: Junlong Wang <wang.junlong1@zte.com.cn>
Cc: dev@dpdk.org
Subject: Re: [PATCH v2 0/3] net/zxdh: optimize Rx/Tx path performance
Date: Thu, 23 Apr 2026 12:23:57 -0700	[thread overview]
Message-ID: <20260423122357.3a91bccf@phoenix.local> (raw)
In-Reply-To: <20260423011820.2426203-1-wang.junlong1@zte.com.cn>

On Thu, 23 Apr 2026 09:18:15 +0800
Junlong Wang <wang.junlong1@zte.com.cn> wrote:

> v2:
>   - zxdh_rxtx.c, pkt_padding(): modifyed the return value of pkt_padding();
>   - zxdh_rxtx.c, zxdh_recv_single_pkts(): modifyed When zxdh_init_mbuf() fails
>     the loop does "continue" and free mbufs;
>   - zxdh_rxtx.c, refill_desc_unwrap(): Add rte_io_wmb() before writing flags
>     in the refill_que_descs();
>   - zxdh_queue.h, zxdh_queue_enable_intr(): Remove unnecessary function of zxdh_queue_enable_intr;
>   - zxdh_ethdev.c, zxdh_init_queue(): changed the hdr_mz NULL check logic;
> 
>   - zxdh_rxtx.c, zxdh_xmit_pkts_simple()、zxdh_recv_single_pkts(): add stats.bytes count;
>   - zxdh_rxtx.c, zxdh_init_mbuf():remove  rte_pktmbuf_dump(stdout, rxm, 40);
>   - zxdh_ethdev.c, zxdh_dev_free_mbufs(): using rte_pktmbuf_free() to free mbufs;
>   - Splitting into separate patches, structure reorganization and sw_ring removal、
>     RX recv optimize、Tx xmit optimize、Tx;
> 
> v1:
>   This patch optimizes the ZXDH PMD's receive and transmit path for better
>   performance through several improvements:
> 
> - Add simple TX/RX burst functions (zxdh_xmit_pkts_simple and
>   zxdh_recv_single_pkts) for single-segment packet scenarios.
> - Remove RX software ring (sw_ring) to reduce memory allocation and
>   copy.
> - Optimize descriptor management with prefetching and simplified
>   cleanup.
> - Reorganize structure fields for better cache locality.
> 
>   These changes reduce CPU cycles and memory bandwidth consumption,
>   resulting in improved packet processing throughput.
> 
> Junlong Wang (3):
>   net/zxdh: optimize queue structure to improve performance
>   net/zxdh: optimize Rx recv pkts performance
>   net/zxdh: optimize Tx xmit pkts performance
> 
>  drivers/net/zxdh/zxdh_ethdev.c     |  88 ++---
>  drivers/net/zxdh/zxdh_ethdev_ops.c |  24 +-
>  drivers/net/zxdh/zxdh_ethdev_ops.h |   4 +
>  drivers/net/zxdh/zxdh_pci.c        |   2 +-
>  drivers/net/zxdh/zxdh_queue.c      |  31 +-
>  drivers/net/zxdh/zxdh_queue.h      | 125 +++----
>  drivers/net/zxdh/zxdh_rxtx.c       | 520 +++++++++++++++++++++--------
>  drivers/net/zxdh/zxdh_rxtx.h       |  27 +-
>  8 files changed, 527 insertions(+), 294 deletions(-)
> 

Here is the full AI review of this patchset
```
Review of v2 series: net/zxdh queue/Rx/Tx optimization

Overall: needs a v3. Findings below, organized by patch.

Patch 1/3 (optimize queue structure)
------------------------------------

Warning: zxdh_queue_notify() in zxdh_queue.h now hardcodes the bit-31
avail-wrap into notify_data unconditionally, removing the
zxdh_pci_with_feature(hw, ZXDH_F_RING_PACKED) gate that the previous
out-of-line version in zxdh_pci.c had.  The series also adds a
vq_split member to the virtqueue union, which suggests split-ring
support is planned.  Once that exists, this helper will corrupt notify
data for split rings.  Either reinstate the feature gate or keep the
dispatch going through VTPCI_OPS()->notify_queue().

Warning: rx_queue_intr_enable/rx_queue_intr_disable dev_ops and the
zxdh_queue_enable_intr() helper are removed.  The commit log talks
about cache locality and sw_ring removal but not this.  Please split
it into its own patch with a justification, or at minimum call it out
in the commit message.

Minor: fail_q_alloc now does "if (hdr_mz) rte_memzone_free(hdr_mz);".
rte_memzone_free() accepts NULL; the guard is unnecessary.

Minor: The new "if (hdr_mz == NULL)" check inside the VTNET_TQ branch
of zxdh_init_queue() is unreachable.  hdr_mz was already validated
earlier in the function.

Minor: Doxygen close "**/" used in several places where "*/" is the
correct terminator.

Patch 2/3 (optimize Rx recv pkts performance)
---------------------------------------------

Error: Double-free in zxdh_recv_single_pkts().  Both error paths
inside zxdh_init_mbuf() already call rte_pktmbuf_free(rxm), but the
caller also frees rxm on return < 0:

        if (unlikely(zxdh_init_mbuf(rxm, len, hw, &vq->rxq) < 0)) {
                rte_pktmbuf_free(rxm);   /* already freed inside */
                continue;
        }

Drop either the caller's free or the callees' frees, not both.

Warning: zxdh_set_rxtx_funcs() silently drops the
ZXDH_NET_F_MRG_RXBUF negotiation check.  The previous version
returned -1 if MRG_RXBUF was not advertised; the new version selects a
burst function unconditionally.  The multi-seg path
zxdh_recv_pkts_packed() reads header->type_hdr.num_buffers, which is
only meaningful when MRG_RXBUF is negotiated with the peer.

Warning: xstats "full", "norefill", "multicast_packets",
"broadcast_packets" (rx) and "norefill", "multicast_packets",
"broadcast_packets" (tx) are removed from the name tables.  If these
counters were never being updated, say so in the commit log.  If they
were, multicast_packets/broadcast_packets in particular are
operator-facing counters and this is a user-visible regression.

Warning: zxdh_scattered_rx() reads eth_dev->data->min_rx_buf_size,
which is populated during rx_queue_setup().  Depending on when
zxdh_set_rxtx_funcs() runs, "min_rx_buf_size - RTE_PKTMBUF_HEADROOM"
can underflow (uint16_t wraps) if min_rx_buf_size is 0 at that point.

Minor: Open-coded rte_pktmbuf_mtod().  Both the new zxdh_init_mbuf()
and the modified zxdh_recv_pkts_packed() use

        (char *)rxm->buf_addr + RTE_PKTMBUF_HEADROOM

where rte_pktmbuf_mtod(rxm, struct zxdh_net_hdr_ul *) expresses the
same intent.  data_off equals RTE_PKTMBUF_HEADROOM here because the
refill path aligns the hardware write target to
buf_iova + RTE_PKTMBUF_HEADROOM.

Minor: zxdh_init_mbuf() zeroes rxm->ol_flags, rxm->vlan_tci, and
rxm->vlan_tci_outer.  All three are already cleared by
rte_pktmbuf_reset() on alloc from the pool.

Minor: rxq_get_vq(q) is a trivial one-line macro aliasing "q->vq"
with no functional benefit.  Either drop it or apply it consistently.

Patch 3/3 (optimize Tx xmit pkts performance)
---------------------------------------------

Error: zxdh_xmit_pkts_simple() does not write txdp->id.  tx_bunch()
and tx1() write addr, len, and flags but leave id untouched.  The new
zxdh_xmit_fast_flush() reads "id = desc[used_idx].id" as the
chain-terminator for its inner do-while loop
("while (curr_id != id)").

Descriptors submitted by the simple path therefore carry stale ids:
either 0 at cold start from memzone init, or the self-index written
by a previous flush pass.  Because the flush rewrites
desc[used_idx].id = used_idx during processing, after one full warmup
cycle every desc[i].id == i and the inner do-while happens to exit
after one iteration.  But on a cold ring, or any ring whose
descriptors were left with non-self ids by a preceding append-path
burst, the inner loop keeps iterating, freeing cookies and advancing
used_idx across descriptors the backend has not marked used, until
used_idx wraps back to 0.  That corrupts vq_free_cnt and
vq_used_cons_idx accounting.

Fix: set txdp->id = avail_idx + i in tx_bunch/tx1 so the invariant is
explicit rather than relying on the flush's self-rewrite side effect.

Warning: zxdh_xmit_pkts_prepare() drops the
rte_net_intel_cksum_prepare() call.  If the driver still advertises
L4 checksum offload, pseudo-header checksum preparation becomes the
application's responsibility.  That's a user-visible contract change
and needs justification in the commit log, or should be paired with a
matching capability change.

Warning: zxdh_xmit_enqueue_append() now sets dxp->cookie = NULL for
the head slot and stores cookies per descriptor via dep[idx].cookie.
This works with the new per-descriptor free in
zxdh_xmit_fast_flush() (rte_pktmbuf_free_seg), but any residual code
path still reading vq_descx[head_id].cookie will see NULL.  Worth a
comment documenting the new invariant.

Minor: Extra initialization.  In pkt_padding(),

        struct zxdh_net_hdr_dl *hdr = NULL;

is immediately overwritten by the rte_pktmbuf_prepend() return.  In
submit_to_backend_simple(),

        struct rte_mbuf *m = NULL;

is overwritten on first use inside the loop.  Drop both initializers.

Minor: "mainpart = (nb_pkts & ((uint32_t)~N_PER_LOOP_MASK));" — the
uint32_t cast is pointless.  nb_pkts is uint16_t and N_PER_LOOP_MASK
is a small integer constant.

Minor: submit_to_backend_simple() uses "*(tx_pkts + i + j)" where
"tx_pkts[i + j]" reads more naturally and matches style elsewhere.

Minor: tx_bunch() is named to imply a variable batch but is hardcoded
to N_PER_LOOP iterations for single-segment packets only.  A one-line
comment noting that the simple path handles single-segment only
(selected when TX_OFFLOAD_MULTI_SEGS is off) would help.
```

next prev parent reply	other threads:[~2026-04-23 19:24 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-03-26  2:28 [PATCH v1] net/zxdh: optimize Rx/Tx path performance Junlong Wang
2026-03-26  3:27 ` Stephen Hemminger
2026-04-06  4:26 ` Stephen Hemminger
2026-04-23  1:18 ` [PATCH v2 0/3] " Junlong Wang
2026-04-23  1:18   ` [PATCH v2 1/3] net/zxdh: optimize queue structure to improve performance Junlong Wang
2026-04-23 18:57     ` Stephen Hemminger
2026-04-23  1:18   ` [PATCH v2 2/3] net/zxdh: optimize Rx recv pkts performance Junlong Wang
2026-04-23 18:54     ` Stephen Hemminger
2026-04-23 23:39     ` Stephen Hemminger
2026-04-23  1:18   ` [PATCH v2 3/3] net/zxdh: optimize Tx xmit " Junlong Wang
2026-04-23 19:23   ` Stephen Hemminger [this message]
2026-05-09  6:29   ` [PATCH v3 0/3] net/zxdh: optimize Rx/Tx path performance Junlong Wang
2026-05-09  6:29     ` [PATCH v3 1/3] net/zxdh: optimize queue structure to improve performance Junlong Wang
2026-05-18  2:20       ` Stephen Hemminger
2026-05-09  6:29     ` [PATCH v3 2/3] net/zxdh: optimize Rx recv pkts performance Junlong Wang
2026-05-09  6:29     ` [PATCH v3 3/3] net/zxdh: optimize Tx xmit " Junlong Wang
2026-05-18  2:22       ` Stephen Hemminger
2026-06-06  6:32     ` [PATCH v4 0/4] net/zxdh: optimize Rx/Tx path performance Junlong Wang
2026-06-06  6:32       ` [PATCH v4 1/4] net/zxdh: optimize queue structure to improve performance Junlong Wang
2026-06-06  6:32       ` [PATCH v4 2/4] net/zxdh: optimize Rx recv pkts performance Junlong Wang
2026-06-06  6:32       ` [PATCH v4 3/4] net/zxdh: optimize Tx xmit " Junlong Wang
2026-06-06  6:32       ` [PATCH v4 4/4] net/zxdh: fix queue enable intr issues Junlong Wang
2026-06-07 18:00       ` [PATCH v4 0/4] net/zxdh: optimize Rx/Tx path performance Stephen Hemminger
2026-06-15  1:19       ` [PATCH v5 " Junlong Wang
2026-06-15  1:19         ` [PATCH v5 1/4] net/zxdh: fix queue enable intr issues Junlong Wang
2026-06-15  1:19         ` [PATCH v5 2/4] net/zxdh: optimize queue structure to improve performance Junlong Wang
2026-06-15  1:19         ` [PATCH v5 3/4] net/zxdh: optimize Rx recv pkts performance Junlong Wang
2026-06-15  1:19         ` [PATCH v5 4/4] net/zxdh: optimize Tx xmit " Junlong Wang
2026-06-15 18:38           ` Stephen Hemminger

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260423122357.3a91bccf@phoenix.local \
    --to=stephen@networkplumber.org \
    --cc=dev@dpdk.org \
    --cc=wang.junlong1@zte.com.cn \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.