DPDK-dev Archive on lore.kernel.org
From: Stephen Hemminger <stephen@networkplumber.org>
To: Junlong Wang <wang.junlong1@zte.com.cn>
Cc: dev@dpdk.org
Subject: Re: [PATCH v2 0/3] net/zxdh: optimize Rx/Tx path performance
Date: Thu, 23 Apr 2026 12:23:57 -0700	[thread overview]
Message-ID: <20260423122357.3a91bccf@phoenix.local> (raw)
In-Reply-To: <20260423011820.2426203-1-wang.junlong1@zte.com.cn>

On Thu, 23 Apr 2026 09:18:15 +0800
Junlong Wang <wang.junlong1@zte.com.cn> wrote:

> v2:
>   - zxdh_rxtx.c, pkt_padding(): modified the return value of pkt_padding();
>   - zxdh_rxtx.c, zxdh_recv_single_pkts(): when zxdh_init_mbuf() fails, the
>     loop now frees the mbuf and does "continue";
>   - zxdh_rxtx.c, refill_desc_unwrap(): added rte_io_wmb() before writing flags
>     in refill_que_descs();
>   - zxdh_queue.h: removed the unnecessary zxdh_queue_enable_intr() function;
>   - zxdh_ethdev.c, zxdh_init_queue(): changed the hdr_mz NULL check logic;
> 
>   - zxdh_rxtx.c, zxdh_xmit_pkts_simple(), zxdh_recv_single_pkts(): added
>     stats.bytes counting;
>   - zxdh_rxtx.c, zxdh_init_mbuf(): removed rte_pktmbuf_dump(stdout, rxm, 40);
>   - zxdh_ethdev.c, zxdh_dev_free_mbufs(): use rte_pktmbuf_free() to free mbufs;
>   - split into separate patches: structure reorganization with sw_ring
>     removal, Rx recv optimization, and Tx xmit optimization;
> 
> v1:
>   This patch optimizes the ZXDH PMD's receive and transmit path for better
>   performance through several improvements:
> 
> - Add simple TX/RX burst functions (zxdh_xmit_pkts_simple and
>   zxdh_recv_single_pkts) for single-segment packet scenarios.
> - Remove RX software ring (sw_ring) to reduce memory allocation and
>   copy.
> - Optimize descriptor management with prefetching and simplified
>   cleanup.
> - Reorganize structure fields for better cache locality.
> 
>   These changes reduce CPU cycles and memory bandwidth consumption,
>   resulting in improved packet processing throughput.
> 
> Junlong Wang (3):
>   net/zxdh: optimize queue structure to improve performance
>   net/zxdh: optimize Rx recv pkts performance
>   net/zxdh: optimize Tx xmit pkts performance
> 
>  drivers/net/zxdh/zxdh_ethdev.c     |  88 ++---
>  drivers/net/zxdh/zxdh_ethdev_ops.c |  24 +-
>  drivers/net/zxdh/zxdh_ethdev_ops.h |   4 +
>  drivers/net/zxdh/zxdh_pci.c        |   2 +-
>  drivers/net/zxdh/zxdh_queue.c      |  31 +-
>  drivers/net/zxdh/zxdh_queue.h      | 125 +++----
>  drivers/net/zxdh/zxdh_rxtx.c       | 520 +++++++++++++++++++++--------
>  drivers/net/zxdh/zxdh_rxtx.h       |  27 +-
>  8 files changed, 527 insertions(+), 294 deletions(-)
> 

Here is the full AI review of this patchset
```
Review of v2 series: net/zxdh queue/Rx/Tx optimization

Overall: needs a v3. Findings below, organized by patch.


Patch 1/3 (optimize queue structure)
------------------------------------

Warning: zxdh_queue_notify() in zxdh_queue.h now hardcodes the bit-31
avail-wrap into notify_data unconditionally, removing the
zxdh_pci_with_feature(hw, ZXDH_F_RING_PACKED) gate that the previous
out-of-line version in zxdh_pci.c had.  The series also adds a
vq_split member to the virtqueue union, which suggests split-ring
support is planned.  Once that exists, this helper will corrupt notify
data for split rings.  Either reinstate the feature gate or keep the
dispatch going through VTPCI_OPS()->notify_queue().
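
A standalone model of the gate the old helper had.  The struct, the
feature-bit number, and the helper names below are illustrative
stand-ins, not the driver's definitions; the notify_data layout
follows the virtio spec for packed rings (vqn in bits 0-15, next
offset in bits 16-30, wrap counter in bit 31):

```c
#include <assert.h>
#include <stdint.h>

#define F_RING_PACKED 34u	/* stand-in feature bit, not the real value */

struct hw_model { uint64_t guest_features; };

static int with_feature(const struct hw_model *hw, unsigned int bit)
{
	return (hw->guest_features & (1ULL << bit)) != 0;
}

/*
 * The wrap bit is meaningful only when the packed-ring feature was
 * negotiated; a split ring must never have it set in notify data.
 */
static uint32_t notify_data(const struct hw_model *hw, uint16_t vqn,
			    uint16_t next_off, int wrap)
{
	uint32_t data = ((uint32_t)(next_off & 0x7fff) << 16) | vqn;

	if (with_feature(hw, F_RING_PACKED) && wrap)
		data |= 1u << 31;
	return data;
}
```

With the unconditional version, the split-ring case would get bit 31
set anyway, which is the corruption described above.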

Warning: rx_queue_intr_enable/rx_queue_intr_disable dev_ops and the
zxdh_queue_enable_intr() helper are removed.  The commit log talks
about cache locality and sw_ring removal but not this.  Please split
it into its own patch with a justification, or at minimum call it out
in the commit message.

Minor: fail_q_alloc now does "if (hdr_mz) rte_memzone_free(hdr_mz);".
rte_memzone_free() accepts NULL; the guard is unnecessary.

Minor: The new "if (hdr_mz == NULL)" check inside the VTNET_TQ branch
of zxdh_init_queue() is unreachable.  hdr_mz was already validated
earlier in the function.

Minor: Doxygen close "**/" used in several places where "*/" is the
correct terminator.


Patch 2/3 (optimize Rx recv pkts performance)
---------------------------------------------

Error: Double-free in zxdh_recv_single_pkts().  Both error paths
inside zxdh_init_mbuf() already call rte_pktmbuf_free(rxm), but the
caller also frees rxm on return < 0:

        if (unlikely(zxdh_init_mbuf(rxm, len, hw, &vq->rxq) < 0)) {
                rte_pktmbuf_free(rxm);   /* already freed inside */
                continue;
        }

Drop one of the two (the caller's free or the frees inside the
callee) so the mbuf is freed exactly once.
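
A minimal standalone sketch of the single-owner convention, using
stand-in types (the real code uses rte_mbuf and rte_pktmbuf_free):

```c
#include <assert.h>

struct mbuf_model { int freed; };

static void free_model(struct mbuf_model *m) { m->freed++; }

/* Callee owns the mbuf on failure: every error path frees exactly once. */
static int init_mbuf_model(struct mbuf_model *m, int len)
{
	if (len <= 0) {		/* stand-in for the real validation */
		free_model(m);
		return -1;
	}
	return 0;
}

/* Caller does not free again on failure; it only skips the packet. */
static int recv_one(struct mbuf_model *m, int len)
{
	if (init_mbuf_model(m, len) < 0)
		return -1;	/* no second free here */
	return 0;
}
```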

Warning: zxdh_set_rxtx_funcs() silently drops the
ZXDH_NET_F_MRG_RXBUF negotiation check.  The previous version
returned -1 if MRG_RXBUF was not advertised; the new version selects a
burst function unconditionally.  The multi-seg path
zxdh_recv_pkts_packed() reads header->type_hdr.num_buffers, which is
only meaningful when MRG_RXBUF is negotiated with the peer.

Warning: xstats "full", "norefill", "multicast_packets",
"broadcast_packets" (rx) and "norefill", "multicast_packets",
"broadcast_packets" (tx) are removed from the name tables.  If these
counters were never being updated, say so in the commit log.  If they
were, multicast_packets/broadcast_packets in particular are
operator-facing counters and this is a user-visible regression.

Warning: zxdh_scattered_rx() reads eth_dev->data->min_rx_buf_size,
which is populated during rx_queue_setup().  Depending on when
zxdh_set_rxtx_funcs() runs, "min_rx_buf_size - RTE_PKTMBUF_HEADROOM"
can underflow (uint16_t wraps) if min_rx_buf_size is 0 at that point.
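
The wrap is easy to demonstrate in isolation; 128 below is the
default RTE_PKTMBUF_HEADROOM, and the subtraction mirrors the
driver's expression on the uint16_t field:

```c
#include <assert.h>
#include <stdint.h>

#define HEADROOM 128	/* default RTE_PKTMBUF_HEADROOM */

/* Mirrors "min_rx_buf_size - RTE_PKTMBUF_HEADROOM": when the field
 * has not been populated yet (still 0), the result wraps to 65408
 * instead of going negative, so a "too small" buffer looks huge. */
static uint16_t usable_room(uint16_t min_rx_buf_size)
{
	return (uint16_t)(min_rx_buf_size - HEADROOM);
}
```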

Minor: Open-coded rte_pktmbuf_mtod().  Both the new zxdh_init_mbuf()
and the modified zxdh_recv_pkts_packed() use

        (char *)rxm->buf_addr + RTE_PKTMBUF_HEADROOM

where rte_pktmbuf_mtod(rxm, struct zxdh_net_hdr_ul *) expresses the
same intent.  data_off equals RTE_PKTMBUF_HEADROOM here because the
refill path aligns the hardware write target to
buf_iova + RTE_PKTMBUF_HEADROOM.

Minor: zxdh_init_mbuf() zeroes rxm->ol_flags, rxm->vlan_tci, and
rxm->vlan_tci_outer.  All three are already cleared by
rte_pktmbuf_reset() on alloc from the pool.

Minor: rxq_get_vq(q) is a trivial one-line macro aliasing "q->vq"
with no functional benefit.  Either drop it or apply it consistently.


Patch 3/3 (optimize Tx xmit pkts performance)
---------------------------------------------

Error: zxdh_xmit_pkts_simple() does not write txdp->id.  tx_bunch()
and tx1() write addr, len, and flags but leave id untouched.  The new
zxdh_xmit_fast_flush() reads "id = desc[used_idx].id" as the
chain-terminator for its inner do-while loop
("while (curr_id != id)").

Descriptors submitted by the simple path therefore carry stale ids:
either 0 at cold start from memzone init, or the self-index written
by a previous flush pass.  Because the flush rewrites
desc[used_idx].id = used_idx during processing, after one full warmup
cycle every desc[i].id == i and the inner do-while happens to exit
after one iteration.  But on a cold ring, or any ring whose
descriptors were left with non-self ids by a preceding append-path
burst, the inner loop keeps iterating, freeing cookies and advancing
used_idx across descriptors the backend has not marked used, until
used_idx wraps back to 0.  That corrupts vq_free_cnt and
vq_used_cons_idx accounting.

Fix: set txdp->id = avail_idx + i in tx_bunch/tx1 so the invariant is
explicit rather than relying on the flush's self-rewrite side effect.
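
A toy ring model of the failure mode, with the fix applied in
tx1_model().  The ring size, field names, and loop shape are
simplified; only the id/termination logic is modeled:

```c
#include <assert.h>
#include <stdint.h>

#define RING 8

struct desc_model { uint16_t id; };

/* The fix: the simple TX path stamps each descriptor with its own
 * index, making the flush's one-step termination invariant explicit. */
static void tx1_model(struct desc_model *d, uint16_t avail_idx)
{
	d[avail_idx % RING].id = avail_idx % RING;
}

/* Simplified inner loop of the flush: walk forward until the slot
 * index catches up with the stored id; return how many slots are
 * consumed for a single used descriptor. */
static int flush_steps(const struct desc_model *d, uint16_t used_idx)
{
	uint16_t id = d[used_idx % RING].id;
	uint16_t curr_id;
	int steps = 0;

	do {
		curr_id = used_idx % RING;
		used_idx++;
		steps++;
	} while (curr_id != id && steps < RING);
	return steps;
}
```

On a cold ring (all ids zero) a flush starting at slot 3 keeps
walking until the index wraps around to 0, consuming six slots for
one completion; after tx1_model() has stamped the slot, it consumes
exactly one.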

Warning: zxdh_xmit_pkts_prepare() drops the
rte_net_intel_cksum_prepare() call.  If the driver still advertises
L4 checksum offload, pseudo-header checksum preparation becomes the
application's responsibility.  That's a user-visible contract change
and needs justification in the commit log, or should be paired with a
matching capability change.
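
For context, the preparation in question seeds the L4 checksum field
with the IPv4 pseudo-header sum before handing the packet to hardware
offload.  A standalone sketch of just that arithmetic (in DPDK,
rte_net_intel_cksum_prepare also handles the mbuf parsing):

```c
#include <assert.h>
#include <stdint.h>

/* Partial one's-complement sum of the IPv4 pseudo-header (src, dst,
 * protocol, L4 length), folded to 16 bits and NOT complemented:
 * hardware offload finishes the checksum over the L4 payload. */
static uint16_t phdr_cksum(uint32_t src_ip, uint32_t dst_ip,
			   uint8_t proto, uint16_t l4_len)
{
	uint32_t sum = 0;

	sum += src_ip >> 16;
	sum += src_ip & 0xffff;
	sum += dst_ip >> 16;
	sum += dst_ip & 0xffff;
	sum += proto;
	sum += l4_len;
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}
```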

Warning: zxdh_xmit_enqueue_append() now sets dxp->cookie = NULL for
the head slot and stores cookies per descriptor via dep[idx].cookie.
This works with the new per-descriptor free in
zxdh_xmit_fast_flush() (rte_pktmbuf_free_seg), but any residual code
path still reading vq_descx[head_id].cookie will see NULL.  Worth a
comment documenting the new invariant.

Minor: Extra initialization.  In pkt_padding(),

        struct zxdh_net_hdr_dl *hdr = NULL;

is immediately overwritten by the rte_pktmbuf_prepend() return.  In
submit_to_backend_simple(),

        struct rte_mbuf *m = NULL;

is overwritten on first use inside the loop.  Drop both initializers.

Minor: "mainpart = (nb_pkts & ((uint32_t)~N_PER_LOOP_MASK));" — the
uint32_t cast is pointless.  nb_pkts is uint16_t and N_PER_LOOP_MASK
is a small integer constant.

Minor: submit_to_backend_simple() uses "*(tx_pkts + i + j)" where
"tx_pkts[i + j]" reads more naturally and matches style elsewhere.

Minor: tx_bunch() is named to imply a variable batch but is hardcoded
to N_PER_LOOP iterations for single-segment packets only.  A one-line
comment noting that the simple path handles single-segment only
(selected when TX_OFFLOAD_MULTI_SEGS is off) would help.
```


Thread overview: 15+ messages
2026-03-26  2:28 [PATCH v1] net/zxdh: optimize Rx/Tx path performance Junlong Wang
2026-03-26  3:27 ` Stephen Hemminger
2026-04-06  4:26 ` Stephen Hemminger
2026-04-23  1:18 ` [PATCH v2 0/3] " Junlong Wang
2026-04-23  1:18   ` [PATCH v2 1/3] net/zxdh: optimize queue structure to improve performance Junlong Wang
2026-04-23 18:57     ` Stephen Hemminger
2026-04-23  1:18   ` [PATCH v2 2/3] net/zxdh: optimize Rx recv pkts performance Junlong Wang
2026-04-23 18:54     ` Stephen Hemminger
2026-04-23 23:39     ` Stephen Hemminger
2026-04-23  1:18   ` [PATCH v2 3/3] net/zxdh: optimize Tx xmit " Junlong Wang
2026-04-23 19:23   ` Stephen Hemminger [this message]
2026-05-09  6:29   ` [PATCH v3 0/3] net/zxdh: optimize Rx/Tx path performance Junlong Wang
2026-05-09  6:29     ` [PATCH v3 1/3] net/zxdh: optimize queue structure to improve performance Junlong Wang
2026-05-09  6:29     ` [PATCH v3 2/3] net/zxdh: optimize Rx recv pkts performance Junlong Wang
2026-05-09  6:29     ` [PATCH v3 3/3] net/zxdh: optimize Tx xmit " Junlong Wang
