DPDK-dev Archive on lore.kernel.org
 help / color / mirror / Atom feed
From: Stephen Hemminger <stephen@networkplumber.org>
To: Dawid Wesierski <dawid.wesierski@intel.com>
Cc: dev@dpdk.org, thomas@monjalon.net, david.marchand@redhat.com,
	Marek Kasiewicz <marek.kasiewicz@intel.com>
Subject: Re: [PATCH v2 0/7] Intel network drivers enhancements
Date: Thu, 18 Jun 2026 08:46:48 -0700	[thread overview]
Message-ID: <20260618084648.7c43eb59@phoenix.local> (raw)
In-Reply-To: <20260618144442.312844-1-dawid.wesierski@intel.com>

On Thu, 18 Jun 2026 10:44:35 -0400
Dawid Wesierski <dawid.wesierski@intel.com> wrote:

> From: Marek Kasiewicz <marek.kasiewicz@intel.com>
> 
> This series introduces several improvements to Intel iavf and ice
> drivers, including a new ethdev API for header split mbuf callbacks,
> increased ring descriptors, and improved PTP timestamping.
> 
> Marek Kasiewicz (7):
>   ethdev: add header split mbuf callback API
>   net/iavf: increase max ring descriptors to hardware limit
>   net/iavf: allow runtime queue rate limit configuration
>   net/ice/base: reduce default scheduler burst size
>   net/ice: timestamp all received packets when PTP is enabled
>   net/iavf: disable runtime queue setup capability
>   net/intel: support header split mbuf callback
> 
>  drivers/net/intel/common/rx.h         |  2 +
>  drivers/net/intel/iavf/iavf_ethdev.c  |  3 --
>  drivers/net/intel/iavf/iavf_rxtx.h    |  2 +-
>  drivers/net/intel/iavf/iavf_tm.c      | 11 ++--
>  drivers/net/intel/ice/base/ice_type.h |  2 +-
>  drivers/net/intel/ice/ice_ethdev.c    |  1 +
>  drivers/net/intel/ice/ice_rxtx.c      | 72 ++++++++++++++++++++++++---
>  drivers/net/intel/ice/ice_rxtx.h      |  2 +
>  lib/ethdev/ethdev_driver.h            | 15 +++++++
>  lib/ethdev/rte_ethdev.c               | 51 ++++++++++++++++++++++
>  lib/ethdev/rte_ethdev.h               |  7 +++
>  11 files changed, 153 insertions(+), 15 deletions(-)
> 

AI review found lots of problems with the implementation as well.

Series composition

This v2 series bundles seven patches under one cover letter, but only
1/7 (ethdev API) and 7/7 (net/intel) implement the header-split mbuf
callback. Patches 2/7 (iavf max ring desc), 3/7 (iavf queue rate
limit), 4/7 (ice base burst size), 5/7 (ice PTP timestamp), and 6/7
(iavf runtime queue setup) are unrelated driver changes. Please split
these into a separate series so each can be reviewed and merged on its
own merits; mixing them obscures the dependency and complicates bisect.


[PATCH v2 1/7] ethdev: add header split mbuf callback API

Warning: struct rte_eth_hdrs_mbuf carries only buf_addr and buf_iova,
no length. The callback therefore cannot tell the PMD how large the
application buffer is. In buffer-split mode the NIC DMAs up to the
queue's configured payload buffer size; if the application buffer is
smaller, the hardware writes past it. Add a length field and have the
PMD (and/or ethdev) validate it against the configured payload size.

Warning: the buffer lifetime and ownership are undocumented. The
Doxygen does not state who owns the buffer after the payload mbuf is
freed, whether the address is consumed as-is or with headroom applied
(the 7/7 implementation adds RTE_PKTMBUF_HEADROOM to the IOVA -- see
below), or that the callback must return a distinct buffer per call.
These are contract details an application cannot guess.

Warning: new experimental API with no testpmd hook and no functional
test in app/test. Both are required for new ethdev API.

Warning: new experimental public API but no release notes entry
(doc/guides/rel_notes/release_26_07.rst). The series touches only the
three code files.

Info: rte_eth_hdrs_set_mbuf_callback() validates port_id and the
hdrs_mbuf_set_cb hook, but passes rx_queue_id to the PMD without
checking it against dev->data->nb_rx_queues. Sibling per-queue ethdev
calls (e.g. rte_eth_rx_queue_setup) validate the queue id at the
ethdev layer. The ICE PMD does check it in 7/7, but validating in the
wrapper would be consistent and would protect future PMDs that forget.


[PATCH v2 7/7] net/intel: support header split mbuf callback

Error: overwriting buf_addr/buf_iova on a pool mbuf corrupts the
payload mempool. At all three sites (ice_alloc_rx_queue_mbufs,
ice_rx_alloc_bufs, ice_recv_pkts) the payload mbuf is allocated from
rxq->rxseg[1].mp and then has its buf_addr/buf_iova rewritten to point
at application memory:

	if (ret >= 0) {
		mbuf_pay->buf_addr = hdrs_mbuf.buf_addr;
		mbuf_pay->buf_iova = hdrs_mbuf.buf_iova;
	}

DPDK never restores buf_addr/buf_iova on free. When these mbufs are
returned to rxseg[1].mp (via rte_pktmbuf_free of the received chain,
or _ice_rx_queue_release_mbufs on queue stop), the pool objects retain
the foreign buffer pointer, and buf_len still describes the original
mempool buffer. The next rte_mbuf_raw_alloc()/raw_alloc_bulk() from
that pool hands back an mbuf pointing at application memory. The
correct mechanism for pointing an mbuf at non-pool memory is
rte_pktmbuf_attach_extbuf() (sets buf_addr/buf_iova/buf_len, the
EXTERNAL flag, and a free callback), or a dedicated external-buffer
mempool. As written the payload pool is progressively poisoned.

Error: the callback's failure return is swallowed. On ret < 0 the code
skips the address update but still programs the descriptor from the
mbuf's current buf_iova:

	rxdp->read.pkt_addr =
		rte_cpu_to_le_64(rte_mbuf_data_iova_default(nmb_pay));

Because of the pool corruption above, a recycled mbuf's buf_iova may
point at a buffer the application already reclaimed, so a failed
callback silently arms the NIC to DMA into stale/foreign memory. A
callback failure should abort the refill (or fall back to a known-good
pool buffer), not fall through. Separately, the documented contract is
"0 on success, negative on failure"; the test should be ret == 0, not
ret >= 0.

Warning: RTE_PKTMBUF_HEADROOM is silently applied to the supplied
IOVA. The descriptor is programmed with rte_mbuf_data_iova_default(),
i.e. buf_iova + data_off (128), but the API documents buf_iova as "the
IOVA of the payload buffer". The NIC therefore writes 128 bytes into
the application buffer, and the tail of a buffer sized to the payload
overruns by the headroom. Either document that the headroom offset is
applied, or program pkt_addr from the raw buf_iova for callback-
provided buffers.


  parent reply	other threads:[~2026-06-18 15:47 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-06-08 16:40 [PATCH 0/7] intel network and pcapng updates Dawid Wesierski
2026-06-08 16:40 ` [PATCH 1/7] net/iavf: increase max ring descriptors to hardware limit Dawid Wesierski
2026-06-08 16:40 ` [PATCH 2/7] net/iavf: allow runtime queue rate limit configuration Dawid Wesierski
2026-06-08 16:40 ` [PATCH 3/7] net/ice/base: reduce default scheduler burst size Dawid Wesierski
2026-06-08 16:40 ` [PATCH 4/7] net/ice: timestamp all received packets when PTP is enabled Dawid Wesierski
2026-06-08 16:40 ` [PATCH 5/7] net/iavf: disable runtime queue setup capability Dawid Wesierski
2026-06-08 16:40 ` [PATCH 6/7] pcapng: add user-supplied timestamp support Dawid Wesierski
2026-06-08 17:09   ` Stephen Hemminger
2026-06-08 16:40 ` [PATCH 7/7] net/ice: add header split mbuf callback support Dawid Wesierski
2026-06-08 16:59 ` [PATCH 0/7] intel network and pcapng updates Thomas Monjalon
2026-06-18 14:44 ` [PATCH v3 1/1] pcapng: add user-supplied timestamp support Dawid Wesierski
2026-06-18 15:20   ` Stephen Hemminger
2026-06-18 14:44 ` [PATCH v2 0/7] Intel network drivers enhancements Dawid Wesierski
2026-06-18 14:44   ` [PATCH v2 1/7] ethdev: add header split mbuf callback API Dawid Wesierski
2026-06-18 16:26     ` Thomas Monjalon
2026-06-18 14:44   ` [PATCH v2 2/7] net/iavf: increase max ring descriptors to hardware limit Dawid Wesierski
2026-06-18 14:44   ` [PATCH v2 3/7] net/iavf: allow runtime queue rate limit configuration Dawid Wesierski
2026-06-18 14:44   ` [PATCH v2 4/7] net/ice/base: reduce default scheduler burst size Dawid Wesierski
2026-06-18 14:44   ` [PATCH v2 5/7] net/ice: timestamp all received packets when PTP is enabled Dawid Wesierski
2026-06-18 14:44   ` [PATCH v2 6/7] net/iavf: disable runtime queue setup capability Dawid Wesierski
2026-06-18 14:44   ` [PATCH v2 7/7] net/intel: support header split mbuf callback Dawid Wesierski
2026-06-18 15:45   ` [PATCH v2 0/7] Intel network drivers enhancements Stephen Hemminger
2026-06-18 15:46   ` Stephen Hemminger [this message]
  -- strict thread matches above, loose matches on Subject: below --
2026-06-18 14:38 Dawid Wesierski

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260618084648.7c43eb59@phoenix.local \
    --to=stephen@networkplumber.org \
    --cc=david.marchand@redhat.com \
    --cc=dawid.wesierski@intel.com \
    --cc=dev@dpdk.org \
    --cc=marek.kasiewicz@intel.com \
    --cc=thomas@monjalon.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox