All of lore.kernel.org
 help / color / mirror / Atom feed
From: Stephen Hemminger <stephen@networkplumber.org>
To: Dawid Wesierski <dawid.wesierski@intel.com>
Cc: dev@dpdk.org, thomas@monjalon.net, david.marchand@redhat.com,
	Marek Kasiewicz <marek.kasiewicz@intel.com>
Subject: Re: [PATCH v2 0/7] Intel network drivers enhancements
Date: Thu, 18 Jun 2026 08:46:48 -0700	[thread overview]
Message-ID: <20260618084648.7c43eb59@phoenix.local> (raw)
In-Reply-To: <20260618144442.312844-1-dawid.wesierski@intel.com>

On Thu, 18 Jun 2026 10:44:35 -0400
Dawid Wesierski <dawid.wesierski@intel.com> wrote:

> From: Marek Kasiewicz <marek.kasiewicz@intel.com>
> 
> This series introduces several improvements to Intel iavf and ice
> drivers, including a new ethdev API for header split mbuf callbacks,
> increased ring descriptors, and improved PTP timestamping.
> 
> Marek Kasiewicz (7):
>   ethdev: add header split mbuf callback API
>   net/iavf: increase max ring descriptors to hardware limit
>   net/iavf: allow runtime queue rate limit configuration
>   net/ice/base: reduce default scheduler burst size
>   net/ice: timestamp all received packets when PTP is enabled
>   net/iavf: disable runtime queue setup capability
>   net/intel: support header split mbuf callback
> 
>  drivers/net/intel/common/rx.h         |  2 +
>  drivers/net/intel/iavf/iavf_ethdev.c  |  3 --
>  drivers/net/intel/iavf/iavf_rxtx.h    |  2 +-
>  drivers/net/intel/iavf/iavf_tm.c      | 11 ++--
>  drivers/net/intel/ice/base/ice_type.h |  2 +-
>  drivers/net/intel/ice/ice_ethdev.c    |  1 +
>  drivers/net/intel/ice/ice_rxtx.c      | 72 ++++++++++++++++++++++++---
>  drivers/net/intel/ice/ice_rxtx.h      |  2 +
>  lib/ethdev/ethdev_driver.h            | 15 +++++++
>  lib/ethdev/rte_ethdev.c               | 51 ++++++++++++++++++++++
>  lib/ethdev/rte_ethdev.h               |  7 +++
>  11 files changed, 153 insertions(+), 15 deletions(-)
> 

AI review found lots of problems with the implementation as well.

Series composition

This v2 series bundles seven patches under one cover letter, but only
1/7 (ethdev API) and 7/7 (net/intel) implement the header-split mbuf
callback. Patches 2/7 (iavf max ring desc), 3/7 (iavf queue rate
limit), 4/7 (ice base burst size), 5/7 (ice PTP timestamp), and 6/7
(iavf runtime queue setup) are unrelated driver changes. Please split
these into a separate series so each can be reviewed and merged on its
own merits; mixing them obscures the dependency and complicates bisect.


[PATCH v2 1/7] ethdev: add header split mbuf callback API

Warning: struct rte_eth_hdrs_mbuf carries only buf_addr and buf_iova,
no length. The callback therefore cannot tell the PMD how large the
application buffer is. In buffer-split mode the NIC DMAs up to the
queue's configured payload buffer size; if the application buffer is
smaller, the hardware writes past it. Add a length field and have the
PMD (and/or ethdev) validate it against the configured payload size.

Warning: the buffer lifetime and ownership are undocumented. The
Doxygen does not state who owns the buffer after the payload mbuf is
freed, whether the address is consumed as-is or with headroom applied
(the 7/7 implementation adds RTE_PKTMBUF_HEADROOM to the IOVA -- see
below), or that the callback must return a distinct buffer per call.
These are contract details an application cannot guess.

Warning: new experimental API with no testpmd hook and no functional
test in app/test. Both are required for new ethdev API.

Warning: new experimental public API but no release notes entry
(doc/guides/rel_notes/release_26_07.rst). The series touches only the
three code files.

Info: rte_eth_hdrs_set_mbuf_callback() validates port_id and the
hdrs_mbuf_set_cb hook, but passes rx_queue_id to the PMD without
checking it against dev->data->nb_rx_queues. Sibling per-queue ethdev
calls (e.g. rte_eth_rx_queue_setup) validate the queue id at the
ethdev layer. The ICE PMD does check it in 7/7, but validating in the
wrapper would be consistent and would protect future PMDs that forget.


[PATCH v2 7/7] net/intel: support header split mbuf callback

Error: overwriting buf_addr/buf_iova on a pool mbuf corrupts the
payload mempool. At all three sites (ice_alloc_rx_queue_mbufs,
ice_rx_alloc_bufs, ice_recv_pkts) the payload mbuf is allocated from
rxq->rxseg[1].mp and then has its buf_addr/buf_iova rewritten to point
at application memory:

	if (ret >= 0) {
		mbuf_pay->buf_addr = hdrs_mbuf.buf_addr;
		mbuf_pay->buf_iova = hdrs_mbuf.buf_iova;
	}

DPDK never restores buf_addr/buf_iova on free. When these mbufs are
returned to rxseg[1].mp (via rte_pktmbuf_free of the received chain,
or _ice_rx_queue_release_mbufs on queue stop), the pool objects retain
the foreign buffer pointer, and buf_len still describes the original
mempool buffer. The next rte_mbuf_raw_alloc()/raw_alloc_bulk() from
that pool hands back an mbuf pointing at application memory. The
correct mechanism for pointing an mbuf at non-pool memory is
rte_pktmbuf_attach_extbuf() (sets buf_addr/buf_iova/buf_len, the
EXTERNAL flag, and a free callback), or a dedicated external-buffer
mempool. As written the payload pool is progressively poisoned.

Error: the callback's failure return is swallowed. On ret < 0 the code
skips the address update but still programs the descriptor from the
mbuf's current buf_iova:

	rxdp->read.pkt_addr =
		rte_cpu_to_le_64(rte_mbuf_data_iova_default(nmb_pay));

Because of the pool corruption above, a recycled mbuf's buf_iova may
point at a buffer the application already reclaimed, so a failed
callback silently arms the NIC to DMA into stale/foreign memory. A
callback failure should abort the refill (or fall back to a known-good
pool buffer), not fall through. Separately, the documented contract is
"0 on success, negative on failure"; the test should be ret == 0, not
ret >= 0.

Warning: RTE_PKTMBUF_HEADROOM is silently applied to the supplied
IOVA. The descriptor is programmed with rte_mbuf_data_iova_default(),
i.e. buf_iova + data_off (128), but the API documents buf_iova as "the
IOVA of the payload buffer". The NIC therefore writes 128 bytes into
the application buffer, and the tail of a buffer sized to the payload
overruns by the headroom. Either document that the headroom offset is
applied, or program pkt_addr from the raw buf_iova for callback-
provided buffers.


  parent reply	other threads:[~2026-06-18 15:47 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-06-08 16:40 [PATCH 0/7] intel network and pcapng updates Dawid Wesierski
2026-06-08 16:40 ` [PATCH 1/7] net/iavf: increase max ring descriptors to hardware limit Dawid Wesierski
2026-06-08 16:40 ` [PATCH 2/7] net/iavf: allow runtime queue rate limit configuration Dawid Wesierski
2026-06-08 16:40 ` [PATCH 3/7] net/ice/base: reduce default scheduler burst size Dawid Wesierski
2026-06-08 16:40 ` [PATCH 4/7] net/ice: timestamp all received packets when PTP is enabled Dawid Wesierski
2026-06-08 16:40 ` [PATCH 5/7] net/iavf: disable runtime queue setup capability Dawid Wesierski
2026-06-08 16:40 ` [PATCH 6/7] pcapng: add user-supplied timestamp support Dawid Wesierski
2026-06-08 17:09   ` Stephen Hemminger
2026-06-08 16:40 ` [PATCH 7/7] net/ice: add header split mbuf callback support Dawid Wesierski
2026-06-08 16:59 ` [PATCH 0/7] intel network and pcapng updates Thomas Monjalon
2026-06-18 14:44 ` [PATCH v3 1/1] pcapng: add user-supplied timestamp support Dawid Wesierski
2026-06-18 15:20   ` Stephen Hemminger
2026-06-18 14:44 ` [PATCH v2 0/7] Intel network drivers enhancements Dawid Wesierski
2026-06-18 14:44   ` [PATCH v2 1/7] ethdev: add header split mbuf callback API Dawid Wesierski
2026-06-18 16:26     ` Thomas Monjalon
2026-06-18 14:44   ` [PATCH v2 2/7] net/iavf: increase max ring descriptors to hardware limit Dawid Wesierski
2026-06-18 14:44   ` [PATCH v2 3/7] net/iavf: allow runtime queue rate limit configuration Dawid Wesierski
2026-06-18 14:44   ` [PATCH v2 4/7] net/ice/base: reduce default scheduler burst size Dawid Wesierski
2026-06-18 14:44   ` [PATCH v2 5/7] net/ice: timestamp all received packets when PTP is enabled Dawid Wesierski
2026-06-18 14:44   ` [PATCH v2 6/7] net/iavf: disable runtime queue setup capability Dawid Wesierski
2026-06-18 14:44   ` [PATCH v2 7/7] net/intel: support header split mbuf callback Dawid Wesierski
2026-06-18 15:45   ` [PATCH v2 0/7] Intel network drivers enhancements Stephen Hemminger
2026-06-18 15:46   ` Stephen Hemminger [this message]
  -- strict thread matches above, loose matches on Subject: below --
2026-06-18 14:38 Dawid Wesierski

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260618084648.7c43eb59@phoenix.local \
    --to=stephen@networkplumber.org \
    --cc=david.marchand@redhat.com \
    --cc=dawid.wesierski@intel.com \
    --cc=dev@dpdk.org \
    --cc=marek.kasiewicz@intel.com \
    --cc=thomas@monjalon.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.