DPDK-dev Archive on lore.kernel.org
 help / color / mirror / Atom feed
From: Stephen Hemminger <stephen@networkplumber.org>
To: Chengwen Feng <fengchengwen@huawei.com>
Cc: <thomas@monjalon.net>, <dev@dpdk.org>, <wathsala.vithanage@arm.com>
Subject: Re: [PATCH v1 0/4] Introduce generic cache stash API and PCIe TPH implementation
Date: Tue, 12 May 2026 16:38:18 +0200	[thread overview]
Message-ID: <20260512163818.2a5859eb@stephen-xps.local> (raw)
In-Reply-To: <20260508092855.51987-1-fengchengwen@huawei.com>

On Fri, 8 May 2026 17:28:51 +0800
Chengwen Feng <fengchengwen@huawei.com> wrote:

> This patch set introduces a generic ethdev cache stash framework,
> provides PCIe TPH (Transaction Processing Hints) implementation
> on PCI/VFIO bus, enables TPH cache stash support for e1000/igb PMD,
> and adds testpmd command line configuration.
> 
> The patchset is organized as follows:
> - Patch 1/4: bus/pci: introduce PCIe TPH support
>   Add generic PCIe TPH APIs and VFIO backend implementation,
>   with stub implementations for BSD and Windows.
> 
> - Patch 2/4: ethdev: introduce generic cache stash API
>   Define unified cache stash capability/operation/config abstraction,
>   support device-level and per-queue stash management.
> 
> - Patch 3/4: net/e1000: add cache stash support via TPH
>   Implement ethdev cache stash ops for igb PMD based on PCIe TPH.
> 
> - Patch 4/4: app/testpmd: add TPH stash objects configuration
>   Add testpmd CLI option to configure stash objects and
>   automatically apply stash enable/disable on forwarding start/stop.
> 
> Note:
> 1. This implementation references the Linux kernel VFIO TPH uapi
>    series (v8 version), which is not yet merged upstream:
>    https://lore.kernel.org/all/20260508064053.37529-1-fengchengwen@huawei.com/
> 
> 2. The API design and cache stash model learn from the prior  work
>    by Wathsala Vithanage: "An API for Cache Stashing with TPH".
> 
> 3. The e1000/igb PMD is enabled as the first user of this API
>    because it is the only PCIe TPH capable hardware available
>    for testing at this time. The generic API design is extensible
>    to other NICs and future cache stash mechanisms.
> 
> Chengwen Feng (4):
>   bus/pci: introduce PCIe TPH support
>   ethdev: introduce generic cache stash API
>   net/e1000: add cache stash support via TPH
>   app/testpmd: add TPH stash objects configuration
> 
>  app/test-pmd/parameters.c               |  20 ++
>  app/test-pmd/testpmd.c                  |  81 +++++++
>  app/test-pmd/testpmd.h                  |   1 +
>  drivers/bus/pci/bsd/pci.c               |  50 +++++
>  drivers/bus/pci/linux/pci.c             |  48 +++++
>  drivers/bus/pci/linux/pci_init.h        |   9 +
>  drivers/bus/pci/linux/pci_vfio.c        | 272 ++++++++++++++++++++++++
>  drivers/bus/pci/rte_bus_pci.h           | 112 ++++++++++
>  drivers/bus/pci/windows/pci.c           |  50 +++++
>  drivers/net/intel/e1000/base/e1000_hw.h |   2 +
>  drivers/net/intel/e1000/base/e1000_vf.h |   2 +
>  drivers/net/intel/e1000/igb_ethdev.c    | 221 +++++++++++++++++++
>  lib/ethdev/ethdev_driver.h              |  37 ++++
>  lib/ethdev/rte_ethdev.c                 |  40 ++++
>  lib/ethdev/rte_ethdev.h                 |  95 +++++++++
>  15 files changed, 1040 insertions(+)
> 

The longer AI generated review


Subject: [v2 0/4] PCIe TPH and ethdev cache stash - review

Review of "PCIe TPH support" / "generic cache stash" series v2.


Patch 1 - bus/pci: introduce PCIe TPH support
=============================================

Error: pci_vfio_tph_query() extracts the ST Table Size with the wrong
field mask:

	*st_table_sz = RTE_FIELD_GET32(PCI_TPH_LOC_MSIX, cap) + 1;

PCI_TPH_LOC_MSIX is 0x00000400 (bit 10, an ST Table Location bit). Per
PCIe Base Spec, ST Table Size is bits [26:16] of the TPH Requester
Capability Register; Linux defines this as PCI_TPH_CAP_ST_MASK
(0x07FF0000) with shift 16. The current code returns 1 or 2 instead of
the real table size and the result is meaningless. Add the proper
mask/shift, e.g.:

	#define PCI_TPH_CAP_ST_MASK	0x07FF0000
	#define PCI_TPH_CAP_ST_SHIFT	16
	*st_table_sz = ((cap & PCI_TPH_CAP_ST_MASK) >> PCI_TPH_CAP_ST_SHIFT) + 1;

Additionally, per PCIe spec / Linux pci_get_tph_st_num_entries(), the
Size field is only meaningful when the table is located in capability
space. When the ST table is in the MSI-X table, the entry count comes
from the MSI-X table size, not from this field. The branch that returns
the same value for LOC_CAP and LOC_MSIX is incorrect for the MSI-X
case.

Error: pci_vfio_tph_st_get() and pci_vfio_tph_st_set() return raw -1 on
the vfio_dev_fd lookup failure and return the raw ioctl(2) return value
on ioctl error. ioctl() returns -1 with errno set; the function loses
errno and reports -1 to its callers. Return -errno (or a meaningful
-EXXX in the fd case):

	if (vfio_dev_fd < 0)
		return -ENODEV;
	...
	if (ioctl(vfio_dev_fd, VFIO_DEVICE_FEATURE, feature) < 0) {
		ret = -errno;
		free(feature);
		return ret;
	}

Error: kernel/linux/uapi/linux/vfio.h adds VFIO_DEVICE_FEATURE_TPH_ST
= 13 and the surrounding vfio_device_feature_tph_st structure. The
in-tree uapi is synced from a specific upstream kernel
(kernel/linux/uapi/version says v6.16, where the last
VFIO_DEVICE_FEATURE_* is BUS_MASTER = 10). Picking 13 ahead of an
accepted kernel change is risky - upstream may take 11/12/13 for other
features, creating a permanent ABI clash. DPDK uapi headers should only
mirror accepted kernel UAPI; please rebase once the kernel change is
upstream.

Warning: rte_pci_tph_st_set() doxygen uses "@index" rather than
"@param index", and the description "Array of entries with CPU IDs and
index indices to program" reads as a typo. Also rte_pci_tph_disable()
says "negative value on error" while the rest of the new API says
"negative errno value on error" - please make these consistent.


Patch 2 - ethdev: introduce generic cache stash API
===================================================

Error: RTE_EXPORT_EXPERIMENTAL_SYMBOL(rte_eth_cache_stash_get, 26.03)
and ..._set use ABI version 26.03. The tree is currently 26.07.0-rc0
(and patch 1 correctly uses 26.07). 26.03 is a past release; new
symbols added in this cycle must use 26.07.

Warning: rte_eth_cache_stash_set() does not validate op against
RTE_ETH_CACHE_STASH_OP_BUTT. An out-of-range op is passed straight to
the driver, where the switch default just returns -ENOTSUP - validate
in the API:

	if (op >= RTE_ETH_CACHE_STASH_OP_BUTT)
		return -EINVAL;

Warning: the API uses the legacy dereference style for dev_ops
function pointers:

	if (*dev->dev_ops->cache_stash_get == NULL)
		return -ENOTSUP;
	return eth_err(port_id, (*dev->dev_ops->cache_stash_get)(dev, capa));

Recent ethdev code uses dev->dev_ops->X == NULL / dev->dev_ops->X(...)
directly; please match.

Warning: RTE_ETH_CACHE_STASH_OP_BUTT is not a typical DPDK
sentinel name; existing code uses _MAX, _LAST or _NB. Same for the
enum: rte_eth_cache_stash_type uses RTE_BIT32(0) values (a bitmask)
while the cap field that consumes it is plain uint32_t supported_types
- either keep the type as a flag enum and store flags in a plain
uint32_t (and rename the enum so it does not read as a value type),
or use #define constants like the RTE_ETH_CACHE_STASH_OBJ_* set.

Warning: the queue-level config carries a single queue_id with a
bitmask of Rx and Tx objects in the same call. There is no way for a
caller to say "stash Rx queue 3" versus "stash Tx queue 3" - they share
the queue_id field. Either split into separate enable ops or document
that OBJ_RX_* targets queue.queue_id as an Rx queue and OBJ_TX_* as a
Tx queue (with the caller required to make two calls for both).

Warning: missing release notes entry for the new public API.


Patch 3 - net/e1000: add cache stash support via TPH
====================================================

Error: supported_objects advertises TX_DESC, TX_HEADER and TX_PAYLOAD,
but eth_igb_cache_stash_queue_enable() only reads/writes
E1000_RXCTL(queue_id) and never touches E1000_TXCTL. If a caller
requests Tx stashing the call silently succeeds and does nothing.
Either implement the TXCTL programming or drop the TX_* bits from
supported_objects.

Error: eth_igb_cache_stash_dev_clear() loops over
max(nb_rx_queues, nb_tx_queues) and writes E1000_RXCTL(i):

	nb_queues = RTE_MAX(dev->data->nb_rx_queues, dev->data->nb_tx_queues);
	for (i = 0; i < nb_queues; i++) {
		...
		reg = E1000_READ_REG(hw, E1000_RXCTL(i));
		...
		E1000_WRITE_REG(hw, E1000_RXCTL(i), reg);
	}

If nb_tx_queues > nb_rx_queues this writes RXCTL entries for indices
beyond the configured Rx queues. Use nb_rx_queues for the RXCTL loop
(and a separate TXCTL loop with nb_tx_queues when Tx is implemented).

Error: eth_igb_cache_stash_queue_enable() and _queue_disable() do not
bounds-check config->queue.queue_id. E1000_RXCTL(queue_id) is just an
arithmetic register offset; a bogus queue_id writes to an arbitrary MMIO
address. Reject queue_id >= dev->data->nb_rx_queues.

Error: config->queue.lcore_id is passed directly to rte_pci_tph_st_get()
without validation. Use rte_lcore_is_enabled() (or at least
config->queue.lcore_id < RTE_MAX_LCORE) before calling.

Error: drivers/net/intel/e1000/base/e1000_hw.h and base/e1000_vf.h are
modified to add "u8 tph_mode". The base/ directory is a snapshot of
Intel's vendor base (see base/README) and DPDK-private state must not
be added there - it will be lost on the next resync. Put the field in
the DPDK-side adapter struct in drivers/net/intel/e1000/e1000_ethdev.h
instead. Also, the patch touches the VF copy of struct e1000_hw but the
implementation only lives in igb_ethdev.c (PF) - the VF addition is
dead state.

Warning: log uses %u for an int errno value:

	PMD_DRV_LOG(ERR, "Failed to get device queue=%u st of cpu=%u ret=%u",
		    queue_id, config->queue.lcore_id, ret);

ret is int and is negative on failure; print with %d.

Warning: magic bit literals in RXCTL programming - RTE_BIT32(0/2/3),
RTE_SHIFT_VAL32(0xFF, 24) - should be named per the I350 register layout
(e.g. E1000_RXCTL_RX_DESC_STASH_EN, _HDR_STASH_EN, _PLD_STASH_EN,
_CPU_STAG_MASK/SHIFT).

Warning: comment "I350 must encoding st in high 8bit, and ST in
config-space is no needed!" - grammar aside, the intent is not clear;
please reword.

Warning: doc/guides/nics/features/igb.ini is not updated for the new
cache-stash capability.

Warning: missing release notes for the new e1000 capability.


Patch 4 - app/testpmd: add TPH stash objects configuration
==========================================================

Warning: start_tph_stash() returns early on the first port that fails
the capability check or DEV_ENABLE, but ports with index < i have
already had DEV_ENABLE applied. Those ports stay enabled with no queue
configuration and will not be disabled on stop until a full
stop_tph_stash() runs. Either roll back already-enabled ports on error,
or skip just the failing port and continue.

Warning: queue-level enable failures in the inner loop only print to
stderr and forwarding proceeds with a partial configuration. Worth at
least returning failure from start_tph_stash() and propagating so
start_packet_forwarding() can refuse to run.

Warning: no entry added to usage() in app/test-pmd/parameters.c for
--tph-stash-objects. Users have no way to discover the option or its
accepted values (rxdesc, rxheader, rxpayload, rxdata, rxall).

Warning: no documentation update (doc/guides/testpmd_app_ug/) covering
the new option.

Warning: user-visible message "don't support tph stash" -> "doesn't
support tph stash" (also in the second fprintf). Minor.

Warning: "lcoreid_t  lc_id;" has a double space; "int i" should be
unsigned to match the portid_t/nb_fwd_ports loop bound type.


General
=======

The kernel-side dependency is the central blocker: until the matching
VFIO_DEVICE_FEATURE_TPH_ST is in an accepted Linux UAPI, the uapi
header import in patch 1 (and by extension all of patch 1's VFIO
ioctl plumbing) is speculative. Please post or reference the upstream
kernel patch so the feature number and structure layout are stable
before this lands in DPDK.

      parent reply	other threads:[~2026-05-12 14:38 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-05-08  9:28 [PATCH v1 0/4] Introduce generic cache stash API and PCIe TPH implementation Chengwen Feng
2026-05-08  9:28 ` [PATCH v1 1/4] bus/pci: introduce PCIe TPH support Chengwen Feng
2026-05-08 15:00   ` Stephen Hemminger
2026-05-08  9:28 ` [PATCH v1 2/4] ethdev: introduce generic cache stash API Chengwen Feng
2026-05-08  9:28 ` [PATCH v1 3/4] net/e1000: add cache stash support via TPH Chengwen Feng
2026-05-08  9:28 ` [PATCH v1 4/4] app/testpmd: add TPH stash objects configuration Chengwen Feng
2026-05-12  9:22 ` [PATCH v2 0/4] Introduce generic cache stash API and PCIe TPH implementation Chengwen Feng
2026-05-12  9:22   ` [PATCH v2 1/4] bus/pci: introduce PCIe TPH support Chengwen Feng
2026-05-12  9:23   ` [PATCH v2 2/4] ethdev: introduce generic cache stash API Chengwen Feng
2026-05-12  9:23   ` [PATCH v2 3/4] net/e1000: add cache stash support via TPH Chengwen Feng
2026-05-12  9:23   ` [PATCH v2 4/4] app/testpmd: add TPH stash objects configuration Chengwen Feng
2026-05-12 14:36 ` [PATCH v1 0/4] Introduce generic cache stash API and PCIe TPH implementation Stephen Hemminger
2026-05-12 14:38 ` Stephen Hemminger [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260512163818.2a5859eb@stephen-xps.local \
    --to=stephen@networkplumber.org \
    --cc=dev@dpdk.org \
    --cc=fengchengwen@huawei.com \
    --cc=thomas@monjalon.net \
    --cc=wathsala.vithanage@arm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox