DPDK-dev Archive on lore.kernel.org
* [RFC PATCH v4 0/3] An API for Stashing Packets into CPU caches
       [not found] <20241021015246.304431-1-wathsala.vithanage@arm.com>
@ 2025-05-17 15:17 ` Wathsala Vithanage
  2025-05-17 15:17   ` [RFC PATCH v4 1/3] pci: add non-merged Linux uAPI changes Wathsala Vithanage
                     ` (2 more replies)
  2025-06-02 22:38 ` [PATCH v5 0/4] An API for Cache Stashing with TPH Wathsala Vithanage
  1 sibling, 3 replies; 33+ messages in thread
From: Wathsala Vithanage @ 2025-05-17 15:17 UTC (permalink / raw)
  Cc: dev, Wathsala Vithanage

ethdev: an API for cache stashing hints

Today, DPDK applications benefit from Direct Cache Access (DCA) features
like Intel DDIO and Arm's write-allocate-to-SLC. However, those features
do not allow fine-grained control of direct cache access, such as
stashing packets into upper-level caches (L2 caches) of a processor or
the shared cache of a chiplet. PCIe TLP Processing Hints (TPH) addresses
this need in a vendor-agnostic manner. TPH capability has existed since
PCI Express Base Specification revision 3.0; today, numerous Network
Interface Cards and interconnects from different vendors support TPH
capability. TPH comprises a steering tag (ST) and a processing hint
(PH). ST specifies the cache level of a CPU at which the data should be
written to (or DCAed into), while PH is a hint provided by the PCIe
requester to the completer on an upcoming traffic pattern. Some NIC
vendors bundle TPH capability with fine-grained control over the type of
objects that can be stashed into CPU caches, such as

- Rx/Tx queue descriptors
- Packet-headers
- Packet-payloads
- Data from a given offset from the start of a packet

Note that stashable object types are outside the scope of the PCIe
standard; therefore, vendors could support any combination of the above
items as they see fit.

To enable TPH and fine-grained packet stashing, this API extends the
ethdev library and the PCI bus driver. In this design, the application
provides hints to the PMD via the ethdev stashing API to indicate the
CPU and cache level at which it prefers a packet to end up. Once the
PMD receives a CPU and cache-level combination (or a list of such
combinations), it must extract the matching ST for each combination
from the PCI bus driver. The PCI bus driver implements the TPH functions
in an OS-specific way; for Linux, it depends on the TPH capabilities of
the VFIO kernel driver.

An application uses the cache stashing ethdev API by first calling the
rte_eth_dev_stashing_capabilities_get() function to find out which of
the object types in the bulleted list above the NIC can stash into a
CPU cache. This function takes a port_id and a pointer to a uint16_t
through which it reports the object type flags. The PMD implements the
stashing_capabilities_get function pointer in eth_dev_ops. If the
underlying platform or the NIC does not support TPH, this function
returns -ENOTSUP, and the application should consider any value stored
through the pointer invalid.
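
The capability query described above could be used roughly as follows.
This is a minimal sketch, not code from the patch: the DEMO_* flag
names and values are made up, and demo_stashing_capabilities_get() is
a stub standing in for rte_eth_dev_stashing_capabilities_get() so that
the example compiles without DPDK.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag values; the real ones are defined by the patch. */
#define DEMO_STASH_OBJECT_DESC    (1u << 0)
#define DEMO_STASH_OBJECT_HEADER  (1u << 1)
#define DEMO_STASH_OBJECT_PAYLOAD (1u << 2)
#define DEMO_STASH_OBJECT_OFFSET  (1u << 3)

/*
 * Stub for rte_eth_dev_stashing_capabilities_get(port_id, &objects).
 * Port 0 pretends to support descriptors and headers; every other
 * port pretends to lack TPH support.
 */
static int
demo_stashing_capabilities_get(uint16_t port_id, uint16_t *objects)
{
	if (objects == NULL)
		return -EINVAL;
	if (port_id != 0)
		return -ENOTSUP;
	*objects = DEMO_STASH_OBJECT_DESC | DEMO_STASH_OBJECT_HEADER;
	return 0;
}

/* Return the subset of 'wanted' that the port can stash, or 0 if none. */
static uint16_t
demo_pick_objects(uint16_t port_id, uint16_t wanted)
{
	uint16_t supported = 0;

	/* On error the reported flags must be treated as invalid. */
	if (demo_stashing_capabilities_get(port_id, &supported) != 0)
		return 0;
	return (uint16_t)(wanted & supported);
}
```

An application would intersect what it wants with what the port
reports and carry only that subset forward into the queue
configuration step.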

Once the application knows which object types can be stashed, the next
step is to set the steering tags for the packets associated with Rx and
Tx queues via the rte_eth_dev_stashing_{rx,tx}_config_set() ethdev
library functions. Both functions have an identical signature: a
port_id, a queue_id, and a config object. The port_id and the queue_id
are used to locate the device and the queue. The config object is of
type struct rte_eth_stashing_config, which specifies the lcore_id and
the cache_level, indicating where objects from this queue should be
stashed. The 'objects' field in the config sets the types of objects the
application wishes to stash based on the capabilities found earlier.
Note that if the 'objects' field includes the flag
RTE_ETH_DEV_STASH_OBJECT_OFFSET, the 'offset' field must be used to set
the desired offset. These functions invoke the PMD implementations of
the stashing functionality via the stashing_{rx,tx}_hints_set callbacks
in eth_dev_ops, respectively.
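
Filling the config object could look like the sketch below. The field
names follow the description above, but the field types, the DEMO_*
flag values, and the demo_fill_config() helper are assumptions made
for illustration; the real struct rte_eth_stashing_config is defined
by patch 3/3.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_STASH_OBJECT_HEADER (1u << 1) /* hypothetical value */
#define DEMO_STASH_OBJECT_OFFSET (1u << 3) /* hypothetical value */

/* Assumed layout: names from the cover letter, types are guesses. */
struct demo_stashing_config {
	uint32_t lcore_id;    /* lcore whose cache is the target */
	uint32_t cache_level; /* cache level relative to that lcore */
	uint16_t objects;     /* object type flags to stash */
	uint32_t offset;      /* used only with the OFFSET object */
};

/*
 * Build a config from the objects the application wants and the
 * objects the port reported as supported. The result would then be
 * passed to rte_eth_dev_stashing_rx_config_set() or its Tx twin.
 */
static int
demo_fill_config(struct demo_stashing_config *cfg, uint32_t lcore_id,
		 uint32_t cache_level, uint16_t wanted, uint16_t supported,
		 uint32_t offset)
{
	if (cfg == NULL)
		return -EINVAL;
	if ((wanted & supported) == 0)
		return -ENOTSUP;

	cfg->lcore_id = lcore_id;
	cfg->cache_level = cache_level;
	cfg->objects = (uint16_t)(wanted & supported);
	/* The offset is meaningful only when OFFSET is requested. */
	cfg->offset = (cfg->objects & DEMO_STASH_OBJECT_OFFSET) ? offset : 0;
	return 0;
}
```

Clearing the offset when the OFFSET object is not requested is a
choice of this sketch, made so that a stale value never reaches the
PMD.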

The PMD's implementation of the stashing_rx_hints_set() and
stashing_tx_hints_set() functions is ultimately responsible for
extracting the ST via the API provided by the PCI bus driver. Before
extracting STs, the PMD should enable the TPH capability in the endpoint
device by calling the rte_pci_tph_enable() function. The PMD then
begins the ST extraction process by calling the rte_pci_tph_st_get()
function declared in drivers/bus/pci/rte_bus_pci.h, which returns STs
via the same rte_tph_info objects array passed into it as an argument.
Once the PMD has acquired the STs, the stashing_{rx,tx}_hints_set
callbacks implemented in the PMD are ready to set them as per the
rte_eth_stashing_config object passed to them by the higher-level ethdev
functions rte_eth_dev_stashing_{rx,tx}_config_set(). As per the PCIe
specification, STs can be placed in the MSI-X table or in a
device-specific location. For PMDs, setting the STs in queue contexts is
the only viable way of using TPH. Therefore, PMDs should only enable TPH
in device-specific mode.
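
The PMD-side sequence above (enable TPH, fetch the ST, store it in the
queue context) can be sketched as below. Everything prefixed demo_ or
DEMO_ is hypothetical: the two stubs stand in for rte_pci_tph_enable()
and rte_pci_tph_st_get(), the mode value is invented, and the queue
context is a toy structure, so this only illustrates the order of
operations.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Mirrors the fields of struct rte_tph_info from patch 2/3. */
struct demo_tph_info {
	uint32_t cpu_id;
	uint32_t cache_level;
	uint8_t flags;
	uint16_t index;
	uint16_t st;        /* output */
	uint8_t ph_ignore;  /* output */
};

#define DEMO_TPH_ST_DS_MODE 2 /* hypothetical device-specific mode value */

/* Toy per-queue context a PMD might keep. */
struct demo_rxq {
	uint16_t steering_tag;
	int tph_on;
};

/* Stub for rte_pci_tph_enable(): accepts device-specific mode only. */
static int
demo_tph_enable(int mode)
{
	return mode == DEMO_TPH_ST_DS_MODE ? 0 : -EINVAL;
}

/* Stub for rte_pci_tph_st_get(): fabricates a tag per entry. */
static int
demo_tph_st_get(struct demo_tph_info *info, size_t count)
{
	size_t i;

	if (info == NULL || count == 0)
		return -EINVAL;
	for (i = 0; i < count; i++) {
		info[i].st = (uint16_t)(info[i].cpu_id * 4 + info[i].cache_level);
		info[i].ph_ignore = 0;
	}
	return 0;
}

/* What a stashing_rx_hints_set callback could do for one queue. */
static int
demo_rx_hints_set(struct demo_rxq *rxq, uint32_t cpu_id, uint32_t cache_level)
{
	struct demo_tph_info info = { .cpu_id = cpu_id, .cache_level = cache_level };
	int ret;

	ret = demo_tph_enable(DEMO_TPH_ST_DS_MODE);
	if (ret != 0)
		return ret;
	ret = demo_tph_st_get(&info, 1);
	if (ret != 0)
		return ret;

	/* A real PMD would write the tag into the hardware queue context. */
	rxq->steering_tag = info.st;
	rxq->tph_on = 1;
	return 0;
}
```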

V3->V4:
 * Add VFIO IOCTL based ST extraction mechanism to Linux PCI bus driver
 * Remove ST extraction via direct access to ACPI _DSM
 * Replace rte_pci_extract_tph_st() with rte_pci_tph_st_get() in PCI
   bus driver.

Wathsala Vithanage (3):
  pci: add non-merged Linux uAPI changes
  bus/pci: introduce the PCIe TLP Processing Hints API
  ethdev: introduce the cache stashing hints API

 drivers/bus/pci/bsd/pci.c          |  39 +++++++
 drivers/bus/pci/bus_pci_driver.h   |  43 ++++++++
 drivers/bus/pci/linux/pci.c        |  94 ++++++++++++++++
 drivers/bus/pci/linux/pci_init.h   |   9 ++
 drivers/bus/pci/linux/pci_vfio.c   | 166 +++++++++++++++++++++++++++++
 drivers/bus/pci/private.h          |   8 ++
 drivers/bus/pci/rte_bus_pci.h      |  67 ++++++++++++
 drivers/bus/pci/windows/pci.c      |  39 +++++++
 kernel/linux/uapi/linux/vfio_tph.h | 100 +++++++++++++++++
 lib/ethdev/ethdev_driver.h         |  66 ++++++++++++
 lib/ethdev/rte_ethdev.c            | 149 ++++++++++++++++++++++++++
 lib/ethdev/rte_ethdev.h            | 158 +++++++++++++++++++++++++++
 12 files changed, 938 insertions(+)
 create mode 100644 kernel/linux/uapi/linux/vfio_tph.h

--
2.43.0



* [RFC PATCH v4 1/3] pci: add non-merged Linux uAPI changes
  2025-05-17 15:17 ` [RFC PATCH v4 0/3] An API for Stashing Packets into CPU caches Wathsala Vithanage
@ 2025-05-17 15:17   ` Wathsala Vithanage
  2025-05-19  6:41     ` David Marchand
  2025-05-17 15:17   ` [RFC PATCH v4 2/3] bus/pci: introduce the PCIe TLP Processing Hints API Wathsala Vithanage
  2025-05-17 15:17   ` [RFC PATCH v4 3/3] ethdev: introduce the cache stashing hints API Wathsala Vithanage
  2 siblings, 1 reply; 33+ messages in thread
From: Wathsala Vithanage @ 2025-05-17 15:17 UTC (permalink / raw)
  To: Chenbo Xia, Nipun Gupta, Maxime Coquelin
  Cc: dev, Wathsala Vithanage, Dhruv Tripathi

This commit is a hack to prevent the build failures that the next commit
in this patch series would otherwise cause due to missing vfio uapi
definitions. This commit should NEVER BE MERGED.
The next commit in this patch series depends on additions to the vfio
uapi that enable the TPH ioctl in the vfio-pci driver in the Linux
kernel. These additions have not yet been merged into the upstream
kernel.

Signed-off-by: Wathsala Vithanage <wathsala.vithanage@arm.com>
Reviewed-by: Dhruv Tripathi <dhruv.tripathi@arm.com>
---
 drivers/bus/pci/linux/pci_init.h   |   1 +
 kernel/linux/uapi/linux/vfio_tph.h | 100 +++++++++++++++++++++++++++++
 2 files changed, 101 insertions(+)
 create mode 100644 kernel/linux/uapi/linux/vfio_tph.h

diff --git a/drivers/bus/pci/linux/pci_init.h b/drivers/bus/pci/linux/pci_init.h
index a4d37c0d0a..25b901f460 100644
--- a/drivers/bus/pci/linux/pci_init.h
+++ b/drivers/bus/pci/linux/pci_init.h
@@ -6,6 +6,7 @@
 #define EAL_PCI_INIT_H_
 
 #include <rte_vfio.h>
+#include <uapi/linux/vfio_tph.h>
 
 #include "private.h"
 
diff --git a/kernel/linux/uapi/linux/vfio_tph.h b/kernel/linux/uapi/linux/vfio_tph.h
new file mode 100644
index 0000000000..5850400fad
--- /dev/null
+++ b/kernel/linux/uapi/linux/vfio_tph.h
@@ -0,0 +1,100 @@
+
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+/*
+ * VFIO API definition
+ *
+ * WARNING: CONTENTS OF THIS HEADER NEED TO BE MERGED INTO KERNEL'S
+ * uapi/linux/vfio.h IN A FUTURE KERNEL RELEASE. UNTIL THEN IT'S TACKED
+ * ON TO DPDK'S kernel/linux/uapi DIRECTORY TO PREVENT BUILD FAILURES.
+ *
+ * Copyright (C) 2012 Red Hat, Inc.  All rights reserved.
+ *     Author: Alex Williamson <alex.williamson@redhat.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#ifndef _UAPIVFIO_TPH_H
+#define _UAPIVFIO_TPH_H
+
+/**
+ * VFIO_DEVICE_PCI_TPH	- _IO(VFIO_TYPE, VFIO_BASE + 22)
+ *
+ * This command is used to control PCIe TLP Processing Hints (TPH)
+ * capability in a PCIe device.
+ * It supports the following operations on a PCIe device with respect to TPH
+ * capability.
+ *
+ * - Enabling/disabling TPH capability in a PCIe device.
+ *
+ *   Setting VFIO_DEVICE_TPH_ENABLE flag enables TPH in no-steering-tag,
+ *   interrupt-vector, or device-specific mode defined in the PCIe specification
+ *   when feature flags VFIO_TPH_ST_NS_MODE, VFIO_TPH_ST_IV_MODE, and
+ *   VFIO_TPH_ST_DS_MODE are set respectively.
+ *   VFIO_DEVICE_TPH_DISABLE disables PCIe TPH on the device.
+ *
+ * - Writing STs to MSI-X or ST table in a PCIe device.
+ *
+ *   VFIO_DEVICE_TPH_SET_ST flag sets steering tags on a device at an index in
+ *   MSI-X or ST-table depending on the VFIO_TPH_ST_x_MODE flag used and device
+ *   capabilities. The caller can set one or more steering tags by passing an
+ *   array of vfio_pci_tph_entry objects containing cpu_id, cache_level, and
+ *   MSI-X/ST-table index. The caller can also set the intended memory type and
+ *   the processing hint by setting VFIO_TPH_MEM_TYPE_x and VFIO_TPH_HINT_x
+ *   flags, respectively.
+ *
+ * - Reading Steering Tags (ST) from the host platform.
+ *
+ *   VFIO_DEVICE_TPH_GET_ST flag returns steering tags to the caller. Caller
+ *   can request one or more steering tags by passing an array of
+ *   vfio_pci_tph_entry objects. Steering Tag for each request is returned via
+ *   the st field in vfio_pci_tph_entry.
+ */
+struct vfio_pci_tph_entry {
+	/* in */
+	__u32 cpu_id;			/* CPU logical ID */
+	__u32 cache_level;		/* Cache level. L1D = 0, L2D = 2, ... */
+	__u8  flags;
+#define VFIO_TPH_MEM_TYPE_MASK		0x1
+#define VFIO_TPH_MEM_TYPE_SHIFT		0
+#define VFIO_TPH_MEM_TYPE_VMEM		0   /* Request volatile memory ST */
+#define VFIO_TPH_MEM_TYPE_PMEM		1   /* Request persistent memory ST */
+
+#define VFIO_TPH_HINT_MASK		0x3
+#define VFIO_TPH_HINT_SHIFT		1
+#define VFIO_TPH_HINT_BIDIR		0
+#define VFIO_TPH_HINT_REQSTR		(1 << VFIO_TPH_HINT_SHIFT)
+#define VFIO_TPH_HINT_TARGET		(2 << VFIO_TPH_HINT_SHIFT)
+#define VFIO_TPH_HINT_TARGET_PRIO	(3 << VFIO_TPH_HINT_SHIFT)
+	__u8  pad0;
+	__u16 index;			/* MSI-X/ST-table index to set ST */
+	/* out */
+	__u16 st;			/* Steering-Tag */
+	__u8  ph_ignore;		/* Platform ignored the Processing Hint */
+	__u8  pad1;
+};
+
+struct vfio_pci_tph {
+	__u32 argsz;			/* Size of vfio_pci_tph and ents[] */
+	__u32 flags;
+#define VFIO_DEVICE_TPH_OP_MASK		0x7
+#define VFIO_DEVICE_TPH_OP_SHIFT	3
+#define VFIO_DEVICE_TPH_ENABLE		0	/* Enable TPH on device */
+#define VFIO_DEVICE_TPH_DISABLE		1	/* Disable TPH on device */
+#define VFIO_DEVICE_TPH_GET_ST		2	/* Get steering-tags */
+#define VFIO_DEVICE_TPH_SET_ST		4	/* Set steering-tags */
+
+#define	VFIO_TPH_ST_MODE_MASK	(0x3 << VFIO_DEVICE_TPH_OP_SHIFT)
+#define	VFIO_TPH_ST_NS_MODE	(0 << VFIO_DEVICE_TPH_OP_SHIFT)
+#define	VFIO_TPH_ST_IV_MODE	(1 << VFIO_DEVICE_TPH_OP_SHIFT)
+#define	VFIO_TPH_ST_DS_MODE	(2 << VFIO_DEVICE_TPH_OP_SHIFT)
+	__u32 count;			/* Number of entries in ents[] */
+	struct vfio_pci_tph_entry ents[];
+#define VFIO_TPH_INFO_MAX	2048	/* Max entries in ents[] */
+};
+
+#define VFIO_DEVICE_PCI_TPH	_IO(VFIO_TYPE, VFIO_BASE + 22)
+
+
+#endif /* _UAPIVFIO_TPH_H */
-- 
2.43.0



* [RFC PATCH v4 2/3] bus/pci: introduce the PCIe TLP Processing Hints API
  2025-05-17 15:17 ` [RFC PATCH v4 0/3] An API for Stashing Packets into CPU caches Wathsala Vithanage
  2025-05-17 15:17   ` [RFC PATCH v4 1/3] pci: add non-merged Linux uAPI changes Wathsala Vithanage
@ 2025-05-17 15:17   ` Wathsala Vithanage
  2025-05-19  6:44     ` David Marchand
  2025-05-17 15:17   ` [RFC PATCH v4 3/3] ethdev: introduce the cache stashing hints API Wathsala Vithanage
  2 siblings, 1 reply; 33+ messages in thread
From: Wathsala Vithanage @ 2025-05-17 15:17 UTC (permalink / raw)
  To: Chenbo Xia, Nipun Gupta, Anatoly Burakov
  Cc: dev, Wathsala Vithanage, Honnappa Nagarahalli, Dhruv Tripathi

Extend the PCI bus driver to enable or disable TPH capability and set or
get PCI Steering-Tags (STs) on an endpoint device. The functions
rte_pci_tph_{enable,disable,st_set,st_get} provide the primary
interface for DPDK device drivers. Implementation of the interface is
OS-dependent. For Linux, the kernel VFIO driver provides the
implementation. The rte_pci_tph_{enable,disable} functions enable and
disable TPH capability, respectively. rte_pci_tph_enable enables TPH on
the device in one of the device-specific, interrupt-vector, or
no-steering-tag modes.

The rte_pci_tph_st_{get,set} functions take an array of rte_tph_info
objects, each carrying a cpu-id, a cache-level, and flags
(processing-hint, memory-type). The index in rte_tph_info is the
MSI-X/MSI vector/ST-table index if TPH was enabled in the
interrupt-vector mode; the rte_pci_tph_st_get function ignores it. Both
rte_pci_tph_st_{set,get} functions return the steering-tag (st) and
processing-hint-ignored (ph_ignore) fields via the same rte_tph_info
objects passed into them.
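
The flags byte packs the memory type into bit 0 and the processing hint
into the two bits above it. The sketch below repeats the macro values
from bus_pci_driver.h in this patch; the demo_ helpers are hypothetical
and not part of the patch. Since the hint constants are already
shifted, the sketch assumes the hint mask must be shifted as well
before it is applied to the flags byte.

```c
#include <stdint.h>

/* Values copied from bus_pci_driver.h in this patch. */
#define RTE_PCI_TPH_MEM_TYPE_MASK	0x1
#define RTE_PCI_TPH_MEM_TYPE_VMEM	0
#define RTE_PCI_TPH_MEM_TYPE_PMEM	1
#define RTE_PCI_TPH_HINT_MASK		0x3
#define RTE_PCI_TPH_HINT_SHIFT		1
#define RTE_PCI_TPH_HINT_BIDIR		0
#define RTE_PCI_TPH_HINT_REQSTR		(1 << RTE_PCI_TPH_HINT_SHIFT)
#define RTE_PCI_TPH_HINT_TARGET		(2 << RTE_PCI_TPH_HINT_SHIFT)
#define RTE_PCI_TPH_HINT_TARGET_PRIO	(3 << RTE_PCI_TPH_HINT_SHIFT)

/* Hypothetical helper: the hint field mask in its in-place position. */
#define DEMO_TPH_HINT_FIELD \
	(RTE_PCI_TPH_HINT_MASK << RTE_PCI_TPH_HINT_SHIFT)

/* Combine a memory type and an already-shifted hint into flags. */
static inline uint8_t
demo_tph_flags(uint8_t mem_type, uint8_t hint)
{
	return (uint8_t)((mem_type & RTE_PCI_TPH_MEM_TYPE_MASK) |
			 (hint & DEMO_TPH_HINT_FIELD));
}

/* Recover the memory type from flags. */
static inline uint8_t
demo_tph_mem_type(uint8_t flags)
{
	return (uint8_t)(flags & RTE_PCI_TPH_MEM_TYPE_MASK);
}

/* Recover the hint, still in its shifted form, from flags. */
static inline uint8_t
demo_tph_hint(uint8_t flags)
{
	return (uint8_t)(flags & DEMO_TPH_HINT_FIELD);
}
```

With these helpers a caller would set, for example,
info.flags = demo_tph_flags(RTE_PCI_TPH_MEM_TYPE_VMEM,
RTE_PCI_TPH_HINT_TARGET) before passing the entry to the bus driver.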

The rte_pci_tph_st_{get,set} functions return an error if processing
any of the rte_tph_info objects fails. The API does not indicate which
entries in the rte_tph_info array were processed successfully and which
caused the error. Therefore, in case of an error, the caller should
discard the output. If rte_pci_tph_st_set returns an error, it should be
treated as a partial failure: the steering-tag update on the device
should be considered partial and inconsistent with the expected outcome.
This should be resolved by resetting the endpoint device before further
attempts to set steering tags.
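
One way a caller could honour the discard-on-error rule for the get
path is to query into a scratch copy and commit only when the whole
batch succeeds. This is a sketch under assumptions: the stub below
stands in for rte_pci_tph_st_get() and fails for cpu_id values of 8 or
more, and the DEMO_MAX_ENTRIES bound is arbitrary.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Mirrors the fields of struct rte_tph_info from this patch. */
struct demo_tph_info {
	uint32_t cpu_id;
	uint32_t cache_level;
	uint8_t flags;
	uint16_t index;
	uint16_t st;
	uint8_t ph_ignore;
};

#define DEMO_MAX_ENTRIES 16 /* arbitrary bound for the scratch array */

/*
 * Stub for rte_pci_tph_st_get(). It fails part-way through the array,
 * leaving earlier entries already modified, to imitate a batch error.
 */
static int
demo_tph_st_get(struct demo_tph_info *info, size_t count)
{
	size_t i;

	for (i = 0; i < count; i++) {
		if (info[i].cpu_id >= 8)
			return -EINVAL;
		info[i].st = (uint16_t)(0x100 + info[i].cpu_id);
	}
	return 0;
}

/* Query into a scratch copy; touch the caller's array only on success. */
static int
demo_tph_st_get_all_or_nothing(struct demo_tph_info *info, size_t count)
{
	struct demo_tph_info scratch[DEMO_MAX_ENTRIES];
	int ret;

	if (info == NULL || count == 0 || count > DEMO_MAX_ENTRIES)
		return -EINVAL;

	memcpy(scratch, info, count * sizeof(*info));
	ret = demo_tph_st_get(scratch, count);
	if (ret != 0)
		return ret; /* partial output in scratch is discarded */

	memcpy(info, scratch, count * sizeof(*info));
	return 0;
}
```

The same pattern does not help the set path, because a failed set has
already changed device state; there the reset described above is still
needed.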

Signed-off-by: Wathsala Vithanage <wathsala.vithanage@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Dhruv Tripathi <dhruv.tripathi@arm.com>
---
 drivers/bus/pci/bsd/pci.c        |  39 ++++++++
 drivers/bus/pci/bus_pci_driver.h |  43 ++++++++
 drivers/bus/pci/linux/pci.c      |  94 +++++++++++++++++
 drivers/bus/pci/linux/pci_init.h |   8 ++
 drivers/bus/pci/linux/pci_vfio.c | 166 +++++++++++++++++++++++++++++++
 drivers/bus/pci/private.h        |   8 ++
 drivers/bus/pci/rte_bus_pci.h    |  67 +++++++++++++
 drivers/bus/pci/windows/pci.c    |  39 ++++++++
 8 files changed, 464 insertions(+)

diff --git a/drivers/bus/pci/bsd/pci.c b/drivers/bus/pci/bsd/pci.c
index 5e2e09d5a4..257816ab8e 100644
--- a/drivers/bus/pci/bsd/pci.c
+++ b/drivers/bus/pci/bsd/pci.c
@@ -650,3 +650,42 @@ rte_pci_ioport_unmap(struct rte_pci_ioport *p)
 
 	return ret;
 }
+
+int
+rte_pci_tph_enable(const struct rte_pci_device *dev, int mode)
+{
+	RTE_SET_USED(dev);
+	RTE_SET_USED(mode);
+	/* This feature is not yet implemented for BSD */
+	return -1;
+}
+
+int
+rte_pci_tph_disable(const struct rte_pci_device *dev)
+{
+	RTE_SET_USED(dev);
+	/* This feature is not yet implemented for BSD */
+	return -1;
+}
+
+int
+rte_pci_tph_st_get(const struct rte_pci_device *dev,
+		   struct rte_tph_info *info, size_t count)
+{
+	RTE_SET_USED(dev);
+	RTE_SET_USED(info);
+	RTE_SET_USED(count);
+	/* This feature is not yet implemented for BSD */
+	return -1;
+}
+
+int
+rte_pci_tph_st_set(const struct rte_pci_device *dev,
+		   struct rte_tph_info *info, size_t count)
+{
+	RTE_SET_USED(dev);
+	RTE_SET_USED(info);
+	RTE_SET_USED(count);
+	/* This feature is not yet implemented for BSD */
+	return -1;
+}
diff --git a/drivers/bus/pci/bus_pci_driver.h b/drivers/bus/pci/bus_pci_driver.h
index 2cc1119072..f19b4be295 100644
--- a/drivers/bus/pci/bus_pci_driver.h
+++ b/drivers/bus/pci/bus_pci_driver.h
@@ -194,6 +194,49 @@ struct rte_pci_ioport {
 	uint64_t len; /* only filled for memory mapped ports */
 };
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this structure may change, or be removed, without prior
+ * notice
+ *
+ * This structure is passed into the TPH Steering-Tag set or get function as an
+ * argument by the caller. Return values are set in the same structure in st and
+ * ph_ignore fields by the callee.
+ *
+ * Refer to PCI-SIG ECN "Revised _DSM for Cache Locality TPH Features" for
+ * details.
+ */
+struct rte_tph_info {
+	/* Input */
+	uint32_t cpu_id;	/*Logical CPU id*/
+	uint32_t cache_level;	/*Cache level relative to CPU. l1d=0,l2d=1,...*/
+	uint8_t flags;		/*Memory type, processing hint etc.*/
+	uint16_t index;		/*Index in vector table to store the ST*/
+
+	/* Output */
+	uint16_t st;		/*Steering tag returned by the platform*/
+	uint8_t ph_ignore;	/*Platform ignores PH for the returned ST*/
+};
+
+#define RTE_PCI_TPH_MEM_TYPE_MASK		0x1
+#define RTE_PCI_TPH_MEM_TYPE_SHIFT		0
+/** Request volatile memory ST */
+#define RTE_PCI_TPH_MEM_TYPE_VMEM		0
+/** Request persistent memory ST */
+#define RTE_PCI_TPH_MEM_TYPE_PMEM		1
+
+/** TLP Processing Hints - PCIe 6.0 specification section 2.2.7.1.1 */
+#define RTE_PCI_TPH_HINT_MASK		0x3
+#define RTE_PCI_TPH_HINT_SHIFT		1
+/** Host and device access data equally */
+#define RTE_PCI_TPH_HINT_BIDIR		0
+/** Device accesses data more frequently */
+#define RTE_PCI_TPH_HINT_REQSTR		(1 << RTE_PCI_TPH_HINT_SHIFT)
+/** Host accesses data more frequently */
+#define RTE_PCI_TPH_HINT_TARGET		(2 << RTE_PCI_TPH_HINT_SHIFT)
+/** Host accesses data more frequently with a high temporal locality */
+#define RTE_PCI_TPH_HINT_TARGET_PRIO	(3 << RTE_PCI_TPH_HINT_SHIFT)
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/drivers/bus/pci/linux/pci.c b/drivers/bus/pci/linux/pci.c
index c20d159218..463c06ad64 100644
--- a/drivers/bus/pci/linux/pci.c
+++ b/drivers/bus/pci/linux/pci.c
@@ -814,3 +814,97 @@ rte_pci_ioport_unmap(struct rte_pci_ioport *p)
 
 	return ret;
 }
+
+RTE_EXPORT_EXPERIMENTAL_SYMBOL(rte_pci_tph_enable, 25.03)
+int
+rte_pci_tph_enable(const struct rte_pci_device *dev, int mode)
+{
+	int ret = 0;
+
+	switch (dev->kdrv) {
+#ifdef VFIO_PRESENT
+	case RTE_PCI_KDRV_VFIO:
+		if (pci_vfio_is_enabled())
+			ret = pci_vfio_tph_enable(dev, mode);
+		break;
+#endif
+	case RTE_PCI_KDRV_IGB_UIO:
+	case RTE_PCI_KDRV_UIO_GENERIC:
+	default:
+		ret = -ENOTSUP;
+		break;
+	}
+
+	return ret;
+}
+
+RTE_EXPORT_EXPERIMENTAL_SYMBOL(rte_pci_tph_disable, 25.03)
+int
+rte_pci_tph_disable(const struct rte_pci_device *dev)
+{
+	int ret = 0;
+
+	switch (dev->kdrv) {
+#ifdef VFIO_PRESENT
+	case RTE_PCI_KDRV_VFIO:
+		if (pci_vfio_is_enabled())
+			ret = pci_vfio_tph_disable(dev);
+		break;
+#endif
+	case RTE_PCI_KDRV_IGB_UIO:
+	case RTE_PCI_KDRV_UIO_GENERIC:
+	default:
+		ret = -ENOTSUP;
+		break;
+	}
+
+	return ret;
+}
+
+RTE_EXPORT_EXPERIMENTAL_SYMBOL(rte_pci_tph_st_get, 25.03)
+int
+rte_pci_tph_st_get(const struct rte_pci_device *dev,
+		   struct rte_tph_info *info, size_t count)
+{
+	int ret = 0;
+
+	switch (dev->kdrv) {
+#ifdef VFIO_PRESENT
+	case RTE_PCI_KDRV_VFIO:
+		if (pci_vfio_is_enabled())
+			ret = pci_vfio_tph_st_get(dev, info, count);
+		break;
+#endif
+	case RTE_PCI_KDRV_IGB_UIO:
+	case RTE_PCI_KDRV_UIO_GENERIC:
+	default:
+		ret = -ENOTSUP;
+		break;
+	}
+
+	return ret;
+}
+
+RTE_EXPORT_EXPERIMENTAL_SYMBOL(rte_pci_tph_st_set, 25.03)
+int
+rte_pci_tph_st_set(const struct rte_pci_device *dev,
+		   struct rte_tph_info *info, size_t count)
+{
+	int ret = 0;
+
+	switch (dev->kdrv) {
+#ifdef VFIO_PRESENT
+	case RTE_PCI_KDRV_VFIO:
+		if (pci_vfio_is_enabled())
+			ret = pci_vfio_tph_st_set(dev, info, count);
+		break;
+#endif
+	case RTE_PCI_KDRV_IGB_UIO:
+	case RTE_PCI_KDRV_UIO_GENERIC:
+	default:
+		ret = -ENOTSUP;
+		break;
+	}
+
+	return ret;
+}
diff --git a/drivers/bus/pci/linux/pci_init.h b/drivers/bus/pci/linux/pci_init.h
index 25b901f460..5b249c81b1 100644
--- a/drivers/bus/pci/linux/pci_init.h
+++ b/drivers/bus/pci/linux/pci_init.h
@@ -76,6 +76,14 @@ int pci_vfio_ioport_unmap(struct rte_pci_ioport *p);
 int pci_vfio_map_resource(struct rte_pci_device *dev);
 int pci_vfio_unmap_resource(struct rte_pci_device *dev);
 
+/* TLP Processing Hints control functions */
+int pci_vfio_tph_enable(const struct rte_pci_device *dev, int mode);
+int pci_vfio_tph_disable(const struct rte_pci_device *dev);
+int pci_vfio_tph_st_get(const struct rte_pci_device *dev,
+			struct rte_tph_info *info, size_t ent_count);
+int pci_vfio_tph_st_set(const struct rte_pci_device *dev,
+			struct rte_tph_info *info, size_t ent_count);
+
 int pci_vfio_is_enabled(void);
 
 #endif
diff --git a/drivers/bus/pci/linux/pci_vfio.c b/drivers/bus/pci/linux/pci_vfio.c
index 5317170231..1e293c1376 100644
--- a/drivers/bus/pci/linux/pci_vfio.c
+++ b/drivers/bus/pci/linux/pci_vfio.c
@@ -1316,6 +1316,172 @@ pci_vfio_mmio_write(const struct rte_pci_device *dev, int bar,
 	return pwrite(fd, buf, len, offset + offs);
 }
 
+static int
+pci_vfio_tph_ioctl(const struct rte_pci_device *dev, struct vfio_pci_tph *pci_tph)
+{
+	const struct rte_intr_handle *intr_handle = dev->intr_handle;
+	int vfio_dev_fd = 0, ret = 0;
+
+	vfio_dev_fd = rte_intr_dev_fd_get(intr_handle);
+	if (vfio_dev_fd < 0) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	ret = ioctl(vfio_dev_fd, VFIO_DEVICE_PCI_TPH, pci_tph);
+out:
+	return ret;
+}
+
+static int
+pci_vfio_tph_st_op(const struct rte_pci_device *dev,
+		    struct rte_tph_info *info, size_t count,
+		    enum rte_pci_st_op op)
+{
+	RTE_SET_USED(dev);
+	int ret = 0;
+	size_t argsz = 0, i;
+	struct vfio_pci_tph *pci_tph = NULL;
+	uint8_t mem_type = 0, hint = 0;
+
+	if (!count) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	argsz = sizeof(struct vfio_pci_tph) +
+		count * sizeof(struct vfio_pci_tph_entry);
+
+	pci_tph = rte_zmalloc(NULL, argsz, 0);
+	if (!pci_tph) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	pci_tph->argsz = argsz;
+	pci_tph->count = count;
+
+	switch (op) {
+	case RTE_PCI_TPH_ST_GET:
+		pci_tph->flags = VFIO_DEVICE_TPH_GET_ST;
+		break;
+	case RTE_PCI_TPH_ST_SET:
+		pci_tph->flags = VFIO_DEVICE_TPH_SET_ST;
+		break;
+	default:
+		ret = -EINVAL;
+		goto out;
+	}
+
+	for (i = 0; i < count; i++) {
+		pci_tph->ents[i].cpu_id = info[i].cpu_id;
+		pci_tph->ents[i].cache_level = info[i].cache_level;
+
+		mem_type = info[i].flags & RTE_PCI_TPH_MEM_TYPE_MASK;
+		switch (mem_type) {
+		case RTE_PCI_TPH_MEM_TYPE_VMEM:
+			pci_tph->ents[i].flags |= VFIO_TPH_MEM_TYPE_VMEM;
+			break;
+		case RTE_PCI_TPH_MEM_TYPE_PMEM:
+			pci_tph->ents[i].flags |= VFIO_TPH_MEM_TYPE_PMEM;
+			break;
+		default:
+			ret = -EINVAL;
+			goto out;
+		}
+
+		hint = info[i].flags & (RTE_PCI_TPH_HINT_MASK << RTE_PCI_TPH_HINT_SHIFT);
+		switch (hint) {
+		case RTE_PCI_TPH_HINT_BIDIR:
+			pci_tph->ents[i].flags |= VFIO_TPH_HINT_BIDIR;
+			break;
+		case RTE_PCI_TPH_HINT_REQSTR:
+			pci_tph->ents[i].flags |= VFIO_TPH_HINT_REQSTR;
+			break;
+		case RTE_PCI_TPH_HINT_TARGET:
+			pci_tph->ents[i].flags |= VFIO_TPH_HINT_TARGET;
+			break;
+		case RTE_PCI_TPH_HINT_TARGET_PRIO:
+			pci_tph->ents[i].flags |= VFIO_TPH_HINT_TARGET_PRIO;
+			break;
+		default:
+			ret = -EINVAL;
+			goto out;
+		}
+
+		if (op == RTE_PCI_TPH_ST_SET)
+			pci_tph->ents[i].index = info[i].index;
+	}
+
+	ret = pci_vfio_tph_ioctl(dev, pci_tph);
+	if (ret)
+		goto out;
+
+	/*
+	 * Kernel returns steering-tag and ph-ignore bits for
+	 * RTE_PCI_TPH_ST_SET too, therefore copy output for
+	 * both RTE_PCI_TPH_ST_SET and RTE_PCI_TPH_ST_GET
+	 * cases.
+	 */
+	for (i = 0; i < count; i++) {
+		info[i].st = pci_tph->ents[i].st;
+		info[i].ph_ignore = pci_tph->ents[i].ph_ignore;
+	}
+
+out:
+	if (pci_tph)
+		rte_free(pci_tph);
+	return ret;
+}
+
+int
+pci_vfio_tph_enable(const struct rte_pci_device *dev, int mode)
+{
+	int ret;
+
+	/* Reject any bits outside the ST mode field. */
+	if (mode & ~VFIO_TPH_ST_MODE_MASK) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	struct vfio_pci_tph pci_tph = {
+		.argsz = sizeof(struct vfio_pci_tph),
+		.flags = VFIO_DEVICE_TPH_ENABLE | mode,
+		.count = 0
+	};
+
+	ret = pci_vfio_tph_ioctl(dev, &pci_tph);
+out:
+	return ret;
+}
+
+int
+pci_vfio_tph_disable(const struct rte_pci_device *dev)
+{
+	struct vfio_pci_tph pci_tph = {
+		.argsz = sizeof(struct vfio_pci_tph),
+		.flags = VFIO_DEVICE_TPH_DISABLE,
+		.count = 0
+	};
+
+	return pci_vfio_tph_ioctl(dev, &pci_tph);
+}
+
+int
+pci_vfio_tph_st_get(const struct rte_pci_device *dev,
+		    struct rte_tph_info *info, size_t count)
+{
+	return pci_vfio_tph_st_op(dev, info, count, RTE_PCI_TPH_ST_GET);
+}
+
+int
+pci_vfio_tph_st_set(const struct rte_pci_device *dev,
+		    struct rte_tph_info *info, size_t count)
+{
+	return pci_vfio_tph_st_op(dev, info, count, RTE_PCI_TPH_ST_SET);
+}
+
 int
 pci_vfio_is_enabled(void)
 {
diff --git a/drivers/bus/pci/private.h b/drivers/bus/pci/private.h
index 38109844b9..d2ec370320 100644
--- a/drivers/bus/pci/private.h
+++ b/drivers/bus/pci/private.h
@@ -335,4 +335,12 @@ rte_pci_dev_iterate(const void *start,
 int
 rte_pci_devargs_parse(struct rte_devargs *da);
 
+/*
+ * TPH Steering-Tag operation types.
+ */
+enum rte_pci_st_op {
+	RTE_PCI_TPH_ST_SET, /* Set TPH Steering-Tags */
+	RTE_PCI_TPH_ST_GET  /* Get TPH Steering-Tags */
+};
+
 #endif /* _PCI_PRIVATE_H_ */
diff --git a/drivers/bus/pci/rte_bus_pci.h b/drivers/bus/pci/rte_bus_pci.h
index 19a7b15b99..69aad5e3da 100644
--- a/drivers/bus/pci/rte_bus_pci.h
+++ b/drivers/bus/pci/rte_bus_pci.h
@@ -31,6 +31,7 @@ extern "C" {
 struct rte_pci_device;
 struct rte_pci_driver;
 struct rte_pci_ioport;
+struct rte_tph_info;
 
 struct rte_devargs;
 
@@ -312,6 +313,72 @@ void rte_pci_ioport_read(struct rte_pci_ioport *p,
 void rte_pci_ioport_write(struct rte_pci_ioport *p,
 		const void *data, size_t len, off_t offset);
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Enable TLP Processing Hints (TPH) in the endpoint device.
+ *
+ * @param dev
+ *   A pointer to a rte_pci_device structure describing the device
+ *   to use.
+ * @param mode
+ *   TPH mode the device must operate in.
+ */
+__rte_experimental
+int rte_pci_tph_enable(const struct rte_pci_device *dev, int mode);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Disable TLP Processing Hints (TPH) in the endpoint device.
+ *
+ * @param dev
+ *   A pointer to a rte_pci_device structure describing the device
+ *   to use.
+ */
+__rte_experimental
+int rte_pci_tph_disable(const struct rte_pci_device *dev);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Get PCI Steering-Tags (STs) for a list of stashing targets.
+ *
+ * @param dev
+ *   A pointer to a rte_pci_device structure describing the device to use.
+ * @param info
+ *   An array of rte_tph_info objects, each describing the target
+ *   cpu-id, cache-level, etc. The steering-tag for each target is
+ *   returned via the info array.
+ * @param count
+ *   The number of elements in the info array.
+ */
+__rte_experimental
+int rte_pci_tph_st_get(const struct rte_pci_device *dev,
+		struct rte_tph_info *info, size_t count);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Set PCI Steering-Tags (STs) for a list of stashing targets.
+ *
+ * @param dev
+ *   A pointer to a rte_pci_device structure describing the device to use.
+ * @param info
+ *   An array of rte_tph_info objects, each describing the target
+ *   cpu-id, cache-level, etc. The steering-tag for each target is
+ *   returned via the info array.
+ * @param count
+ *   The number of elements in the info array.
+ */
+__rte_experimental
+int rte_pci_tph_st_set(const struct rte_pci_device *dev,
+		struct rte_tph_info *info, size_t count);
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/drivers/bus/pci/windows/pci.c b/drivers/bus/pci/windows/pci.c
index e7e449306e..72c334e572 100644
--- a/drivers/bus/pci/windows/pci.c
+++ b/drivers/bus/pci/windows/pci.c
@@ -511,3 +511,42 @@ rte_pci_scan(void)
 
 	return ret;
 }
+
+int
+rte_pci_tph_enable(const struct rte_pci_device *dev, int mode)
+{
+	RTE_SET_USED(dev);
+	RTE_SET_USED(mode);
+	/* This feature is not yet implemented for windows */
+	return -1;
+}
+
+int
+rte_pci_tph_disable(const struct rte_pci_device *dev)
+{
+	RTE_SET_USED(dev);
+	/* This feature is not yet implemented for windows */
+	return -1;
+}
+
+int
+rte_pci_tph_st_get(const struct rte_pci_device *dev,
+		   struct rte_tph_info *info, size_t count)
+{
+	RTE_SET_USED(dev);
+	RTE_SET_USED(info);
+	RTE_SET_USED(count);
+	/* This feature is not yet implemented for windows */
+	return -1;
+}
+
+int
+rte_pci_tph_st_set(const struct rte_pci_device *dev,
+		   struct rte_tph_info *info, size_t count)
+{
+	RTE_SET_USED(dev);
+	RTE_SET_USED(info);
+	RTE_SET_USED(count);
+	/* This feature is not yet implemented for windows */
+	return -1;
+}
-- 
2.43.0



* [RFC PATCH v4 3/3] ethdev: introduce the cache stashing hints API
  2025-05-17 15:17 ` [RFC PATCH v4 0/3] An API for Stashing Packets into CPU caches Wathsala Vithanage
  2025-05-17 15:17   ` [RFC PATCH v4 1/3] pci: add non-merged Linux uAPI changes Wathsala Vithanage
  2025-05-17 15:17   ` [RFC PATCH v4 2/3] bus/pci: introduce the PCIe TLP Processing Hints API Wathsala Vithanage
@ 2025-05-17 15:17   ` Wathsala Vithanage
  2025-05-20 13:53     ` Stephen Hemminger
  2 siblings, 1 reply; 33+ messages in thread
From: Wathsala Vithanage @ 2025-05-17 15:17 UTC (permalink / raw)
  To: Thomas Monjalon, Ferruh Yigit, Andrew Rybchenko
  Cc: dev, Wathsala Vithanage, Honnappa Nagarahalli, Dhruv Tripathi

Extend the ethdev library to enable the stashing of different data
objects, such as the ones listed below, into CPU caches directly
from the NIC.

- Rx/Tx queue descriptors
- Rx packets
- Packet headers
- Packet payloads
- Data of a packet at an offset from the start of the packet

The APIs are designed in a hardware/vendor-agnostic manner such that
supporting PMDs can use any capabilities available in the underlying
hardware for fine-grained stashing of data objects into a CPU cache.

The API provides an interface to query the availability of stashing
capabilities, i.e., platform/NIC support, stashable object types, etc.,
via the rte_eth_dev_stashing_capabilities_get interface.

The function pair rte_eth_dev_stashing_rx_config_set and
rte_eth_dev_stashing_tx_config_set sets the stashing hint (the CPU,
cache level, and data object types) on the Rx and Tx queues.

PMDs that support stashing must register their implementations with the
following eth_dev_ops callbacks, which are invoked by the ethdev
functions listed above.

- stashing_capabilities_get
- stashing_rx_hints_set
- stashing_tx_hints_set
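
The dispatch from an ethdev function to a PMD callback could look
roughly like the sketch below. All demo_ types and functions are
hypothetical stand-ins for rte_eth_dev, eth_dev_ops, and the ethdev
wrapper; the actual wrapper lives in lib/ethdev/rte_ethdev.c and may
differ. The error codes follow the callback documentation in this
patch, where -ENOSYS means the PMD has not implemented the feature.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

struct demo_eth_dev;

/* Same shape as eth_stashing_capabilities_get_t in this patch. */
typedef int (*demo_stashing_capabilities_get_t)(struct demo_eth_dev *dev,
						uint16_t *objects);

/* Toy stand-ins for eth_dev_ops and rte_eth_dev. */
struct demo_eth_dev_ops {
	demo_stashing_capabilities_get_t stashing_capabilities_get;
};

struct demo_eth_dev {
	const struct demo_eth_dev_ops *dev_ops;
};

/* Library-side wrapper: validate, then defer to the PMD if it opted in. */
static int
demo_stashing_capabilities_get(struct demo_eth_dev *dev, uint16_t *objects)
{
	if (dev == NULL || objects == NULL)
		return -EINVAL;
	if (dev->dev_ops == NULL ||
	    dev->dev_ops->stashing_capabilities_get == NULL)
		return -ENOSYS;
	return dev->dev_ops->stashing_capabilities_get(dev, objects);
}

/* Example PMD callback reporting two made-up object type bits. */
static int
demo_pmd_caps(struct demo_eth_dev *dev, uint16_t *objects)
{
	(void)dev;
	*objects = 0x3;
	return 0;
}
```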

Signed-off-by: Wathsala Vithanage <wathsala.vithanage@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Dhruv Tripathi <dhruv.tripathi@arm.com>
---
 lib/ethdev/ethdev_driver.h |  66 ++++++++++++++++
 lib/ethdev/rte_ethdev.c    | 149 ++++++++++++++++++++++++++++++++++
 lib/ethdev/rte_ethdev.h    | 158 +++++++++++++++++++++++++++++++++++++
 3 files changed, 373 insertions(+)

diff --git a/lib/ethdev/ethdev_driver.h b/lib/ethdev/ethdev_driver.h
index 2b4d2ae9c3..8a4012db08 100644
--- a/lib/ethdev/ethdev_driver.h
+++ b/lib/ethdev/ethdev_driver.h
@@ -1376,6 +1376,68 @@ enum rte_eth_dev_operation {
 typedef uint64_t (*eth_get_restore_flags_t)(struct rte_eth_dev *dev,
 					    enum rte_eth_dev_operation op);
 
+/**
+ * @internal
+ * Set cache stashing hints in Rx queue.
+ *
+ * @param dev
+ *   Port (ethdev) handle.
+ * @param queue_id
+ *   Rx queue.
+ * @param config
+ *   Stashing hints configuration for the queue.
+ *
+ * @return
+ *   -ENOTSUP if the device or the platform does not support cache stashing.
+ *   -ENOSYS  if the underlying PMD hasn't implemented cache stashing feature.
+ *   -EINVAL  on invalid arguments.
+ *   0 on success.
+ */
+typedef int (*eth_stashing_rx_hints_set_t)(struct rte_eth_dev *dev, uint16_t queue_id,
+					   struct rte_eth_stashing_config *config);
+
+/**
+ * @internal
+ * Set cache stashing hints in Tx queue.
+ *
+ * @param dev
+ *   Port (ethdev) handle.
+ * @param queue_id
+ *   Tx queue.
+ * @param config
+ *   Stashing hints configuration for the queue.
+ *
+ * @return
+ *   -ENOTSUP if the device or the platform does not support cache stashing.
+ *   -ENOSYS  if the underlying PMD hasn't implemented cache stashing feature.
+ *   -EINVAL  on invalid arguments.
+ *   0 on success.
+ */
+typedef int (*eth_stashing_tx_hints_set_t)(struct rte_eth_dev *dev, uint16_t queue_id,
+					   struct rte_eth_stashing_config *config);
+
+/**
+ * @internal
+ * Get cache stashing object types supported in the ethernet device.
+ * The return value indicates availability of stashing hints support
+ * in the hardware and the PMD.
+ *
+ * @param dev
+ *   Port (ethdev) handle.
+ * @param objects
+ *   PMD sets supported bits on return.
+ *
+ * @return
+ *   -ENOTSUP if the device or the platform does not support cache stashing.
+ *   -ENOSYS  if the underlying PMD hasn't implemented cache stashing feature.
+ *   -EINVAL  on a NULL objects parameter.
+ *   On return, the objects parameter will have bits set for supported
+ *   object types.
+ *   0 on success.
+ */
+typedef int (*eth_stashing_capabilities_get_t)(struct rte_eth_dev *dev,
+					     uint16_t *objects);
+
 /**
  * @internal A structure containing the functions exported by an Ethernet driver.
  */
@@ -1402,6 +1464,10 @@ struct eth_dev_ops {
 	eth_mac_addr_remove_t      mac_addr_remove; /**< Remove MAC address */
 	eth_mac_addr_add_t         mac_addr_add;  /**< Add a MAC address */
 	eth_mac_addr_set_t         mac_addr_set;  /**< Set a MAC address */
+	eth_stashing_rx_hints_set_t   stashing_rx_hints_set; /**< Set Rx cache stashing*/
+	eth_stashing_tx_hints_set_t   stashing_tx_hints_set; /**< Set Tx cache stashing*/
+	/** Get supported stashing hints*/
+	eth_stashing_capabilities_get_t stashing_capabilities_get;
 	/** Set list of multicast addresses */
 	eth_set_mc_addr_list_t     set_mc_addr_list;
 	mtu_set_t                  mtu_set;       /**< Set MTU */
diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
index d4197322a0..75001e844d 100644
--- a/lib/ethdev/rte_ethdev.c
+++ b/lib/ethdev/rte_ethdev.c
@@ -158,6 +158,7 @@ static const struct {
 	{RTE_ETH_DEV_CAPA_RXQ_SHARE, "RXQ_SHARE"},
 	{RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP, "FLOW_RULE_KEEP"},
 	{RTE_ETH_DEV_CAPA_FLOW_SHARED_OBJECT_KEEP, "FLOW_SHARED_OBJECT_KEEP"},
+	{RTE_ETH_DEV_CAPA_CACHE_STASHING, "CACHE_STASHING"},
 };
 
 enum {
@@ -7419,5 +7420,153 @@ int rte_eth_dev_map_aggr_tx_affinity(uint16_t port_id, uint16_t tx_queue_id,
 	return ret;
 }
 
+RTE_EXPORT_EXPERIMENTAL_SYMBOL(rte_eth_dev_validate_stashing_config, 25.03)
+int
+rte_eth_dev_validate_stashing_config(uint16_t port_id, uint16_t queue_id,
+				     uint8_t queue_direction,
+				     struct rte_eth_stashing_config *config)
+{
+	struct rte_eth_dev *dev;
+	struct rte_eth_dev_info dev_info;
+	int ret = 0;
+	uint16_t nb_queues;
+
+	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
+
+	if (!config) {
+		RTE_ETHDEV_LOG_LINE(ERR, "Invalid stashing configuration");
+		ret = -EINVAL;
+		goto out;
+	}
+
+	/*
+	 * Check for invalid objects
+	 */
+	if (!RTE_ETH_DEV_STASH_OBJECTS_VALID(config->objects)) {
+		RTE_ETHDEV_LOG_LINE(ERR, "Invalid stashing objects");
+		ret = -EINVAL;
+		goto out;
+	}
+
+	dev = &rte_eth_devices[port_id];
+
+	nb_queues = (queue_direction == RTE_ETH_DEV_RX_QUEUE) ?
+				      dev->data->nb_rx_queues :
+				      dev->data->nb_tx_queues;
+
+	if (queue_id >= nb_queues) {
+		RTE_ETHDEV_LOG_LINE(ERR, "Invalid queue_id=%u", queue_id);
+		ret = -EINVAL;
+		goto out;
+	}
+
+	ret = rte_eth_dev_info_get(port_id, &dev_info);
+	if (ret < 0)
+		goto out;
+
+	if ((dev_info.dev_capa & RTE_ETH_DEV_CAPA_CACHE_STASHING) !=
+	    RTE_ETH_DEV_CAPA_CACHE_STASHING) {
+		ret = -ENOTSUP;
+		goto out;
+	}
+
+	if (*dev->dev_ops->stashing_rx_hints_set == NULL ||
+	    *dev->dev_ops->stashing_tx_hints_set == NULL) {
+		RTE_ETHDEV_LOG_LINE(ERR, "Stashing hints are not implemented "
+				    "in %s for %s", dev_info.driver_name,
+				    dev_info.device->name);
+		ret = -ENOSYS;
+	}
+
+out:
+	return ret;
+}
+
+RTE_EXPORT_EXPERIMENTAL_SYMBOL(rte_eth_dev_stashing_rx_config_set, 25.03)
+int
+rte_eth_dev_stashing_rx_config_set(uint16_t port_id, uint16_t queue_id,
+				   struct rte_eth_stashing_config *config)
+{
+	struct rte_eth_dev *dev;
+	int ret = 0;
+
+	ret = rte_eth_dev_validate_stashing_config(port_id, queue_id,
+						   RTE_ETH_DEV_RX_QUEUE,
+						   config);
+	if (ret < 0)
+		goto out;
+
+	dev = &rte_eth_devices[port_id];
+
+	ret = eth_err(port_id,
+		      (*dev->dev_ops->stashing_rx_hints_set)(dev, queue_id,
+		      config));
+out:
+	return ret;
+}
+
+RTE_EXPORT_EXPERIMENTAL_SYMBOL(rte_eth_dev_stashing_tx_config_set, 25.03)
+int
+rte_eth_dev_stashing_tx_config_set(uint16_t port_id, uint16_t queue_id,
+				   struct rte_eth_stashing_config *config)
+{
+	struct rte_eth_dev *dev;
+	int ret = 0;
+
+	ret = rte_eth_dev_validate_stashing_config(port_id, queue_id,
+						   RTE_ETH_DEV_TX_QUEUE,
+						   config);
+	if (ret < 0)
+		goto out;
+
+	dev = &rte_eth_devices[port_id];
+
+	ret = eth_err(port_id,
+		      (*dev->dev_ops->stashing_tx_hints_set)(dev, queue_id,
+		      config));
+out:
+	return ret;
+}
+
+RTE_EXPORT_EXPERIMENTAL_SYMBOL(rte_eth_dev_stashing_capabilities_get, 25.03)
+int
+rte_eth_dev_stashing_capabilities_get(uint16_t port_id, uint16_t *objects)
+{
+	struct rte_eth_dev *dev;
+	struct rte_eth_dev_info dev_info;
+	int ret = 0;
+
+	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
+
+	if (!objects) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	dev = &rte_eth_devices[port_id];
+
+	ret = rte_eth_dev_info_get(port_id, &dev_info);
+	if (ret < 0)
+		goto out;
+
+	if ((dev_info.dev_capa & RTE_ETH_DEV_CAPA_CACHE_STASHING) !=
+	    RTE_ETH_DEV_CAPA_CACHE_STASHING) {
+		ret = -ENOTSUP;
+		goto out;
+	}
+
+	if (*dev->dev_ops->stashing_capabilities_get == NULL) {
+		RTE_ETHDEV_LOG_LINE(ERR, "Stashing hints are not implemented "
+				    "in %s for %s", dev_info.driver_name,
+				    dev_info.device->name);
+		ret = -ENOSYS;
+		goto out;
+	}
+	ret = eth_err(port_id,
+		      (*dev->dev_ops->stashing_capabilities_get)(dev, objects));
+out:
+	return ret;
+}
+
 RTE_EXPORT_SYMBOL(rte_eth_dev_logtype)
 RTE_LOG_REGISTER_DEFAULT(rte_eth_dev_logtype, INFO);
diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
index ea7f8c4a1a..c1133e8b76 100644
--- a/lib/ethdev/rte_ethdev.h
+++ b/lib/ethdev/rte_ethdev.h
@@ -1667,6 +1667,9 @@ struct rte_eth_conf {
 #define RTE_ETH_DEV_CAPA_FLOW_SHARED_OBJECT_KEEP RTE_BIT64(4)
 /**@}*/
 
+/** Device supports stashing to CPU/system caches. */
+#define RTE_ETH_DEV_CAPA_CACHE_STASHING RTE_BIT64(5)
+
 /*
  * Fallback default preferred Rx/Tx port parameters.
  * These are used if an application requests default parameters
@@ -1838,6 +1841,7 @@ struct rte_eth_dev_info {
 	struct rte_eth_dev_portconf default_txportconf;
 	/** Generic device capabilities (RTE_ETH_DEV_CAPA_). */
 	uint64_t dev_capa;
+	uint16_t stashing_capa;
 	/**
 	 * Switching information for ports on a device with a
 	 * embedded managed interconnect/switch.
@@ -6173,6 +6177,160 @@ int rte_eth_cman_config_set(uint16_t port_id, const struct rte_eth_cman_config *
 __rte_experimental
 int rte_eth_cman_config_get(uint16_t port_id, struct rte_eth_cman_config *config);
 
+
+/** Queue type is RX. */
+#define RTE_ETH_DEV_RX_QUEUE		0
+/** Queue type is TX. */
+#define RTE_ETH_DEV_TX_QUEUE		1
+
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this structure may change, or be removed, without prior notice
+ *
+ * A structure used for configuring the cache stashing hints.
+ */
+struct rte_eth_stashing_config {
+	/**
+	 * lcore_id of the processor the stashing hints are applied to.
+	 */
+	uint32_t	lcore_id;
+	/**
+	 * Zero based cache level relative to the CPU.
+	 * E.g. l1d = 0, l2d = 1,...
+	 */
+	uint32_t	cache_level;
+	/**
+	 * Object types the configuration is applied to
+	 */
+	uint16_t	objects;
+	/**
+	 * The offset if RTE_ETH_DEV_STASH_OBJECT_OFFSET bit is set
+	 *  in objects
+	 */
+	off_t		offset;
+};
+
+/**@{@name Stashable Rx/Tx queue object types supported by the ethernet device
+ *@see rte_eth_dev_stashing_capabilities_get
+ *@see rte_eth_dev_stashing_rx_config_set
+ *@see rte_eth_dev_stashing_tx_config_set
+ */
+
+/**
+ * Apply stashing hint to data at a given offset from the start of a
+ * received packet.
+ */
+#define RTE_ETH_DEV_STASH_OBJECT_OFFSET		0x0001
+
+/** Apply stashing hint to an rx descriptor. */
+#define RTE_ETH_DEV_STASH_OBJECT_DESC		0x0002
+
+/** Apply stashing hint to a header of a received packet. */
+#define RTE_ETH_DEV_STASH_OBJECT_HEADER		0x0004
+
+/** Apply stashing hint to a payload of a received packet. */
+#define RTE_ETH_DEV_STASH_OBJECT_PAYLOAD	0x0008
+
+#define __RTE_ETH_DEV_STASH_OBJECT_MASK		0x000f
+/**@}*/
+
+#define RTE_ETH_DEV_STASH_OBJECTS_VALID(t)				\
+	((!((t) & (~__RTE_ETH_DEV_STASH_OBJECT_MASK))) && (t))
+
+/**
+ *
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * @internal
+ * Helper function to validate stashing hints configuration.
+ */
+__rte_experimental
+int rte_eth_dev_validate_stashing_config(uint16_t port_id, uint16_t queue_id,
+					 uint8_t queue_direction,
+					 struct rte_eth_stashing_config *config);
+
+/**
+ *
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * Provide cache stashing hints for improved memory access latencies for
+ * packets received by the NIC.
+ * This feature is available only in supported NICs and platforms.
+ *
+ * @param port_id
+ *  The port identifier of the Ethernet device.
+ * @param queue_id
+ *  The index of the receive queue to which hints are applied.
+ * @param config
+ *  Stashing configuration.
+ * @return
+ *  - (-ENODEV) on incorrect port_ids.
+ *  - (-EINVAL) if both Rx and Tx object types are used in conjunction in the
+ *  objects parameter.
+ *  - (-EINVAL) on invalid queue_id.
+ *  - (-ENOTSUP) if RTE_ETH_DEV_CAPA_CACHE_STASHING capability is unavailable.
+ *  - (-ENOSYS) if PMD does not implement cache stashing hints.
+ *  - (0) on Success.
+ */
+__rte_experimental
+int rte_eth_dev_stashing_rx_config_set(uint16_t port_id, uint16_t queue_id,
+				   struct rte_eth_stashing_config *config);
+
+/**
+ *
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * Configure cache stashing for improved memory access latencies for Tx
+ * queue completion descriptors being sent to host system by the NIC.
+ * This feature is available only in supported NICs and platforms.
+ *
+ * @param port_id
+ *  The port identifier of the Ethernet device.
+ * @param queue_id
+ *  The index of the transmit queue to which hints are applied.
+ * @param config
+ *  Stashing configuration.
+ * @return
+ *  - (-ENODEV) on incorrect port_ids.
+ *  - (-EINVAL) if both Rx and Tx object types are used in conjunction in the
+ *  objects parameter.
+ *  - (-EINVAL) if hints are incompatible with TX queues.
+ *  - (-EINVAL) on invalid queue_id.
+ *  - (-ENOTSUP) if RTE_ETH_DEV_CAPA_CACHE_STASHING capability is unavailable.
+ *  - (-ENOSYS) if PMD does not implement cache stashing hints.
+ *  - (0) on Success.
+ */
+__rte_experimental
+int rte_eth_dev_stashing_tx_config_set(uint16_t port_id, uint16_t queue_id,
+				   struct rte_eth_stashing_config *config);
+
+/**
+ *
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * Discover cache stashing objects supported in the ethernet device.
+ *
+ * @param port_id
+ *  The port identifier of the Ethernet device.
+ * @param objects
+ *  Supported objects vector set by the ethernet device.
+ * @return
+ *  On return, the objects parameter will have bits set for supported
+ *  object types.
+ *  - (-ENOTSUP) if the device or the platform does not support cache stashing.
+ *  - (-ENOSYS)  if the underlying PMD hasn't implemented cache stashing
+ *  feature.
+ *  - (-EINVAL)  on a NULL objects parameter.
+ *  - (0) on success.
+ */
+__rte_experimental
+int rte_eth_dev_stashing_capabilities_get(uint16_t port_id, uint16_t *objects);
+
 #ifdef __cplusplus
 }
 #endif
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [RFC PATCH v4 1/3] pci: add non-merged Linux uAPI changes
  2025-05-17 15:17   ` [RFC PATCH v4 1/3] pci: add non-merged Linux uAPI changes Wathsala Vithanage
@ 2025-05-19  6:41     ` David Marchand
  2025-05-19 17:55       ` Wathsala Wathawana Vithanage
  0 siblings, 1 reply; 33+ messages in thread
From: David Marchand @ 2025-05-19  6:41 UTC (permalink / raw)
  To: Wathsala Vithanage
  Cc: Chenbo Xia, Nipun Gupta, Maxime Coquelin, dev, Dhruv Tripathi

Hi Wathsala,

On Sat, May 17, 2025 at 5:18 PM Wathsala Vithanage
<wathsala.vithanage@arm.com> wrote:
>
> This commit is a hack to prevent build failures the next commit in this
> patch series causes due to missing vfio uapi definitions.
> This commit should NEVER BE MERGED.
> Next commit in this patch series depends on additions to vfio uapi that
> enable TPH icotl in the vfio-pci driver in the Linux kernel.
> These additions have not yet been merged into the upstream kernel.

Could you provide a link to the ongoing discussions (ml archive maybe)?
Thanks.

-- 
David Marchand


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC PATCH v4 2/3] bus/pci: introduce the PCIe TLP Processing Hints API
  2025-05-17 15:17   ` [RFC PATCH v4 2/3] bus/pci: introduce the PCIe TLP Processing Hints API Wathsala Vithanage
@ 2025-05-19  6:44     ` David Marchand
  2025-05-19 17:57       ` Wathsala Wathawana Vithanage
  0 siblings, 1 reply; 33+ messages in thread
From: David Marchand @ 2025-05-19  6:44 UTC (permalink / raw)
  To: Wathsala Vithanage
  Cc: Chenbo Xia, Nipun Gupta, Anatoly Burakov, dev,
	Honnappa Nagarahalli, Dhruv Tripathi

On Sat, May 17, 2025 at 5:18 PM Wathsala Vithanage
<wathsala.vithanage@arm.com> wrote:
> diff --git a/drivers/bus/pci/linux/pci.c b/drivers/bus/pci/linux/pci.c
> index c20d159218..463c06ad64 100644
> --- a/drivers/bus/pci/linux/pci.c
> +++ b/drivers/bus/pci/linux/pci.c
> @@ -814,3 +814,97 @@ rte_pci_ioport_unmap(struct rte_pci_ioport *p)
>
>         return ret;
>  }
> +
> +RTE_EXPORT_EXPERIMENTAL_SYMBOL(rte_pci_tph_enable, 25.03)

The release currently prepared is 25.07.
(no need for a respin for just this comment, this series is not ready
for merge in any case)


-- 
David Marchand


^ permalink raw reply	[flat|nested] 33+ messages in thread

* RE: [RFC PATCH v4 1/3] pci: add non-merged Linux uAPI changes
  2025-05-19  6:41     ` David Marchand
@ 2025-05-19 17:55       ` Wathsala Wathawana Vithanage
  0 siblings, 0 replies; 33+ messages in thread
From: Wathsala Wathawana Vithanage @ 2025-05-19 17:55 UTC (permalink / raw)
  To: David Marchand
  Cc: Chenbo Xia, Nipun Gupta, Maxime Coquelin, dev@dpdk.org,
	Dhruv Tripathi, nd

> 
> Hi Wathsala,
> 
> On Sat, May 17, 2025 at 5:18 PM Wathsala Vithanage
> <wathsala.vithanage@arm.com> wrote:
> >
> > This commit is a hack to prevent build failures the next commit in
> > this patch series causes due to missing vfio uapi definitions.
> > This commit should NEVER BE MERGED.
> > Next commit in this patch series depends on additions to vfio uapi
> > that enable TPH icotl in the vfio-pci driver in the Linux kernel.
> > These additions have not yet been merged into the upstream kernel.
> 
> Could you provide a link to the ongoing discussions (ml archive maybe)?
> Thanks.

Hi David,

I initially introduced this feature as a VFIO feature IOCTL. Maintainers didn't
like that idea and favored a new IOCTL for TPH.
I will be sending out the V2 with that change sometime this week (pending
review internally). I sent out the DPDK patch before so that I can link this
patch to the cover letter of the kernel patch to show them that there is a real
user-space use case for this feature.

V1 is now largely outdated; here is the link:
 https://lore.kernel.org/kvm/20250401235225.GA327284@ziepe.ca/T/#me73cf9b9c87da97d7d9461dfb97863b78ca1755b

Thanks.

--wathsala


^ permalink raw reply	[flat|nested] 33+ messages in thread

* RE: [RFC PATCH v4 2/3] bus/pci: introduce the PCIe TLP Processing Hints API
  2025-05-19  6:44     ` David Marchand
@ 2025-05-19 17:57       ` Wathsala Wathawana Vithanage
  0 siblings, 0 replies; 33+ messages in thread
From: Wathsala Wathawana Vithanage @ 2025-05-19 17:57 UTC (permalink / raw)
  To: David Marchand
  Cc: Chenbo Xia, Nipun Gupta, Anatoly Burakov, dev@dpdk.org,
	Honnappa Nagarahalli, Dhruv Tripathi, nd

> 
> On Sat, May 17, 2025 at 5:18 PM Wathsala Vithanage
> <wathsala.vithanage@arm.com> wrote:
> > diff --git a/drivers/bus/pci/linux/pci.c b/drivers/bus/pci/linux/pci.c
> > index c20d159218..463c06ad64 100644
> > --- a/drivers/bus/pci/linux/pci.c
> > +++ b/drivers/bus/pci/linux/pci.c
> > @@ -814,3 +814,97 @@ rte_pci_ioport_unmap(struct rte_pci_ioport *p)
> >
> >         return ret;
> >  }
> > +
> > +RTE_EXPORT_EXPERIMENTAL_SYMBOL(rte_pci_tph_enable, 25.03)
> 
> The release currently prepared is 25.07.
> (no need for a respin for just this comment, this series is not ready for merge in
> any case)
> 
> 
Yes, thanks for pointing this out. I only saw that after sending the patch out.
I will be updating this with fixes to Windows and BSD build failures.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC PATCH v4 3/3] ethdev: introduce the cache stashing hints API
  2025-05-17 15:17   ` [RFC PATCH v4 3/3] ethdev: introduce the cache stashing hints API Wathsala Vithanage
@ 2025-05-20 13:53     ` Stephen Hemminger
  0 siblings, 0 replies; 33+ messages in thread
From: Stephen Hemminger @ 2025-05-20 13:53 UTC (permalink / raw)
  To: Wathsala Vithanage
  Cc: Thomas Monjalon, Ferruh Yigit, Andrew Rybchenko, dev,
	Honnappa Nagarahalli, Dhruv Tripathi

On Sat, 17 May 2025 15:17:35 +0000
Wathsala Vithanage <wathsala.vithanage@arm.com> wrote:

> Extend the ethdev library to enable the stashing of different data
> objects, such as the ones listed below, into CPU caches directly
> from the NIC.
> 
> - Rx/Tx queue descriptors
> - Rx packets
> - Packet headers
> - packet payloads
> - Data of a packet at an offset from the start of the packet
> 
> The APIs are designed in a hardware/vendor agnostic manner such that
> supporting PMDs could use any capabilities available in the underlying
> hardware for fine-grained stashing of data objects into a CPU cache
> 
> The API provides an interface to query the availability of stashing
> capabilities, i.e., platform/NIC support, stashable object types, etc,
> via the rte_eth_dev_stashing_capabilities_get interface.
> 
> The function pair rte_eth_dev_stashing_rx_config_set and
> rte_eth_dev_stashing_tx_config_set sets the stashing hint (the CPU, 
> cache level, and data object types) on the Rx and Tx queues.
> 
> PMDs that support stashing must register their implementations with the
> following eth_dev_ops callbacks, which are invoked by the ethdev
> functions listed above.
> 
> - stashing_capabilities_get
> - stashing_rx_hints_set
> - stashing_tx_hints_set
> 
> Signed-off-by: Wathsala Vithanage <wathsala.vithanage@arm.com>
> Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> Reviewed-by: Dhruv Tripathi <dhruv.tripathi@arm.com>

Adding more control in DPDK is good but I have concerns.

The default must be that the application has all the caching enabled
without calling this API. It would be bad if existing DPDK applications
from network vendors had to be modified.

DPDK should follow the "it just works" mantra and additional
APIs should be for special cases.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH v5 0/4] An API for Cache Stashing with TPH
       [not found] <20241021015246.304431-1-wathsala.vithanage@arm.com>
  2025-05-17 15:17 ` [RFC PATCH v4 0/3] An API for Stashing Packets into CPU caches Wathsala Vithanage
@ 2025-06-02 22:38 ` Wathsala Vithanage
  2025-06-02 22:38   ` [PATCH v5 1/4] pci: add non-merged Linux uAPI changes Wathsala Vithanage
                     ` (5 more replies)
  1 sibling, 6 replies; 33+ messages in thread
From: Wathsala Vithanage @ 2025-06-02 22:38 UTC (permalink / raw)
  Cc: dev, nd, Wathsala Vithanage

Today, DPDK applications benefit from Direct Cache Access (DCA) features
like Intel DDIO and Arm's write-allocate-to-SLC. However, those features
do not allow fine-grained control of direct cache access, such as
stashing packets into upper-level caches (L2 caches) of a processor or
the shared cache of a chiplet. PCIe TLP Processing Hints (TPH) addresses
this need in a vendor-agnostic manner. TPH capability has existed since
PCI Express Base Specification revision 3.0; today, numerous Network
Interface Cards and interconnects from different vendors support TPH
capability. TPH comprises a steering tag (ST) and a processing hint
(PH). ST specifies the cache level of a CPU at which the data should be
written to (or DCAed into), while PH is a hint provided by the PCIe
requester to the completer on an upcoming traffic pattern. Some NIC
vendors bundle TPH capability with fine-grained control over the type of
objects that can be stashed into CPU caches, such as

- Rx/Tx queue descriptors
- Packet-headers
- Packet-payloads
- Data from a given offset from the start of a packet

Note that stashable object types are outside the scope of the PCIe
standard; therefore, vendors could support any combination of the above
items as they see fit.

To enable TPH and fine-grained packet stashing, this API extends the
ethdev library and the PCI bus driver. In this design, the application
provides hints to the PMD via the ethdev stashing API to indicate to the
underlying hardware at which CPU and cache level it prefers a packet to
end up. Once the PMD receives a CPU and a cache-level combination (or a
list of such combinations), it must extract the matching ST from the PCI
bus driver for such combinations. The PCI bus driver implements the TPH
functions in an OS specific way; for Linux, it depends on the TPH
capabilities of the VFIO kernel driver.

An application uses the cache stashing ethdev API by first calling the
rte_eth_dev_stashing_capabilities_get() function to find out what object
types can be stashed into a CPU cache by the NIC out of the object types
in the bulleted list above. This function takes a port_id and a pointer
to a uint16_t to report back the object type flags. The PMD implements
the stashing_capabilities_get callback in eth_dev_ops. If the
underlying platform or the NIC does not support TPH, this function
returns -ENOTSUP, and the application should consider any value stored
through the pointer invalid.
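
As a rough illustration of the capability query described above, the
following standalone sketch mirrors the object flags from patch 3 of this
series and intersects what an application wants with what a port reports.
mock_stashing_capabilities_get() is a hypothetical stand-in for
rte_eth_dev_stashing_capabilities_get() so that the snippet builds without
DPDK; the port numbers and the capabilities it reports are made up, and
select_stash_objects() is an illustrative helper, not part of the series.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Object flags copied from patch 3 of this series (prefix shortened). */
#define STASH_OBJECT_OFFSET  0x0001
#define STASH_OBJECT_DESC    0x0002
#define STASH_OBJECT_HEADER  0x0004
#define STASH_OBJECT_PAYLOAD 0x0008

/*
 * Hypothetical stand-in for rte_eth_dev_stashing_capabilities_get().
 * Port 0 pretends to support descriptors and headers; every other port
 * pretends that the NIC or the platform lacks TPH.
 */
static int
mock_stashing_capabilities_get(uint16_t port_id, uint16_t *objects)
{
	if (objects == NULL)
		return -EINVAL;
	if (port_id != 0)
		return -ENOTSUP;
	*objects = STASH_OBJECT_DESC | STASH_OBJECT_HEADER;
	return 0;
}

/*
 * Return the subset of 'wanted' that the port can stash.
 * On any error the reported value is treated as invalid and 0 is returned.
 */
uint16_t
select_stash_objects(uint16_t port_id, uint16_t wanted)
{
	uint16_t supported = 0;

	if (mock_stashing_capabilities_get(port_id, &supported) != 0)
		return 0;
	return (uint16_t)(wanted & supported);
}
```

With the mock above, asking port 0 for headers and payloads leaves only
the header flag set, while any other port yields 0 because the query
fails with -ENOTSUP.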

Once the application knows the supported object types that can be
stashed, the next step is to set the steering tags for the packets
associated with Rx and Tx queues via
rte_eth_dev_stashing_{rx,tx}_config_set() ethdev library functions. Both
functions have an identical signature, a port_id, a queue_id, and a
config object. The port_id and the queue_id are used to locate the
device and the queue. The config object is of type struct
rte_eth_stashing_config, which specifies the lcore_id and the
cache_level, indicating where objects from this queue should be stashed.
The 'objects' field in the config sets the types of objects the
application wishes to stash based on the capabilities found earlier.
Note that if the 'objects' field includes the flag
RTE_ETH_DEV_STASH_OBJECT_OFFSET, the 'offset' field must be used to set
the desired offset. These functions invoke PMD implementations of the
stashing functionality via the stashing_{rx,tx}_hints_set function
callbacks in the eth_dev_ops, respectively.
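
The configuration step could be sketched along the following lines. The
struct below is a local mirror of struct rte_eth_stashing_config from
patch 3, and stash_objects_valid() repeats the test performed by
RTE_ETH_DEV_STASH_OBJECTS_VALID; stash_config_init() is a hypothetical
helper introduced only for this example. Clearing the offset when the
OFFSET flag is absent is a choice made here, not something the series
mandates.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>

/* Object flags and mask copied from patch 3 (prefix shortened). */
#define STASH_OBJECT_OFFSET  0x0001
#define STASH_OBJECT_DESC    0x0002
#define STASH_OBJECT_HEADER  0x0004
#define STASH_OBJECT_PAYLOAD 0x0008
#define STASH_OBJECT_MASK    0x000f

/* Local mirror of struct rte_eth_stashing_config. */
struct stashing_config {
	uint32_t lcore_id;    /* lcore the hint is applied to */
	uint32_t cache_level; /* zero based: L1D = 0, L2D = 1, ... */
	uint16_t objects;     /* STASH_OBJECT_* flags */
	off_t offset;         /* used only with STASH_OBJECT_OFFSET */
};

/* Valid means: at least one flag set and no bits outside the mask. */
int
stash_objects_valid(uint16_t objects)
{
	return objects != 0 && (objects & ~STASH_OBJECT_MASK) == 0;
}

/*
 * Fill 'cfg' with the requested objects that the port also supports.
 * Returns 0 on success, -EINVAL if cfg is NULL or nothing valid remains.
 */
int
stash_config_init(struct stashing_config *cfg, uint32_t lcore_id,
		  uint32_t cache_level, uint16_t wanted, uint16_t supported,
		  off_t offset)
{
	uint16_t objects = (uint16_t)(wanted & supported);

	if (cfg == NULL || !stash_objects_valid(objects))
		return -EINVAL;

	cfg->lcore_id = lcore_id;
	cfg->cache_level = cache_level;
	cfg->objects = objects;
	/* The offset is meaningful only when the OFFSET object is selected. */
	cfg->offset = (objects & STASH_OBJECT_OFFSET) ? offset : 0;
	return 0;
}
```

An application would then pass the filled structure to
rte_eth_dev_stashing_rx_config_set() or
rte_eth_dev_stashing_tx_config_set() for the queue in question.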

The PMD's implementation of the stashing_rx_hints_set() and
stashing_tx_hints_set() functions is ultimately responsible for
extracting the ST via the API provided by the PCI bus driver. Before
extracting STs, the PMD should enable the TPH capability in the endpoint
device by calling the rte_pci_tph_enable() function. The PMD begins the
ST extraction process by calling the rte_pci_tph_st_get() function in
drivers/bus/pci/rte_bus_pci.h, which returns STs via the same array of
rte_tph_info objects passed into it as an argument. Once the PMD acquires
the STs, the stashing_{rx,tx}_hints_set callbacks implemented in the PMD
are ready to set them as per the rte_eth_stashing_config object passed to
them by the higher-level ethdev functions
rte_eth_dev_stashing_{rx,tx}_config_set(). As per the PCIe specification, STs
can be placed on the MSI-X tables or in a device-specific location. For
PMDs, setting the STs on queue contexts is the only viable way of using
TPH. Therefore, the PMDs should only enable TPH in device-specific mode.
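
The PMD-side sequence (enable TPH, obtain an ST, program it into the
queue context) might look roughly like the standalone sketch below.
Everything prefixed with mock_ is hypothetical: mock_tph_enable() and
mock_tph_st_get() merely stand in for the PCI bus driver calls, whose
real signatures are defined in patch 2 and are not reproduced here, and
the ST encoding is invented for the example.

```c
#include <errno.h>
#include <stdint.h>

#define MOCK_NB_QUEUES 4

/* Hypothetical device state; a real PMD keeps this in its private data. */
struct mock_dev {
	int tph_enabled;                   /* TPH on, device-specific mode */
	uint16_t queue_st[MOCK_NB_QUEUES]; /* ST held in each queue context */
};

/* Hypothetical stand-in for enabling TPH through the PCI bus driver. */
static int
mock_tph_enable(struct mock_dev *dev)
{
	dev->tph_enabled = 1;
	return 0;
}

/* Hypothetical stand-in for the ST lookup; the encoding is made up. */
static int
mock_tph_st_get(uint32_t lcore_id, uint32_t cache_level, uint16_t *st)
{
	if (cache_level > 2)
		return -EINVAL;
	*st = (uint16_t)((lcore_id << 2) | cache_level);
	return 0;
}

/*
 * Shape of a stashing_rx_hints_set callback: enable TPH once, fetch the
 * ST for the (lcore, cache level) pair, then store it in the queue context.
 */
int
mock_stashing_rx_hints_set(struct mock_dev *dev, uint16_t queue_id,
			   uint32_t lcore_id, uint32_t cache_level)
{
	uint16_t st;
	int ret;

	if (queue_id >= MOCK_NB_QUEUES)
		return -EINVAL;

	if (!dev->tph_enabled) {
		ret = mock_tph_enable(dev);
		if (ret != 0)
			return ret;
	}

	ret = mock_tph_st_get(lcore_id, cache_level, &st);
	if (ret != 0)
		return ret;

	dev->queue_st[queue_id] = st;
	return 0;
}
```

Keeping the ST in the queue context rather than in the MSI-X table
matches the device-specific mode recommended above.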

V4->V5:
 * Enable stashing-hints (TPH) in Intel i40e driver.
 * Update exported symbol version from 25.03 to 25.07.
 * Add TPH mode macros.

V3->V4:
 * Add VFIO IOCTL based ST extraction mechanism to Linux PCI bus driver
 * Remove ST extraction via direct access to ACPI _DSM
 * Replace rte_pci_extract_tph_st() with rte_pci_tph_st_get() in PCI
   bus driver.

Wathsala Vithanage (4):
  pci: add non-merged Linux uAPI changes
  bus/pci: introduce the PCIe TLP Processing Hints API
  ethdev: introduce the cache stashing hints API
  net/i40e: enable TPH in i40e

 drivers/bus/pci/bsd/pci.c            |  43 +++++++
 drivers/bus/pci/bus_pci_driver.h     |  52 ++++++++
 drivers/bus/pci/linux/pci.c          | 100 ++++++++++++++++
 drivers/bus/pci/linux/pci_init.h     |  14 +++
 drivers/bus/pci/linux/pci_vfio.c     | 170 +++++++++++++++++++++++++++
 drivers/bus/pci/private.h            |   8 ++
 drivers/bus/pci/rte_bus_pci.h        |  67 +++++++++++
 drivers/bus/pci/windows/pci.c        |  43 +++++++
 drivers/net/intel/i40e/i40e_ethdev.c | 127 ++++++++++++++++++++
 kernel/linux/uapi/linux/vfio_tph.h   | 102 ++++++++++++++++
 lib/ethdev/ethdev_driver.h           |  66 +++++++++++
 lib/ethdev/rte_ethdev.c              | 149 +++++++++++++++++++++++
 lib/ethdev/rte_ethdev.h              | 158 +++++++++++++++++++++++++
 lib/pci/rte_pci.h                    |  15 +++
 14 files changed, 1114 insertions(+)
 create mode 100644 kernel/linux/uapi/linux/vfio_tph.h

-- 
2.43.0


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH v5 1/4] pci: add non-merged Linux uAPI changes
  2025-06-02 22:38 ` [PATCH v5 0/4] An API for Cache Stashing with TPH Wathsala Vithanage
@ 2025-06-02 22:38   ` Wathsala Vithanage
  2025-06-02 23:11     ` Wathsala Wathawana Vithanage
  2025-06-04 20:43     ` Stephen Hemminger
  2025-06-02 22:38   ` [PATCH v5 2/4] bus/pci: introduce the PCIe TLP Processing Hints API Wathsala Vithanage
                     ` (4 subsequent siblings)
  5 siblings, 2 replies; 33+ messages in thread
From: Wathsala Vithanage @ 2025-06-02 22:38 UTC (permalink / raw)
  To: Chenbo Xia, Nipun Gupta, Maxime Coquelin
  Cc: dev, nd, Wathsala Vithanage, Dhruv Tripathi

This commit is a hack to prevent build failures the next commit in this
patch series causes due to missing vfio uapi definitions.
This commit should NEVER BE MERGED.
Next commit in this patch series depends on additions to vfio uapi that
enable TPH ioctl in the vfio-pci driver in the Linux kernel.
These additions have not yet been merged into the upstream kernel.

Signed-off-by: Wathsala Vithanage <wathsala.vithanage@arm.com>
Reviewed-by: Dhruv Tripathi <dhruv.tripathi@arm.com>
---
 drivers/bus/pci/linux/pci_init.h   |   1 +
 kernel/linux/uapi/linux/vfio_tph.h | 102 +++++++++++++++++++++++++++++
 2 files changed, 103 insertions(+)
 create mode 100644 kernel/linux/uapi/linux/vfio_tph.h

diff --git a/drivers/bus/pci/linux/pci_init.h b/drivers/bus/pci/linux/pci_init.h
index a4d37c0d0a..25b901f460 100644
--- a/drivers/bus/pci/linux/pci_init.h
+++ b/drivers/bus/pci/linux/pci_init.h
@@ -6,6 +6,7 @@
 #define EAL_PCI_INIT_H_
 
 #include <rte_vfio.h>
+#include <uapi/linux/vfio_tph.h>
 
 #include "private.h"
 
diff --git a/kernel/linux/uapi/linux/vfio_tph.h b/kernel/linux/uapi/linux/vfio_tph.h
new file mode 100644
index 0000000000..9336c2e5af
--- /dev/null
+++ b/kernel/linux/uapi/linux/vfio_tph.h
@@ -0,0 +1,102 @@
+
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+/*
+ * VFIO API definition
+ *
+ * WARNING: CONTENTS OF THIS HEADER NEED TO BE MERGED INTO KERNEL'S
+ * uapi/linux/vfio.h IN A FUTURE KERNEL RELEASE. UNTIL THEN IT'S TACKED
+ * ON TO DPDK'S kernel/linux/uapi DIRECTORY TO PREVENT BUILD FAILURES.
+ *
+ * Copyright (C) 2012 Red Hat, Inc.  All rights reserved.
+ *     Author: Alex Williamson <alex.williamson@redhat.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#ifndef _UAPIVFIO_TPH_H
+#define _UAPIVFIO_TPH_H
+
+/**
+ * VFIO_DEVICE_PCI_TPH	- _IO(VFIO_TYPE, VFIO_BASE + 22)
+ *
+ * This command is used to control PCIe TLP Processing Hints (TPH)
+ * capability in a PCIe device.
+ * It supports the following operations on a PCIe device with respect to TPH
+ * capability.
+ *
+ * - Enabling/disabling TPH capability in a PCIe device.
+ *
+ *   Setting VFIO_DEVICE_TPH_ENABLE flag enables TPH in no-steering-tag,
+ *   interrupt-vector, or device-specific mode defined in the PCIe specification
+ *   when feature flags TPH_ST_NS_MODE, TPH_ST_IV_MODE, and TPH_ST_DS_MODE are
+ *   set respectively. TPH_ST_xx_MODE macros are defined in
+ *   uapi/linux/pci_regs.h.
+ *
+ *   VFIO_DEVICE_TPH_DISABLE disables PCIe TPH on the device.
+ *
+ * - Writing STs to MSI-X or ST table in a PCIe device.
+ *
+ *   VFIO_DEVICE_TPH_SET_ST flag sets steering tags on a device at an index in
+ *   MSI-X or ST-table depending on the VFIO_TPH_ST_x_MODE flag used and device
+ *   capabilities. The caller can set one or more steering tags by passing an
+ *   array of vfio_pci_tph_entry objects containing cpu_id, cache_level, and
+ *   MSI-X/ST-table index. The caller can also set the intended memory type and
+ *   the processing hint by setting VFIO_TPH_MEM_TYPE_x and VFIO_TPH_HINT_x
+ *   flags, respectively.
+ *
+ * - Reading Steering Tags (ST) from the host platform.
+ *
+ *   VFIO_DEVICE_TPH_GET_ST flag returns steering tags to the caller. Caller
+ *   can request one or more steering tags by passing an array of
+ *   vfio_pci_tph_entry objects. Steering Tag for each request is returned via
+ *   the st field in vfio_pci_tph_entry.
+ */
+struct vfio_pci_tph_entry {
+	/* in */
+	__u32 cpu_id;			/* CPU logical ID */
+	__u32 cache_level;		/* Cache level. L1 D= 0, L2D = 2, ...*/
+	__u8  flags;
+#define VFIO_TPH_MEM_TYPE_MASK		0x1
+#define VFIO_TPH_MEM_TYPE_SHIFT		0
+#define VFIO_TPH_MEM_TYPE_VMEM		0   /* Request volatile memory ST */
+#define VFIO_TPH_MEM_TYPE_PMEM		1   /* Request persistent memory ST */
+
+#define VFIO_TPH_HINT_MASK		0x3
+#define VFIO_TPH_HINT_SHIFT		1
+#define VFIO_TPH_HINT_BIDIR		0
+#define VFIO_TPH_HINT_REQSTR		(1 << VFIO_TPH_HINT_SHIFT)
+#define VFIO_TPH_HINT_TARGET		(2 << VFIO_TPH_HINT_SHIFT)
+#define VFIO_TPH_HINT_TARGET_PRIO	(3 << VFIO_TPH_HINT_SHIFT)
+	__u8  pad0;
+	__u16 index;			/* MSI-X/ST-table index to set ST */
+	/* out */
+	__u16 st;			/* Steering-Tag */
+	__u8  ph_ignore;		/* Platform ignored the Processing Hint */
+	__u8  pad1;
+};
+
+struct vfio_pci_tph {
+	__u32 argsz;			/* Size of vfio_pci_tph and ents[] */
+	__u32 flags;
+#define VFIO_TPH_ST_MODE_MASK		0x7
+
+#define VFIO_DEVICE_TPH_OP_SHIFT	3
+#define VFIO_DEVICE_TPH_OP_MASK		(0x7 << VFIO_DEVICE_TPH_OP_SHIFT)
+/* Enable TPH on device */
+#define VFIO_DEVICE_TPH_ENABLE		0
+/* Disable TPH on device */
+#define VFIO_DEVICE_TPH_DISABLE		(1 << VFIO_DEVICE_TPH_OP_SHIFT)
+/* Get steering-tags */
+#define VFIO_DEVICE_TPH_GET_ST		(2 << VFIO_DEVICE_TPH_OP_SHIFT)
+/* Set steering-tags */
+#define VFIO_DEVICE_TPH_SET_ST		(4 << VFIO_DEVICE_TPH_OP_SHIFT)
+	__u32 count;			/* Number of entries in ents[] */
+	struct vfio_pci_tph_entry ents[];
+#define VFIO_TPH_INFO_MAX	2048	/* Max entries in ents[] */
+};
+
+#define VFIO_DEVICE_PCI_TPH	_IO(VFIO_TYPE, VFIO_BASE + 22)
+
+#endif /* _UAPIVFIO_TPH_H */
-- 
2.43.0



* [PATCH v5 2/4] bus/pci: introduce the PCIe TLP Processing Hints API
  2025-06-02 22:38 ` [PATCH v5 0/4] An API for Cache Stashing with TPH Wathsala Vithanage
  2025-06-02 22:38   ` [PATCH v5 1/4] pci: add non-merged Linux uAPI changes Wathsala Vithanage
@ 2025-06-02 22:38   ` Wathsala Vithanage
  2025-06-03  8:11     ` Morten Brørup
                       ` (2 more replies)
  2025-06-02 22:38   ` [PATCH v5 3/4] ethdev: introduce the cache stashing hints API Wathsala Vithanage
                     ` (3 subsequent siblings)
  5 siblings, 3 replies; 33+ messages in thread
From: Wathsala Vithanage @ 2025-06-02 22:38 UTC (permalink / raw)
  To: Chenbo Xia, Nipun Gupta, Anatoly Burakov, Gaetan Rivet
  Cc: dev, nd, Wathsala Vithanage, Honnappa Nagarahalli, Dhruv Tripathi

Extend the PCI bus driver to enable or disable TPH capability and set or
get PCI Steering-Tags (STs) on an endpoint device. The functions
rte_pci_tph_{enable, disable, st_set, st_get} provide the primary
interface for DPDK device drivers. Implementation of the interface is OS
dependent. For Linux, the kernel VFIO driver provides the
implementation. rte_pci_tph_{enable, disable} functions enable and
disable TPH capability, respectively. rte_pci_tph_enable enables TPH on
the device in one of the device-specific, interrupt-vector, or
no-steering-tag modes.
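
As a rough illustration of how a mode value might be checked and folded
into the VFIO flags word before the enable call, consider the sketch
below. It is not the code from this patch: ex_tph_enable_flags and all
EX_* names are hypothetical, and the constants are local copies of the
values defined in this series so that the sketch builds without DPDK or
kernel headers.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Local mirrors of values from this series (assumption: duplicated here
 * so the sketch builds without the DPDK or kernel headers). */
#define EX_TPH_ST_NS_MODE		0	/* no-steering-tag mode */
#define EX_TPH_ST_IV_MODE		1	/* interrupt-vector mode */
#define EX_TPH_ST_DS_MODE		2	/* device-specific mode */
#define EX_VFIO_TPH_ST_MODE_MASK	0x7
#define EX_VFIO_TPH_OP_SHIFT		3
#define EX_VFIO_TPH_ENABLE		(0u << EX_VFIO_TPH_OP_SHIFT)

/* Hypothetical helper: validate the mode and build the ioctl flags word. */
static int
ex_tph_enable_flags(int mode, uint32_t *flags)
{
	if (flags == NULL)
		return -EINVAL;

	/* Reject negative values and any bit outside the mode field. */
	if (mode < 0 || (mode & ~EX_VFIO_TPH_ST_MODE_MASK) != 0)
		return -EINVAL;

	/* The operation lives above bit 2, the mode in bits 0-2. */
	*flags = EX_VFIO_TPH_ENABLE | (uint32_t)mode;
	return 0;
}
```

The helper only checks that the mode fits in the 3-bit mode field; a
stricter variant could accept just the three modes named above.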

rte_pci_tph_st_{get, set} functions take an array of rte_tph_info
objects with cpu-id, cache-level, flags (processing-hint, memory-type).
The index in rte_tph_info is the MSI-X/MSI vector/ST-table index if TPH
was enabled in the interrupt-vector mode; the rte_pci_tph_st_get
function ignores it. Both rte_pci_tph_st_{set, get} functions return the
steering-tag (st) and processing-hint-ignored (ph_ignore) fields via the
same rte_tph_info object passed into them.
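
The flags byte carries both the memory type and the processing hint. A
minimal sketch of packing and unpacking it follows, assuming the layout
implied by the shift and value definitions in this patch (memory type in
bit 0, processing hint in bits 1-2). The ex_* helpers and EX_* constants
are hypothetical local mirrors, not part of the proposed API.

```c
#include <stdint.h>

/* Local mirrors of the RTE_PCI_TPH_* flag values (assumption: copied
 * here so the sketch is self-contained). */
#define EX_TPH_MEM_TYPE_MASK	0x1
#define EX_TPH_MEM_TYPE_VMEM	0
#define EX_TPH_MEM_TYPE_PMEM	1
#define EX_TPH_HINT_MASK	0x3
#define EX_TPH_HINT_SHIFT	1
#define EX_TPH_HINT_BIDIR	0
#define EX_TPH_HINT_REQSTR	(1 << EX_TPH_HINT_SHIFT)
#define EX_TPH_HINT_TARGET	(2 << EX_TPH_HINT_SHIFT)
#define EX_TPH_HINT_TARGET_PRIO	(3 << EX_TPH_HINT_SHIFT)

/* Mask that selects the hint field in its in-place (shifted) position. */
#define EX_TPH_HINT_FIELD	(EX_TPH_HINT_MASK << EX_TPH_HINT_SHIFT)

/* Combine a memory type and an already-shifted hint into one flags byte. */
static inline uint8_t
ex_tph_flags(uint8_t mem_type, uint8_t hint)
{
	return (uint8_t)((mem_type & EX_TPH_MEM_TYPE_MASK) |
			 (hint & EX_TPH_HINT_FIELD));
}

/* Extract the memory type (bit 0). */
static inline uint8_t
ex_tph_flags_mem_type(uint8_t flags)
{
	return (uint8_t)(flags & EX_TPH_MEM_TYPE_MASK);
}

/* Extract the hint, still shifted, so it compares against EX_TPH_HINT_*. */
static inline uint8_t
ex_tph_flags_hint(uint8_t flags)
{
	return (uint8_t)(flags & EX_TPH_HINT_FIELD);
}
```

Because the hint constants are defined pre-shifted, the extraction mask
must be shifted by the same amount before it is applied.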

rte_pci_tph_st_{get, set} functions will return an error if processing
any of the rte_tph_info objects fails. The API does not indicate which
entry in the rte_tph_info array was executed successfully and which
caused an error. Therefore, in case of an error, the caller should
discard the output. If rte_pci_tph_st_set returns an error, it should be
treated as a partial error. Hence, the steering-tag update on the device
should be considered partial and inconsistent with the expected outcome.
This should be resolved by resetting the endpoint device before further
attempts to set steering tags.
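
The error-handling rule above could be followed by a driver roughly as
sketched below. This is only an illustration: ex_tph_info is a local
mirror of rte_tph_info, the two callbacks stand in for
rte_pci_tph_st_set and a device-reset routine, and every ex_* name is
hypothetical.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Local mirror of struct rte_tph_info (assumption: same fields). */
struct ex_tph_info {
	uint32_t cpu_id;
	uint32_t cache_level;
	uint8_t flags;
	uint16_t index;
	uint16_t st;		/* output */
	uint8_t ph_ignore;	/* output */
};

/* Hypothetical stand-ins for the set call and a device reset. */
typedef int (*ex_st_set_fn)(struct ex_tph_info *info, size_t count);
typedef int (*ex_reset_fn)(void);

/* Set steering tags; on failure discard all output and reset the device. */
static int
ex_st_set_or_reset(ex_st_set_fn st_set, ex_reset_fn reset,
		   struct ex_tph_info *info, size_t count)
{
	size_t i;
	int ret = st_set(info, count);

	if (ret == 0)
		return 0;

	/* The API does not say which entry failed, so trust none of them. */
	for (i = 0; i < count; i++) {
		info[i].st = 0;
		info[i].ph_ignore = 0;
	}

	/* The device may hold a partial update; reset before retrying. */
	(void)reset();
	return ret;
}

/* Fakes used only to exercise the sketch. */
static int ex_reset_calls;

static int
ex_fake_st_set_fail(struct ex_tph_info *info, size_t count)
{
	if (count > 0)
		info[0].st = 0x42;	/* partial output before the failure */
	return -EIO;
}

static int
ex_fake_reset(void)
{
	ex_reset_calls++;
	return 0;
}
```

The original error code is returned to the caller, so the caller can
still decide whether to retry after the reset.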

Signed-off-by: Wathsala Vithanage <wathsala.vithanage@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Dhruv Tripathi <dhruv.tripathi@arm.com>
---
 drivers/bus/pci/bsd/pci.c        |  43 ++++++++
 drivers/bus/pci/bus_pci_driver.h |  52 ++++++++++
 drivers/bus/pci/linux/pci.c      | 100 ++++++++++++++++++
 drivers/bus/pci/linux/pci_init.h |  13 +++
 drivers/bus/pci/linux/pci_vfio.c | 170 +++++++++++++++++++++++++++++++
 drivers/bus/pci/private.h        |   8 ++
 drivers/bus/pci/rte_bus_pci.h    |  67 ++++++++++++
 drivers/bus/pci/windows/pci.c    |  43 ++++++++
 lib/pci/rte_pci.h                |  15 +++
 9 files changed, 511 insertions(+)

diff --git a/drivers/bus/pci/bsd/pci.c b/drivers/bus/pci/bsd/pci.c
index 5e2e09d5a4..dff750c4d6 100644
--- a/drivers/bus/pci/bsd/pci.c
+++ b/drivers/bus/pci/bsd/pci.c
@@ -650,3 +650,46 @@ rte_pci_ioport_unmap(struct rte_pci_ioport *p)
 
 	return ret;
 }
+
+RTE_EXPORT_EXPERIMENTAL_SYMBOL(rte_pci_tph_enable, 25.07)
+int
+rte_pci_tph_enable(struct rte_pci_device *dev, int mode)
+{
+	RTE_SET_USED(dev);
+	RTE_SET_USED(mode);
+	/* This feature is not yet implemented for BSD */
+	return -1;
+}
+
+RTE_EXPORT_EXPERIMENTAL_SYMBOL(rte_pci_tph_disable, 25.07)
+int
+rte_pci_tph_disable(struct rte_pci_device *dev)
+{
+	RTE_SET_USED(dev);
+	/* This feature is not yet implemented for BSD */
+	return -1;
+}
+
+RTE_EXPORT_EXPERIMENTAL_SYMBOL(rte_pci_tph_st_get, 25.07)
+int
+rte_pci_tph_st_get(const struct rte_pci_device *dev,
+		   struct rte_tph_info *info, size_t count)
+{
+	RTE_SET_USED(dev);
+	RTE_SET_USED(info);
+	RTE_SET_USED(count);
+	/* This feature is not yet implemented for BSD */
+	return -1;
+}
+
+RTE_EXPORT_EXPERIMENTAL_SYMBOL(rte_pci_tph_st_set, 25.07)
+int
+rte_pci_tph_st_set(const struct rte_pci_device *dev,
+		   struct rte_tph_info *info, size_t count)
+{
+	RTE_SET_USED(dev);
+	RTE_SET_USED(info);
+	RTE_SET_USED(count);
+	/* This feature is not yet implemented for BSD */
+	return -1;
+}
diff --git a/drivers/bus/pci/bus_pci_driver.h b/drivers/bus/pci/bus_pci_driver.h
index 2cc1119072..b1c2829fc1 100644
--- a/drivers/bus/pci/bus_pci_driver.h
+++ b/drivers/bus/pci/bus_pci_driver.h
@@ -46,6 +46,7 @@ struct rte_pci_device {
 	char *bus_info;                     /**< PCI bus specific info */
 	struct rte_intr_handle *vfio_req_intr_handle;
 				/**< Handler of VFIO request interrupt */
+	uint8_t tph_enabled;                /**< TPH enabled on this device */
 };
 
 /**
@@ -194,6 +195,57 @@ struct rte_pci_ioport {
 	uint64_t len; /* only filled for memory mapped ports */
 };
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this structure may change, or be removed, without prior
+ * notice
+ *
+ * This structure is passed into the TPH Steering-Tag set or get function as an
+ * argument by the caller. Return values are set in the same structure in st and
+ * ph_ignore fields by the callee.
+ *
+ * Refer to PCI-SIG ECN "Revised _DSM for Cache Locality TPH Features" for
+ * details.
+ */
+struct rte_tph_info {
+	/* Input */
+	uint32_t cpu_id;	/*Logical CPU id*/
+	uint32_t cache_level;	/*Cache level relative to CPU. l1d=0,l2d=1,...*/
+	uint8_t flags;		/*Memory type, processing hint etc.*/
+	uint16_t index;		/*Index in vector table to store the ST*/
+
+	/* Output */
+	uint16_t st;		/*Steering tag returned by the platform*/
+	uint8_t ph_ignore;	/*Platform ignores PH for the returned ST*/
+};
+
+#define RTE_PCI_TPH_MEM_TYPE_MASK		0x1
+#define RTE_PCI_TPH_MEM_TYPE_SHIFT		0
+/** Request volatile memory ST */
+#define RTE_PCI_TPH_MEM_TYPE_VMEM		0
+/** Request persistent memory ST */
+#define RTE_PCI_TPH_MEM_TYPE_PMEM		1
+
+/** TLP Processing Hints - PCIe 6.0 specification section 2.2.7.1.1 */
+#define RTE_PCI_TPH_HINT_MASK		0x3
+#define RTE_PCI_TPH_HINT_SHIFT		1
+/** Host and device access data equally */
+#define RTE_PCI_TPH_HINT_BIDIR		0
+/** Device accesses data more frequently */
+#define RTE_PCI_TPH_HINT_REQSTR		(1 << RTE_PCI_TPH_HINT_SHIFT)
+/** Host accesses data more frequently */
+#define RTE_PCI_TPH_HINT_TARGET		(2 << RTE_PCI_TPH_HINT_SHIFT)
+/** Host accesses data more frequently with a high temporal locality */
+#define RTE_PCI_TPH_HINT_TARGET_PRIO	(3 << RTE_PCI_TPH_HINT_SHIFT)
+
+#define RTE_PCI_TPH_ST_MODE_MASK   0x3
+/** TPH no ST mode */
+#define RTE_PCI_TPH_ST_NS_MODE	   0
+/** TPH interrupt vector mode */
+#define RTE_PCI_TPH_ST_IV_MODE	   1
+/** TPH device specific mode */
+#define RTE_PCI_TPH_ST_DS_MODE	   2
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/drivers/bus/pci/linux/pci.c b/drivers/bus/pci/linux/pci.c
index c20d159218..b5a8ba0a86 100644
--- a/drivers/bus/pci/linux/pci.c
+++ b/drivers/bus/pci/linux/pci.c
@@ -814,3 +814,103 @@ rte_pci_ioport_unmap(struct rte_pci_ioport *p)
 
 	return ret;
 }
+
+RTE_EXPORT_EXPERIMENTAL_SYMBOL(rte_pci_tph_enable, 25.07)
+int
+rte_pci_tph_enable(struct rte_pci_device *dev, int mode)
+{
+	int ret = 0;
+
+	switch (dev->kdrv) {
+#ifdef VFIO_PRESENT
+	case RTE_PCI_KDRV_VFIO:
+		if (pci_vfio_is_enabled())
+			ret = pci_vfio_tph_enable(dev, mode);
+		break;
+#endif
+	case RTE_PCI_KDRV_IGB_UIO:
+	case RTE_PCI_KDRV_UIO_GENERIC:
+	default:
+		ret = -ENOTSUP;
+		break;
+	}
+
+	if (!ret)
+		dev->tph_enabled = 1;
+
+	return ret;
+}
+
+RTE_EXPORT_EXPERIMENTAL_SYMBOL(rte_pci_tph_disable, 25.07)
+int
+rte_pci_tph_disable(struct rte_pci_device *dev)
+{
+	int ret = 0;
+
+	switch (dev->kdrv) {
+#ifdef VFIO_PRESENT
+	case RTE_PCI_KDRV_VFIO:
+		if (pci_vfio_is_enabled())
+			ret = pci_vfio_tph_disable(dev);
+		break;
+#endif
+	case RTE_PCI_KDRV_IGB_UIO:
+	case RTE_PCI_KDRV_UIO_GENERIC:
+	default:
+		ret = -ENOTSUP;
+		break;
+	}
+
+	if (!ret)
+		dev->tph_enabled = 0;
+
+	return ret;
+}
+
+RTE_EXPORT_EXPERIMENTAL_SYMBOL(rte_pci_tph_st_get, 25.07)
+int
+rte_pci_tph_st_get(const struct rte_pci_device *dev,
+		   struct rte_tph_info *info, size_t count)
+{
+	int ret = 0;
+
+	switch (dev->kdrv) {
+#ifdef VFIO_PRESENT
+	case RTE_PCI_KDRV_VFIO:
+		if (pci_vfio_is_enabled())
+			ret = pci_vfio_tph_st_get(dev, info, count);
+		break;
+#endif
+	case RTE_PCI_KDRV_IGB_UIO:
+	case RTE_PCI_KDRV_UIO_GENERIC:
+	default:
+		ret = -ENOTSUP;
+		break;
+	}
+
+	return ret;
+}
+
+RTE_EXPORT_EXPERIMENTAL_SYMBOL(rte_pci_tph_st_set, 25.07)
+int
+rte_pci_tph_st_set(const struct rte_pci_device *dev,
+		   struct rte_tph_info *info, size_t count)
+{
+	int ret = 0;
+
+	switch (dev->kdrv) {
+#ifdef VFIO_PRESENT
+	case RTE_PCI_KDRV_VFIO:
+		if (pci_vfio_is_enabled())
+			ret = pci_vfio_tph_st_set(dev, info, count);
+		break;
+#endif
+	case RTE_PCI_KDRV_IGB_UIO:
+	case RTE_PCI_KDRV_UIO_GENERIC:
+	default:
+		ret = -ENOTSUP;
+		break;
+	}
+
+	return ret;
+}
diff --git a/drivers/bus/pci/linux/pci_init.h b/drivers/bus/pci/linux/pci_init.h
index 25b901f460..e71bfd2dce 100644
--- a/drivers/bus/pci/linux/pci_init.h
+++ b/drivers/bus/pci/linux/pci_init.h
@@ -5,6 +5,7 @@
 #ifndef EAL_PCI_INIT_H_
 #define EAL_PCI_INIT_H_
 
+#include <rte_compat.h>
 #include <rte_vfio.h>
 #include <uapi/linux/vfio_tph.h>
 
@@ -76,6 +77,18 @@ int pci_vfio_ioport_unmap(struct rte_pci_ioport *p);
 int pci_vfio_map_resource(struct rte_pci_device *dev);
 int pci_vfio_unmap_resource(struct rte_pci_device *dev);
 
+/* TLP Processing Hints control functions */
+__rte_experimental
+int pci_vfio_tph_enable(const struct rte_pci_device *dev, int mode);
+__rte_experimental
+int pci_vfio_tph_disable(const struct rte_pci_device *dev);
+__rte_experimental
+int pci_vfio_tph_st_get(const struct rte_pci_device *dev,
+			struct rte_tph_info *info, size_t ent_count);
+__rte_experimental
+int pci_vfio_tph_st_set(const struct rte_pci_device *dev,
+			struct rte_tph_info *info, size_t ent_count);
+
 int pci_vfio_is_enabled(void);
 
 #endif
diff --git a/drivers/bus/pci/linux/pci_vfio.c b/drivers/bus/pci/linux/pci_vfio.c
index 5317170231..bdbeb38658 100644
--- a/drivers/bus/pci/linux/pci_vfio.c
+++ b/drivers/bus/pci/linux/pci_vfio.c
@@ -12,6 +12,7 @@
 #include <stdbool.h>
 
 #include <rte_log.h>
+#include <eal_export.h>
 #include <rte_pci.h>
 #include <rte_bus_pci.h>
 #include <rte_eal_paging.h>
@@ -1316,6 +1317,175 @@ pci_vfio_mmio_write(const struct rte_pci_device *dev, int bar,
 	return pwrite(fd, buf, len, offset + offs);
 }
 
+static int
+pci_vfio_tph_ioctl(const struct rte_pci_device *dev, struct vfio_pci_tph *pci_tph)
+{
+	const struct rte_intr_handle *intr_handle = dev->intr_handle;
+	int vfio_dev_fd = 0, ret = 0;
+
+	vfio_dev_fd = rte_intr_dev_fd_get(intr_handle);
+	if (vfio_dev_fd < 0) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	ret = ioctl(vfio_dev_fd, VFIO_DEVICE_PCI_TPH, pci_tph);
+out:
+	return ret;
+}
+
+static int
+pci_vfio_tph_st_op(const struct rte_pci_device *dev,
+		    struct rte_tph_info *info, size_t count,
+		    enum rte_pci_st_op op)
+{
+	int ret = 0;
+	size_t argsz = 0, i;
+	struct vfio_pci_tph *pci_tph = NULL;
+	uint8_t mem_type = 0, hint = 0;
+
+	if (!count) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	argsz = sizeof(struct vfio_pci_tph) +
+		count * sizeof(struct vfio_pci_tph_entry);
+
+	pci_tph = rte_zmalloc(NULL, argsz, 0);
+	if (!pci_tph) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	pci_tph->argsz = argsz;
+	pci_tph->count = count;
+
+	switch (op) {
+	case RTE_PCI_TPH_ST_GET:
+		pci_tph->flags = VFIO_DEVICE_TPH_GET_ST;
+		break;
+	case RTE_PCI_TPH_ST_SET:
+		pci_tph->flags = VFIO_DEVICE_TPH_SET_ST;
+		break;
+	default:
+		ret = -EINVAL;
+		goto out;
+	}
+
+	for (i = 0; i < count; i++) {
+		pci_tph->ents[i].cpu_id = info[i].cpu_id;
+		pci_tph->ents[i].cache_level = info[i].cache_level;
+
+		mem_type = info[i].flags & RTE_PCI_TPH_MEM_TYPE_MASK;
+		switch (mem_type) {
+		case RTE_PCI_TPH_MEM_TYPE_VMEM:
+			pci_tph->ents[i].flags |= VFIO_TPH_MEM_TYPE_VMEM;
+			break;
+		case RTE_PCI_TPH_MEM_TYPE_PMEM:
+			pci_tph->ents[i].flags |= VFIO_TPH_MEM_TYPE_PMEM;
+			break;
+		default:
+			ret = -EINVAL;
+			goto out;
+		}
+
+		hint = info[i].flags & (RTE_PCI_TPH_HINT_MASK << RTE_PCI_TPH_HINT_SHIFT);
+		switch (hint) {
+		case RTE_PCI_TPH_HINT_BIDIR:
+			pci_tph->ents[i].flags |= VFIO_TPH_HINT_BIDIR;
+			break;
+		case RTE_PCI_TPH_HINT_REQSTR:
+			pci_tph->ents[i].flags |= VFIO_TPH_HINT_REQSTR;
+			break;
+		case RTE_PCI_TPH_HINT_TARGET:
+			pci_tph->ents[i].flags |= VFIO_TPH_HINT_TARGET;
+			break;
+		case RTE_PCI_TPH_HINT_TARGET_PRIO:
+			pci_tph->ents[i].flags |= VFIO_TPH_HINT_TARGET_PRIO;
+			break;
+		default:
+			ret = -EINVAL;
+			goto out;
+		}
+
+		if (op == RTE_PCI_TPH_ST_SET)
+			pci_tph->ents[i].index = info[i].index;
+	}
+
+	ret = pci_vfio_tph_ioctl(dev, pci_tph);
+	if (ret)
+		goto out;
+
+	/*
+	 * Kernel returns steering-tag and ph-ignore bits for
+	 * RTE_PCI_TPH_ST_SET too, therefore copy output for
+	 * both RTE_PCI_TPH_ST_SET and RTE_PCI_TPH_ST_GET
+	 * cases.
+	 */
+	for (i = 0; i < count; i++) {
+		info[i].st = pci_tph->ents[i].st;
+		info[i].ph_ignore = pci_tph->ents[i].ph_ignore;
+	}
+
+out:
+	if (pci_tph)
+		rte_free(pci_tph);
+	return ret;
+}
+
+RTE_EXPORT_EXPERIMENTAL_SYMBOL(pci_vfio_tph_enable, 25.07)
+int
+pci_vfio_tph_enable(const struct rte_pci_device *dev, int mode)
+{
+	int ret;
+
+	if (mode & ~VFIO_TPH_ST_MODE_MASK) {
+		ret = -EINVAL;
+		goto out;
+	} else
+		mode &= VFIO_TPH_ST_MODE_MASK;
+
+	struct vfio_pci_tph pci_tph = {
+		.argsz = sizeof(struct vfio_pci_tph),
+		.flags = VFIO_DEVICE_TPH_ENABLE | mode,
+		.count = 0
+	};
+
+	ret = pci_vfio_tph_ioctl(dev, &pci_tph);
+out:
+	return ret;
+}
+
+RTE_EXPORT_EXPERIMENTAL_SYMBOL(pci_vfio_tph_disable, 25.07)
+int
+pci_vfio_tph_disable(const struct rte_pci_device *dev)
+{
+	struct vfio_pci_tph pci_tph = {
+		.argsz = sizeof(struct vfio_pci_tph),
+		.flags = VFIO_DEVICE_TPH_DISABLE,
+		.count = 0
+	};
+
+	return pci_vfio_tph_ioctl(dev, &pci_tph);
+}
+
+RTE_EXPORT_EXPERIMENTAL_SYMBOL(pci_vfio_tph_st_get, 25.07)
+int
+pci_vfio_tph_st_get(const struct rte_pci_device *dev,
+		    struct rte_tph_info *info, size_t count)
+{
+	return pci_vfio_tph_st_op(dev, info, count, RTE_PCI_TPH_ST_GET);
+}
+
+RTE_EXPORT_EXPERIMENTAL_SYMBOL(pci_vfio_tph_st_set, 25.07)
+int
+pci_vfio_tph_st_set(const struct rte_pci_device *dev,
+		    struct rte_tph_info *info, size_t count)
+{
+	return pci_vfio_tph_st_op(dev, info, count, RTE_PCI_TPH_ST_SET);
+}
+
 int
 pci_vfio_is_enabled(void)
 {
diff --git a/drivers/bus/pci/private.h b/drivers/bus/pci/private.h
index 38109844b9..d2ec370320 100644
--- a/drivers/bus/pci/private.h
+++ b/drivers/bus/pci/private.h
@@ -335,4 +335,12 @@ rte_pci_dev_iterate(const void *start,
 int
 rte_pci_devargs_parse(struct rte_devargs *da);
 
+/*
+ * TPH Steering-Tag operation types.
+ */
+enum rte_pci_st_op {
+	RTE_PCI_TPH_ST_SET, /* Set TPH Steering - Tags */
+	RTE_PCI_TPH_ST_GET  /* Get TPH Steering - Tags */
+};
+
 #endif /* _PCI_PRIVATE_H_ */
diff --git a/drivers/bus/pci/rte_bus_pci.h b/drivers/bus/pci/rte_bus_pci.h
index 19a7b15b99..e4d4780f54 100644
--- a/drivers/bus/pci/rte_bus_pci.h
+++ b/drivers/bus/pci/rte_bus_pci.h
@@ -31,6 +31,7 @@ extern "C" {
 struct rte_pci_device;
 struct rte_pci_driver;
 struct rte_pci_ioport;
+struct rte_tph_info;
 
 struct rte_devargs;
 
@@ -312,6 +313,72 @@ void rte_pci_ioport_read(struct rte_pci_ioport *p,
 void rte_pci_ioport_write(struct rte_pci_ioport *p,
 		const void *data, size_t len, off_t offset);
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Enable TLP Processing Hints (TPH) in the endpoint device.
+ *
+ * @param dev
+ *   A pointer to a rte_pci_device structure describing the device
+ *   to use.
+ * @param mode
+ *   TPH mode the device must operate in.
+ */
+__rte_experimental
+int rte_pci_tph_enable(struct rte_pci_device *dev, int mode);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Disable TLP Processing Hints (TPH) in the endpoint device.
+ *
+ * @param dev
+ *   A pointer to a rte_pci_device structure describing the device
+ *   to use.
+ */
+__rte_experimental
+int rte_pci_tph_disable(struct rte_pci_device *dev);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Get PCI Steering-Tags (STs) for a list of stashing targets.
+ *
+ * @param dev
+ *   A pointer to a rte_pci_device structure describing the device to use.
+ * @param info
+ *   An array of rte_tph_info objects, each describing the target
+ *   cpu-id, cache-level, etc. The steering tag for each target is
+ *   returned via the info array.
+ * @param count
+ *   The number of elements in the info array.
+ */
+__rte_experimental
+int rte_pci_tph_st_get(const struct rte_pci_device *dev,
+		struct rte_tph_info *info, size_t count);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Set PCI Steering-Tags (STs) for a list of stashing targets.
+ *
+ * @param dev
+ *   A pointer to a rte_pci_device structure describing the device to use.
+ * @param info
+ *   An array of rte_tph_info objects, each describing the target
+ *   cpu-id, cache-level, etc. The steering tag for each target is
+ *   returned via the info array.
+ * @param count
+ *   The number of elements in the info array.
+ */
+__rte_experimental
+int rte_pci_tph_st_set(const struct rte_pci_device *dev,
+		struct rte_tph_info *info, size_t count);
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/drivers/bus/pci/windows/pci.c b/drivers/bus/pci/windows/pci.c
index e7e449306e..218e667a5a 100644
--- a/drivers/bus/pci/windows/pci.c
+++ b/drivers/bus/pci/windows/pci.c
@@ -511,3 +511,46 @@ rte_pci_scan(void)
 
 	return ret;
 }
+
+RTE_EXPORT_EXPERIMENTAL_SYMBOL(rte_pci_tph_enable, 25.07)
+int
+rte_pci_tph_enable(struct rte_pci_device *dev, int mode)
+{
+	RTE_SET_USED(dev);
+	RTE_SET_USED(mode);
+	/* This feature is not yet implemented for windows */
+	return -1;
+}
+
+RTE_EXPORT_EXPERIMENTAL_SYMBOL(rte_pci_tph_disable, 25.07)
+int
+rte_pci_tph_disable(struct rte_pci_device *dev)
+{
+	RTE_SET_USED(dev);
+	/* This feature is not yet implemented for windows */
+	return -1;
+}
+
+RTE_EXPORT_EXPERIMENTAL_SYMBOL(rte_pci_tph_st_get, 25.07)
+int
+rte_pci_tph_st_get(const struct rte_pci_device *dev,
+		   struct rte_tph_info *info, size_t count)
+{
+	RTE_SET_USED(dev);
+	RTE_SET_USED(info);
+	RTE_SET_USED(count);
+	/* This feature is not yet implemented for windows */
+	return -1;
+}
+
+RTE_EXPORT_EXPERIMENTAL_SYMBOL(rte_pci_tph_st_set, 25.07)
+int
+rte_pci_tph_st_set(const struct rte_pci_device *dev,
+		   struct rte_tph_info *info, size_t count)
+{
+	RTE_SET_USED(dev);
+	RTE_SET_USED(info);
+	RTE_SET_USED(count);
+	/* This feature is not yet implemented for windows */
+	return -1;
+}
diff --git a/lib/pci/rte_pci.h b/lib/pci/rte_pci.h
index 9a50a12142..da9cd666bf 100644
--- a/lib/pci/rte_pci.h
+++ b/lib/pci/rte_pci.h
@@ -137,6 +137,21 @@ extern "C" {
 /* Process Address Space ID (RTE_PCI_EXT_CAP_ID_PASID) */
 #define RTE_PCI_PASID_CTRL		0x06    /* PASID control register */
 
+/* TPH Requester */
+#define RTE_PCI_TPH_CAP            4       /* capability register */
+#define RTE_PCI_TPH_CAP_ST_NS      0x00000001 /* No ST Mode Supported */
+#define RTE_PCI_TPH_CAP_ST_IV      0x00000002 /* Interrupt Vector Mode Supported */
+#define RTE_PCI_TPH_CAP_ST_DS      0x00000004 /* Device Specific Mode Supported */
+#define RTE_PCI_TPH_CAP_EXT_TPH    0x00000100 /* Ext TPH Requester Supported */
+#define RTE_PCI_TPH_CAP_LOC_MASK   0x00000600 /* ST Table Location */
+#define RTE_PCI_TPH_LOC_NONE       0x00000000 /* Not present */
+#define RTE_PCI_TPH_LOC_CAP        0x00000200 /* In capability */
+#define RTE_PCI_TPH_LOC_MSIX       0x00000400 /* In MSI-X */
+#define RTE_PCI_TPH_CAP_ST_MASK    0x07FF0000 /* ST Table Size */
+#define RTE_PCI_TPH_CAP_ST_SHIFT   16      /* ST Table Size shift */
+#define RTE_PCI_TPH_BASE_SIZEOF    0xc     /* Size with no ST table */
+
+
 /** Formatting string for PCI device identifier: Ex: 0000:00:01.0 */
 #define PCI_PRI_FMT "%.4" PRIx32 ":%.2" PRIx8 ":%.2" PRIx8 ".%" PRIx8
 #define PCI_PRI_STR_SIZE sizeof("XXXXXXXX:XX:XX.X")
-- 
2.43.0



* [PATCH v5 3/4] ethdev: introduce the cache stashing hints API
  2025-06-02 22:38 ` [PATCH v5 0/4] An API for Cache Stashing with TPH Wathsala Vithanage
  2025-06-02 22:38   ` [PATCH v5 1/4] pci: add non-merged Linux uAPI changes Wathsala Vithanage
  2025-06-02 22:38   ` [PATCH v5 2/4] bus/pci: introduce the PCIe TLP Processing Hints API Wathsala Vithanage
@ 2025-06-02 22:38   ` Wathsala Vithanage
  2025-06-03  8:43     ` Morten Brørup
  2025-06-05 10:03     ` Bruce Richardson
  2025-06-02 22:38   ` [PATCH v5 4/4] net/i40e: enable TPH in i40e Wathsala Vithanage
                     ` (2 subsequent siblings)
  5 siblings, 2 replies; 33+ messages in thread
From: Wathsala Vithanage @ 2025-06-02 22:38 UTC (permalink / raw)
  To: Thomas Monjalon, Ferruh Yigit, Andrew Rybchenko
  Cc: dev, nd, Wathsala Vithanage, Honnappa Nagarahalli, Dhruv Tripathi

Extend the ethdev library to enable the stashing of different data
objects, such as the ones listed below, into CPU caches directly
from the NIC.

- Rx/Tx queue descriptors
- Rx packets
- Packet headers
- Packet payloads
- Data of a packet at an offset from the start of the packet

The APIs are designed in a hardware/vendor agnostic manner such that
supporting PMDs could use any capabilities available in the underlying
hardware for fine-grained stashing of data objects into a CPU cache.

The API provides an interface to query the availability of stashing
capabilities, i.e., platform/NIC support, stashable object types, etc.,
via the rte_eth_dev_stashing_capabilities_get interface.

The functions rte_eth_dev_stashing_rx_config_set and
rte_eth_dev_stashing_tx_config_set set the stashing hint (the CPU,
cache level, and data object types) on the Rx and Tx queues, respectively.

PMDs that support stashing must register their implementations with the
following eth_dev_ops callbacks, which are invoked by the ethdev
functions listed above.

- stashing_capabilities_get
- stashing_rx_hints_set
- stashing_tx_hints_set
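
As a sketch of how an application might check a requested object mask
against the capabilities reported by the device before calling one of
the config-set functions, consider the following. The ex_* helpers and
EX_* constants are hypothetical; the bit values mirror the
RTE_ETH_DEV_STASH_OBJECT_* definitions added by this patch, and the
structure mirrors rte_eth_stashing_config.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local mirrors of the stashable object bits (assumption: same values
 * as RTE_ETH_DEV_STASH_OBJECT_* in this patch). */
#define EX_STASH_OBJECT_OFFSET	0x0001
#define EX_STASH_OBJECT_DESC	0x0002
#define EX_STASH_OBJECT_HEADER	0x0004
#define EX_STASH_OBJECT_PAYLOAD	0x0008
#define EX_STASH_OBJECT_MASK	0x000f

/* Local mirror of struct rte_eth_stashing_config. */
struct ex_stashing_config {
	uint32_t lcore_id;	/* target lcore */
	uint32_t cache_level;	/* zero based: l1d = 0, l2d = 1, ... */
	uint16_t objects;	/* EX_STASH_OBJECT_* bits */
	int offset;		/* used only with EX_STASH_OBJECT_OFFSET */
};

/* A mask is valid if it is non-empty and has no unknown bits set. */
static inline bool
ex_stash_objects_valid(uint16_t objects)
{
	return objects != 0 && (objects & ~EX_STASH_OBJECT_MASK) == 0;
}

/* A request is supported if it is valid and a subset of the capabilities. */
static inline bool
ex_stash_objects_supported(uint16_t requested, uint16_t capa)
{
	return ex_stash_objects_valid(requested) && (requested & ~capa) == 0;
}

/* Build a config that stashes descriptors and headers into the L2 cache. */
static inline struct ex_stashing_config
ex_stash_desc_and_header_l2(uint32_t lcore_id)
{
	struct ex_stashing_config cfg = {
		.lcore_id = lcore_id,
		.cache_level = 1,
		.objects = EX_STASH_OBJECT_DESC | EX_STASH_OBJECT_HEADER,
		.offset = 0,
	};

	return cfg;
}
```

In a real application the capa argument would come from
rte_eth_dev_stashing_capabilities_get, and the config would be passed to
the Rx or Tx config-set function only if the check succeeds.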

Signed-off-by: Wathsala Vithanage <wathsala.vithanage@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Dhruv Tripathi <dhruv.tripathi@arm.com>
---
 lib/ethdev/ethdev_driver.h |  66 ++++++++++++++++
 lib/ethdev/rte_ethdev.c    | 149 ++++++++++++++++++++++++++++++++++
 lib/ethdev/rte_ethdev.h    | 158 +++++++++++++++++++++++++++++++++++++
 3 files changed, 373 insertions(+)

diff --git a/lib/ethdev/ethdev_driver.h b/lib/ethdev/ethdev_driver.h
index 2b4d2ae9c3..8a4012db08 100644
--- a/lib/ethdev/ethdev_driver.h
+++ b/lib/ethdev/ethdev_driver.h
@@ -1376,6 +1376,68 @@ enum rte_eth_dev_operation {
 typedef uint64_t (*eth_get_restore_flags_t)(struct rte_eth_dev *dev,
 					    enum rte_eth_dev_operation op);
 
+/**
+ * @internal
+ * Set cache stashing hints in Rx queue.
+ *
+ * @param dev
+ *   Port (ethdev) handle.
+ * @param queue_id
+ *   Rx queue.
+ * @param config
+ *   Stashing hints configuration for the queue.
+ *
+ * @return
+ *   -ENOTSUP if the device or the platform does not support cache stashing.
+ *   -ENOSYS  if the underlying PMD hasn't implemented cache stashing feature.
+ *   -EINVAL  on invalid arguments.
+ *   0 on success.
+ */
+typedef int (*eth_stashing_rx_hints_set_t)(struct rte_eth_dev *dev, uint16_t queue_id,
+					   struct rte_eth_stashing_config *config);
+
+/**
+ * @internal
+ * Set cache stashing hints in Tx queue.
+ *
+ * @param dev
+ *   Port (ethdev) handle.
+ * @param queue_id
+ *   Tx queue.
+ * @param config
+ *   Stashing hints configuration for the queue.
+ *
+ * @return
+ *   -ENOTSUP if the device or the platform does not support cache stashing.
+ *   -ENOSYS  if the underlying PMD hasn't implemented cache stashing feature.
+ *   -EINVAL  on invalid arguments.
+ *   0 on success.
+ */
+typedef int (*eth_stashing_tx_hints_set_t)(struct rte_eth_dev *dev, uint16_t queue_id,
+					   struct rte_eth_stashing_config *config);
+
+/**
+ * @internal
+ * Get cache stashing object types supported in the ethernet device.
+ * The return value indicates availability of stashing hints support
+ * in the hardware and the PMD.
+ *
+ * @param dev
+ *   Port (ethdev) handle.
+ * @param objects
+ *   PMD sets supported bits on return.
+ *
+ * @return
+ *   -ENOTSUP if the device or the platform does not support cache stashing.
+ *   -ENOSYS  if the underlying PMD hasn't implemented cache stashing feature.
+ *   -EINVAL  on a NULL objects parameter.
+ *   On return, the objects parameter has bits set for supported
+ *   object types.
+ *   0 on success.
+ */
+typedef int (*eth_stashing_capabilities_get_t)(struct rte_eth_dev *dev,
+					     uint16_t *objects);
+
 /**
  * @internal A structure containing the functions exported by an Ethernet driver.
  */
@@ -1402,6 +1464,10 @@ struct eth_dev_ops {
 	eth_mac_addr_remove_t      mac_addr_remove; /**< Remove MAC address */
 	eth_mac_addr_add_t         mac_addr_add;  /**< Add a MAC address */
 	eth_mac_addr_set_t         mac_addr_set;  /**< Set a MAC address */
+	eth_stashing_rx_hints_set_t   stashing_rx_hints_set; /**< Set Rx cache stashing*/
+	eth_stashing_tx_hints_set_t   stashing_tx_hints_set; /**< Set Tx cache stashing*/
+	/** Get supported stashing hints*/
+	eth_stashing_capabilities_get_t stashing_capabilities_get;
 	/** Set list of multicast addresses */
 	eth_set_mc_addr_list_t     set_mc_addr_list;
 	mtu_set_t                  mtu_set;       /**< Set MTU */
diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
index d4197322a0..ae666c370b 100644
--- a/lib/ethdev/rte_ethdev.c
+++ b/lib/ethdev/rte_ethdev.c
@@ -158,6 +158,7 @@ static const struct {
 	{RTE_ETH_DEV_CAPA_RXQ_SHARE, "RXQ_SHARE"},
 	{RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP, "FLOW_RULE_KEEP"},
 	{RTE_ETH_DEV_CAPA_FLOW_SHARED_OBJECT_KEEP, "FLOW_SHARED_OBJECT_KEEP"},
+	{RTE_ETH_DEV_CAPA_CACHE_STASHING, "CACHE_STASHING"},
 };
 
 enum {
@@ -7419,5 +7420,153 @@ int rte_eth_dev_map_aggr_tx_affinity(uint16_t port_id, uint16_t tx_queue_id,
 	return ret;
 }
 
+RTE_EXPORT_EXPERIMENTAL_SYMBOL(rte_eth_dev_validate_stashing_config, 25.07)
+int
+rte_eth_dev_validate_stashing_config(uint16_t port_id, uint16_t queue_id,
+				     uint8_t queue_direction,
+				     struct rte_eth_stashing_config *config)
+{
+	struct rte_eth_dev *dev;
+	struct rte_eth_dev_info dev_info;
+	int ret = 0;
+	uint16_t nb_queues;
+
+	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
+
+	if (!config) {
+		RTE_ETHDEV_LOG_LINE(ERR, "Invalid stashing configuration");
+		ret = -EINVAL;
+		goto out;
+	}
+
+	/*
+	 * Check for invalid objects
+	 */
+	if (!RTE_ETH_DEV_STASH_OBJECTS_VALID(config->objects)) {
+		RTE_ETHDEV_LOG_LINE(ERR, "Invalid stashing objects");
+		ret = -EINVAL;
+		goto out;
+	}
+
+	dev = &rte_eth_devices[port_id];
+
+	nb_queues = (queue_direction == RTE_ETH_DEV_RX_QUEUE) ?
+				      dev->data->nb_rx_queues :
+				      dev->data->nb_tx_queues;
+
+	if (queue_id >= nb_queues) {
+		RTE_ETHDEV_LOG_LINE(ERR, "Invalid queue_id=%u", queue_id);
+		ret = -EINVAL;
+		goto out;
+	}
+
+	ret = rte_eth_dev_info_get(port_id, &dev_info);
+	if (ret < 0)
+		goto out;
+
+	if ((dev_info.dev_capa & RTE_ETH_DEV_CAPA_CACHE_STASHING) !=
+	    RTE_ETH_DEV_CAPA_CACHE_STASHING) {
+		ret = -ENOTSUP;
+		goto out;
+	}
+
+	if (*dev->dev_ops->stashing_rx_hints_set == NULL ||
+	    *dev->dev_ops->stashing_tx_hints_set == NULL) {
+		RTE_ETHDEV_LOG_LINE(ERR, "Stashing hints are not implemented "
+				    "in %s for %s", dev_info.driver_name,
+				    dev_info.device->name);
+		ret = -ENOSYS;
+	}
+
+out:
+	return ret;
+}
+
+RTE_EXPORT_EXPERIMENTAL_SYMBOL(rte_eth_dev_stashing_rx_config_set, 25.07)
+int
+rte_eth_dev_stashing_rx_config_set(uint16_t port_id, uint16_t queue_id,
+				   struct rte_eth_stashing_config *config)
+{
+	struct rte_eth_dev *dev;
+	int ret = 0;
+
+	ret = rte_eth_dev_validate_stashing_config(port_id, queue_id,
+						   RTE_ETH_DEV_RX_QUEUE,
+						   config);
+	if (ret < 0)
+		goto out;
+
+	dev = &rte_eth_devices[port_id];
+
+	ret = eth_err(port_id,
+		      (*dev->dev_ops->stashing_rx_hints_set)(dev, queue_id,
+		      config));
+out:
+	return ret;
+}
+
+RTE_EXPORT_EXPERIMENTAL_SYMBOL(rte_eth_dev_stashing_tx_config_set, 25.07)
+int
+rte_eth_dev_stashing_tx_config_set(uint16_t port_id, uint16_t queue_id,
+				   struct rte_eth_stashing_config *config)
+{
+	struct rte_eth_dev *dev;
+	int ret = 0;
+
+	ret = rte_eth_dev_validate_stashing_config(port_id, queue_id,
+						   RTE_ETH_DEV_TX_QUEUE,
+						   config);
+	if (ret < 0)
+		goto out;
+
+	dev = &rte_eth_devices[port_id];
+
+	ret = eth_err(port_id,
+		      (*dev->dev_ops->stashing_tx_hints_set)(dev, queue_id,
+		       config));
+out:
+	return ret;
+}
+
+RTE_EXPORT_EXPERIMENTAL_SYMBOL(rte_eth_dev_stashing_capabilities_get, 25.07)
+int
+rte_eth_dev_stashing_capabilities_get(uint16_t port_id, uint16_t *objects)
+{
+	struct rte_eth_dev *dev;
+	struct rte_eth_dev_info dev_info;
+	int ret = 0;
+
+	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
+
+	if (!objects) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	dev = &rte_eth_devices[port_id];
+
+	ret = rte_eth_dev_info_get(port_id, &dev_info);
+	if (ret < 0)
+		goto out;
+
+	if ((dev_info.dev_capa & RTE_ETH_DEV_CAPA_CACHE_STASHING) !=
+	    RTE_ETH_DEV_CAPA_CACHE_STASHING) {
+		ret = -ENOTSUP;
+		goto out;
+	}
+
+	if (*dev->dev_ops->stashing_capabilities_get == NULL) {
+		RTE_ETHDEV_LOG_LINE(ERR, "Stashing hints are not implemented "
+				    "in %s for %s", dev_info.driver_name,
+				    dev_info.device->name);
+		ret = -ENOSYS;
+		goto out;
+	}
+	ret = eth_err(port_id,
+		      (*dev->dev_ops->stashing_capabilities_get)(dev, objects));
+out:
+	return ret;
+}
+
 RTE_EXPORT_SYMBOL(rte_eth_dev_logtype)
 RTE_LOG_REGISTER_DEFAULT(rte_eth_dev_logtype, INFO);
diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
index ea7f8c4a1a..1398f8c837 100644
--- a/lib/ethdev/rte_ethdev.h
+++ b/lib/ethdev/rte_ethdev.h
@@ -1667,6 +1667,9 @@ struct rte_eth_conf {
 #define RTE_ETH_DEV_CAPA_FLOW_SHARED_OBJECT_KEEP RTE_BIT64(4)
 /**@}*/
 
+/** Device supports stashing to CPU/system caches. */
+#define RTE_ETH_DEV_CAPA_CACHE_STASHING RTE_BIT64(5)
+
 /*
  * Fallback default preferred Rx/Tx port parameters.
  * These are used if an application requests default parameters
@@ -1838,6 +1841,7 @@ struct rte_eth_dev_info {
 	struct rte_eth_dev_portconf default_txportconf;
 	/** Generic device capabilities (RTE_ETH_DEV_CAPA_). */
 	uint64_t dev_capa;
+	uint16_t stashing_capa;
 	/**
 	 * Switching information for ports on a device with a
 	 * embedded managed interconnect/switch.
@@ -6173,6 +6177,160 @@ int rte_eth_cman_config_set(uint16_t port_id, const struct rte_eth_cman_config *
 __rte_experimental
 int rte_eth_cman_config_get(uint16_t port_id, struct rte_eth_cman_config *config);
 
+
+/** Queue type is RX. */
+#define RTE_ETH_DEV_RX_QUEUE		0
+/** Queue type is TX. */
+#define RTE_ETH_DEV_TX_QUEUE		1
+
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this structure may change, or be removed, without prior notice
+ *
+ * A structure used for configuring the cache stashing hints.
+ */
+struct rte_eth_stashing_config {
+	/**
+	 * lcore_id of the processor the stashing hints are applied to.
+	 */
+	uint32_t	lcore_id;
+	/**
+	 * Zero based cache level relative to the CPU.
+	 * E.g. l1d = 0, l2d = 1,...
+	 */
+	uint32_t	cache_level;
+	/**
+	 * Object types the configuration is applied to
+	 */
+	uint16_t	objects;
+	/**
+	 * The offset if RTE_ETH_DEV_STASH_OBJECT_OFFSET bit is set
+	 *  in objects
+	 */
+	int		offset;
+};
+
+/**@{@name Stashable Rx/Tx queue object types supported by the ethernet device
+ *@see rte_eth_dev_stashing_capabilities_get
+ *@see rte_eth_dev_stashing_rx_config_set
+ *@see rte_eth_dev_stashing_tx_config_set
+ */
+
+/**
+ * Apply stashing hint to data at a given offset from the start of a
+ * received packet.
+ */
+#define RTE_ETH_DEV_STASH_OBJECT_OFFSET		0x0001
+
+/** Apply stashing hint to an rx descriptor. */
+#define RTE_ETH_DEV_STASH_OBJECT_DESC		0x0002
+
+/** Apply stashing hint to a header of a received packet. */
+#define RTE_ETH_DEV_STASH_OBJECT_HEADER		0x0004
+
+/** Apply stashing hint to a payload of a received packet. */
+#define RTE_ETH_DEV_STASH_OBJECT_PAYLOAD	0x0008
+
+#define __RTE_ETH_DEV_STASH_OBJECT_MASK		0x000f
+/**@}*/
+
+#define RTE_ETH_DEV_STASH_OBJECTS_VALID(t)				\
+	((!((t) & (~__RTE_ETH_DEV_STASH_OBJECT_MASK))) && (t))
+
+/**
+ *
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * @internal
+ * Helper function to validate stashing hints configuration.
+ */
+__rte_experimental
+int rte_eth_dev_validate_stashing_config(uint16_t port_id, uint16_t queue_id,
+					 uint8_t queue_direction,
+					 struct rte_eth_stashing_config *config);
+
+/**
+ *
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * Provide cache stashing hints for improved memory access latencies for
+ * packets received by the NIC.
+ * This feature is available only in supported NICs and platforms.
+ *
+ * @param port_id
+ *  The port identifier of the Ethernet device.
+ * @param queue_id
+ *  The index of the receive queue to which hints are applied.
+ * @param config
+ *  Stashing configuration.
+ * @return
+ *  - (-ENODEV) on incorrect port_ids.
+ *  - (-EINVAL) if both RX and TX object types are used in conjunction in the
+ *  objects parameter.
+ *  - (-EINVAL) on invalid queue_id.
+ *  - (-ENOTSUP) if RTE_ETH_DEV_CAPA_CACHE_STASHING capability is unavailable.
+ *  - (-ENOSYS) if PMD does not implement cache stashing hints.
+ *  - (0) on Success.
+ */
+__rte_experimental
+int rte_eth_dev_stashing_rx_config_set(uint16_t port_id, uint16_t queue_id,
+				   struct rte_eth_stashing_config *config);
+
+/**
+ *
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * Configure cache stashing for improved memory access latencies for Tx
+ * queue completion descriptors being sent to host system by the NIC.
+ * This feature is available only in supported NICs and platforms.
+ *
+ * @param port_id
+ *  The port identifier of the Ethernet device.
+ * @param queue_id
+ *  The index of the transmit queue to which hints are applied.
+ * @param config
+ *  Stashing configuration.
+ * @return
+ *  - (-ENODEV) on incorrect port_ids.
+ *  - (-EINVAL) if both RX and TX object types are used in conjunction in the
+ *  objects parameter.
+ *  - (-EINVAL) if hints are incompatible with TX queues.
+ *  - (-EINVAL) on invalid queue_id.
+ *  - (-ENOTSUP) if RTE_ETH_DEV_CAPA_CACHE_STASHING capability is unavailable.
+ *  - (-ENOSYS) if PMD does not implement cache stashing hints.
+ *  - (0) on Success.
+ */
+__rte_experimental
+int rte_eth_dev_stashing_tx_config_set(uint16_t port_id, uint16_t queue_id,
+				   struct rte_eth_stashing_config *config);
+
+/**
+ *
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * Discover cache stashing objects supported in the ethernet device.
+ *
+ * @param port_id
+ *  The port identifier of the Ethernet device.
+ * @param objects
+ *  Supported objects vector set by the ethernet device.
+ * @return
+ *  On return, the objects parameter has bits set for the supported
+ *  object types.
+ *  - (-ENOTSUP) if the device or the platform does not support cache stashing.
+ *  - (-ENOSYS)  if the underlying PMD hasn't implemented cache stashing
+ *  feature.
+ *  - (-EINVAL)  on a NULL objects parameter.
+ *  - (0) on success.
+ */
+__rte_experimental
+int rte_eth_dev_stashing_capabilities_get(uint16_t port_id, uint16_t *objects);
+
 #ifdef __cplusplus
 }
 #endif
-- 
2.43.0



* [PATCH v5 4/4] net/i40e: enable TPH in i40e
  2025-06-02 22:38 ` [PATCH v5 0/4] An API for Cache Stashing with TPH Wathsala Vithanage
                     ` (2 preceding siblings ...)
  2025-06-02 22:38   ` [PATCH v5 3/4] ethdev: introduce the cache stashing hints API Wathsala Vithanage
@ 2025-06-02 22:38   ` Wathsala Vithanage
  2025-06-04 16:51   ` [PATCH v5 0/4] An API for Cache Stashing with TPH Stephen Hemminger
  2026-01-08  0:30   ` fengchengwen
  5 siblings, 0 replies; 33+ messages in thread
From: Wathsala Vithanage @ 2025-06-02 22:38 UTC (permalink / raw)
  To: Ian Stokes, Bruce Richardson; +Cc: dev, nd, Wathsala Vithanage

Adds stashing_capabilities_get, stashing_{rx,tx}_hints_set functions
to eth_dev_ops of the i40e driver. Enables TPH in device-specific
mode, so that steering-tags are set in LAN queue contexts.

Signed-off-by: Wathsala Vithanage <wathsala.vithanage@arm.com>
---
 drivers/net/intel/i40e/i40e_ethdev.c | 128 +++++++++++++++++++++++++++
 1 file changed, 128 insertions(+)

diff --git a/drivers/net/intel/i40e/i40e_ethdev.c b/drivers/net/intel/i40e/i40e_ethdev.c
index 90eba3419f..326e323dfb 100644
--- a/drivers/net/intel/i40e/i40e_ethdev.c
+++ b/drivers/net/intel/i40e/i40e_ethdev.c
@@ -18,6 +18,7 @@
 #include <rte_pci.h>
 #include <bus_pci_driver.h>
 #include <rte_ether.h>
+#include <rte_ethdev.h>
 #include <ethdev_driver.h>
 #include <ethdev_pci.h>
 #include <rte_memzone.h>
@@ -410,6 +411,11 @@ static int i40e_fec_get_capability(struct rte_eth_dev *dev,
 	struct rte_eth_fec_capa *speed_fec_capa, unsigned int num);
 static int i40e_fec_get(struct rte_eth_dev *dev, uint32_t *fec_capa);
 static int i40e_fec_set(struct rte_eth_dev *dev, uint32_t fec_capa);
+static int i40e_stashing_cap_get(struct rte_eth_dev *dev, uint16_t *objects);
+static int i40e_stashing_rx_hints_set(struct rte_eth_dev *dev, uint16_t queue_id,
+				     struct rte_eth_stashing_config *config);
+static int i40e_stashing_tx_hints_set(struct rte_eth_dev *dev, uint16_t queue_id,
+				     struct rte_eth_stashing_config *config);
 
 static const char *const valid_keys[] = {
 	ETH_I40E_FLOATING_VEB_ARG,
@@ -527,6 +533,9 @@ static const struct eth_dev_ops i40e_eth_dev_ops = {
 	.fec_get_capability           = i40e_fec_get_capability,
 	.fec_get                      = i40e_fec_get,
 	.fec_set                      = i40e_fec_set,
+	.stashing_capabilities_get    = i40e_stashing_cap_get,
+	.stashing_rx_hints_set        = i40e_stashing_rx_hints_set,
+	.stashing_tx_hints_set        = i40e_stashing_tx_hints_set,
 };
 
 /* store statistics names and its offset in stats structure */
@@ -3878,6 +3887,7 @@ i40e_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 		RTE_ETH_DEV_CAPA_RUNTIME_RX_QUEUE_SETUP |
 		RTE_ETH_DEV_CAPA_RUNTIME_TX_QUEUE_SETUP;
 	dev_info->dev_capa &= ~RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP;
+	dev_info->dev_capa |= RTE_ETH_DEV_CAPA_CACHE_STASHING;
 
 	dev_info->hash_key_size = (I40E_PFQF_HKEY_MAX_INDEX + 1) *
 						sizeof(uint32_t);
@@ -12544,6 +12554,124 @@ i40e_fec_set(struct rte_eth_dev *dev, uint32_t fec_capa)
 	return 0;
 }
 
+static int
+i40e_stashing_cap_get(struct rte_eth_dev *dev, uint16_t *objects)
+{
+	RTE_SET_USED(dev);
+
+	*objects = RTE_ETH_DEV_STASH_OBJECT_DESC |
+		   RTE_ETH_DEV_STASH_OBJECT_HEADER |
+		   RTE_ETH_DEV_STASH_OBJECT_PAYLOAD;
+
+	return 0;
+}
+
+static int
+i40e_stashing_hints_set(struct rte_eth_dev *dev, uint16_t queue_id,
+			struct rte_eth_stashing_config *config,
+			enum i40e_hmc_lan_rsrc_type hmc_type)
+{
+	struct i40e_hw *hw = I40E_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+	struct rte_pci_device *pci_dev = RTE_ETH_DEV_TO_PCI(dev);
+	int err = I40E_SUCCESS;
+	struct i40e_hmc_obj_rxq rxq;
+	struct i40e_hmc_obj_txq txq;
+
+	if (!config) {
+		err = -EINVAL;
+		goto out;
+	}
+
+	struct rte_tph_info tph = {
+		.cpu_id = config->lcore_id,
+		.cache_level = config->cache_level,
+		.flags = RTE_PCI_TPH_MEM_TYPE_VMEM | RTE_PCI_TPH_HINT_BIDIR,
+		.index = 0,
+	};
+
+	if (!pci_dev->tph_enabled)
+		err = rte_pci_tph_enable(pci_dev, RTE_PCI_TPH_CAP_ST_DS);
+	if (err) {
+		PMD_DRV_LOG(ERR, "Failed enabling TPH");
+		goto out;
+	}
+
+	err = rte_pci_tph_st_get(pci_dev, &tph, 1);
+	if (err) {
+		PMD_DRV_LOG(ERR, "Failed ST read for lcore: %u and cache-level: %u",
+			    tph.cpu_id, tph.cache_level);
+		goto out;
+	}
+
+	switch (hmc_type) {
+	case I40E_HMC_LAN_RX:
+		err = i40e_get_lan_rx_queue_context(hw, queue_id, &rxq);
+		if (err != I40E_SUCCESS) {
+			PMD_DRV_LOG(ERR, "Failed to get LAN RX queue context");
+			goto out;
+		}
+
+		rxq.cpuid = tph.st;
+
+		if (config->objects & RTE_ETH_DEV_STASH_OBJECT_DESC) {
+			rxq.tphrdesc_ena = 1;
+			rxq.tphwdesc_ena = 1;
+		}
+
+		if (config->objects & RTE_ETH_DEV_STASH_OBJECT_PAYLOAD)
+			rxq.tphdata_ena = 1;
+
+		if (config->objects & RTE_ETH_DEV_STASH_OBJECT_HEADER)
+			rxq.tphhead_ena = 1;
+
+		err = i40e_set_lan_rx_queue_context(hw, queue_id, &rxq);
+		if (err != I40E_SUCCESS)
+			PMD_DRV_LOG(ERR, "Failed to set LAN RX queue context");
+		break;
+	case I40E_HMC_LAN_TX:
+		err = i40e_get_lan_tx_queue_context(hw, queue_id, &txq);
+		if (err != I40E_SUCCESS) {
+			PMD_DRV_LOG(ERR, "Failed to get LAN TX queue context");
+			goto out;
+		}
+
+		txq.cpuid = tph.st;
+
+		if (config->objects & RTE_ETH_DEV_STASH_OBJECT_DESC) {
+			txq.tphrdesc_ena = 1;
+			txq.tphwdesc_ena = 1;
+		}
+
+		if (config->objects & (RTE_ETH_DEV_STASH_OBJECT_PAYLOAD |
+			       RTE_ETH_DEV_STASH_OBJECT_HEADER))
+			txq.tphrpacket_ena = 1;
+
+		err = i40e_set_lan_tx_queue_context(hw, queue_id, &txq);
+		if (err != I40E_SUCCESS)
+			PMD_DRV_LOG(ERR, "Failed to set LAN TX queue context");
+		break;
+	default:
+		err = -EINVAL;
+	}
+
+out:
+	return err;
+}
+
+static int
+i40e_stashing_rx_hints_set(struct rte_eth_dev *dev, uint16_t queue_id,
+			  struct rte_eth_stashing_config *config)
+{
+	return i40e_stashing_hints_set(dev, queue_id, config, I40E_HMC_LAN_RX);
+}
+
+static int
+i40e_stashing_tx_hints_set(struct rte_eth_dev *dev, uint16_t queue_id,
+			  struct rte_eth_stashing_config *config)
+{
+	return i40e_stashing_hints_set(dev, queue_id, config, I40E_HMC_LAN_TX);
+}
+
 RTE_LOG_REGISTER_SUFFIX(i40e_logtype_init, init, NOTICE);
 RTE_LOG_REGISTER_SUFFIX(i40e_logtype_driver, driver, NOTICE);
 #ifdef RTE_ETHDEV_DEBUG_RX
-- 
2.43.0



* RE: [PATCH v5 1/4] pci: add non-merged Linux uAPI changes
  2025-06-02 22:38   ` [PATCH v5 1/4] pci: add non-merged Linux uAPI changes Wathsala Vithanage
@ 2025-06-02 23:11     ` Wathsala Wathawana Vithanage
  2025-06-02 23:16       ` Wathsala Wathawana Vithanage
  2025-06-04 20:43     ` Stephen Hemminger
  1 sibling, 1 reply; 33+ messages in thread
From: Wathsala Wathawana Vithanage @ 2025-06-02 23:11 UTC (permalink / raw)
  To: Wathsala Wathawana Vithanage, Chenbo Xia, Nipun Gupta,
	Maxime Coquelin
  Cc: dev@dpdk.org, nd, Dhruv Tripathi, thomas@monjalon.net, nd

Hi Maxime,

Here is the link to the kernel VFIO patch you asked for earlier (v4).
https://lore.kernel.org/kvm/20250221224638.1836909-1-wathsala.vithanage@arm.com/T/#me73cf9b9c87da97d7d9461dfb97863b78ca1755b

Thanks

--wathsala

> -----Original Message-----
> From: Wathsala Vithanage <wathsala.vithanage@arm.com>
> Sent: Monday, June 2, 2025 5:38 PM
> To: Chenbo Xia <chenbox@nvidia.com>; Nipun Gupta <nipun.gupta@amd.com>;
> Maxime Coquelin <maxime.coquelin@redhat.com>
> Cc: dev@dpdk.org; nd <nd@arm.com>; Wathsala Wathawana Vithanage
> <wathsala.vithanage@arm.com>; Dhruv Tripathi <Dhruv.Tripathi@arm.com>
> Subject: [PATCH v5 1/4] pci: add non-merged Linux uAPI changes
> 
> This commit is a hack to prevent build failures the next commit in this patch series
> causes due to missing vfio uapi definitions.
> This commit should NEVER BE MERGED.
> Next commit in this patch series depends on additions to vfio uapi that enable
> TPH ioctl in the vfio-pci driver in the Linux kernel.
> These additions have not yet been merged into the upstream kernel.
> 
> Signed-off-by: Wathsala Vithanage <wathsala.vithanage@arm.com>
> Reviewed-by: Dhruv Tripathi <dhruv.tripathi@arm.com>
> ---
>  drivers/bus/pci/linux/pci_init.h   |   1 +
>  kernel/linux/uapi/linux/vfio_tph.h | 102 +++++++++++++++++++++++++++++
>  2 files changed, 103 insertions(+)
>  create mode 100644 kernel/linux/uapi/linux/vfio_tph.h
> 
> diff --git a/drivers/bus/pci/linux/pci_init.h b/drivers/bus/pci/linux/pci_init.h
> index a4d37c0d0a..25b901f460 100644
> --- a/drivers/bus/pci/linux/pci_init.h
> +++ b/drivers/bus/pci/linux/pci_init.h
> @@ -6,6 +6,7 @@
>  #define EAL_PCI_INIT_H_
> 
>  #include <rte_vfio.h>
> +#include <uapi/linux/vfio_tph.h>
> 
>  #include "private.h"
> 
> diff --git a/kernel/linux/uapi/linux/vfio_tph.h b/kernel/linux/uapi/linux/vfio_tph.h
> new file mode 100644
> index 0000000000..9336c2e5af
> --- /dev/null
> +++ b/kernel/linux/uapi/linux/vfio_tph.h
> @@ -0,0 +1,102 @@
> +
> +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
> +/*
> + * VFIO API definition
> + *
> + * WARNING: CONTENTS OF THIS HEADER NEED TO BE MERGED INTO THE KERNEL'S
> + * uapi/linux/vfio.h IN A FUTURE KERNEL RELEASE. UNTIL THEN IT'S TACKED
> + * ON TO DPDK'S kernel/linux/uapi DIRECTORY TO PREVENT BUILD FAILURES.
> + *
> + * Copyright (C) 2012 Red Hat, Inc.  All rights reserved.
> + *     Author: Alex Williamson <alex.williamson@redhat.com>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + */
> +
> +#ifndef _UAPIVFIO_TPH_H
> +#define _UAPIVFIO_TPH_H
> +
> +/**
> + * VFIO_DEVICE_PCI_TPH	- _IO(VFIO_TYPE, VFIO_BASE + 22)
> + *
> + * This command is used to control PCIe TLP Processing Hints (TPH)
> + * capability in a PCIe device.
> + * It supports the following operations on a PCIe device with respect to the
> + * TPH capability.
> + *
> + * - Enabling/disabling TPH capability in a PCIe device.
> + *
> + *   Setting VFIO_DEVICE_TPH_ENABLE flag enables TPH in no-steering-tag,
> + *   interrupt-vector, or device-specific mode defined in the PCIe specification
> + *   when feature flags TPH_ST_NS_MODE, TPH_ST_IV_MODE, and TPH_ST_DS_MODE are
> + *   set respectively. TPH_ST_xx_MODE macros are defined in
> + *   uapi/linux/pci_regs.h.
> + *
> + *   VFIO_DEVICE_TPH_DISABLE disables PCIe TPH on the device.
> + *
> + * - Writing STs to MSI-X or ST table in a PCIe device.
> + *
> + *   VFIO_DEVICE_TPH_SET_ST flag sets steering tags on a device at an index in
> + *   MSI-X or ST-table depending on the VFIO_TPH_ST_x_MODE flag used and device
> + *   capabilities. The caller can set one or more steering tags by passing an
> + *   array of vfio_pci_tph_entry objects containing cpu_id, cache_level, and
> + *   MSI-X/ST-table index. The caller can also set the intended memory type and
> + *   the processing hint by setting VFIO_TPH_MEM_TYPE_x and VFIO_TPH_HINT_x
> + *   flags, respectively.
> + *
> + * - Reading Steering Tags (ST) from the host platform.
> + *
> + *   VFIO_DEVICE_TPH_GET_ST flag returns steering tags to the caller. Caller
> + *   can request one or more steering tags by passing an array of
> + *   vfio_pci_tph_entry objects. Steering Tag for each request is returned via
> + *   the st field in vfio_pci_tph_entry.
> + */
> +struct vfio_pci_tph_entry {
> +	/* in */
> +	__u32 cpu_id;			/* CPU logical ID */
> +	__u32 cache_level;		/* Cache level. L1D = 0, L2D = 1, ... */
> +	__u8  flags;
> +#define VFIO_TPH_MEM_TYPE_MASK		0x1
> +#define VFIO_TPH_MEM_TYPE_SHIFT		0
> +#define VFIO_TPH_MEM_TYPE_VMEM		0   /* Request volatile memory ST */
> +#define VFIO_TPH_MEM_TYPE_PMEM		1   /* Request persistent memory ST */
> +
> +#define VFIO_TPH_HINT_MASK		0x3
> +#define VFIO_TPH_HINT_SHIFT		1
> +#define VFIO_TPH_HINT_BIDIR		0
> +#define VFIO_TPH_HINT_REQSTR		(1 << VFIO_TPH_HINT_SHIFT)
> +#define VFIO_TPH_HINT_TARGET		(2 << VFIO_TPH_HINT_SHIFT)
> +#define VFIO_TPH_HINT_TARGET_PRIO	(3 << VFIO_TPH_HINT_SHIFT)
> +	__u8  pad0;
> +	__u16 index;			/* MSI-X/ST-table index to set ST */
> +	/* out */
> +	__u16 st;			/* Steering-Tag */
> +	__u8  ph_ignore;		/* Platform ignored the Processing Hint */
> +	__u8  pad1;
> +};
> +
> +struct vfio_pci_tph {
> +	__u32 argsz;			/* Size of vfio_pci_tph and ents[] */
> +	__u32 flags;
> +#define VFIO_TPH_ST_MODE_MASK		0x7
> +
> +#define VFIO_DEVICE_TPH_OP_SHIFT	3
> +#define VFIO_DEVICE_TPH_OP_MASK		(0x7 << VFIO_DEVICE_TPH_OP_SHIFT)
> +/* Enable TPH on device */
> +#define VFIO_DEVICE_TPH_ENABLE		0
> +/* Disable TPH on device */
> +#define VFIO_DEVICE_TPH_DISABLE		(1 << VFIO_DEVICE_TPH_OP_SHIFT)
> +/* Get steering-tags */
> +#define VFIO_DEVICE_TPH_GET_ST		(2 << VFIO_DEVICE_TPH_OP_SHIFT)
> +/* Set steering-tags */
> +#define VFIO_DEVICE_TPH_SET_ST		(4 << VFIO_DEVICE_TPH_OP_SHIFT)
> +	__u32 count;			/* Number of entries in ents[] */
> +	struct vfio_pci_tph_entry ents[];
> +#define VFIO_TPH_INFO_MAX	2048	/* Max entries in ents[] */
> +};
> +
> +#define VFIO_DEVICE_PCI_TPH	_IO(VFIO_TYPE, VFIO_BASE + 22)
> +
> +#endif /* _UAPIVFIO_TPH_H */
> --
> 2.43.0



* RE: [PATCH v5 1/4] pci: add non-merged Linux uAPI changes
  2025-06-02 23:11     ` Wathsala Wathawana Vithanage
@ 2025-06-02 23:16       ` Wathsala Wathawana Vithanage
  0 siblings, 0 replies; 33+ messages in thread
From: Wathsala Wathawana Vithanage @ 2025-06-02 23:16 UTC (permalink / raw)
  To: Chenbo Xia, Nipun Gupta, Maxime Coquelin
  Cc: dev@dpdk.org, nd, Dhruv Tripathi, thomas@monjalon.net, nd, nd

 
> Hi Maxime,
> 
> Here is the link to the kernel VFIO patch you asked for earlier (v4).
> https://lore.kernel.org/kvm/20250221224638.1836909-1-wathsala.vithanage@arm.com/T/#me73cf9b9c87da97d7d9461dfb97863b78ca1755b
> 

Sorry, here is the correct link to V2 of the VFIO patch
https://lore.kernel.org/kvm/20250221224638.1836909-1-wathsala.vithanage@arm.com/T/#me26a385f781002dbe330ce03a63cb20884200e91

--wathsala



* RE: [PATCH v5 2/4] bus/pci: introduce the PCIe TLP Processing Hints API
  2025-06-02 22:38   ` [PATCH v5 2/4] bus/pci: introduce the PCIe TLP Processing Hints API Wathsala Vithanage
@ 2025-06-03  8:11     ` Morten Brørup
  2025-06-04 16:54     ` Bruce Richardson
  2025-06-05 10:30     ` Bruce Richardson
  2 siblings, 0 replies; 33+ messages in thread
From: Morten Brørup @ 2025-06-03  8:11 UTC (permalink / raw)
  To: Wathsala Vithanage, Chenbo Xia, Nipun Gupta, Anatoly Burakov,
	Gaetan Rivet
  Cc: dev, nd, Honnappa Nagarahalli, Dhruv Tripathi

> From: Wathsala Vithanage [mailto:wathsala.vithanage@arm.com]
> Sent: Tuesday, 3 June 2025 00.38

Some nitpicking inline below.

[...]

> diff --git a/drivers/bus/pci/bsd/pci.c b/drivers/bus/pci/bsd/pci.c
> index 5e2e09d5a4..dff750c4d6 100644
> --- a/drivers/bus/pci/bsd/pci.c
> +++ b/drivers/bus/pci/bsd/pci.c
> @@ -650,3 +650,46 @@ rte_pci_ioport_unmap(struct rte_pci_ioport *p)
> 
>  	return ret;
>  }
> +
> +RTE_EXPORT_EXPERIMENTAL_SYMBOL(rte_pci_tph_enable, 25.07)
> +int
> +rte_pci_tph_enable(struct rte_pci_device *dev, int mode)
> +{
> +	RTE_SET_USED(dev);
> +	RTE_SET_USED(mode);
> +	/* This feature is not yet implemented for BSD */
> +	return -1;

Either set rte_errno and return -1, or return -ENOTSUP.
If one of these two design patterns is already used in a module, please use the same design pattern for new functions.
Applies to all dummy functions, BSD and Windows.
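
For illustration, the two conventions could look like the sketch below. It is a minimal sketch under stated assumptions: plain errno stands in for rte_errno so it builds without DPDK, and the stub names are hypothetical, not functions from this patch.

```c
#include <errno.h>

/*
 * Pattern 1 (hypothetical stub): record the reason in the error variable
 * and return -1. Plain errno is an assumption standing in for rte_errno.
 */
static int
tph_enable_stub_errno(void)
{
	errno = ENOTSUP;
	return -1;
}

/*
 * Pattern 2 (hypothetical stub): return the negative error code directly,
 * leaving the error variable untouched.
 */
static int
tph_enable_stub_neg(void)
{
	return -ENOTSUP;
}
```

With either convention the caller can tell "not implemented on this OS" apart from other failures, which a bare -1 without an error code does not allow.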

[...]

> diff --git a/drivers/bus/pci/bus_pci_driver.h b/drivers/bus/pci/bus_pci_driver.h
> index 2cc1119072..b1c2829fc1 100644
> --- a/drivers/bus/pci/bus_pci_driver.h
> +++ b/drivers/bus/pci/bus_pci_driver.h
> @@ -46,6 +46,7 @@ struct rte_pci_device {
>  	char *bus_info;                     /**< PCI bus specific info */
>  	struct rte_intr_handle *vfio_req_intr_handle;
>  				/**< Handler of VFIO request interrupt */
> +	uint8_t tph_enabled;                /**< TPH enabled on this device */
>  };
> 
>  /**
> @@ -194,6 +195,57 @@ struct rte_pci_ioport {
>  	uint64_t len; /* only filled for memory mapped ports */
>  };
> 
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this structure may change, or be removed, without prior
> + * notice
> + *
> + * This structure is passed into the TPH Steering-Tag set or get function as an
> + * argument by the caller. Return values are set in the same structure in st and
> + * ph_ignore fields by the callee.
> + *
> + * Refer to PCI-SIG ECN "Revised _DSM for Cache Locality TPH Features" for
> + * details.
> + */
> +struct rte_tph_info {
> +	/* Input */
> +	uint32_t cpu_id;	/*Logical CPU id*/
> +	uint32_t cache_level;	/*Cache level relative to CPU. l1d=0,l2d=1,...*/
> +	uint8_t flags;		/*Memory type, processing hint etc.*/
> +	uint16_t index;		/*Index in vector table to store the ST*/
> +
> +	/* Output */

I don't think these are Output fields when used in the set() function.
Please update the documentation of this structure accordingly.

> +	uint16_t st;		/*Steering tag returned by the platform*/
> +	uint8_t ph_ignore;	/*Platform ignores PH for the returned ST*/
> +};
> +
> +#define RTE_PCI_TPH_MEM_TYPE_MASK		0x1
> +#define RTE_PCI_TPH_MEM_TYPE_SHIFT		0
> +/** Request volatile memory ST */
> +#define RTE_PCI_TPH_MEM_TYPE_VMEM		0
> +/** Request persistent memory ST */
> +#define RTE_PCI_TPH_MEM_TYPE_PMEM		1
> +
> +/** TLP Processing Hints - PCIe 6.0 specification section 2.2.7.1.1 */
> +#define RTE_PCI_TPH_HINT_MASK		0x3
> +#define RTE_PCI_TPH_HINT_SHIFT		1
> +/** Host and device access data equally */
> +#define RTE_PCI_TPH_HINT_BIDIR		0
> +/** Device accesses data more frequently */
> +#define RTE_PCI_TPH_HINT_REQSTR		(1 << RTE_PCI_TPH_HINT_SHIFT)
> +/** Host accesses data more frequently */
> +#define RTE_PCI_TPH_HINT_TARGET		(2 << RTE_PCI_TPH_HINT_SHIFT)
> +/** Host accesses data more frequently with a high temporal locality */
> +#define RTE_PCI_TPH_HINT_TARGET_PRIO	(3 << RTE_PCI_TPH_HINT_SHIFT)
> +
> +#define RTE_PCI_TPH_ST_MODE_MASK   0x3
> +/** TPH no ST mode */
> +#define RTE_PCI_TPH_ST_NS_MODE	   0
> +/** TPH interrupt vector mode */
> +#define RTE_PCI_TPH_ST_IV_MODE	   1
> +/** TPH device specific mode */
> +#define RTE_PCI_TPH_ST_DS_MODE	   2
> +
>  #ifdef __cplusplus
>  }
>  #endif
> diff --git a/drivers/bus/pci/linux/pci.c b/drivers/bus/pci/linux/pci.c
> index c20d159218..b5a8ba0a86 100644
> --- a/drivers/bus/pci/linux/pci.c
> +++ b/drivers/bus/pci/linux/pci.c
> @@ -814,3 +814,103 @@ rte_pci_ioport_unmap(struct rte_pci_ioport *p)
> 
>  	return ret;
>  }
> +
> +RTE_EXPORT_EXPERIMENTAL_SYMBOL(rte_pci_tph_enable, 25.07)
> +int
> +rte_pci_tph_enable(struct rte_pci_device *dev, int mode)
> +{
> +	int ret = 0;
> +
> +	switch (dev->kdrv) {
> +#ifdef VFIO_PRESENT
> +	case RTE_PCI_KDRV_VFIO:
> +		if (pci_vfio_is_enabled())
> +			ret = pci_vfio_tph_enable(dev, mode);

Is it correct to return 0 when pci_vfio_is_enabled() returns false?

> +		break;
> +#endif
> +	case RTE_PCI_KDRV_IGB_UIO:

Please add "/* fall through */" or similar compiler hint.
Or even better, define a new __rte_fallthrough macro as __attribute__((fallthrough)) in rte_common.h.
Refer to [1].

[1]: https://gcc.gnu.org/onlinedocs/gcc/Statement-Attributes.html
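
A minimal sketch of what such a macro and its use could look like follows. The macro name is the one proposed above and is an assumption, not an existing definition in rte_common.h as far as this thread shows; the enum and function names are hypothetical stand-ins for the kdrv switch in the patch.

```c
#include <errno.h>

/*
 * Proposed macro (assumption): expand to the statement attribute when the
 * compiler supports it, otherwise to a harmless no-op expression.
 */
#if defined(__has_attribute)
#  if __has_attribute(fallthrough)
#    define __rte_fallthrough __attribute__((fallthrough))
#  endif
#endif
#ifndef __rte_fallthrough
#  define __rte_fallthrough ((void)0)
#endif

/* Hypothetical stand-in for the kernel driver types used in the patch. */
enum kdrv_sketch {
	KDRV_SKETCH_VFIO,
	KDRV_SKETCH_IGB_UIO,
	KDRV_SKETCH_UIO_GENERIC,
};

static int
tph_enable_sketch(enum kdrv_sketch kdrv)
{
	int ret = 0;

	switch (kdrv) {
	case KDRV_SKETCH_VFIO:
		/* The real code would call into the VFIO backend here. */
		break;
	case KDRV_SKETCH_IGB_UIO:
		__rte_fallthrough;
	case KDRV_SKETCH_UIO_GENERIC:
	default:
		ret = -ENOTSUP;
		break;
	}

	return ret;
}
```

The attribute form lets the compiler check the intent, whereas a comment relies on pattern matching that differs between compilers.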

> +	case RTE_PCI_KDRV_UIO_GENERIC:
> +	default:
> +		ret = -ENOTSUP;
> +		break;
> +	}
> +
> +	if (!ret)

This looks wrong. "ret" is not Boolean, so please use "ret == 0" instead of "!ret".
Applies elsewhere in this patch too, e.g. comparing empty "count" in pci_vfio_tph_st_op().

> +		dev->tph_enabled = 1;
> +
> +	return ret;
> +}
> +

The above comments apply to all four functions.

[...]

> diff --git a/drivers/bus/pci/linux/pci_vfio.c b/drivers/bus/pci/linux/pci_vfio.c
> index 5317170231..bdbeb38658 100644
> --- a/drivers/bus/pci/linux/pci_vfio.c
> +++ b/drivers/bus/pci/linux/pci_vfio.c
> @@ -12,6 +12,7 @@
>  #include <stdbool.h>
> 
>  #include <rte_log.h>
> +#include <eal_export.h>
>  #include <rte_pci.h>
>  #include <rte_bus_pci.h>
>  #include <rte_eal_paging.h>
> @@ -1316,6 +1317,175 @@ pci_vfio_mmio_write(const struct rte_pci_device *dev, int bar,
>  	return pwrite(fd, buf, len, offset + offs);
>  }
> 
> +static int
> +pci_vfio_tph_ioctl(const struct rte_pci_device *dev, struct vfio_pci_tph *pci_tph)
> +{
> +	const struct rte_intr_handle *intr_handle = dev->intr_handle;
> +	int vfio_dev_fd = 0, ret = 0;
> +
> +	vfio_dev_fd = rte_intr_dev_fd_get(intr_handle);
> +	if (vfio_dev_fd < 0) {
> +		ret = -EINVAL;

This should be ret = rte_errno.
Or even better: Other code in /drivers/bus/pci/linux/pci_vfio.c follows the design pattern of setting rte_errno and returning -1 on error. Please use the same design pattern.
Applies to all four functions.

> +		goto out;
> +	}
> +
> +	ret = ioctl(vfio_dev_fd, VFIO_DEVICE_PCI_TPH, pci_tph);
> +out:
> +	return ret;
> +}
> +
> +static int
> +pci_vfio_tph_st_op(const struct rte_pci_device *dev,
> +		    struct rte_tph_info *info, size_t count,
> +		    enum rte_pci_st_op op)
> +{
> +	int ret = 0;
> +	size_t argsz = 0, i;
> +	struct vfio_pci_tph *pci_tph = NULL;
> +	uint8_t mem_type = 0, hint = 0;
> +
> +	if (!count) {
> +		ret = -EINVAL;
> +		goto out;
> +	}
> +
> +	argsz = sizeof(struct vfio_pci_tph) +
> +		count * sizeof(struct vfio_pci_tph_entry);
> +
> +	pci_tph = rte_zmalloc(NULL, argsz, 0);
> +	if (!pci_tph) {
> +		ret = -ENOMEM;
> +		goto out;
> +	}
> +
> +	pci_tph->argsz = argsz;
> +	pci_tph->count = count;
> +
> +	switch (op) {
> +	case RTE_PCI_TPH_ST_GET:
> +		pci_tph->flags = VFIO_DEVICE_TPH_GET_ST;
> +		break;
> +	case RTE_PCI_TPH_ST_SET:
> +		pci_tph->flags = VFIO_DEVICE_TPH_SET_ST;
> +		break;
> +	default:
> +		ret = -EINVAL;
> +		goto out;
> +	}
> +
> +	for (i = 0; i < count; i++) {
> +		pci_tph->ents[i].cpu_id = info[i].cpu_id;
> +		pci_tph->ents[i].cache_level = info[i].cache_level;
> +
> +		mem_type = info[i].flags & RTE_PCI_TPH_MEM_TYPE_MASK;
> +		switch (mem_type) {
> +		case RTE_PCI_TPH_MEM_TYPE_VMEM:
> +			pci_tph->ents[i].flags |= VFIO_TPH_MEM_TYPE_VMEM;
> +			break;
> +		case RTE_PCI_TPH_MEM_TYPE_PMEM:
> +			pci_tph->ents[i].flags |= VFIO_TPH_MEM_TYPE_PMEM;
> +			break;
> +		default:
> +			ret = -EINVAL;
> +			goto out;
> +		}
> +
> +		hint = info[i].flags &
> +		       (RTE_PCI_TPH_HINT_MASK << RTE_PCI_TPH_HINT_SHIFT);
> +		switch (hint) {
> +		case RTE_PCI_TPH_HINT_BIDIR:
> +			pci_tph->ents[i].flags |= VFIO_TPH_HINT_BIDIR;
> +			break;
> +		case RTE_PCI_TPH_HINT_REQSTR:
> +			pci_tph->ents[i].flags |= VFIO_TPH_HINT_REQSTR;
> +			break;
> +		case RTE_PCI_TPH_HINT_TARGET:
> +			pci_tph->ents[i].flags |= VFIO_TPH_HINT_TARGET;
> +			break;
> +		case RTE_PCI_TPH_HINT_TARGET_PRIO:
> +			pci_tph->ents[i].flags |= VFIO_TPH_HINT_TARGET_PRIO;
> +			break;
> +		default:
> +			ret = -EINVAL;
> +			goto out;
> +		}
> +
> +		if (op == RTE_PCI_TPH_ST_SET)
> +			pci_tph->ents[i].index = info[i].index;
> +	}
> +
> +	ret = pci_vfio_tph_ioctl(dev, pci_tph);
> +	if (ret)
> +		goto out;
> +
> +	/*
> +	 * Kernel returns steering-tag and ph-ignore bits for
> +	 * RTE_PCI_TPH_ST_SET too, therefore copy output for
> +	 * both RTE_PCI_TPH_ST_SET and RTE_PCI_TPH_ST_GET
> +	 * cases.
> +	 */
> +	for (i = 0; i < count; i++) {
> +		info[i].st = pci_tph->ents[i].st;
> +		info[i].ph_ignore = pci_tph->ents[i].ph_ignore;
> +	}
> +
> +out:
> +	if (pci_tph)
> +		rte_free(pci_tph);
> +	return ret;
> +}
> +
> +RTE_EXPORT_EXPERIMENTAL_SYMBOL(pci_vfio_tph_enable, 25.07)
> +int
> +pci_vfio_tph_enable(const struct rte_pci_device *dev, int mode)
> +{
> +	int ret;
> +
> +	if (mode & ~VFIO_TPH_ST_MODE_MASK) {
> +		ret = -EINVAL;
> +		goto out;
> +	} else
> +		mode &= VFIO_TPH_ST_MODE_MASK;
> +
> +	struct vfio_pci_tph pci_tph = {
> +		.argsz = sizeof(struct vfio_pci_tph),
> +		.flags = VFIO_DEVICE_TPH_ENABLE | mode,
> +		.count = 0
> +	};
> +
> +	ret = pci_vfio_tph_ioctl(dev, &pci_tph);
> +out:
> +	return ret;
> +}
> +
> +RTE_EXPORT_EXPERIMENTAL_SYMBOL(pci_vfio_tph_disable, 25.07)
> +int
> +pci_vfio_tph_disable(const struct rte_pci_device *dev)
> +{
> +	struct vfio_pci_tph pci_tph = {
> +		.argsz = sizeof(struct vfio_pci_tph),
> +		.flags = VFIO_DEVICE_TPH_DISABLE,
> +		.count = 0
> +	};
> +
> +	return pci_vfio_tph_ioctl(dev, &pci_tph);
> +}
> +
> +RTE_EXPORT_EXPERIMENTAL_SYMBOL(pci_vfio_tph_st_get, 25.07)
> +int
> +pci_vfio_tph_st_get(const struct rte_pci_device *dev,
> +		    struct rte_tph_info *info, size_t count)
> +{
> +	return pci_vfio_tph_st_op(dev, info, count, RTE_PCI_TPH_ST_GET);
> +}
> +
> +RTE_EXPORT_EXPERIMENTAL_SYMBOL(pci_vfio_tph_st_set, 25.07)
> +int
> +pci_vfio_tph_st_set(const struct rte_pci_device *dev,
> +		    struct rte_tph_info *info, size_t count)
> +{
> +	return pci_vfio_tph_st_op(dev, info, count, RTE_PCI_TPH_ST_SET);
> +}
> +
>  int
>  pci_vfio_is_enabled(void)
>  {
> diff --git a/drivers/bus/pci/private.h b/drivers/bus/pci/private.h
> index 38109844b9..d2ec370320 100644
> --- a/drivers/bus/pci/private.h
> +++ b/drivers/bus/pci/private.h
> @@ -335,4 +335,12 @@ rte_pci_dev_iterate(const void *start,
>  int
>  rte_pci_devargs_parse(struct rte_devargs *da);
> 
> +/*
> + * TPH Steering-Tag operation types.
> + */
> +enum rte_pci_st_op {
> +	RTE_PCI_TPH_ST_SET, /* Set TPH Steering - Tags */
> +	RTE_PCI_TPH_ST_GET  /* Get TPH Steering - Tags */
> +};
> +
>  #endif /* _PCI_PRIVATE_H_ */
> diff --git a/drivers/bus/pci/rte_bus_pci.h b/drivers/bus/pci/rte_bus_pci.h
> index 19a7b15b99..e4d4780f54 100644
> --- a/drivers/bus/pci/rte_bus_pci.h
> +++ b/drivers/bus/pci/rte_bus_pci.h
> @@ -31,6 +31,7 @@ extern "C" {
>  struct rte_pci_device;
>  struct rte_pci_driver;
>  struct rte_pci_ioport;
> +struct rte_tph_info;
> 
>  struct rte_devargs;
> 
> @@ -312,6 +313,72 @@ void rte_pci_ioport_read(struct rte_pci_ioport *p,
>  void rte_pci_ioport_write(struct rte_pci_ioport *p,
>  		const void *data, size_t len, off_t offset);
> 
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * Enable TLP Processing Hints (TPH) in the endpoint device.
> + *
> + * @param dev
> + *   A pointer to a rte_pci_device structure describing the device
> + *   to use.
> + * @param mode
> + *   TPH mode the device must operate in.

The values of the "mode" parameter should be well defined in a DPDK public API.
I cannot find them.

The return value is missing from the documentation.
For all four functions here.
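
As a rough sketch of what a well-defined mode parameter could look like: the values below mirror RTE_PCI_TPH_ST_NS_MODE, RTE_PCI_TPH_ST_IV_MODE, RTE_PCI_TPH_ST_DS_MODE and RTE_PCI_TPH_ST_MODE_MASK from bus_pci_driver.h in this patch, while the _SKETCH names and the helper function are hypothetical.

```c
#include <errno.h>

/* Values copied from the patch; the _SKETCH names are hypothetical. */
#define TPH_ST_MODE_MASK_SKETCH 0x3

enum tph_st_mode_sketch {
	TPH_ST_NS_MODE_SKETCH = 0, /* no steering-tag mode */
	TPH_ST_IV_MODE_SKETCH = 1, /* interrupt vector mode */
	TPH_ST_DS_MODE_SKETCH = 2, /* device specific mode */
};

/*
 * Hypothetical helper: return 0 for a documented mode and -EINVAL for
 * anything else, including values with bits outside the mode mask.
 */
static int
tph_mode_check_sketch(int mode)
{
	if (mode & ~TPH_ST_MODE_MASK_SKETCH)
		return -EINVAL;

	switch (mode) {
	case TPH_ST_NS_MODE_SKETCH:
	case TPH_ST_IV_MODE_SKETCH:
	case TPH_ST_DS_MODE_SKETCH:
		return 0;
	default:
		return -EINVAL;
	}
}
```

Documenting the accepted values next to the prototype, together with the return codes, would let callers validate the mode without reading the VFIO backend.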

> + */
> +__rte_experimental
> +int rte_pci_tph_enable(struct rte_pci_device *dev, int mode);
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * Disable TLP Processing Hints (TPH) in the endpoint device.
> + *
> + * @param dev
> + *   A pointer to a rte_pci_device structure describing the device
> + *   to use.
> + */
> +__rte_experimental
> +int rte_pci_tph_disable(struct rte_pci_device *dev);
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * Get PCI Steering-Tags (STs) for a list of stashing targets.
> + *
> + * @param dev
> + *   A pointer to a rte_pci_device structure describing the device
> + *   to use.
> + * @param info
> + *   An array of rte_tph_info objects, each describing the target
> + *   cpu-id, cache-level, etc. Steering-tags for each target are
> + *   returned via info array.
> + * @param count
> + *   The number of elements in the info array.
> + */
> +__rte_experimental
> +int rte_pci_tph_st_get(const struct rte_pci_device *dev,
> +		struct rte_tph_info *info, size_t count);
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * Set PCI Steering-Tags (STs) for a list of stashing targets.
> + *
> + * @param dev
> + *   A pointer to a rte_pci_device structure describing the device
> + *   to use.
> + * @param info
> + *   An array of rte_tph_info objects, each describing the target
> + *   cpu-id, cache-level, etc. Steering-tags for each target are
> + *   returned via the info array.
> + * @param count
> + *   The number of elements in the info array.
> + */
> +__rte_experimental
> +int rte_pci_tph_st_set(const struct rte_pci_device *dev,
> +		struct rte_tph_info *info, size_t count);
> +
>  #ifdef __cplusplus
>  }
>  #endif
> diff --git a/drivers/bus/pci/windows/pci.c b/drivers/bus/pci/windows/pci.c
> index e7e449306e..218e667a5a 100644
> --- a/drivers/bus/pci/windows/pci.c
> +++ b/drivers/bus/pci/windows/pci.c
> @@ -511,3 +511,46 @@ rte_pci_scan(void)
> 
>  	return ret;
>  }
> +
> +RTE_EXPORT_EXPERIMENTAL_SYMBOL(rte_pci_tph_enable, 25.07)
> +int
> +rte_pci_tph_enable(struct rte_pci_device *dev, int mode)
> +{
> +	RTE_SET_USED(dev);
> +	RTE_SET_USED(mode);
> +	/* This feature is not yet implemented for windows */
> +	return -1;
> +}
> +
> +RTE_EXPORT_EXPERIMENTAL_SYMBOL(rte_pci_tph_disable, 25.07)
> +int
> +rte_pci_tph_disable(struct rte_pci_device *dev)
> +{
> +	RTE_SET_USED(dev);
> +	/* This feature is not yet implemented for windows */
> +	return -1;
> +}
> +
> +RTE_EXPORT_EXPERIMENTAL_SYMBOL(rte_pci_tph_st_get, 25.07)
> +int
> +rte_pci_tph_st_get(const struct rte_pci_device *dev,
> +		   struct rte_tph_info *info, size_t count)
> +{
> +	RTE_SET_USED(dev);
> +	RTE_SET_USED(info);
> +	RTE_SET_USED(count);
> +	/* This feature is not yet implemented for windows */
> +	return -1;
> +}
> +
> +RTE_EXPORT_EXPERIMENTAL_SYMBOL(rte_pci_tph_st_set, 25.07)
> +int
> +rte_pci_tph_st_set(const struct rte_pci_device *dev,
> +		   struct rte_tph_info *info, size_t count)
> +{
> +	RTE_SET_USED(dev);
> +	RTE_SET_USED(info);
> +	RTE_SET_USED(count);
> +	/* This feature is not yet implemented for windows */
> +	return -1;
> +}
> diff --git a/lib/pci/rte_pci.h b/lib/pci/rte_pci.h
> index 9a50a12142..da9cd666bf 100644
> --- a/lib/pci/rte_pci.h
> +++ b/lib/pci/rte_pci.h
> @@ -137,6 +137,21 @@ extern "C" {
>  /* Process Address Space ID (RTE_PCI_EXT_CAP_ID_PASID) */
>  #define RTE_PCI_PASID_CTRL		0x06    /* PASID control register */
> 
> +/* TPH Requester */
> +#define RTE_PCI_TPH_CAP            4       /* capability register */
> +#define RTE_PCI_TPH_CAP_ST_NS      0x00000001 /* No ST Mode Supported */
> +#define RTE_PCI_TPH_CAP_ST_IV      0x00000002 /* Interrupt Vector Mode Supported */
> +#define RTE_PCI_TPH_CAP_ST_DS      0x00000004 /* Device Specific Mode Supported */
> +#define RTE_PCI_TPH_CAP_EXT_TPH    0x00000100 /* Ext TPH Requester Supported */
> +#define RTE_PCI_TPH_CAP_LOC_MASK   0x00000600 /* ST Table Location */
> +#define RTE_PCI_TPH_LOC_NONE       0x00000000 /* Not present */
> +#define RTE_PCI_TPH_LOC_CAP        0x00000200 /* In capability */
> +#define RTE_PCI_TPH_LOC_MSIX       0x00000400 /* In MSI-X */
> +#define RTE_PCI_TPH_CAP_ST_MASK    0x07FF0000 /* ST Table Size */
> +#define RTE_PCI_TPH_CAP_ST_SHIFT   16      /* ST Table Size shift */
> +#define RTE_PCI_TPH_BASE_SIZEOF    0xc     /* Size with no ST table */
> +
> +
>  /** Formatting string for PCI device identifier: Ex: 0000:00:01.0 */
>  #define PCI_PRI_FMT "%.4" PRIx32 ":%.2" PRIx8 ":%.2" PRIx8 ".%" PRIx8
>  #define PCI_PRI_STR_SIZE sizeof("XXXXXXXX:XX:XX.X")
> --
> 2.43.0



* RE: [PATCH v5 3/4] ethdev: introduce the cache stashing hints API
  2025-06-02 22:38   ` [PATCH v5 3/4] ethdev: introduce the cache stashing hints API Wathsala Vithanage
@ 2025-06-03  8:43     ` Morten Brørup
  2025-06-05 10:03     ` Bruce Richardson
  1 sibling, 0 replies; 33+ messages in thread
From: Morten Brørup @ 2025-06-03  8:43 UTC (permalink / raw)
  To: Wathsala Vithanage, Thomas Monjalon, Ferruh Yigit,
	Andrew Rybchenko
  Cc: dev, nd, Honnappa Nagarahalli, Dhruv Tripathi

> From: Wathsala Vithanage [mailto:wathsala.vithanage@arm.com]
> Sent: Tuesday, 3 June 2025 00.38
> 
> Extend the ethdev library to enable the stashing of different data
> objects, such as the ones listed below, into CPU caches directly
> from the NIC.
> 
> - Rx/Tx queue descriptors
> - Rx packets
> - Packet headers
> - Packet payloads
> - Data of a packet at an offset from the start of the packet
> 
> The APIs are designed in a hardware/vendor agnostic manner such that
> supporting PMDs could use any capabilities available in the underlying
> hardware for fine-grained stashing of data objects into a CPU cache.
> 
> The API provides an interface to query the availability of stashing
> capabilities, i.e., platform/NIC support, stashable object types, etc,
> via the rte_eth_dev_stashing_capabilities_get interface.
> 
> The function pair rte_eth_dev_stashing_rx_config_set and
> rte_eth_dev_stashing_tx_config_set sets the stashing hint (the CPU,
> cache level, and data object types) on the Rx and Tx queues.
> 
> PMDs that support stashing must register their implementations with the
> following eth_dev_ops callbacks, which are invoked by the ethdev
> functions listed above.
> 
> - stashing_capabilities_get
> - stashing_rx_hints_set
> - stashing_tx_hints_set
> 
> Signed-off-by: Wathsala Vithanage <wathsala.vithanage@arm.com>
> Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> Reviewed-by: Dhruv Tripathi <dhruv.tripathi@arm.com>

Nitpicking:

In DPDK, "objects" usually refer to mbufs, mempool elements or similar.
So please use the term "flags" instead of "objects". Also in the macros, definitions and comments.

DPDK usually uses 64-bit flags, so they should be uint64_t, unless there is a fast-path performance reason to make them shorter.

If you use the same flags for both RX and TX, instead of providing separate definitions for TX flags, the descriptions of the flags should reflect this.

The capabilities APIs should output the list/range of valid "offset" values for RX and TX.

And if the offset cannot be negative, please use an unsigned type for it.

The lib/ethdev/rte_ethdev.c follows the design pattern of setting rte_errno and returning -1 on error.
Please follow the same design pattern in the new functions added to the rte_ethdev library.
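
To make the suggestions above a bit more concrete, here is a minimal sketch of a 64-bit flags field, an unsigned offset with a capability-reported maximum, and the "set an errno value, return -1" pattern. Everything named demo_* or DEMO_* is hypothetical, and the standard errno is used purely as a stand-in for rte_errno so the snippet stays self-contained.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag bits; placeholders, not the macros from the patch. */
#define DEMO_STASH_FLAG_DESC    (UINT64_C(1) << 0)
#define DEMO_STASH_FLAG_HEADER  (UINT64_C(1) << 1)
#define DEMO_STASH_FLAG_OFFSET  (UINT64_C(1) << 2)

/* Capabilities report the supported flags and the valid offset range. */
struct demo_stashing_caps {
	uint64_t supported_flags;
	uint32_t offset_max;       /* largest valid offset, in bytes */
};

/* Same flags are used for Rx and Tx in this sketch. */
struct demo_stashing_config {
	uint64_t flags;
	uint32_t offset;           /* unsigned: an offset cannot be negative */
};

/* Returns 0 on success; on error sets errno and returns -1. */
int
demo_stashing_config_check(const struct demo_stashing_caps *caps,
		const struct demo_stashing_config *cfg)
{
	if (caps == NULL || cfg == NULL) {
		errno = EINVAL;
		return -1;
	}
	/* Reject any flag the device did not advertise. */
	if ((cfg->flags & ~caps->supported_flags) != 0) {
		errno = ENOTSUP;
		return -1;
	}
	/* The offset only matters when the offset flag is requested. */
	if ((cfg->flags & DEMO_STASH_FLAG_OFFSET) != 0 &&
			cfg->offset > caps->offset_max) {
		errno = ERANGE;
		return -1;
	}
	return 0;
}
```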



* Re: [PATCH v5 0/4] An API for Cache Stashing with TPH
  2025-06-02 22:38 ` [PATCH v5 0/4] An API for Cache Stashing with TPH Wathsala Vithanage
                     ` (3 preceding siblings ...)
  2025-06-02 22:38   ` [PATCH v5 4/4] net/i40e: enable TPH in i40e Wathsala Vithanage
@ 2025-06-04 16:51   ` Stephen Hemminger
  2025-06-04 22:24     ` Wathsala Wathawana Vithanage
  2026-01-08  0:30   ` fengchengwen
  5 siblings, 1 reply; 33+ messages in thread
From: Stephen Hemminger @ 2025-06-04 16:51 UTC (permalink / raw)
  To: Wathsala Vithanage; +Cc: dev, nd

On Mon,  2 Jun 2025 22:38:00 +0000
Wathsala Vithanage <wathsala.vithanage@arm.com> wrote:

> Today, DPDK applications benefit from Direct Cache Access (DCA) features
> like Intel DDIO and Arm's write-allocate-to-SLC. However, those features
> do not allow fine-grained control of direct cache access, such as
> stashing packets into upper-level caches (L2 caches) of a processor or
> the shared cache of a chiplet. PCIe TLP Processing Hints (TPH) addresses
> this need in a vendor-agnostic manner. TPH capability has existed since
> PCI Express Base Specification revision 3.0; today, numerous Network
> Interface Cards and interconnects from different vendors support TPH
> capability. TPH comprises a steering tag (ST) and a processing hint
> (PH). ST specifies the cache level of a CPU at which the data should be
> written to (or DCAed into), while PH is a hint provided by the PCIe
> requester to the completer on an upcoming traffic pattern. Some NIC
> vendors bundle TPH capability with fine-grained control over the type of
> objects that can be stashed into CPU caches, such as
> 
> - Rx/Tx queue descriptors
> - Packet-headers
> - Packet-payloads
> - Data from a given offset from the start of a packet
> 
> Note that stashable object types are outside the scope of the PCIe
> standard; therefore, vendors could support any combination of the above
> items as they see fit.
> 
> To enable TPH and fine-grained packet stashing, this API extends the
> ethdev library and the PCI bus driver. In this design, the application
> provides hints to the PMD via the ethdev stashing API to indicate the
> underlying hardware at which CPU and cache level it prefers a packet to
> end up. Once the PMD receives a CPU and a cache-level combination (or a
> list of such combinations), it must extract the matching ST from the PCI
> bus driver for such combinations. The PCI bus driver implements the TPH
> functions in an OS specific way; for Linux, it depends on the TPH
> capabilities of the VFIO kernel driver.
> 
> An application uses the cache stashing ethdev API by first calling the
> rte_eth_dev_stashing_capabilities_get() function to find out what object
> types can be stashed into a CPU cache by the NIC out of the object types
> in the bulleted list above. This function takes a port_id and a pointer
> to a uint16_t to report back the object type flags. PMD implements the
> stashing_capabilities_get function pointer in eth_dev_ops. If the
> underlying platform or the NIC does not support TPH, this function
> returns -ENOTSUP, and the application should consider any values stored
> in the object invalid.
> 
> Once the application knows the supported object types that can be
> stashed, the next step is to set the steering tags for the packets
> associated with Rx and Tx queues via
> rte_eth_dev_stashing_{rx,tx}_config_set() ethdev library functions. Both
> functions have an identical signature, a port_id, a queue_id, and a
> config object. The port_id and the queue_id are used to locate the
> device and the queue. The config object is of type struct
> rte_eth_stashing_config, which specifies the lcore_id and the
> cache_level, indicating where objects from this queue should be stashed.
> The 'objects' field in the config sets the types of objects the
> application wishes to stash based on the capabilities found earlier.
> Note that if the 'objects' field includes the flag
> RTE_ETH_DEV_STASH_OBJECT_OFFSET, the 'offset' field must be used to set
> the desired offset. These functions invoke PMD implementations of the
> stashing functionality via the stashing_{rx,tx}_hints_set function
> callbacks in the eth_dev_ops, respectively.
> 
> The PMD's implementation of the stashing_rx_hints_set() and
> stashing_tx_hints_set() functions is ultimately responsible for
> extracting the ST via the API provided by the PCI bus driver. Before
> extracting STs, the PMD should enable the TPH capability in the endpoint
> device by calling the rte_pci_tph_enable() function.  The application
> begins the ST extraction process by calling the rte_pci_tph_st_get()
> function in drivers/bus/pci/rte_bus_pci.h, which returns STs via the
> same rte_tph_info objects array passed into it as an argument.  Once PMD
> acquires ST, the stashing_{rx,tx}_hints_set callbacks implemented in the
> PMD are ready to set the ST as per the rte_eth_stashing_config object
> passed to them by the higher-level ethdev functions
> rte_eth_dev_stashing_{rx,tx}_config_set(). As per the PCIe specification, STs
> can be placed on the MSI-X tables or in a device-specific location. For
> PMDs, setting the STs on queue contexts is the only viable way of using
> TPH. Therefore, the PMDs should only enable TPH in device-specific mode.
> 
> V4->V5:
>  * Enable stashing-hints (TPH) in Intel i40e driver.
>  * Update exported symbol version from 25.03 to 25.07.
>  * Add TPH mode macros.
> 
> V3->V4:
>  * Add VFIO IOCTL based ST extraction mechanism to Linux PCI bus driver
>  * Remove ST extraction via direct access to ACPI _DSM
>  * Replace rte_pci_extract_tph_st() with rte_pci_tph_st_get() in PCI
>    bus driver.
> 
> Wathsala Vithanage (4):
>   pci: add non-merged Linux uAPI changes
>   bus/pci: introduce the PCIe TLP Processing Hints API
>   ethdev: introduce the cache stashing hints API
>   net/i40e: enable TPH in i40e
> 
>  drivers/bus/pci/bsd/pci.c            |  43 +++++++
>  drivers/bus/pci/bus_pci_driver.h     |  52 ++++++++
>  drivers/bus/pci/linux/pci.c          | 100 ++++++++++++++++
>  drivers/bus/pci/linux/pci_init.h     |  14 +++
>  drivers/bus/pci/linux/pci_vfio.c     | 170 +++++++++++++++++++++++++++
>  drivers/bus/pci/private.h            |   8 ++
>  drivers/bus/pci/rte_bus_pci.h        |  67 +++++++++++
>  drivers/bus/pci/windows/pci.c        |  43 +++++++
>  drivers/net/intel/i40e/i40e_ethdev.c | 127 ++++++++++++++++++++
>  kernel/linux/uapi/linux/vfio_tph.h   | 102 ++++++++++++++++
>  lib/ethdev/ethdev_driver.h           |  66 +++++++++++
>  lib/ethdev/rte_ethdev.c              | 149 +++++++++++++++++++++++
>  lib/ethdev/rte_ethdev.h              | 158 +++++++++++++++++++++++++
>  lib/pci/rte_pci.h                    |  15 +++
>  14 files changed, 1114 insertions(+)
>  create mode 100644 kernel/linux/uapi/linux/vfio_tph.h
> 

How will this impact existing applications that never use the API?
It is crucial that existing third-party applications just work without
modifications. We don't want to hear from Network Virtual Appliance
vendors that there is a performance regression in DPDK. They are already
reluctant to keep up with DPDK versions.

I.e., if the application does nothing, caching must remain enabled.


* Re: [PATCH v5 2/4] bus/pci: introduce the PCIe TLP Processing Hints API
  2025-06-02 22:38   ` [PATCH v5 2/4] bus/pci: introduce the PCIe TLP Processing Hints API Wathsala Vithanage
  2025-06-03  8:11     ` Morten Brørup
@ 2025-06-04 16:54     ` Bruce Richardson
  2025-06-04 22:52       ` Wathsala Wathawana Vithanage
  2025-06-05 10:30     ` Bruce Richardson
  2 siblings, 1 reply; 33+ messages in thread
From: Bruce Richardson @ 2025-06-04 16:54 UTC (permalink / raw)
  To: Wathsala Vithanage
  Cc: Chenbo Xia, Nipun Gupta, Anatoly Burakov, Gaetan Rivet, dev, nd,
	Honnappa Nagarahalli, Dhruv Tripathi

On Mon, Jun 02, 2025 at 10:38:02PM +0000, Wathsala Vithanage wrote:
> Extend the PCI bus driver to enable or disable TPH capability and set or
> get PCI Steering-Tags (STs) on an endpoint device. The functions
> rte_pci_tph_{enable, disable,st_set,st_get} provide the primary
> interface for DPDK device drivers. Implementation of the interface is OS
> dependent. For Linux, the kernel VFIO driver provides the
> implementation. rte_pci_tph_{enable, disable} functions enable and
> disable TPH capability, respectively. rte_pci_tph_enable enables TPH on
> the device in either of the device-specific, interrupt-vector, or
> no-steering-tag modes.
> 
> rte_pci_tph_st_{get, set} functions take an array of rte_tph_info
> objects with cpu-id, cache-level, flags (processing-hint, memory-type).
> The index in rte_tph_info is the MSI-X/MSI vector/ST-table index if TPH
> was enabled in the interrupt-vector mode; the rte_pci_tph_st_get
> function ignores it. Both rte_pci_tph_st_{set, get} functions return the
> steering-tag (st) and processing-hint-ignored (ph_ignore) fields via the
> same rte_tph_info object passed into them.
> 
> rte_pci_tph_st_{get, set} functions will return an error if processing
> any of the rte_tph_info objects fails. The API does not indicate which
> entry in the rte_tph_info array was executed successfully and which
> caused an error. Therefore, in case of an error, the caller should
> discard the output. If rte_pci_tph_st_set returns an error, it should be
> treated as a partial error. Hence, the steering-tag update on the device
> should be considered partial and inconsistent with the expected outcome.
> This should be resolved by resetting the endpoint device before further
> attempts to set steering tags.

This seems very clunky for the user. Is there a fundamental reason why we
cannot report which entries passed and which failed?

If it's a limitation of the kernel ioctl, how about making one ioctl call
for each individual op requested, one at a time? That way we would know
which one failed and could report it.
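
A minimal sketch of what I mean, assuming a per-entry operation is available. The demo_* names and the stub op are hypothetical; in the real driver the op would wrap a single-entry ioctl call.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Reduced stand-in for one steering-tag request. */
struct demo_tph_entry {
	uint32_t cpu_id;
	uint32_t cache_level;
	uint16_t st;             /* output: steering tag */
};

/* One operation on one entry; returns 0 or a negative errno value. */
typedef int (*demo_st_op_t)(struct demo_tph_entry *entry);

/*
 * Apply op to the entries one at a time.
 * Returns 0 if every entry succeeded. Otherwise returns the error of the
 * first failing entry and stores its index in *failed_idx, so the caller
 * knows that entries [0, *failed_idx) were processed successfully.
 */
int
demo_tph_st_each(struct demo_tph_entry *ents, size_t count,
		demo_st_op_t op, size_t *failed_idx)
{
	size_t i;

	for (i = 0; i < count; i++) {
		int ret = op(&ents[i]);

		if (ret != 0) {
			if (failed_idx != NULL)
				*failed_idx = i;
			return ret;
		}
	}
	return 0;
}

/* Stub op for illustration: accepts cache levels 0 and 1 only. */
int
demo_stub_op(struct demo_tph_entry *entry)
{
	if (entry->cache_level > 1)
		return -EINVAL;
	entry->st = (uint16_t)(entry->cpu_id + 1);
	return 0;
}
```

The cost is one system call per entry instead of one per array, which should be acceptable on a control path.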

Other comments inline below.

/Bruce

> 
> Signed-off-by: Wathsala Vithanage <wathsala.vithanage@arm.com>
> Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> Reviewed-by: Dhruv Tripathi <dhruv.tripathi@arm.com>
> ---
>  drivers/bus/pci/bsd/pci.c        |  43 ++++++++
>  drivers/bus/pci/bus_pci_driver.h |  52 ++++++++++
>  drivers/bus/pci/linux/pci.c      | 100 ++++++++++++++++++
>  drivers/bus/pci/linux/pci_init.h |  13 +++
>  drivers/bus/pci/linux/pci_vfio.c | 170 +++++++++++++++++++++++++++++++
>  drivers/bus/pci/private.h        |   8 ++
>  drivers/bus/pci/rte_bus_pci.h    |  67 ++++++++++++
>  drivers/bus/pci/windows/pci.c    |  43 ++++++++
>  lib/pci/rte_pci.h                |  15 +++
>  9 files changed, 511 insertions(+)
> 
> diff --git a/drivers/bus/pci/bsd/pci.c b/drivers/bus/pci/bsd/pci.c
> index 5e2e09d5a4..dff750c4d6 100644
> --- a/drivers/bus/pci/bsd/pci.c
> +++ b/drivers/bus/pci/bsd/pci.c

<snip>

> diff --git a/drivers/bus/pci/bus_pci_driver.h b/drivers/bus/pci/bus_pci_driver.h
> index 2cc1119072..b1c2829fc1 100644
> --- a/drivers/bus/pci/bus_pci_driver.h
> +++ b/drivers/bus/pci/bus_pci_driver.h
> @@ -46,6 +46,7 @@ struct rte_pci_device {
>  	char *bus_info;                     /**< PCI bus specific info */
>  	struct rte_intr_handle *vfio_req_intr_handle;
>  				/**< Handler of VFIO request interrupt */
> +	uint8_t tph_enabled;                /**< TPH enabled on this device */

Question: what would happen if we always enabled TPH for each device? Does
doing so disable the default handling for the device?

>  };
>  
>  /**
> @@ -194,6 +195,57 @@ struct rte_pci_ioport {
>  	uint64_t len; /* only filled for memory mapped ports */
>  };
>  
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this structure may change, or be removed, without prior
> + * notice
> + *
> + * This structure is passed into the TPH Steering-Tag set or get function as an
> + * argument by the caller. Return values are set in the same structure in st and
> + * ph_ignore fields by the callee.
> + *
> + * Refer to PCI-SIG ECN "Revised _DSM for Cache Locality TPH Features" for
> + * details.
> + */
> +struct rte_tph_info {
> +	/* Input */
> +	uint32_t cpu_id;	/*Logical CPU id*/
> +	uint32_t cache_level;	/*Cache level relative to CPU. l1d=0,l2d=1,...*/
> +	uint8_t flags;		/*Memory type, processing hint etc.*/
> +	uint16_t index;		/*Index in vector table to store the ST*/
> +

These fields should be reordered in order of size to avoid unnecessary
gaps.
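
To make the gap concrete, here is a small sketch that compares the field order from the patch with a size-sorted order. The demo_* struct names are made up for this illustration; the fields are copied from the quoted struct. On common ABIs I would expect one interior padding byte after 'flags' in the original order and none in the sorted one, though the total sizeof may well be the same, since the saved byte tends to move to the tail.

```c
#include <stddef.h>
#include <stdint.h>

/* Field order as in the quoted patch. */
struct demo_tph_info_orig {
	uint32_t cpu_id;
	uint32_t cache_level;
	uint8_t flags;
	uint16_t index;
	uint16_t st;
	uint8_t ph_ignore;
};

/* Same fields, ordered from largest to smallest. */
struct demo_tph_info_sorted {
	uint32_t cpu_id;
	uint32_t cache_level;
	uint16_t index;
	uint16_t st;
	uint8_t flags;
	uint8_t ph_ignore;
};

/* Sum of the member sizes, identical for both layouts. */
#define DEMO_TPH_INFO_PAYLOAD \
	(2 * sizeof(uint32_t) + 2 * sizeof(uint16_t) + 2 * sizeof(uint8_t))

/* Padding bytes located between members (tail padding not counted). */
size_t
demo_interior_padding_orig(void)
{
	return offsetof(struct demo_tph_info_orig, ph_ignore) +
		sizeof(uint8_t) - DEMO_TPH_INFO_PAYLOAD;
}

size_t
demo_interior_padding_sorted(void)
{
	return offsetof(struct demo_tph_info_sorted, ph_ignore) +
		sizeof(uint8_t) - DEMO_TPH_INFO_PAYLOAD;
}
```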

For the flags field, I dislike having different sets of flags all
multiplexed into a single field. Instead of the flags field and the set of
#defines below, can we split these out into separate enums and have a
separate field for each one?

For example:
	struct rte_tph_info {
		uint32_t cpu_id;
		uint32_t cache_level;
		enum rte_tph_mem_type mem_type;
		enum rte_tph_hint hints;
		enum rte_tph_mode mode;
		...
	}

While the structure takes more space this way, it is not a datapath
structure: we should not be seeing large arrays of it, and it does not need
to be processed quickly, so usability should be prioritized over
size/compactness.


> +	/* Output */
> +	uint16_t st;		/*Steering tag returned by the platform*/
> +	uint8_t ph_ignore;	/*Platform ignores PH for the returned ST*/
> +};
> +
> +#define RTE_PCI_TPH_MEM_TYPE_MASK		0x1
> +#define RTE_PCI_TPH_MEM_TYPE_SHIFT		0
> +/** Request volatile memory ST */
> +#define RTE_PCI_TPH_MEM_TYPE_VMEM		0
> +/** Request persistent memory ST */
> +#define RTE_PCI_TPH_MEM_TYPE_PMEM		1
> +
> +/** TLP Processing Hints - PCIe 6.0 specification section 2.2.7.1.1 */
> +#define RTE_PCI_TPH_HINT_MASK		0x3

Looking at the mask usage below, does this mask not need to also be shifted
by the TPH_HINT_SHIFT? Otherwise it overlaps with the type mask.
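
A small sketch of the overlap, using the macro values from the quoted patch. Only DEMO_TPH_HINT_MASK_SHIFTED and the two demo_* helpers are my additions, and they are hypothetical names for illustration.

```c
#include <stdint.h>

/* Values copied from the quoted patch. */
#define RTE_PCI_TPH_MEM_TYPE_MASK     0x1
#define RTE_PCI_TPH_MEM_TYPE_PMEM     1
#define RTE_PCI_TPH_HINT_MASK         0x3
#define RTE_PCI_TPH_HINT_SHIFT        1
#define RTE_PCI_TPH_HINT_TARGET       (2 << RTE_PCI_TPH_HINT_SHIFT)
#define RTE_PCI_TPH_HINT_TARGET_PRIO  (3 << RTE_PCI_TPH_HINT_SHIFT)

/* Hypothetical corrected mask that covers bits 1-2 instead of bits 0-1. */
#define DEMO_TPH_HINT_MASK_SHIFTED \
	(RTE_PCI_TPH_HINT_MASK << RTE_PCI_TPH_HINT_SHIFT)

/* Extraction as done in the patch: the mask includes the mem-type bit. */
uint8_t
demo_hint_unshifted(uint8_t flags)
{
	return (uint8_t)(flags & RTE_PCI_TPH_HINT_MASK);
}

/* Extraction with the shifted mask: only the hint bits survive. */
uint8_t
demo_hint_shifted(uint8_t flags)
{
	return (uint8_t)(flags & DEMO_TPH_HINT_MASK_SHIFTED);
}
```

With the unshifted mask, flags carrying PMEM plus TARGET_PRIO yield a value that matches none of the hint cases, and a plain TARGET hint is read back as 0, which is the BIDIR value.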

> +#define RTE_PCI_TPH_HINT_SHIFT		1
> +/** Host and device access data equally */
> +#define RTE_PCI_TPH_HINT_BIDIR		0
> +/** Device accesses data more frequently */
> +#define RTE_PCI_TPH_HINT_REQSTR		(1 << RTE_PCI_TPH_HINT_SHIFT)
> +/** Host access data more frequently */
> +#define RTE_PCI_TPH_HINT_TARGET		(2 << RTE_PCI_TPH_HINT_SHIFT)
> +/** Host access data more frequently with a high temporal locality */
> +#define RTE_PCI_TPH_HINT_TARGET_PRIO	(3 << RTE_PCI_TPH_HINT_SHIFT)
> +
> +#define RTE_PCI_TPH_ST_MODE_MASK   0x3
> +/** TPH no ST mode */
> +#define RTE_PCI_TPH_ST_NS_MODE	   0
> +/** TPH interrupt vector mode */
> +#define RTE_PCI_TPH_ST_IV_MODE	   1
> +/** TPH device specific mode */
> +#define RTE_PCI_TPH_ST_DS_MODE	   2
> +

As above, I think these would be nicer defined in different enums, going to
separate fields in the struct. That would also remove any ambiguity as to
whether the masks include the shift or not.

>  #ifdef __cplusplus
>  }
>  #endif
> diff --git a/drivers/bus/pci/linux/pci.c b/drivers/bus/pci/linux/pci.c
> index c20d159218..b5a8ba0a86 100644
> --- a/drivers/bus/pci/linux/pci.c
> +++ b/drivers/bus/pci/linux/pci.c
> @@ -814,3 +814,103 @@ rte_pci_ioport_unmap(struct rte_pci_ioport *p)
>  
>  	return ret;
>  }
> +
> +RTE_EXPORT_EXPERIMENTAL_SYMBOL(rte_pci_tph_enable, 25.07)
> +int
> +rte_pci_tph_enable(struct rte_pci_device *dev, int mode)
> +{
> +	int ret = 0;
> +

Should check here if dev->tph_enabled is already true.

> +	switch (dev->kdrv) {
> +#ifdef VFIO_PRESENT
> +	case RTE_PCI_KDRV_VFIO:
> +		if (pci_vfio_is_enabled())
> +			ret = pci_vfio_tph_enable(dev, mode);
> +		break;
> +#endif
> +	case RTE_PCI_KDRV_IGB_UIO:
> +	case RTE_PCI_KDRV_UIO_GENERIC:
> +	default:
> +		ret = -ENOTSUP;
> +		break;
> +	}
> +
> +	if (!ret)

Prefer "ret == 0" for this comparison.

> +		dev->tph_enabled = 1;
> +
> +	return ret;
> +}
> +

Function could probably be shortened to something like (including a check
for already enabled, 2 lines shorter if we rely on checks in the
vfio_tph_enable() call):

int
rte_pci_tph_enable(...)
{
#ifdef VFIO_PRESENT
	if (dev->kdrv == RTE_PCI_KDRV_VFIO && pci_vfio_is_enabled()) {
		if (dev->tph_enabled == 0) {
			int ret = pci_vfio_tph_enable(...);
			if (ret != 0)
				return ret;
			dev->tph_enabled = 1;
		}
		return 0;
	}
#endif
	return -ENOTSUP;
}


> +RTE_EXPORT_EXPERIMENTAL_SYMBOL(rte_pci_tph_disable, 25.07)
> +int
> +rte_pci_tph_disable(struct rte_pci_device *dev)
> +{
> +	int ret = 0;
> +
> +	switch (dev->kdrv) {
> +#ifdef VFIO_PRESENT
> +	case RTE_PCI_KDRV_VFIO:
> +		if (pci_vfio_is_enabled())
> +			ret = pci_vfio_tph_disable(dev);
> +		break;
> +#endif
> +	case RTE_PCI_KDRV_IGB_UIO:
> +	case RTE_PCI_KDRV_UIO_GENERIC:
> +	default:
> +		ret = -ENOTSUP;
> +		break;
> +	}
> +
> +	if (!ret)
> +		dev->tph_enabled = 0;
> +
> +	return ret;
> +}

As above, we can shorten this function by replacing the switch with a
straight check for kdrv == RTE_PCI_KDRV_VFIO. Same with functions below
too.
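
For the disable path, the same shape could look roughly like the sketch below. It is self-contained for illustration only: struct demo_dev, enum demo_kdrv and demo_backend_disable() are hypothetical stand-ins for rte_pci_device, the kdrv enum and pci_vfio_tph_disable(), and the pci_vfio_is_enabled() check is left out.

```c
#include <errno.h>

/* Hypothetical stand-ins for the kernel-driver enum and the PCI device. */
enum demo_kdrv {
	DEMO_KDRV_VFIO,
	DEMO_KDRV_OTHER
};

struct demo_dev {
	enum demo_kdrv kdrv;
	int tph_enabled;
	int backend_calls;   /* counts calls into the stub backend */
};

/* Stub for the VFIO-specific disable; always succeeds in this sketch. */
int
demo_backend_disable(struct demo_dev *dev)
{
	dev->backend_calls++;
	return 0;
}

int
demo_tph_disable(struct demo_dev *dev)
{
	int ret;

	/* A single comparison replaces the switch on the kernel driver. */
	if (dev->kdrv != DEMO_KDRV_VFIO)
		return -ENOTSUP;

	/* Already disabled: nothing to do, and not an error. */
	if (dev->tph_enabled == 0)
		return 0;

	ret = demo_backend_disable(dev);
	if (ret != 0)
		return ret;

	dev->tph_enabled = 0;
	return 0;
}
```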

> +
> +RTE_EXPORT_EXPERIMENTAL_SYMBOL(rte_pci_tph_st_get, 25.07)
> +int
> +rte_pci_tph_st_get(const struct rte_pci_device *dev,
> +		   struct rte_tph_info *info, size_t count)
> +{
> +	int ret = 0;
> +
> +	switch (dev->kdrv) {
> +#ifdef VFIO_PRESENT
> +	case RTE_PCI_KDRV_VFIO:
> +		if (pci_vfio_is_enabled())
> +			ret = pci_vfio_tph_st_get(dev, info, count);
> +		break;
> +#endif
> +	case RTE_PCI_KDRV_IGB_UIO:
> +	case RTE_PCI_KDRV_UIO_GENERIC:
> +	default:
> +		ret = -ENOTSUP;
> +		break;
> +	}
> +
> +	return ret;
> +}
> +
> +RTE_EXPORT_EXPERIMENTAL_SYMBOL(rte_pci_tph_st_set, 25.07)
> +int
> +rte_pci_tph_st_set(const struct rte_pci_device *dev,
> +		   struct rte_tph_info *info, size_t count)
> +{
> +	int ret = 0;
> +
> +	switch (dev->kdrv) {
> +#ifdef VFIO_PRESENT
> +	case RTE_PCI_KDRV_VFIO:
> +		if (pci_vfio_is_enabled())
> +			ret = pci_vfio_tph_st_set(dev, info, count);
> +		break;
> +#endif
> +	case RTE_PCI_KDRV_IGB_UIO:
> +	case RTE_PCI_KDRV_UIO_GENERIC:
> +	default:
> +		ret = -ENOTSUP;
> +		break;
> +	}
> +
> +	return ret;
> +}
> diff --git a/drivers/bus/pci/linux/pci_init.h b/drivers/bus/pci/linux/pci_init.h
> index 25b901f460..e71bfd2dce 100644
> --- a/drivers/bus/pci/linux/pci_init.h
> +++ b/drivers/bus/pci/linux/pci_init.h
> @@ -5,6 +5,7 @@
>  #ifndef EAL_PCI_INIT_H_
>  #define EAL_PCI_INIT_H_
>  
> +#include <rte_compat.h>
>  #include <rte_vfio.h>
>  #include <uapi/linux/vfio_tph.h>
>  
> @@ -76,6 +77,18 @@ int pci_vfio_ioport_unmap(struct rte_pci_ioport *p);
>  int pci_vfio_map_resource(struct rte_pci_device *dev);
>  int pci_vfio_unmap_resource(struct rte_pci_device *dev);
>  
> +/* TLP Processing Hints control functions */
> +__rte_experimental
> +int pci_vfio_tph_enable(const struct rte_pci_device *dev, int mode);
> +__rte_experimental
> +int pci_vfio_tph_disable(const struct rte_pci_device *dev);
> +__rte_experimental
> +int pci_vfio_tph_st_get(const struct rte_pci_device *dev,
> +			struct rte_tph_info *info, size_t ent_count);
> +__rte_experimental
> +int pci_vfio_tph_st_set(const struct rte_pci_device *dev,
> +			struct rte_tph_info *info, size_t ent_count);
> +
>  int pci_vfio_is_enabled(void);
>  
>  #endif
> diff --git a/drivers/bus/pci/linux/pci_vfio.c b/drivers/bus/pci/linux/pci_vfio.c
> index 5317170231..bdbeb38658 100644
> --- a/drivers/bus/pci/linux/pci_vfio.c
> +++ b/drivers/bus/pci/linux/pci_vfio.c
> @@ -12,6 +12,7 @@
>  #include <stdbool.h>
>  
>  #include <rte_log.h>
> +#include <eal_export.h>
>  #include <rte_pci.h>
>  #include <rte_bus_pci.h>
>  #include <rte_eal_paging.h>
> @@ -1316,6 +1317,175 @@ pci_vfio_mmio_write(const struct rte_pci_device *dev, int bar,
>  	return pwrite(fd, buf, len, offset + offs);
>  }
>  
> +static int
> +pci_vfio_tph_ioctl(const struct rte_pci_device *dev, struct vfio_pci_tph *pci_tph)
> +{
> +	const struct rte_intr_handle *intr_handle = dev->intr_handle;
> +	int vfio_dev_fd = 0, ret = 0;
> +
> +	vfio_dev_fd = rte_intr_dev_fd_get(intr_handle);
> +	if (vfio_dev_fd < 0) {
> +		ret = -EINVAL;
> +		goto out;
> +	}
> +
> +	ret = ioctl(vfio_dev_fd, VFIO_DEVICE_PCI_TPH, pci_tph);
> +out:
> +	return ret;
> +}
> +
> +static int
> +pci_vfio_tph_st_op(const struct rte_pci_device *dev,
> +		    struct rte_tph_info *info, size_t count,
> +		    enum rte_pci_st_op op)
> +{
> +	int ret = 0;
> +	size_t argsz = 0, i;
> +	struct vfio_pci_tph *pci_tph = NULL;
> +	uint8_t mem_type = 0, hint = 0;
> +
> +	if (!count) {
> +		ret = -EINVAL;
> +		goto out;
> +	}
> +
> +	argsz = sizeof(struct vfio_pci_tph) +
> +		count * sizeof(struct vfio_pci_tph_entry);
> +
> +	pci_tph = rte_zmalloc(NULL, argsz, 0);

For ioctl we should not need pinned memory. Use regular malloc here.

> +	if (!pci_tph) {

Coding style guidelines say to compare pointers explicitly to NULL.

> +		ret = -ENOMEM;
> +		goto out;
> +	}
> +
> +	pci_tph->argsz = argsz;
> +	pci_tph->count = count;
> +
> +	switch (op) {
> +	case RTE_PCI_TPH_ST_GET:
> +		pci_tph->flags = VFIO_DEVICE_TPH_GET_ST;
> +		break;
> +	case RTE_PCI_TPH_ST_SET:
> +		pci_tph->flags = VFIO_DEVICE_TPH_SET_ST;
> +		break;
> +	default:
> +		ret = -EINVAL;
> +		goto out;
> +	}
> +
> +	for (i = 0; i < count; i++) {
> +		pci_tph->ents[i].cpu_id = info[i].cpu_id;
> +		pci_tph->ents[i].cache_level = info[i].cache_level;
> +
> +		mem_type = info[i].flags & RTE_PCI_TPH_MEM_TYPE_MASK;
> +		switch (mem_type) {
> +		case RTE_PCI_TPH_MEM_TYPE_VMEM:
> +			pci_tph->ents[i].flags |= VFIO_TPH_MEM_TYPE_VMEM;
> +			break;
> +		case RTE_PCI_TPH_MEM_TYPE_PMEM:
> +			pci_tph->ents[i].flags |= VFIO_TPH_MEM_TYPE_PMEM;
> +			break;
> +		default:
> +			ret = -EINVAL;
> +			goto out;
> +		}
> +
> +		hint = info[i].flags & RTE_PCI_TPH_HINT_MASK;

As pointed out above, unshifted, this HINT_MASK overlaps with the
TYPE_MASK.

> +		switch (hint) {
> +		case RTE_PCI_TPH_HINT_BIDIR:
> +			pci_tph->ents[i].flags |= VFIO_TPH_HINT_BIDIR;
> +			break;
> +		case RTE_PCI_TPH_HINT_REQSTR:
> +			pci_tph->ents[i].flags |= VFIO_TPH_HINT_REQSTR;
> +			break;
> +		case RTE_PCI_TPH_HINT_TARGET:
> +			pci_tph->ents[i].flags |= VFIO_TPH_HINT_TARGET;
> +			break;
> +		case RTE_PCI_TPH_HINT_TARGET_PRIO:
> +			pci_tph->ents[i].flags |= VFIO_TPH_HINT_TARGET_PRIO;
> +			break;
> +		default:
> +			ret = -EINVAL;
> +			goto out;
> +		}
> +
> +		if (op == RTE_PCI_TPH_ST_SET)
> +			pci_tph->ents[i].index = info[i].index;
> +	}
> +
> +	ret = pci_vfio_tph_ioctl(dev, pci_tph);
> +	if (ret)

Again, compare explicitly against 0, i.e. "ret != 0" here.

> +		goto out;
> +
> +	/*
> +	 * Kernel returns steering-tag and ph-ignore bits for
> +	 * RTE_PCI_TPH_ST_SET too, therefore copy output for
> +	 * both RTE_PCI_TPH_ST_SET and RTE_PCI_TPH_ST_GET
> +	 * cases.
> +	 */
> +	for (i = 0; i < count; i++) {
> +		info[i].st = pci_tph->ents[i].st;
> +		info[i].ph_ignore = pci_tph->ents[i].ph_ignore;
> +	}
> +
> +out:
> +	if (pci_tph)
> +		rte_free(pci_tph);

Free functions work fine with null pointers, so just call free without a
null check.
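
Putting the allocation comments together, a minimal sketch could look like this. The demo_* structs are hypothetical stand-ins for vfio_pci_tph and vfio_pci_tph_entry, whose real layout comes from the not-yet-merged uAPI header, so treat the field names as assumptions.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the ioctl argument and its entries. */
struct demo_tph_entry_k {
	uint32_t cpu_id;
	uint32_t cache_level;
};

struct demo_tph_req {
	uint32_t argsz;
	uint32_t count;
	struct demo_tph_entry_k ents[];   /* flexible array member */
};

/* Allocate a zeroed request for 'count' entries from the regular heap. */
int
demo_tph_req_alloc(size_t count, struct demo_tph_req **out)
{
	struct demo_tph_req *req;
	size_t argsz;

	if (out == NULL || count == 0)
		return -EINVAL;
	/* Keep argsz representable in the 32-bit argsz field. */
	if (count > (UINT32_MAX - sizeof(*req)) / sizeof(req->ents[0]))
		return -EINVAL;

	argsz = sizeof(*req) + count * sizeof(req->ents[0]);

	/* Plain calloc: an ioctl argument does not need hugepage memory. */
	req = calloc(1, argsz);
	if (req == NULL)   /* explicit comparison against NULL */
		return -ENOMEM;

	req->argsz = (uint32_t)argsz;
	req->count = (uint32_t)count;
	*out = req;
	return 0;
}
```

The caller releases the request with a plain free(req), with no NULL check needed, since free(NULL) is defined to do nothing.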

> +	return ret;
> +}
> +
> +RTE_EXPORT_EXPERIMENTAL_SYMBOL(pci_vfio_tph_enable, 25.07)
> +int
> +pci_vfio_tph_enable(const struct rte_pci_device *dev, int mode)
> +{
> +	int ret;
> +
> +	if (!(mode ^ (mode & VFIO_TPH_ST_MODE_MASK))) {

So it's an error to set the mode to the same thing twice? Should it not
just be a no-op?
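
Separately, it may be worth double-checking what the condition itself evaluates to. The sketch below reproduces the expression from the quoted line with an assumed mask value of 0x3; the real VFIO_TPH_ST_MODE_MASK lives in the unmerged uAPI header, so DEMO_ST_MODE_MASK and both demo_* helpers are hypothetical. Read literally, "mode ^ (mode & mask)" leaves only the bits of mode outside the mask, so negating it gives true for every mode that fits inside the mask.

```c
/* Assumed mask value, for illustration only. */
#define DEMO_ST_MODE_MASK 0x3

/* The condition as written in the quoted patch line. */
int
demo_check_as_quoted(int mode)
{
	return !(mode ^ (mode & DEMO_ST_MODE_MASK));
}

/* A condition that is true only when mode has bits outside the mask. */
int
demo_check_out_of_mask(int mode)
{
	return (mode & ~DEMO_ST_MODE_MASK) != 0;
}
```

Under that assumption, a mode of 2 makes the quoted condition true, which is the branch that returns -EINVAL, while a mode of 4 passes through and is then masked down to 0.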

> +		ret = -EINVAL;
> +		goto out;
> +	} else
> +		mode &= VFIO_TPH_ST_MODE_MASK;
> +
> +	struct vfio_pci_tph pci_tph = {
> +		.argsz = sizeof(struct vfio_pci_tph),
> +		.flags = VFIO_DEVICE_TPH_ENABLE | mode,
> +		.count = 0
> +	};
> +
> +	ret = pci_vfio_tph_ioctl(dev, &pci_tph);
> +out:
> +	return ret;
> +}
> +
> +RTE_EXPORT_EXPERIMENTAL_SYMBOL(pci_vfio_tph_disable, 25.07)
> +int
> +pci_vfio_tph_disable(const struct rte_pci_device *dev)
> +{

Check here, or in caller to see if it's already enabled?

> +	struct vfio_pci_tph pci_tph = {
> +		.argsz = sizeof(struct vfio_pci_tph),
> +		.flags = VFIO_DEVICE_TPH_DISABLE,
> +		.count = 0
> +	};
> +
> +	return pci_vfio_tph_ioctl(dev, &pci_tph);
> +}
> +
> +RTE_EXPORT_EXPERIMENTAL_SYMBOL(pci_vfio_tph_st_get, 25.07)
> +int
> +pci_vfio_tph_st_get(const struct rte_pci_device *dev,
> +		    struct rte_tph_info *info, size_t count)
> +{
> +	return pci_vfio_tph_st_op(dev, info, count, RTE_PCI_TPH_ST_GET);
> +}
> +
> +RTE_EXPORT_EXPERIMENTAL_SYMBOL(pci_vfio_tph_st_set, 25.07)
> +int
> +pci_vfio_tph_st_set(const struct rte_pci_device *dev,
> +		    struct rte_tph_info *info, size_t count)
> +{
> +	return pci_vfio_tph_st_op(dev, info, count, RTE_PCI_TPH_ST_SET);
> +}
> +
>  int
>  pci_vfio_is_enabled(void)
>  {
> diff --git a/drivers/bus/pci/private.h b/drivers/bus/pci/private.h
> index 38109844b9..d2ec370320 100644
> --- a/drivers/bus/pci/private.h
> +++ b/drivers/bus/pci/private.h
> @@ -335,4 +335,12 @@ rte_pci_dev_iterate(const void *start,
>  int
>  rte_pci_devargs_parse(struct rte_devargs *da);
>  
> +/*
> + * TPH Steering-Tag operation types.
> + */
> +enum rte_pci_st_op {
> +	RTE_PCI_TPH_ST_SET, /* Set TPH Steering - Tags */
> +	RTE_PCI_TPH_ST_GET  /* Get TPH Steering - Tags */
> +};
> +
>  #endif /* _PCI_PRIVATE_H_ */
> diff --git a/drivers/bus/pci/rte_bus_pci.h b/drivers/bus/pci/rte_bus_pci.h
> index 19a7b15b99..e4d4780f54 100644
> --- a/drivers/bus/pci/rte_bus_pci.h
> +++ b/drivers/bus/pci/rte_bus_pci.h
> @@ -31,6 +31,7 @@ extern "C" {
>  struct rte_pci_device;
>  struct rte_pci_driver;
>  struct rte_pci_ioport;
> +struct rte_tph_info;
>  
>  struct rte_devargs;
>  
> @@ -312,6 +313,72 @@ void rte_pci_ioport_read(struct rte_pci_ioport *p,
>  void rte_pci_ioport_write(struct rte_pci_ioport *p,
>  		const void *data, size_t len, off_t offset);
>  
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * Enable TLP Processing Hints (TPH) in the endpoint device.
> + *
> + * @param dev
> + *   A pointer to a rte_pci_device structure describing the device
> + *   to use.
> + * @param mode
> + *   TPH mode the device must operate in.
> + */
> +__rte_experimental
> +int rte_pci_tph_enable(struct rte_pci_device *dev, int mode);
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * Disable TLP Processing Hints (TPH) in the endpoint device.
> + *
> + * @param dev
> + *   A pointer to a rte_pci_device structure describing the device
> + *   to use.
> + */
> +__rte_experimental
> +int rte_pci_tph_disable(struct rte_pci_device *dev);
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * Get PCI Steering-Tags (STs) for a list of stashing targets.
> + *
> + * @param dev
> + *   A pointer to the rte_pci_device structure describing the device.
> + * @param info
> + *   An array of rte_tph_info objects, each describing the target
> + *   cpu-id, cache-level, etc. Steering-tags for each target are
> + *   returned via the info array.
> + * @param count
> + *   The number of elements in the info array.
> + */
> +__rte_experimental
> +int rte_pci_tph_st_get(const struct rte_pci_device *dev,
> +		struct rte_tph_info *info, size_t count);
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * Set PCI Steering-Tags (STs) for a list of stashing targets.
> + *
> + * @param dev
> + *   A pointer to the rte_pci_device structure describing the device.
> + * @param info
> + *   An array of rte_tph_info objects, each describing the target
> + *   cpu-id, cache-level, etc. Steering-tags for each target are
> + *   returned via the info array.
> + * @param count
> + *   The number of elements in the info array.
> + */
> +__rte_experimental
> +int rte_pci_tph_st_set(const struct rte_pci_device *dev,
> +		struct rte_tph_info *info, size_t count);
> +
>  #ifdef __cplusplus
>  }
>  #endif
> diff --git a/drivers/bus/pci/windows/pci.c b/drivers/bus/pci/windows/pci.c
> index e7e449306e..218e667a5a 100644
> --- a/drivers/bus/pci/windows/pci.c
> +++ b/drivers/bus/pci/windows/pci.c
> @@ -511,3 +511,46 @@ rte_pci_scan(void)
>  
>  	return ret;
>  }
> +
> +RTE_EXPORT_EXPERIMENTAL_SYMBOL(rte_pci_tph_enable, 25.07)
> +int
> +rte_pci_tph_enable(struct rte_pci_device *dev, int mode)
> +{
> +	RTE_SET_USED(dev);
> +	RTE_SET_USED(mode);
> +	/* This feature is not yet implemented for windows */
> +	return -1;
> +}
> +
> +RTE_EXPORT_EXPERIMENTAL_SYMBOL(rte_pci_tph_disable, 25.07)
> +int
> +rte_pci_tph_disable(struct rte_pci_device *dev)
> +{
> +	RTE_SET_USED(dev);
> +	/* This feature is not yet implemented for windows */
> +	return -1;
> +}
> +
> +RTE_EXPORT_EXPERIMENTAL_SYMBOL(rte_pci_tph_st_get, 25.07)
> +int
> +rte_pci_tph_st_get(const struct rte_pci_device *dev,
> +		   struct rte_tph_info *info, size_t count)
> +{
> +	RTE_SET_USED(dev);
> +	RTE_SET_USED(info);
> +	RTE_SET_USED(count);
> +	/* This feature is not yet implemented for windows */
> +	return -1;
> +}
> +
> +RTE_EXPORT_EXPERIMENTAL_SYMBOL(rte_pci_tph_st_set, 25.07)
> +int
> +rte_pci_tph_st_set(const struct rte_pci_device *dev,
> +		   struct rte_tph_info *info, size_t count)
> +{
> +	RTE_SET_USED(dev);
> +	RTE_SET_USED(info);
> +	RTE_SET_USED(count);
> +	/* This feature is not yet implemented for windows */
> +	return -1;
> +}
> diff --git a/lib/pci/rte_pci.h b/lib/pci/rte_pci.h
> index 9a50a12142..da9cd666bf 100644
> --- a/lib/pci/rte_pci.h
> +++ b/lib/pci/rte_pci.h
> @@ -137,6 +137,21 @@ extern "C" {
>  /* Process Address Space ID (RTE_PCI_EXT_CAP_ID_PASID) */
>  #define RTE_PCI_PASID_CTRL		0x06    /* PASID control register */
>  
> +/* TPH Requester */
> +#define RTE_PCI_TPH_CAP            4       /* capability register */
> +#define RTE_PCI_TPH_CAP_ST_NS      0x00000001 /* No ST Mode Supported */
> +#define RTE_PCI_TPH_CAP_ST_IV      0x00000002 /* Interrupt Vector Mode Supported */
> +#define RTE_PCI_TPH_CAP_ST_DS      0x00000004 /* Device Specific Mode Supported */
> +#define RTE_PCI_TPH_CAP_EXT_TPH    0x00000100 /* Ext TPH Requester Supported */
> +#define RTE_PCI_TPH_CAP_LOC_MASK   0x00000600 /* ST Table Location */
> +#define RTE_PCI_TPH_LOC_NONE       0x00000000 /* Not present */
> +#define RTE_PCI_TPH_LOC_CAP        0x00000200 /* In capability */
> +#define RTE_PCI_TPH_LOC_MSIX       0x00000400 /* In MSI-X */
> +#define RTE_PCI_TPH_CAP_ST_MASK    0x07FF0000 /* ST Table Size */
> +#define RTE_PCI_TPH_CAP_ST_SHIFT   16      /* ST Table Size shift */
> +#define RTE_PCI_TPH_BASE_SIZEOF    0xc     /* Size with no ST table */
> +
> +

Where are all these values used? They don't seem to be needed by this
patch. If needed in later patches, I'd suggest adding them there.
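
For context, a minimal sketch of how a later patch might consume these
definitions: decoding the ST table location and size from a TPH Requester
Capability register value. The macro values are copied from the hunk above;
the helper names are hypothetical and not part of the series. It assumes the
ST Table Size field is encoded as N-1, as in the PCIe TPH Requester
capability definition.

```c
#include <assert.h>
#include <stdint.h>

/* Values copied from the lib/pci/rte_pci.h hunk quoted above. */
#define RTE_PCI_TPH_CAP_LOC_MASK   0x00000600 /* ST Table Location */
#define RTE_PCI_TPH_LOC_NONE       0x00000000 /* Not present */
#define RTE_PCI_TPH_LOC_CAP        0x00000200 /* In capability */
#define RTE_PCI_TPH_LOC_MSIX       0x00000400 /* In MSI-X */
#define RTE_PCI_TPH_CAP_ST_MASK    0x07FF0000 /* ST Table Size */
#define RTE_PCI_TPH_CAP_ST_SHIFT   16         /* ST Table Size shift */

/* Hypothetical helper: where the ST table lives, as a printable string. */
static const char *
tph_st_table_location(uint32_t cap)
{
	switch (cap & RTE_PCI_TPH_CAP_LOC_MASK) {
	case RTE_PCI_TPH_LOC_NONE:
		return "none";
	case RTE_PCI_TPH_LOC_CAP:
		return "capability";
	case RTE_PCI_TPH_LOC_MSIX:
		return "msix";
	default:
		return "reserved";
	}
}

/*
 * Hypothetical helper: number of ST table entries, or 0 when the
 * capability reports that no ST table is present. The size field is
 * assumed to hold N-1.
 */
static unsigned int
tph_st_table_entries(uint32_t cap)
{
	if ((cap & RTE_PCI_TPH_CAP_LOC_MASK) == RTE_PCI_TPH_LOC_NONE)
		return 0;
	return ((cap & RTE_PCI_TPH_CAP_ST_MASK) >> RTE_PCI_TPH_CAP_ST_SHIFT) + 1;
}
```

With a made-up register value of 0x00070204, tph_st_table_location() reports
"capability" and tph_st_table_entries() reports 8 entries.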

>  /** Formatting string for PCI device identifier: Ex: 0000:00:01.0 */
>  #define PCI_PRI_FMT "%.4" PRIx32 ":%.2" PRIx8 ":%.2" PRIx8 ".%" PRIx8
>  #define PCI_PRI_STR_SIZE sizeof("XXXXXXXX:XX:XX.X")
> -- 
> 2.43.0
> 

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v5 1/4] pci: add non-merged Linux uAPI changes
  2025-06-02 22:38   ` [PATCH v5 1/4] pci: add non-merged Linux uAPI changes Wathsala Vithanage
  2025-06-02 23:11     ` Wathsala Wathawana Vithanage
@ 2025-06-04 20:43     ` Stephen Hemminger
  1 sibling, 0 replies; 33+ messages in thread
From: Stephen Hemminger @ 2025-06-04 20:43 UTC (permalink / raw)
  To: Wathsala Vithanage
  Cc: Chenbo Xia, Nipun Gupta, Maxime Coquelin, dev, nd, Dhruv Tripathi

On Mon,  2 Jun 2025 22:38:01 +0000
Wathsala Vithanage <wathsala.vithanage@arm.com> wrote:

> This commit is a hack to prevent build failures the next commit in this
> patch series causes due to missing vfio uapi definitions.
> This commit should NEVER BE MERGED.
> Next commit in this patch series depends on additions to vfio uapi that
> enable TPH ioctl in the vfio-pci driver in the Linux kernel.
> These additions have not yet been merged into the upstream kernel.
> 
> Signed-off-by: Wathsala Vithanage <wathsala.vithanage@arm.com>
> Reviewed-by: Dhruv Tripathi <dhruv.tripathi@arm.com>
> ---

The series will not be reviewed or accepted until the kernel
changes are merged upstream and in a kernel release from Linus.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* RE: [PATCH v5 0/4] An API for Cache Stashing with TPH
  2025-06-04 16:51   ` [PATCH v5 0/4] An API for Cache Stashing with TPH Stephen Hemminger
@ 2025-06-04 22:24     ` Wathsala Wathawana Vithanage
  0 siblings, 0 replies; 33+ messages in thread
From: Wathsala Wathawana Vithanage @ 2025-06-04 22:24 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: dev@dpdk.org, nd, nd

> > Today, DPDK applications benefit from Direct Cache Access (DCA)
> > features like Intel DDIO and Arm's write-allocate-to-SLC. However,
> > those features do not allow fine-grained control of direct cache
> > access, such as stashing packets into upper-level caches (L2 caches)
> > of a processor or the shared cache of a chiplet. PCIe TLP Processing
> > Hints (TPH) addresses this need in a vendor-agnostic manner. TPH
> > capability has existed since PCI Express Base Specification revision
> > 3.0; today, numerous Network Interface Cards and interconnects from
> > different vendors support TPH capability. TPH comprises a steering tag
> > (ST) and a processing hint (PH). ST specifies the cache level of a CPU
> > at which the data should be written to (or DCAed into), while PH is a
> > hint provided by the PCIe requester to the completer on an upcoming
> > traffic pattern. Some NIC vendors bundle TPH capability with
> > fine-grained control over the type of objects that can be stashed into
> > CPU caches, such as
> >
> > - Rx/Tx queue descriptors
> > - Packet-headers
> > - Packet-payloads
> > - Data from a given offset from the start of a packet
> >
> > Note that stashable object types are outside the scope of the PCIe
> > standard; therefore, vendors could support any combination of the
> > above items as they see fit.
> >
> > To enable TPH and fine-grained packet stashing, this API extends the
> > ethdev library and the PCI bus driver. In this design, the application
> > provides hints to the PMD via the ethdev stashing API to indicate the
> > underlying hardware at which CPU and cache level it prefers a packet
> > to end up. Once the PMD receives a CPU and a cache-level combination
> > (or a list of such combinations), it must extract the matching ST from
> > the PCI bus driver for such combinations. The PCI bus driver
> > implements the TPH functions in an OS specific way; for Linux, it
> > depends on the TPH capabilities of the VFIO kernel driver.
> >
> > An application uses the cache stashing ethdev API by first calling the
> > rte_eth_dev_stashing_capabilities_get() function to find out what
> > object types can be stashed into a CPU cache by the NIC out of the
> > object types in the bulleted list above. This function takes a port_id
> > and a pointer to a uint16_t to report back the object type flags. PMD
> > implements the stashing_capabilities_get function pointer in
> > eth_dev_ops. If the underlying platform or the NIC does not support
> > TPH, this function returns -ENOTSUP, and the application should
> > consider any values stored in the object invalid.
> >
> > Once the application knows the supported object types that can be
> > stashed, the next step is to set the steering tags for the packets
> > associated with Rx and Tx queues via
> > rte_eth_dev_stashing_{rx,tx}_config_set() ethdev library functions.
> > Both functions have an identical signature, a port_id, a queue_id, and
> > a config object. The port_id and the queue_id are used to locate the
> > device and the queue. The config object is of type struct
> > rte_eth_stashing_config, which specifies the lcore_id and the
> > cache_level, indicating where objects from this queue should be stashed.
> > The 'objects' field in the config sets the types of objects the
> > application wishes to stash based on the capabilities found earlier.
> > Note that if the 'objects' field includes the flag
> > RTE_ETH_DEV_STASH_OBJECT_OFFSET, the 'offset' field must be used to
> > set the desired offset. These functions invoke PMD implementations of
> > the stashing functionality via the stashing_{rx,tx}_hints_set function
> > callbacks in the eth_dev_ops, respectively.
> >
> > The PMD's implementation of the stashing_rx_hints_set() and
> > stashing_tx_hints_set() functions is ultimately responsible for
> > extracting the ST via the API provided by the PCI bus driver. Before
> > extracting STs, the PMD should enable the TPH capability in the
> > endpoint device by calling the rte_pci_tph_enable() function.  The
> > application begins the ST extraction process by calling the
> > rte_pci_tph_st_get() function in drivers/bus/pci/rte_bus_pci.h, which
> > returns STs via the same rte_tph_info objects array passed into it as
> > an argument.  Once PMD acquires ST, the stashing_{rx,tx}_hints_set
> > callbacks implemented in the PMD are ready to set the ST as per the
> > rte_eth_stashing_config object passed to them by the higher-level
> > ethdev functions rte_eth_dev_stashing_{rx,tx}_hints(). As per the PCIe
> > specification, STs can be placed on the MSI-X tables or in a
> > device-specific location. For PMDs, setting the STs on queue contexts
> > is the only viable way of using TPH. Therefore, the PMDs should only
> > enable TPH in device-specific mode.
> >
> > V4->V5:
> >  * Enable stashing-hints (TPH) in Intel i40e driver.
> >  * Update exported symbol version from 25.03 to 25.07.
> >  * Add TPH mode macros.
> >
> > V3->V4:
> >  * Add VFIO IOCTL based ST extraction mechanism to Linux PCI bus
> > driver
> >  * Remove ST extraction via direct access to ACPI _DSM
> >  * Replace rte_pci_extract_tph_st() with rte_pci_tph_st_get() in PCI
> >    bus driver.
> >
> > Wathsala Vithanage (4):
> >   pci: add non-merged Linux uAPI changes
> >   bus/pci: introduce the PCIe TLP Processing Hints API
> >   ethdev: introduce the cache stashing hints API
> >   net/i40e: enable TPH in i40e
> >
> >  drivers/bus/pci/bsd/pci.c            |  43 +++++++
> >  drivers/bus/pci/bus_pci_driver.h     |  52 ++++++++
> >  drivers/bus/pci/linux/pci.c          | 100 ++++++++++++++++
> >  drivers/bus/pci/linux/pci_init.h     |  14 +++
> >  drivers/bus/pci/linux/pci_vfio.c     | 170 +++++++++++++++++++++++++++
> >  drivers/bus/pci/private.h            |   8 ++
> >  drivers/bus/pci/rte_bus_pci.h        |  67 +++++++++++
> >  drivers/bus/pci/windows/pci.c        |  43 +++++++
> >  drivers/net/intel/i40e/i40e_ethdev.c | 127 ++++++++++++++++++++
> >  kernel/linux/uapi/linux/vfio_tph.h   | 102 ++++++++++++++++
> >  lib/ethdev/ethdev_driver.h           |  66 +++++++++++
> >  lib/ethdev/rte_ethdev.c              | 149 +++++++++++++++++++++++
> >  lib/ethdev/rte_ethdev.h              | 158 +++++++++++++++++++++++++
> >  lib/pci/rte_pci.h                    |  15 +++
> >  14 files changed, 1114 insertions(+)
> >  create mode 100644 kernel/linux/uapi/linux/vfio_tph.h
> >
> 
> How will this impact existing applications that never use the API?
> It is crucial that existing 3rd party applications, just work without modifications.
> We don't want to hear from Network Virtual Appliance vendors that there is a
> performance regression in DPDK. They are already reluctant to keep up with
> DPDK versions.
> 
> I.e., if the application does nothing, caching must still be enabled.

It won't affect such applications.

--wathsala


^ permalink raw reply	[flat|nested] 33+ messages in thread

* RE: [PATCH v5 2/4] bus/pci: introduce the PCIe TLP Processing Hints API
  2025-06-04 16:54     ` Bruce Richardson
@ 2025-06-04 22:52       ` Wathsala Wathawana Vithanage
  2025-06-05  7:50         ` Bruce Richardson
  2025-06-05 10:18         ` Bruce Richardson
  0 siblings, 2 replies; 33+ messages in thread
From: Wathsala Wathawana Vithanage @ 2025-06-04 22:52 UTC (permalink / raw)
  To: Bruce Richardson
  Cc: Chenbo Xia, Nipun Gupta, Anatoly Burakov, Gaetan Rivet,
	dev@dpdk.org, nd, Honnappa Nagarahalli, Dhruv Tripathi, nd

> > rte_pci_tph_st_{get, set} functions will return an error if processing
> > any of the rte_tph_info objects fails. The API does not indicate which
> > entry in the rte_tph_info array was executed successfully and which
> > caused an error. Therefore, in case of an error, the caller should
> > discard the output. If rte_pci_tph_st_set returns an error, it should be
> > treated as a partial error. Hence, the steering-tag update on the
> > device should be considered partial and inconsistent with the expected
> > outcome.
> > This should be resolved by resetting the endpoint device before
> > further attempts to set steering tags.
> 
> This seems very clunky for the user. Is there a fundamental reason why we cannot
> report which ones passed or failed?
> 
> If it's a limitation of the kernel IOCTL, how about just making one ioctl call for
> each individual op requested, one at a time? That way we will know what failed and
> can report it.
> 

The V1 of the kernel patch had that feature, but it was frowned upon, and I was
asked to implement the IOCTL this way. Please find it here (V1)
https://lore.kernel.org/kvm/20250221224638.1836909-1-wathsala.vithanage@arm.com/T/#me73cf9b9c87da97d7d9461dfb97863b78ca1755b
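
For reference, the per-entry approach suggested above could look roughly
like the sketch below: issue one operation per entry and report the index
of the first failure. This is only an illustration under assumptions; the
struct, the callback type, and fake_one_op() are hypothetical stand-ins and
do not reflect the kernel uAPI or the code in this series.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified entry; not the rte_tph_info from the patch. */
struct tph_entry {
	unsigned int cpu_id;
	unsigned int cache_level;
	unsigned short st;	/* output: steering tag */
};

/* Hypothetical single-entry operation, e.g. a wrapper around one ioctl. */
typedef int (*tph_one_op_t)(struct tph_entry *e);

/*
 * Run op on each entry in order. Returns 0 if all entries succeed.
 * On the first failure, returns the negative errno from op and stores
 * the failing index in *failed_idx so the caller knows which entries
 * before it were applied.
 */
static int
tph_st_op_each(struct tph_entry *ents, size_t count, tph_one_op_t op,
	       size_t *failed_idx)
{
	size_t i;

	if (ents == NULL || op == NULL || count == 0)
		return -EINVAL;

	for (i = 0; i < count; i++) {
		int ret = op(&ents[i]);

		if (ret != 0) {
			if (failed_idx != NULL)
				*failed_idx = i;
			return ret;
		}
	}
	return 0;
}

/* Stand-in backend for illustration: rejects cache levels above 1. */
static int
fake_one_op(struct tph_entry *e)
{
	if (e->cache_level > 1)
		return -EINVAL;
	e->st = (unsigned short)(e->cpu_id + 1);
	return 0;
}
```

The trade-off is one system call per entry instead of one per batch, which
is the part that was not accepted on the kernel side.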

> Other comments inline below.
> 

I will address them in the next version.

Thanks.

--wathsala

> /Bruce
> 
> >
> > Signed-off-by: Wathsala Vithanage <wathsala.vithanage@arm.com>
> > Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> > Reviewed-by: Dhruv Tripathi <dhruv.tripathi@arm.com>
> > ---
> >  drivers/bus/pci/bsd/pci.c        |  43 ++++++++
> >  drivers/bus/pci/bus_pci_driver.h |  52 ++++++++++
> >  drivers/bus/pci/linux/pci.c      | 100 ++++++++++++++++++
> >  drivers/bus/pci/linux/pci_init.h |  13 +++
> >  drivers/bus/pci/linux/pci_vfio.c | 170 +++++++++++++++++++++++++++++++
> >  drivers/bus/pci/private.h        |   8 ++
> >  drivers/bus/pci/rte_bus_pci.h    |  67 ++++++++++++
> >  drivers/bus/pci/windows/pci.c    |  43 ++++++++
> >  lib/pci/rte_pci.h                |  15 +++
> >  9 files changed, 511 insertions(+)
> >
> > diff --git a/drivers/bus/pci/bsd/pci.c b/drivers/bus/pci/bsd/pci.c
> > index 5e2e09d5a4..dff750c4d6 100644
> > --- a/drivers/bus/pci/bsd/pci.c
> > +++ b/drivers/bus/pci/bsd/pci.c
> 
> <snip>
> 
> > diff --git a/drivers/bus/pci/bus_pci_driver.h b/drivers/bus/pci/bus_pci_driver.h
> > index 2cc1119072..b1c2829fc1 100644
> > --- a/drivers/bus/pci/bus_pci_driver.h
> > +++ b/drivers/bus/pci/bus_pci_driver.h
> > @@ -46,6 +46,7 @@ struct rte_pci_device {
> >  	char *bus_info;                     /**< PCI bus specific info */
> >  	struct rte_intr_handle *vfio_req_intr_handle;
> >  				/**< Handler of VFIO request interrupt */
> > +	uint8_t tph_enabled;                /**< TPH enabled on this device */
> 
> question: what would happen if we always enabled tph for each device. Does
> doing so disable the default handling for the device?
> 
> >  };
> >
> >  /**
> > @@ -194,6 +195,57 @@ struct rte_pci_ioport {
> >  	uint64_t len; /* only filled for memory mapped ports */
> >  };
> >
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this structure may change, or be removed, without prior
> > + * notice
> > + *
> > + * This structure is passed into the TPH Steering-Tag set or get function as an
> > + * argument by the caller. Return values are set in the same structure in st and
> > + * ph_ignore fields by the callee.
> > + *
> > + * Refer to PCI-SIG ECN "Revised _DSM for Cache Locality TPH Features" for
> > + * details.
> > + */
> > +struct rte_tph_info {
> > +	/* Input */
> > +	uint32_t cpu_id;	/*Logical CPU id*/
> > +	uint32_t cache_level;	/*Cache level relative to CPU. l1d=0,l2d=1,...*/
> > +	uint8_t flags;		/*Memory type, processing hint etc.*/
> > +	uint16_t index;		/*Index in vector table to store the ST*/
> > +
> 
> These fields should be reordered in order of size to avoid unnecessary gaps.
> 
> For the flags field, I dislike having different sets of flags all multiplexed into a
> single field. Instead of the flags field and the set of #defines below, can we split
> these out into separate enums, and then have a separate field for each one?
> 
> For example:
> 	struct rte_tph_info {
> 		uint32_t cpu_id;
> 		uint32_t cache_level;
> 		enum rte_tph_mem_type mem_type;
> 		enum rte_tph_hint hints;
> 		enum rte_tph_mode mode;
> 		...
> 	}
> 
> While the structure takes more space this way, this is not a datapath structure:
> we should not be seeing large arrays of it, and it does not need to be processed
> quickly, so usability should be prioritized over size/compactness.
> 
> 

+1
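
A rough sketch of what the reworked structure could look like, with fields
ordered by size and one enum per concern. The enum type names follow the
example quoted above; the enumerator names, the _sketch suffix and the
validation helper are hypothetical and only meant to illustrate the layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical enumerators; the series may choose different names. */
enum rte_tph_mem_type {
	RTE_TPH_MEM_TYPE_VMEM,	/* volatile memory */
	RTE_TPH_MEM_TYPE_PMEM	/* persistent memory */
};

enum rte_tph_hint {
	RTE_TPH_HINT_BIDIR,	/* host and device access data equally */
	RTE_TPH_HINT_REQSTR,	/* device accesses data more frequently */
	RTE_TPH_HINT_TARGET,	/* host accesses data more frequently */
	RTE_TPH_HINT_TARGET_PRIO /* as above, with high temporal locality */
};

enum rte_tph_mode {
	RTE_TPH_MODE_NS,	/* no ST mode */
	RTE_TPH_MODE_IV,	/* interrupt vector mode */
	RTE_TPH_MODE_DS		/* device specific mode */
};

struct rte_tph_info_sketch {
	/* Input, largest members first to avoid padding gaps */
	uint32_t cpu_id;
	uint32_t cache_level;
	enum rte_tph_mem_type mem_type;
	enum rte_tph_hint hint;
	enum rte_tph_mode mode;
	uint16_t index;
	/* Output */
	uint16_t st;
	uint8_t ph_ignore;
};

/* Returns 1 if every enum field holds a defined value, 0 otherwise. */
static int
tph_info_is_valid(const struct rte_tph_info_sketch *info)
{
	if (info == NULL)
		return 0;
	if ((int)info->mem_type < 0 ||
	    (int)info->mem_type > RTE_TPH_MEM_TYPE_PMEM)
		return 0;
	if ((int)info->hint < 0 || (int)info->hint > RTE_TPH_HINT_TARGET_PRIO)
		return 0;
	if ((int)info->mode < 0 || (int)info->mode > RTE_TPH_MODE_DS)
		return 0;
	return 1;
}
```

With separate fields there is no mask or shift to get wrong, and a bad
value in one field cannot be mistaken for a valid value in another.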

> > +	/* Output */
> > +	uint16_t st;		/*Steering tag returned by the platform*/
> > +	uint8_t ph_ignore;	/*Platform ignores PH for the returned ST*/
> > +};
> > +
> > +#define RTE_PCI_TPH_MEM_TYPE_MASK		0x1
> > +#define RTE_PCI_TPH_MEM_TYPE_SHIFT		0
> > +/** Request volatile memory ST */
> > +#define RTE_PCI_TPH_MEM_TYPE_VMEM		0
> > +/** Request persistent memory ST */
> > +#define RTE_PCI_TPH_MEM_TYPE_PMEM		1
> > +
> > +/** TLP Processing Hints - PCIe 6.0 specification section 2.2.7.1.1 */
> > +#define RTE_PCI_TPH_HINT_MASK		0x3
> 
> Looking at the mask usage below, does this mask not need to also be shifted by
> the TPH_HINT_SHIFT? Otherwise it overlaps with the type mask.
> 
> > +#define RTE_PCI_TPH_HINT_SHIFT		1
> > +/** Host and device access data equally */
> > +#define RTE_PCI_TPH_HINT_BIDIR		0
> > +/** Device accesses data more frequently */
> > +#define RTE_PCI_TPH_HINT_REQSTR		(1 << RTE_PCI_TPH_HINT_SHIFT)
> > +/** Host accesses data more frequently */
> > +#define RTE_PCI_TPH_HINT_TARGET		(2 << RTE_PCI_TPH_HINT_SHIFT)
> > +/** Host accesses data more frequently with a high temporal locality */
> > +#define RTE_PCI_TPH_HINT_TARGET_PRIO	(3 << RTE_PCI_TPH_HINT_SHIFT)
> > +
> > +#define RTE_PCI_TPH_ST_MODE_MASK   0x3
> > +/** TPH no ST mode */
> > +#define RTE_PCI_TPH_ST_NS_MODE	   0
> > +/** TPH interrupt vector mode */
> > +#define RTE_PCI_TPH_ST_IV_MODE	   1
> > +/** TPH device specific mode */
> > +#define RTE_PCI_TPH_ST_DS_MODE	   2
> > +
> 
> As above, I think these would be nicer defined in different enums, going to
> separate fields in the struct. That would also remove any ambiguity as to whether
> the masks include the shift or not.
> 

+1
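
To make the mask concern concrete, here is a small sketch using the macro
values from the patch as posted. It compares the unshifted extraction used
in pci_vfio_tph_st_op() with an extraction that shifts the mask first. The
two helper functions are hypothetical and exist only for the comparison.

```c
#include <assert.h>
#include <stdint.h>

/* Values as posted in bus_pci_driver.h. */
#define RTE_PCI_TPH_MEM_TYPE_MASK	0x1
#define RTE_PCI_TPH_MEM_TYPE_PMEM	1
#define RTE_PCI_TPH_HINT_MASK		0x3
#define RTE_PCI_TPH_HINT_SHIFT		1
#define RTE_PCI_TPH_HINT_BIDIR		0
#define RTE_PCI_TPH_HINT_TARGET		(2 << RTE_PCI_TPH_HINT_SHIFT)

/* Extraction as posted: the mask is applied without the shift. */
static unsigned int
hint_unshifted(uint8_t flags)
{
	return flags & RTE_PCI_TPH_HINT_MASK;
}

/* Extraction with the mask moved to the hint bits (bits 1 and 2). */
static unsigned int
hint_shifted(uint8_t flags)
{
	return flags & (RTE_PCI_TPH_HINT_MASK << RTE_PCI_TPH_HINT_SHIFT);
}
```

For flags = RTE_PCI_TPH_MEM_TYPE_PMEM | RTE_PCI_TPH_HINT_TARGET (0x5), the
unshifted form yields 1, which matches none of the hint cases and ends in
-EINVAL. For RTE_PCI_TPH_HINT_TARGET alone (0x4) it yields 0, which is
silently treated as RTE_PCI_TPH_HINT_BIDIR. The shifted form returns
RTE_PCI_TPH_HINT_TARGET in both cases.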

> >  #ifdef __cplusplus
> >  }
> >  #endif
> > diff --git a/drivers/bus/pci/linux/pci.c b/drivers/bus/pci/linux/pci.c
> > index c20d159218..b5a8ba0a86 100644
> > --- a/drivers/bus/pci/linux/pci.c
> > +++ b/drivers/bus/pci/linux/pci.c
> > @@ -814,3 +814,103 @@ rte_pci_ioport_unmap(struct rte_pci_ioport *p)
> >
> >  	return ret;
> >  }
> > +
> > +RTE_EXPORT_EXPERIMENTAL_SYMBOL(rte_pci_tph_enable, 25.07)
> > +int
> > +rte_pci_tph_enable(struct rte_pci_device *dev, int mode)
> > +{
> > +	int ret = 0;
> > +
> 
> Should check here if dev->tph_enabled is already true.
> 

+1

> > +	switch (dev->kdrv) {
> > +#ifdef VFIO_PRESENT
> > +	case RTE_PCI_KDRV_VFIO:
> > +		if (pci_vfio_is_enabled())
> > +			ret = pci_vfio_tph_enable(dev, mode);
> > +		break;
> > +#endif
> > +	case RTE_PCI_KDRV_IGB_UIO:
> > +	case RTE_PCI_KDRV_UIO_GENERIC:
> > +	default:
> > +		ret = -ENOTSUP;
> > +		break;
> > +	}
> > +
> > +	if (!ret)
> 
> Prefer "ret == 0" for this comparison.
> 
> > +		dev->tph_enabled = 1;
> > +
> > +	return ret;
> > +}
> > +
> 
> Function could probably be shortened to something like (including a check for
> already enabled, 2 lines shorter if we rely on checks in the
> vfio_tph_enable() call):
> 
> int
> rte_pci_tph_enable(...)
> {
> #ifdef VFIO_PRESENT
> 	if (dev->kdrv == RTE_PCI_KDRV_VFIO && pci_vfio_is_enabled()) {
> 		if (dev->tph_enabled == 0) {
> 			int ret = pci_vfio_tph_enable(...);
> 			if (ret != 0)
> 				return ret;
> 			dev->tph_enabled = 1;
> 		}
> 		return 0;
> 	}
> #endif
> 	return -ENOTSUP;
> }
> 
> 
> > +RTE_EXPORT_EXPERIMENTAL_SYMBOL(rte_pci_tph_disable, 25.07)
> > +int
> > +rte_pci_tph_disable(struct rte_pci_device *dev)
> > +{
> > +	int ret = 0;
> > +
> > +	switch (dev->kdrv) {
> > +#ifdef VFIO_PRESENT
> > +	case RTE_PCI_KDRV_VFIO:
> > +		if (pci_vfio_is_enabled())
> > +			ret = pci_vfio_tph_disable(dev);
> > +		break;
> > +#endif
> > +	case RTE_PCI_KDRV_IGB_UIO:
> > +	case RTE_PCI_KDRV_UIO_GENERIC:
> > +	default:
> > +		ret = -ENOTSUP;
> > +		break;
> > +	}
> > +
> > +	if (!ret)
> > +		dev->tph_enabled = 0;
> > +
> > +	return ret;
> > +}
> 
> As above, we can shorten this function by replacing the switch with a straight
> check for kdrv == RTE_PCI_KDRV_VFIO. Same with functions below too.
> 
 
+1
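
A self-contained sketch of the shortened, idempotent enable/disable pair
discussed above. The device struct and the backend_* functions are
hypothetical stand-ins for rte_pci_device and the pci_vfio_tph_* calls, so
that the control flow can be shown without DPDK headers.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for the relevant parts of rte_pci_device. */
struct pci_dev_sketch {
	int is_vfio;		/* nonzero if bound to a VFIO-like driver */
	uint8_t tph_enabled;	/* mirrors the tph_enabled field */
	int backend_calls;	/* counts backend invocations */
};

/* Stand-ins for pci_vfio_tph_enable()/pci_vfio_tph_disable(). */
static int
backend_tph_enable(struct pci_dev_sketch *dev, int mode)
{
	(void)mode;
	dev->backend_calls++;
	return 0;
}

static int
backend_tph_disable(struct pci_dev_sketch *dev)
{
	dev->backend_calls++;
	return 0;
}

/* Enabling an already enabled device is a no-op that returns 0. */
static int
tph_enable(struct pci_dev_sketch *dev, int mode)
{
	if (!dev->is_vfio)
		return -ENOTSUP;
	if (dev->tph_enabled == 0) {
		int ret = backend_tph_enable(dev, mode);

		if (ret != 0)
			return ret;
		dev->tph_enabled = 1;
	}
	return 0;
}

/* Disabling an already disabled device is likewise a no-op. */
static int
tph_disable(struct pci_dev_sketch *dev)
{
	if (!dev->is_vfio)
		return -ENOTSUP;
	if (dev->tph_enabled != 0) {
		int ret = backend_tph_disable(dev);

		if (ret != 0)
			return ret;
		dev->tph_enabled = 0;
	}
	return 0;
}
```

Keeping the state check in the OS-independent wrapper means the backend is
only called on an actual state change.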

> > +
> > +RTE_EXPORT_EXPERIMENTAL_SYMBOL(rte_pci_tph_st_get, 25.07)
> > +int
> > +rte_pci_tph_st_get(const struct rte_pci_device *dev,
> > +		   struct rte_tph_info *info, size_t count)
> > +{
> > +	int ret = 0;
> > +
> > +	switch (dev->kdrv) {
> > +#ifdef VFIO_PRESENT
> > +	case RTE_PCI_KDRV_VFIO:
> > +		if (pci_vfio_is_enabled())
> > +			ret = pci_vfio_tph_st_get(dev, info, count);
> > +		break;
> > +#endif
> > +	case RTE_PCI_KDRV_IGB_UIO:
> > +	case RTE_PCI_KDRV_UIO_GENERIC:
> > +	default:
> > +		ret = -ENOTSUP;
> > +		break;
> > +	}
> > +
> > +	return ret;
> > +}
> > +
> > +RTE_EXPORT_EXPERIMENTAL_SYMBOL(rte_pci_tph_st_set, 25.07)
> > +int
> > +rte_pci_tph_st_set(const struct rte_pci_device *dev,
> > +		   struct rte_tph_info *info, size_t count)
> > +{
> > +	int ret = 0;
> > +
> > +	switch (dev->kdrv) {
> > +#ifdef VFIO_PRESENT
> > +	case RTE_PCI_KDRV_VFIO:
> > +		if (pci_vfio_is_enabled())
> > +			ret = pci_vfio_tph_st_set(dev, info, count);
> > +		break;
> > +#endif
> > +	case RTE_PCI_KDRV_IGB_UIO:
> > +	case RTE_PCI_KDRV_UIO_GENERIC:
> > +	default:
> > +		ret = -ENOTSUP;
> > +		break;
> > +	}
> > +
> > +	return ret;
> > +}
> > diff --git a/drivers/bus/pci/linux/pci_init.h b/drivers/bus/pci/linux/pci_init.h
> > index 25b901f460..e71bfd2dce 100644
> > --- a/drivers/bus/pci/linux/pci_init.h
> > +++ b/drivers/bus/pci/linux/pci_init.h
> > @@ -5,6 +5,7 @@
> >  #ifndef EAL_PCI_INIT_H_
> >  #define EAL_PCI_INIT_H_
> >
> > +#include <rte_compat.h>
> >  #include <rte_vfio.h>
> >  #include <uapi/linux/vfio_tph.h>
> >
> > @@ -76,6 +77,18 @@ int pci_vfio_ioport_unmap(struct rte_pci_ioport *p);
> >  int pci_vfio_map_resource(struct rte_pci_device *dev);
> >  int pci_vfio_unmap_resource(struct rte_pci_device *dev);
> >
> > +/* TLP Processing Hints control functions */
> > +__rte_experimental
> > +int pci_vfio_tph_enable(const struct rte_pci_device *dev, int mode);
> > +__rte_experimental
> > +int pci_vfio_tph_disable(const struct rte_pci_device *dev);
> > +__rte_experimental
> > +int pci_vfio_tph_st_get(const struct rte_pci_device *dev,
> > +			struct rte_tph_info *info, size_t ent_count);
> > +__rte_experimental
> > +int pci_vfio_tph_st_set(const struct rte_pci_device *dev,
> > +			struct rte_tph_info *info, size_t ent_count);
> > +
> >  int pci_vfio_is_enabled(void);
> >
> >  #endif
> > diff --git a/drivers/bus/pci/linux/pci_vfio.c b/drivers/bus/pci/linux/pci_vfio.c
> > index 5317170231..bdbeb38658 100644
> > --- a/drivers/bus/pci/linux/pci_vfio.c
> > +++ b/drivers/bus/pci/linux/pci_vfio.c
> > @@ -12,6 +12,7 @@
> >  #include <stdbool.h>
> >
> >  #include <rte_log.h>
> > +#include <eal_export.h>
> >  #include <rte_pci.h>
> >  #include <rte_bus_pci.h>
> >  #include <rte_eal_paging.h>
> > @@ -1316,6 +1317,175 @@ pci_vfio_mmio_write(const struct rte_pci_device *dev, int bar,
> >  	return pwrite(fd, buf, len, offset + offs);
> >  }
> >
> > +static int
> > +pci_vfio_tph_ioctl(const struct rte_pci_device *dev, struct vfio_pci_tph *pci_tph)
> > +{
> > +	const struct rte_intr_handle *intr_handle = dev->intr_handle;
> > +	int vfio_dev_fd = 0, ret = 0;
> > +
> > +	vfio_dev_fd = rte_intr_dev_fd_get(intr_handle);
> > +	if (vfio_dev_fd < 0) {
> > +		ret = -EINVAL;
> > +		goto out;
> > +	}
> > +
> > +	ret = ioctl(vfio_dev_fd, VFIO_DEVICE_PCI_TPH, pci_tph);
> > +out:
> > +	return ret;
> > +}
> > +
> > +static int
> > +pci_vfio_tph_st_op(const struct rte_pci_device *dev,
> > +		    struct rte_tph_info *info, size_t count,
> > +		    enum rte_pci_st_op op)
> > +{
> > +	int ret = 0;
> > +	size_t argsz = 0, i;
> > +	struct vfio_pci_tph *pci_tph = NULL;
> > +	uint8_t mem_type = 0, hint = 0;
> > +
> > +	if (!count) {
> > +		ret = -EINVAL;
> > +		goto out;
> > +	}
> > +
> > +	argsz = sizeof(struct vfio_pci_tph) +
> > +		count * sizeof(struct vfio_pci_tph_entry);
> > +
> > +	pci_tph = rte_zmalloc(NULL, argsz, 0);
> 
> For ioctl we should not need pinned memory. Use regular malloc here.
> 

+1

> > +	if (!pci_tph) {
> 
> Coding style guidelines say to compare pointers explicitly to NULL.
> 

+1

> > +		ret = -ENOMEM;
> > +		goto out;
> > +	}
> > +
> > +	pci_tph->argsz = argsz;
> > +	pci_tph->count = count;
> > +
> > +	switch (op) {
> > +	case RTE_PCI_TPH_ST_GET:
> > +		pci_tph->flags = VFIO_DEVICE_TPH_GET_ST;
> > +		break;
> > +	case RTE_PCI_TPH_ST_SET:
> > +		pci_tph->flags = VFIO_DEVICE_TPH_SET_ST;
> > +		break;
> > +	default:
> > +		ret = -EINVAL;
> > +		goto out;
> > +	}
> > +
> > +	for (i = 0; i < count; i++) {
> > +		pci_tph->ents[i].cpu_id = info[i].cpu_id;
> > +		pci_tph->ents[i].cache_level = info[i].cache_level;
> > +
> > +		mem_type = info[i].flags & RTE_PCI_TPH_MEM_TYPE_MASK;
> > +		switch (mem_type) {
> > +		case RTE_PCI_TPH_MEM_TYPE_VMEM:
> > +			pci_tph->ents[i].flags |= VFIO_TPH_MEM_TYPE_VMEM;
> > +			break;
> > +		case RTE_PCI_TPH_MEM_TYPE_PMEM:
> > +			pci_tph->ents[i].flags |= VFIO_TPH_MEM_TYPE_PMEM;
> > +			break;
> > +		default:
> > +			ret = -EINVAL;
> > +			goto out;
> > +		}
> > +
> > +		hint = info[i].flags & RTE_PCI_TPH_HINT_MASK;
> 
> As pointed out above, unshifted, this HINT_MASK overlaps with the TYPE_MASK.
> 
> > +		switch (hint) {
> > +		case RTE_PCI_TPH_HINT_BIDIR:
> > +			pci_tph->ents[i].flags |= VFIO_TPH_HINT_BIDIR;
> > +			break;
> > +		case RTE_PCI_TPH_HINT_REQSTR:
> > +			pci_tph->ents[i].flags |= VFIO_TPH_HINT_REQSTR;
> > +			break;
> > +		case RTE_PCI_TPH_HINT_TARGET:
> > +			pci_tph->ents[i].flags |= VFIO_TPH_HINT_TARGET;
> > +			break;
> > +		case RTE_PCI_TPH_HINT_TARGET_PRIO:
> > +			pci_tph->ents[i].flags |= VFIO_TPH_HINT_TARGET_PRIO;
> > +			break;
> > +		default:
> > +			ret = -EINVAL;
> > +			goto out;
> > +		}
> > +
> > +		if (op == RTE_PCI_TPH_ST_SET)
> > +			pci_tph->ents[i].index = info[i].index;
> > +	}
> > +
> > +	ret = pci_vfio_tph_ioctl(dev, pci_tph);
> > +	if (ret)
> 
> Again, compare explicitly against 0 ("ret != 0" here).
> 

+1

> > +		goto out;
> > +
> > +	/*
> > +	 * Kernel returns steering-tag and ph-ignore bits for
> > +	 * RTE_PCI_TPH_ST_SET too, therefore copy output for
> > +	 * both RTE_PCI_TPH_ST_SET and RTE_PCI_TPH_ST_GET
> > +	 * cases.
> > +	 */
> > +	for (i = 0; i < count; i++) {
> > +		info[i].st = pci_tph->ents[i].st;
> > +		info[i].ph_ignore = pci_tph->ents[i].ph_ignore;
> > +	}
> > +
> > +out:
> > +	if (pci_tph)
> > +		rte_free(pci_tph);
> 
> Free functions work fine with null pointers, so just call free without a null check.
> 

+1
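
Putting the allocation comments together, a minimal sketch of building a
variable-length request with plain calloc(), an explicit NULL comparison,
and an unconditional free() on the exit path. The struct layouts and the
tph_req_build() helper are hypothetical simplifications, not the vfio_pci_tph
uAPI from the unmerged kernel patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified request layout with a flexible array member. */
struct tph_entry_sketch {
	uint32_t cpu_id;
	uint32_t cache_level;
	uint16_t st;
};

struct tph_req_sketch {
	uint32_t argsz;
	uint32_t count;
	struct tph_entry_sketch ents[];
};

/*
 * Allocate and fill a request for count CPUs. On success, ownership of
 * the request passes to the caller through *reqp, and the caller frees
 * it with free().
 */
static int
tph_req_build(const uint32_t *cpu_ids, size_t count,
	      struct tph_req_sketch **reqp)
{
	struct tph_req_sketch *req = NULL;
	size_t argsz, i;
	int ret = 0;

	if (cpu_ids == NULL || reqp == NULL || count == 0 ||
	    count > (SIZE_MAX - sizeof(*req)) / sizeof(req->ents[0])) {
		ret = -EINVAL;
		goto done;
	}

	argsz = sizeof(*req) + count * sizeof(req->ents[0]);
	if (argsz > UINT32_MAX) {
		ret = -EINVAL;
		goto done;
	}

	/* Regular zeroed heap memory is enough for an ioctl argument. */
	req = calloc(1, argsz);
	if (req == NULL) {
		ret = -ENOMEM;
		goto done;
	}

	req->argsz = (uint32_t)argsz;
	req->count = (uint32_t)count;
	for (i = 0; i < count; i++)
		req->ents[i].cpu_id = cpu_ids[i];

	*reqp = req;
	req = NULL;	/* ownership transferred to the caller */
done:
	free(req);	/* free(NULL) is a no-op */
	return ret;
}
```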

> > +	return ret;
> > +}
> > +
> > +RTE_EXPORT_EXPERIMENTAL_SYMBOL(pci_vfio_tph_enable, 25.07)
> > +int
> > +pci_vfio_tph_enable(const struct rte_pci_device *dev, int mode)
> > +{
> > +	int ret;
> > +
> > +	if (!(mode ^ (mode & VFIO_TPH_ST_MODE_MASK))) {
> 
> So it's an error to twice set the mode to the same thing? should it not just be a
> no-op?
> 

This doesn't make sense; I will fix this in next version.
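
For the record, the posted expression is true exactly when mode has no bits
outside the mask, so it rejects the valid modes and accepts the invalid
ones. Below is a minimal sketch of a check that rejects only out-of-range
values. It assumes a mask of 0x3 and the mode values from
RTE_PCI_TPH_ST_*_MODE; the value of VFIO_TPH_ST_MODE_MASK is not shown in
this thread, and the TPH_* names and both helpers are hypothetical.

```c
#include <assert.h>
#include <errno.h>

/* Assumed values, mirroring RTE_PCI_TPH_ST_MODE_MASK and friends. */
#define TPH_ST_MODE_MASK	0x3
#define TPH_ST_NS_MODE		0
#define TPH_ST_IV_MODE		1
#define TPH_ST_DS_MODE		2

/* The condition as posted: fails for every mode that fits the mask. */
static int
tph_mode_check_as_posted(int mode)
{
	if (!(mode ^ (mode & TPH_ST_MODE_MASK)))
		return -EINVAL;
	return 0;
}

/* Reject bits outside the mask and the undefined value 3. */
static int
tph_mode_check(int mode)
{
	if ((mode & ~TPH_ST_MODE_MASK) != 0)
		return -EINVAL;
	if (mode > TPH_ST_DS_MODE)
		return -EINVAL;
	return 0;
}
```

With these definitions, tph_mode_check_as_posted(TPH_ST_DS_MODE) returns
-EINVAL while tph_mode_check(TPH_ST_DS_MODE) returns 0.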

> > +		ret = -EINVAL;
> > +		goto out;
> > +	} else
> > +		mode &= VFIO_TPH_ST_MODE_MASK;
> > +
> > +	struct vfio_pci_tph pci_tph = {
> > +		.argsz = sizeof(struct vfio_pci_tph),
> > +		.flags = VFIO_DEVICE_TPH_ENABLE | mode,
> > +		.count = 0
> > +	};
> > +
> > +	ret = pci_vfio_tph_ioctl(dev, &pci_tph);
> > +out:
> > +	return ret;
> > +}
> > +
> > +RTE_EXPORT_EXPERIMENTAL_SYMBOL(pci_vfio_tph_disable, 25.07)
> > +int
> > +pci_vfio_tph_disable(const struct rte_pci_device *dev)
> > +{
> 
> Check here, or in caller to see if it's already enabled?
> 

This check should happen in rte_pci_tph_disable() rather than here, since this
function is Linux-specific. I will add the check where you pointed it out earlier.

> > +	struct vfio_pci_tph pci_tph = {
> > +		.argsz = sizeof(struct vfio_pci_tph),
> > +		.flags = VFIO_DEVICE_TPH_DISABLE,
> > +		.count = 0
> > +	};
> > +
> > +	return pci_vfio_tph_ioctl(dev, &pci_tph);
> > +}
> > +
> > +RTE_EXPORT_EXPERIMENTAL_SYMBOL(pci_vfio_tph_st_get, 25.07)
> > +int
> > +pci_vfio_tph_st_get(const struct rte_pci_device *dev,
> > +		    struct rte_tph_info *info, size_t count)
> > +{
> > +	return pci_vfio_tph_st_op(dev, info, count, RTE_PCI_TPH_ST_GET);
> > +}
> > +
> > +RTE_EXPORT_EXPERIMENTAL_SYMBOL(pci_vfio_tph_st_set, 25.07)
> > +int
> > +pci_vfio_tph_st_set(const struct rte_pci_device *dev,
> > +		    struct rte_tph_info *info, size_t count)
> > +{
> > +	return pci_vfio_tph_st_op(dev, info, count, RTE_PCI_TPH_ST_SET);
> > +}
> > +
> >  int
> >  pci_vfio_is_enabled(void)
> >  {
> > diff --git a/drivers/bus/pci/private.h b/drivers/bus/pci/private.h
> > index 38109844b9..d2ec370320 100644
> > --- a/drivers/bus/pci/private.h
> > +++ b/drivers/bus/pci/private.h
> > @@ -335,4 +335,12 @@ rte_pci_dev_iterate(const void *start,
> >  int
> >  rte_pci_devargs_parse(struct rte_devargs *da);
> >
> > +/*
> > + * TPH Steering-Tag operation types.
> > + */
> > +enum rte_pci_st_op {
> > +	RTE_PCI_TPH_ST_SET, /* Set TPH Steering - Tags */
> > +	RTE_PCI_TPH_ST_GET  /* Get TPH Steering - Tags */
> > +};
> > +
> >  #endif /* _PCI_PRIVATE_H_ */
> > diff --git a/drivers/bus/pci/rte_bus_pci.h b/drivers/bus/pci/rte_bus_pci.h
> > index 19a7b15b99..e4d4780f54 100644
> > --- a/drivers/bus/pci/rte_bus_pci.h
> > +++ b/drivers/bus/pci/rte_bus_pci.h
> > @@ -31,6 +31,7 @@ extern "C" {
> >  struct rte_pci_device;
> >  struct rte_pci_driver;
> >  struct rte_pci_ioport;
> > +struct rte_tph_info;
> >
> >  struct rte_devargs;
> >
> > @@ -312,6 +313,72 @@ void rte_pci_ioport_read(struct rte_pci_ioport *p,
> >  void rte_pci_ioport_write(struct rte_pci_ioport *p,
> >  		const void *data, size_t len, off_t offset);
> >
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change without prior notice.
> > + *
> > + * Enable TLP Processing Hints (TPH) in the endpoint device.
> > + *
> > + * @param dev
> > + *   A pointer to a rte_pci_device structure describing the device
> > + *   to use.
> > + * @param mode
> > + *   TPH mode the device must operate in.
> > + */
> > +__rte_experimental
> > +int rte_pci_tph_enable(struct rte_pci_device *dev, int mode);
> > +
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change without prior notice.
> > + *
> > + * Disable TLP Processing Hints (TPH) in the endpoint device.
> > + *
> > + * @param dev
> > + *   A pointer to a rte_pci_device structure describing the device
> > + *   to use.
> > + */
> > +__rte_experimental
> > +int rte_pci_tph_disable(struct rte_pci_device *dev);
> > +
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change without prior notice.
> > + *
> > + * Get PCI Steering-Tags (STs) for a list of stashing targets.
> > + *
> > + * @param dev
> > + *   A pointer to a rte_pci_device structure describing the device
> > + *   to use.
> > + * @param info
> > + *   An array of rte_tph_info objects, each describing the target
> > + *   cpu-id, cache-level, etc. Steering-tags for each target are
> > + *   returned via the info array.
> > + * @param count
> > + *   The number of elements in the info array.
> > + */
> > +__rte_experimental
> > +int rte_pci_tph_st_get(const struct rte_pci_device *dev,
> > +		struct rte_tph_info *info, size_t count);
> > +
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change without prior notice.
> > + *
> > + * Set PCI Steering-Tags (STs) for a list of stashing targets.
> > + *
> > + * @param dev
> > + *   A pointer to a rte_pci_device structure describing the device
> > + *   to use.
> > + * @param info
> > + *   An array of rte_tph_info objects, each describing the target
> > + *   cpu-id, cache-level, etc. Steering-tags for each target are
> > + *   returned via the info array.
> > + * @param count
> > + *   The number of elements in the info array.
> > + */
> > +__rte_experimental
> > +int rte_pci_tph_st_set(const struct rte_pci_device *dev,
> > +		struct rte_tph_info *info, size_t count);
> > +
> >  #ifdef __cplusplus
> >  }
> >  #endif
> > diff --git a/drivers/bus/pci/windows/pci.c
> > b/drivers/bus/pci/windows/pci.c index e7e449306e..218e667a5a 100644
> > --- a/drivers/bus/pci/windows/pci.c
> > +++ b/drivers/bus/pci/windows/pci.c
> > @@ -511,3 +511,46 @@ rte_pci_scan(void)
> >
> >  	return ret;
> >  }
> > +
> > +RTE_EXPORT_EXPERIMENTAL_SYMBOL(rte_pci_tph_enable, 25.07)
> > +int
> > +rte_pci_tph_enable(struct rte_pci_device *dev, int mode)
> > +{
> > +	RTE_SET_USED(dev);
> > +	RTE_SET_USED(mode);
> > +	/* This feature is not yet implemented for windows */
> > +	return -1;
> > +}
> > +
> > +RTE_EXPORT_EXPERIMENTAL_SYMBOL(rte_pci_tph_disable, 25.07)
> > +int
> > +rte_pci_tph_disable(struct rte_pci_device *dev)
> > +{
> > +	RTE_SET_USED(dev);
> > +	/* This feature is not yet implemented for windows */
> > +	return -1;
> > +}
> > +
> > +RTE_EXPORT_EXPERIMENTAL_SYMBOL(rte_pci_tph_st_get, 25.07)
> > +int
> > +rte_pci_tph_st_get(const struct rte_pci_device *dev,
> > +		   struct rte_tph_info *info, size_t count)
> > +{
> > +	RTE_SET_USED(dev);
> > +	RTE_SET_USED(info);
> > +	RTE_SET_USED(count);
> > +	/* This feature is not yet implemented for windows */
> > +	return -1;
> > +}
> > +
> > +RTE_EXPORT_EXPERIMENTAL_SYMBOL(rte_pci_tph_st_set, 25.07)
> > +int
> > +rte_pci_tph_st_set(const struct rte_pci_device *dev,
> > +		   struct rte_tph_info *info, size_t count)
> > +{
> > +	RTE_SET_USED(dev);
> > +	RTE_SET_USED(info);
> > +	RTE_SET_USED(count);
> > +	/* This feature is not yet implemented for windows */
> > +	return -1;
> > +}
> > diff --git a/lib/pci/rte_pci.h b/lib/pci/rte_pci.h index
> > 9a50a12142..da9cd666bf 100644
> > --- a/lib/pci/rte_pci.h
> > +++ b/lib/pci/rte_pci.h
> > @@ -137,6 +137,21 @@ extern "C" {
> >  /* Process Address Space ID (RTE_PCI_EXT_CAP_ID_PASID) */
> >  #define RTE_PCI_PASID_CTRL		0x06    /* PASID control register */
> >
> > +/* TPH Requester */
> > +#define RTE_PCI_TPH_CAP            4       /* capability register */
> > +#define RTE_PCI_TPH_CAP_ST_NS      0x00000001 /* No ST Mode Supported */
> > +#define RTE_PCI_TPH_CAP_ST_IV      0x00000002 /* Interrupt Vector Mode Supported */
> > +#define RTE_PCI_TPH_CAP_ST_DS      0x00000004 /* Device Specific Mode Supported */
> > +#define RTE_PCI_TPH_CAP_EXT_TPH    0x00000100 /* Ext TPH Requester Supported */
> > +#define RTE_PCI_TPH_CAP_LOC_MASK   0x00000600 /* ST Table Location */
> > +#define RTE_PCI_TPH_LOC_NONE       0x00000000 /* Not present */
> > +#define RTE_PCI_TPH_LOC_CAP        0x00000200 /* In capability */
> > +#define RTE_PCI_TPH_LOC_MSIX       0x00000400 /* In MSI-X */
> > +#define RTE_PCI_TPH_CAP_ST_MASK    0x07FF0000 /* ST Table Size */
> > +#define RTE_PCI_TPH_CAP_ST_SHIFT   16      /* ST Table Size shift */
> > +#define RTE_PCI_TPH_BASE_SIZEOF    0xc     /* Size with no ST table */
> > +
> > +
> 
> Where are all these values used? They don't seem to be needed by this patch. If
> needed in later patches, I'd suggest adding them there.
> 

RTE_PCI_TPH_CAP_ST_NS, RTE_PCI_TPH_CAP_ST_IV and RTE_PCI_TPH_CAP_ST_DS
are used by drivers. I40e patch uses RTE_PCI_TPH_CAP_ST_DS.
I will remove the rest, added here for completeness.
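
For illustration only (not part of the patch): a minimal sketch of how a
driver might test these bits in a TPH capability register value. The macro
values are copied from the hunk quoted above; the helper names
(tph_supports_ds, tph_st_in_msix, tph_st_size_field) are hypothetical, and
the sketch assumes the 32-bit register value has already been read from
config space.

```c
#include <assert.h>
#include <stdint.h>

/* Values copied from the quoted rte_pci.h hunk. */
#define RTE_PCI_TPH_CAP_ST_DS      0x00000004 /* Device Specific Mode Supported */
#define RTE_PCI_TPH_CAP_LOC_MASK   0x00000600 /* ST Table Location */
#define RTE_PCI_TPH_LOC_MSIX       0x00000400 /* In MSI-X */
#define RTE_PCI_TPH_CAP_ST_MASK    0x07FF0000 /* ST Table Size */
#define RTE_PCI_TPH_CAP_ST_SHIFT   16         /* ST Table Size shift */

/* Hypothetical helper: non-zero if device-specific mode is advertised. */
static int
tph_supports_ds(uint32_t cap)
{
	return (cap & RTE_PCI_TPH_CAP_ST_DS) != 0;
}

/* Hypothetical helper: non-zero if the ST table lives in the MSI-X table. */
static int
tph_st_in_msix(uint32_t cap)
{
	return (cap & RTE_PCI_TPH_CAP_LOC_MASK) == RTE_PCI_TPH_LOC_MSIX;
}

/* Hypothetical helper: raw ST Table Size field, exactly as stored. */
static uint32_t
tph_st_size_field(uint32_t cap)
{
	return (cap & RTE_PCI_TPH_CAP_ST_MASK) >> RTE_PCI_TPH_CAP_ST_SHIFT;
}
```

The helpers return the raw field contents only; how the size field maps to
an entry count is left to the PCIe specification.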

> >  /** Formatting string for PCI device identifier: Ex: 0000:00:01.0 */
> > #define PCI_PRI_FMT "%.4" PRIx32 ":%.2" PRIx8 ":%.2" PRIx8 ".%" PRIx8
> > #define PCI_PRI_STR_SIZE sizeof("XXXXXXXX:XX:XX.X")
> > --
> > 2.43.0
> >


* Re: [PATCH v5 2/4] bus/pci: introduce the PCIe TLP Processing Hints API
  2025-06-04 22:52       ` Wathsala Wathawana Vithanage
@ 2025-06-05  7:50         ` Bruce Richardson
  2025-06-05 14:32           ` Wathsala Wathawana Vithanage
  2025-06-05 10:18         ` Bruce Richardson
  1 sibling, 1 reply; 33+ messages in thread
From: Bruce Richardson @ 2025-06-05  7:50 UTC (permalink / raw)
  To: Wathsala Wathawana Vithanage
  Cc: Chenbo Xia, Nipun Gupta, Anatoly Burakov, Gaetan Rivet,
	dev@dpdk.org, nd, Honnappa Nagarahalli, Dhruv Tripathi

On Wed, Jun 04, 2025 at 10:52:24PM +0000, Wathsala Wathawana Vithanage wrote:
> > > rte_pci_tph_st_{get, set} functions will return an error if processing
> > > any of the rte_tph_info objects fails. The API does not indicate which
> > > entry in the rte_tph_info array was executed successfully and which
> > > caused an error. Therefore, in case of an error, the caller should
> > > discard the output. If rte_pci_tph_set returns an error, it should be
> > > treated as a partial error. Hence, the steering-tag update on the
> > > device should be considered partial and inconsistent with the expected
> > outcome.
> > > This should be resolved by resetting the endpoint device before
> > > further attempts to set steering tags.
> > 
> > This seems very clunky for the user. Is there a fundamental reason why we cannot
> > report out what ones passed or failed?
> > 
> > If it's a limitation of the kernel IOCTL, how about just making one ioctl for each
> > individual op requested, one at a time. That way we will know what failed to
> > report it?
> > 
> 
> The V1 of the kernel patch had that feature, but it was frowned upon, and I was
> asked to implement the IOCTL this way. Please find it here (V1)
> https://lore.kernel.org/kvm/20250221224638.1836909-1-wathsala.vithanage@arm.com/T/#me73cf9b9c87da97d7d9461dfb97863b78ca1755b
> 
> > Other comments inline below.
> > 
> 
> I will address them in the next version.
> 
> Thanks.
> 
> --wathsala
> 
> > /Bruce
> > 

<snip>

> > > diff --git a/lib/pci/rte_pci.h b/lib/pci/rte_pci.h index
> > > 9a50a12142..da9cd666bf 100644
> > > --- a/lib/pci/rte_pci.h
> > > +++ b/lib/pci/rte_pci.h
> > > @@ -137,6 +137,21 @@ extern "C" {
> > >  /* Process Address Space ID (RTE_PCI_EXT_CAP_ID_PASID) */
> > >  #define RTE_PCI_PASID_CTRL		0x06    /* PASID control register */
> > >
> > > +/* TPH Requester */
> > > +#define RTE_PCI_TPH_CAP            4       /* capability register */
> > > +#define RTE_PCI_TPH_CAP_ST_NS      0x00000001 /* No ST Mode Supported */
> > > +#define RTE_PCI_TPH_CAP_ST_IV      0x00000002 /* Interrupt Vector Mode Supported */
> > > +#define RTE_PCI_TPH_CAP_ST_DS      0x00000004 /* Device Specific Mode Supported */
> > > +#define RTE_PCI_TPH_CAP_EXT_TPH    0x00000100 /* Ext TPH Requester Supported */
> > > +#define RTE_PCI_TPH_CAP_LOC_MASK   0x00000600 /* ST Table Location */
> > > +#define RTE_PCI_TPH_LOC_NONE       0x00000000 /* Not present */
> > > +#define RTE_PCI_TPH_LOC_CAP        0x00000200 /* In capability */
> > > +#define RTE_PCI_TPH_LOC_MSIX       0x00000400 /* In MSI-X */
> > > +#define RTE_PCI_TPH_CAP_ST_MASK    0x07FF0000 /* ST Table Size */
> > > +#define RTE_PCI_TPH_CAP_ST_SHIFT   16      /* ST Table Size shift */
> > > +#define RTE_PCI_TPH_BASE_SIZEOF    0xc     /* Size with no ST table */
> > > +
> > > +
> > 
> > Where are all these values used? They don't seem to be needed by this patch. If
> > needed in later patches, I'd suggest adding them there.
> > 
> 
> RTE_PCI_TPH_CAP_ST_NS, RTE_PCI_TPH_CAP_ST_IV and RTE_PCI_TPH_CAP_ST_DS
> are used by drivers. I40e patch uses RTE_PCI_TPH_CAP_ST_DS.
> I will remove the rest, added here for completeness.
> 

Having them all for completeness is fine. You can keep this as-is in next
version then.

/Bruce


* Re: [PATCH v5 3/4] ethdev: introduce the cache stashing hints API
  2025-06-02 22:38   ` [PATCH v5 3/4] ethdev: introduce the cache stashing hints API Wathsala Vithanage
  2025-06-03  8:43     ` Morten Brørup
@ 2025-06-05 10:03     ` Bruce Richardson
  2025-06-05 14:30       ` Wathsala Wathawana Vithanage
  1 sibling, 1 reply; 33+ messages in thread
From: Bruce Richardson @ 2025-06-05 10:03 UTC (permalink / raw)
  To: Wathsala Vithanage
  Cc: Thomas Monjalon, Ferruh Yigit, Andrew Rybchenko, dev, nd,
	Honnappa Nagarahalli, Dhruv Tripathi

On Mon, Jun 02, 2025 at 10:38:03PM +0000, Wathsala Vithanage wrote:
> Extend the ethdev library to enable the stashing of different data
> objects, such as the ones listed below, into CPU caches directly
> from the NIC.
> 
> - Rx/Tx queue descriptors
> - Rx packets
> - Packet headers
> - packet payloads
> - Data of a packet at an offset from the start of the packet
> 
> The APIs are designed in a hardware/vendor agnostic manner such that
> supporting PMDs could use any capabilities available in the underlying
> hardware for fine-grained stashing of data objects into a CPU cache
> 
> The API provides an interface to query the availability of stashing
> capabilities, i.e., platform/NIC support, stashable object types, etc,
> via the rte_eth_dev_stashing_capabilities_get interface.
> 
> The function pair rte_eth_dev_stashing_rx_config_set and
> rte_eth_dev_stashing_tx_config_set sets the stashing hint (the CPU, 
> cache level, and data object types) on the Rx and Tx queues.
> 
> PMDs that support stashing must register their implementations with the
> following eth_dev_ops callbacks, which are invoked by the ethdev
> functions listed above.
> 
> - stashing_capabilities_get
> - stashing_rx_hints_set
> - stashing_tx_hints_set
> 
> Signed-off-by: Wathsala Vithanage <wathsala.vithanage@arm.com>
> Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> Reviewed-by: Dhruv Tripathi <dhruv.tripathi@arm.com>
> ---

A few small comments inline below.

/Bruce

>  lib/ethdev/ethdev_driver.h |  66 ++++++++++++++++
>  lib/ethdev/rte_ethdev.c    | 149 ++++++++++++++++++++++++++++++++++
>  lib/ethdev/rte_ethdev.h    | 158 +++++++++++++++++++++++++++++++++++++
>  3 files changed, 373 insertions(+)
> 
> diff --git a/lib/ethdev/ethdev_driver.h b/lib/ethdev/ethdev_driver.h
> index 2b4d2ae9c3..8a4012db08 100644
> --- a/lib/ethdev/ethdev_driver.h
> +++ b/lib/ethdev/ethdev_driver.h
> @@ -1376,6 +1376,68 @@ enum rte_eth_dev_operation {
>  typedef uint64_t (*eth_get_restore_flags_t)(struct rte_eth_dev *dev,
>  					    enum rte_eth_dev_operation op);
>  
> +/**
> + * @internal
> + * Set cache stashing hints in Rx queue.
> + *
> + * @param dev
> + *   Port (ethdev) handle.
> + * @param queue_id
> + *   Rx queue.
> + * @param config
> + *   Stashing hints configuration for the queue.
> + *
> + * @return
> + *   -ENOTSUP if the device or the platform does not support cache stashing.
> + *   -ENOSYS  if the underlying PMD hasn't implemented cache stashing feature.
> + *   -EINVAL  on invalid arguments.
> + *   0 on success.
> + */
> +typedef int (*eth_stashing_rx_hints_set_t)(struct rte_eth_dev *dev, uint16_t queue_id,
> +					   struct rte_eth_stashing_config *config);
> +
> +/**
> + * @internal
> + * Set cache stashing hints in Tx queue.
> + *
> + * @param dev
> + *   Port (ethdev) handle.
> + * @param queue_id
> + *   Tx queue.
> + * @param config
> + *   Stashing hints configuration for the queue.
> + *
> + * @return
> + *   -ENOTSUP if the device or the platform does not support cache stashing.
> + *   -ENOSYS  if the underlying PMD hasn't implemented cache stashing feature.
> + *   -EINVAL  on invalid arguments.
> + *   0 on success.

What about on failure of the underlying ioctl call?

> + */
> +typedef int (*eth_stashing_tx_hints_set_t)(struct rte_eth_dev *dev, uint16_t queue_id,
> +					   struct rte_eth_stashing_config *config);
> +
> +/**
> + * @internal
> + * Get cache stashing object types supported in the ethernet device.
> + * The return value indicates availability of stashing hints support
> + * in the hardware and the PMD.
> + *
> + * @param dev
> + *   Port (ethdev) handle.
> + * @param objects
> + *   PMD sets supported bits on return.
> + *
> + * @return
> + *   -ENOTSUP if the device or the platform does not support cache stashing.
> + *   -ENOSYS  if the underlying PMD hasn't implemented cache stashing feature.
> + *   -EINVAL  on NULL values for types or hints parameters.
> + *   On return, types and hints parameters will have bits set for supported
> + *   object types and hints.
> + *   0 on success.
> + */
> +typedef int (*eth_stashing_capabilities_get_t)(struct rte_eth_dev *dev,
> +					     uint16_t *objects);
> +
>  /**
>   * @internal A structure containing the functions exported by an Ethernet driver.
>   */
> @@ -1402,6 +1464,10 @@ struct eth_dev_ops {
>  	eth_mac_addr_remove_t      mac_addr_remove; /**< Remove MAC address */
>  	eth_mac_addr_add_t         mac_addr_add;  /**< Add a MAC address */
>  	eth_mac_addr_set_t         mac_addr_set;  /**< Set a MAC address */
> +	eth_stashing_rx_hints_set_t   stashing_rx_hints_set; /**< Set Rx cache stashing*/
> +	eth_stashing_tx_hints_set_t   stashing_tx_hints_set; /**< Set Tx cache stashing*/
> +	/** Get supported stashing hints*/
> +	eth_stashing_capabilities_get_t stashing_capabilities_get;
>  	/** Set list of multicast addresses */
>  	eth_set_mc_addr_list_t     set_mc_addr_list;
>  	mtu_set_t                  mtu_set;       /**< Set MTU */
> diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
> index d4197322a0..ae666c370b 100644
> --- a/lib/ethdev/rte_ethdev.c
> +++ b/lib/ethdev/rte_ethdev.c
> @@ -158,6 +158,7 @@ static const struct {
>  	{RTE_ETH_DEV_CAPA_RXQ_SHARE, "RXQ_SHARE"},
>  	{RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP, "FLOW_RULE_KEEP"},
>  	{RTE_ETH_DEV_CAPA_FLOW_SHARED_OBJECT_KEEP, "FLOW_SHARED_OBJECT_KEEP"},
> +	{RTE_ETH_DEV_CAPA_CACHE_STASHING, "CACHE_STASHING"},
>  };
>  
>  enum {
> @@ -7419,5 +7420,153 @@ int rte_eth_dev_map_aggr_tx_affinity(uint16_t port_id, uint16_t tx_queue_id,
>  	return ret;
>  }
>  
> +RTE_EXPORT_EXPERIMENTAL_SYMBOL(rte_eth_dev_validate_stashing_config, 25.07)
> +int
> +rte_eth_dev_validate_stashing_config(uint16_t port_id, uint16_t queue_id,
> +				     uint8_t queue_direction,
> +				     struct rte_eth_stashing_config *config)
> +{
> +	struct rte_eth_dev *dev;
> +	struct rte_eth_dev_info dev_info;
> +	int ret = 0;
> +	uint16_t nb_queues;
> +
> +	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
> +
> +	if (!config) {
> +		RTE_ETHDEV_LOG_LINE(ERR, "Invalid stashing configuration");
> +		ret = -EINVAL;
> +		goto out;
> +	}
> +
> +	/*
> +	 * Check for invalid objects
> +	 */
> +	if (!RTE_ETH_DEV_STASH_OBJECTS_VALID(config->objects)) {
> +		RTE_ETHDEV_LOG_LINE(ERR, "Invalid stashing objects");
> +		ret = -EINVAL;
> +		goto out;
> +	}
> +
> +	dev = &rte_eth_devices[port_id];
> +
> +	nb_queues = (queue_direction == RTE_ETH_DEV_RX_QUEUE) ?
> +				      dev->data->nb_rx_queues :
> +				      dev->data->nb_tx_queues;
> +
> +	if (queue_id >= nb_queues) {
> +		RTE_ETHDEV_LOG_LINE(ERR, "Invalid queue_id=%u", queue_id);
> +		ret = -EINVAL;
> +		goto out;
> +	}
> +
> +	ret = rte_eth_dev_info_get(port_id, &dev_info);
> +	if (ret < 0)
> +		goto out;
> +
> +	if ((dev_info.dev_capa & RTE_ETH_DEV_CAPA_CACHE_STASHING) !=
> +	    RTE_ETH_DEV_CAPA_CACHE_STASHING) {

Nit: check if all this can fit on one line under 100 chars. I think it can.

> +		ret = -ENOTSUP;
> +		goto out;
> +	}
> +
> +	if (*dev->dev_ops->stashing_rx_hints_set == NULL ||
> +	    *dev->dev_ops->stashing_tx_hints_set == NULL) {
> +		RTE_ETHDEV_LOG_LINE(ERR, "Stashing hints are not implemented "
> +				    "in %s for %s", dev_info.driver_name,

Don't split error messages across lines. Text strings are allowed to go
over the 100 char limit if necessary to avoid splitting.

> +				    dev_info.device->name);
> +		ret = -ENOSYS;
> +	}
> +
> +out:
> +	return ret;
> +}
> +
> +RTE_EXPORT_EXPERIMENTAL_SYMBOL(rte_eth_dev_stashing_rx_config_set, 25.07)
> +int
> +rte_eth_dev_stashing_rx_config_set(uint16_t port_id, uint16_t queue_id,
> +				   struct rte_eth_stashing_config *config)
> +{
> +	struct rte_eth_dev *dev;
> +	int ret = 0;
> +
> +	ret = rte_eth_dev_validate_stashing_config(port_id, queue_id,
> +						   RTE_ETH_DEV_RX_QUEUE,
> +						   config);
> +	if (ret < 0)
> +		goto out;
> +
> +	dev = &rte_eth_devices[port_id];
> +
> +	ret = eth_err(port_id,
> +		      (*dev->dev_ops->stashing_rx_hints_set)(dev, queue_id,
> +		      config));
> +out:
> +	return ret;
> +}
> +
> +RTE_EXPORT_EXPERIMENTAL_SYMBOL(rte_eth_dev_stashing_tx_config_set, 25.07)
> +int
> +rte_eth_dev_stashing_tx_config_set(uint16_t port_id, uint16_t queue_id,
> +				   struct rte_eth_stashing_config *config)
> +{
> +	struct rte_eth_dev *dev;
> +	int ret = 0;
> +
> +	ret = rte_eth_dev_validate_stashing_config(port_id, queue_id,
> +						   RTE_ETH_DEV_TX_QUEUE,
> +						   config);
> +	if (ret < 0)
> +		goto out;
> +
> +	dev = &rte_eth_devices[port_id];
> +
> +	ret = eth_err(port_id,
> +		      (*dev->dev_ops->stashing_tx_hints_set)(dev, queue_id,
> +		      config));
> +out:
> +	return ret;
> +}
> +
> +RTE_EXPORT_EXPERIMENTAL_SYMBOL(rte_eth_dev_stashing_capabilities_get, 25.07)
> +int
> +rte_eth_dev_stashing_capabilities_get(uint16_t port_id, uint16_t *objects)
> +{
> +	struct rte_eth_dev *dev;
> +	struct rte_eth_dev_info dev_info;
> +	int ret = 0;
> +
> +	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
> +
> +	if (!objects) {
> +		ret = -EINVAL;
> +		goto out;
> +	}
> +
> +	dev = &rte_eth_devices[port_id];
> +
> +	ret = rte_eth_dev_info_get(port_id, &dev_info);
> +	if (ret < 0)
> +		goto out;
> +
> +	if ((dev_info.dev_capa & RTE_ETH_DEV_CAPA_CACHE_STASHING) !=
> +	    RTE_ETH_DEV_CAPA_CACHE_STASHING) {
> +		ret = -ENOTSUP;
> +		goto out;
> +	}
> +
> +	if (*dev->dev_ops->stashing_capabilities_get == NULL) {
> +		RTE_ETHDEV_LOG_LINE(ERR, "Stashing hints are not implemented "
> +				    "in %s for %s", dev_info.driver_name,
> +				    dev_info.device->name);
> +		ret = -ENOSYS;
> +		goto out;
> +	}
> +	ret = eth_err(port_id,
> +		      (*dev->dev_ops->stashing_capabilities_get)(dev, objects));
> +out:
> +	return ret;
> +}
> +
>  RTE_EXPORT_SYMBOL(rte_eth_dev_logtype)
>  RTE_LOG_REGISTER_DEFAULT(rte_eth_dev_logtype, INFO);
> diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
> index ea7f8c4a1a..1398f8c837 100644
> --- a/lib/ethdev/rte_ethdev.h
> +++ b/lib/ethdev/rte_ethdev.h
> @@ -1667,6 +1667,9 @@ struct rte_eth_conf {
>  #define RTE_ETH_DEV_CAPA_FLOW_SHARED_OBJECT_KEEP RTE_BIT64(4)
>  /**@}*/
>  
> +/** Device supports stashing to CPU/system caches. */
> +#define RTE_ETH_DEV_CAPA_CACHE_STASHING RTE_BIT64(5)
> +
>  /*
>   * Fallback default preferred Rx/Tx port parameters.
>   * These are used if an application requests default parameters
> @@ -1838,6 +1841,7 @@ struct rte_eth_dev_info {
>  	struct rte_eth_dev_portconf default_txportconf;
>  	/** Generic device capabilities (RTE_ETH_DEV_CAPA_). */
>  	uint64_t dev_capa;
> +	uint16_t stashing_capa;
>  	/**
>  	 * Switching information for ports on a device with a
>  	 * embedded managed interconnect/switch.
> @@ -6173,6 +6177,160 @@ int rte_eth_cman_config_set(uint16_t port_id, const struct rte_eth_cman_config *
>  __rte_experimental
>  int rte_eth_cman_config_get(uint16_t port_id, struct rte_eth_cman_config *config);
>  
> +
> +/** Queue type is RX. */
> +#define RTE_ETH_DEV_RX_QUEUE		0
> +/** Queue type is TX. */
> +#define RTE_ETH_DEV_TX_QUEUE		1
> +

I'd prefer an enum for these.
Why are they necessary, since we have separate Rx and Tx functions for the
caching hints?

> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this structure may change, or be removed, without prior notice
> + *
> + * A structure used for configuring the cache stashing hints.
> + */
> +struct rte_eth_stashing_config {
> +	/**
> +	 * lcore_id of the processor the stashing hints are applied to.
> +	 */
> +	uint32_t	lcore_id;
> +	/**
> +	 * Zero based cache level relative to the CPU.
> +	 * E.g. l1d = 0, l2d = 1,...
> +	 */
> +	uint32_t	cache_level;
> +	/**
> +	 * Object types the configuration is applied to
> +	 */
> +	uint16_t	objects;

What are the objects? That needs to be covered by the docs, or make this an
enum type so that it's clear from the typesystem what it applies to [though
that would require an array of objects and count, it may be clearer for
user].
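
As a reference point for the bitmask form (a sketch, not a proposed
replacement): the object bits and mask below are copied from the defines
later in this patch, while STASH_OBJECT_MASK and stash_objects_valid() are
hypothetical names for a function equivalent of the
RTE_ETH_DEV_STASH_OBJECTS_VALID macro.

```c
#include <assert.h>
#include <stdint.h>

/* Bit values copied from the patch under review (not a merged DPDK API). */
#define RTE_ETH_DEV_STASH_OBJECT_OFFSET   0x0001 /* data at an offset in the packet */
#define RTE_ETH_DEV_STASH_OBJECT_DESC     0x0002 /* queue descriptor */
#define RTE_ETH_DEV_STASH_OBJECT_HEADER   0x0004 /* packet header */
#define RTE_ETH_DEV_STASH_OBJECT_PAYLOAD  0x0008 /* packet payload */
#define STASH_OBJECT_MASK                 0x000f /* hypothetical name for the mask */

/*
 * Hypothetical function form of the validity check: at least one object
 * bit must be set, and no bit outside the known mask may be set.
 */
static int
stash_objects_valid(uint16_t objects)
{
	return objects != 0 && (objects & ~STASH_OBJECT_MASK) == 0;
}
```

A caller would OR together the object types it wants, for example
RTE_ETH_DEV_STASH_OBJECT_DESC | RTE_ETH_DEV_STASH_OBJECT_HEADER, and store
the result in the objects field.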

> +	/**
> +	 * The offset if RTE_ETH_DEV_STASH_OBJECT_OFFSET bit is set
> +	 *  in objects
> +	 */
> +	int		offset;
> +};
> +
> +/**@{@name Stashable Rx/Tx queue object types supported by the ethernet device
> + *@see rte_eth_dev_stashing_capabilities_get
> + *@see rte_eth_dev_stashing_rx_config_set
> + *@see rte_eth_dev_stashing_tx_config_set
> + */
> +
> +/**
> + * Apply stashing hint to data at a given offset from the start of a
> + * received packet.
> + */
> +#define RTE_ETH_DEV_STASH_OBJECT_OFFSET		0x0001
> +
> +/** Apply stashing hint to an rx descriptor. */
> +#define RTE_ETH_DEV_STASH_OBJECT_DESC		0x0002
> +
> +/** Apply stashing hint to a header of a received packet. */
> +#define RTE_ETH_DEV_STASH_OBJECT_HEADER		0x0004
> +
> +/** Apply stashing hint to a payload of a received packet. */
> +#define RTE_ETH_DEV_STASH_OBJECT_PAYLOAD	0x0008
> +
> +#define __RTE_ETH_DEV_STASH_OBJECT_MASK		0x000f
> +/**@}*/
> +
> +#define RTE_ETH_DEV_STASH_OBJECTS_VALID(t)				\
> +	((!((t) & (~__RTE_ETH_DEV_STASH_OBJECT_MASK))) && (t))
> +
> +/**
> + *
> + * @warning
> + * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
> + *
> + * @internal
> + * Helper function to validate stashing hints configuration.
> + */
> +__rte_experimental
> +int rte_eth_dev_validate_stashing_config(uint16_t port_id, uint16_t queue_id,
> +					 uint8_t queue_direction,
> +					 struct rte_eth_stashing_config *config);
> +
> +/**
> + *
> + * @warning
> + * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
> + *
> + * Provide cache stashing hints for improved memory access latencies for
> + * packets received by the NIC.
> + * This feature is available only in supported NICs and platforms.
> + *
> + * @param port_id
> + *  The port identifier of the Ethernet device.
> + * @param queue_id
> + *  The index of the receive queue to which hints are applied.
> + * @param config
> + *  Stashing configuration.
> + * @return
> + *  - (-ENODEV) on incorrect port_ids.
> + *  - (-EINVAL) if both RX and TX object types are used in conjunction in the
> + *  objects parameter.
> + *  - (-EINVAL) on invalid queue_id.
> + *  - (-ENOTSUP) if RTE_ETH_DEV_CAPA_CACHE_STASHING capability is unavailable.
> + *  - (-ENOSYS) if PMD does not implement cache stashing hints.
> + *  - (0) on Success.
> + */
> +__rte_experimental
> +int rte_eth_dev_stashing_rx_config_set(uint16_t port_id, uint16_t queue_id,
> +				   struct rte_eth_stashing_config *config);
> +
> +/**
> + *
> + * @warning
> + * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
> + *
> + * Configure cache stashing for improved memory access latencies for Tx
> + * queue completion descriptors being sent to host system by the NIC.
> + * This feature is available only in supported NICs and platforms.
> + *
> + * @param port_id
> + *  The port identifier of the Ethernet device.
> + * @param queue_id
> + *  The index of the transmit queue to which hints are applied.
> + * @param config
> + *  Stashing configuration.
> + * @return
> + *  - (-ENODEV) on incorrect port_ids.
> + *  - (-EINVAL) if both RX and TX object types are used in conjunction in the
> + *  objects parameter.
> + *  - (-EINVAL) if hints are incompatible with TX queues.
> + *  - (-EINVAL) on invalid queue_id.
> + *  - (-ENOTSUP) if RTE_ETH_DEV_CAPA_CACHE_STASHING capability is unavailable.
> + *  - (-ENOSYS) if PMD does not implement cache stashing hints.
> + *  - (0) on Success.
> + */
> +__rte_experimental
> +int rte_eth_dev_stashing_tx_config_set(uint16_t port_id, uint16_t queue_id,
> +				   struct rte_eth_stashing_config *config);
> +
> +/**
> + *
> + * @warning
> + * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
> + *
> + * Discover cache stashing objects supported in the ethernet device.
> + *
> + * @param port_id
> + *  The port identifier of the Ethernet device.
> + * @param objects
> + *  Supported objects vector set by the ethernet device.
> + * @return
> + *  On return types and hints parameters will have bits set for supported
> + *  object types.
> + *  - (-ENOTSUP) if the device or the platform does not support cache stashing.
> + *  - (-ENOSYS)  if the underlying PMD hasn't implemented cache stashing
> + *  feature.
> + *  - (-EINVAL)  on NULL values for types or hints parameters.
> + *  - (0) on success.
> + */
> +__rte_experimental
> +int rte_eth_dev_stashing_capabilities_get(uint16_t port_id, uint16_t *objects);
> +
>  #ifdef __cplusplus
>  }
>  #endif
> -- 
> 2.43.0
> 


* Re: [PATCH v5 2/4] bus/pci: introduce the PCIe TLP Processing Hints API
  2025-06-04 22:52       ` Wathsala Wathawana Vithanage
  2025-06-05  7:50         ` Bruce Richardson
@ 2025-06-05 10:18         ` Bruce Richardson
  2025-06-05 14:25           ` Wathsala Wathawana Vithanage
  1 sibling, 1 reply; 33+ messages in thread
From: Bruce Richardson @ 2025-06-05 10:18 UTC (permalink / raw)
  To: Wathsala Wathawana Vithanage
  Cc: Chenbo Xia, Nipun Gupta, Anatoly Burakov, Gaetan Rivet,
	dev@dpdk.org, nd, Honnappa Nagarahalli, Dhruv Tripathi

On Wed, Jun 04, 2025 at 10:52:24PM +0000, Wathsala Wathawana Vithanage
wrote:
> > > rte_pci_tph_st_{get, set} functions will return an error if
> > > processing any of the rte_tph_info objects fails. The API does not
> > > indicate which entry in the rte_tph_info array was executed
> > > successfully and which caused an error. Therefore, in case of an
> > > error, the caller should discard the output. If rte_pci_tph_set
> > > returns an error, it should be treated as a partial error. Hence, the
> > > steering-tag update on the device should be considered partial and
> > > inconsistent with the expected
> > outcome.
> > > This should be resolved by resetting the endpoint device before
> > > further attempts to set steering tags.
> > 
> > This seems very clunky for the user. Is there a fundamental reason why
> > we cannot report out what ones passed or failed?
> > 
> > If it's a limitation of the kernel IOCTL, how about just making one
> > ioctl for each individual op requested, one at a time. That way we will
> > know what failed to report it?
> > 
> 
> The V1 of the kernel patch had that feature, but it was frowned upon, and
> I was asked to implement the IOCTL this way. Please find it here (V1)
> https://lore.kernel.org/kvm/20250221224638.1836909-1-wathsala.vithanage@arm.com/T/#me73cf9b9c87da97d7d9461dfb97863b78ca1755b
> 
I read the thread. However, from my reading, there is nothing in there that
mandates an interface where the user won't know the state on error.
Userspace needs some way to know which tags were applied and which were not
on failure. Resetting the whole device is not a good solution. Whatever API
is provided, if it takes multiple ops in one go it needs to either return
the number applied on failure or, if it just returns success/failure, roll
back the successful ones to give an all-or-nothing interface.
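
To make the two options concrete, here is a rough sketch against a fake
in-memory steering-tag table. Everything in it (struct fake_dev, struct
st_req, st_apply_one, st_set_counted, st_set_atomic) is hypothetical and
stands in for the real device and ioctl path; it only shows the shape of
the semantics, not how VFIO or the PCI bus driver would implement them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ST_TABLE_SIZE 8 /* hypothetical table size for the sketch */

/* Hypothetical stand-in for a device's steering-tag table. */
struct fake_dev {
	uint16_t st_table[ST_TABLE_SIZE];
};

/* Hypothetical request: write 'tag' into table slot 'index'. */
struct st_req {
	unsigned int index;
	uint16_t tag;
};

/* Apply one request; fails on an out-of-range index. */
static int
st_apply_one(struct fake_dev *dev, const struct st_req *req)
{
	if (req->index >= ST_TABLE_SIZE)
		return -1;
	dev->st_table[req->index] = req->tag;
	return 0;
}

/* Option A: stop at the first failure and report how many were applied. */
static size_t
st_set_counted(struct fake_dev *dev, const struct st_req *reqs, size_t count)
{
	size_t i;

	for (i = 0; i < count; i++)
		if (st_apply_one(dev, &reqs[i]) < 0)
			break;
	return i;
}

/* Option B: all-or-nothing; restore the previous table on any failure. */
static int
st_set_atomic(struct fake_dev *dev, const struct st_req *reqs, size_t count)
{
	struct fake_dev saved = *dev; /* snapshot used for rollback */

	if (st_set_counted(dev, reqs, count) != count) {
		*dev = saved;
		return -1;
	}
	return 0;
}
```

With option A the caller can retry from the returned index; with option B
the caller sees either the full update or the old state, so no device reset
is needed in either case.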

/Bruce


* Re: [PATCH v5 2/4] bus/pci: introduce the PCIe TLP Processing Hints API
  2025-06-02 22:38   ` [PATCH v5 2/4] bus/pci: introduce the PCIe TLP Processing Hints API Wathsala Vithanage
  2025-06-03  8:11     ` Morten Brørup
  2025-06-04 16:54     ` Bruce Richardson
@ 2025-06-05 10:30     ` Bruce Richardson
  2 siblings, 0 replies; 33+ messages in thread
From: Bruce Richardson @ 2025-06-05 10:30 UTC (permalink / raw)
  To: Wathsala Vithanage
  Cc: Chenbo Xia, Nipun Gupta, Anatoly Burakov, Gaetan Rivet, dev, nd,
	Honnappa Nagarahalli, Dhruv Tripathi

On Mon, Jun 02, 2025 at 10:38:02PM +0000, Wathsala Vithanage wrote:
> Extend the PCI bus driver to enable or disable TPH capability and set or
> get PCI Steering-Tags (STs) on an endpoint device. The functions
> rte_pci_tph_{enable, disable,st_set,st_get} provide the primary
> interface for DPDK device drivers. Implementation of the interface is OS
> dependent. For Linux, the kernel VFIO driver provides the
> implementation. rte_pci_tph_{enable, disable} functions enable and
> disable TPH capability, respectively. rte_pci_tph_enable enables TPH on
> the device in either of the device-specific, interrupt-vector, or
> no-steering-tag modes.
> 
> rte_pci_tph_st_{get, set} functions take an array of rte_tph_info
> objects with cpu-id, cache-level, flags (processing-hint, memory-type).
> The index in rte_tph_info is the MSI-X/MSI vector/ST-table index if TPH
> was enabled in the interrupt-vector mode; the rte_pci_tph_st_get
> function ignores it. Both rte_pci_tph_st_{set, get} functions return the
> steering-tag (st) and processing-hint-ignored (ph_ignore) fields via the
> same rte_tph_info object passed into them.
> 
> rte_pci_tph_st_{get, set} functions will return an error if processing
> any of the rte_tph_info objects fails. The API does not indicate which
> entry in the rte_tph_info array was executed successfully and which
> caused an error. Therefore, in case of an error, the caller should
> discard the output. If rte_pci_tph_set returns an error, it should be
> treated as a partial error. Hence, the steering-tag update on the device
> should be considered partial and inconsistent with the expected outcome.
> This should be resolved by resetting the endpoint device before further
> attempts to set steering tags.
> 
> Signed-off-by: Wathsala Vithanage <wathsala.vithanage@arm.com>
> Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> Reviewed-by: Dhruv Tripathi <dhruv.tripathi@arm.com>
> ---
>  drivers/bus/pci/bsd/pci.c        |  43 ++++++++
>  drivers/bus/pci/bus_pci_driver.h |  52 ++++++++++
>  drivers/bus/pci/linux/pci.c      | 100 ++++++++++++++++++
>  drivers/bus/pci/linux/pci_init.h |  13 +++
>  drivers/bus/pci/linux/pci_vfio.c | 170 +++++++++++++++++++++++++++++++
>  drivers/bus/pci/private.h        |   8 ++
>  drivers/bus/pci/rte_bus_pci.h    |  67 ++++++++++++
>  drivers/bus/pci/windows/pci.c    |  43 ++++++++
>  lib/pci/rte_pci.h                |  15 +++
>  9 files changed, 511 insertions(+)
> 
> diff --git a/drivers/bus/pci/bsd/pci.c b/drivers/bus/pci/bsd/pci.c
> index 5e2e09d5a4..dff750c4d6 100644
> --- a/drivers/bus/pci/bsd/pci.c
> +++ b/drivers/bus/pci/bsd/pci.c
> @@ -650,3 +650,46 @@ rte_pci_ioport_unmap(struct rte_pci_ioport *p)
>  
>  	return ret;
>  }
> +
> +RTE_EXPORT_EXPERIMENTAL_SYMBOL(rte_pci_tph_enable, 25.07)
> +int
> +rte_pci_tph_enable(struct rte_pci_device *dev, int mode)
> +{
> +	RTE_SET_USED(dev);
> +	RTE_SET_USED(mode);
> +	/* This feature is not yet implemented for BSD */
> +	return -1;
> +}
> +
> +RTE_EXPORT_EXPERIMENTAL_SYMBOL(rte_pci_tph_disable, 25.07)
> +int
> +rte_pci_tph_disable(struct rte_pci_device *dev)
> +{
> +	RTE_SET_USED(dev);
> +	/* This feature is not yet implemented for BSD */
> +	return -1;
> +}
> +
> +RTE_EXPORT_EXPERIMENTAL_SYMBOL(rte_pci_tph_st_get, 25.07)
> +int
> +rte_pci_tph_st_get(const struct rte_pci_device *dev,
> +		   struct rte_tph_info *info, size_t count)
> +{
> +	RTE_SET_USED(dev);
> +	RTE_SET_USED(info);
> +	RTE_SET_USED(count);
> +	/* This feature is not yet implemented for BSD */
> +	return -1;
> +}
> +
> +RTE_EXPORT_EXPERIMENTAL_SYMBOL(rte_pci_tph_st_set, 25.07)
> +int
> +rte_pci_tph_st_set(const struct rte_pci_device *dev,
> +		   struct rte_tph_info *info, size_t count)
> +{
> +	RTE_SET_USED(dev);
> +	RTE_SET_USED(info);
> +	RTE_SET_USED(count);
> +	/* This feature is not yet implemented for BSD */
> +	return -1;
> +}
> diff --git a/drivers/bus/pci/bus_pci_driver.h b/drivers/bus/pci/bus_pci_driver.h
> index 2cc1119072..b1c2829fc1 100644
> --- a/drivers/bus/pci/bus_pci_driver.h
> +++ b/drivers/bus/pci/bus_pci_driver.h
> @@ -46,6 +46,7 @@ struct rte_pci_device {
>  	char *bus_info;                     /**< PCI bus specific info */
>  	struct rte_intr_handle *vfio_req_intr_handle;
>  				/**< Handler of VFIO request interrupt */
> +	uint8_t tph_enabled;                /**< TPH enabled on this device */
>  };
>  
>  /**
> @@ -194,6 +195,57 @@ struct rte_pci_ioport {
>  	uint64_t len; /* only filled for memory mapped ports */
>  };
>  
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this structure may change, or be removed, without prior
> + * notice
> + *
> + * This structure is passed into the TPH Steering-Tag set or get function as an
> + * argument by the caller. Return values are set in the same structure in st and
> + * ph_ignore fields by the callee.
> + *
> + * Refer to PCI-SIG ECN "Revised _DSM for Cache Locality TPH Features" for
> + * details.
> + */
> +struct rte_tph_info {
> +	/* Input */
> +	uint32_t cpu_id;	/*Logical CPU id*/
> +	uint32_t cache_level;	/*Cache level relative to CPU. l1d=0,l2d=1,...*/
> +	uint8_t flags;		/*Memory type, processing hint etc.*/
> +	uint16_t index;		/*Index in vector table to store the ST*/
> +
> +	/* Output */
> +	uint16_t st;		/*Steering tag returned by the platform*/
> +	uint8_t ph_ignore;	/*Platform ignores PH for the returned ST*/
> +};

Looking at the driver implementation in patch 4, I realised this use of the
info struct for "get" API is very confusing. You partially populate the
structure, then make an API call, passing that struct as input, and output
is also filled into other fields of the same structure.

I dislike having the input and output fields of the structure mixed. Can we
separate out the last two fields here into a separate output struct, or
alternatively just drop them from the struct and have the get API take two
additional arrays as output, one for the tags and another for the ph_ignore
fields?
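
For illustration, a minimal sketch of the first option (a separate output
struct). All names here (tph_st_req, tph_st_res, tph_st_get_split and the fake
platform lookup) are hypothetical and are not part of the patch; the lookup is a
stand-in for whatever the platform actually returns.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical input-only request: what the caller asks for. */
struct tph_st_req {
	uint32_t cpu_id;      /* logical CPU id */
	uint32_t cache_level; /* l1d = 0, l2d = 1, ... */
	uint8_t flags;        /* memory type, processing hint */
	uint16_t index;       /* ST table index (interrupt-vector mode) */
};

/* Hypothetical output-only result: what the platform returns. */
struct tph_st_res {
	uint16_t st;       /* steering tag */
	uint8_t ph_ignore; /* platform ignores PH for this ST */
};

/* Stand-in for the platform lookup; a real one would go through VFIO. */
static int
fake_platform_st_lookup(const struct tph_st_req *req, struct tph_st_res *res)
{
	assert(req != NULL && res != NULL);
	if (req->cache_level > 2)
		return -1;
	res->st = (uint16_t)((req->cpu_id << 2) | req->cache_level);
	res->ph_ignore = 0;
	return 0;
}

/* Inputs are const; outputs go to a separate array of the same length. */
int
tph_st_get_split(const struct tph_st_req *req, struct tph_st_res *res,
		 size_t count)
{
	size_t i;

	if (req == NULL || res == NULL)
		return -1;
	for (i = 0; i < count; i++)
		if (fake_platform_st_lookup(&req[i], &res[i]) != 0)
			return -1;
	return 0;
}
```

With this shape the request array can be declared const and reused across
calls, and the type system makes clear which fields the callee writes.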

[Note: documentation also needs to cover what PH is]

/Bruce

^ permalink raw reply	[flat|nested] 33+ messages in thread

* RE: [PATCH v5 2/4] bus/pci: introduce the PCIe TLP Processing Hints API
  2025-06-05 10:18         ` Bruce Richardson
@ 2025-06-05 14:25           ` Wathsala Wathawana Vithanage
  0 siblings, 0 replies; 33+ messages in thread
From: Wathsala Wathawana Vithanage @ 2025-06-05 14:25 UTC (permalink / raw)
  To: Bruce Richardson
  Cc: Chenbo Xia, Nipun Gupta, Anatoly Burakov, Gaetan Rivet,
	dev@dpdk.org, nd, Honnappa Nagarahalli, Dhruv Tripathi, nd



> -----Original Message-----
> From: Bruce Richardson <bruce.richardson@intel.com>
> Sent: Thursday, June 5, 2025 5:19 AM
> To: Wathsala Wathawana Vithanage <wathsala.vithanage@arm.com>
> Cc: Chenbo Xia <chenbox@nvidia.com>; Nipun Gupta <nipun.gupta@amd.com>;
> Anatoly Burakov <anatoly.burakov@intel.com>; Gaetan Rivet <grive@u256.net>;
> dev@dpdk.org; nd <nd@arm.com>; Honnappa Nagarahalli
> <Honnappa.Nagarahalli@arm.com>; Dhruv Tripathi <Dhruv.Tripathi@arm.com>
> Subject: Re: [PATCH v5 2/4] bus/pci: introduce the PCIe TLP Processing Hints API
> 
> On Wed, Jun 04, 2025 at 10:52:24PM +0000, Wathsala Wathawana Vithanage
> wrote:
> > > > rte_pci_tph_st_{get, set} functions will return an error if
> > > > processing any of the rte_tph_info objects fails. The API does not
> > > > indicate which entry in the rte_tph_info array was executed
> > > > successfully and which caused an error. Therefore, in case of an
> > > > error, the caller should discard the output. If rte_pci_tph_st_set
> > > > returns an error, it should be treated as a partial error. Hence,
> > > > the steering-tag update on the device should be considered partial
> > > > and inconsistent with the expected
> > > outcome.
> > > > This should be resolved by resetting the endpoint device before
> > > > further attempts to set steering tags.
> > >
> > > This seems very clunky for the user. Is there a fundamental reason
> > > why we cannot report out what ones passed or failed?
> > >
> > > If it's a limitation of the kernel IOCTL, how about just making one
> > > ioctl for each individual op requested, one at a time. That way we
> > > will know what failed to report it?
> > >
> >
> > The V1 of the kernel patch had that feature, but it was frowned upon,
> > and I was asked to implement the IOCTL this way. Please find it here
> > (V1)
> > https://lore.kernel.org/kvm/20250221224638.1836909-1-wathsala.vithanag
> > e@arm.com/T/#me73cf9b9c87da97d7d9461dfb97863b78ca1755b
> >
> Read the thread. However, from my reading, there is nothing in there that
> mandates having an interface where the user won't know the state on error.
> We need some method to have userspace know what tags were applied or not on
> failure. Resetting the whole device is not a good solution. Whatever API is
> provided, if it is going to take multiple ops in one go it needs to either return the
> number applied on failure, or if just returning success/failure, it should rollback
> the successful ones to give an all-or-nothing interface.
> 
> /Bruce

I will bring this up in the V2 review. If there is pushback, we can alternatively
change the API to do one tag at a time.
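
As a rough sketch of what "one tag at a time" could look like while still
reporting how many entries were applied: the names below (st_op, st_set_one_t,
st_set_each, mock_set_one) are hypothetical, and the mock device stands in for
the real ioctl path.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical single steering-tag request (names are illustrative). */
struct st_op {
	uint16_t index; /* ST table index */
	uint16_t st;    /* steering tag to program */
};

/* Callback that programs exactly one entry; 0 on success, negative on error. */
typedef int (*st_set_one_t)(void *dev, const struct st_op *op);

/*
 * Apply ops one at a time and stop at the first failure.
 * Returns the number of entries applied; if that is less than count,
 * *err holds the error reported for ops[<return value>].
 */
size_t
st_set_each(void *dev, st_set_one_t set_one, const struct st_op *ops,
	    size_t count, int *err)
{
	size_t i;

	assert(set_one != NULL && err != NULL);
	*err = 0;
	for (i = 0; i < count; i++) {
		*err = set_one(dev, &ops[i]);
		if (*err != 0)
			break;
	}
	return i;
}

/* Mock device: a 4-entry ST table; writes outside it fail. */
struct mock_dev {
	uint16_t table[4];
};

int
mock_set_one(void *dev, const struct st_op *op)
{
	struct mock_dev *d = dev;

	if (op->index >= 4)
		return -22; /* stand-in for -EINVAL */
	d->table[op->index] = op->st;
	return 0;
}
```

The caller then knows exactly which entries took effect and can retry or undo
them, without having to reset the device.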

Thanks.


^ permalink raw reply	[flat|nested] 33+ messages in thread

* RE: [PATCH v5 3/4] ethdev: introduce the cache stashing hints API
  2025-06-05 10:03     ` Bruce Richardson
@ 2025-06-05 14:30       ` Wathsala Wathawana Vithanage
  0 siblings, 0 replies; 33+ messages in thread
From: Wathsala Wathawana Vithanage @ 2025-06-05 14:30 UTC (permalink / raw)
  To: Bruce Richardson, stephen@networkplumber.org
  Cc: thomas@monjalon.net, Ferruh Yigit, Andrew Rybchenko, dev@dpdk.org,
	nd, Honnappa Nagarahalli, Dhruv Tripathi, nd

> On Mon, Jun 02, 2025 at 10:38:03PM +0000, Wathsala Vithanage wrote:
> > Extend the ethdev library to enable the stashing of different data
> > objects, such as the ones listed below, into CPU caches directly from
> > the NIC.
> >
> > - Rx/Tx queue descriptors
> > - Rx packets
> > - Packet headers
> > - packet payloads
> > - Data of a packet at an offset from the start of the packet
> >
> > The APIs are designed in a hardware/vendor agnostic manner such that
> > supporting PMDs could use any capabilities available in the underlying
> > hardware for fine-grained stashing of data objects into a CPU cache
> >
> > The API provides an interface to query the availability of stashing
> > capabilities, i.e., platform/NIC support, stashable object types, etc,
> > via the rte_eth_dev_stashing_capabilities_get interface.
> >
> > The function pair rte_eth_dev_stashing_rx_config_set and
> > rte_eth_dev_stashing_tx_config_set sets the stashing hint (the CPU,
> > cache level, and data object types) on the Rx and Tx queues.
> >
> > PMDs that support stashing must register their implementations with
> > the following eth_dev_ops callbacks, which are invoked by the ethdev
> > functions listed above.
> >
> > - stashing_capabilities_get
> > - stashing_rx_hints_set
> > - stashing_tx_hints_set
> >
> > Signed-off-by: Wathsala Vithanage <wathsala.vithanage@arm.com>
> > Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> > Reviewed-by: Dhruv Tripathi <dhruv.tripathi@arm.com>
> > ---
> 
> Few small comments inline below
> 

Thank you, Bruce. I will address them in the next version. As Stephen asked,
I will send it only if the kernel piece makes it into Linus's branch.

--wathsala

> /Bruce
> 
> >  lib/ethdev/ethdev_driver.h |  66 ++++++++++++++++
> >  lib/ethdev/rte_ethdev.c    | 149 ++++++++++++++++++++++++++++++++++
> >  lib/ethdev/rte_ethdev.h    | 158 +++++++++++++++++++++++++++++++++++++
> >  3 files changed, 373 insertions(+)
> >
> > diff --git a/lib/ethdev/ethdev_driver.h b/lib/ethdev/ethdev_driver.h
> > index 2b4d2ae9c3..8a4012db08 100644
> > --- a/lib/ethdev/ethdev_driver.h
> > +++ b/lib/ethdev/ethdev_driver.h
> > @@ -1376,6 +1376,68 @@ enum rte_eth_dev_operation {  typedef uint64_t
> > (*eth_get_restore_flags_t)(struct rte_eth_dev *dev,
> >  					    enum rte_eth_dev_operation op);
> >
> > +/**
> > + * @internal
> > + * Set cache stashing hints in Rx queue.
> > + *
> > + * @param dev
> > + *   Port (ethdev) handle.
> > + * @param queue_id
> > + *   Rx queue.
> > + * @param config
> > + *   Stashing hints configuration for the queue.
> > + *
> > + * @return
> > + *   -ENOTSUP if the device or the platform does not support cache stashing.
> > + *   -ENOSYS  if the underlying PMD hasn't implemented cache stashing
> feature.
> > + *   -EINVAL  on invalid arguments.
> > + *   0 on success.
> > + */
> > +typedef int (*eth_stashing_rx_hints_set_t)(struct rte_eth_dev *dev, uint16_t
> queue_id,
> > +					   struct rte_eth_stashing_config
> *config);
> > +
> > +/**
> > + * @internal
> > + * Set cache stashing hints in Tx queue.
> > + *
> > + * @param dev
> > + *   Port (ethdev) handle.
> > + * @param queue_id
> > + *   Tx queue.
> > + * @param config
> > + *   Stashing hints configuration for the queue.
> > + *
> > + * @return
> > + *   -ENOTSUP if the device or the platform does not support cache stashing.
> > + *   -ENOSYS  if the underlying PMD hasn't implemented cache stashing
> feature.
> > + *   -EINVAL  on invalid arguments.
> > + *   0 on success.
> 
> What about on failure of the underlying ioctl call?
> 
> > + */
> > +typedef int (*eth_stashing_tx_hints_set_t)(struct rte_eth_dev *dev, uint16_t
> queue_id,
> > +					   struct rte_eth_stashing_config
> *config);
> > +
> > +/**
> > + * @internal
> > + * Get cache stashing object types supported in the ethernet device.
> > + * The return value indicates availability of stashing hints support
> > + * in the hardware and the PMD.
> > + *
> > + * @param dev
> > + *   Port (ethdev) handle.
> > + * @param objects
> > + *   PMD sets supported bits on return.
> > + *
> > + * @return
> > + *   -ENOTSUP if the device or the platform does not support cache stashing.
> > + *   -ENOSYS  if the underlying PMD hasn't implemented cache stashing
> feature.
> > + *   -EINVAL  on NULL values for types or hints parameters.
> > + *   On return, types and hints parameters will have bits set for supported
> > + *   object types and hints.
> > + *   0 on success.
> > + */
> > +typedef int (*eth_stashing_capabilities_get_t)(struct rte_eth_dev *dev,
> > +					     uint16_t *objects);
> > +
> >  /**
> >   * @internal A structure containing the functions exported by an Ethernet
> driver.
> >   */
> > @@ -1402,6 +1464,10 @@ struct eth_dev_ops {
> >  	eth_mac_addr_remove_t      mac_addr_remove; /**< Remove MAC
> address */
> >  	eth_mac_addr_add_t         mac_addr_add;  /**< Add a MAC address */
> >  	eth_mac_addr_set_t         mac_addr_set;  /**< Set a MAC address */
> > +	eth_stashing_rx_hints_set_t   stashing_rx_hints_set; /**< Set Rx cache
> stashing*/
> > +	eth_stashing_tx_hints_set_t   stashing_tx_hints_set; /**< Set Tx cache
> stashing*/
> > +	/** Get supported stashing hints*/
> > +	eth_stashing_capabilities_get_t stashing_capabilities_get;
> >  	/** Set list of multicast addresses */
> >  	eth_set_mc_addr_list_t     set_mc_addr_list;
> >  	mtu_set_t                  mtu_set;       /**< Set MTU */
> > diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c index
> > d4197322a0..ae666c370b 100644
> > --- a/lib/ethdev/rte_ethdev.c
> > +++ b/lib/ethdev/rte_ethdev.c
> > @@ -158,6 +158,7 @@ static const struct {
> >  	{RTE_ETH_DEV_CAPA_RXQ_SHARE, "RXQ_SHARE"},
> >  	{RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP, "FLOW_RULE_KEEP"},
> >  	{RTE_ETH_DEV_CAPA_FLOW_SHARED_OBJECT_KEEP,
> > "FLOW_SHARED_OBJECT_KEEP"},
> > +	{RTE_ETH_DEV_CAPA_CACHE_STASHING, "CACHE_STASHING"},
> >  };
> >
> >  enum {
> > @@ -7419,5 +7420,153 @@ int rte_eth_dev_map_aggr_tx_affinity(uint16_t
> port_id, uint16_t tx_queue_id,
> >  	return ret;
> >  }
> >
> >
> +RTE_EXPORT_EXPERIMENTAL_SYMBOL(rte_eth_dev_validate_stashing_config,
> > +25.07) int rte_eth_dev_validate_stashing_config(uint16_t port_id,
> > +uint16_t queue_id,
> > +				     uint8_t queue_direction,
> > +				     struct rte_eth_stashing_config *config) {
> > +	struct rte_eth_dev *dev;
> > +	struct rte_eth_dev_info dev_info;
> > +	int ret = 0;
> > +	uint16_t nb_queues;
> > +
> > +	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
> > +
> > +	if (!config) {
> > +		RTE_ETHDEV_LOG_LINE(ERR, "Invalid stashing configuration");
> > +		ret = -EINVAL;
> > +		goto out;
> > +	}
> > +
> > +	/*
> > +	 * Check for invalid objects
> > +	 */
> > +	if (!RTE_ETH_DEV_STASH_OBJECTS_VALID(config->objects)) {
> > +		RTE_ETHDEV_LOG_LINE(ERR, "Invalid stashing objects");
> > +		ret = -EINVAL;
> > +		goto out;
> > +	}
> > +
> > +	dev = &rte_eth_devices[port_id];
> > +
> > +	nb_queues = (queue_direction == RTE_ETH_DEV_RX_QUEUE) ?
> > +				      dev->data->nb_rx_queues :
> > +				      dev->data->nb_tx_queues;
> > +
> > +	if (queue_id >= nb_queues) {
> > +		RTE_ETHDEV_LOG_LINE(ERR, "Invalid Rx queue_id=%u",
> queue_id);
> > +		ret = -EINVAL;
> > +		goto out;
> > +	}
> > +
> > +	ret = rte_eth_dev_info_get(port_id, &dev_info);
> > +	if (ret < 0)
> > +		goto out;
> > +
> > +	if ((dev_info.dev_capa & RTE_ETH_DEV_CAPA_CACHE_STASHING) !=
> > +	    RTE_ETH_DEV_CAPA_CACHE_STASHING) {
> 
> Nit: check if all this can fit on one line under 100 chars. I think it can.
> 
> > +		ret = -ENOTSUP;
> > +		goto out;
> > +	}
> > +
> > +	if (*dev->dev_ops->stashing_rx_hints_set == NULL ||
> > +	    *dev->dev_ops->stashing_tx_hints_set == NULL) {
> > +		RTE_ETHDEV_LOG_LINE(ERR, "Stashing hints are not
> implemented "
> > +				    "in %s for %s", dev_info.driver_name,
> 
> Don't split error messages across lines. Text strings are allowed to go over the 100
> char limit if necessary to avoid splitting.
> 
> > +				    dev_info.device->name);
> > +		ret = -ENOSYS;
> > +	}
> > +
> > +out:
> > +	return ret;
> > +}
> > +
> > +RTE_EXPORT_EXPERIMENTAL_SYMBOL(rte_eth_dev_stashing_rx_config_set,
> > +25.07) int rte_eth_dev_stashing_rx_config_set(uint16_t port_id,
> > +uint16_t queue_id,
> > +				   struct rte_eth_stashing_config *config) {
> > +	struct rte_eth_dev *dev;
> > +	int ret = 0;
> > +
> > +	ret = rte_eth_dev_validate_stashing_config(port_id, queue_id,
> > +						   RTE_ETH_DEV_RX_QUEUE,
> > +						   config);
> > +	if (ret < 0)
> > +		goto out;
> > +
> > +	dev = &rte_eth_devices[port_id];
> > +
> > +	ret = eth_err(port_id,
> > +		      (*dev->dev_ops->stashing_rx_hints_set)(dev, queue_id,
> > +		      config));
> > +out:
> > +	return ret;
> > +}
> > +
> > +RTE_EXPORT_EXPERIMENTAL_SYMBOL(rte_eth_dev_stashing_tx_config_set,
> > +25.07) int rte_eth_dev_stashing_tx_config_set(uint16_t port_id,
> > +uint16_t queue_id,
> > +				   struct rte_eth_stashing_config *config) {
> > +	struct rte_eth_dev *dev;
> > +	int ret = 0;
> > +
> > +	ret = rte_eth_dev_validate_stashing_config(port_id, queue_id,
> > +						   RTE_ETH_DEV_TX_QUEUE,
> > +						   config);
> > +	if (ret < 0)
> > +		goto out;
> > +
> > +	dev = &rte_eth_devices[port_id];
> > +
> > +	ret = eth_err(port_id,
> > +		      (*dev->dev_ops->stashing_tx_hints_set) (dev, queue_id,
> > +		       config));
> > +out:
> > +	return ret;
> > +}
> > +
> >
> +RTE_EXPORT_EXPERIMENTAL_SYMBOL(rte_eth_dev_stashing_capabilities_get,
> > +25.07) int rte_eth_dev_stashing_capabilities_get(uint16_t port_id,
> > +uint16_t *objects) {
> > +	struct rte_eth_dev *dev;
> > +	struct rte_eth_dev_info dev_info;
> > +	int ret = 0;
> > +
> > +	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
> > +
> > +	if (!objects) {
> > +		ret = -EINVAL;
> > +		goto out;
> > +	}
> > +
> > +	dev = &rte_eth_devices[port_id];
> > +
> > +	ret = rte_eth_dev_info_get(port_id, &dev_info);
> > +	if (ret < 0)
> > +		goto out;
> > +
> > +	if ((dev_info.dev_capa & RTE_ETH_DEV_CAPA_CACHE_STASHING) !=
> > +	    RTE_ETH_DEV_CAPA_CACHE_STASHING) {
> > +		ret = -ENOTSUP;
> > +		goto out;
> > +	}
> > +
> > +	if (*dev->dev_ops->stashing_capabilities_get == NULL) {
> > +		RTE_ETHDEV_LOG_LINE(ERR, "Stashing hints are not
> implemented "
> > +				    "in %s for %s", dev_info.driver_name,
> > +				    dev_info.device->name);
> > +		ret = -ENOSYS;
> > +		goto out;
> > +	}
> > +	ret = eth_err(port_id,
> > +		      (*dev->dev_ops->stashing_capabilities_get)(dev, objects));
> > +out:
> > +	return ret;
> > +}
> > +
> >  RTE_EXPORT_SYMBOL(rte_eth_dev_logtype)
> >  RTE_LOG_REGISTER_DEFAULT(rte_eth_dev_logtype, INFO); diff --git
> > a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h index
> > ea7f8c4a1a..1398f8c837 100644
> > --- a/lib/ethdev/rte_ethdev.h
> > +++ b/lib/ethdev/rte_ethdev.h
> > @@ -1667,6 +1667,9 @@ struct rte_eth_conf {  #define
> > RTE_ETH_DEV_CAPA_FLOW_SHARED_OBJECT_KEEP RTE_BIT64(4)  /**@}*/
> >
> > +/** Device supports stashing to CPU/system caches. */ #define
> > +RTE_ETH_DEV_CAPA_CACHE_STASHING RTE_BIT64(5)
> > +
> >  /*
> >   * Fallback default preferred Rx/Tx port parameters.
> >   * These are used if an application requests default parameters @@
> > -1838,6 +1841,7 @@ struct rte_eth_dev_info {
> >  	struct rte_eth_dev_portconf default_txportconf;
> >  	/** Generic device capabilities (RTE_ETH_DEV_CAPA_). */
> >  	uint64_t dev_capa;
> > +	uint16_t stashing_capa;
> >  	/**
> >  	 * Switching information for ports on a device with a
> >  	 * embedded managed interconnect/switch.
> > @@ -6173,6 +6177,160 @@ int rte_eth_cman_config_set(uint16_t port_id,
> > const struct rte_eth_cman_config *  __rte_experimental  int
> > rte_eth_cman_config_get(uint16_t port_id, struct rte_eth_cman_config
> > *config);
> >
> > +
> > +/** Queue type is RX. */
> > +#define RTE_ETH_DEV_RX_QUEUE		0
> > +/** Queue type is TX. */
> > +#define RTE_ETH_DEV_TX_QUEUE		1
> > +
> 
> I'd prefer an enum for these.
> Why are these necessary, since we have separate rx and tx functions for the
> caching hints?
> 
> > +
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this structure may change, or be removed, without
> > +prior notice
> > + *
> > + * A structure used for configuring the cache stashing hints.
> > + */
> > +struct rte_eth_stashing_config {
> > +	/**
> > +	 * lcore_id of the processor the stashing hints are applied to.
> > +	 */
> > +	uint32_t	lcore_id;
> > +	/**
> > +	 * Zero based cache level relative to the CPU.
> > +	 * E.g. l1d = 0, l2d = 1,...
> > +	 */
> > +	uint32_t	cache_level;
> > +	/**
> > +	 * Object types the configuration is applied to
> > +	 */
> > +	uint16_t	objects;
> 
> What are the objects? That needs to be covered by the docs, or make this an
> enum type so that it's clear from the typesystem what it applies to [though that
> would require an array of objects and count, it may be clearer for user].
> 
> > +	/**
> > +	 * The offset if RTE_ETH_DEV_STASH_OBJECT_OFFSET bit is set
> > +	 *  in objects
> > +	 */
> > +	int		offset;
> > +};
> > +
> > +/**@{@name Stashable Rx/Tx queue object types supported by the
> > +ethernet device  *@see rte_eth_dev_stashing_capabilities_get
> > + *@see rte_eth_dev_stashing_rx_config_set
> > + *@see rte_eth_dev_stashing_tx_config_set
> > + */
> > +
> > +/**
> > + * Apply stashing hint to data at a given offset from the start of a
> > + * received packet.
> > + */
> > +#define RTE_ETH_DEV_STASH_OBJECT_OFFSET		0x0001
> > +
> > +/** Apply stashing hint to an rx descriptor. */
> > +#define RTE_ETH_DEV_STASH_OBJECT_DESC		0x0002
> > +
> > +/** Apply stashing hint to a header of a received packet. */
> > +#define RTE_ETH_DEV_STASH_OBJECT_HEADER		0x0004
> > +
> > +/** Apply stashing hint to a payload of a received packet. */
> > +#define RTE_ETH_DEV_STASH_OBJECT_PAYLOAD	0x0008
> > +
> > +#define __RTE_ETH_DEV_STASH_OBJECT_MASK		0x000f
> > +/**@}*/
> > +
> > +#define RTE_ETH_DEV_STASH_OBJECTS_VALID(t)
> 	\
> > +	((!((t) & (~__RTE_ETH_DEV_STASH_OBJECT_MASK))) && (t))
> > +
> > +/**
> > + *
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change, or be removed, without prior
> > +notice
> > + *
> > + * @internal
> > + * Helper function to validate stashing hints configuration.
> > + */
> > +__rte_experimental
> > +int rte_eth_dev_validate_stashing_config(uint16_t port_id, uint16_t queue_id,
> > +					 uint8_t queue_direction,
> > +					 struct rte_eth_stashing_config
> *config);
> > +
> > +/**
> > + *
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change, or be removed, without prior
> > +notice
> > + *
> > + * Provide cache stashing hints for improved memory access latencies
> > +for
> > + * packets received by the NIC.
> > + * This feature is available only in supported NICs and platforms.
> > + *
> > + * @param port_id
> > + *  The port identifier of the Ethernet device.
> > + * @param queue_id
> > + *  The index of the receive queue to which hints are applied.
> > + * @param config
> > + *  Stashing configuration.
> > + * @return
> > + *  - (-ENODEV) on incorrect port_ids.
> > + *  - (-EINVAL) if both RX and TX object types are used in conjunction in
> > +objects
> > + *  parameter.
> > + *  - (-EINVAL) on invalid queue_id.
> > + *  - (-ENOTSUP) if RTE_ETH_DEV_CAPA_CACHE_STASHING capability is
> unavailable.
> > + *  - (-ENOSYS) if PMD does not implement cache stashing hints.
> > + *  - (0) on Success.
> > + */
> > +__rte_experimental
> > +int rte_eth_dev_stashing_rx_config_set(uint16_t port_id, uint16_t queue_id,
> > +				   struct rte_eth_stashing_config *config);
> > +
> > +/**
> > + *
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change, or be removed, without prior
> > +notice
> > + *
> > + * Configure cache stashing for improved memory access latencies for
> > +Tx
> > + * queue completion descriptors being sent to host system by the NIC.
> > + * This feature is available only in supported NICs and platforms.
> > + *
> > + * @param port_id
> > + *  The port identifier of the Ethernet device.
> > + * @param queue_id
> > + *  The index of the transmit queue to which hints are applied.
> > + * @param config
> > + *  Stashing configuration.
> > + * @return
> > + *  - (-ENODEV) on incorrect port_ids.
> > + *  - (-EINVAL) if both RX and TX object types are used in
> > +conjunction in objects
> > + *  parameter.
> > + *  - (-EINVAL) if hints are incompatible with TX queues.
> > + *  - (-EINVAL) on invalid queue_id.
> > + *  - (-ENOTSUP) if RTE_ETH_DEV_CAPA_CACHE_STASHING capability is
> unavailable.
> > + *  - (-ENOSYS) if PMD does not implement cache stashing hints.
> > + *  - (0) on Success.
> > + */
> > +__rte_experimental
> > +int rte_eth_dev_stashing_tx_config_set(uint16_t port_id, uint16_t queue_id,
> > +				   struct rte_eth_stashing_config *config);
> > +
> > +/**
> > + *
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change, or be removed, without prior
> > +notice
> > + *
> > + * Discover cache stashing objects supported in the ethernet device.
> > + *
> > + * @param port_id
> > + *  The port identifier of the Ethernet device.
> > + * @param objects
> > + *  Supported objects vector set by the ethernet device.
> > + * @return
> > + *  On return types and hints parameters will have bits set for
> > +supported
> > + *  object types.
> > + *  - (-ENOTSUP) if the device or the platform does not support cache stashing.
> > + *  - (-ENOSYS)  if the underlying PMD hasn't implemented cache
> > +stashing
> > + *  feature.
> > + *  - (-EINVAL)  on NULL values for types or hints parameters.
> > + *  - (0) on success.
> > + */
> > +__rte_experimental
> > +int rte_eth_dev_stashing_capabilities_get(uint16_t port_id, uint16_t
> > +*objects);
> > +
> >  #ifdef __cplusplus
> >  }
> >  #endif
> > --
> > 2.43.0
> >

^ permalink raw reply	[flat|nested] 33+ messages in thread

* RE: [PATCH v5 2/4] bus/pci: introduce the PCIe TLP Processing Hints API
  2025-06-05  7:50         ` Bruce Richardson
@ 2025-06-05 14:32           ` Wathsala Wathawana Vithanage
  0 siblings, 0 replies; 33+ messages in thread
From: Wathsala Wathawana Vithanage @ 2025-06-05 14:32 UTC (permalink / raw)
  To: Bruce Richardson
  Cc: Chenbo Xia, Nipun Gupta, Anatoly Burakov, Gaetan Rivet,
	dev@dpdk.org, nd, Honnappa Nagarahalli, Dhruv Tripathi, nd



> -----Original Message-----
> From: Bruce Richardson <bruce.richardson@intel.com>
> Sent: Thursday, June 5, 2025 2:51 AM
> To: Wathsala Wathawana Vithanage <wathsala.vithanage@arm.com>
> Cc: Chenbo Xia <chenbox@nvidia.com>; Nipun Gupta <nipun.gupta@amd.com>;
> Anatoly Burakov <anatoly.burakov@intel.com>; Gaetan Rivet <grive@u256.net>;
> dev@dpdk.org; nd <nd@arm.com>; Honnappa Nagarahalli
> <Honnappa.Nagarahalli@arm.com>; Dhruv Tripathi <Dhruv.Tripathi@arm.com>
> Subject: Re: [PATCH v5 2/4] bus/pci: introduce the PCIe TLP Processing Hints API
> 
> On Wed, Jun 04, 2025 at 10:52:24PM +0000, Wathsala Wathawana Vithanage
> wrote:
> > > > rte_pci_tph_st_{get, set} functions will return an error if
> > > > processing any of the rte_tph_info objects fails. The API does not
> > > > indicate which entry in the rte_tph_info array was executed
> > > > successfully and which caused an error. Therefore, in case of an
> > > > error, the caller should discard the output. If rte_pci_tph_st_set
> > > > returns an error, it should be treated as a partial error. Hence,
> > > > the steering-tag update on the device should be considered partial
> > > > and inconsistent with the expected
> > > outcome.
> > > > This should be resolved by resetting the endpoint device before
> > > > further attempts to set steering tags.
> > >
> > > This seems very clunky for the user. Is there a fundamental reason
> > > why we cannot report out what ones passed or failed?
> > >
> > > If it's a limitation of the kernel IOCTL, how about just making one
> > > ioctl for each individual op requested, one at a time. That way we
> > > will know what failed to report it?
> > >
> >
> > The V1 of the kernel patch had that feature, but it was frowned upon,
> > and I was asked to implement the IOCTL this way. Please find it here
> > (V1)
> > https://lore.kernel.org/kvm/20250221224638.1836909-1-wathsala.vithanag
> > e@arm.com/T/#me73cf9b9c87da97d7d9461dfb97863b78ca1755b
> >
> > > Other comments inline below.
> > >
> >
> > I will address them in the next version.
> >
> > Thanks.
> >
> > --wathsala
> >
> > > /Bruce
> > >
> 
> <snip>
> 
> > > > diff --git a/lib/pci/rte_pci.h b/lib/pci/rte_pci.h index
> > > > 9a50a12142..da9cd666bf 100644
> > > > --- a/lib/pci/rte_pci.h
> > > > +++ b/lib/pci/rte_pci.h
> > > > @@ -137,6 +137,21 @@ extern "C" {
> > > >  /* Process Address Space ID (RTE_PCI_EXT_CAP_ID_PASID) */
> > > >  #define RTE_PCI_PASID_CTRL		0x06    /* PASID control register
> */
> > > >
> > > > +/* TPH Requester */
> > > > +#define RTE_PCI_TPH_CAP            4       /* capability register */
> > > > +#define RTE_PCI_TPH_CAP_ST_NS      0x00000001 /* No ST Mode
> Supported
> > > */
> > > > +#define RTE_PCI_TPH_CAP_ST_IV      0x00000002 /* Interrupt Vector
> Mode
> > > Supported */
> > > > +#define RTE_PCI_TPH_CAP_ST_DS      0x00000004 /* Device Specific
> Mode
> > > Supported */
> > > > +#define RTE_PCI_TPH_CAP_EXT_TPH    0x00000100 /* Ext TPH Requester
> > > Supported */
> > > > +#define RTE_PCI_TPH_CAP_LOC_MASK   0x00000600 /* ST Table Location
> */
> > > > +#define RTE_PCI_TPH_LOC_NONE       0x00000000 /* Not present */
> > > > +#define RTE_PCI_TPH_LOC_CAP        0x00000200 /* In capability */
> > > > +#define RTE_PCI_TPH_LOC_MSIX       0x00000400 /* In MSI-X */
> > > > +#define RTE_PCI_TPH_CAP_ST_MASK    0x07FF0000 /* ST Table Size */
> > > > +#define RTE_PCI_TPH_CAP_ST_SHIFT   16      /* ST Table Size shift */
> > > > +#define RTE_PCI_TPH_BASE_SIZEOF    0xc     /* Size with no ST table */
> > > > +
> > > > +
> > >
> > > Where are all these values used? They don't seem to be needed by
> > > this patch. If needed in later patches, I'd suggest adding them there.
> > >
> >
> > RTE_PCI_TPH_CAP_ST_NS, RTE_PCI_TPH_CAP_ST_IV and
> RTE_PCI_TPH_CAP_ST_DS
> > are used by drivers. I40e patch uses RTE_PCI_TPH_CAP_ST_DS.
> > I will remove the rest, added here for completeness.
> >
> 
> Having them all for completeness is fine. You can keep this as-is in next version
> then.
> 

+1

--wathsala
> /Bruce

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v5 0/4] An API for Cache Stashing with TPH
  2025-06-02 22:38 ` [PATCH v5 0/4] An API for Cache Stashing with TPH Wathsala Vithanage
                     ` (4 preceding siblings ...)
  2025-06-04 16:51   ` [PATCH v5 0/4] An API for Cache Stashing with TPH Stephen Hemminger
@ 2026-01-08  0:30   ` fengchengwen
  2026-01-19  1:16     ` fengchengwen
  5 siblings, 1 reply; 33+ messages in thread
From: fengchengwen @ 2026-01-08  0:30 UTC (permalink / raw)
  To: Wathsala Vithanage; +Cc: dev, nd

Hi Wathsala,

May I ask whether this patchset is still under development, or has it stopped?

PCIe steering tags provide a mechanism for precise data stashing, which
delivers a performance gain, so I think this is a valuable feature.

This patchset concludes with the statement: "the PMDs should only
enable TPH in device-specific mode". I don't think such a restriction
should be imposed; the framework should be compatible with various
device capabilities:
1. The PCIe protocol defines two modes: one is the interrupt-vector
   mode, and the other is the device-specific mode. A device may
   choose to support either one or both.
2. If a device supports device-specific mode, it has a large degree of
   freedom in the implementation, such as locating the ST table in a
   self-defined place (as in '[PATCH v5 4/4] net/i40e: enable TPH in
   i40e') and stashing only part of the data (e.g. only descriptors,
   only headers, or even data at an offset).
3. If a device supports only interrupt-vector mode (in which each TLP
   uses the ST from an ST table entry), we could support it too; in
   this framework, it would report only basic stashing capability.
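
To make the point concrete, here is a minimal sketch of decoding the TPH
Requester capability register into the modes a device advertises. The macro
values are copied from the lib/pci/rte_pci.h hunk in patch 2/4; the struct and
function names (tph_caps, tph_caps_decode, tph_pick_mode) are hypothetical, the
"+ 1" assumes the ST Table Size field is encoded as N-1 as I understand the
PCIe specification, and the preference order in tph_pick_mode is only one
possible policy.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values copied from the lib/pci/rte_pci.h hunk in patch 2/4. */
#define RTE_PCI_TPH_CAP_ST_NS      0x00000001 /* No ST Mode Supported */
#define RTE_PCI_TPH_CAP_ST_IV      0x00000002 /* Interrupt Vector Mode Supported */
#define RTE_PCI_TPH_CAP_ST_DS      0x00000004 /* Device Specific Mode Supported */
#define RTE_PCI_TPH_CAP_LOC_MASK   0x00000600 /* ST Table Location */
#define RTE_PCI_TPH_LOC_NONE       0x00000000 /* Not present */
#define RTE_PCI_TPH_CAP_ST_MASK    0x07FF0000 /* ST Table Size */
#define RTE_PCI_TPH_CAP_ST_SHIFT   16         /* ST Table Size shift */

/* Hypothetical decoded view of the capability register. */
struct tph_caps {
	bool no_st;          /* no-steering-tag mode supported */
	bool int_vec;        /* interrupt-vector mode supported */
	bool dev_spec;       /* device-specific mode supported */
	uint32_t st_loc;     /* raw ST table location field */
	uint32_t st_entries; /* number of ST table entries, 0 if no table */
};

struct tph_caps
tph_caps_decode(uint32_t cap)
{
	struct tph_caps c = {0};

	c.no_st = (cap & RTE_PCI_TPH_CAP_ST_NS) != 0;
	c.int_vec = (cap & RTE_PCI_TPH_CAP_ST_IV) != 0;
	c.dev_spec = (cap & RTE_PCI_TPH_CAP_ST_DS) != 0;
	c.st_loc = cap & RTE_PCI_TPH_CAP_LOC_MASK;
	/* Assumption: the size field holds (entries - 1) when a table exists. */
	if (c.st_loc != RTE_PCI_TPH_LOC_NONE)
		c.st_entries = ((cap & RTE_PCI_TPH_CAP_ST_MASK) >>
				RTE_PCI_TPH_CAP_ST_SHIFT) + 1;
	return c;
}

/*
 * One possible policy: prefer device-specific mode, fall back to
 * interrupt-vector mode, then no-ST mode. Returns 0 if none is supported.
 */
uint32_t
tph_pick_mode(const struct tph_caps *c)
{
	if (c->dev_spec)
		return RTE_PCI_TPH_CAP_ST_DS;
	if (c->int_vec)
		return RTE_PCI_TPH_CAP_ST_IV;
	if (c->no_st)
		return RTE_PCI_TPH_CAP_ST_NS;
	return 0;
}
```

A device that advertises only interrupt-vector mode would then still be usable,
with the PMD reporting just the basic stashing capability.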

Thanks

On 6/3/2025 6:38 AM, Wathsala Vithanage wrote:
> Today, DPDK applications benefit from Direct Cache Access (DCA) features
> like Intel DDIO and Arm's write-allocate-to-SLC. However, those features
> do not allow fine-grained control of direct cache access, such as
> stashing packets into upper-level caches (L2 caches) of a processor or
> the shared cache of a chiplet. PCIe TLP Processing Hints (TPH) addresses
> this need in a vendor-agnostic manner. TPH capability has existed since
> PCI Express Base Specification revision 3.0; today, numerous Network
> Interface Cards and interconnects from different vendors support TPH
> capability. TPH comprises a steering tag (ST) and a processing hint
> (PH). ST specifies the cache level of a CPU at which the data should be
> written to (or DCAed into), while PH is a hint provided by the PCIe
> requester to the completer on an upcoming traffic pattern. Some NIC
> vendors bundle TPH capability with fine-grained control over the type of
> objects that can be stashed into CPU caches, such as
> 
> - Rx/Tx queue descriptors
> - Packet-headers
> - Packet-payloads
> - Data from a given offset from the start of a packet
> 
> Note that stashable object types are outside the scope of the PCIe
> standard; therefore, vendors could support any combination of the above
> items as they see fit.
> 
> To enable TPH and fine-grained packet stashing, this API extends the
> ethdev library and the PCI bus driver. In this design, the application
> provides hints to the PMD via the ethdev stashing API, telling the
> underlying hardware at which CPU and cache level it prefers a packet to
> end up. Once the PMD receives a CPU and cache-level combination (or a
> list of such combinations), it must obtain the matching STs from the
> PCI bus driver. The PCI bus driver implements the TPH functions in an
> OS-specific way; on Linux, it depends on the TPH capabilities of the
> VFIO kernel driver.
> 
> An application uses the cache stashing ethdev API by first calling the
> rte_eth_dev_stashing_capabilities_get() function to find out what object
> types can be stashed into a CPU cache by the NIC out of the object types
> in the bulleted list above. This function takes a port_id and a pointer
> to a uint16_t through which it reports the object type flags. The PMD
> implements the stashing_capabilities_get function pointer in
> eth_dev_ops. If the underlying platform or the NIC does not support TPH,
> this function returns -ENOTSUP, and the application should treat any
> value stored in the object as invalid.
> 
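The capability query described above could be used roughly as follows.
This is a self-contained sketch, not the patchset's code:
ex_stashing_capabilities_get() is a stand-in that mimics the described
behaviour of rte_eth_dev_stashing_capabilities_get(), and the EX_*
flag names and values are assumptions made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical object-type flags, one per item in the bulleted list. */
#define EX_STASH_OBJECT_DESC    (1u << 0)
#define EX_STASH_OBJECT_HEADER  (1u << 1)
#define EX_STASH_OBJECT_PAYLOAD (1u << 2)
#define EX_STASH_OBJECT_OFFSET  (1u << 3)

/*
 * Stand-in for the ethdev call. In this mock, port 0 can stash
 * descriptors and headers; every other port lacks TPH support.
 */
static int
ex_stashing_capabilities_get(uint16_t port_id, uint16_t *objects)
{
	if (objects == NULL)
		return -EINVAL;
	if (port_id != 0)
		return -ENOTSUP;
	*objects = EX_STASH_OBJECT_DESC | EX_STASH_OBJECT_HEADER;
	return 0;
}

/*
 * Return the subset of 'wanted' that the port can stash. On any
 * error the reported flags are treated as invalid and 0 is returned.
 */
static uint16_t
ex_usable_objects(uint16_t port_id, uint16_t wanted)
{
	uint16_t caps = 0;

	if (ex_stashing_capabilities_get(port_id, &caps) != 0)
		return 0;
	return (uint16_t)(caps & wanted);
}
```

The point of the helper is that the application never reads the flags
after a failed call, which matches the -ENOTSUP rule in the paragraph
above.
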
> Once the application knows the supported object types that can be
> stashed, the next step is to set the steering tags for the packets
> associated with Rx and Tx queues via
> rte_eth_dev_stashing_{rx,tx}_config_set() ethdev library functions. Both
> functions have an identical signature: a port_id, a queue_id, and a
> config object. The port_id and the queue_id are used to locate the
> device and the queue. The config object is of type struct
> rte_eth_stashing_config, which specifies the lcore_id and the
> cache_level, indicating where objects from this queue should be stashed.
> The 'objects' field in the config sets the types of objects the
> application wishes to stash based on the capabilities found earlier.
> Note that if the 'objects' field includes the flag
> RTE_ETH_DEV_STASH_OBJECT_OFFSET, the 'offset' field must be used to set
> the desired offset. These functions invoke PMD implementations of the
> stashing functionality via the stashing_{rx,tx}_hints_set function
> callbacks in the eth_dev_ops, respectively.
> 
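A sketch of how an application might fill in the config object before
calling the config_set functions. The cover letter names the fields
(lcore_id, cache_level, objects, offset) but not their types, so
struct ex_stashing_config, its field widths, the EX_* flags and
ex_stashing_config_init() are all assumptions:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical object-type flags (illustration only). */
#define EX_STASH_OBJECT_DESC    (1u << 0)
#define EX_STASH_OBJECT_HEADER  (1u << 1)
#define EX_STASH_OBJECT_PAYLOAD (1u << 2)
#define EX_STASH_OBJECT_OFFSET  (1u << 3)

/* Stand-in for struct rte_eth_stashing_config; field types assumed. */
struct ex_stashing_config {
	uint32_t lcore_id;    /* lcore whose cache should receive data */
	uint32_t cache_level; /* target cache level */
	uint16_t objects;     /* object types to stash */
	uint32_t offset;      /* used only with EX_STASH_OBJECT_OFFSET */
};

/*
 * Build a config from the capability flags reported earlier.
 * Rejects requests for object types the port did not advertise and
 * stores 'offset' only when the offset object type is requested.
 */
static int
ex_stashing_config_init(struct ex_stashing_config *cfg, uint16_t caps,
			uint32_t lcore_id, uint32_t cache_level,
			uint16_t objects, uint32_t offset)
{
	if (cfg == NULL || objects == 0)
		return -EINVAL;
	if ((objects & ~caps) != 0)
		return -ENOTSUP;

	cfg->lcore_id = lcore_id;
	cfg->cache_level = cache_level;
	cfg->objects = objects;
	cfg->offset = (objects & EX_STASH_OBJECT_OFFSET) ? offset : 0;
	return 0;
}
```

The resulting object would then be passed, together with a port_id and
a queue_id, to the Rx or Tx config_set function.
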
> The PMD's implementation of the stashing_rx_hints_set() and
> stashing_tx_hints_set() functions is ultimately responsible for
> extracting the ST via the API provided by the PCI bus driver. Before
> extracting STs, the PMD should enable the TPH capability in the endpoint
> device by calling the rte_pci_tph_enable() function. The PMD then
> begins the ST extraction process by calling the rte_pci_tph_st_get()
> function in drivers/bus/pci/rte_bus_pci.h, which returns STs via the
> same array of rte_tph_info objects passed into it as an argument. Once
> the PMD has acquired the STs, the stashing_{rx,tx}_hints_set callbacks
> implemented in the PMD are ready to set them as per the
> rte_eth_stashing_config object passed down by the higher-level ethdev
> functions rte_eth_dev_stashing_{rx,tx}_config_set(). As per the PCIe
> specification, STs can be placed in the MSI-X table or in a
> device-specific location. For PMDs, setting the STs in queue contexts is
> the only viable way of using TPH. Therefore, the PMDs should only enable
> TPH in device-specific mode.
> 
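The PMD-side sequence described above (enable TPH, fetch the ST, then
program it into the queue context) can be sketched as below. The real
signatures of rte_pci_tph_enable() and rte_pci_tph_st_get() are not
given in this cover letter, so the ex_* functions are mocks, the
struct layouts are assumed, and the ST encoding is invented purely so
the example produces a value:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for an rte_tph_info entry; layout assumed. */
struct ex_tph_info {
	uint32_t cpu_id;      /* input: target CPU */
	uint32_t cache_level; /* input: target cache level */
	uint16_t st;          /* output: steering tag */
};

/* Hypothetical per-queue context holding the ST. */
struct ex_rx_queue_ctx {
	uint16_t st;
	int tph_on;
};

static int ex_tph_enabled; /* mock device state */

/* Mock of the bus-driver call that enables TPH on the endpoint. */
static int
ex_pci_tph_enable(void)
{
	ex_tph_enabled = 1;
	return 0;
}

/* Mock ST lookup: results are written back into the same array. */
static int
ex_pci_tph_st_get(struct ex_tph_info *info, size_t n)
{
	size_t i;

	if (info == NULL || n == 0)
		return -EINVAL;
	if (!ex_tph_enabled)
		return -ENOTSUP;
	for (i = 0; i < n; i++) /* invented encoding, illustration only */
		info[i].st = (uint16_t)((info[i].cpu_id << 2) |
					(info[i].cache_level & 0x3u));
	return 0;
}

/* Sketch of a stashing_rx_hints_set callback for one queue. */
static int
ex_stashing_rx_hints_set(struct ex_rx_queue_ctx *q, uint32_t lcore_id,
			 uint32_t cache_level)
{
	struct ex_tph_info info = {
		.cpu_id = lcore_id, .cache_level = cache_level, .st = 0,
	};
	int ret;

	ret = ex_pci_tph_enable(); /* enable before asking for STs */
	if (ret != 0)
		return ret;
	ret = ex_pci_tph_st_get(&info, 1);
	if (ret != 0)
		return ret;
	q->st = info.st; /* device-specific location: queue context */
	q->tph_on = 1;
	return 0;
}
```
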
> V4->V5:
>  * Enable stashing-hints (TPH) in Intel i40e driver.
>  * Update exported symbol version from 25.03 to 25.07.
>  * Add TPH mode macros.
> 
> V3->V4:
>  * Add VFIO IOCTL based ST extraction mechanism to Linux PCI bus driver
>  * Remove ST extraction via direct access to ACPI _DSM
>  * Replace rte_pci_extract_tph_st() with rte_pci_tph_st_get() in PCI
>    bus driver.
> 
> Wathsala Vithanage (4):
>   pci: add non-merged Linux uAPI changes
>   bus/pci: introduce the PCIe TLP Processing Hints API
>   ethdev: introduce the cache stashing hints API
>   net/i40e: enable TPH in i40e
> 
>  drivers/bus/pci/bsd/pci.c            |  43 +++++++
>  drivers/bus/pci/bus_pci_driver.h     |  52 ++++++++
>  drivers/bus/pci/linux/pci.c          | 100 ++++++++++++++++
>  drivers/bus/pci/linux/pci_init.h     |  14 +++
>  drivers/bus/pci/linux/pci_vfio.c     | 170 +++++++++++++++++++++++++++
>  drivers/bus/pci/private.h            |   8 ++
>  drivers/bus/pci/rte_bus_pci.h        |  67 +++++++++++
>  drivers/bus/pci/windows/pci.c        |  43 +++++++
>  drivers/net/intel/i40e/i40e_ethdev.c | 127 ++++++++++++++++++++
>  kernel/linux/uapi/linux/vfio_tph.h   | 102 ++++++++++++++++
>  lib/ethdev/ethdev_driver.h           |  66 +++++++++++
>  lib/ethdev/rte_ethdev.c              | 149 +++++++++++++++++++++++
>  lib/ethdev/rte_ethdev.h              | 158 +++++++++++++++++++++++++
>  lib/pci/rte_pci.h                    |  15 +++
>  14 files changed, 1114 insertions(+)
>  create mode 100644 kernel/linux/uapi/linux/vfio_tph.h
> 



* Re: [PATCH v5 0/4] An API for Cache Stashing with TPH
  2026-01-08  0:30   ` fengchengwen
@ 2026-01-19  1:16     ` fengchengwen
  2026-04-14 17:02       ` Wathsala Vithanage
  0 siblings, 1 reply; 33+ messages in thread
From: fengchengwen @ 2026-01-19  1:16 UTC (permalink / raw)
  To: Wathsala Vithanage; +Cc: dev, nd

Hi Wathsala,

Looking forward to your reply.

Thanks




* Re: [PATCH v5 0/4] An API for Cache Stashing with TPH
  2026-01-19  1:16     ` fengchengwen
@ 2026-04-14 17:02       ` Wathsala Vithanage
  0 siblings, 0 replies; 33+ messages in thread
From: Wathsala Vithanage @ 2026-04-14 17:02 UTC (permalink / raw)
  To: fengchengwen; +Cc: dev, nd

Hi,

Thank you for your enthusiasm :)
This patchset is blocked until the kernel VFIO-TPH patch gets merged.

--wathsala




Thread overview: 33+ messages
     [not found] <20241021015246.304431-1-wathsala.vithanage@arm.com>
2025-05-17 15:17 ` [RFC PATCH v4 0/3] An API for Stashing Packets into CPU caches Wathsala Vithanage
2025-05-17 15:17   ` [RFC PATCH v4 1/3] pci: add non-merged Linux uAPI changes Wathsala Vithanage
2025-05-19  6:41     ` David Marchand
2025-05-19 17:55       ` Wathsala Wathawana Vithanage
2025-05-17 15:17   ` [RFC PATCH v4 2/3] bus/pci: introduce the PCIe TLP Processing Hints API Wathsala Vithanage
2025-05-19  6:44     ` David Marchand
2025-05-19 17:57       ` Wathsala Wathawana Vithanage
2025-05-17 15:17   ` [RFC PATCH v4 3/3] ethdev: introduce the cache stashing hints API Wathsala Vithanage
2025-05-20 13:53     ` Stephen Hemminger
2025-06-02 22:38 ` [PATCH v5 0/4] An API for Cache Stashing with TPH Wathsala Vithanage
2025-06-02 22:38   ` [PATCH v5 1/4] pci: add non-merged Linux uAPI changes Wathsala Vithanage
2025-06-02 23:11     ` Wathsala Wathawana Vithanage
2025-06-02 23:16       ` Wathsala Wathawana Vithanage
2025-06-04 20:43     ` Stephen Hemminger
2025-06-02 22:38   ` [PATCH v5 2/4] bus/pci: introduce the PCIe TLP Processing Hints API Wathsala Vithanage
2025-06-03  8:11     ` Morten Brørup
2025-06-04 16:54     ` Bruce Richardson
2025-06-04 22:52       ` Wathsala Wathawana Vithanage
2025-06-05  7:50         ` Bruce Richardson
2025-06-05 14:32           ` Wathsala Wathawana Vithanage
2025-06-05 10:18         ` Bruce Richardson
2025-06-05 14:25           ` Wathsala Wathawana Vithanage
2025-06-05 10:30     ` Bruce Richardson
2025-06-02 22:38   ` [PATCH v5 3/4] ethdev: introduce the cache stashing hints API Wathsala Vithanage
2025-06-03  8:43     ` Morten Brørup
2025-06-05 10:03     ` Bruce Richardson
2025-06-05 14:30       ` Wathsala Wathawana Vithanage
2025-06-02 22:38   ` [PATCH v5 4/4] net/i40e: enable TPH in i40e Wathsala Vithanage
2025-06-04 16:51   ` [PATCH v5 0/4] An API for Cache Stashing with TPH Stephen Hemminger
2025-06-04 22:24     ` Wathsala Wathawana Vithanage
2026-01-08  0:30   ` fengchengwen
2026-01-19  1:16     ` fengchengwen
2026-04-14 17:02       ` Wathsala Vithanage
