Netdev List
* [PATCH v7 0/5] vfio/dma-buf: add TPH support for peer-to-peer access
@ 2026-06-11 16:11 Zhiping Zhang
From: Zhiping Zhang @ 2026-06-11 16:11 UTC (permalink / raw)
  To: netdev; +Cc: kvm, linux-rdma, linux-pci, dri-devel, Zhiping Zhang

This series adds TLP Processing Hints (TPH) support to the VFIO dma-buf
export path, allowing importing drivers (e.g. mlx5) to use the
exporter's steering tag when performing peer-to-peer DMA into a
VFIO-owned device.

There is no separate in-tree vendor kernel driver for the target device:
vfio-pci is the in-tree driver and the targeted device is managed
from userspace via VFIO passthrough. That is why the ST has to flow
through a uAPI: userspace owns the device and its ST table, so it is the
entity that can publish a meaningful value for a given dma-buf. The
kernel-visible participants are still in-tree: vfio-pci exports the
dma-buf and mlx5 imports it.

On the effect: the endpoint's PCIe ingress block uses the 8-bit ST as
an in-band instruction for the incoming P2P TLP -- selecting a target
cache partition and, on writes, an in-flight operation on the data
before it lands. The dma-buf callback keeps this opaque to the
framework -- only the producer (userspace owner of the VFIO device)
and the consumer (endpoint block) need to interpret the value. The
dma-buf get_tph callback itself is optional, but for workloads that
depend on the endpoint's in-flight operation the fallback does not
produce the same result.

The dma-buf hook is intentionally generic and discoverable rather than
a private side channel. The exporter owns the completing address
space for the dma-buf and decides whether it can provide a meaningful
ST/PH tuple for that completer; the dma-buf core keeps the tuple opaque,
and importers merely request the namespace they support and place the
returned value on generated TLPs. Exporters that cannot derive a
meaningful tuple simply return -EOPNOTSUPP.
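
As a rough userspace illustration of the optional-callback contract
described above (not the code from patch 3), the sketch below models an
ops table with an optional get_tph hook and an importer-side wrapper
that returns -EOPNOTSUPP when the exporter provides nothing. All names
here (struct tph_tuple, struct buf, buf_get_tph, demo_get_tph) are
hypothetical stand-ins, and the namespace selection and dmabuf->resv
locking mentioned in this letter are deliberately left out.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tuple; the real layout is defined by the series, not here. */
struct tph_tuple {
	uint16_t st;	/* steering tag, opaque to the framework */
	uint8_t ph;	/* processing hint */
};

struct buf;

struct buf_ops {
	/* Optional: exporters with no meaningful tuple leave this NULL. */
	int (*get_tph)(struct buf *b, struct tph_tuple *out);
};

struct buf {
	const struct buf_ops *ops;
	void *priv;	/* exporter-private data */
};

/* Importer-side wrapper: a missing callback becomes -EOPNOTSUPP. */
static int buf_get_tph(struct buf *b, struct tph_tuple *out)
{
	if (!b || !out)
		return -EINVAL;
	if (!b->ops || !b->ops->get_tph)
		return -EOPNOTSUPP;
	return b->ops->get_tph(b, out);
}

/* Illustrative exporter: hands back whatever its owner published, if anything. */
static int demo_get_tph(struct buf *b, struct tph_tuple *out)
{
	const struct tph_tuple *published = b->priv;

	if (!published)
		return -EOPNOTSUPP;
	*out = *published;	/* copied through without interpretation */
	return 0;
}

static const struct buf_ops demo_ops = { .get_tph = demo_get_tph };
static const struct buf_ops no_tph_ops = { .get_tph = NULL };
```

An importer in this model calls buf_get_tph() once and treats
-EOPNOTSUPP as "no hint available", continuing without TPH.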

Patch 1 fixes a pre-existing bug and is split out from the series:
mlx5_st_dealloc_index() removed the xarray entry but never freed the
backing struct, so repeated alloc/dealloc cycles leaked memory.
Patch 2 adds small PCI/TPH type helpers so drivers can query the enabled
TPH requester mode and the device's TPH Completer Supported field
without reaching into pci_dev internals (and so callers in
CONFIG_PCIE_TPH=n builds get a clean fallback).
Patch 3 adds the optional dma_buf_ops::get_tph callback plus the
dma_buf_get_tph() importer wrapper so importers can fetch TPH metadata
from an exporter under dmabuf->resv.
Patch 4 implements get_tph in vfio-pci and adds the new uAPI
(VFIO_DEVICE_FEATURE_DMA_BUF_TPH) for userspace to attach the metadata.
Patch 5 wires up the mlx5 RDMA driver as a consumer.
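
To make the patch 1 leak concrete, here is a minimal userspace sketch of
the pattern: a lookup slot pointing at a heap-allocated backing struct,
where the final release must both clear the slot and free the struct.
It is not the mlx5 code. The fixed-size array stands in for the xarray,
the reference count is an assumption suggested by the phrase "final
dealloc", and every name (struct idx_data, idx_get, idx_put,
live_allocs) is hypothetical.

```c
#include <stdlib.h>

#define MAX_IDX 16

/* Stand-in for the per-index backing struct. */
struct idx_data {
	unsigned int refs;
};

static struct idx_data *table[MAX_IDX];	/* stand-in for the xarray */
static unsigned int live_allocs;	/* backing structs not yet freed */

/* Take a reference on idx, allocating the backing struct on first use. */
static int idx_get(unsigned int idx)
{
	if (idx >= MAX_IDX)
		return -1;
	if (!table[idx]) {
		table[idx] = calloc(1, sizeof(*table[idx]));
		if (!table[idx])
			return -1;
		live_allocs++;
	}
	table[idx]->refs++;
	return 0;
}

/* Drop a reference; the last one removes the entry AND frees it. */
static int idx_put(unsigned int idx)
{
	struct idx_data *d;

	if (idx >= MAX_IDX || !table[idx])
		return -1;
	d = table[idx];
	if (--d->refs)
		return 0;
	table[idx] = NULL;	/* analogous to erasing the lookup entry */
	free(d);		/* the step that was missing before the fix */
	live_allocs--;
	return 0;
}
```

Without the free() call, each get/put cycle on a fresh index would leave
one orphaned allocation behind, which is the leak the fix addresses.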

Build-tested with both CONFIG_PCIE_TPH=y and CONFIG_PCIE_TPH=n.
Functional validation on the target topology: PCIe analyzer captures
on the P2P TLPs confirm the ST emitted by mlx5 matches the value
published through VFIO_DEVICE_FEATURE_DMA_BUF_TPH, and the end-to-end
P2P workload only produces results consistent with the endpoint's
ST-selected in-flight operation. For example, with userspace
publishing 8-bit ST=0xf0 and PH=2, an analyzer capture of a
peer-to-peer MWr64 shows "STP MWr64 TC=0 OHC=2 ..." followed by
"OHC-B ST=F0h PH=2 HV=1":
(TLP Captures)
08000260 -> STP MWr64 TC=0 OHC=2 TS=0 Attr=0 L=8
F0000004 -> RID=4h:0h.0h EP- Tag=F0h
E0200000 -> AddrH=000020E0h
00080006 -> AddrL=06000800h
90F00000 -> OHC-B ST=F0h PH=2 HV=1 AMA=0 AV-
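
For readers who want to cross-check the last capture line by hand, the
sketch below pulls ST, PH and HV out of the 32-bit word 0x90F00000. The
bit positions are an assumption inferred only from this one capture
line (treating the hex as printed, most significant byte first) and
have not been checked against the PCIe specification; the struct and
function names are hypothetical.

```c
#include <stdint.h>

/* Fields named on the "OHC-B" capture line above. */
struct ohc_b_fields {
	uint8_t st;	/* steering tag */
	uint8_t ph;	/* processing hint */
	uint8_t hv;	/* flag shown as HV in the capture */
};

/*
 * Decode one OHC-B word as displayed by the analyzer.
 * Assumed layout (inferred from the capture, not from the spec):
 *   bits 31:30 = PH, bit 28 = HV, bits 23:16 = ST.
 */
static struct ohc_b_fields decode_ohc_b(uint32_t dw)
{
	struct ohc_b_fields f;

	f.ph = (dw >> 30) & 0x3;
	f.hv = (dw >> 28) & 0x1;
	f.st = (dw >> 16) & 0xff;
	return f;
}
```

Under that assumed layout, decode_ohc_b(0x90F00000) yields ST=0xF0,
PH=2 and HV=1, matching the values userspace published.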

Previous versions:
v6: https://lore.kernel.org/dri-devel/20260608185646.4085127-1-zhipingz@meta.com/
v5: https://lore.kernel.org/dri-devel/20260526144401.1485788-1-zhipingz@meta.com/
v4: https://lore.kernel.org/linux-pci/20260519201401.1558410-1-zhipingz@meta.com/
v3: https://lore.kernel.org/linux-pci/20260512184755.4137227-1-zhipingz@meta.com/
v2: https://lore.kernel.org/linux-pci/20260430200704.352228-1-zhipingz@meta.com/

Zhiping Zhang (5):
  net/mlx5: free mlx5_st_idx_data on final dealloc
  PCI/TPH: Add requester/completer type helpers
  dma-buf: add optional get_tph() callback
  vfio/pci: implement get_tph and DMA_BUF_TPH feature
  RDMA/mlx5: get tph for p2p access when registering dma-buf mr

 drivers/dma-buf/dma-buf.c                     |  25 ++++
 drivers/infiniband/core/frmr_pools.c          |  20 +++-
 drivers/infiniband/hw/mlx5/mr.c               | 111 +++++++++++++++++-
 .../net/ethernet/mellanox/mlx5/core/lib/st.c  |  50 ++++++--
 drivers/pci/tph.c                             |  43 +++++++
 drivers/vfio/pci/vfio_pci_core.c              |   3 +
 drivers/vfio/pci/vfio_pci_dmabuf.c            |  94 ++++++++++++++-
 drivers/vfio/pci/vfio_pci_priv.h              |  12 ++
 include/linux/dma-buf.h                       |  21 ++++
 include/linux/mlx5/driver.h                   |  12 ++
 include/linux/pci-tph.h                       |   8 ++
 include/rdma/frmr_pools.h                     |   5 +-
 include/uapi/linux/vfio.h                     |  37 ++++++
 13 files changed, 421 insertions(+), 20 deletions(-)

-- 
2.53.0-Meta
