From: Alex Williamson <alex@shazbot.org>
To: Zhiping Zhang <zhipingz@meta.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>, Leon Romanovsky <leon@kernel.org>,
Sumit Semwal <sumit.semwal@linaro.org>,
Christian Konig <christian.koenig@amd.com>,
Bjorn Helgaas <helgaas@kernel.org>, <kvm@vger.kernel.org>,
<linux-rdma@vger.kernel.org>, <linux-pci@vger.kernel.org>,
<netdev@vger.kernel.org>, <dri-devel@lists.freedesktop.org>,
Keith Busch <kbusch@kernel.org>, Yochai Cohen <yochai@nvidia.com>,
Yishai Hadas <yishaih@nvidia.com>,
alex@shazbot.org
Subject: Re: [PATCH v6 4/5] vfio/pci: implement get_tph and DMA_BUF_TPH feature
Date: Tue, 9 Jun 2026 15:46:34 -0600 [thread overview]
Message-ID: <20260609154634.46ee8328@shazbot.org> (raw)
In-Reply-To: <20260608185646.4085127-5-zhipingz@meta.com>
On Mon, 8 Jun 2026 11:56:41 -0700
Zhiping Zhang <zhipingz@meta.com> wrote:
> @@ -327,12 +358,71 @@ int vfio_pci_core_feature_dma_buf(struct vfio_pci_core_device *vdev, u32 flags,
> err_free_phys:
> kfree(priv->phys_vec);
> err_free_priv:
> + mutex_destroy(&priv->tph_lock);
> kfree(priv);
> err_free_ranges:
> kfree(dma_ranges);
> return ret;
> }
>
> +int vfio_pci_core_feature_dma_buf_tph(struct vfio_pci_core_device *vdev,
> + u32 flags,
> + struct vfio_device_feature_dma_buf_tph __user *arg,
> + size_t argsz)
> +{
> + struct vfio_device_feature_dma_buf_tph set_tph;
> + struct vfio_pci_dma_buf *priv;
> + struct dma_buf *dmabuf;
> + int ret;
> +
> + if (!pcie_tph_supported(vdev->pdev))
> + return -EOPNOTSUPP;
This tests for the TPH capability, but the TPH capability is only a
requirement for functions that generate TLPs with TPH, ie. a requester.
This feature is about providing TPH steering tags when the device is a
completer. Bits 13:12 of the Device Capabilities 2 register indicate
if the device is supported as a TPH completer.
Additionally these bits indicate if the device supports standard and
extended TPH, which means we should not only fail if the device reports
00b, but should reject extended steering tags unless the device reports
11b.
> +
> + ret = vfio_check_feature(flags, argsz, VFIO_DEVICE_FEATURE_SET,
> + sizeof(set_tph));
> + if (ret != 1)
> + return ret;
> +
> + if (copy_from_user(&set_tph, arg, sizeof(set_tph)))
> + return -EFAULT;
> +
> + if (set_tph.flags & ~(VFIO_DMA_BUF_TPH_ST | VFIO_DMA_BUF_TPH_ST_EXT))
> + return -EINVAL;
> +
> + /* PCIe TLP Processing Hint is a 2-bit field. */
> + if (set_tph.ph & ~0x3)
> + return -EINVAL;
Sashiko notes what appears to be a false positive here, the uAPI states
ph is to be in the range [0, 3] and nowhere else says that it's allowed
to be garbage for a clear operation.
> +
> + dmabuf = dma_buf_get(set_tph.dmabuf_fd);
> + if (IS_ERR(dmabuf))
> + return PTR_ERR(dmabuf);
> +
> + if (dmabuf->ops != &vfio_pci_dmabuf_ops) {
> + ret = -EINVAL;
> + goto out_put;
> + }
> +
> + priv = dmabuf->priv;
> + if (priv->vdev != vdev) {
> + ret = -EINVAL;
> + goto out_put;
> + }
Sashiko notes this may need READ_ONCE()/WRITE_ONCE() semantics, but
that may get fixed as part of the resv lock usage.
> +
> + scoped_guard(mutex, &priv->tph_lock) {
> + priv->tph_st = set_tph.steering_tag;
> + priv->tph_st_ext = set_tph.steering_tag_ext;
> + priv->tph_ph = set_tph.ph;
> + priv->tph_st_valid = !!(set_tph.flags & VFIO_DMA_BUF_TPH_ST);
> + priv->tph_st_ext_valid =
> + !!(set_tph.flags & VFIO_DMA_BUF_TPH_ST_EXT);
> + }
> + ret = 0;
> +
> +out_put:
> + dma_buf_put(dmabuf);
> + return ret;
> +}
> +
> void vfio_pci_dma_buf_move(struct vfio_pci_core_device *vdev, bool revoked)
> {
> struct vfio_pci_dma_buf *priv;
...
> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> index 5de618a3a5ee..0ca26721849b 100644
> --- a/include/uapi/linux/vfio.h
> +++ b/include/uapi/linux/vfio.h
> @@ -1534,6 +1534,51 @@ struct vfio_device_feature_dma_buf {
> */
> #define VFIO_DEVICE_FEATURE_MIG_PRECOPY_INFOv2 12
>
> +/**
> + * Upon VFIO_DEVICE_FEATURE_SET associate TPH (TLP Processing Hints) metadata
> + * with a vfio-exported dma-buf. The dma-buf must have been created by
> + * VFIO_DEVICE_FEATURE_DMA_BUF on this device, and the device must expose the
> + * TPH Extended Capability (otherwise the ioctl returns -EOPNOTSUPP).
> + *
> + * dmabuf_fd is the file descriptor returned by VFIO_DEVICE_FEATURE_DMA_BUF.
> + *
> + * 8-bit ST (steering_tag) and 16-bit Extended ST (steering_tag_ext) are
> + * distinct namespaces in the PCIe TPH ST table and may both be present with
> + * different values. Userspace should populate the value(s) it has from the
> + * firmware ST table for this device and set the matching VFIO_DMA_BUF_TPH_ST /
> + * VFIO_DMA_BUF_TPH_ST_EXT bit in @flags. An importer requests a specific
> + * width and receives the matching value; if the requested width is not
> + * present, the importer is told TPH is unavailable for this dma-buf.
> + *
> + * This publishes the PCI SIG-defined ST/PH tuple for a VFIO-owned PCIe
> + * completer. The dma-buf core treats the tuple as opaque completer-owned
> + * metadata; an importer simply requests the namespace it supports and places
> + * the returned value on generated TLPs.
> + *
> + * @flags == 0 clears any previously published metadata.
This is overselling the invalidation. It only flags the fields as
invalid for future get_tph() requests, it does nothing to clear
previously published metadata from importers. Thanks,
Alex
> + *
> + * ph is the 2-bit TLP Processing Hint and must be in the range [0, 3].
> + *
> + * Userspace is responsible for setting TPH on the dma-buf before handing the
> + * fd to the importer. Calling SET again replaces the previously published
> + * values; racing a SET against an importer that is already consuming the
> + * dma-buf is a userspace ordering problem.
> + *
> + * Return: 0 on success, -errno on failure.
> + */
> +#define VFIO_DEVICE_FEATURE_DMA_BUF_TPH 13
> +
> +#define VFIO_DMA_BUF_TPH_ST (1 << 0) /* steering_tag valid */
> +#define VFIO_DMA_BUF_TPH_ST_EXT (1 << 1) /* steering_tag_ext valid */
> +
> +struct vfio_device_feature_dma_buf_tph {
> + __s32 dmabuf_fd;
> + __u32 flags;
> + __u16 steering_tag_ext;
> + __u8 steering_tag;
> + __u8 ph;
> +};
> +
> /* -------- API for Type1 VFIO IOMMU -------- */
>
> /**
next prev parent reply other threads:[~2026-06-09 21:46 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <20260608185646.4085127-1-zhipingz@meta.com>
[not found] ` <20260608185646.4085127-4-zhipingz@meta.com>
2026-06-09 8:10 ` [PATCH v6 3/5] dma-buf: add optional get_tph() callback Christian König
2026-06-09 14:38 ` Zhiping Zhang
[not found] ` <20260608185646.4085127-5-zhipingz@meta.com>
2026-06-09 8:12 ` [PATCH v6 4/5] vfio/pci: implement get_tph and DMA_BUF_TPH feature Christian König
2026-06-09 14:39 ` Zhiping Zhang
2026-06-09 21:46 ` Alex Williamson [this message]
2026-06-09 23:48 ` Zhiping Zhang
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260609154634.46ee8328@shazbot.org \
--to=alex@shazbot.org \
--cc=christian.koenig@amd.com \
--cc=dri-devel@lists.freedesktop.org \
--cc=helgaas@kernel.org \
--cc=jgg@ziepe.ca \
--cc=kbusch@kernel.org \
--cc=kvm@vger.kernel.org \
--cc=leon@kernel.org \
--cc=linux-pci@vger.kernel.org \
--cc=linux-rdma@vger.kernel.org \
--cc=netdev@vger.kernel.org \
--cc=sumit.semwal@linaro.org \
--cc=yishaih@nvidia.com \
--cc=yochai@nvidia.com \
--cc=zhipingz@meta.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox