All of lore.kernel.org
 help / color / mirror / Atom feed
From: Alex Williamson <alex@shazbot.org>
To: Chengwen Feng <fengchengwen@huawei.com>
Cc: <jgg@ziepe.ca>, <wathsala.vithanage@arm.com>,
	<helgaas@kernel.org>, <wei.huang2@amd.com>,
	<wangzhou1@hisilicon.com>, <wangyushan12@huawei.com>,
	<liuyonglong@huawei.com>, <kvm@vger.kernel.org>,
	<linux-pci@vger.kernel.org>,
	alex@shazbot.org
Subject: Re: [PATCH v11 5/5] vfio/pci: Add VFIO_DEVICE_FEATURE_TPH_ST for TPH ST entry management
Date: Thu, 21 May 2026 22:10:08 -0600	[thread overview]
Message-ID: <20260521221008.59ddf977@shazbot.org> (raw)
In-Reply-To: <20260518071701.25177-6-fengchengwen@huawei.com>

On Mon, 18 May 2026 15:17:01 +0800
Chengwen Feng <fengchengwen@huawei.com> wrote:

> Add VFIO_DEVICE_FEATURE_TPH_ST to allow userspace to manage PCIe TPH
> Steering Tag entries.
> 
> SET: Program contiguous ST entries when ST table resides in TPH Capability
>      or MSI-X table. U32_MAX CPU ID clears the corresponding ST entry.
> GET: Retrieve ST values per CPU ID, only available in DS mode.
> 
> Both operations are only valid when TPH is enabled on the device.

What?  That doesn't align with the hardware programming model.  The
specification even suggests that the device should be quiesced or TPH
disabled during programming of the steering tags for deterministic
behavior.

> Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
> ---
>  drivers/vfio/pci/vfio_pci_core.c | 91 ++++++++++++++++++++++++++++++++
>  include/uapi/linux/vfio.h        | 41 ++++++++++++++
>  2 files changed, 132 insertions(+)
> 
> diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
> index 0d0140202280..13a22c1d6ea6 100644
> --- a/drivers/vfio/pci/vfio_pci_core.c
> +++ b/drivers/vfio/pci/vfio_pci_core.c
> @@ -1558,6 +1558,95 @@ static int vfio_pci_core_feature_token(struct vfio_pci_core_device *vdev,
>  	return 0;
>  }
>  
> +static int vfio_pci_core_feature_tph_st(struct vfio_pci_core_device *vdev,
> +					u32 flags,
> +					struct vfio_device_feature_tph_st __user *arg,
> +					size_t argsz)
> +{
> +	bool is_set = !!(flags & VFIO_DEVICE_FEATURE_SET);
> +	struct vfio_device_feature_tph_st tph_st;
> +	struct pci_dev *pdev = vdev->pdev;
> +	enum tph_mem_type mtype;
> +	int i, j, ret;
> +	u32 *cpus;
> +	u16 st;
> +
> +	guard(mutex)(&vdev->tph_lock);

Wrong place for this lock.  Why do we even need it?  If a user races
SET/GET with enable/disable, that's their problem.

> +
> +	ret = vfio_check_feature(flags, argsz,
> +				 VFIO_DEVICE_FEATURE_GET |
> +				 VFIO_DEVICE_FEATURE_SET,
> +				 sizeof(tph_st));
> +	if (ret <= 0)
> +		return ret;
> +
> +	if (!enable_unsafe_tph ||

Why did we allow the feature to be probe if it can't be used?

> +		pcie_tph_enabled_mode(pdev) == PCI_TPH_ST_NS_MODE)

Here's that ambiguity of reporting NS_MODE for disabled, but as above
it's a bogus requirement.

> +		return -EOPNOTSUPP;
> +	if (!is_set && pcie_tph_enabled_mode(pdev) != PCI_TPH_ST_DS_MODE)

Why?  Let the interface be symmetric that GET works regardless of mode.

> +		return -EOPNOTSUPP;
> +	if (is_set && pcie_tph_get_st_table_loc(pdev) == PCI_TPH_LOC_NONE)
> +		return -EOPNOTSUPP;
> +
> +	if (copy_from_user(&tph_st, arg, sizeof(tph_st)))
> +		return -EFAULT;
> +
> +	if (tph_st.count == 0 || tph_st.count > VFIO_TPH_ST_MAX_COUNT ||
> +		tph_st.flags > VFIO_TPH_ST_MEM_TYPE_PM)

Wrong operation for a flags field.

> +		return -EINVAL;
> +	if (!is_set && tph_st.index != 0)

We can only GET index 0?  Why?

> +		return -EINVAL;
> +	if (is_set && (tph_st.index >= VFIO_TPH_ST_MAX_COUNT ||
> +		       tph_st.index + tph_st.count > VFIO_TPH_ST_MAX_COUNT))

DS can have more that 2048 steering tags.

> +		return -EINVAL;
> +
> +	cpus = memdup_array_user(&arg->data, tph_st.count, sizeof(*cpus));

Use an __aligned_u64 data_uptr like iommufd does and avoid copying this
into the kernel.  Isn't this steering tags, why's it named cpus?

> +	if (IS_ERR(cpus))
> +		return PTR_ERR(cpus);
> +
> +	mtype = tph_st.flags & VFIO_TPH_ST_MEM_TYPE_PM ? TPH_MEM_TYPE_PM :
> +							 TPH_MEM_TYPE_VM;

Then why did we use two flag bits for this?

> +	if (!is_set) {
> +		for (i = 0; i < tph_st.count; i++) {
> +			ret = pcie_tph_get_cpu_st(pdev, mtype, cpus[i], &st);

My understanding was that GET would return the translated values from a
previous SET, not translate on the fly.  I think in order to support
writing STs before enabling TPH, we're going to need to store them
somewhere anyway so that we can push them to the correct location when
TPH is enabled.  That means there a buffer for SET/GET regardless of
the mode that's enabled.

> +			if (ret)
> +				goto out;
> +			cpus[i] = st;
> +		}
> +		goto out;
> +	}
> +
> +	for (i = 0; i < tph_st.count; i++) {
> +		if (cpus[i] == U32_MAX) {
> +			ret = pcie_tph_set_st_entry(pdev, tph_st.index + i, 0);

Why do we need a magic value to set it to zero?  Hardware doesn't have
a "clear ST register value".

> +			if (ret)
> +				goto out;
> +			continue;
> +		}
> +
> +		ret = pcie_tph_get_cpu_st(pdev, mtype, cpus[i], &st);
> +		if (ret)
> +			goto out;
> +		ret = pcie_tph_set_st_entry(pdev, tph_st.index + i, st);
> +		if (ret)
> +			goto out;

Clearly broken for DS mode.

> +	}
> +
> +out:
> +	if (!is_set && !ret) {
> +		if (copy_to_user(&arg->data, cpus,
> +			tph_st.count * sizeof(*cpus)))
> +			ret = -EFAULT;
> +	}
> +	if (is_set && ret) {
> +		/* Roll back previously programmed entries to 0 */
> +		for (j = 0; j < i; j++)
> +			pcie_tph_set_st_entry(pdev, tph_st.index + j, 0);

The magic value is zero, not U32_MAX.  Thanks,

Alex

> +	}
> +	kfree(cpus);
> +	return ret;
> +}
> +
>  int vfio_pci_core_ioctl_feature(struct vfio_device *device, u32 flags,
>  				void __user *arg, size_t argsz)
>  {
> @@ -1576,6 +1665,8 @@ int vfio_pci_core_ioctl_feature(struct vfio_device *device, u32 flags,
>  		return vfio_pci_core_feature_token(vdev, flags, arg, argsz);
>  	case VFIO_DEVICE_FEATURE_DMA_BUF:
>  		return vfio_pci_core_feature_dma_buf(vdev, flags, arg, argsz);
> +	case VFIO_DEVICE_FEATURE_TPH_ST:
> +		return vfio_pci_core_feature_tph_st(vdev, flags, arg, argsz);
>  	default:
>  		return -ENOTTY;
>  	}
> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> index 5de618a3a5ee..aca39d4e5307 100644
> --- a/include/uapi/linux/vfio.h
> +++ b/include/uapi/linux/vfio.h
> @@ -1534,6 +1534,47 @@ struct vfio_device_feature_dma_buf {
>   */
>  #define VFIO_DEVICE_FEATURE_MIG_PRECOPY_INFOv2  12
>  
> +/**
> + * VFIO_DEVICE_FEATURE_TPH_ST - Get/Set PCIe TPH Steering Tag (ST) entries
> + *
> + * Provides userspace interface to manage PCIe TPH ST table entries.
> + * This feature is only available when device TPH is enabled.
> + *
> + * Upon VFIO_DEVICE_FEATURE_SET:
> + *   Program contiguous ST table entries from the starting @index.
> + *   Valid only for hardware with ST table located in TPH Capability
> + *   space or MSI-X table. If an entry CPU ID is specified as U32_MAX,
> + *   the corresponding ST entry will be cleared. @index and @count define
> + *   the contiguous entry range to be programmed.
> + *   If any entry programming fails, the operation will roll back and
> + *   clear all entries that were successfully programmed before the error.
> + *
> + * Upon VFIO_DEVICE_FEATURE_GET:
> + *   Retrieve the ST value mapped to each given CPU ID in the @data array.
> + *   Userspace fills @data with CPU ID array, the interface returns each
> + *   CPU's corresponding ST value back in place.
> + *   Valid only when TPH DS mode is enabled.
> + *
> + * @flags: Operation flags (VFIO_TPH_ST_MEM_TYPE_*).
> + * @index: Starting ST entry index, only valid for FEATURE_SET.
> + * @count: Number of contiguous entries to access.
> + * @data: Array of CPU IDs for both SET and GET. On SET it programs ST for
> + *        each CPU; on GET it returns the mapped ST value of each CPU.
> + *
> + * This feature is gated by enable_unsafe_tph module parameter.
> + */
> +#define VFIO_DEVICE_FEATURE_TPH_ST	13
> +
> +struct vfio_device_feature_tph_st {
> +	__u32 flags;
> +#define VFIO_TPH_ST_MEM_TYPE_VM		(0U << 0)
> +#define VFIO_TPH_ST_MEM_TYPE_PM		(1U << 0)
> +	__u16 index;
> +	__u16 count;
> +#define VFIO_TPH_ST_MAX_COUNT		2048
> +	__u32 data[] __counted_by(count);
> +};
> +
>  /* -------- API for Type1 VFIO IOMMU -------- */
>  
>  /**


  parent reply	other threads:[~2026-05-22  4:10 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-05-18  7:16 [PATCH v11 0/5] vfio/pci: Add PCIe TPH support Chengwen Feng
2026-05-18  7:16 ` [PATCH v11 1/5] PCI/TPH: Fix pcie_tph_get_st_table_loc() field extraction Chengwen Feng
2026-05-18  7:16 ` [PATCH v11 2/5] PCI/TPH: Export pcie_tph_get_st_modes() for external use Chengwen Feng
2026-05-18  7:57   ` sashiko-bot
2026-05-22  4:10   ` Alex Williamson
2026-05-18  7:16 ` [PATCH v11 3/5] PCI/TPH: Add pcie_tph_enabled_mode() helper Chengwen Feng
2026-05-18  8:14   ` sashiko-bot
2026-05-22  4:10   ` Alex Williamson
2026-05-22  9:18     ` fengchengwen
2026-05-22 14:00       ` Alex Williamson
2026-05-18  7:17 ` [PATCH v11 4/5] vfio/pci: Add PCIe TPH configuration space virtualization Chengwen Feng
2026-05-18  8:48   ` sashiko-bot
2026-05-22  4:10   ` Alex Williamson
2026-05-22  9:39     ` fengchengwen
2026-05-22 14:09       ` Alex Williamson
2026-05-18  7:17 ` [PATCH v11 5/5] vfio/pci: Add VFIO_DEVICE_FEATURE_TPH_ST for TPH ST entry management Chengwen Feng
2026-05-18  9:22   ` sashiko-bot
2026-05-22  4:10   ` Alex Williamson [this message]
2026-05-22 10:04     ` fengchengwen
2026-05-22 14:27       ` Alex Williamson
2026-05-26  6:29         ` fengchengwen
2026-05-22  1:31 ` [PATCH v11 0/5] vfio/pci: Add PCIe TPH support fengchengwen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260521221008.59ddf977@shazbot.org \
    --to=alex@shazbot.org \
    --cc=fengchengwen@huawei.com \
    --cc=helgaas@kernel.org \
    --cc=jgg@ziepe.ca \
    --cc=kvm@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=liuyonglong@huawei.com \
    --cc=wangyushan12@huawei.com \
    --cc=wangzhou1@hisilicon.com \
    --cc=wathsala.vithanage@arm.com \
    --cc=wei.huang2@amd.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.