Re: [PATCH v14 8/8] vfio/pci: Virtualize PCIe TPH capability registers

Linux PCI subsystem development
 help / color / mirror / Atom feed

From: sashiko-bot@kernel.org
To: "Chengwen Feng" <fengchengwen@huawei.com>
Cc: linux-pci@vger.kernel.org, kvm@vger.kernel.org
Subject: Re: [PATCH v14 8/8] vfio/pci: Virtualize PCIe TPH capability registers
Date: Thu, 28 May 2026 16:42:22 +0000	[thread overview]
Message-ID: <20260528164223.184121F000E9@smtp.kernel.org> (raw)
In-Reply-To: <20260528124649.14732-9-fengchengwen@huawei.com>

Thank you for your contribution! Sashiko AI review found 8 potential issue(s) to consider:
- [Critical] Unconditional overwrite of hardware ST Table destroys guest's native configuration.
- [High] Silent dropping of partial writes and ST table configuration in TPH config space.
- [High] Modifying ST Table entries while the TPH Requester is enabled.
- [High] Concurrent modification of `struct pci_dev` bitfields (`tph_enabled` vs. `msix_enabled`) without a shared lock.
- [Medium] Mixing `goto`-based error handling and `guard()` scope-based cleanup in the same function.
- [Low] Masking operations on an uninitialized stack variable when `vfio_direct_config_read()` fails.
- [Low] Destroying the `tph_lock` mutex before the final operation it serializes.
- [Low] Comment incorrectly identifies the `PCI_TPH_CTRL` offset.
--

commit 597ad56e8c1a479406f7f445961a108159c417c8
Author: Chengwen Feng <fengchengwen@huawei.com>

vfio/pci: Virtualize PCIe TPH capability registers

Implement read/write virtualization for PCIe TPH Extended Capability:
1. Read path: Mask out extended requester bit if hardware does not support
   extended ST, to hide unavailable capability from userspace.
2. Write path: Validate TPH mode and requester configuration, mediate TPH
   enable/disable requests. Sync shadow ST table to hardware after
   successful TPH enabling.

> diff --git a/drivers/vfio/pci/vfio_pci_config.c b/drivers/vfio/pci/vfio_pci_config.c
> index a10ed733f0e3a..bed3794e6033d 100644
> --- a/drivers/vfio/pci/vfio_pci_config.c
> +++ b/drivers/vfio/pci/vfio_pci_config.c

[ ... ]

> @@ -313,6 +317,78 @@ static int vfio_virt_config_read(struct vfio_pci_core_device *vdev, int pos,
>  	return count;
>  }
>  
> +static int vfio_pci_tph_config_read(struct vfio_pci_core_device *vdev, int pos,
> +				    int count, struct perm_bits *perm,
> +				    int offset, __le32 *val)
> +{
> +	const int target_byte = PCI_TPH_CAP + 1;
> +	struct pci_dev *pdev = vdev->pdev;
> +	const u8 ext_bit = BIT(0);
> +	u8 *data = (u8 *)val;
> +	int byte_off;
> +	int ret;
> +
> +	if (pdev->tph_ext_requester || offset > target_byte ||
> +		offset + count <= target_byte)
> +		return vfio_direct_config_read(vdev, pos, count, perm,
> +					       offset, val);
> +
> +	ret = vfio_direct_config_read(vdev, pos, count, perm, offset, val);
> +
> +	/* Mask out TPH requester capability */
> +	byte_off = target_byte - offset;
> +	/* Defensive check: prevent out-of-bounds access */
> +	if (byte_off < count)
> +		data[byte_off] &= ~ext_bit;

[Severity: Low]
If vfio_direct_config_read() returns a negative error code, does this execute
the bitwise mask on uninitialized stack data in val before returning? Even
though the upper layers discard the value on error, perhaps this should check
for ret < 0 before proceeding?

> +
> +	return ret;
> +}
> +
> +static int vfio_pci_tph_config_write(struct vfio_pci_core_device *vdev, int pos,
> +				     int count, struct perm_bits *perm,
> +				     int offset, __le32 val)
> +{
> +	struct pci_dev *pdev = vdev->pdev;
> +	u32 data = le32_to_cpu(val);
> +	enum tph_req_policy policy;
> +	u8 mode, req;
> +	int i, ret;
> +
> +	if (!enable_unsafe_tph)
> +		return count;
> +
> +	/*
> +	 * TPH_CTRL (offset 4) register layout:
> +	 *   Byte 0 (offset 4): Bits [2:0] - ST Mode selection
> +	 *   Byte 1 (offset 5): Bits [1:0] - TPH Requester enable

[Severity: Low]
This isn't a bug, but the comment states the TPH_CTRL register is at offset 4.
Isn't PCI_TPH_CTRL actually at offset 8, while offset 4 is the PCI_TPH_CAP
register?

> +	 *
> +	 * Only intercept writes with count >= 2 which cover both control
> +	 * fields. Single-byte partial writes do not affect combined
> +	 * configuration.
> +	 */
> +	if (offset != PCI_TPH_CTRL || count < 2)
> +		return count;

[Severity: High]
By returning count directly here, doesn't this silently drop all byte-wise
writes to the control register, as well as any writes to the ST Table if it is
located within the TPH capability space (PCI_TPH_LOC_CAP)? The guest might
assume the writes succeeded, leading to a permanent hardware state mismatch.
Should this fallback to writing the unhandled cases instead of returning count
early?

> +
> +	guard(mutex)(&vdev->tph_lock);
> +
> +	mode = FIELD_GET(PCI_TPH_CTRL_MODE_SEL_MASK, data);
> +	req = FIELD_GET(PCI_TPH_CTRL_REQ_EN_MASK, data);
> +	if (req == PCI_TPH_REQ_TPH_ONLY || req == PCI_TPH_REQ_EXT_TPH) {
> +		policy = (req == PCI_TPH_REQ_TPH_ONLY) ?
> +			 TPH_REQ_STANDARD : TPH_REQ_EXTENDED;
> +		ret = pcie_enable_tph(pdev, mode, policy);

[Severity: High]
Is there a data race when setting pdev->tph_enabled inside pcie_enable_tph()?
The tph_enabled flag shares the same unsigned int word in struct pci_dev as
msix_enabled. Since this path is only protected by vdev->tph_lock, if userspace
concurrently triggers VFIO_DEVICE_SET_IRQS (which mutates msix_enabled under
vdev->igate), could the compiler's read-modify-write on the shared bitfield
word silently corrupt these PCI device flags?

> +		if (ret == 0 && vdev->tph_st_shadow) {
> +			for (i = 0; i < vdev->tph_st_entries; i++)
> +				pcie_tph_set_st_entry(pdev, i,
> +						      vdev->tph_st_shadow[i]);
> +		}

[Severity: High]
The PCIe Base Specification (6.17.3) states that software must not modify the
Steering Tag Table while the TPH Requester Enable bit is 1. Since
pcie_enable_tph() sets the enable bit in hardware before this loop writes to
the ST Table, could this cause undefined hardware behavior or dropped TLPs?

[Severity: Critical]
Also, if a generic unmodified VM configures the ST Table natively via MMIO
without using the custom VFIO ioctl, tph_st_shadow will contain only its
default zeros. Won't this unconditional iteration wipe out the guest's valid
native ST tags with zeros at the exact moment TPH is enabled, permanently
corrupting the device state?

> diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
> index 851e4d0828f2d..2a5cb945275ea 100644
> --- a/drivers/vfio/pci/vfio_pci_core.c
> +++ b/drivers/vfio/pci/vfio_pci_core.c

[ ... ]

> @@ -1605,6 +1608,9 @@ static void vfio_pci_tph_deinit(struct vfio_pci_core_device *vdev)
>  	vdev->tph_st_entries = 0;
>  
>  	mutex_destroy(&vdev->tph_lock);
> +
> +	/* Reset TPH status on session exit */
> +	pcie_disable_tph(vdev->pdev);
>  }

[Severity: Low]
Does it make sense to destroy the vdev->tph_lock mutex before calling
pcie_disable_tph()? Even though this runs sequentially during device teardown,
it seems semantically inverted to destroy the lock before manipulating the
hardware state it protects.

[ ... ]

> @@ -1645,6 +1651,13 @@ static int vfio_pci_core_feature_tph_st_config(
>  		config.reserved != 0)
>  		return -EINVAL;
>  
>  	guard(mutex)(&vdev->tph_lock);
>  
>  	uptr = u64_to_user_ptr(config.data_uptr);
>  	sts = memdup_array_user(uptr, config.count, sizeof(u16));
>  	sz = config.count * sizeof(u16);
>  	if (IS_ERR(sts))
>  		return PTR_ERR(sts);
>  
>  	if (pcie_tph_enabled_req_type(pdev) == PCI_TPH_REQ_DISABLE) {
>  		memcpy(&vdev->tph_st_shadow[config.index], sts, sz);
>  		kfree(sts);
>  		return 0;
>  	}
>  
>  	for (i = 0; i < config.count; i++) {
>  		idx = config.index + i;
>  		ret = pcie_tph_set_st_entry(pdev, idx, sts[i]);
>  		if (ret)
>  			goto rollback;
>  	}

[Severity: High]
This loop calls pcie_tph_set_st_entry() to modify hardware ST entries while
pcie_tph_enabled_req_type(pdev) != PCI_TPH_REQ_DISABLE. Doesn't this violate
the PCIe spec by modifying the Steering Tag Table while the TPH Requester Enable
bit is 1?

[Severity: Medium]
According to the cleanup subsystem guidelines, mixing goto-based error handling
labels with scope-based guard() declarations in the same function is not allowed.
Since guard(mutex)(&vdev->tph_lock) is used at the start of this block, and
goto rollback is used here for error handling, could this be refactored to
fully adopt scope-based cleanup like __free(kfree) for the sts array?

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260528124649.14732-1-fengchengwen@huawei.com?part=8

next prev parent reply	other threads:[~2026-05-28 16:42 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-05-28 12:46 [PATCH v14 0/8] vfio/pci: Add PCIe TPH support Chengwen Feng
2026-05-28 12:46 ` [PATCH v14 1/8] PCI/TPH: Fix pcie_tph_get_st_table_loc() field extraction Chengwen Feng
2026-05-28 12:46 ` [PATCH v14 2/8] PCI/TPH: Cache TPH requester capability at probe time Chengwen Feng
2026-05-28 12:46 ` [PATCH v14 3/8] PCI/TPH: Add requester selection policy to pcie_enable_tph() Chengwen Feng
2026-05-28 12:46 ` [PATCH v14 4/8] PCI/TPH: Add requester policy to pcie_tph_get_cpu_st() Chengwen Feng
2026-05-28 12:46 ` [PATCH v14 5/8] PCI/TPH: expose the enabled TPH requester type Chengwen Feng
2026-05-28 12:46 ` [PATCH v14 6/8] vfio/pci: Add VFIO_DEVICE_FEATURE_TPH_ST_CONFIG for PCIe TPH ST configuration Chengwen Feng
2026-05-28 15:17   ` sashiko-bot
2026-05-28 12:46 ` [PATCH v14 7/8] vfio/pci: Add VFIO_DEVICE_FEATURE_TPH_CPU_ST to query TPH steering tag Chengwen Feng
2026-05-28 12:46 ` [PATCH v14 8/8] vfio/pci: Virtualize PCIe TPH capability registers Chengwen Feng
2026-05-28 16:42   ` sashiko-bot [this message]
2026-06-01 15:58 ` [PATCH v14 0/8] vfio/pci: Add PCIe TPH support Alex Williamson
2026-06-02 14:46   ` fengchengwen
2026-06-02 23:08     ` Alex Williamson
2026-06-03  0:34       ` fengchengwen
2026-06-03  0:45         ` Jason Gunthorpe
2026-06-03  1:25           ` fengchengwen
2026-06-03 18:53           ` Alex Williamson
2026-06-04 18:33             ` Jason Gunthorpe
2026-06-04 20:46               ` Alex Williamson
2026-06-03 17:58         ` Alex Williamson
2026-06-04  1:58           ` fengchengwen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260528164223.184121F000E9@smtp.kernel.org \
    --to=sashiko-bot@kernel.org \
    --cc=fengchengwen@huawei.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=sashiko-reviews@lists.linux.dev \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox