Linux PCI subsystem development
 help / color / mirror / Atom feed
From: Alex Williamson <alex@shazbot.org>
To: fengchengwen <fengchengwen@huawei.com>
Cc: <jgg@ziepe.ca>, <wathsala.vithanage@arm.com>,
	<helgaas@kernel.org>, <wei.huang2@amd.com>,
	<wangzhou1@hisilicon.com>, <wangyushan12@huawei.com>,
	<liuyonglong@huawei.com>, <kvm@vger.kernel.org>,
	<linux-pci@vger.kernel.org>,
	alex@shazbot.org
Subject: Re: [PATCH v8 4/7] vfio/pci: Add PCIe TPH interface with capability query
Date: Sun, 10 May 2026 22:36:26 -0600	[thread overview]
Message-ID: <20260510223626.0ba9dc0b@shazbot.org> (raw)
In-Reply-To: <f1f11651-c86d-47a9-974c-da0b3d423657@huawei.com>

On Sat, 9 May 2026 11:28:03 +0800
fengchengwen <fengchengwen@huawei.com> wrote:
> On 5/9/2026 6:40 AM, Alex Williamson wrote:
> > On Fri, 8 May 2026 14:40:50 +0800
> > Chengwen Feng <fengchengwen@huawei.com> wrote:
> >> diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
> >> index 0c771064c0b8..40bf5aa9fd0b 100644
> >> --- a/drivers/vfio/pci/vfio_pci.c
> >> +++ b/drivers/vfio/pci/vfio_pci.c
> >> @@ -60,6 +60,12 @@ static bool disable_denylist;
> >>  module_param(disable_denylist, bool, 0444);
> >>  MODULE_PARM_DESC(disable_denylist, "Disable use of device denylist. Disabling the denylist allows binding to devices with known errata that may lead to exploitable stability or security issues when accessed by untrusted users.");
> >>  
> >> +#ifdef CONFIG_PCIE_TPH
> >> +static bool enable_unsafe_tph_ds_mode;
> >> +module_param(enable_unsafe_tph_ds_mode, bool, 0444);
> >> +MODULE_PARM_DESC(enable_unsafe_tph_ds_mode, "Enable UNSAFE TPH device-specific (DS) mode. This mode provides weak isolation, cannot be safely used for virtual machines. If you do not know what this is for, step away. (default: false)");
> >> +#endif
> >> +  
> > 
> > Why is the "unsafe" aspect of this keyed on mode rather than storage
> > location?
> > 
> > Currently the user cannot enable TPH, the capability is read-only, but
> > the user does have direct access to the MSI-X table.  We rely on an
> > agreement that the user needs to use SET_IRQS to allocate host vectors
> > and we use interrupt remapping as protection against abuse, but there's
> > no mediation of STs written directly to the MSI-X table.  If the device
> > supports IV mode with ST in the MSI-X table, nothing prevents the user
> > from writing those ST entries directly to the MSI-X table.  Therefore
> > doesn't it have the same security concern as DS mode?  
> 
> 
> Agree, from this perspective, even if it is in MSI-X table, it is still unsafe.
> So TPH is unsafe as a whole, not just DS mode.
> 
> > 
> > Further, config space lives in the device and various devices are known
> > to have alternate means for accessing their config space.
> > Virtualization of config space is more to present the device in the VM
> > address space and bridge features between guest and host.  It's not
> > great as a security barrier.
> > 
> > Maybe it's really neither the mode nor storage location, and we need to
> > decide if TPH as a whole introduces any new security considerations.  
> 
> I will adjust the module parameter to control TPH globally instead of
> only DS mode.

I'm not convinced that's the right solution either.  It's a usage
barrier if the TPH capability isn't exposed R/W, but does it guarantee
the device won't make use of such TLPs anyway?  If the device has
config space backdoors or can otherwise be manipulated to send these
hints, a vfio-pci module option is just security theater.  It's also a
burden for users and for each variant driver for devices supporting TPH.

We do however need to consider how changing the behavior of the
capability affects existing users, like QEMU.  We may need to consider
two device features, one that only supports SET with no payload to
enable virtualized access to the TPH capability and another that
provides the ST handling interface.

> > It seems arguable whether we can actually prevent a device from
> > including arbitrary STs on TLPs in any case and maybe we're really
> > only exposing a curated programming interface.
> > 
> > ...  
> >> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> >> index 5de618a3a5ee..81da2bd0c21b 100644
> >> --- a/include/uapi/linux/vfio.h
> >> +++ b/include/uapi/linux/vfio.h  
> 
> ...
> 
> >> +#define VFIO_DEVICE_PCI_TPH	_IO(VFIO_TYPE, VFIO_BASE + 22)
> >> +  
> > 
> > This seems like the wrong shape to me and introduces yet another
> > ioctl multiplexer.  We already have that via the device feature
> > interface. I'd propose this only needs one new DEVICE_FEATURE
> > ioctl, TPH_ST.  The uAPI would look like:
> > 
> > struct vfio_device_feature_tph_st {
> > 	__u32 flags;
> > #define VFIO_TPH_ST_MEM_TYPE_PM	(1 << 0)
> > 	__u16 index;
> > 	__u16 count;
> > 	__u32 data[]; /* host CPU# on SET, ST value on GET */
> > }
> > 
> > The user can SET multiple STs at once that have the same mem_type
> > (assuming that's a reasonable limitation).  On SET, each {cpu#,  
> 
> Agree, using the same mem_type for a batch is a good idea.
> 
> Because it could set multiple index, so how about:
> 
> struct vfio_pci_tph_entry {
> 	__u32 cpu;
> 	__u16 val;	/* ST index on SET, ST value on GET */
> 	__u16 reserved;
> }

In the structure I proposed the user can set/get contiguous index
ranges according to index and count, where the data field can then just
be a u32 array.  Why does the user need to be able to set/get
arbitrary, non-contiguous indexes?

In a VM use case we'd likely be trapping individual writes, therefore
we'd be intercepting one index at a time.
 
> struct vfio_device_feature_tph_st {
> 	__u32 op;
> #define VFIO_TPH_OP_GET_ST	0
> #define VFIO_TPH_OP_SET_ST	1

The vfio device feature interface already handles set/get, we don't
need this.

> 	__u32 flags;
> #define VFIO_TPH_ST_MEM_TYPE_PM	(1 << 0)
> 	__u16 count;
> 	__u16 reserved1;
> 	struct vfio_pci_tph_entry ents[];
> }
> 
> > mem_type} is translated to a host value and stored internally.  A
> > GET  
> 
> Should we store internally? How about writing directly to the device?

Perhaps we should, but I think we need a flag to indicate whether we're
virtualizing a write to hardware (capability or MSI-X table) or SET'ing
an index that userspace will later retrieve for DS mode via GET.
Otherwise we don't know until the mode bits are written which location,
if any, the capability is actually using.  For example the device can
support either MSI-X or capability locations, but enable DS mode.  For
consistency though, it might make sense to write to an internal table
regardless.  Thanks,

Alex

  reply	other threads:[~2026-05-11  4:36 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-05-08  6:40 [PATCH v8 0/7] vfio/pci: Add PCIe TPH support Chengwen Feng
2026-05-08  6:40 ` [PATCH v8 1/7] PCI/TPH: Fix pcie_tph_get_st_table_loc() field extraction Chengwen Feng
2026-05-08  6:40 ` [PATCH v8 2/7] PCI/TPH: Export pcie_tph_get_st_modes() for external use Chengwen Feng
2026-05-08 19:02   ` sashiko-bot
2026-05-08  6:40 ` [PATCH v8 3/7] PCI/TPH: Fix pcie_tph_get_st_table_size() for MSI-X table location Chengwen Feng
2026-05-08 19:31   ` sashiko-bot
2026-05-08  6:40 ` [PATCH v8 4/7] vfio/pci: Add PCIe TPH interface with capability query Chengwen Feng
2026-05-08 20:03   ` sashiko-bot
2026-05-08 22:40   ` Alex Williamson
2026-05-09  3:28     ` fengchengwen
2026-05-11  4:36       ` Alex Williamson [this message]
2026-05-08  6:40 ` [PATCH v8 5/7] vfio/pci: Add PCIe TPH enable/disable support Chengwen Feng
2026-05-08 20:46   ` sashiko-bot
2026-05-08  6:40 ` [PATCH v8 6/7] vfio/pci: Add PCIe TPH GET_ST interface Chengwen Feng
2026-05-08  6:40 ` [PATCH v8 7/7] vfio/pci: Add PCIe TPH SET_ST interface Chengwen Feng
2026-05-08 21:49   ` sashiko-bot

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260510223626.0ba9dc0b@shazbot.org \
    --to=alex@shazbot.org \
    --cc=fengchengwen@huawei.com \
    --cc=helgaas@kernel.org \
    --cc=jgg@ziepe.ca \
    --cc=kvm@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=liuyonglong@huawei.com \
    --cc=wangyushan12@huawei.com \
    --cc=wangzhou1@hisilicon.com \
    --cc=wathsala.vithanage@arm.com \
    --cc=wei.huang2@amd.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox