From: Leon Romanovsky <leon@kernel.org>
To: fengchengwen <fengchengwen@huawei.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>,
Bjorn Helgaas <bhelgaas@google.com>,
linux-rdma@vger.kernel.org, linux-pci@vger.kernel.org,
netdev@vger.kernel.org, dri-devel@lists.freedesktop.org,
Keith Busch <kbusch@kernel.org>, Yochai Cohen <yochai@nvidia.com>,
Yishai Hadas <yishaih@nvidia.com>,
Zhiping Zhang <zhipingz@meta.com>
Subject: Re: [RFC] Proposal: Add sysfs interface for PCIe TPH Steering Tag retrieval and configuration
Date: Tue, 14 Apr 2026 13:35:47 +0300
Message-ID: <20260414103547.GA361495@unreal>
In-Reply-To: <84bf119e-fa8c-4c97-9197-3377b7e2b250@huawei.com>
On Tue, Apr 14, 2026 at 05:30:09PM +0800, fengchengwen wrote:
> On 4/14/2026 4:57 PM, Leon Romanovsky wrote:
> > On Tue, Apr 14, 2026 at 09:07:23AM +0800, fengchengwen wrote:
> >> On 4/14/2026 3:19 AM, Leon Romanovsky wrote:
> >>> On Mon, Apr 13, 2026 at 08:04:10PM +0800, fengchengwen wrote:
> >>>> On 4/13/2026 6:01 PM, Leon Romanovsky wrote:
> >>>>> On Fri, Apr 10, 2026 at 10:30:52PM +0800, fengchengwen wrote:
> >>>>>> Hi all,
> >>>>>>
> >>>>>> I'm writing to propose adding a sysfs interface to expose and configure
> >>>>>> the PCIe TPH Steering Tag for PCIe devices, which is retrieved inside
> >>>>>> the kernel.
> >>>>>>
> >>>>>>
> >>>>>> Background: The TPH Steering Tag is tightly coupled with both a PCIe
> >>>>>> device (identified by its BDF) and a CPU core. It can only be obtained
> >>>>>> in kernel mode. To allow user-space applications to fetch and set this
> >>>>>> value securely and conveniently, we need a standard kernel-to-user
> >>>>>> interface.
> >>>>>>
> >>>>>>
> >>>>>> Proposed Solution: Add several sysfs attributes under each PCIe
> >>>>>> device's sysfs directory:
> >>>>>> 1. /sys/bus/pci/devices/<BDF>/tph_mode to query the TPH mode (interrupt
> >>>>>>    or device specific)
> >>>>>> 2. /sys/bus/pci/devices/<BDF>/tph_enable to control the TPH feature
> >>>>>> 3. /sys/bus/pci/devices/<BDF>/tph_st to support both read and write
> >>>>>>    operations, e.g.:
> >>>>>>    Read operation:
> >>>>>>      echo "cpu=3" > /sys/bus/pci/devices/0000:01:00.0/tph_st
> >>>>>>      cat /sys/bus/pci/devices/0000:01:00.0/tph_st
> >>>>>>    Write operation:
> >>>>>>      echo "index=10 st=123" > /sys/bus/pci/devices/0000:01:00.0/tph_st
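The proposed tph_st syntax means one store() handler has to accept two different command forms. Below is a minimal user-space sketch of that parsing. It assumes the exact "cpu=<n>" and "index=<n> st=<n>" strings shown above and a 16-bit upper bound on tags; the RFC does not define a formal grammar, and the names tph_st_parse, parse_kv and struct tph_st_req are hypothetical:

```c
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical command kinds for the proposed tph_st attribute. */
enum tph_st_cmd { TPH_ST_CMD_INVALID, TPH_ST_CMD_SELECT_CPU, TPH_ST_CMD_SET_ENTRY };

struct tph_st_req {
	unsigned int cpu;   /* set for TPH_ST_CMD_SELECT_CPU */
	unsigned int index; /* set for TPH_ST_CMD_SET_ENTRY */
	unsigned int st;    /* set for TPH_ST_CMD_SET_ENTRY */
};

/* Parse "<key>=<decimal>" at *p; on success store it and advance *p. */
static int parse_kv(const char **p, const char *key, unsigned int *out)
{
	size_t klen = strlen(key);
	unsigned long v;
	char *end;

	/* Require the exact key, '=', and a digit (rejects "-1"). */
	if (strncmp(*p, key, klen) != 0 || (*p)[klen] != '=' ||
	    !isdigit((unsigned char)(*p)[klen + 1]))
		return -1;
	errno = 0;
	v = strtoul(*p + klen + 1, &end, 10);
	if (errno || v > UINT_MAX)
		return -1;
	*out = (unsigned int)v;
	*p = end;
	return 0;
}

/* Accept only trailing whitespace, e.g. the newline that echo appends. */
static int only_space(const char *p)
{
	while (isspace((unsigned char)*p))
		p++;
	return *p == '\0';
}

static enum tph_st_cmd tph_st_parse(const char *buf, struct tph_st_req *req)
{
	const char *p = buf;

	/* "cpu=<n>": select the CPU whose tag the next read reports. */
	if (parse_kv(&p, "cpu", &req->cpu) == 0)
		return only_space(p) ? TPH_ST_CMD_SELECT_CPU : TPH_ST_CMD_INVALID;

	/* "index=<n> st=<n>": program one steering tag entry. */
	p = buf;
	if (parse_kv(&p, "index", &req->index) != 0 || *p != ' ')
		return TPH_ST_CMD_INVALID;
	while (*p == ' ')
		p++;
	if (parse_kv(&p, "st", &req->st) != 0 || !only_space(p))
		return TPH_ST_CMD_INVALID;
	/* Tags are at most 16 bits wide (8 bits without Extended TPH). */
	return req->st <= 0xFFFF ? TPH_ST_CMD_SET_ENTRY : TPH_ST_CMD_INVALID;
}
```

Rejecting trailing garbage and out-of-range tags at parse time keeps a store() handler from acting on a partially understood command.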
> >>>>>>
> >>>>>>
> >>>>>> The design strictly follows PCI subsystem sysfs standards and has the
> >>>>>> following key properties:
> >>>>>>
> >>>>>> 1. Dynamic Visibility: The sysfs attributes will only be present for
> >>>>>>    PCIe devices that support TPH Steering Tag. Devices without TPH
> >>>>>>    capability will not show these nodes, avoiding unnecessary user
> >>>>>>    confusion.
> >>>>>>
> >>>>>> 2. Permission Control: The attributes will use 0600 file permissions,
> >>>>>>    ensuring only privileged root users can read or write them, which
> >>>>>>    satisfies security requirements for hardware configuration
> >>>>>>    interfaces.
> >>>>>>
> >>>>>> 3. Standard Implementation Location: The interface will be implemented
> >>>>>>    in drivers/pci/pci-sysfs.c, the canonical location for all PCI device
> >>>>>>    sysfs attributes, ensuring consistency and maintainability within the
> >>>>>>    PCI subsystem.
> >>>>>>
> >>>>>>
> >>>>>> Why sysfs instead of alternatives like VFIO-PCI ioctl:
> >>>>>>
> >>>>>> - Universality: sysfs does not require binding the device to a special
> >>>>>>   driver such as vfio-pci. It is available to any privileged user-space
> >>>>>>   component, including system utilities, daemons, and monitoring tools.
> >>>>>>
> >>>>>> - Simplicity: Both user-space usage (cat/echo) and kernel implementation
> >>>>>>   are straightforward, reducing code complexity and long-term
> >>>>>>   maintenance cost.
> >>>>>>
> >>>>>> - Design Alignment: TPH Steering Tag is a generic PCIe device feature,
> >>>>>>   not specific to user-space drivers like DPDK or VFIO. Exposing it via
> >>>>>>   sysfs matches the kernel's standard pattern for hardware capabilities.
> >>>>>>
> >>>>>>
> >>>>>> I look forward to your comments about this design before submitting the
> >>>>>> final patch.
> >>>>>
> >>>>> You need to explain more clearly why this write functionality is useful
> >>>>> and necessary outside the VFIO/RDMA context:
> >>>>> https://lore.kernel.org/all/20260324234615.3731237-1-zhipingz@meta.com/
> >>>>>
> >>>>> AFAIK, for non-VFIO TPH callers, kernel has enough knowledge to set
> >>>>> right ST values.
> >>>>>
> >>>>> There are several comments regarding the implementation, but those can wait
> >>>>> until the rationale behind the proposal is fully clarified.
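For the standard ST table locations, the kernel knows exactly where each tag goes. Below is a minimal sketch of that arithmetic, assuming the PCIe Base Specification layout: a config-space ST table starting at offset 0x0C of the TPH Requester capability with 16-bit entries, or tags held in bits 31:16 of each MSI-X Vector Control dword. The helper names are hypothetical and this is not the kernel's implementation:

```c
#include <stdint.h>

/* Layout within the TPH Requester extended capability (assumed per spec). */
#define TPH_ST_TABLE_OFFSET 0x0C /* first ST Table entry */
#define TPH_ST_ENTRY_SIZE   2    /* each entry holds one 16-bit tag */

/* Config-space offset of ST Table entry `index`, given the capability offset. */
static uint32_t tph_cap_st_entry_offset(uint32_t cap_off, unsigned int index)
{
	return cap_off + TPH_ST_TABLE_OFFSET + TPH_ST_ENTRY_SIZE * index;
}

/*
 * New MSI-X Vector Control dword carrying `st` in bits 31:16; bits 15:0,
 * including the per-vector Mask bit, are preserved.
 */
static uint32_t msix_vector_ctrl_with_st(uint32_t vec_ctrl, uint16_t st)
{
	return (vec_ctrl & 0x0000FFFFu) | ((uint32_t)st << 16);
}
```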
> >>>>
> >>>> Thanks for your review and comments.
> >>>>
> >>>> Let me clarify the rationale behind this user-space sysfs interface:
> >>>>
> >>>> 1. VFIO is just one of the user-space device access frameworks.
> >>>> There are many other in-kernel frameworks that expose devices
> >>>> to user space, such as UIO, UACCE, etc., which may also require
> >>>> TPH Steering Tag support.
> >>>>
> >>>> 2. The kernel can automatically program Steering Tags only when
> >>>> the device provides a standard ST table in MSI-X or config space.
> >>>> However, many devices implement vendor-specific or platform-specific
> >>>> Steering Tag programming methods that cannot be fully handled
> >>>> by the generic kernel code.
> >>>>
> >>>> 3. For such devices, user-space applications or framework drivers
> >>>> need to retrieve and configure TPH Steering Tags directly.
> >>>> A unified sysfs interface allows all user-space frameworks
> >>>> (not just VFIO) to use a common, standard way to manage
> >>>> TPH Steering Tags, rather than implementing duplicated logic
> >>>> in each subsystem.
> >>>>
> >>>> This interface provides a uniform method for any user-space
> >>>> device access solution to work with TPH, which is why I believe
> >>>> it is useful and necessary beyond the VFIO/RDMA case.
> >>>
> >>> I understand the rationale for providing a read interface, for example for
> >>> debugging, but I do not see any justification for a write interface.
> >>
> >> Thank you for the comment!
> >>
> >> As I explained, the read interface is not only for debugging. It is
> >> needed for devices that do not declare the ST location in MSI-X or
> >> config space. The following is the lspci output of an Intel X710 NIC
> >> (TPH part only):
> >>
> >> Capabilities: [1a0 v1] Transaction Processing Hints
> >>         Device specific mode supported
> >>         No steering table available
> >>
> >> So we cannot configure the ST for this device in the kernel, because it
> >> is vendor specific. However, its vendor user-space driver can configure
> >> the ST; in this case, the ST has to be passed from the kernel to user
> >> space.
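lspci derives those two lines from the TPH Requester Capability register. Below is a minimal decoder, assuming the PCIe Base Specification bit layout (mode-support bits 2:0, ST Table Location in bits 10:9). The helper names are hypothetical, and this is neither lspci's nor the kernel's code:

```c
#include <stdbool.h>
#include <stdint.h>

/* TPH Requester Capability register bits (assumed per PCIe Base Spec). */
#define TPH_CAP_NO_ST_MODE (1u << 0) /* No ST Mode Supported */
#define TPH_CAP_IV_MODE    (1u << 1) /* Interrupt Vector Mode Supported */
#define TPH_CAP_DS_MODE    (1u << 2) /* Device Specific Mode Supported */
#define TPH_CAP_LOC_SHIFT  9         /* ST Table Location, bits 10:9 */
#define TPH_CAP_LOC_MASK   (3u << TPH_CAP_LOC_SHIFT)

enum tph_st_loc {
	TPH_ST_LOC_NONE = 0,     /* lspci: "No steering table available" */
	TPH_ST_LOC_CAP = 1,      /* table inside the TPH capability */
	TPH_ST_LOC_MSIX = 2,     /* tags in the MSI-X table */
	TPH_ST_LOC_RESERVED = 3,
};

static enum tph_st_loc tph_st_location(uint32_t cap)
{
	return (enum tph_st_loc)((cap & TPH_CAP_LOC_MASK) >> TPH_CAP_LOC_SHIFT);
}

/* Generic code has a spec-defined place to write tags. */
static bool tph_st_generic_writable(uint32_t cap)
{
	enum tph_st_loc loc = tph_st_location(cap);

	return loc == TPH_ST_LOC_CAP || loc == TPH_ST_LOC_MSIX;
}

/* Device-specific mode without a table: tags need a vendor-defined path. */
static bool tph_needs_vendor_st_path(uint32_t cap)
{
	return (cap & TPH_CAP_DS_MODE) && tph_st_location(cap) == TPH_ST_LOC_NONE;
}
```

A register with the Device Specific Mode bit set and ST Table Location 00b gives the combination shown above; that is an illustrative value, not one read from an X710.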
> >
> > Vendor-specific, in the context of the PCI specification, does not mean the
> > kernel cannot configure it. It simply means that the ST values are not
> > stored in the ST table.
>
> Thank you for the clarification!
>
> I agree with your interpretation of "vendor-specific" in PCI spec terms:
> it does not prevent the kernel from handling TPH in principle. However,
> the real problem is that the kernel has no standardized way to know where
> or how to program those vendor-specific ST values.
No one here is opposed to you implementing the appropriate callbacks or
extending the existing in-kernel API to support a device‑specific mode.
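One way to read this suggestion: the TPH core keeps programming spec-defined tables itself and falls back to a driver-supplied hook in device-specific mode. Below is a user-space model of that dispatch; struct tph_ds_ops, tph_set_st, demo_program_st and the fixed 8-entry table are hypothetical and do not correspond to an existing kernel API:

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

enum tph_st_loc { TPH_ST_LOC_NONE, TPH_ST_LOC_CAP, TPH_ST_LOC_MSIX };

/* Hypothetical hook a driver would register for device-specific mode. */
struct tph_ds_ops {
	int (*program_st)(void *drv_priv, unsigned int index, uint16_t st);
};

/* Simplified stand-in for the TPH state kept per PCI device. */
struct tph_dev {
	enum tph_st_loc loc;
	uint16_t table[8];               /* stands in for a spec-defined ST table */
	const struct tph_ds_ops *ds_ops; /* NULL when the driver has no hook */
	void *drv_priv;
};

static int tph_set_st(struct tph_dev *d, unsigned int index, uint16_t st)
{
	if (d->loc == TPH_ST_LOC_CAP || d->loc == TPH_ST_LOC_MSIX) {
		/* Generic path: the location is defined by the spec. */
		if (index >= sizeof(d->table) / sizeof(d->table[0]))
			return -EINVAL;
		d->table[index] = st;
		return 0;
	}
	/* No table: only the driver knows the device's private layout. */
	if (d->ds_ops && d->ds_ops->program_st)
		return d->ds_ops->program_st(d->drv_priv, index, st);
	return -EOPNOTSUPP;
}

/* Example driver hook writing into a 4-entry array of private registers. */
static int demo_program_st(void *drv_priv, unsigned int index, uint16_t st)
{
	uint16_t *regs = drv_priv;

	if (index >= 4)
		return -EINVAL;
	regs[index] = st;
	return 0;
}
```

In this model the tag travels from the core to the driver hook entirely in kernel space, so user space never needs to see the raw value.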
>
> When a device reports "No steering table available" and operates in
> device-specific mode, the method used to set ST values is entirely
> device-specific and not covered by the PCI specification. If the device
> is handed over to user space through a framework such as VFIO or
> IGB_UIO, the generic kernel code cannot infer the proper programming
> sequence or registers for each vendor-specific implementation.
>
> In these cases, the configuration must be done by the vendor’s
> user-space driver, which is aware of the device’s private programming
> model. But such a user-space driver still needs to obtain valid,
> platform-provided ST values (from ACPI _DSM), which it cannot do
> without a kernel interface.
The objection applies to this point. The PCI device exists in kernel space,
and the kernel is responsible for managing its internal state.
>
> This is why a read-only interface to retrieve ST values is still
> needed: the kernel holds the valid platform tags, while the user-space
> driver handles the device-specific programming.
>
> Thanks
>
> >
> > Thanks
>