From: Donald Dutile <ddutile@redhat.com>
To: Alex Williamson <alex.williamson@redhat.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>,
Bjorn Helgaas <bhelgaas@google.com>,
iommu@lists.linux.dev, Joerg Roedel <joro@8bytes.org>,
linux-pci@vger.kernel.org, Robin Murphy <robin.murphy@arm.com>,
Will Deacon <will@kernel.org>,
Lu Baolu <baolu.lu@linux.intel.com>,
galshalom@nvidia.com, Joerg Roedel <jroedel@suse.de>,
Kevin Tian <kevin.tian@intel.com>,
kvm@vger.kernel.org, maorg@nvidia.com, patches@lists.linux.dev,
tdave@nvidia.com, Tony Zhu <tony.zhu@intel.com>
Subject: Re: [PATCH v3 00/11] Fix incorrect iommu_groups with PCIe ACS
Date: Tue, 30 Sep 2025 11:23:06 -0400 [thread overview]
Message-ID: <8e679dad-b16d-45e3-b844-fa30b5a574ee@redhat.com> (raw)
In-Reply-To: <20250923162337.5ab1fe89.alex.williamson@redhat.com>
note: I removed Joerg's suse email address & Tony's intel address as I keep getting numerous undeliverable emails when those are included in cc:
On 9/23/25 6:23 PM, Alex Williamson wrote:
> On Mon, 22 Sep 2025 22:42:37 -0400
> Donald Dutile <ddutile@redhat.com> wrote:
>
>> On 9/22/25 10:06 PM, Alex Williamson wrote:
>>> On Mon, 22 Sep 2025 21:44:27 -0400
>>> Donald Dutile <ddutile@redhat.com> wrote:
>>>
>>>> On 9/22/25 6:39 PM, Alex Williamson wrote:
>>>>> On Fri, 5 Sep 2025 15:06:15 -0300
>>>>> Jason Gunthorpe <jgg@nvidia.com> wrote:
>>>>>
>>>>>> The series patches have extensive descriptions as to the problem and
>>>>>> solution, but in short the ACS flags are not analyzed according to the
>>>>>> spec to form the iommu_groups that VFIO is expecting for security.
>>>>>>
>>>>>> ACS is an egress control only. For a path the ACS flags on each hop only
>>>>>> effect what other devices the TLP is allowed to reach. It does not prevent
>>>>>> other devices from reaching into this path.
>>>>>>
>>>>>> For VFIO if device A is permitted to access device B's MMIO then A and B
>>>>>> must be grouped together. This says that even if a path has isolating ACS
>>>>>> flags on each hop, off-path devices with non-isolating ACS can still reach
>>>>>> into that path and must be grouped gother.
>>>>>>
>>>>>> For switches, a PCIe topology like:
>>>>>>
>>>>>> -- DSP 02:00.0 -> End Point A
>>>>>> Root 00:00.0 -> USP 01:00.0 --|
>>>>>> -- DSP 02:03.0 -> End Point B
>>>>>>
>>>>>> Will generate unique single device groups for every device even if ACS is
>>>>>> not enabled on the two DSP ports. It should at least group A/B together
>>>>>> because no ACS means A can reach the MMIO of B. This is a serious failure
>>>>>> for the VFIO security model.
>>>>>>
>>>>>> For multi-function-devices, a PCIe topology like:
>>>>>>
>>>>>> -- MFD 00:1f.0 ACS not supported
>>>>>> Root 00:00.00 --|- MFD 00:1f.2 ACS not supported
>>>>>> |- MFD 00:1f.6 ACS = REQ_ACS_FLAGS
>>>>>>
>>>>>> Will group [1f.0, 1f.2] and 1f.6 gets a single device group. However from
>>>>>> a spec perspective each device should get its own group, because ACS not
>>>>>> supported can assume no loopback is possible by spec.
>>>>>
>>>>> I just dug through the thread with Don that I think tries to justify
>>>>> this, but I have a lot of concerns about this. I think the "must be
>>>>> implemented by Functions that support peer-to-peer traffic with other
>>>>> Functions" language is specifying that IF the device implements an ACS
>>>>> capability AND does not implement the specific ACS P2P flag being
>>>>> described, then and only then can we assume that form of P2P is not
>>>>> supported. OTOH, we cannot assume anything regarding internal P2P of an
>>>>> MFD that does not implement an ACS capability at all.
>>>>>
>>>> The first, non-IF'd, non-AND'd req in PCIe spec 7.0, section 6.12.1.2 is:
>>>> "ACS P2P Request Redirect: must be implemented by Functions that
>>>> support peer-to-peer traffic with other Functions. This includes
>>>> SR-IOV Virtual Functions (VFs)." There is not further statement about
>>>> control of peer-to-peer traffic, just the ability to do so, or not.
>>>>
>>>> Note: ACS P2P Request Redirect.
>>>>
>>>> Later in that section it says:
>>>> ACS P2P Completion Redirect: must be implemented by Functions that
>>>> implement ACS P2P Request Redirect.
>>>>
>>>> That can be read as an 'IF Request-Redirect is implemented, than ACS
>>>> Completion Request must be implemented. IOW, the Completion Direct
>>>> control is required if Request Redirect is implemented, and not
>>>> necessary if Request Redirect is omitted.
>>>>
>>>> If ACS P2P Require Redirect isn't implemented, than per the first
>>>> requirement for MFDs, the PCIe device does not support peer-to-peer
>>>> traffic amongst its function or virtual functions.
>>>>
>>>> It goes on...
>>>> ACS Direct Translated P2P: must be implemented if the Function
>>>> supports Address Translation Services (ATS) and also peer-to-peer
>>>> traffic with other Functions.
>>>>
>>>> If an MFD does not do peer-to-peer, and P2P Request Redirect would be
>>>> implemented if it did, than this ACS control does not have to be
>>>> implemented either.
>>>>
>>>> Egress control structures are either optional or dependent on Request
>>>> Redirect &/or Direct Translated P2P control, which have been
>>>> addressed above as not needed if on peer-to-peer btwn functions in an
>>>> MFD (and their VFs).
>>>>
>>>>
>>>> Now, if previous PCIe spec versions (which I didn't read & re-read &
>>>> re-read like the 6.12 section of PCIe spec 7.0) had more IF and ANDs,
>>>> than that could be cause for less than clear specmanship enabling
>>>> vendors of MFDs to yield a non-PCIe-7.0 conformant MFD wrt ACS
>>>> structures. I searched section 6.12.1.2 for if/IF and AND/and, and
>>>> did not yield any conditions not stated above.
>>>
>>> Back up to 6.12.1:
>>>
>>> ACS functionality is reported and managed via ACS Extended Capability
>>> structures. PCI Express components are permitted to implement ACS
>>> Extended Capability structures in some, none, or all of their
>>> applicable Functions. The extent of what is implemented is
>>> communicated through capability bits in each ACS Extended Capability
>>> structure. A given Function with an ACS Extended Capability structure
>>> may be required or forbidden to implement certain capabilities,
>>> depending upon the specific type of the Function and whether it is
>>> part of a Multi-Function Device.
>>>
>> Right, depending on type of function or part of MFD.
>> Maybe I mis-understood your point, or vice-versa:
>> section 6.12.1.2 is for MFDs, and I was only discussing MFD ACS structs.
>> I did not mean to imply the sections I was quoting was for anything but an MFD.
>
> I'm really going after the first half of that last sentence rather than
> any specific device type:
>
> A given Function with an ACS Extended Capability structure may be
> required or forbidden to implement certain capabilities...
>
> "...[WITH] an ACS Extended Capbility structure..."
>
> "implement certain capabilities" is referring to the capabilities
> exposed within the capability register of the overall ACS extended
> capability structure.
>
> Therefore when section 6.12.1.2 goes on to say:
>
> ACS P2P Request Redirect: must be implemented by Functions that
> support peer-to-peer traffic with other Functions.
>
> This is saying this type of function _with_ an ACS extended capability
> (carrying forward from 6.12.1) must implement the p2p RR bit of the ACS
> capability register (a specific bit within the register, not the ACS
> extended capability) if it is capable of p2p traffic with other
> functions. We can only infer the function is not capable of p2p traffic
> with other functions if it both implements an ACS extended capability
> and the p2p RR bit of the capability register is zero.
>
>>> What you're quoting are the requirements for the individual p2p
>>> capabilities IF the ACS extended capability is implemented.
>>>
>> No, I'm not. I'm quoting 6.12.1.2, which is MFD-specific.
>>
>>> Section 6.12.1.1 describing ACS for downstream ports begins:
>>>
>>> This section applies to Root Ports and Switch Downstream Ports
>>> that implement an ACS Extended Capability structure.
>>>
>>> Section 6.12.1.2 for SR-IOV, SIOV and MFDs begins:
>>>
>>> This section applies to Multi-Function Device ACS Functions,
>>> with the exception of Downstream Port Functions, which are
>>> covered in the preceding section.
>>>
>> Right. I wasn't discussing Downstream port functions.
>>
>>> While not as explicit, what is a Multi-Function Device ACS Function
>>> if not a function of a MFD that implements ACS?
>>>
>> I think you are inferring too much into that less-than-optimally
>> worded section.
>>
>>>>> I believe we even reached agreement with some NIC vendors in the
>>>>> early days of IOMMU groups that they needed to implement an
>>>>> "empty" ACS capability on their multifunction NICs such that
>>>>> they could describe in this way that internal P2P is not
>>>>> supported by the device. Thanks,
>>>> In the early days -- gen1->gen3 (2009->2015) I could see that
>>>> happening. I think time (a decade) has closed those defaults to
>>>> less-common quirks. If 'empty ACS' is how they liked to do it back
>>>> than, sure. [A definition of empty ACS may be needed to fully
>>>> appreciate that statement, though.] If this patch series needs to
>>>> support an 'empty ACS' for this older case, let's add it now, or
>>>> follow-up with another fix.
>>>
>>> An "empty" ACS capability is an ACS extended capability where the
>>> ACS capability register reads as zero, precisely to match the
>>> spec in indicating that the device does not support p2p. Again,
>>> I don't see how time passing plays a role here. A MFD must
>>> implement ACS to infer anything about internal p2p behavior.
>>>
>> Again, I don't read the 'must' in the spec.
>> Although I'll agree that your definition of an empty ACS makes it
>> unambiguous.
>>
>>>> In summary, I still haven't found the IF and AND you refer to in
>>>> section 6.12.1.2 for MFDs, so if you want to quote those sections I
>>>> mis-read, or mis-interpreted their (subtle?) existence, than I'm
>>>> not immovable on the spec interpretation.
>>>
>>> As above, I think it's covered by 6.12.1 and the introductory
>>> sentence of 6.12.1.2 defining the requirements for ACS functions.
>>> Thanks,
>> 6.12.1 is not specific enough about what MFDs must or must not
>> support; it's a broad description of ACS in different PCIe
>> functions. As for 6.12.1.2, I stand by the statement that ACS P2P
>> Request Redirect must be implemented if peer-to-peer is implemented
>> in an MFD. It's not inferred, it's not unambiguous.
>> You are intepreting the first sentence in 6.12.1.2 as indirectly
>> saying that the reqs only apply to an MFD with ACS. The title of
>> the section is: "ACS Functions in SR-IOV, SIOV, and Multi-Function
>> Devices" not ACS requirements for ACS-controlled SR-IOV, SIOV, and
>> Multi-Function Devices", in which case, I could agree with the
>> interpretation you gave of that first sentence.
>>
>> I think it's time to reach out to the PCI-SIG, and the authors of
>> this section to dissect these interpretations and get some clarity.
>
> You're welcome to. I think it's sufficiently clear.
>
> The specification is stating that if a function exposes an ACS extended
> capability and the function supports p2p with other functions, it must
> implement that specific bit in the ACS capability register of the ACS
> extended capability.
>
> Therefore if the function implements an ACS extended capability and
> does not implement this bit in the ACS capability register, we can
> infer that the device does is not capable of p2p with other functions.
> It would violate the spec otherwise.
>
> However, if the function does not implement an ACS extended capability,
> we can infer nothing.
>
> It's logically impossible for the specification to add an optional
> extended capability where the lack of a function implementing the
> extended capability implies some specific behavior. It's not backwards
> compatible, which is a fundamental requirement of the PCI spec. Thanks,
>
> Alex
>
This is boiling down to an interpretation of the spec.
If the latest PCI-v7 spec is not backward compatible, that a function within an MFD
not having an ACS struct must not be isolated from other functions within the MFD without an ACS struct,
then the current Linux implementation/interpretation, and the need for the acs quirks
when hw vendors improperly omit the ACS structure, is the safest/secure route to go.
Historical/legacy implementations of MFDs without ACS structs have bolstered that
position/interpretation/agreed-to-required-acs-quicks implementation.
The small number of acs quirks also seems to support that past interpretation.
I believe a definitive answer from the PCI-SIG would be best, especially wrt
backward compatibility. Such a review & feedback is likely to take quite some
time. Thus, taking the current-conservative approach for omitted ACS structs for
MFD functions would allow this series to progress with the numerous other bug-fixes
that are needed, with a minor change to the MFD iommu-group check/creation function
to use acs-quirks to create better isolation groups if a hw vendor interprets and
implements no ACS as having no p2p connectivity.
- Don
next prev parent reply other threads:[~2025-09-30 15:23 UTC|newest]
Thread overview: 53+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-09-05 18:06 [PATCH v3 00/11] Fix incorrect iommu_groups with PCIe ACS Jason Gunthorpe
2025-09-05 18:06 ` [PATCH v3 01/11] PCI: Move REQ_ACS_FLAGS into pci_regs.h as PCI_ACS_ISOLATED Jason Gunthorpe
2025-09-09 4:08 ` Donald Dutile
2025-09-05 18:06 ` [PATCH v3 02/11] PCI: Add pci_bus_isolated() Jason Gunthorpe
2025-09-09 4:09 ` Donald Dutile
2025-09-09 19:54 ` Bjorn Helgaas
2025-09-09 21:21 ` Jason Gunthorpe
2025-09-05 18:06 ` [PATCH v3 03/11] iommu: Compute iommu_groups properly for PCIe switches Jason Gunthorpe
2025-09-09 4:14 ` Donald Dutile
2025-09-09 12:18 ` Jason Gunthorpe
2025-09-09 19:33 ` Donald Dutile
2025-09-09 20:27 ` Bjorn Helgaas
2025-09-09 21:21 ` Jason Gunthorpe
2025-09-05 18:06 ` [PATCH v3 04/11] iommu: Organize iommu_group by member size Jason Gunthorpe
2025-09-09 4:16 ` Donald Dutile
2025-09-05 18:06 ` [PATCH v3 05/11] PCI: Add pci_reachable_set() Jason Gunthorpe
2025-09-09 21:03 ` Bjorn Helgaas
2025-09-10 16:13 ` Jason Gunthorpe
2025-09-11 19:56 ` Donald Dutile
2025-09-15 13:38 ` Jason Gunthorpe
2025-09-15 14:32 ` Donald Dutile
2025-09-05 18:06 ` [PATCH v3 06/11] iommu: Compute iommu_groups properly for PCIe MFDs Jason Gunthorpe
2025-09-09 4:57 ` Donald Dutile
2025-09-09 13:31 ` Jason Gunthorpe
2025-09-09 19:55 ` Donald Dutile
2025-09-09 21:24 ` Bjorn Helgaas
2025-09-09 23:20 ` Jason Gunthorpe
2025-09-10 1:59 ` Donald Dutile
2025-09-10 17:43 ` Jason Gunthorpe
2025-09-05 18:06 ` [PATCH v3 07/11] iommu: Validate that pci_for_each_dma_alias() matches the groups Jason Gunthorpe
2025-09-09 5:00 ` Donald Dutile
2025-09-09 15:35 ` Jason Gunthorpe
2025-09-09 19:58 ` Donald Dutile
2025-09-05 18:06 ` [PATCH v3 08/11] PCI: Add the ACS Enhanced Capability definitions Jason Gunthorpe
2025-09-09 5:01 ` Donald Dutile
2025-09-05 18:06 ` [PATCH v3 09/11] PCI: Enable ACS Enhanced bits for enable_acs and config_acs Jason Gunthorpe
2025-09-09 5:01 ` Donald Dutile
2025-09-05 18:06 ` [PATCH v3 10/11] PCI: Check ACS DSP/USP redirect bits in pci_enable_pasid() Jason Gunthorpe
2025-09-09 5:02 ` Donald Dutile
2025-09-09 21:43 ` Bjorn Helgaas
2025-09-10 17:34 ` Jason Gunthorpe
2025-09-11 19:50 ` Donald Dutile
2026-01-20 18:08 ` Keith Busch
2025-09-05 18:06 ` [PATCH v3 11/11] PCI: Check ACS Extended flags for pci_bus_isolated() Jason Gunthorpe
2025-09-09 5:04 ` Donald Dutile
2025-09-15 9:41 ` [PATCH v3 00/11] Fix incorrect iommu_groups with PCIe ACS Cédric Le Goater
2025-09-22 22:39 ` Alex Williamson
2025-09-23 1:44 ` Donald Dutile
2025-09-23 2:06 ` Alex Williamson
2025-09-23 2:42 ` Donald Dutile
2025-09-23 22:23 ` Alex Williamson
2025-09-30 15:23 ` Donald Dutile [this message]
2025-09-30 16:21 ` Jason Gunthorpe
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=8e679dad-b16d-45e3-b844-fa30b5a574ee@redhat.com \
--to=ddutile@redhat.com \
--cc=alex.williamson@redhat.com \
--cc=baolu.lu@linux.intel.com \
--cc=bhelgaas@google.com \
--cc=galshalom@nvidia.com \
--cc=iommu@lists.linux.dev \
--cc=jgg@nvidia.com \
--cc=joro@8bytes.org \
--cc=jroedel@suse.de \
--cc=kevin.tian@intel.com \
--cc=kvm@vger.kernel.org \
--cc=linux-pci@vger.kernel.org \
--cc=maorg@nvidia.com \
--cc=patches@lists.linux.dev \
--cc=robin.murphy@arm.com \
--cc=tdave@nvidia.com \
--cc=tony.zhu@intel.com \
--cc=will@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox