Archive-only list for patches
 help / color / mirror / Atom feed
From: Donald Dutile <ddutile@redhat.com>
To: Alex Williamson <alex.williamson@redhat.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>,
	Bjorn Helgaas <bhelgaas@google.com>,
	iommu@lists.linux.dev, Joerg Roedel <joro@8bytes.org>,
	linux-pci@vger.kernel.org, Robin Murphy <robin.murphy@arm.com>,
	Will Deacon <will@kernel.org>,
	Lu Baolu <baolu.lu@linux.intel.com>,
	galshalom@nvidia.com, Joerg Roedel <jroedel@suse.de>,
	Kevin Tian <kevin.tian@intel.com>,
	kvm@vger.kernel.org, maorg@nvidia.com, patches@lists.linux.dev,
	tdave@nvidia.com, Tony Zhu <tony.zhu@intel.com>
Subject: Re: [PATCH v3 00/11] Fix incorrect iommu_groups with PCIe ACS
Date: Tue, 30 Sep 2025 11:23:06 -0400	[thread overview]
Message-ID: <8e679dad-b16d-45e3-b844-fa30b5a574ee@redhat.com> (raw)
In-Reply-To: <20250923162337.5ab1fe89.alex.williamson@redhat.com>


note: I removed Joerg's suse email address & Tony's intel address as I keep getting numerous undeliverable emails when those are included in cc:

On 9/23/25 6:23 PM, Alex Williamson wrote:
> On Mon, 22 Sep 2025 22:42:37 -0400
> Donald Dutile <ddutile@redhat.com> wrote:
> 
>> On 9/22/25 10:06 PM, Alex Williamson wrote:
>>> On Mon, 22 Sep 2025 21:44:27 -0400
>>> Donald Dutile <ddutile@redhat.com> wrote:
>>>    
>>>> On 9/22/25 6:39 PM, Alex Williamson wrote:
>>>>> On Fri,  5 Sep 2025 15:06:15 -0300
>>>>> Jason Gunthorpe <jgg@nvidia.com> wrote:
>>>>>       
>>>>>> The series patches have extensive descriptions as to the problem and
>>>>>> solution, but in short the ACS flags are not analyzed according to the
>>>>>> spec to form the iommu_groups that VFIO is expecting for security.
>>>>>>
>>>>>> ACS is an egress control only. For a path the ACS flags on each hop only
>>>>>> effect what other devices the TLP is allowed to reach. It does not prevent
>>>>>> other devices from reaching into this path.
>>>>>>
>>>>>> For VFIO if device A is permitted to access device B's MMIO then A and B
>>>>>> must be grouped together. This says that even if a path has isolating ACS
>>>>>> flags on each hop, off-path devices with non-isolating ACS can still reach
>>>>>> into that path and must be grouped gother.
>>>>>>
>>>>>> For switches, a PCIe topology like:
>>>>>>
>>>>>>                                   -- DSP 02:00.0 -> End Point A
>>>>>>     Root 00:00.0 -> USP 01:00.0 --|
>>>>>>                                   -- DSP 02:03.0 -> End Point B
>>>>>>
>>>>>> Will generate unique single device groups for every device even if ACS is
>>>>>> not enabled on the two DSP ports. It should at least group A/B together
>>>>>> because no ACS means A can reach the MMIO of B. This is a serious failure
>>>>>> for the VFIO security model.
>>>>>>
>>>>>> For multi-function-devices, a PCIe topology like:
>>>>>>
>>>>>>                      -- MFD 00:1f.0 ACS not supported
>>>>>>      Root 00:00.00 --|- MFD 00:1f.2 ACS not supported
>>>>>>                      |- MFD 00:1f.6 ACS = REQ_ACS_FLAGS
>>>>>>
>>>>>> Will group [1f.0, 1f.2] and 1f.6 gets a single device group. However from
>>>>>> a spec perspective each device should get its own group, because ACS not
>>>>>> supported can assume no loopback is possible by spec.
>>>>>
>>>>> I just dug through the thread with Don that I think tries to justify
>>>>> this, but I have a lot of concerns about this.  I think the "must be
>>>>> implemented by Functions that support peer-to-peer traffic with other
>>>>> Functions" language is specifying that IF the device implements an ACS
>>>>> capability AND does not implement the specific ACS P2P flag being
>>>>> described, then and only then can we assume that form of P2P is not
>>>>> supported.  OTOH, we cannot assume anything regarding internal P2P of an
>>>>> MFD that does not implement an ACS capability at all.
>>>>>       
>>>> The first, non-IF'd, non-AND'd req in PCIe spec 7.0, section 6.12.1.2 is:
>>>> "ACS P2P Request Redirect: must be implemented by Functions that
>>>> support peer-to-peer traffic with other Functions. This includes
>>>> SR-IOV Virtual Functions (VFs)." There is not further statement about
>>>> control of peer-to-peer traffic, just the ability to do so, or not.
>>>>
>>>> Note: ACS P2P Request Redirect.
>>>>
>>>> Later in that section it says:
>>>> ACS P2P Completion Redirect: must be implemented by Functions that
>>>> implement ACS P2P Request Redirect.
>>>>
>>>> That can be read as an 'IF Request-Redirect is implemented, than ACS
>>>> Completion Request must be implemented. IOW, the Completion Direct
>>>> control is required if Request Redirect is implemented, and not
>>>> necessary if Request Redirect is omitted.
>>>>
>>>> If ACS P2P Require Redirect isn't implemented, than per the first
>>>> requirement for MFDs, the PCIe device does not support peer-to-peer
>>>> traffic amongst its function or virtual functions.
>>>>
>>>> It goes on...
>>>> ACS Direct Translated P2P: must be implemented if the Function
>>>> supports Address Translation Services (ATS) and also peer-to-peer
>>>> traffic with other Functions.
>>>>
>>>> If an MFD does not do peer-to-peer, and P2P Request Redirect would be
>>>> implemented if it did, than this ACS control does not have to be
>>>> implemented either.
>>>>
>>>> Egress control structures are either optional or dependent on Request
>>>> Redirect &/or Direct Translated P2P control, which have been
>>>> addressed above as not needed if on peer-to-peer btwn functions in an
>>>> MFD (and their VFs).
>>>>
>>>>
>>>> Now, if previous PCIe spec versions (which I didn't read & re-read &
>>>> re-read like the 6.12 section of PCIe spec 7.0) had more IF and ANDs,
>>>> than that could be cause for less than clear specmanship enabling
>>>> vendors of MFDs to yield a non-PCIe-7.0 conformant MFD wrt ACS
>>>> structures. I searched section 6.12.1.2 for if/IF and AND/and, and
>>>> did not yield any conditions not stated above.
>>>
>>> Back up to 6.12.1:
>>>
>>>     ACS functionality is reported and managed via ACS Extended Capability
>>>     structures. PCI Express components are permitted to implement ACS
>>>     Extended Capability structures in some, none, or all of their
>>>     applicable Functions. The extent of what is implemented is
>>>     communicated through capability bits in each ACS Extended Capability
>>>     structure. A given Function with an ACS Extended Capability structure
>>>     may be required or forbidden to implement certain capabilities,
>>>     depending upon the specific type of the Function and whether it is
>>>     part of a Multi-Function Device.
>>>    
>> Right, depending on type of function or part of MFD.
>> Maybe I mis-understood your point, or vice-versa:
>> section 6.12.1.2 is for MFDs, and I was only discussing MFD ACS structs.
>> I did not mean to imply the sections I was quoting was for anything but an MFD.
> 
> I'm really going after the first half of that last sentence rather than
> any specific device type:
> 
>    A given Function with an ACS Extended Capability structure may be
>    required or forbidden to implement certain capabilities...
> 
> "...[WITH] an ACS Extended Capbility structure..."
> 
> "implement certain capabilities" is referring to the capabilities
> exposed within the capability register of the overall ACS extended
> capability structure.
> 
> Therefore when section 6.12.1.2 goes on to say:
> 
>    ACS P2P Request Redirect: must be implemented by Functions that
>    support peer-to-peer traffic with other Functions.
> 
> This is saying this type of function _with_ an ACS extended capability
> (carrying forward from 6.12.1) must implement the p2p RR bit of the ACS
> capability register (a specific bit within the register, not the ACS
> extended capability) if it is capable of p2p traffic with other
> functions.  We can only infer the function is not capable of p2p traffic
> with other functions if it both implements an ACS extended capability
> and the p2p RR bit of the capability register is zero.
> 
>>> What you're quoting are the requirements for the individual p2p
>>> capabilities IF the ACS extended capability is implemented.
>>>    
>> No, I'm not.  I'm quoting 6.12.1.2, which is MFD-specific.
>>
>>> Section 6.12.1.1 describing ACS for downstream ports begins:
>>>
>>>     This section applies to Root Ports and Switch Downstream Ports
>>>    that implement an ACS Extended Capability structure.
>>>
>>> Section 6.12.1.2 for SR-IOV, SIOV and MFDs begins:
>>>
>>>     This section applies to Multi-Function Device ACS Functions,
>>>    with the exception of Downstream Port Functions, which are
>>>    covered in the preceding section.
>>>    
>> Right.  I wasn't discussing Downstream port functions.
>>
>>> While not as explicit, what is a Multi-Function Device ACS Function
>>>    if not a function of a MFD that implements ACS?
>>>    
>> I think you are inferring too much into that less-than-optimally
>>    worded section.
>>
>>>>> I believe we even reached agreement with some NIC vendors in the
>>>>> early days of IOMMU groups that they needed to implement an
>>>>>    "empty" ACS capability on their multifunction NICs such that
>>>>>    they could describe in this way that internal P2P is not
>>>>>    supported by the device.  Thanks,
>>>> In the early days -- gen1->gen3 (2009->2015) I could see that
>>>> happening. I think time (a decade) has closed those defaults to
>>>> less-common quirks. If 'empty ACS' is how they liked to do it back
>>>> than, sure. [A definition of empty ACS may be needed to fully
>>>> appreciate that statement, though.] If this patch series needs to
>>>> support an 'empty ACS' for this older case, let's add it now, or
>>>> follow-up with another fix.
>>>
>>> An "empty" ACS capability is an ACS extended capability where the
>>>    ACS capability register reads as zero, precisely to match the
>>>    spec in indicating that the device does not support p2p.  Again,
>>>    I don't see how time passing plays a role here.  A MFD must
>>>    implement ACS to infer anything about internal p2p behavior.
>>>      
>> Again, I don't read the 'must' in the spec.
>> Although I'll agree that your definition of an empty ACS makes it
>>    unambiguous.
>>
>>>> In summary, I still haven't found the IF and AND you refer to in
>>>> section 6.12.1.2 for MFDs, so if you want to quote those sections I
>>>> mis-read, or mis-interpreted their (subtle?) existence, than I'm
>>>>    not immovable on the spec interpretation.
>>>
>>> As above, I think it's covered by 6.12.1 and the introductory
>>>    sentence of 6.12.1.2 defining the requirements for ACS functions.
>>>     Thanks,
>> 6.12.1 is not specific enough about what MFDs must or must not
>>    support; it's a broad description of ACS in different PCIe
>>    functions. As for 6.12.1.2, I stand by the statement that ACS P2P
>>    Request Redirect must be implemented if peer-to-peer is implemented
>>    in an MFD. It's not inferred, it's not unambiguous.
>> You are intepreting the first sentence in 6.12.1.2 as indirectly
>>    saying that the reqs only apply to an MFD with ACS.  The title of
>>    the section is: "ACS Functions in SR-IOV, SIOV, and Multi-Function
>>    Devices"  not ACS requirements for ACS-controlled SR-IOV, SIOV, and
>>    Multi-Function Devices", in which case, I could agree with the
>>    interpretation you gave of that first sentence.
>>
>> I think it's time to reach out to the PCI-SIG, and the authors of
>>    this section to dissect these interpretations and get some clarity.
> 
> You're welcome to.  I think it's sufficiently clear.
> 
> The specification is stating that if a function exposes an ACS extended
> capability and the function supports p2p with other functions, it must
> implement that specific bit in the ACS capability register of the ACS
> extended capability.
> 
> Therefore if the function implements an ACS extended capability and
> does not implement this bit in the ACS capability register, we can
> infer that the device does is not capable of p2p with other functions.
> It would violate the spec otherwise.
> 
> However, if the function does not implement an ACS extended capability,
> we can infer nothing.
> 
> It's logically impossible for the specification to add an optional
> extended capability where the lack of a function implementing the
> extended capability implies some specific behavior.  It's not backwards
> compatible, which is a fundamental requirement of the PCI spec.  Thanks,
> 
> Alex
> 
This is boiling down to an interpretation of the spec.

If the latest PCI-v7 spec is not backward compatible, that a function within an MFD
not having an ACS struct must not be isolated from other functions within the MFD without an ACS struct,
then the current Linux implementation/interpretation, and the need for the acs quirks
when hw vendors improperly omit the ACS structure, is the safest/secure route to go.

Historical/legacy implementations of MFDs without ACS structs have bolstered that
position/interpretation/agreed-to-required-acs-quicks implementation.
The small number of acs quirks also seems to support that past interpretation.

I believe a definitive answer from the PCI-SIG would be best, especially wrt
backward compatibility.  Such a review & feedback is likely to take quite some
time.  Thus, taking the current-conservative approach for omitted ACS structs for
MFD functions would allow this series to progress with the numerous other bug-fixes
that are needed, with a minor change to the MFD iommu-group check/creation function
to use acs-quirks to create better isolation groups if a hw vendor interprets and
implements no ACS as having no p2p connectivity.

- Don



  reply	other threads:[~2025-09-30 15:23 UTC|newest]

Thread overview: 53+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-09-05 18:06 [PATCH v3 00/11] Fix incorrect iommu_groups with PCIe ACS Jason Gunthorpe
2025-09-05 18:06 ` [PATCH v3 01/11] PCI: Move REQ_ACS_FLAGS into pci_regs.h as PCI_ACS_ISOLATED Jason Gunthorpe
2025-09-09  4:08   ` Donald Dutile
2025-09-05 18:06 ` [PATCH v3 02/11] PCI: Add pci_bus_isolated() Jason Gunthorpe
2025-09-09  4:09   ` Donald Dutile
2025-09-09 19:54   ` Bjorn Helgaas
2025-09-09 21:21     ` Jason Gunthorpe
2025-09-05 18:06 ` [PATCH v3 03/11] iommu: Compute iommu_groups properly for PCIe switches Jason Gunthorpe
2025-09-09  4:14   ` Donald Dutile
2025-09-09 12:18     ` Jason Gunthorpe
2025-09-09 19:33       ` Donald Dutile
2025-09-09 20:27   ` Bjorn Helgaas
2025-09-09 21:21     ` Jason Gunthorpe
2025-09-05 18:06 ` [PATCH v3 04/11] iommu: Organize iommu_group by member size Jason Gunthorpe
2025-09-09  4:16   ` Donald Dutile
2025-09-05 18:06 ` [PATCH v3 05/11] PCI: Add pci_reachable_set() Jason Gunthorpe
2025-09-09 21:03   ` Bjorn Helgaas
2025-09-10 16:13     ` Jason Gunthorpe
2025-09-11 19:56     ` Donald Dutile
2025-09-15 13:38       ` Jason Gunthorpe
2025-09-15 14:32         ` Donald Dutile
2025-09-05 18:06 ` [PATCH v3 06/11] iommu: Compute iommu_groups properly for PCIe MFDs Jason Gunthorpe
2025-09-09  4:57   ` Donald Dutile
2025-09-09 13:31     ` Jason Gunthorpe
2025-09-09 19:55       ` Donald Dutile
2025-09-09 21:24   ` Bjorn Helgaas
2025-09-09 23:20     ` Jason Gunthorpe
2025-09-10  1:59     ` Donald Dutile
2025-09-10 17:43       ` Jason Gunthorpe
2025-09-05 18:06 ` [PATCH v3 07/11] iommu: Validate that pci_for_each_dma_alias() matches the groups Jason Gunthorpe
2025-09-09  5:00   ` Donald Dutile
2025-09-09 15:35     ` Jason Gunthorpe
2025-09-09 19:58       ` Donald Dutile
2025-09-05 18:06 ` [PATCH v3 08/11] PCI: Add the ACS Enhanced Capability definitions Jason Gunthorpe
2025-09-09  5:01   ` Donald Dutile
2025-09-05 18:06 ` [PATCH v3 09/11] PCI: Enable ACS Enhanced bits for enable_acs and config_acs Jason Gunthorpe
2025-09-09  5:01   ` Donald Dutile
2025-09-05 18:06 ` [PATCH v3 10/11] PCI: Check ACS DSP/USP redirect bits in pci_enable_pasid() Jason Gunthorpe
2025-09-09  5:02   ` Donald Dutile
2025-09-09 21:43   ` Bjorn Helgaas
2025-09-10 17:34     ` Jason Gunthorpe
2025-09-11 19:50       ` Donald Dutile
2026-01-20 18:08   ` Keith Busch
2025-09-05 18:06 ` [PATCH v3 11/11] PCI: Check ACS Extended flags for pci_bus_isolated() Jason Gunthorpe
2025-09-09  5:04   ` Donald Dutile
2025-09-15  9:41 ` [PATCH v3 00/11] Fix incorrect iommu_groups with PCIe ACS Cédric Le Goater
2025-09-22 22:39 ` Alex Williamson
2025-09-23  1:44   ` Donald Dutile
2025-09-23  2:06     ` Alex Williamson
2025-09-23  2:42       ` Donald Dutile
2025-09-23 22:23         ` Alex Williamson
2025-09-30 15:23           ` Donald Dutile [this message]
2025-09-30 16:21             ` Jason Gunthorpe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=8e679dad-b16d-45e3-b844-fa30b5a574ee@redhat.com \
    --to=ddutile@redhat.com \
    --cc=alex.williamson@redhat.com \
    --cc=baolu.lu@linux.intel.com \
    --cc=bhelgaas@google.com \
    --cc=galshalom@nvidia.com \
    --cc=iommu@lists.linux.dev \
    --cc=jgg@nvidia.com \
    --cc=joro@8bytes.org \
    --cc=jroedel@suse.de \
    --cc=kevin.tian@intel.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=maorg@nvidia.com \
    --cc=patches@lists.linux.dev \
    --cc=robin.murphy@arm.com \
    --cc=tdave@nvidia.com \
    --cc=tony.zhu@intel.com \
    --cc=will@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox