linux-pci.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 00/16] Fix incorrect iommu_groups with PCIe ACS
@ 2025-07-09 14:52 Jason Gunthorpe
  2025-07-09 14:52 ` [PATCH v2 01/16] PCI: Move REQ_ACS_FLAGS into pci_regs.h as PCI_ACS_ISOLATED Jason Gunthorpe
                   ` (17 more replies)
  0 siblings, 18 replies; 46+ messages in thread
From: Jason Gunthorpe @ 2025-07-09 14:52 UTC (permalink / raw)
  To: Bjorn Helgaas, iommu, Joerg Roedel, linux-pci, Robin Murphy,
	Will Deacon
  Cc: Alex Williamson, Lu Baolu, galshalom, Joerg Roedel, Kevin Tian,
	kvm, maorg, patches, tdave, Tony Zhu

The series patches have extensive descriptions as to the problem and
solution, but in short the ACS flags are not analyzed according to the
spec to form the iommu_groups that VFIO is expecting for security.

ACS is an egress control only. For a path the ACS flags on each hop only
effect what other devices the TLP is allowed to reach. It does not prevent
other devices from reaching into this path.

For VFIO if device A is permitted to access device B's MMIO then A and B
must be grouped together. This says that even if a path has isolating ACS
flags on each hop, off-path devices with non-isolating ACS can still reach
into that path and must be grouped gother.

For switches, a PCIe topology like:

                               -- DSP 02:00.0 -> End Point A
 Root 00:00.0 -> USP 01:00.0 --|
                               -- DSP 02:03.0 -> End Point B

Will generate unique single device groups for every device even if ACS is
not enabled on the two DSP ports. It should at least group A/B together
because no ACS means A can reach the MMIO of B. This is a serious failure
for the VFIO security model.

For multi-function-devices, a PCIe topology like:

                  -- MFD 00:1f.0 ACS != REQ_ACS_FLAGS
  Root 00:00.00 --|- MFD 00:1f.2 ACS != REQ_ACS_FLAGS
                  |- MFD 00:1f.6 ACS = REQ_ACS_FLAGS

Will group [1f.0, 1f.2] and 1f.6 gets a single device group. In many cases
we suspect that the MFD actually doesn't need ACS, so this is probably not
as important a security failure, but from a spec perspective the correct
answer is one group of [1f.0, 1f.2, 1f.6] beacuse 1f.0/2 have no ACS
preventing them from reaching the MMIO of 1f.6.

There is also some confusing spec language about how ACS and SRIOV works
which this series does not address.

This entire series goes further and makes some additional improvements to
the ACS validation found while studying this problem. The groups around a
PCIe to PCI bridge are shrunk to not include the PCIe bridge.

The last patches implement "ACS Enhanced" on top of it. Due to how ACS
Enhanced was defined as a non-backward compatible feature it is important
to get SW support out there.

Due to the potential of iommu_groups becoming winder and thus non-usable
for VFIO this should go to a linux-next tree to give it some more
exposure.

I have now tested this a few systems I could get:

 - Various Intel client systems:
   * Raptor Lake, with VMD enabled and using the real_dev mechanism
   * 6/7th generation 100 Series/C320
   * 5/6th generation 100 Series/C320 with a NIC MFD quirk
   * Tiger Lake
   * 5/6th generation Sunrise Point
  No change in grouping on any of these systems

 - NVIDIA Grace system with 5 different PCI switches from two vendors
   Bug fix widening the iommu_groups works as expected here

 - AMD Milan Starship/Matisse
   * Groups are similar, this series generates narrow groups because the
     dummy host bridges always get their own groups. Something forcibly
     disables ACS SV on one bridge which correctly causes one larger
     group.

This is on github: https://github.com/jgunthorpe/linux/commits/pcie_switch_groups

v2:
 - Revise comments and commit messages
 - Rename struct pci_alias_set to pci_reachable_set
 - Make more sense of the special bus->self = NULL case for SRIOV
 - Add pci_group_alloc_non_isolated() for readability
 - Rename BUS_DATA_PCI_UNISOLATED to BUS_DATA_PCI_NON_ISOLATED
 - Propogate BUS_DATA_PCI_NON_ISOLATED downstream from a MFD in case a MFD
   function is a bridge
 - New patches to add pci_mfd_isolation() to retain more cases of narrow
   groups on MFDs with missing ACS.
 - Redescribe the MFD related change as a bug fix. For a MFD to be
   isolated all functions must have egress control on their P2P.
v1: https://patch.msgid.link/r/0-v1-74184c5043c6+195-pcie_switch_groups_jgg@nvidia.com

Jason Gunthorpe (16):
  PCI: Move REQ_ACS_FLAGS into pci_regs.h as PCI_ACS_ISOLATED
  PCI: Add pci_bus_isolation()
  iommu: Compute iommu_groups properly for PCIe switches
  iommu: Organize iommu_group by member size
  PCI: Add pci_reachable_set()
  PCI: Remove duplication in calling pci_acs_ctrl_enabled()
  PCI: Use pci_quirk_mf_endpoint_acs() for pci_quirk_amd_sb_acs()
  PCI: Use pci_acs_ctrl_isolated() for pci_quirk_al_acs()
  PCI: Widen the acs_flags to u32 within the quirk callback
  PCI: Add pci_mfd_isolation()
  iommu: Compute iommu_groups properly for PCIe MFDs
  iommu: Validate that pci_for_each_dma_alias() matches the groups
  PCI: Add the ACS Enhanced Capability definitions
  PCI: Enable ACS Enhanced bits for enable_acs and config_acs
  PCI: Check ACS DSP/USP redirect bits in pci_enable_pasid()
  PCI: Check ACS Extended flags for pci_bus_isolated()

 drivers/iommu/iommu.c         | 486 +++++++++++++++++++++++-----------
 drivers/pci/ats.c             |   4 +-
 drivers/pci/pci.c             |  73 ++++-
 drivers/pci/pci.h             |   5 +
 drivers/pci/quirks.c          | 137 ++++++----
 drivers/pci/search.c          | 294 ++++++++++++++++++++
 include/linux/pci.h           |  50 ++++
 include/uapi/linux/pci_regs.h |  18 ++
 8 files changed, 846 insertions(+), 221 deletions(-)


base-commit: e04c78d86a9699d136910cfc0bdcf01087e3267e
-- 
2.43.0


^ permalink raw reply	[flat|nested] 46+ messages in thread

end of thread, other threads:[~2025-08-20 17:21 UTC | newest]

Thread overview: 46+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-07-09 14:52 [PATCH v2 00/16] Fix incorrect iommu_groups with PCIe ACS Jason Gunthorpe
2025-07-09 14:52 ` [PATCH v2 01/16] PCI: Move REQ_ACS_FLAGS into pci_regs.h as PCI_ACS_ISOLATED Jason Gunthorpe
2025-07-09 14:52 ` [PATCH v2 02/16] PCI: Add pci_bus_isolation() Jason Gunthorpe
2025-07-09 14:52 ` [PATCH v2 03/16] iommu: Compute iommu_groups properly for PCIe switches Jason Gunthorpe
2025-07-17 22:03   ` Donald Dutile
2025-07-18 18:09     ` Jason Gunthorpe
2025-07-18 19:00       ` Donald Dutile
2025-07-18 20:19         ` Jason Gunthorpe
2025-07-18 21:41           ` Donald Dutile
2025-07-18 22:52             ` Jason Gunthorpe
2025-07-09 14:52 ` [PATCH v2 04/16] iommu: Organize iommu_group by member size Jason Gunthorpe
2025-07-09 14:52 ` [PATCH v2 05/16] PCI: Add pci_reachable_set() Jason Gunthorpe
2025-07-17 22:04   ` Donald Dutile
2025-07-18 17:49     ` Jason Gunthorpe
2025-07-18 19:10       ` Donald Dutile
2025-07-09 14:52 ` [PATCH v2 06/16] PCI: Remove duplication in calling pci_acs_ctrl_enabled() Jason Gunthorpe
2025-07-09 14:52 ` [PATCH v2 07/16] PCI: Use pci_quirk_mf_endpoint_acs() for pci_quirk_amd_sb_acs() Jason Gunthorpe
2025-07-09 14:52 ` [PATCH v2 08/16] PCI: Use pci_acs_ctrl_isolated() for pci_quirk_al_acs() Jason Gunthorpe
2025-07-09 14:52 ` [PATCH v2 09/16] PCI: Widen the acs_flags to u32 within the quirk callback Jason Gunthorpe
2025-07-09 14:52 ` [PATCH v2 10/16] PCI: Add pci_mfd_isolation() Jason Gunthorpe
2025-08-20 17:21   ` Keith Busch
2025-07-09 14:52 ` [PATCH v2 11/16] iommu: Compute iommu_groups properly for PCIe MFDs Jason Gunthorpe
2025-07-28  9:47   ` Cédric Le Goater
2025-07-28 13:58     ` Jason Gunthorpe
2025-07-09 14:52 ` [PATCH v2 12/16] iommu: Validate that pci_for_each_dma_alias() matches the groups Jason Gunthorpe
2025-07-17 22:07   ` Donald Dutile
2025-07-09 14:52 ` [PATCH v2 13/16] PCI: Add the ACS Enhanced Capability definitions Jason Gunthorpe
2025-07-09 14:52 ` [PATCH v2 14/16] PCI: Enable ACS Enhanced bits for enable_acs and config_acs Jason Gunthorpe
2025-07-09 14:52 ` [PATCH v2 15/16] PCI: Check ACS DSP/USP redirect bits in pci_enable_pasid() Jason Gunthorpe
2025-07-17 22:17   ` Donald Dutile
2025-07-18 17:52     ` Jason Gunthorpe
2025-08-05  4:39   ` Askar Safin
2025-07-09 14:52 ` [PATCH v2 16/16] PCI: Check ACS Extended flags for pci_bus_isolated() Jason Gunthorpe
2025-07-18 21:29 ` [PATCH v2 00/16] Fix incorrect iommu_groups with PCIe ACS Alex Williamson
2025-07-18 22:59   ` Jason Gunthorpe
2025-08-02  1:45 ` Ethan Zhao
2025-08-02 15:18   ` Jason Gunthorpe
2025-08-05  3:43     ` Ethan Zhao
2025-08-05 12:35       ` Jason Gunthorpe
2025-08-05 14:41         ` Ethan Zhao
2025-08-05 14:43           ` Jason Gunthorpe
2025-08-06  2:22             ` Ethan Zhao
2025-08-06  2:41               ` Baolu Lu
2025-08-06 13:40                 ` Jason Gunthorpe
2025-08-07  1:36                 ` Ethan Zhao
2025-08-08  7:56                 ` Ethan Zhao

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).