From: Jason Gunthorpe <jgg@nvidia.com>
To: Alex Williamson <alex.williamson@redhat.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>,
iommu@lists.linux.dev, Joerg Roedel <joro@8bytes.org>,
linux-pci@vger.kernel.org, Robin Murphy <robin.murphy@arm.com>,
Will Deacon <will@kernel.org>,
Lu Baolu <baolu.lu@linux.intel.com>,
galshalom@nvidia.com, Joerg Roedel <jroedel@suse.de>,
Kevin Tian <kevin.tian@intel.com>,
kvm@vger.kernel.org, maorg@nvidia.com, patches@lists.linux.dev,
tdave@nvidia.com, Tony Zhu <tony.zhu@intel.com>
Subject: Re: [PATCH 00/11] Fix incorrect iommu_groups with PCIe switches
Date: Thu, 3 Jul 2025 21:37:09 -0300 [thread overview]
Message-ID: <20250704003709.GJ1209783@nvidia.com> (raw)
In-Reply-To: <20250701154826.75a7aba6.alex.williamson@redhat.com>
On Tue, Jul 01, 2025 at 03:48:26PM -0600, Alex Williamson wrote:
> 00:1c. are all grouped together. Here 1c.0 does not report ACS, but
> the other root ports do:
I dug an older Intel system out of my closet and got it to run this
kernel, it has another odd behavior, maybe related to what you are
seeing..
00:00.0 Host bridge: Intel Corporation Xeon E3-1200 v6/7th Gen Core Processor Host Bridge/DRAM Registers (rev 05)
00:01.0 PCI bridge: Intel Corporation 6th-10th Gen Core Processor PCIe Controller (x16) (rev 05)
00:02.0 VGA compatible controller: Intel Corporation HD Graphics 630 (rev 04)
00:14.0 USB controller: Intel Corporation 100 Series/C230 Series Chipset Family USB 3.0 xHCI Controller (rev 31)
00:14.2 Signal processing controller: Intel Corporation 100 Series/C230 Series Chipset Family Thermal Subsystem (rev 31)
00:16.0 Communication controller: Intel Corporation 100 Series/C230 Series Chipset Family MEI Controller #1 (rev 31)
00:17.0 SATA controller: Intel Corporation Q170/Q150/B150/H170/H110/Z170/CM236 Chipset SATA Controller [AHCI Mode] (rev 31)
00:1b.0 PCI bridge: Intel Corporation 100 Series/C230 Series Chipset Family PCI Express Root Port #17 (rev f1)
00:1f.0 ISA bridge: Intel Corporation C236 Chipset LPC/eSPI Controller (rev 31)
00:1f.2 Memory controller: Intel Corporation 100 Series/C230 Series Chipset Family Power Management Controller (rev 31)
00:1f.3 Audio device: Intel Corporation 100 Series/C230 Series Chipset Family HD Audio Controller (rev 31)
00:1f.4 SMBus: Intel Corporation 100 Series/C230 Series Chipset Family SMBus (rev 31)
00:1f.6 Ethernet controller: Intel Corporation Ethernet Connection (2) I219-LM (rev 31)
00:01.0/01:00.0 Ethernet controller: Mellanox Technologies MT27800 Family [ConnectX-5]
00:01.0/01:00.1 Ethernet controller: Mellanox Technologies MT27800 Family [ConnectX-5]
00:1b.0/02:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller SM961/PM961/SM963
And here we are interested in this group:
00:1f.0 ISA bridge: Intel Corporation C236 Chipset LPC/eSPI Controller (rev 31)
00:1f.2 Memory controller: Intel Corporation 100 Series/C230 Series Chipset Family Power Management Controller (rev 31)
00:1f.3 Audio device: Intel Corporation 100 Series/C230 Series Chipset Family HD Audio Controller (rev 31)
00:1f.4 SMBus: Intel Corporation 100 Series/C230 Series Chipset Family SMBus (rev 31)
00:1f.6 Ethernet controller: Intel Corporation Ethernet Connection (2) I219-LM (rev 31)
Which the current code puts into two groups
[00:1f.0 00:1f.2 00:1f.3 00:1f.4]
[00:1f.6]
While this series puts them all in one group.
No device in the MFD 00:1f has an ACS capability however only 00:1f.6 has a quirk:
{ PCI_VENDOR_ID_INTEL, 0x15b7, pci_quirk_mf_endpoint_acs },
/*
* SV, TB, and UF are not relevant to multifunction endpoints.
*
* Multifunction devices are only required to implement RR, CR, and DT
* in their ACS capability if they support peer-to-peer transactions.
* Devices matching this quirk have been verified by the vendor to not
* perform peer-to-peer with other functions, allowing us to mask out
* these bits as if they were unimplemented in the ACS capability.
*/
Giving these ACS results:
pci 0000:00:1f.0: pci_acs_enabled:3693 result=0 1d
pci 0000:00:1f.2: pci_acs_enabled:3693 result=0 1d
pci 0000:00:1f.3: pci_acs_enabled:3693 result=0 1d
pci 0000:00:1f.4: pci_acs_enabled:3693 result=0 1d
pci 0000:00:1f.6: pci_acs_enabled:3693 result=1 1d
Which shows the logic here:
static struct iommu_group *get_pci_function_alias_group(struct pci_dev *pdev,
unsigned long *devfns)
{
if (!pdev->multifunction || pci_acs_enabled(pdev, REQ_ACS_FLAGS))
return NULL;
Is causing the grouping difference. When it checks 00:1f.6 it sees
pci_acs_enabled = true and then ignores the rest of the MFD. This is
basically part of my issue #2 that off-path ACS is not considered.
AFAIK ACS is a per-function egress property (eg it is why it is called
the ACS Egress Vector). Meaning if 01f.4 sends a P2P DMA targetting
MMIO in 1f.6 it is the ACS of 01f.4 as the egress that is responsible
to block it. The ACS of 1f.6 as the ingress is not considered.
By our rules if 01f.4 can DMA into 01f.6 they should be in the same
group.
I point to "Table 6-10 ACS P2P Request Redirect and ACS P2P Egress
Control Interactions" as supporting this. None of these options are
'block incoming request' - they are all talking about how to route the
original outgoing request.
So I think the above is a bug in the current kernel, the logic should
require that all functions in the MFD have ACS on, otherwise they need
to share a single group. It is what is implemented in this series, and
I think it is why you saw other cases where a single bad ACS "spoils"
the MFD?
It seems the qurking should have included all the functions in this
MFD, not just the NIC.
Does this seem right to you?
Jason
next prev parent reply other threads:[~2025-07-04 0:37 UTC|newest]
Thread overview: 33+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-06-30 22:28 [PATCH 00/11] Fix incorrect iommu_groups with PCIe switches Jason Gunthorpe
2025-06-30 22:28 ` [PATCH 01/11] PCI: Move REQ_ACS_FLAGS into pci_regs.h as PCI_ACS_ISOLATED Jason Gunthorpe
2025-06-30 22:28 ` [PATCH 02/11] PCI: Add pci_bus_isolation() Jason Gunthorpe
2025-07-01 19:28 ` Alex Williamson
2025-07-02 1:00 ` Jason Gunthorpe
2025-07-03 15:30 ` Jason Gunthorpe
2025-07-03 22:17 ` Alex Williamson
2025-07-03 23:08 ` Alex Williamson
2025-07-03 23:21 ` Jason Gunthorpe
2025-07-03 23:15 ` Jason Gunthorpe
2025-06-30 22:28 ` [PATCH 03/11] iommu: Compute iommu_groups properly for PCIe switches Jason Gunthorpe
2025-07-01 19:29 ` Alex Williamson
2025-07-02 1:04 ` Jason Gunthorpe
2025-07-17 19:25 ` Donald Dutile
2025-07-17 20:27 ` Jason Gunthorpe
2025-07-18 2:31 ` Donald Dutile
2025-07-18 13:32 ` Jason Gunthorpe
2025-06-30 22:28 ` [PATCH 04/11] iommu: Organize iommu_group by member size Jason Gunthorpe
2025-06-30 22:28 ` [PATCH 05/11] PCI: Add pci_reachable_set() Jason Gunthorpe
2025-06-30 22:28 ` [PATCH 06/11] iommu: Use pci_reachable_set() in pci_device_group() Jason Gunthorpe
2025-06-30 22:28 ` [PATCH 07/11] iommu: Validate that pci_for_each_dma_alias() matches the groups Jason Gunthorpe
2025-06-30 22:28 ` [PATCH 08/11] PCI: Add the ACS Enhanced Capability definitions Jason Gunthorpe
2025-06-30 22:28 ` [PATCH 09/11] PCI: Enable ACS Enhanced bits for enable_acs and config_acs Jason Gunthorpe
2025-06-30 22:28 ` [PATCH 10/11] PCI: Check ACS DSP/USP redirect bits in pci_enable_pasid() Jason Gunthorpe
2025-06-30 22:28 ` [PATCH 11/11] PCI: Check ACS Extended flags for pci_bus_isolated() Jason Gunthorpe
2025-07-01 21:48 ` [PATCH 00/11] Fix incorrect iommu_groups with PCIe switches Alex Williamson
2025-07-02 1:47 ` Jason Gunthorpe
2025-07-04 0:37 ` Jason Gunthorpe [this message]
2025-07-11 14:55 ` Alex Williamson
2025-07-11 16:08 ` Jason Gunthorpe
2025-07-08 20:47 ` Jason Gunthorpe
2025-07-11 15:40 ` Alex Williamson
2025-07-11 16:14 ` Jason Gunthorpe
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20250704003709.GJ1209783@nvidia.com \
--to=jgg@nvidia.com \
--cc=alex.williamson@redhat.com \
--cc=baolu.lu@linux.intel.com \
--cc=bhelgaas@google.com \
--cc=galshalom@nvidia.com \
--cc=iommu@lists.linux.dev \
--cc=joro@8bytes.org \
--cc=jroedel@suse.de \
--cc=kevin.tian@intel.com \
--cc=kvm@vger.kernel.org \
--cc=linux-pci@vger.kernel.org \
--cc=maorg@nvidia.com \
--cc=patches@lists.linux.dev \
--cc=robin.murphy@arm.com \
--cc=tdave@nvidia.com \
--cc=tony.zhu@intel.com \
--cc=will@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).