From: Jason Gunthorpe <jgg@nvidia.com>
To: Alex Williamson <alex.williamson@redhat.com>
Cc: Donald Dutile <ddutile@redhat.com>,
Bjorn Helgaas <bhelgaas@google.com>,
iommu@lists.linux.dev, Joerg Roedel <joro@8bytes.org>,
linux-pci@vger.kernel.org, Robin Murphy <robin.murphy@arm.com>,
Will Deacon <will@kernel.org>,
Lu Baolu <baolu.lu@linux.intel.com>,
galshalom@nvidia.com, Joerg Roedel <jroedel@suse.de>,
Kevin Tian <kevin.tian@intel.com>,
kvm@vger.kernel.org, maorg@nvidia.com, patches@lists.linux.dev,
tdave@nvidia.com, Tony Zhu <tony.zhu@intel.com>
Subject: Re: [PATCH 03/11] iommu: Compute iommu_groups properly for PCIe switches
Date: Tue, 23 Sep 2025 10:03:41 -0300
Message-ID: <20250923130341.GJ1391379@nvidia.com>
In-Reply-To: <20250922191029.7a000d64.alex.williamson@redhat.com>
On Mon, Sep 22, 2025 at 07:10:29PM -0600, Alex Williamson wrote:
> On Mon, 22 Sep 2025 20:15:41 -0300
> Jason Gunthorpe <jgg@nvidia.com> wrote:
>
> > On Mon, Sep 22, 2025 at 04:32:00PM -0600, Alex Williamson wrote:
> > > The ACS capability was only introduced in PCIe 2.0 and vendors have
> > > only become more diligent about implementing it as it's become
> > > important for device isolation and assignment.
> >
> > IDK about this, I have very new systems and they still do not have ACS
> > flags according to this interpretation.
>
> But how can we assume that lack of a non-required capability means
> anything at all??
>
> > > IMO, we can't assume anything at all about a multifunction device
> > > that does not implement ACS.
> >
> > Yeah this is all true.
> >
> > But we are already assuming. Today we assume MFDs without caps must
> > have internal loopback in some cases, and then in other cases we
> > assume they don't.
>
> Where? Is this in reference to our handling of multi-function
> endpoints vs whether downstream switch ports are represented as
> multi-function vs multi-slot?
If you have a MFD with no ACS, Linux will group the whole MFD if any
of it lacks ACS caps, because it assumes there is an internal loopback
between functions.
If the MFD has a single function with ACS then only that function is
removed from the group. The only way we can understand this as correct
by our grouping definition is to require the MFD have no internal
loopback. ACS is an egress control, not an ingress control.
If a MFD function is a bridge/port then Linux doesn't propagate
the group downstream of the bridge - again this requires assuming
there is no internal loopback between functions.
It is taking the undefined behavior in the spec and selectively making
both interpretations at once.
> > Assuming the MFD does not have internal loopback, while not entirely
> > satisfactory, is the one that gives the least practical breakage.
>
> Seems like it's fixing one gap and opening another. I don't see that we
> can implement ingress and egress isolation without breakage.
Yeah, either we risk more insecurities or we risk large group sizes.
> We may need an opt-in to continue egress only isolation.
It isn't "egress only isolation" - the thing is I can't really
articulate what the current rules even fully are.
I'm not keen on an opt in. I'd rather find some rules we can live
with.
How about we answer the question "does this MFD have internal
loopback" as:
- NO if any function has an appropriate ACS cap or quirk.
- NO if any function is bridge/port
- YES otherwise - all functions are end functions and no ACS declared
As above this is quite a bit closer to what Linux is doing now. It is
a practical estimation of the undefined spec behavior based on the
historical security posture of Linux.
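
To make the three rules above concrete, here is a rough sketch of the
decision as a standalone C function. It is only an illustration: struct
mfd_func, its fields and mfd_has_internal_loopback() are hypothetical
names made up for this example, not existing kernel structures or
helpers, and deciding what counts as an "appropriate" ACS cap or quirk
is left to the caller.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-function summary; this is not a kernel structure. */
struct mfd_func {
	bool has_acs;   /* function declares an appropriate ACS cap */
	bool has_quirk; /* an ACS-equivalent quirk applies to the function */
	bool is_bridge; /* function is a bridge/port */
};

/*
 * Answer "does this MFD have internal loopback" using the proposed rules:
 *  - NO if any function has an appropriate ACS cap or quirk
 *  - NO if any function is a bridge/port
 *  - YES otherwise (all end functions, no ACS declared)
 */
bool mfd_has_internal_loopback(const struct mfd_func *funcs, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (funcs[i].has_acs || funcs[i].has_quirk)
			return false;
		if (funcs[i].is_bridge)
			return false;
	}
	return true;
}
```

Note the answer is per-MFD, not per-function: one function with ACS or
one bridge/port function flips the assumption for the whole device.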
> And hardware vendors are going to volunteer that they lack p2p
> isolation and we need to add a quirk to reduce the isolation...
> dynamics are not in our favor. Hardware vendors have no incentive to
> do the right thing.
They do, otherwise they have major security holes in
virtualization. In an enterprise setting I have no doubt it is already
being done right, and has been for a decade.
I think the above rules will broadly be pessimistic toward add-in
cards and optimistic toward the root complex.
Jason