Linux PCI subsystem development
From: Alex Williamson <alex.williamson@redhat.com>
To: Donald Dutile <ddutile@redhat.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>,
	Bjorn Helgaas <bhelgaas@google.com>,
	iommu@lists.linux.dev, Joerg Roedel <joro@8bytes.org>,
	linux-pci@vger.kernel.org, Robin Murphy <robin.murphy@arm.com>,
	Will Deacon <will@kernel.org>,
	Lu Baolu <baolu.lu@linux.intel.com>,
	galshalom@nvidia.com, Joerg Roedel <jroedel@suse.de>,
	Kevin Tian <kevin.tian@intel.com>,
	kvm@vger.kernel.org, maorg@nvidia.com, patches@lists.linux.dev,
	tdave@nvidia.com, Tony Zhu <tony.zhu@intel.com>
Subject: Re: [PATCH 03/11] iommu: Compute iommu_groups properly for PCIe switches
Date: Mon, 22 Sep 2025 20:50:27 -0600
Message-ID: <20250922205027.229614fa.alex.williamson@redhat.com>
In-Reply-To: <066e288e-8421-4daf-ae62-f24e54f8be68@redhat.com>

On Mon, 22 Sep 2025 22:26:26 -0400
Donald Dutile <ddutile@redhat.com> wrote:

> On 9/22/25 9:10 PM, Alex Williamson wrote:
> > On Mon, 22 Sep 2025 20:15:41 -0300
> > Jason Gunthorpe <jgg@nvidia.com> wrote:
> >   
> >> On Mon, Sep 22, 2025 at 04:32:00PM -0600, Alex Williamson wrote:  
> >>> The ACS capability was only introduced in PCIe 2.0 and vendors have
> >>> only become more diligent about implementing it as it's become
> >>> important for device isolation and assignment.  
> >>
> >> IDK about this, I have very new systems and they still do not have ACS
> >> flags according to this interpretation.  
> > 
> > But how can we assume that lack of a non-required capability means
> > anything at all??
> >     
> ok, I'll bite on the dumb answer...
> lots of non-support is represented by lack of a control structure.
> ... should we assume there are hidden VFs b/c there is a lack of a vf cap structure?
> ... <insert your favorite dumb answer here> :-)

This is not how an additive specification works.  We start with a base
specification.  We add capabilities to describe features of the device.
If a device doesn't support an SR-IOV capability, it doesn't support
VFs.  But likewise we cannot add an optional capability and
retroactively declare that anything that does not support this
capability must have some specific behavior.

That's not what the spec is doing.  We're misinterpreting it.  The
sections of the spec you're quoting are saying that if an MFD function
supports ACS, it must support this specific set of p2p capability and
control bits, unless the device does not support internal p2p.
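
A minimal sketch of the interpretation argued above, written as a
standalone C helper rather than kernel code.  The enum, the function
name mfd_internal_p2p() and the MFD_ACS_P2P_BITS mask are hypothetical
names introduced for illustration; only the PCI_ACS_* bit values are
taken from the Linux pci_regs.h definitions.  Which bits count as the
"p2p set" for a multifunction device is an assumption here and should
be checked against the spec text under discussion.

```c
#include <stdbool.h>
#include <stdint.h>

/* ACS capability register bits (values as defined in Linux pci_regs.h). */
#define PCI_ACS_RR 0x0004 /* P2P Request Redirect */
#define PCI_ACS_CR 0x0008 /* P2P Completion Redirect */
#define PCI_ACS_EC 0x0020 /* P2P Egress Control */
#define PCI_ACS_DT 0x0040 /* Direct Translated P2P */

/* Hypothetical mask: the p2p-related bits assumed relevant to an MFD. */
#define MFD_ACS_P2P_BITS (PCI_ACS_RR | PCI_ACS_CR | PCI_ACS_EC | PCI_ACS_DT)

/* Hypothetical tri-state result; not an existing kernel type. */
enum mfd_p2p {
	MFD_P2P_UNKNOWN,      /* no ACS capability: nothing can be inferred */
	MFD_P2P_NONE,         /* "empty" ACS capability: no internal p2p */
	MFD_P2P_CONTROLLABLE, /* p2p bits advertised: p2p exists, with controls */
};

/*
 * Classify a multifunction device function from its ACS capability.
 * has_acs_cap: whether the ACS extended capability is present at all.
 * acs_cap:     contents of the ACS capability register, if present.
 */
enum mfd_p2p mfd_internal_p2p(bool has_acs_cap, uint16_t acs_cap)
{
	/* Absence of an optional capability conveys nothing about p2p. */
	if (!has_acs_cap)
		return MFD_P2P_UNKNOWN;

	/* Capability present but no p2p bits: device declares no internal p2p. */
	if (!(acs_cap & MFD_ACS_P2P_BITS))
		return MFD_P2P_NONE;

	return MFD_P2P_CONTROLLABLE;
}
```

Under this reading a grouping policy would treat MFD_P2P_UNKNOWN as
"not isolated" unless a quirk says otherwise, while the alternative
proposed in this thread would instead fold the unknown case into
MFD_P2P_NONE.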

> I can certainly see why a hw vendor would -not- put a control structure
> into a piece of hw that is not needed, as the spec states.
> For every piece of hw one creates, one has to invest resources to verify
> it is working correctly, and if verification is done correctly, verify it
> doesn't cause unexpected errors.  I've seen this resource req back in
> my HDL days (developers design w/HDL; hw verification engineers are the
> QE equivalent to sw, verifying the hw does and does not do what it is spec'd).

As previously noted, an "empty" ACS capability serves this purpose with
minimal verification.

> Penalizing a hw vendor for following the spec, and saving resources,
> seems wrong to me, to require them to quirk their spec-correct device.

IMO, we're clearly conflating the implementation of the ACS p2p
capability bits with the implementation of the ACS extended capability
itself.

> I suspect section 6.12.1.2 was written by hw vendors, looking to reduce
> their hw design & verification efforts.  If written by sw vendors, it
> would have likely required 'empty ACS' structs as you have mentioned in other thread(s).

We've had NIC vendors implement an empty ACS capability to convey the
fact that the device does not support internal p2p.  There is precedent
for the interpretation I'm describing.

> >>> IMO, we can't assume anything at all about a multifunction device
> >>> that does not implement ACS.  
> >>
> >> Yeah this is all true.
> >>
> >> But we are already assuming. Today we assume MFDs without caps must
> >> have internal loopback in some cases, and then in other cases we
> >> assume they don't.  
> > 
> > Where?  Is this in reference to our handling of multi-function
> > endpoints vs whether downstream switch ports are represented as
> > multi-function vs multi-slot?
> > 
> > I believe we consider multifunction endpoints and root ports to lack
> > isolation if they do not expose an ACS capability and an "empty" ACS
> > capability on a multifunction endpoint is sufficient to declare that
> > the device does not support internal p2p.  Everything else is quirks.
> >   
> >> I've sent and people have tested various different rules - please tell
> >> me what you can live with.  
> > 
> > I think this interpretation that lack of an ACS capability implies
> > anything is wrong.  Lack of a specific p2p capability within an ACS
> > capability does imply lack of p2p support.
> >   
> >> Assuming the MFD does not have internal loopback, while not entirely
> >> satisfactory, is the one that gives the least practical breakage.  
> > 
> > Seems like it's fixing one gap and opening another.  I don't see that we
> > can implement ingress and egress isolation without breakage.  We may
> > need an opt-in to continue egress only isolation.
> >   
> >> I think it most accurately reflects the majority of real hardware out
> >> there.
> >>
> >> We can quirk to fix the remainder.
> >>
> >> This is the best plan I've got..  
> > 
> > And hardware vendors are going to volunteer that they lack p2p
> > isolation and we need to add a quirk to reduce the isolation... the
> > dynamics are not in our favor.  Hardware vendors have no incentive to
> > do the right thing.  Thanks,
> >   
> I gave an example above why hw vendors have every incentive not to
> add an ACS structure if they don't need it. Not doing so, when they
> can do p2p, is a clear PCIe spec violation.  Punishing the correct
> implementations for the incorrect ones is not appropriate, and is
> further incentive to continue to be incorrect.
> 
> Don't we have the hooks with kernel cmdline disable_acs_redir &
> config_acs params to solve the insecure cases that may (would) be
> found, so breaking the isolation is relatively easy to fix vs adding
> quirks as is done today for proper spec interpretation?

Are we going to expect users to opt-in to securing their system?  This
is just doubling down on an incorrect spec interpretation.  Lack of an
optional extended capability cannot convey anything about the p2p
capabilities of the device <full stop>.  Thanks,

Alex

