From: Jason Gunthorpe <jgg@nvidia.com>
To: Bjorn Helgaas <helgaas@kernel.org>
Cc: Vidya Sagar <vidyas@nvidia.com>,
"corbet@lwn.net" <corbet@lwn.net>,
"bhelgaas@google.com" <bhelgaas@google.com>,
Gal Shalom <galshalom@nvidia.com>,
Leon Romanovsky <leonro@nvidia.com>,
Thierry Reding <treding@nvidia.com>,
Jon Hunter <jonathanh@nvidia.com>,
Masoud Moshref Javadi <mmoshrefjava@nvidia.com>,
Shahaf Shuler <shahafs@nvidia.com>,
Vikram Sethi <vsethi@nvidia.com>,
Shanker Donthineni <sdonthineni@nvidia.com>,
Jiandi An <jan@nvidia.com>, Tushar Dave <tdave@nvidia.com>,
"linux-doc@vger.kernel.org" <linux-doc@vger.kernel.org>,
"linux-pci@vger.kernel.org" <linux-pci@vger.kernel.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
Krishna Thota <kthota@nvidia.com>,
Manikanta Maddireddy <mmaddireddy@nvidia.com>,
"sagar.tv@gmail.com" <sagar.tv@gmail.com>,
Joerg Roedel <joro@8bytes.org>, Will Deacon <will@kernel.org>,
Robin Murphy <robin.murphy@arm.com>,
"iommu@lists.linux.dev" <iommu@lists.linux.dev>,
Alex Williamson <alex.williamson@redhat.com>
Subject: Re: [PATCH V3] PCI: Extend ACS configurability
Date: Wed, 12 Jun 2024 20:23:01 -0300
Message-ID: <20240612232301.GB19897@nvidia.com>
In-Reply-To: <20240612212903.GA1037897@bhelgaas>
On Wed, Jun 12, 2024 at 04:29:03PM -0500, Bjorn Helgaas wrote:
> [+cc Alex since VFIO entered the conversation; thread at
> https://lore.kernel.org/r/20240523063528.199908-1-vidyas@nvidia.com]
>
> On Mon, Jun 10, 2024 at 08:38:49AM -0300, Jason Gunthorpe wrote:
> > On Fri, Jun 07, 2024 at 02:30:55PM -0500, Bjorn Helgaas wrote:
> > > "Correctly" is not quite the right word here; it's just a fact that
> > > the ACS settings determined at boot time result in certain IOMMU
> > > groups. If the user desires different groups, it's not that something
> > > is "incorrect"; it's just that the user may have to accept less
> > > isolation to get the desired IOMMU groups.
> >
> > That is not quite accurate.. There are HW configurations where ACS
> > needs to be a certain way for the HW to work with P2P at all. It isn't
> > just an optimization or the user accepts something, if they want P2P
> > at all they must get a ACS configuration appropriate for their system.
>
> The current wording of "For iommu_groups to form correctly, the ACS
> settings in the PCIe fabric need to be setup early" suggests that the
> way we currently configure ACS is incorrect in general, regardless of
> P2PDMA.
Yes, I'd agree with this. We don't have enough information to
configure it properly in the kernel in an automatic way. We don't
know if pairs of devices even have SW enablement to do P2P in the
kernel and we don't accurately know what issues the root complex
has. All of this information goes into choosing the right ACS bits.
> But my impression is that there's a trade-off between isolation and
> the ability to do P2PDMA, and users have different requirements, and
> the preference for less isolation/more P2PDMA is no more "correct"
> than a preference for more isolation/less P2PDMA.
Sure, that makes sense
> Maybe something like this:
>
> PCIe ACS settings determine how devices are put into iommu_groups.
> The iommu_groups in turn determine which devices can be passed
> through to VMs and whether P2PDMA between them is possible. The
> iommu_groups are built at enumeration-time and are currently static.
Not quite, the iommu_groups don't have a lot to do with P2P. Even
devices in the same kernel group can still have non-working P2P.
Maybe:
PCIe ACS settings control the level of isolation and the possible P2P
paths between devices. With greater isolation the kernel will create
smaller iommu_groups and with less isolation there is more HW that
can achieve P2P transfers. From a virtualization perspective all
devices in the same iommu_group must be assigned to the same VM as
they lack security isolation.
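
To make the "same iommu_group, same VM" rule concrete, here is a minimal Python sketch that reads the grouping the kernel exposes under /sys/kernel/iommu_groups and reports which other devices would have to follow a given device into a VM. The helper names read_iommu_groups and must_share_vm are hypothetical, used only for illustration.

```python
from pathlib import Path


def read_iommu_groups(root="/sys/kernel/iommu_groups"):
    """Map each iommu_group number to the sorted list of devices in it."""
    groups = {}
    root_path = Path(root)
    if not root_path.is_dir():
        # No IOMMU groups exposed (IOMMU disabled, or not a Linux host).
        return groups
    for group_dir in root_path.iterdir():
        devices_dir = group_dir / "devices"
        if not devices_dir.is_dir():
            continue
        groups[group_dir.name] = sorted(p.name for p in devices_dir.iterdir())
    return groups


def must_share_vm(groups, device):
    """Return the other devices that sit in the same group as `device`."""
    for members in groups.values():
        if device in members:
            return [d for d in members if d != device]
    return []
```

With stricter ACS settings the lists returned by must_share_vm() tend to get shorter, since the kernel can form smaller groups.
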
There is no way for the kernel to automatically know the correct
ACS settings for any given system and workload. Existing command-line
options allow only a large-scale change, disabling all isolation,
which is not sufficient for more complex cases.
Add a kernel command-line option to directly control all the ACS bits
for specific devices, which allows the operator to set up the right
level of isolation to achieve the desired P2P configuration. The
definition is future-proof: when new ACS bits are added to the spec,
the open syntax can be extended.
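
One way to picture "directly control all the ACS bits" is a per-bit flag string applied on top of whatever value the device already has. The sketch below is only an illustration: the encoding it uses ('1' force on, '0' force off, 'x' leave unchanged, rightmost character is bit 0) is an assumption made for the example and is not defined in this message, and apply_acs_flags is a hypothetical helper.

```python
def apply_acs_flags(current, flags):
    """Apply a flag string to a 16-bit ACS control value.

    Assumed encoding (illustrative only): the rightmost character
    addresses bit 0; '1' forces the bit on, '0' forces it off and
    'x' leaves the bit as it was found.
    """
    value = current
    for bit, ch in enumerate(reversed(flags)):
        if ch == "1":
            value |= 1 << bit
        elif ch == "0":
            value &= ~(1 << bit)
        elif ch != "x":
            raise ValueError(f"invalid flag character {ch!r}")
    return value & 0xFFFF
```

Because unlisted higher bits are simply left alone, a longer string can address bits added to the spec later without changing the meaning of existing strings.
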
ACS needs to be set up early in the kernel boot as the ACS settings
affect how iommu_groups are formed. iommu_group formation is a
one-time event during initial device discovery; changing ACS bits after
kernel boot can result in an inaccurate view of the iommu_groups
compared to the current isolation configuration.
ACS applies to PCIe Downstream Ports and multi-function devices.
The default ACS settings are strict and deny any direct traffic
between two functions. This results in the smallest iommu_group the
HW can support. Frequently these values result in slow or
non-working P2PDMA.
ACS offers a range of security choices controlling how traffic is
allowed to go directly between two devices. Some popular choices:
- Full prevention
- Translated requests can be direct, with various options
- Asymmetric direct traffic, A can reach B but not the reverse
- All traffic can be direct
Along with some other less common ones for special topologies.
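
As a rough illustration of how these choices relate to the ACS Control register, the Python sketch below decodes a control value into the bit names used in the kernel's pci_regs.h (PCI_ACS_SV and friends) and gives it a coarse label. The classification in describe() is a simplification assumed for this example, not a rule from the spec or the kernel, and asymmetric setups cannot be recognised from a single port's register since they depend on how two ports are configured relative to each other.

```python
# ACS Control register bits, matching the PCI_ACS_* values in pci_regs.h.
ACS_BITS = {
    "SV": 0x01,  # Source Validation
    "TB": 0x02,  # Translation Blocking
    "RR": 0x04,  # P2P Request Redirect
    "CR": 0x08,  # P2P Completion Redirect
    "UF": 0x10,  # Upstream Forwarding
    "EC": 0x20,  # P2P Egress Control
    "DT": 0x40,  # Direct Translated P2P
}


def decode_acs(ctrl):
    """Return the names of the ACS control bits set in `ctrl`."""
    return [name for name, mask in ACS_BITS.items() if ctrl & mask]


def describe(ctrl):
    """Coarse, assumed classification of one port's ACS control value."""
    redirect = ACS_BITS["RR"] | ACS_BITS["CR"] | ACS_BITS["UF"]
    if ctrl & redirect == redirect:
        if ctrl & ACS_BITS["DT"]:
            return "translated requests can be direct"
        return "full prevention"
    if ctrl & redirect == 0:
        return "all traffic can be direct"
    return "partial redirect"
```

For instance, a value with SV, RR, CR and UF set (0x1d) is labelled "full prevention", and adding DT (0x5d) moves it to "translated requests can be direct".
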
The intention is that this option would be used with expert knowledge
of the HW capability and workload to achieve the desired
configuration.
Jason