From: Nicolin Chen <nicolinc@nvidia.com>
To: Jason Gunthorpe <jgg@nvidia.com>
Cc: Robin Murphy <robin.murphy@arm.com>, <joro@8bytes.org>,
<will@kernel.org>, <bhelgaas@google.com>, <iommu@lists.linux.dev>,
<linux-kernel@vger.kernel.org>, <linux-pci@vger.kernel.org>,
<patches@lists.linux.dev>, <pjaroszynski@nvidia.com>,
<vsethi@nvidia.com>
Subject: Re: [PATCH RFC v1 0/2] iommu&pci: Disable ATS during FLR resets
Date: Tue, 10 Jun 2025 13:36:45 -0700
Message-ID: <aEiXXYdhyeqcNiHX@nvidia.com>
In-Reply-To: <20250610163045.GI543171@nvidia.com>

On Tue, Jun 10, 2025 at 01:30:45PM -0300, Jason Gunthorpe wrote:
> On Tue, Jun 10, 2025 at 04:37:58PM +0100, Robin Murphy wrote:
> > On 2025-06-09 7:45 pm, Nicolin Chen wrote:
> > > Hi all,
> > >
> > > Per PCIe r6.3, sec 10.3.1 IMPLEMENTATION NOTE, software should disable ATS
> > > before initiating a Function Level Reset, and then ensure that no invalidation
> > > requests are issued to a device while its ATS capability is disabled.
> >
> > Not really - what it says is that software should not expect to receive
> > invalidate completions from a function which is in the process of being
> > reset or powered off, and if software doesn't want to be confused by that
> > then it should take care to wait for completion or timeout of all
> > outstanding requests, and avoid issuing new requests, before initiating such
> > a reset or power transition.
>
> The commit message can be more precise, but I agree with the
> conclusion that the right direction for Linux is to disable and block
> ATS, instead of trying to ignore completion timeout events, or trying
> to block page table mutations. I.e. do what the implementation note
> says.
>
> Maybe:
>
> PCIe permits a device to ignore ATS invalidation TLPs while it is
> processing FLR. This creates a problem visible to the OS where ATS
> invalidation commands will time out. For instance a SVA domain will
> have no coordination with a FLR event and can racily issue ATC
> invalidations into a resetting device.
>
> The OS should do something to mitigate this as we do not want
> production systems to be reporting critical ATS failures, especially
> in a hypervisor environment. Broadly the OS could arrange to ignore
> the timeout, block page table mutations to prevent invalidations, or
> disable and block ATS.
>
> The PCIe spec in sec 10.3.1 IMPLEMENTATION NOTE recommends to disable
> and block ATS, and we already have iommu driver support to implement
> something like this. Implement this approach in the iommu core.
>
> Provide a callback from the PCI subsystem that will enclose the FLR
> and have the iommu core temporarily change all the domain attachments
> into BLOCKED. When attaching a BLOCKED domain IOMMU drivers should
> fence any incoming ATS queries, synchronously stop issuing new ATS
> invalidations, and synchronously wait for all ATS invalidations to
> complete. This will avoid any ATS invalidation timeouts.
>
> IOMMU drivers may also disable ATS in PCI config space, but it is not
> required to solve the completion timeout problem. The PCI FLR logic
> will put all the iommu owned config space bits back before completing.
>
> During this period, holding the group mutex keeps new domains from
> being attached, which prevents any new ATS invalidations.

Will pick up this wording.

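To make the sequence in that wording concrete, here is a rough userspace model, not the actual patch. Every name in it (model_dev, model_reset_prepare(), and so on) is made up for illustration, and "waiting for invalidations to complete" is reduced to clearing a counter. It only shows that once the device is moved to BLOCKED before the reset, a racing invalidation is fenced rather than sent into a resetting device:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these are the real kernel structures. */
enum model_domain_type { MODEL_DOMAIN_PAGING, MODEL_DOMAIN_BLOCKED };

struct model_domain {
	enum model_domain_type type;
};

struct model_dev {
	struct model_domain *attached;	/* domain currently programmed */
	struct model_domain *saved;	/* attachment to restore after reset */
	unsigned int inflight_inv;	/* ATC invalidations awaiting completion */
	unsigned int inv_timeouts;	/* invalidations sent into a resetting device */
	bool in_reset;			/* true while the FLR is in progress */
};

static struct model_domain model_blocked = { .type = MODEL_DOMAIN_BLOCKED };

/* Returns true if an ATC invalidation was actually sent to the device. */
static bool model_atc_invalidate(struct model_dev *dev)
{
	if (dev->attached->type == MODEL_DOMAIN_BLOCKED)
		return false;		/* fenced: nothing reaches the device */
	if (dev->in_reset)
		dev->inv_timeouts++;	/* device may ignore it, so it would time out */
	else
		dev->inflight_inv++;
	return true;
}

/* Called before the FLR: switch to BLOCKED and drain outstanding work. */
static void model_reset_prepare(struct model_dev *dev)
{
	dev->saved = dev->attached;
	dev->attached = &model_blocked;
	dev->inflight_inv = 0;		/* models a synchronous wait for completions */
}

/* Called after the FLR: put the previous attachment back. */
static void model_reset_done(struct model_dev *dev)
{
	assert(dev->attached == &model_blocked);
	dev->attached = dev->saved;
	dev->saved = NULL;
}
```

A caller would bracket the reset as model_reset_prepare(), then the FLR itself, then model_reset_done(). In this model inv_timeouts stays at zero for anything issued in between.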
> > I guess I can see how messing with the domain attachment
> > underneath the rest of the group manages to prevent new invalidate requests
> > from group->domain being issued to the given function, but it's pretty
> > horrid - leaving the mutex blocked might be just about tolerable for an FLR
> > that's supposed to take no longer than 100ms, but what if we do want to do
> > this for suspend/resume as well?
>
> I don't view this as a problem for FLR, we can hold a mutex for a long
> time. It principally delays domain changes which are kind of nonsense
> to be doing concurrently with FLR in the first place..
>
> However, for suspend, we probably want to leave a marker in the group

IIUC, the concern with suspend/resume is that it would hold the mutex
for a long time, which could be a problem?

> that the group is force-blocked and all domain attach/detach logic
> will only update the group tracking structures and not call into the
> iommu driver. When the resume happens the core will set the current
> group domain list to the iommu driver. No need for a long lived lock
> this way.

Yeah, what we don't want is the driver re-enabling ATS, so bypassing
it at the core level should work. Then iommu_dev_reset_prepare() and
iommu_dev_reset_done() will only need to take the mutex to update the
flag.
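
Something along these lines, as a rough userspace sketch only. The sketch_* names and the reset_blocked field are hypothetical, and the locking is left out. In the real thing the group mutex would be taken just long enough to flip the flag in prepare/done and to check it in the attach path:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for a domain and a group; not the kernel structs. */
struct sketch_domain {
	int id;
};

struct sketch_group {
	struct sketch_domain *domain;		/* core tracking: what the group wants */
	struct sketch_domain *hw_domain;	/* last thing given to the driver, NULL = blocked */
	bool reset_blocked;			/* marker set across the reset or suspend */
	unsigned int driver_calls;		/* how often the driver was called */
};

/* Models calling into the iommu driver to program an attachment. */
static void sketch_driver_attach(struct sketch_group *g, struct sketch_domain *d)
{
	g->hw_domain = d;
	g->driver_calls++;
}

/* Attach path: always update tracking, skip the driver while force-blocked. */
static void sketch_attach(struct sketch_group *g, struct sketch_domain *d)
{
	g->domain = d;
	if (!g->reset_blocked)
		sketch_driver_attach(g, d);
}

/* Set the marker and move the hardware to the blocked state. */
static void sketch_reset_prepare(struct sketch_group *g)
{
	assert(!g->reset_blocked);
	g->reset_blocked = true;
	sketch_driver_attach(g, NULL);
}

/* Clear the marker and replay whatever the tracking says is current. */
static void sketch_reset_done(struct sketch_group *g)
{
	assert(g->reset_blocked);
	g->reset_blocked = false;
	sketch_driver_attach(g, g->domain);
}
```

So an attach that lands between prepare and done only changes g->domain, and the driver first sees it when sketch_reset_done() replays the tracked state. That way nothing has to hold the lock for the whole suspend.
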
Thanks
Nicolin