From: Jason Gunthorpe <jgg@nvidia.com>
To: Nicolin Chen <nicolinc@nvidia.com>
Cc: joro@8bytes.org, afael@kernel.org, bhelgaas@google.com,
	alex@shazbot.org, will@kernel.org, robin.murphy@arm.com,
	lenb@kernel.org, kevin.tian@intel.com, baolu.lu@linux.intel.com,
	linux-arm-kernel@lists.infradead.org, iommu@lists.linux.dev,
	linux-kernel@vger.kernel.org, linux-acpi@vger.kernel.org,
	linux-pci@vger.kernel.org, kvm@vger.kernel.org,
	patches@lists.linux.dev, pjaroszynski@nvidia.com,
	vsethi@nvidia.com, helgaas@kernel.org, etzhao1900@gmail.com
Subject: Re: [PATCH v7 4/5] iommu: Introduce pci_dev_reset_iommu_prepare/done()
Date: Tue, 25 Nov 2025 15:27:32 -0400
Message-ID: <20251125192732.GF520526@nvidia.com>
In-Reply-To: <31486e8017284e547b04d2be5110522a777d8379.1763775108.git.nicolinc@nvidia.com>

On Fri, Nov 21, 2025 at 05:57:31PM -0800, Nicolin Chen wrote:
> PCIe permits a device to ignore ATS invalidation TLPs while processing a
> reset. This creates a problem visible to the OS where an ATS invalidation
> command will time out. E.g. an SVA domain will have no coordination with a
> reset event and can racily issue ATS invalidations to a resetting device.
> 
> The OS should do something to mitigate this as we do not want production
> systems to be reporting critical ATS failures, especially in a hypervisor
> environment. Broadly, the OS could arrange to ignore the timeouts, block
> page table mutations to prevent invalidations, or disable and block ATS.
> 
> The PCIe r6.0, sec 10.3.1 IMPLEMENTATION NOTE recommends that SW disable
> and block ATS before initiating a Function Level Reset. It also mentions that
> other reset methods could have the same vulnerability as well.
> 
> Provide a callback from the PCI subsystem that will enclose the reset and
> have the iommu core temporarily change all the attached RID/PASID domains
> to group->blocking_domain so that the IOMMU hardware would fence any
> incoming ATS queries. IOMMU drivers should also synchronously stop issuing
> new ATS invalidations and wait for all ATS invalidations to complete. This
> can avoid any ATS invalidation timeouts.
> 
> However, if there is a domain attachment/replacement happening during an
> ongoing reset, ATS routines may be re-activated between the two function
> calls. So, introduce a new resetting_domain in the iommu_group structure
> to reject any concurrent attach_dev/set_dev_pasid call during a reset for
> a concern of compatibility failure. Since this changes the behavior of an
> attach operation, update the uAPI accordingly.
> 
> Note that there are two corner cases:
>  1. Devices in the same iommu_group
>     Since an attachment is always per iommu_group, this means that any
>     sibling devices in the iommu_group cannot change domain, to prevent
>     race conditions.
>  2. An SR-IOV PF that is being reset while its VF is not
>     In such a case, the VF itself is already broken. So, there is no point
>     in preventing the PF from going through the iommu reset.
> 
> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
> ---
>  include/linux/iommu.h     |  13 +++
>  include/uapi/linux/vfio.h |   4 +
>  drivers/iommu/iommu.c     | 173 ++++++++++++++++++++++++++++++++++++++
>  3 files changed, 190 insertions(+)
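
For readers following the description above, the prepare/done bracketing can
be sketched as a small user-space model. This is only an illustration of the
idea, not the code in the patch: struct group, struct domain, the
saved_domain and resetting fields, the function names, and the choice of
-EBUSY as the rejection code are all assumptions made up for this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model; these are not the kernel's structures. */
struct domain {
	const char *name;
};

struct group {
	struct domain *domain;          /* domain currently in effect */
	struct domain *blocking_domain; /* fences DMA and ATS requests */
	struct domain *saved_domain;    /* domain to restore after the reset */
	bool resetting;                 /* true between prepare and done */
};

/* A normal attach is rejected while a reset is in flight (assumed -EBUSY). */
static int group_attach(struct group *g, struct domain *d)
{
	assert(g && d);
	if (g->resetting)
		return -EBUSY;
	g->domain = d;
	return 0;
}

/* Called before the reset: park the group on its blocking domain. */
static int reset_prepare(struct group *g)
{
	assert(g && g->blocking_domain);
	if (g->resetting)
		return -EBUSY;
	g->saved_domain = g->domain;
	g->domain = g->blocking_domain;
	g->resetting = true;
	return 0;
}

/* Called after the reset: restore whatever was attached before. */
static void reset_done(struct group *g)
{
	assert(g);
	if (!g->resetting)
		return;
	g->domain = g->saved_domain;
	g->saved_domain = NULL;
	g->resetting = false;
}
```

In this model a caller would invoke reset_prepare(), perform the device
reset, then invoke reset_done(); any group_attach() issued in between fails
instead of re-enabling ATS behind the reset's back. Locking is omitted here;
a real implementation would need to serialize these paths.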

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>

Jason
