From: Jason Gunthorpe <jgg@nvidia.com>
To: Nicolin Chen <nicolinc@nvidia.com>
Cc: joro@8bytes.org, kevin.tian@intel.com, will@kernel.org,
	robin.murphy@arm.com, baolu.lu@linux.intel.com,
	iommu@lists.linux.dev, linux-kernel@vger.kernel.org,
	xueshuai@linux.alibaba.com
Subject: Re: [PATCH rc v6] iommu: Fix nested pci_dev_reset_iommu_prepare/done()
Date: Tue, 14 Apr 2026 11:20:57 -0300
Message-ID: <20260414142057.GA2643091@nvidia.com>
In-Reply-To: <20260407194644.171304-1-nicolinc@nvidia.com>

On Tue, Apr 07, 2026 at 12:46:44PM -0700, Nicolin Chen wrote:
> Shuai found that cxl_reset_bus_function() calls pci_reset_bus_function()
> internally while both are calling pci_dev_reset_iommu_prepare/done().
> 
> As pci_dev_reset_iommu_prepare() doesn't support re-entry, the inner call
> triggers a WARN_ON and returns -EBUSY, causing the entire device reset to
> fail.
> 
> On the other hand, removing the outer calls in the PCI callers is unsafe.
> As pointed out by Kevin, device-specific quirks like reset_hinic_vf_dev()
> execute custom firmware waits after their inner pcie_flr() completes. If
> the IOMMU protection relies solely on the inner reset, the IOMMU will be
> unblocked prematurely while the device is still resetting.
> 
> Instead, fix this by making pci_dev_reset_iommu_prepare/done() reentrant.
> 
> Given the IOMMU core tracks the resetting state per iommu_group while the
> reset is per device, this has to track at the group_device level as well.
> 
> Introduce a 'reset_depth' and a 'blocked' flag to struct group_device, to
> handle the re-entries on the same device. This allows multi-device groups
> to isolate concurrent device resets independently.
> 
> Note that iommu_deferred_attach() and iommu_driver_get_domain_for_dev()
> both now check the per-device 'gdev->blocked' flag instead of a per-group
> flag like 'group->resetting_domain'. This is actually more precise. Also,
> this 'gdev->blocked' will be useful in the future work to flag the device
> blocked by an ongoing/failed reset or quarantine.
> 
> As the reset routine is per gdev, it cannot clear group->resetting_domain
> without iterating over the device list to ensure no other device is being
> reset. Simplify it by replacing the resetting_domain with a 'recovery_cnt'
> in the struct iommu_group.
> 
> Since both helpers are now per gdev, call the per-device set_dev_pasid op
> to recover PASID domains. And add 'max_pasids > 0' checks in both helpers.
> 
> Fixes: c279e83953d9 ("iommu: Introduce pci_dev_reset_iommu_prepare/done()")
> Cc: stable@vger.kernel.org
> Reported-by: Shuai Xue <xueshuai@linux.alibaba.com>
> Closes: https://lore.kernel.org/all/absKsk7qQOwzhpzv@Asurada-Nvidia/
> Suggested-by: Kevin Tian <kevin.tian@intel.com>
> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
> ---
> Changelog
>  v6:
>   * Update inline comments and commit message
>   * Add "max_pasids > 0" condition in both helpers
>  v5:
>   https://lore.kernel.org/all/20260404050243.141366-1-nicolinc@nvidia.com/
>   * Add 'blocked' to fix iommu_driver_get_domain_for_dev() return.
>  v4:
>   https://lore.kernel.org/all/20260324014056.36103-1-nicolinc@nvidia.com/
>   * Rename 'reset_cnt' to 'recovery_cnt'
>  v3:
>   https://lore.kernel.org/all/20260321223930.10836-1-nicolinc@nvidia.com/
>   * Turn prepare()/done() to be per-gdev
>   * Use reset_depth to track nested re-entries
>   * Replace group->resetting_domain with a reset_cnt
>  v2:
>   https://lore.kernel.org/all/20260319043135.1153534-1-nicolinc@nvidia.com/
>   * Fix in the helpers by allowing re-entry
>  v1:
>   https://lore.kernel.org/all/20260318220028.1146905-1-nicolinc@nvidia.com/
> 
>  drivers/iommu/iommu.c | 148 +++++++++++++++++++++++++++++++-----------
>  1 file changed, 110 insertions(+), 38 deletions(-)
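
The re-entrancy scheme described in the quoted commit message can be modelled
in a few lines of userspace C. This is only a sketch under stated assumptions,
not the code from the patch: the field names 'reset_depth', 'blocked' and
'recovery_cnt' come from the commit message, while every model_* name is
hypothetical, and locking, the actual blocking-domain attach and PASID
recovery are all left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct iommu_group; only the counter is modelled. */
struct model_group {
	unsigned int recovery_cnt;	/* devices in this group currently in reset */
};

/* Hypothetical stand-in for struct group_device. */
struct model_gdev {
	struct model_group *group;
	unsigned int reset_depth;	/* nesting level of prepare() calls */
	bool blocked;			/* set while the device is held blocked */
};

/* Outermost call blocks the device; nested calls only bump the depth. */
static int model_reset_prepare(struct model_gdev *gdev)
{
	if (gdev->reset_depth++ > 0)
		return 0;		/* re-entry: already blocked, nothing to do */

	gdev->blocked = true;		/* assumption: real code attaches a blocking domain here */
	gdev->group->recovery_cnt++;
	return 0;
}

/* Only the call that balances the outermost prepare() unblocks the device. */
static void model_reset_done(struct model_gdev *gdev)
{
	if (gdev->reset_depth == 0)
		return;			/* unbalanced done(): ignore in this model */
	if (--gdev->reset_depth > 0)
		return;			/* inner done(): outer reset still in flight */

	gdev->blocked = false;		/* assumption: real code restores the domains here */
	gdev->group->recovery_cnt--;
}

/* Nested reset on one device, with a second device in the same group untouched. */
static void model_demo(void)
{
	struct model_group grp = { 0 };
	struct model_gdev a = { .group = &grp };
	struct model_gdev b = { .group = &grp };

	model_reset_prepare(&a);	/* outer caller */
	model_reset_prepare(&a);	/* inner caller on the same device */
	model_reset_done(&a);		/* inner done() must not unblock */
	assert(a.blocked && !b.blocked && grp.recovery_cnt == 1);
	model_reset_done(&a);		/* outer done() unblocks */
	assert(!a.blocked && grp.recovery_cnt == 0);
}
```

In this model the inner prepare() succeeds instead of returning -EBUSY, and the
device stays blocked until the outermost done(), which is the property the
commit message asks for when a quirk keeps waiting after its inner reset.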

This looks reasonable to me.

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>

Jason