From: Nicolin Chen <nicolinc@nvidia.com>
To: <joro@8bytes.org>, <kevin.tian@intel.com>, <jgg@nvidia.com>
Cc: <will@kernel.org>, <robin.murphy@arm.com>,
	<baolu.lu@linux.intel.com>, <iommu@lists.linux.dev>,
	<linux-kernel@vger.kernel.org>, <xueshuai@linux.alibaba.com>
Subject: [PATCH rc v8 0/8] iommu: Fix pci_dev_reset_iommu_prepare/done()
Date: Fri, 24 Apr 2026 18:15:19 -0700
Message-ID: <cover.1777074513.git.nicolinc@nvidia.com>

Shuai and Kevin found a few bugs in the pci_dev_reset_iommu_prepare/done()
helpers when used to handle some corner cases:
 - Nested callbacks
 - Multi-device groups
 - WARN_ON/UAF due to concurrent detach

Fixing them needs a substantial rework that tracks device reset states on a
per-gdev basis. This series includes a few patches addressing them. Most of
the patches were previously reviewed as a single patch in v6. As we found more
bugs during the reviews, I split that v6 into smaller patches so that each of
them is cleaner.
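
As a rough illustration of the per-gdev tracking and nested re-entry mentioned
above, here is a minimal userspace sketch. It is not the code from this series:
the struct and function names (sketch_gdev, sketch_group, sketch_reset_prepare,
sketch_reset_done, sketch_group_blocked_count) are hypothetical, and only the
ideas of a per-device 'blocked' flag and a 'reset_depth' counter come from the
changelog below. Locking and the actual domain attach/detach are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one device entry (gdev) in an IOMMU group. */
struct sketch_gdev {
	unsigned int reset_depth; /* outstanding nested prepare() calls */
	bool blocked;             /* true while this device is held blocked */
};

/* Hypothetical stand-in for a group holding several devices. */
struct sketch_group {
	struct sketch_gdev *gdevs;
	size_t num_gdevs;
};

/* Enter reset for one device; only the outermost call blocks it. */
static void sketch_reset_prepare(struct sketch_gdev *gdev)
{
	assert(gdev != NULL);
	if (gdev->reset_depth++ == 0)
		gdev->blocked = true;
}

/*
 * Leave reset for one device; only the outermost call unblocks it.
 * Returns false when called without a matching prepare().
 */
static bool sketch_reset_done(struct sketch_gdev *gdev)
{
	assert(gdev != NULL);
	if (gdev->reset_depth == 0)
		return false;
	if (--gdev->reset_depth == 0)
		gdev->blocked = false;
	return true;
}

/* Count how many devices in the group are still blocked. */
static size_t sketch_group_blocked_count(const struct sketch_group *group)
{
	size_t i, n = 0;

	for (i = 0; i < group->num_gdevs; i++)
		if (group->gdevs[i].blocked)
			n++;
	return n;
}
```

Because the state lives in each gdev rather than in the group, resetting one
device in a multi-device group leaves its siblings untouched, and a nested
prepare()/done() pair does not unblock the device before the outermost done().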

This is on GitHub:
https://github.com/nicolinc/iommufd/commits/fix_iommu_reset-v8

Note that a concurrent reset of two DMA alias siblings (sharing the same RID)
might unblock prematurely when one device is done while the other is still
resetting. Supporting this case is a bit convoluted. Given that it's unclear
whether real ATS devices might share a RID, for now, add a warning in done().
Future work can fix it properly if someone hits it.
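
To make the alias scenario concrete, here is a small userspace sketch of the
condition that such a warning would flag. It is an assumption about the shape
of the check, not the logic in patch 8: the alias_dev struct, its fields, and
alias_done_is_premature() are all hypothetical names used for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-device state; field names are illustrative only. */
struct alias_dev {
	uint16_t rid;   /* requester ID; DMA alias siblings share one value */
	bool resetting; /* true between prepare() and done() */
};

/*
 * Mark device @idx as done, then report whether another device with the
 * same RID is still resetting. In that case, unblocking now would be
 * premature for the sibling.
 */
static bool alias_done_is_premature(struct alias_dev *devs, size_t n,
				    size_t idx)
{
	size_t i;

	assert(devs != NULL && idx < n);
	devs[idx].resetting = false;

	for (i = 0; i < n; i++) {
		if (i != idx && devs[i].rid == devs[idx].rid &&
		    devs[i].resetting)
			return true;
	}
	return false;
}
```

In this sketch, a caller would emit the warning whenever the function returns
true, instead of trying to defer the unblock until the last sibling is done.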

Changelog
v8:
 * Add Reviewed-by tags
 * Fix NULL group->domain in done()
 * Tidy goto cleanup when using guard()
 * Update patch subject and commit message
 * Add warning on premature unblocking in DMA alias cases
 * Drop unreachable skip in __iommu_group_set_domain_internal() error path
v7:
 https://lore.kernel.org/all/cover.1776551790.git.nicolinc@nvidia.com/
 * Add Reviewed-by tags
 * Split v6 into smaller patches
 * Add one patch to fix UAF during detach()
 * Add one patch to fix unnecessary ATS invalidation
v6:
 https://lore.kernel.org/all/20260407194644.171304-1-nicolinc@nvidia.com/
 * Update inline comments and commit message
 * Add "max_pasids > 0" condition in both helpers
v5:
 https://lore.kernel.org/all/20260404050243.141366-1-nicolinc@nvidia.com/
 * Add 'blocked' to fix iommu_driver_get_domain_for_dev() return.
v4:
 https://lore.kernel.org/all/20260324014056.36103-1-nicolinc@nvidia.com/
 * Rename 'reset_cnt' to 'recovery_cnt'
v3:
 https://lore.kernel.org/all/20260321223930.10836-1-nicolinc@nvidia.com/
 * Make prepare()/done() per-gdev
 * Use reset_depth to track nested re-entries
 * Replace group->resetting_domain with a reset_cnt
v2:
 https://lore.kernel.org/all/20260319043135.1153534-1-nicolinc@nvidia.com/
 * Fix the helpers by allowing re-entry
v1:
 https://lore.kernel.org/all/20260318220028.1146905-1-nicolinc@nvidia.com/

Nicolin Chen (8):
  iommu: Fix NULL group->domain dereference in
    pci_dev_reset_iommu_done()
  iommu: Fix kdocs of pci_dev_reset_iommu_done()
  iommu: Replace per-group resetting_domain with per-gdev blocked flag
  iommu: Fix pasid attach in pci_dev_reset_iommu_prepare/done()
  iommu: Fix nested pci_dev_reset_iommu_prepare/done()
  iommu: Fix ATS invalidation timeouts during
    __iommu_remove_group_pasid()
  iommu: Fix WARN_ON in __iommu_group_set_domain_nofail() due to reset
  iommu: Warn on premature unblock during DMA aliased sibling reset

 drivers/iommu/iommu.c | 223 ++++++++++++++++++++++++++++++++++--------
 1 file changed, 181 insertions(+), 42 deletions(-)

-- 
2.43.0


Thread overview: 14+ messages
2026-04-25  1:15 Nicolin Chen [this message]
2026-04-25  1:15 ` [PATCH rc v8 1/8] iommu: Fix NULL group->domain dereference in pci_dev_reset_iommu_done() Nicolin Chen
2026-04-27  8:32   ` Baolu Lu
2026-04-25  1:15 ` [PATCH rc v8 2/8] iommu: Fix kdocs of pci_dev_reset_iommu_done() Nicolin Chen
2026-04-25  1:15 ` [PATCH rc v8 3/8] iommu: Replace per-group resetting_domain with per-gdev blocked flag Nicolin Chen
2026-04-25  1:15 ` [PATCH rc v8 4/8] iommu: Fix pasid attach in pci_dev_reset_iommu_prepare/done() Nicolin Chen
2026-04-27  8:34   ` Baolu Lu
2026-04-25  1:15 ` [PATCH rc v8 5/8] iommu: Fix nested pci_dev_reset_iommu_prepare/done() Nicolin Chen
2026-04-25  1:15 ` [PATCH rc v8 6/8] iommu: Fix ATS invalidation timeouts during __iommu_remove_group_pasid() Nicolin Chen
2026-04-27  8:37   ` Baolu Lu
2026-04-25  1:15 ` [PATCH rc v8 7/8] iommu: Fix WARN_ON in __iommu_group_set_domain_nofail() due to reset Nicolin Chen
2026-04-25  1:15 ` [PATCH rc v8 8/8] iommu: Warn on premature unblock during DMA aliased sibling reset Nicolin Chen
2026-05-07  7:49   ` Tian, Kevin
2026-05-11  8:13 ` [PATCH rc v8 0/8] iommu: Fix pci_dev_reset_iommu_prepare/done() Jörg Rödel
