From: Nicolin Chen <nicolinc@nvidia.com>
To: "Tian, Kevin" <kevin.tian@intel.com>
Cc: "joro@8bytes.org" <joro@8bytes.org>,
	"jgg@nvidia.com" <jgg@nvidia.com>,
	"will@kernel.org" <will@kernel.org>,
	"robin.murphy@arm.com" <robin.murphy@arm.com>,
	"baolu.lu@linux.intel.com" <baolu.lu@linux.intel.com>,
	"iommu@lists.linux.dev" <iommu@lists.linux.dev>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"xueshuai@linux.alibaba.com" <xueshuai@linux.alibaba.com>
Subject: Re: [PATCH rc v6] iommu: Fix nested pci_dev_reset_iommu_prepare/done()
Date: Fri, 17 Apr 2026 21:56:42 -0700
Message-ID: <aeMO0whi7pPKE7FH@nvidia.com>
In-Reply-To: <aeKpxLiykT8O4k1X@Asurada-Nvidia>

On Fri, Apr 17, 2026 at 02:44:41PM -0700, Nicolin Chen wrote:
> On Fri, Apr 17, 2026 at 08:24:27AM +0000, Tian, Kevin wrote:
> > one is that iommu_detach_device_pasid() is not blocked which can trigger
> > devtlb invalidation in middle of reset. but it cannot fail. so the right fix is
> > to skip the blocked device in __iommu_remove_group_pasid().
> 
> Yea, squashing this:
> @@ -3556,3 +3559,4 @@ static void __iommu_remove_group_pasid(struct iommu_group *group,
>         for_each_group_device(group, device) {
> -               if (device->dev->iommu->max_pasids > 0)
> +               /* Device might be already detached for a device recovery */
> +               if (!device->blocked && device->dev->iommu->max_pasids > 0)
>                         iommu_remove_dev_pasid(device->dev, pasid, domain);
> 
> > another is a use-after-free concern upon iommu_detach_device() in
> > middle of reset. In my thinking it will trigger WARN_ON before any UAF:
> > 
> > static void __iommu_group_set_domain_nofail(struct iommu_group *group,
> >                                             struct iommu_domain *new_domain)
> > {
> >         WARN_ON(__iommu_group_set_domain_internal(
> >                 group, new_domain, IOMMU_SET_DOMAIN_MUST_SUCCEED));
> > }
> 
> Yes.
> 
> > but I haven't got time to think about the fix carefully. 
> 
> I think we could squash this:
> 
> @@ -2469,9 +2469,2 @@ static int __iommu_group_set_domain_internal(struct iommu_group *group,
> 
> -       /*
> -        * This is a concurrent attach during device recovery. Reject it until
> -        * pci_dev_reset_iommu_done() attaches the device to group->domain.
> -        */
> -       if (group->recovery_cnt)
> -               return -EBUSY;
> -

On second thought, we can't simply drop this -- IIRC, we added it
specifically to fence a case where gdevs share the same RID, or
some corner case like that?

To be conservative, we can still reject a concurrent attach while
allowing the detach case:

+	/*
+	 * This is a concurrent attach during device recovery. Reject it until
+	 * pci_dev_reset_iommu_done() attaches the device to group->domain.
+	 *
+	 * Note: still allow MUST_SUCCEED callers (detach/teardown) through to
+	 * avoid UAF on domain release paths.
+	 */
+	if (group->recovery_cnt && !(flags & IOMMU_SET_DOMAIN_MUST_SUCCEED))
+		return -EBUSY;
+

The detach path will then proceed, skip every gdev that has
gdev->blocked set inside the for_each_group_device() loop, and
defer the re-attach to pci_dev_reset_iommu_done().
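
To illustrate the intended behavior, here is a rough userspace toy
model, not kernel code. All toy_* names, the integer "domain ids" and
the fixed two-device group are made up for illustration; only the
shape of the recovery_cnt/MUST_SUCCEED check and the per-gdev blocked
skip mirrors the snippets in this thread:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for IOMMU_SET_DOMAIN_MUST_SUCCEED. */
#define TOY_SET_DOMAIN_MUST_SUCCEED	(1u << 0)
/* Hypothetical id standing in for group->blocking_domain. */
#define TOY_BLOCKING_DOMAIN		(-1)
#define TOY_NR_DEVS			2

struct toy_gdev {
	bool blocked;	/* set while the device is under reset */
	int domain;	/* domain id currently programmed for the device */
};

struct toy_group {
	unsigned int recovery_cnt;	/* devices currently under recovery */
	int domain;			/* models group->domain */
	struct toy_gdev devs[TOY_NR_DEVS];
};

/* Models the gate plus the device loop of the set-domain path. */
int toy_group_set_domain(struct toy_group *group, int new_domain,
			 unsigned int flags)
{
	size_t i;

	/* Reject a concurrent attach; let MUST_SUCCEED callers through. */
	if (group->recovery_cnt && !(flags & TOY_SET_DOMAIN_MUST_SUCCEED))
		return -EBUSY;

	for (i = 0; i < TOY_NR_DEVS; i++) {
		/* Devices under recovery stay on the blocking domain. */
		if (group->devs[i].blocked)
			continue;
		group->devs[i].domain = new_domain;
	}
	group->domain = new_domain;
	return 0;
}

/* Models the done() side: re-attach to whatever group->domain is now. */
void toy_reset_done(struct toy_group *group, struct toy_gdev *gdev)
{
	assert(group->recovery_cnt > 0);
	gdev->blocked = false;
	gdev->domain = group->domain;
	group->recovery_cnt--;
}
```

With one device blocked and recovery_cnt == 1, a plain attach (flags
== 0) gets -EBUSY, while a MUST_SUCCEED call updates group->domain and
the unblocked device only. The blocked device picks up the new
group->domain once toy_reset_done() runs.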

Thanks
Nicolin

> @@ -2484,2 +2477,10 @@ static int __iommu_group_set_domain_internal(struct iommu_group *group,
>         for_each_group_device(group, gdev) {
> +               /*
> +                * Skip devices under recovery: they are already attached to
> +                * group->blocking_domain at the hardware level. When their
> +                * reset completes, pci_dev_reset_iommu_done() will re-attach
> +                * them to the updated group->domain.
> +                */
> +               if (gdev->blocked)
> +                       continue;
>                 ret = __iommu_device_set_domain(group, gdev->dev, new_domain,
> @@ -2513,2 +2514,4 @@ static int __iommu_group_set_domain_internal(struct iommu_group *group,
>                         break;
> +               if (gdev->blocked)
> +                       continue;
>                 /*
> 
> 
> > the last one is trivial that goto and guard() shouldn't be mixed in one
> > function according to the cleanup guidelines.
> 
> I don't think this is mixing. The guard is protecting the entire
> routine including those goto paths. So there isn't any goto path
> that is outside the mutex.
> 
> > Reviewed-by: Kevin Tian <kevin.tian@intel.com>
> 
> Thanks!
> Nicolin
