llvm.lists.linux.dev archive mirror
 help / color / mirror / Atom feed
From: Jason Gunthorpe <jgg@nvidia.com>
To: Robin Murphy <robin.murphy@arm.com>
Cc: iommu@lists.linux.dev, Joerg Roedel <joro@8bytes.org>,
	llvm@lists.linux.dev, Nathan Chancellor <nathan@kernel.org>,
	Nick Desaulniers <ndesaulniers@google.com>,
	Miguel Ojeda <ojeda@kernel.org>, Tom Rix <trix@redhat.com>,
	Will Deacon <will@kernel.org>,
	Lu Baolu <baolu.lu@linux.intel.com>,
	Kevin Tian <kevin.tian@intel.com>,
	Nicolin Chen <nicolinc@nvidia.com>
Subject: Re: [PATCH v3 03/17] iommu: Make __iommu_group_set_domain() handle error unwind
Date: Thu, 6 Apr 2023 16:09:34 -0300	[thread overview]
Message-ID: <ZC8Y7g99aAQURbtj@nvidia.com> (raw)
In-Reply-To: <26039535-5f78-74f1-7864-978ad6c2810d@arm.com>

On Thu, Apr 06, 2023 at 07:08:59PM +0100, Robin Murphy wrote:

> > +		if (ret) {
> > +			result = ret;
> > +			/*
> > +			 * Keep trying the other devices in the group. If a
> > +			 * driver fails attach to an otherwise good domain, and
> > +			 * does not support blocking domains, it should at least
> > +			 * drop its reference on the current domain so we don't
> > +			 * UAF.
> > +			 */
> > +			if (flags & IOMMU_SET_DOMAIN_MUST_SUCCEED)
> > +				continue;
> 
> I know it's called out above, but my brain truly can't cope with a flag
> named "MUST_SUCCEED" which actually means "can fail, but you can't reason
> about the state after failure". As before, I think the actual flag itself
> can go away, but the behaviour itself will still be no nicer for being
> implicit :/

What I saw is that a bunch of drivers tend to implement attach_dev
with first some kind of detach, which often cannot fail, and then can
fail the new attach.

So, this is intended to go along with that design pattern. We call
every device's attach and hope that the outcome is either to release
the current iommu_domain or to attach to the new one.

If a driver doesn't follow this pattern then we'll UAF.

The primary purpose of the MUST_SUCCEED flag is to prevent the code
from trying to go back to the old domain. This really should only
depend on the caller's context and ability to handle errors, so I
don't see what would make it implicit?

We definitely do not want to attach back the old domain if we are
about to unconditionally free it.

> TBH I still think the responsibility should be on the drivers to handle
> rollback of their internal per-device state if returning failure from an
> .attach_dev call.

It helps the driver stay consistent, but I don't think it helps the
core code too much.

With the two patches to add the rodata IDENTITY domain to drivers I
think we should go in that direction.

Lets have an "ops->emergency_domain" that is something the driver can
always attach. Ideally it will be the BLOCKING domain, but IDENTITY
could work too.

When this function gets into trouble it will put the entire group on
'emergency_domain' and things will be safe, no UAF.

This seems pretty simple for drivers since the attach_dev can just
leave behind whatever state is convenient, so long as their
emergency_domain cannot fail.

> But what would be really really really good, now that .detach_dev has
> truly gone, is to pass *both* "to" and "from" domains to .attach_dev (and a
> "from" domain to .release_device), so that drivers don't have to squirrel
> away all these internal references to their own notions of "current" domains
> which are the root of this whole class of horrid UAF and consistency
> bugs.

Usually the IOMMU HW will be touching the IOPTEs via DMA enclosed by
the iommu_domain, so even if we could remove the iommu_domain pointers
from the drivers it still retains the HW programming to the
memory. Freeing the domain will still UAF.

Refcounts would be another way to address the UAF, then we can leak
the iommu_domain if things go bad. I think this is more complex on drivers.

Jason

  reply	other threads:[~2023-04-06 19:09 UTC|newest]

Thread overview: 36+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-04-05 23:38 [PATCH v3 00/17] Consolidate the error handling around device attachment Jason Gunthorpe
2023-04-05 23:38 ` [PATCH v3 01/17] iommu: Replace iommu_group_device_count() with list_count_nodes() Jason Gunthorpe
2023-04-05 23:38 ` [PATCH v3 02/17] iommu: Add for_each_group_device() Jason Gunthorpe
2023-04-05 23:38 ` [PATCH v3 03/17] iommu: Make __iommu_group_set_domain() handle error unwind Jason Gunthorpe
2023-04-06 18:08   ` Robin Murphy
2023-04-06 19:09     ` Jason Gunthorpe [this message]
2023-04-13 16:47       ` Robin Murphy
2023-04-13 18:39         ` Jason Gunthorpe
2023-04-05 23:38 ` [PATCH v3 04/17] iommu: Use __iommu_group_set_domain() for __iommu_attach_group() Jason Gunthorpe
2023-04-05 23:38 ` [PATCH v3 05/17] iommu: Use __iommu_group_set_domain() in iommu_change_dev_def_domain() Jason Gunthorpe
2023-04-05 23:38 ` [PATCH v3 06/17] iommu: Replace __iommu_group_dma_first_attach() with set_domain Jason Gunthorpe
2023-04-06 15:58   ` Robin Murphy
2023-04-06 19:17     ` Jason Gunthorpe
2023-04-05 23:38 ` [PATCH v3 07/17] iommu: Make iommu_group_do_dma_first_attach() simpler Jason Gunthorpe
2023-04-06  3:28   ` Baolu Lu
2023-04-06 15:44   ` Robin Murphy
2023-04-06 16:09     ` Jason Gunthorpe
2023-04-06 19:05       ` Robin Murphy
2023-04-05 23:38 ` [PATCH v3 08/17] iommu: Make iommu_group_do_dma_first_attach() work with owned groups Jason Gunthorpe
2023-04-06 16:37   ` Robin Murphy
2023-04-06 19:34     ` Jason Gunthorpe
2023-04-05 23:38 ` [PATCH v3 09/17] iommu: Fix iommu_probe_device() to attach the right domain Jason Gunthorpe
2023-04-05 23:38 ` [PATCH v3 10/17] iommu: Do iommu_group_create_direct_mappings() before attach Jason Gunthorpe
2023-04-06  3:14   ` Baolu Lu
2023-04-05 23:38 ` [PATCH v3 11/17] iommu: Remove the assignment of group->domain during default domain alloc Jason Gunthorpe
2023-04-05 23:38 ` [PATCH v3 12/17] iommu: Consolidate the code to calculate the target default domain type Jason Gunthorpe
2023-04-05 23:38 ` [PATCH v3 13/17] iommu: Revise iommu_group_alloc_default_domain() Jason Gunthorpe
2023-04-06  3:40   ` Baolu Lu
2023-04-06 11:34     ` Jason Gunthorpe
2023-04-05 23:38 ` [PATCH v3 14/17] iommu: Consolidate the default_domain setup to one function Jason Gunthorpe
2023-04-06  4:43   ` Baolu Lu
2023-04-05 23:38 ` [PATCH v3 15/17] iommu: Allow IOMMU_RESV_DIRECT to work on ARM Jason Gunthorpe
2023-04-06  3:53   ` Baolu Lu
2023-04-05 23:38 ` [PATCH v3 16/17] iommu: Remove __iommu_group_for_each_dev() Jason Gunthorpe
2023-04-05 23:38 ` [PATCH v3 17/17] iommu: Tidy the control flow in iommu_group_store_type() Jason Gunthorpe
2023-04-07 11:02 ` [PATCH v3 00/17] Consolidate the error handling around device attachment Naresh Kamboju

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZC8Y7g99aAQURbtj@nvidia.com \
    --to=jgg@nvidia.com \
    --cc=baolu.lu@linux.intel.com \
    --cc=iommu@lists.linux.dev \
    --cc=joro@8bytes.org \
    --cc=kevin.tian@intel.com \
    --cc=llvm@lists.linux.dev \
    --cc=nathan@kernel.org \
    --cc=ndesaulniers@google.com \
    --cc=nicolinc@nvidia.com \
    --cc=ojeda@kernel.org \
    --cc=robin.murphy@arm.com \
    --cc=trix@redhat.com \
    --cc=will@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).