llvm.lists.linux.dev archive mirror
 help / color / mirror / Atom feed
From: Jason Gunthorpe <jgg@nvidia.com>
To: Robin Murphy <robin.murphy@arm.com>
Cc: iommu@lists.linux.dev, Joerg Roedel <joro@8bytes.org>,
	llvm@lists.linux.dev, Nathan Chancellor <nathan@kernel.org>,
	Nick Desaulniers <ndesaulniers@google.com>,
	Miguel Ojeda <ojeda@kernel.org>, Tom Rix <trix@redhat.com>,
	Will Deacon <will@kernel.org>,
	Lu Baolu <baolu.lu@linux.intel.com>,
	Kevin Tian <kevin.tian@intel.com>,
	Nicolin Chen <nicolinc@nvidia.com>
Subject: Re: [PATCH v3 03/17] iommu: Make __iommu_group_set_domain() handle error unwind
Date: Thu, 13 Apr 2023 15:39:53 -0300	[thread overview]
Message-ID: <ZDhMed+RDDcvmNle@nvidia.com> (raw)
In-Reply-To: <eb151466-1357-846a-a365-1517508fea1c@arm.com>

On Thu, Apr 13, 2023 at 05:47:14PM +0100, Robin Murphy wrote:
> On 06/04/2023 8:09 pm, Jason Gunthorpe wrote:
> > On Thu, Apr 06, 2023 at 07:08:59PM +0100, Robin Murphy wrote:
> > 
> > > > +		if (ret) {
> > > > +			result = ret;
> > > > +			/*
> > > > +			 * Keep trying the other devices in the group. If a
> > > > +			 * driver fails attach to an otherwise good domain, and
> > > > +			 * does not support blocking domains, it should at least
> > > > +			 * drop its reference on the current domain so we don't
> > > > +			 * UAF.
> > > > +			 */
> > > > +			if (flags & IOMMU_SET_DOMAIN_MUST_SUCCEED)
> > > > +				continue;
> > > 
> > > I know it's called out above, but my brain truly can't cope with a flag
> > > named "MUST_SUCCEED" which actually means "can fail, but you can't reason
> > > about the state after failure". As before, I think the actual flag itself
> > > can go away, but the behaviour itself will still be no nicer for being
> > > implicit :/
> > 
> > What I saw is that a bunch of drivers tend to implement attach_dev
> > with first some kind of detach, which often cannot fail, and then can
> > fail the new attach.
> > 
> > So, this is intended to go along with that design pattern. We call
> > every device's attach and hope that the outcome is either to release
> > the current iommu_domain or to attach to the new one.
> 
> It's that design pattern which is most of the problem, though.

Yes, this is definately true

> It's a left-over from the old pre-default-domain era (quite
> literally, in most cases, where drivers were just minimally bodged
> to simulate the previous core call of ops->detach_dev). In the
> modern always-attached-to-something paradigm, drivers should
> absolutely be able to figure out if an attach will fail *before*
> they throw away any existing device state.

That is fine

I think I muddled the description I gave last time.

The loop is doing this:

	for_each_group_device(group, gdev) {
		ret = __iommu_device_set_domain(group, gdev->dev, new_domain,
						flags);
		if (ret) {
			result = ret;
			/*
			 * Keep trying the other devices in the group. If a
			 * driver fails attach to an otherwise good domain, and
			 * does not support blocking domains, it should at least
			 * drop its reference on the current domain so we don't
			 * UAF.
			 */
			if (flags & IOMMU_SET_DOMAIN_MUST_SUCCEED)
				continue;
			goto err_revert;

And then in __iommu_device_set_domain:

	ret = __iommu_attach_device(new_domain, dev);
	if (ret) {
		/*
		 * If we have a blocking domain then try to attach that in hopes
		 * of avoiding a UAF. Modern drivers should implement blocking
		 * domains as global statics that cannot fail.
		 */
		if ((flags & IOMMU_SET_DOMAIN_MUST_SUCCEED) &&
		    group->blocking_domain &&
		    group->blocking_domain != new_domain)
			__iommu_attach_device(group->blocking_domain, dev);

So, for "old" drivers the ops->attach_dev will disconnect the domain while
it fails.

"new" drivers will leave the domain attached and the above will then
go on an explicitly use the blocking_domain.

So the purpose of the loop is to go over every device and detach the
domain. Either with the sketchy old driver implicit detach or the new
driver explicit blocking domain attach.

The blocking domain part isn't 100% finished yet, as
group->blocking_domain is often NULL, and might be an UNMANAGED
domain, but from a failure management perspecitive this is what I am
proposing.

> If we reach the point of adding core code that only serves to help drivers
> emulate the new API using the old API because we'd rather undermine the new
> API design than fix the drivers, then frankly that seems like an argument
> for just reverting back to the old API.

I don't see it as undermining, what a new high quality driver should
implement is:

 - Do all attach checks before changing any state. Once state changing
   happens it should succeed

 - Thus leave the old domain in place on attach failure

 - Provide a blocking domain with an attach_dev that cannot fail

 - If attach must fail and cannot go back to the old domain, it must
   go to a blocking domain

 - Re-attach of a previously attached domain should succeed

 - [missing] provide the blocking domain pointer through the device
   ops so it is always available.

 - Do not fail when attaching the blocking domain

Effectively the blocking domain is filling the error handling role
left by detach_dev. What we are doing is unmuddling the mixture of
detach_dev behaviors where some drivers set IDENTITY and some drivers
set BLOCKING.

> > The primary purpose of the MUST_SUCCEED flag is to prevent the code
> > from trying to go back to the old domain. This really should only
> > depend on the caller's context and ability to handle errors, so I
> > don't see what would make it implicit?
> 
> It could be implicit *because* it's contextual; the domain types already
> carry as much information as the flag would. Attaching to a non-core
> (unmanaged or SVA) domain type is the one case where core code has clear
> reason to unwind if an attach fails mid-group. 

It is not universal, eg the "change default domain" flow looks like this:

	/* We must set default_domain early for __iommu_device_set_domain */
	group->default_domain = dom;
	if (!group->domain) {
          ..
	} else {
		ret = __iommu_group_set_domain(group, dom);
		if (ret) {

In this case even though we are attaching the (new) default domain
what we want is to unroll back to group->domain and leave the old
default_domain in place.

I find it much clearer to directly specify the error unwind expecation
of the caller based on the caller's logic and ability to handle the
returned error than to try an infer it.

> Attaching from non-core to core should represent a repeat of something which
> previously succeeded for the given device+domain combination, so whether we
> permit that to fail-to-limbo or mandate that drivers must be able to repeat
> a previous success mostly depends on what we think is a reasonable
> expectation of how to design a driver

I think most drivers are OK to repeat, but some like s390 have FW
interfaces that are not reliable, so repeat can fail there. I would
say a drivers should strive for a design that repeat should work.

The repeat safe drivers never hit these error cases anyhow..

When thinking about this error handling I was looking at cases like
s390. For their flow the first thing attach does is a FW call to set a
blocking behavior, if they have a failure from that point they can
leave the device blocking. When the core code asks for blocking during
the error handling it is NOP.

Jason

  reply	other threads:[~2023-04-13 18:39 UTC|newest]

Thread overview: 36+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-04-05 23:38 [PATCH v3 00/17] Consolidate the error handling around device attachment Jason Gunthorpe
2023-04-05 23:38 ` [PATCH v3 01/17] iommu: Replace iommu_group_device_count() with list_count_nodes() Jason Gunthorpe
2023-04-05 23:38 ` [PATCH v3 02/17] iommu: Add for_each_group_device() Jason Gunthorpe
2023-04-05 23:38 ` [PATCH v3 03/17] iommu: Make __iommu_group_set_domain() handle error unwind Jason Gunthorpe
2023-04-06 18:08   ` Robin Murphy
2023-04-06 19:09     ` Jason Gunthorpe
2023-04-13 16:47       ` Robin Murphy
2023-04-13 18:39         ` Jason Gunthorpe [this message]
2023-04-05 23:38 ` [PATCH v3 04/17] iommu: Use __iommu_group_set_domain() for __iommu_attach_group() Jason Gunthorpe
2023-04-05 23:38 ` [PATCH v3 05/17] iommu: Use __iommu_group_set_domain() in iommu_change_dev_def_domain() Jason Gunthorpe
2023-04-05 23:38 ` [PATCH v3 06/17] iommu: Replace __iommu_group_dma_first_attach() with set_domain Jason Gunthorpe
2023-04-06 15:58   ` Robin Murphy
2023-04-06 19:17     ` Jason Gunthorpe
2023-04-05 23:38 ` [PATCH v3 07/17] iommu: Make iommu_group_do_dma_first_attach() simpler Jason Gunthorpe
2023-04-06  3:28   ` Baolu Lu
2023-04-06 15:44   ` Robin Murphy
2023-04-06 16:09     ` Jason Gunthorpe
2023-04-06 19:05       ` Robin Murphy
2023-04-05 23:38 ` [PATCH v3 08/17] iommu: Make iommu_group_do_dma_first_attach() work with owned groups Jason Gunthorpe
2023-04-06 16:37   ` Robin Murphy
2023-04-06 19:34     ` Jason Gunthorpe
2023-04-05 23:38 ` [PATCH v3 09/17] iommu: Fix iommu_probe_device() to attach the right domain Jason Gunthorpe
2023-04-05 23:38 ` [PATCH v3 10/17] iommu: Do iommu_group_create_direct_mappings() before attach Jason Gunthorpe
2023-04-06  3:14   ` Baolu Lu
2023-04-05 23:38 ` [PATCH v3 11/17] iommu: Remove the assignment of group->domain during default domain alloc Jason Gunthorpe
2023-04-05 23:38 ` [PATCH v3 12/17] iommu: Consolidate the code to calculate the target default domain type Jason Gunthorpe
2023-04-05 23:38 ` [PATCH v3 13/17] iommu: Revise iommu_group_alloc_default_domain() Jason Gunthorpe
2023-04-06  3:40   ` Baolu Lu
2023-04-06 11:34     ` Jason Gunthorpe
2023-04-05 23:38 ` [PATCH v3 14/17] iommu: Consolidate the default_domain setup to one function Jason Gunthorpe
2023-04-06  4:43   ` Baolu Lu
2023-04-05 23:38 ` [PATCH v3 15/17] iommu: Allow IOMMU_RESV_DIRECT to work on ARM Jason Gunthorpe
2023-04-06  3:53   ` Baolu Lu
2023-04-05 23:38 ` [PATCH v3 16/17] iommu: Remove __iommu_group_for_each_dev() Jason Gunthorpe
2023-04-05 23:38 ` [PATCH v3 17/17] iommu: Tidy the control flow in iommu_group_store_type() Jason Gunthorpe
2023-04-07 11:02 ` [PATCH v3 00/17] Consolidate the error handling around device attachment Naresh Kamboju

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZDhMed+RDDcvmNle@nvidia.com \
    --to=jgg@nvidia.com \
    --cc=baolu.lu@linux.intel.com \
    --cc=iommu@lists.linux.dev \
    --cc=joro@8bytes.org \
    --cc=kevin.tian@intel.com \
    --cc=llvm@lists.linux.dev \
    --cc=nathan@kernel.org \
    --cc=ndesaulniers@google.com \
    --cc=nicolinc@nvidia.com \
    --cc=ojeda@kernel.org \
    --cc=robin.murphy@arm.com \
    --cc=trix@redhat.com \
    --cc=will@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).