Re: [RFC PATCH 1/3] iommu: introduce IOMMU_DOMAIN_HYP domain type for hypervisor allocation

iommu.lists.linux-foundation.org archive mirror
 help / color / mirror / Atom feed

From: Alex Williamson <alex.williamson-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
To: Will Deacon <will.deacon-5wv7dgnIgG8@public.gmane.org>
Cc: "iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org"
	<iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org>
Subject: Re: [RFC PATCH 1/3] iommu: introduce IOMMU_DOMAIN_HYP domain type for hypervisor allocation
Date: Tue, 01 Jul 2014 13:28:11 -0600	[thread overview]
Message-ID: <1404242891.3225.144.camel@ul30vt.home> (raw)
In-Reply-To: <20140701180426.GX28164-5wv7dgnIgG8@public.gmane.org>

On Tue, 2014-07-01 at 19:04 +0100, Will Deacon wrote:
> Hi Alex,
> 
> Thanks for having a look.
> 
> On Tue, Jul 01, 2014 at 06:42:33PM +0100, Alex Williamson wrote:
> > On Tue, 2014-07-01 at 17:10 +0100, Will Deacon wrote:
> > > Some IOMMUs, such as the ARM SMMU, support two stages of translation.
> > > The idea behind such a scheme is to allow a guest operating system to
> > > use the IOMMU for DMA mappings in the first stage of translation, with
> > > the hypervisor then installing mappings in the second stage to provide
> > > isolation of the DMA to the physical range assigned to that virtual
> > > machine.
> > > 
> > > In order to allow IOMMU domains to be allocated for second-stage
> > > translation, this patch extends iommu_domain_alloc (and the associated
> > > ->domain_init callback on struct iommu) to take a type parameter
> > > indicating the intended purpose for the domain. The only supported types
> > > at present are IOMMU_DOMAIN_DMA (i.e. what we have at the moment) and
> > > IOMMU_DOMAIN_HYP, which instructs the backend driver to allocate and
> > > initialise a second-stage domain, if possible.
> > > 
> > > All IOMMU drivers are updated to take the new type parameter, but it is
> > > ignored at present. All callers of iommu_domain_alloc are also updated
> > > to pass IOMMU_DOMAIN_DMA as the type parameter, apart from
> > > kvm_iommu_map_guest, which passes the new IOMMU_DOMAIN_HYP flag.
> > > 
> > > Finally, a new IOMMU capability, IOMMU_CAP_HYP_MAPPING, is added so that
> > > it is possible to check whether or not a domain is able to make use of
> > > nested translation.
> > 
> > Why is this necessarily related to HYPervisor use?  It seems like this
> > new domain type is effectively just a normal domain that supports some
> > sort of fault handler that can call out to attempt to create missing
> > mappings.
> 
> Not quite. The idea of this domain is that it provides isolation for a
> guest, so I'd actually expect these domains to contain pinned mappings most
> of the time (handling guest faults in the hypervisor is pretty error-prone).

Hmm, that's exactly what an existing IOMMU domain does and how we use it
through QEMU and VFIO today.

> Perhaps if I explain how the ARM SMMU works, that might help (and if it
> doesn't, please reply saying so :). The ARM SMMU supports two stages of
> translation:
> 
>   Stage-1: Guest VA (VA) -> Guest PA (IPA, or intermediate physical address)
>   Stage-2: IPA -> Host Physical Address (PA)
> 
> These can be glued together to form nested translation, where an incoming VA
> is translated through both stages to get a PA. Page table walks triggered at
> stage-1 expect to see IPAs for the table addresses.
> 
> An important thing to note here is that the hardware is configured
> differently at each stage; the page table formats themselves are slightly
> different (e.g. restricted permissions at stage-2) and certain hardware
> contexts are only capable of stage-2 translation.
> 
> The way this is supposed to work is that the KVM host would install the VFIO
> DMA mapping (ok, now I see why you don't like the name) at stage-2. This
> allows the SMMU driver to allocate a corresponding stage-1 context for the
> mapping and expose that directly to the guest as part of a virtual, stage-1-only
> SMMU. Then the guest can install its own SMMU mappings at stage-1 for
> contiguous DMA (in the guest VA space) without any knowledge of the hypervisor
> mapping.
> 
> To do this successfully, we need to communicate the intention of the mapping
> to the SMMU driver (i.e. stage-1 vs stage-2) at domain initialisation time.
> I could just add ARM SMMU-specific properties there, but I thought this
> might potentially be useful to others.

If I understand, this is just nesting or two-dimensional paging support
and you need the lowest level translation to be setup differently
because the context entries for a stage-1 vs stage-2 page table are
different, right?  Is it then the case that you can also support a
traditional one-dimensional configuration where the SMMU is not exposed
and you'll likely have some QEMU switch to configure which will be used
for a given guest?

The code in 3/3 identifies this as a nested capable domain, so I'm not
sure why we'd want to make the assumption that it's for hypervisor use
when a hypervisor can use either type.

> > IOMMUs supporting PCI PRI (Page Request Interface) could
> > potentially make use of something like that on bare metal or under
> > hypervisor control.  If that's true, then could this be some sort of
> > iommu_domain_set_l2_handler() that happens after the domain is
> > allocated?
> 
> I'm not sure that's what I was aiming for... see above.
> 
> > For this patch, I don't understand why legacy KVM assignment would
> > allocate a HYP domain while VFIO would use a DMA domain.  It seems like
> > you're just counting on x86 never making the distinction between the
> > two.
> 
> That's true, but I was also trying to indicate the intention of the mapping
> so that other IOMMUs could potentially make use of the flags.

I find that defining an API on intentions is rarely a good idea.  Stick
with "what does this actually do".  In this case, I think we want an
interface that either specifies that the domain being allocated is
nesting capable or we want to be able to set a nesting attribute on a
domain, which may be possible after it's been allocated for a small
window before other operations have been done.

If we go the route of defining it at allocation time, I'd probably think
about adding a new interface, like:

iommu_domain_alloc_with_features(struct bus_type *bus, u32 features)

Where features is a bitmap

#define IOMMU_DOMAIN_FEATURE_NESTING (1U << 0)

iommu_domain_alloc(struct bus_type *bus) would then just be a wrapper
with features = 0.  If an IOMMU driver doesn't support a requested
feature, it should fail so we don't have cases like KVM requesting a
"HYP" domain when it doesn't need it and the IOMMU drivers don't support
it.

It would be a lot less intrusive if we could use iommu_domain_set_attr()
to enable nesting after allocation though.  This could return error if
called after the point at which it can be easily enabled.

> > > --- a/include/linux/iommu.h
> > > +++ b/include/linux/iommu.h
> > > @@ -49,6 +49,10 @@ struct iommu_domain_geometry {
> > >  	bool force_aperture;       /* DMA only allowed in mappable range? */
> > >  };
> > >  
> > > +/* iommu domain types */
> > > +#define IOMMU_DOMAIN_DMA	0x0
> > > +#define IOMMU_DOMAIN_HYP	0x1
> > > +
> > >  struct iommu_domain {
> > >  	struct iommu_ops *ops;
> > >  	void *priv;
> > > @@ -59,6 +63,7 @@ struct iommu_domain {
> > >  
> > >  #define IOMMU_CAP_CACHE_COHERENCY	0x1
> > >  #define IOMMU_CAP_INTR_REMAP		0x2	/* isolates device intrs */
> > > +#define IOMMU_CAP_HYP_MAPPING		0x3	/* isolates guest DMA */
> > 
> > This makes no sense, it's exactly what we do with a "DMA" domain.  I
> > think the code needs to focus on what is really different about this
> > domain, not what is the expected use case.  Thanks,
> 
> The use-case is certainly relevant, though. I can do device passthrough
> with a stage-1 mapping for example, but you wouldn't then be able to
> instantiate a virtual SMMU interface in the guest.

I think you've switched the definition of stage-1 here to what we
consider a normal IOMMU domain mapping today, so I'm going to suggest
that stage-1 vs stage-2 where which is which depends on the type of
domain allocated is way too confusing.

> I could rename these IOMMU_CAP_STAGE{1,2}, but that then sounds very
> ARM-specific. What do you think?

Very much so, IOMMU_CAP_NESTING, DOMAIN_ATTR_NESTING?  Thanks,

Alex

next prev parent reply	other threads:[~2014-07-01 19:28 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-07-01 16:10 [RFC PATCH 1/3] iommu: introduce IOMMU_DOMAIN_HYP domain type for hypervisor allocation Will Deacon
     [not found] ` <1404231017-10856-1-git-send-email-will.deacon-5wv7dgnIgG8@public.gmane.org>
2014-07-01 16:10   ` [RFC PATCH 2/3] vfio/iommu_type1: add new VFIO_TYPE1_HYP_IOMMU IOMMU type Will Deacon
2014-07-01 16:10   ` [RFC PATCH 3/3] iommu/arm-smmu: add support for IOMMU_DOMAIN_HYP flag Will Deacon
2014-07-01 17:42   ` [RFC PATCH 1/3] iommu: introduce IOMMU_DOMAIN_HYP domain type for hypervisor allocation Alex Williamson
     [not found]     ` <1404236553.3225.93.camel-85EaTFmN5p//9pzu0YdTqQ@public.gmane.org>
2014-07-01 18:04       ` Will Deacon
     [not found]         ` <20140701180426.GX28164-5wv7dgnIgG8@public.gmane.org>
2014-07-01 19:28           ` Alex Williamson [this message]
     [not found]             ` <1404242891.3225.144.camel-85EaTFmN5p//9pzu0YdTqQ@public.gmane.org>
2014-07-02 10:49               ` Will Deacon
     [not found]                 ` <20140702104902.GH18731-5wv7dgnIgG8@public.gmane.org>
2014-07-02 13:57                   ` Will Deacon
     [not found]                     ` <20140702135742.GC24879-5wv7dgnIgG8@public.gmane.org>
2014-07-02 15:04                       ` Alex Williamson
     [not found]                         ` <1404313455.1862.34.camel-85EaTFmN5p//9pzu0YdTqQ@public.gmane.org>
2014-07-02 18:57                           ` Will Deacon

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1404242891.3225.144.camel@ul30vt.home \
    --to=alex.williamson-h+wxahxf7alqt0dzr+alfa@public.gmane.org \
    --cc=iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org \
    --cc=will.deacon-5wv7dgnIgG8@public.gmane.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).