From: Alex Williamson <alex.williamson-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
To: Varun Sethi <Varun.Sethi-KZfg59tc24xl57MIdRCFDg@public.gmane.org>
Cc: "iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org"
<iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org>,
"linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org"
<linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>
Subject: Re: [RFC PATCH] vfio/iommu_type1: Multi-IOMMU domain support
Date: Mon, 20 Jan 2014 11:39:13 -0700
Message-ID: <1390243153.8705.228.camel@bling.home>
In-Reply-To: <f4597c60b5a04e8e8baa225a279abd0d-AZ66ij2kwaacCcN9WK45f+O6mTEJWrR4XA4E9RH9d+qIuWR1G4zioA@public.gmane.org>
On Mon, 2014-01-20 at 18:30 +0000, Varun Sethi wrote:
>
> > -----Original Message-----
> > From: Alex Williamson [mailto:alex.williamson-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org]
> > Sent: Monday, January 20, 2014 9:51 PM
> > To: Sethi Varun-B16395
> > Cc: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org; linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> > Subject: Re: [RFC PATCH] vfio/iommu_type1: Multi-IOMMU domain support
> >
> > On Mon, 2014-01-20 at 14:45 +0000, Varun Sethi wrote:
> > >
> > > > -----Original Message-----
> > > > From: Alex Williamson [mailto:alex.williamson-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org]
> > > > Sent: Saturday, January 18, 2014 2:06 AM
> > > > To: Sethi Varun-B16395
> > > > Cc: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org; linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> > > > Subject: [RFC PATCH] vfio/iommu_type1: Multi-IOMMU domain support
> > > >
> > > > RFC: This is not complete but I want to share with Varun the
> > > > direction I'm thinking about. In particular, I'm really not sure
> > > > if we want to introduce a "v2" interface version with slightly
> > > > different unmap semantics. QEMU doesn't care about the difference,
> > > > but other users might. Be warned, I'm not even sure if this code
> > > > works at the moment.
> > > > Thanks,
> > > >
> > > > Alex
> > > >
> > > >
> > > > We currently have a problem that we cannot support advanced features
> > > > of an IOMMU domain (e.g. IOMMU_CACHE), because we have no guarantee
> > > > that those features will be supported by all of the hardware units
> > > > involved with the domain over its lifetime. For instance, the Intel
> > > > VT-d architecture does not require that all DRHDs support snoop
> > > > control. If we create a domain based on a device behind a DRHD that
> > > > does support snoop control and enable SNP support via the
> > > > IOMMU_CACHE mapping option, we cannot then add a device behind a
> > > > DRHD which does not support snoop control or we'll get reserved bit
> > > > faults from the SNP bit in the pagetables. To add to the
> > > > complexity, we can't know the properties of a domain until a device
> > > > is attached.
> > > [Sethi Varun-B16395] Effectively, it's the same iommu and iommu_ops
> > > are common across all bus types. The hardware feature differences are
> > > abstracted by the driver.
> >
> > That's a simplifying assumption that is not made anywhere else in the
> > code. The IOMMU API allows entirely independent IOMMU drivers to
> > register per bus_type. There is no guarantee that all devices are backed
> > by the same IOMMU hardware unit or make use of the same iommu_ops.
> >
> [Sethi Varun-B16395] ok
>
> > > > We could pass this problem off to userspace and require that a
> > > > separate vfio container be used, but we don't know how to handle
> > > > page accounting in that case. How do we know that a page pinned in
> > > > one container is the same page as one pinned in a different
> > > > container, so that we avoid double billing the user for the page?
> > > >
> > > > The solution is therefore to support multiple IOMMU domains per
> > > > container. In the majority of cases, only one domain will be
> > > > required since hardware is typically consistent within a system.
> > > > However, this provides us the ability to validate compatibility of
> > > > domains and support mixed environments where page table flags can be
> > > > different between domains.
> > > >
> > > > To do this, our DMA tracking needs to change. We currently try to
> > > > coalesce user mappings into as few tracking entries as possible.
> > > > The problem then becomes that we lose granularity of user mappings.
> > > > We've never guaranteed that a user is able to unmap at a finer
> > > > granularity than the original mapping, but we must honor the
> > > > granularity of the original mapping. This coalescing code is
> > > > therefore removed, allowing only unmaps covering complete maps. The
> > > > change in accounting is fairly small here; a typical QEMU VM will
> > > > start out with roughly a dozen entries, so it's arguable whether this
> > > > coalescing was ever needed.
> > > >
> > > > We also move IOMMU domain creation to the point where a group is
> > > > attached to the container. An interesting side-effect of this is
> > > > that we now have access to the device at the time of domain creation
> > > > and can probe the devices within the group to determine the bus_type.
> > > > This finally makes vfio_iommu_type1 completely device/bus agnostic.
> > > > In fact, each IOMMU domain can host devices on different buses
> > > > managed by different physical IOMMUs, and present a single DMA
> > > > mapping interface to the user. When a new domain is created,
> > > > mappings are replayed to bring the IOMMU pagetables up to the state
> > > > of the current container. And of course, DMA mapping and unmapping
> > > > automatically traverse all of the configured IOMMU domains.
> > > >
> > > [Sethi Varun-B16395] This code still checks to see that devices being
> > > attached to the domain are connected to the same bus type. If we
> > > intend to merge devices from different bus types but attached to
> > > compatible domains into a single domain, why can't we avoid the bus
> > > check? Why can't we remove the bus dependency from domain allocation?
> >
> > So if I were to test iommu_ops instead of bus_type (i.e. assume that
> > if an IOMMU driver manages iommu_ops across bus_types, it can accept
> > the devices), would that satisfy your concern?
> [Sethi Varun-B16395] I think so. Checking for iommu_ops should allow iommu groups from different bus_types to share a domain.
>
> >
> > It may be possible to remove the bus_type dependency from domain
> > allocation, but the IOMMU API currently makes the assumption that there's
> > one IOMMU driver per bus_type.
> [Sethi Varun-B16395] Is that a valid assumption?
Perhaps it's really more of a requirement than an assumption.
Theoretically there is no reason we couldn't see a system with multiple
IOMMUs requiring different IOMMU drivers on the same bus_type. In
practice, we don't. We may need to change this in the future, but it's
sufficient for now.
> > Your fix to remove the bus_type
> > dependency from iommu_domain_alloc() adds an assumption that there is
> > only one IOMMU driver for all bus_types. That may work on your platform,
> > but I don't think it's a valid assumption in the general case.
> [Sethi Varun-B16395] ok
>
> > If you'd like to propose alternative ways to remove the bus_type
> > dependency, please do. Thanks,
> >
> [Sethi Varun-B16395] My main concern was to allow devices from different bus types to share the iommu domain. I am fine if this can be handled from within vfio.
Ok, I think it can. Thanks,
Alex
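
Editorial sketch, not part of the original message: the compatibility test
agreed on in this exchange (compare iommu_ops rather than bus_type when
deciding whether a device may join an existing domain) can be modelled in
plain userspace C. Everything prefixed mock_ is a hypothetical stand-in and
not the kernel's actual structures or the code from the RFC patch. The only
assumption taken from the thread is that each bus_type points at the
iommu_ops of the driver registered for it.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel objects discussed in the thread. */
struct mock_iommu_ops {
	const char *name;
};

struct mock_bus_type {
	const char *name;
	const struct mock_iommu_ops *iommu_ops;	/* driver registered for this bus */
};

struct mock_device {
	const char *name;
	const struct mock_bus_type *bus;
};

struct mock_domain {
	const struct mock_iommu_ops *ops;	/* driver that backs this domain */
};

/* Look up the IOMMU driver behind a device, or NULL if there is none. */
static const struct mock_iommu_ops *mock_dev_ops(const struct mock_device *dev)
{
	return (dev && dev->bus) ? dev->bus->iommu_ops : NULL;
}

/* Create a domain keyed on the IOMMU driver of the first attached device. */
static bool mock_domain_init(struct mock_domain *domain,
			     const struct mock_device *dev)
{
	const struct mock_iommu_ops *ops = mock_dev_ops(dev);

	if (!ops)
		return false;
	domain->ops = ops;
	return true;
}

/* Accept a device when it shares the domain's iommu_ops, whatever its bus. */
static bool mock_domain_accepts(const struct mock_domain *domain,
				const struct mock_device *dev)
{
	const struct mock_iommu_ops *ops = mock_dev_ops(dev);

	return ops != NULL && ops == domain->ops;
}

/*
 * Demo: two buses share one driver, a third bus uses another driver.
 * Bit 0 of the result is set if the cross-bus device is accepted,
 * bit 1 if the device behind the other driver is accepted.
 */
static int mock_demo(void)
{
	static const struct mock_iommu_ops drv_a = { "driver-a" };
	static const struct mock_iommu_ops drv_b = { "driver-b" };
	const struct mock_bus_type pci = { "pci", &drv_a };
	const struct mock_bus_type platform = { "platform", &drv_a };
	const struct mock_bus_type other = { "other", &drv_b };
	const struct mock_device nic = { "nic", &pci };
	const struct mock_device dma = { "dma-engine", &platform };
	const struct mock_device odd = { "odd", &other };
	struct mock_domain domain;
	int result = 0;

	if (!mock_domain_init(&domain, &nic))
		return -1;
	if (mock_domain_accepts(&domain, &dma))
		result |= 1;
	if (mock_domain_accepts(&domain, &odd))
		result |= 2;
	return result;
}
```

In this model a device on a different bus can share the domain as long as the
same driver serves both buses, while a device behind a different driver would
need a separate domain in the same container.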