From: Joerg Roedel <joro@8bytes.org>
To: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: kvm@vger.kernel.org, Anthony Liguori <anthony@codemonkey.ws>,
Alex Williamson <alex.williamson@redhat.com>,
David Gibson <dwg@au1.ibm.com>, Paul Mackerras <pmac@au1.ibm.com>,
Alexey Kardashevskiy <aik@au1.ibm.com>,
"linux-pci@vger.kernel.org" <linux-pci@vger.kernel.org>,
linuxppc-dev <linuxppc-dev@lists.ozlabs.org>
Subject: Re: kvm PCI assignment & VFIO ramblings
Date: Fri, 5 Aug 2011 15:44:46 +0200
Message-ID: <20110805134446.GC30353@8bytes.org>
In-Reply-To: <1312540958.8598.46.camel@pasglop>
On Fri, Aug 05, 2011 at 08:42:38PM +1000, Benjamin Herrenschmidt wrote:
> Right. In fact to try to clarify the problem for everybody, I think we
> can distinguish two different classes of "constraints" that can
> influence the grouping of devices:
>
> 1- Hard constraints. These are typically devices using the same RID or
> where the RID cannot be reliably guaranteed (the latter is the case with
> some PCIe-PCIX bridges which will take ownership of "some" transactions
> such as split but not all). Devices like that must be in the same
> domain. This is where PowerPC adds to what x86 does today the concept
> that the domains are pre-existing, since we use the RID for error
> isolation & MMIO segmenting as well. so we need to create those domains
> at boot time.

Domains (in the iommu sense) are created at boot time on x86 today.
Every device needs at least one domain to provide dma-mapping
functionality to its driver, so all the grouping is done at boot time
as well. This is specific to the iommu drivers today, but I think it
can be generalized.
> 2- Softer constraints. Those constraints derive from the fact that not
> applying them risks enabling the guest to create side effects outside of
> its "sandbox". To some extent, there can be "degrees" of badness between
> the various things that can cause such constraints. Examples are shared
> LSIs (since trusting DisINTx can be chancy, see earlier discussions),
> potentially any set of functions in the same device can be problematic
> due to the possibility to get backdoor access to the BARs etc...

Hmm, there is no sane way to handle such constraints safely, right? We
can either blacklist devices which are known to have such backdoors, or
we just ignore the problem.
> Now, what I derive from the discussion we've had so far, is that we need
> to find a proper fix for #1, but Alex and Avi seem to prefer that #2
> remains a matter of libvirt/user doing the right thing (basically
> keeping a loaded gun aimed at the user's foot with a very very very
> sweet trigger but heh, let's not start a flamewar here :-)
>
> So let's try to find a proper solution for #1 now, and leave #2 alone
> for the time being.

Yes, and the solution for #1 should be entirely in the kernel. The
question is how to do that. Probably the most sane way is to introduce
a concept of device ownership. The owner can either be a kernel driver
or a userspace process. Giving ownership of a device to userspace is
only possible if all devices in the same group are unbound from their
respective drivers. This is a very intrusive concept; no idea if it
has a chance of acceptance :-)

But the advantage is clearly that it allows better semantics in the
IOMMU drivers and a more stable handover of devices from host drivers
to kvm guests.
> Maybe the right option is for x86 to move toward pre-existing domains
> like powerpc does, or maybe we can just expose some kind of ID.

As I said, the domains are created at iommu driver initialization time
(usually boot time). But the groups are internal to the iommu drivers
and not visible anywhere else.
> Ah you started answering to my above questions :-)
>
> We could do what you propose. It depends what we want to do with
> domains. Practically speaking, we could make domains pre-existing (with
> the ability to group several PEs into larger domains) or we could keep
> the concepts different, possibly with the limitation that on powerpc, a
> domain == a PE.
>
> I suppose we -could- make arbitrary domains on ppc as well by making the
> various PE's iommu's in HW point to the same in-memory table, but that's
> a bit nasty in practice due to the way we manage those, and it would to
> some extent increase the risk of a failing device/driver stomping on
> another one and thus taking it down with itself. IE. isolation of errors
> is an important feature for us.

These arbitrary domains exist in the iommu-api. It would be good to
emulate them on Power too. Can't you put a PE into an isolated
error domain when something goes wrong with it? That should provide
the same isolation as before.

Which value you derive the group number from is your business :-) On
x86 it is certainly best to use the RID these devices share, together
with the PCI segment number.
Regards,
Joerg