From: Knut Omang <knut.omang@oracle.com>
To: Benjamin Herrenschmidt <benh@kernel.crashing.org>,
Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>,
David Woodhouse <dwmw2@infradead.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"David S. Miller" <davem@davemloft.net>,
sparclinux@vger.kernel.org, Joerg Roedel <jroedel@suse.de>,
Christian Borntraeger <borntraeger@de.ibm.com>,
Cornelia Huck <cornelia.huck@de.ibm.com>,
Sebastian Ott <sebott@linux.vnet.ibm.com>,
Paolo Bonzini <pbonzini@redhat.com>,
Christoph Hellwig <hch@lst.de>, KVM <kvm@vger.kernel.org>,
Martin Schwidefsky <schwidefsky@de.ibm.com>,
linux-s390 <linux-s390@vger.kernel.org>,
Linux Virtualization <virtualization@lists.linux-foundation.org>,
"Michael S. Tsirkin" <mst@redhat.com>
Subject: Re: [PATCH v4 0/6] virtio core DMA API conversion
Date: Tue, 10 Nov 2015 10:45:14 +0100
Message-ID: <1447148714.3005.133.camel@oracle.com>
In-Reply-To: <1447121076.31884.61.camel@kernel.crashing.org>
On Tue, 2015-11-10 at 13:04 +1100, Benjamin Herrenschmidt wrote:
> On Mon, 2015-11-09 at 16:46 -0800, Andy Lutomirski wrote:
> > The problem here is that in some of the problematic cases the virtio
> > driver may not even be loaded. If someone runs an L1 guest with an
> > IOMMU-bypassing virtio device and assigns it to L2 using vfio, then
> > *boom* L1 crashes. (Same if, say, DPDK gets used, I think.)
> >
> > >
> > > The only way out of this while keeping the "platform" stuff would be
> > > to also bump some kind of version in the virtio config (or PCI
> > > header). I have no other way to differentiate between "this is an old
> > > qemu that doesn't do the 'bypass property' yet" and "this is a
> > > virtio device that doesn't bypass".
> > >
> > > Any better idea ?
> >
> > I'd suggest that, in the absence of the new DT binding, we assume that
> > any PCI device with the virtio vendor ID is passthrough on powerpc. I
> > can do this in the virtio driver, but if it's in the platform code
> > then vfio gets it right too (i.e. fails to load).
>
> The problem is there isn't *a* virtio vendor ID. It's the RedHat vendor
> ID which will be used by more than just virtio, so we need to
> specifically list the devices.
>
> Additionally, that still means that once we have a virtio device that
> actually uses the iommu, powerpc will not work since the "workaround"
> above will kick in.
>
> The "in absence of the new DT binding" doesn't make that much sense.
>
> Those platforms use device-trees defined since the dawn of ages by
> actual open firmware implementations, they either have no iommu
> representation in there (Macs, the platform code hooks it all up) or
> have various properties related to the iommu but no concept of
> "bypass" in there.
>
> We can *add* a new property under some circumstances that indicates a
> bypass on a per-device basis, however that doesn't completely solve it:
>
> - As I said above, what does the absence of that property mean ? An
> old qemu that does bypass on all virtio or a new qemu trying to tell
> you that the virtio device actually does use the iommu (or some other
> environment that isn't qemu) ?
>
> - On things like macs, the device-tree is generated by openbios, it
> would have to have some added logic to try to figure that out, which
> means it needs to know *via different means* that some or all virtio
> devices bypass the iommu.
>
> I thus go back to my original statement, it's a LOT easier to handle if
> the device itself is self describing, indicating whether it is set to
> bypass a host iommu or not. For L1->L2, well, that wouldn't be the
> first time qemu/VFIO plays tricks with the passed through device
> configuration space...
>
> Note that the above can be solved via some kind of compromise: The
> device self describes the ability to honor the iommu, along with the
> property (or ACPI table entry) that indicates whether or not it does.
>
> IE. We could use the revision or ProgIf field of the config space for
> example. Or something in virtio config. If it's an "old" device, we
> know it always bypasses. If it's a new device, we know it only bypasses
> if the corresponding property is in. I still would have to sort out the
> openbios case for mac among others but it's at least a workable
> direction.
>
> BTW. Don't you have a similar problem on x86 that today qemu claims
> that everything honors the iommu in ACPI ?
>
> Unless somebody can come up with a better idea...
Can something be done by means of PCIe capabilities?
ATS (Address Translation Services) seems like a natural choice?
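
To make the question concrete, here is a minimal sketch of what "looking
for ATS" would involve: walking the PCIe extended capability list in a
device's config space and checking for the ATS extended capability
(ID 0x000F). The function names and the idea of treating the mere
presence of the capability as the signal are assumptions for
illustration only; nothing in this thread settles whether that is the
right test.

```python
import struct

PCI_EXT_CAP_START = 0x100    # extended capabilities start at this offset
PCI_EXT_CAP_ID_ATS = 0x000F  # ATS extended capability ID


def find_ext_capability(cfg: bytes, cap_id: int):
    """Return the config-space offset of extended capability cap_id, or None.

    cfg is a raw dump of the device's PCIe config space (hypothetical input).
    """
    offset = PCI_EXT_CAP_START
    seen = set()  # guard against a malformed, looping capability list
    while offset >= PCI_EXT_CAP_START and offset + 4 <= len(cfg):
        if offset in seen:
            return None
        seen.add(offset)
        (header,) = struct.unpack_from("<I", cfg, offset)
        if header in (0, 0xFFFFFFFF):
            return None  # no (further) extended capabilities
        if header & 0xFFFF == cap_id:  # bits 15:0 hold the capability ID
            return offset
        offset = (header >> 20) & 0xFFC  # bits 31:20 hold the next offset
    return None


def device_advertises_ats(cfg: bytes) -> bool:
    """True if the config space contains an ATS extended capability."""
    return find_ext_capability(cfg, PCI_EXT_CAP_ID_ATS) is not None
```

This only answers "does the device advertise ATS"; whether a hypervisor
could use that to tell the guest that a virtio device honors the IOMMU
is exactly the open question.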
Knut
> Cheers,
> Ben.
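
The compromise Ben describes above (a self-describing device plus a
platform property) comes down to a two-step decision, sketched below.
The revision threshold and all names are hypothetical; the thread does
not fix which config-space field would carry the information or what
the property would be called.

```python
# Hypothetical: first device revision that is able to honor the IOMMU.
FIRST_IOMMU_AWARE_REVISION = 1


def bypasses_iommu(revision: int, has_bypass_property: bool) -> bool:
    """Decide whether a virtio device bypasses the host IOMMU.

    revision: value the device reports about itself (assumed field).
    has_bypass_property: whether the platform (device tree property or
    ACPI table entry) marks this device as bypassing.
    """
    if revision < FIRST_IOMMU_AWARE_REVISION:
        # "Old" device: it always bypasses, whatever the platform says.
        return True
    # "New" device: it bypasses only if the platform property is present.
    return has_bypass_property
```

With this split, an old qemu and a new qemu that simply omits the
property are no longer indistinguishable, which is the ambiguity raised
earlier in the quoted text.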
Thread overview: 38+ messages
2015-10-30 1:09 [PATCH v4 0/6] virtio core DMA API conversion Andy Lutomirski
2015-10-30 1:09 ` [PATCH v4 1/6] virtio-net: Stop doing DMA from the stack Andy Lutomirski
2015-10-30 13:55 ` Christian Borntraeger
2015-10-31 5:02 ` Andy Lutomirski
2015-10-30 1:09 ` [PATCH v4 2/6] virtio_ring: Support DMA APIs Andy Lutomirski
2015-10-30 12:01 ` Cornelia Huck
2015-10-30 12:05 ` Christian Borntraeger
2015-10-30 18:51 ` Andy Lutomirski
2015-10-30 1:09 ` [PATCH v4 3/6] virtio_pci: Use the DMA API Andy Lutomirski
2015-10-30 1:09 ` [PATCH v4 4/6] virtio: Add improved queue allocation API Andy Lutomirski
2015-10-30 1:09 ` [PATCH v4 5/6] virtio_mmio: Use the DMA API Andy Lutomirski
2015-10-30 1:09 ` [PATCH v4 6/6] virtio_pci: " Andy Lutomirski
2015-10-30 1:17 ` [PATCH v4 0/6] virtio core DMA API conversion Andy Lutomirski
2015-10-30 9:57 ` Christian Borntraeger
2015-11-09 12:15 ` Michael S. Tsirkin
2015-11-09 12:27 ` Paolo Bonzini
2015-11-09 22:58 ` Benjamin Herrenschmidt
2015-11-10 0:46 ` Andy Lutomirski
2015-11-10 2:04 ` Benjamin Herrenschmidt
2015-11-10 2:18 ` Andy Lutomirski
2015-11-10 5:26 ` Benjamin Herrenschmidt
2015-11-10 5:33 ` Andy Lutomirski
2015-11-10 5:28 ` Benjamin Herrenschmidt
2015-11-10 5:35 ` Andy Lutomirski
2015-11-10 10:37 ` Benjamin Herrenschmidt
2015-11-10 12:43 ` Michael S. Tsirkin
2015-11-10 19:37 ` Benjamin Herrenschmidt
2015-11-10 18:54 ` Andy Lutomirski
2015-11-10 22:27 ` Benjamin Herrenschmidt
2015-11-10 23:44 ` Andy Lutomirski
2015-11-11 0:44 ` Benjamin Herrenschmidt
2015-11-11 4:46 ` Andy Lutomirski
2015-11-11 5:08 ` Benjamin Herrenschmidt
2015-11-10 7:28 ` Jan Kiszka
2015-11-10 9:45 ` Knut Omang [this message]
2015-11-10 10:26 ` Benjamin Herrenschmidt
2015-11-10 10:27 ` Joerg Roedel
2015-11-10 19:36 ` Benjamin Herrenschmidt