From: Christopher Covington <cov@codeaurora.org>
To: Andy Lutomirski <luto@amacapital.net>
Cc: virtio-dev@lists.oasis-open.org,
"linux-s390@vger.kernel.org" <linux-s390@vger.kernel.org>,
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>,
"Michael S. Tsirkin" <mst@redhat.com>,
Benjamin Herrenschmidt <benh@kernel.crashing.org>,
Linux Virtualization <virtualization@lists.linux-foundation.org>,
Christian Borntraeger <borntraeger@de.ibm.com>,
Paolo Bonzini <pbonzini@redhat.com>,
"linux390@de.ibm.com" <linux390@de.ibm.com>
Subject: Re: [PATCH v4 0/4] virtio: Clean up scatterlists and use the DMA API
Date: Wed, 10 Sep 2014 11:36:51 -0400
Message-ID: <54107013.2030701@codeaurora.org>
In-Reply-To: <CALCETrX7feY6f-iVEgdxp-YPNNPctRjQr=KyByyPRgiW1xKUdA@mail.gmail.com>

On 09/04/2014 10:57 PM, Andy Lutomirski wrote:
> On Thu, Sep 4, 2014 at 7:31 PM, Rusty Russell <rusty@rustcorp.com.au> wrote:
>> Andy Lutomirski <luto@amacapital.net> writes:
>>> On Sep 2, 2014 11:53 PM, "Rusty Russell" <rusty@rustcorp.com.au> wrote:
>>>>
>>>> Andy Lutomirski <luto@amacapital.net> writes:
>>>>> There really are virtio devices that are pieces of silicon and not
>>>>> figments of a hypervisor's imagination [1].
>>>>
>>>> Hi Andy,
>>>>
>>>> As you're discovering, there's a reason no one has done the DMA
>>>> API before.
>>>>
>>>> So the problem is that ppc64's IOMMU is a platform thing, not a bus
>>>> thing. They really do carve out an exception for virtio devices,
>>>> because performance (LOTS of performance). It remains to be seen if
>>>> other platforms have the same performance issues, but in the absence of
>>>> other evidence, the answer is yes.
>>>>
>>>> It's a hack. But having specific virtual-only devices is an even
>>>> bigger hack.
>>>>
>>>> Physical virtio devices have been talked about, but don't actually exist
>>>> in Real Life. And anyone making a virtio PCI card is going to have serious
>>>> performance issues: mainly because they'll want the rings in the card's
>>>> MMIO region, not allocated by the driver. Being broken on PPC is really
>>>> the least of their problems.
>>>>
>>>> So, what do we do? It'd be nice if Linux virtio Just Worked under Xen,
>>>> though Xen's IOMMU is outside the virtio spec. Since virtio_pci can be
>>>> a module, obvious hacks like having xen_arch_setup initialize a dma_ops pointer
>>>> exposed by virtio_pci.c are out.
>>>
>>> Xen does expose dma_ops. The trick is knowing when to use it.
>>>
>>>>
>>>> I think the best approach is to have a new feature bit (25 is free),
>>>> VIRTIO_F_USE_BUS_MAPPING which indicates that a device really wants to
>>>> use the mapping for the bus it is on. A real device would set this,
>>>> or it won't work behind an IOMMU. A Xen device would also set this.
>>>
>>> The devices I care about aren't actually Xen devices. They're devices
>>> supplied by QEMU/KVM, booting a Xen hypervisor, which in turn passes
>>> the virtio device (along with every other PCI device) through to dom0.
>>> So this is exactly the same virtio device that regular x86 KVM guests
>>> would see. The reason that current code fails is that Xen guest
>>> physical addresses aren't the same as the addresses seen by the outer
>>> hypervisor.
>>>
>>> These devices don't know that physical addresses != bus addresses, so
>>> they can't advertise that fact.
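
To make the failure concrete, here is a toy model assuming a hypothetical four-entry physical-to-machine (p2m) table: the Xen guest believes a buffer lives at one physical address, while the outer hypervisor, and therefore the virtio device, sees that memory somewhere else. Only a translation of this kind, the sort of work Xen's dma_ops exist to do, yields an address the device can use. The table contents and function name are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT       12
#define PAGE_OFFSET_MASK ((UINT64_C(1) << PAGE_SHIFT) - 1)
#define INVALID_ADDR     UINT64_MAX

/* Hypothetical p2m table: guest frame number -> machine frame number. */
static const uint64_t p2m[] = { 0x80, 0x13, 0x42, 0x07 };

/* Translate a Xen guest-physical address into the address the outer
 * hypervisor (and thus the virtio device) actually sees. */
static uint64_t guest_phys_to_machine(uint64_t gpa)
{
    uint64_t pfn = gpa >> PAGE_SHIFT;
    uint64_t offset = gpa & PAGE_OFFSET_MASK;

    if (pfn >= sizeof(p2m) / sizeof(p2m[0]))
        return INVALID_ADDR; /* frame not owned by this guest */
    return (p2m[pfn] << PAGE_SHIFT) | offset;
}
```

A device handed 0x1234 directly would access machine frame 0x1, whereas the buffer really lives at 0x13234: exactly the mismatch described above.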
>>
>> Ah, I see. Then we will need a Xen-specific hack.
>>
>>> Grr. This is mostly a result of the fact that virtio_pci devices
>>> aren't really PCI devices. I still think that virtio_pci shouldn't
>>> have to worry about this; ideally this would all be handled higher up
>>> in the device hierarchy. x86 already gets this right.
>>
>> Yes. Adding a feature to say "I am a real PCI device" is possible, but
>> has other issues (particularly as Michael Tsirkin pointed out, what do
>> you do if the driver doesn't understand the feature).
>>
>>> Are there any hypervisors except PPC that use virtio_pci, have IOMMUs
>>> on the pci slot that virtio_pci lives in, and that use physical
>>> addressing? If not, I think that just quirking PPC will work (at
>>> least until someone wants IOMMU support in virtio_pci on PPC, in which
>>> case doing something using devicetree seems like a reasonable
>>> solution).
>>
>> We can either patch to make PPC weird or make Xen weird. I'm on the
>> fence.
>>
>> Two questions for Paolo:
>> 1) When QEMU supports an IOMMU on x86, will the virtio devices behind it
>> respect the IOMMU (do they use the right memory access primitives?).
>>
>> 2) Are we really going to be able to exclude virtio devices from using
>> the x86 IOMMU in a portable way which will always work? If it's
>> per-bus granularity, will qemu really put them on their own PCI bus
>> and get this right? Or will it sometimes get it wrong and users will
>> end up using virtio devices via IOMMU by accident?
>>
>> If the answers are both "yes", then x86 is going to be able to use
>> virtio+IOMMU, so PPC looks like the odd one out. Otherwise it looks
>> like we're really going to want to stick with the "ignore IOMMU" rule
>> until (handwave future), and we make an exception for Xen.
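
The two policies Rusty describes ("make PPC weird" versus "make Xen weird") can be written as a small decision function. This is a hypothetical sketch: the enum and function names are illustrative, and a real kernel would detect the platform by other means.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical platform tags, for illustration only. */
enum virtio_platform {
    VIRTIO_PLATFORM_X86,
    VIRTIO_PLATFORM_PPC64,
    VIRTIO_PLATFORM_XEN,
};

/* Option 1, "make PPC weird": use the DMA API everywhere except ppc64,
 * whose platform IOMMU is bypassed for virtio devices. */
static bool use_dma_api_ppc_quirk(enum virtio_platform p)
{
    return p != VIRTIO_PLATFORM_PPC64;
}

/* Option 2, "make Xen weird": keep the "ignore IOMMU" rule and use the
 * DMA API only under Xen, where guest-physical != bus addresses. */
static bool use_dma_api_xen_exception(enum virtio_platform p)
{
    return p == VIRTIO_PLATFORM_XEN;
}
```

Both options agree on ppc64 (no DMA API) and on Xen (DMA API); they differ only on plain x86, which is why the answers to the two questions above decide between them.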
>
> There's a third option: try to make virtio-mmio work everywhere
> (except s390), at least in the long run. This has other benefits: it
> makes minimal hypervisors simpler, I think it'll get rid of the limits
> on the number of virtio devices in a system. ARM is already going
> this direction, and I imagine that PPC support would be
> straightforward (it's already using devicetree).

In my opinion, a uniform "virt" machine for every instruction set would be
very beneficial. I would guess that MMIO is more universally available than
PCI, and as you point out, simpler to implement.

> Does virtio-mmio have any reasonable way of doing hotplug? It could
> also eventually make sense to have a standard for virtio on virtio.

I don't think so, but it seems possible. My bystander understanding is that
QEMU allocates some fixed number of virtio-mmio devices, maybe a dozen, in the
device tree. The ones that don't actually get hooked up to something real, like
a block device or network interface, are populated with a dummy device. One
naive approach might be to allow the dummy devices to tell the kernel that
they are now changing to a real device.
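
For reference, the virtio-mmio register layout makes placeholders cheap to recognise: the spec reserves a DeviceID of 0 for placeholder transports at fixed addresses, and the Linux virtio_mmio driver declines to bind to such a slot. The sketch below models a register window as an array of 32-bit words (an assumption, standing in for real readl() accesses) and adds a hypothetical rescan check along the lines of the naive hotplug idea above; nothing like that rescan exists in the specification.

```c
#include <assert.h>
#include <stdint.h>

/* Register offsets (in bytes) from the virtio-mmio transport layout. */
#define VIRTIO_MMIO_MAGIC_VALUE 0x000
#define VIRTIO_MMIO_DEVICE_ID   0x008
#define VIRTIO_MMIO_MAGIC       0x74726976u /* "virt" in little-endian */

enum slot_state { SLOT_INVALID, SLOT_PLACEHOLDER, SLOT_DEVICE };

/* Simulated register window: stands in for readl(base + offset). */
static uint32_t reg_read(const uint32_t *window, unsigned int offset)
{
    return window[offset / 4];
}

static enum slot_state probe_slot(const uint32_t *window)
{
    if (reg_read(window, VIRTIO_MMIO_MAGIC_VALUE) != VIRTIO_MMIO_MAGIC)
        return SLOT_INVALID;
    if (reg_read(window, VIRTIO_MMIO_DEVICE_ID) == 0)
        return SLOT_PLACEHOLDER; /* dummy transport with no function */
    return SLOT_DEVICE;
}

/* Hypothetical hotplug idea: a slot that was a placeholder at boot now
 * reports a real device ID, so the driver would re-probe it. */
static int slot_became_device(enum slot_state at_boot, const uint32_t *window)
{
    return at_boot == SLOT_PLACEHOLDER && probe_slot(window) == SLOT_DEVICE;
}
```

What the sketch leaves open is how the kernel learns it should rescan (an interrupt, a config-change notification, or polling), which is the part a real hotplug design would have to specify.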

Also, higher-level hotplug, at least for SCSI, sounds possible:
https://bugzilla.redhat.com/show_bug.cgi?id=1123390

Christopher
--
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by the Linux Foundation.