virtualization.lists.linux-foundation.org archive mirror
From: Christopher Covington <cov@codeaurora.org>
To: Andy Lutomirski <luto@amacapital.net>
Cc: virtio-dev@lists.oasis-open.org,
	"linux-s390@vger.kernel.org" <linux-s390@vger.kernel.org>,
	Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>,
	"Michael S. Tsirkin" <mst@redhat.com>,
	Benjamin Herrenschmidt <benh@kernel.crashing.org>,
	Linux Virtualization <virtualization@lists.linux-foundation.org>,
	Christian Borntraeger <borntraeger@de.ibm.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	"linux390@de.ibm.com" <linux390@de.ibm.com>
Subject: Re: [PATCH v4 0/4] virtio: Clean up scatterlists and use the DMA API
Date: Wed, 10 Sep 2014 11:36:51 -0400
Message-ID: <54107013.2030701@codeaurora.org>
In-Reply-To: <CALCETrX7feY6f-iVEgdxp-YPNNPctRjQr=KyByyPRgiW1xKUdA@mail.gmail.com>

On 09/04/2014 10:57 PM, Andy Lutomirski wrote:
> On Thu, Sep 4, 2014 at 7:31 PM, Rusty Russell <rusty@rustcorp.com.au> wrote:
>> Andy Lutomirski <luto@amacapital.net> writes:
>>> On Sep 2, 2014 11:53 PM, "Rusty Russell" <rusty@rustcorp.com.au> wrote:
>>>>
>>>> Andy Lutomirski <luto@amacapital.net> writes:
>>>>> There really are virtio devices that are pieces of silicon and not
>>>>> figments of a hypervisor's imagination [1].
>>>>
>>>> Hi Andy,
>>>>
>>>>         As you're discovering, there's a reason no one has done the DMA
>>>> API before.
>>>>
>>>> So the problem is that ppc64's IOMMU is a platform thing, not a bus
>>>> thing.  They really do carve out an exception for virtio devices,
>>>> because performance (LOTS of performance).  It remains to be seen if
>>>> other platforms have the same performance issues, but in absence of
>>>> other evidence, the answer is yes.
>>>>
>>>> It's a hack.  But having specific virtual-only devices is an even
>>>> bigger hack.
>>>>
>>>> Physical virtio devices have been talked about, but don't actually exist
>>>> in Real Life.  And someone building a virtio PCI card is going to have serious
>>>> performance issues: mainly because they'll want the rings in the card's
>>>> MMIO region, not allocated by the driver.  Being broken on PPC is really
>>>> the least of their problems.
>>>>
>>>> So, what do we do?  It'd be nice if Linux virtio Just Worked under Xen,
>>>> though Xen's IOMMU is outside the virtio spec.  Since virtio_pci can be
>>>> a module, obvious hacks like having xen_arch_setup initialize a dma_ops pointer
>>>> exposed by virtio_pci.c are out.
>>>
>>> Xen does expose dma_ops.  The trick is knowing when to use it.
>>>
>>>>
>>>> I think the best approach is to have a new feature bit (25 is free),
>>>> VIRTIO_F_USE_BUS_MAPPING which indicates that a device really wants to
>>>> use the mapping for the bus it is on.  A real device would set this,
>>>> or it won't work behind an IOMMU.  A Xen device would also set this.
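A minimal sketch of how a driver might act on the proposed bit, assuming a plain integer feature word and a caller-supplied `bus_map` function standing in for the DMA API. The constant name and bit number come from the proposal above; the helper names are hypothetical and nothing here is taken from the real virtio headers.

```python
# Hypothetical sketch of the proposed VIRTIO_F_USE_BUS_MAPPING feature bit.
# Bit 25 is the number suggested above; the helpers are illustrative only.
VIRTIO_F_USE_BUS_MAPPING = 25


def has_feature(features: int, bit: int) -> bool:
    """Return True if `bit` is set in the negotiated feature word."""
    return (features >> bit) & 1 == 1


def address_for_device(features: int, phys_addr: int, bus_map) -> int:
    """Pick the address to place in a vring descriptor.

    With the bit negotiated, the driver goes through the bus mapping
    (standing in for the DMA API); otherwise it hands the device the
    guest-physical address directly, as virtio drivers do today.
    """
    if has_feature(features, VIRTIO_F_USE_BUS_MAPPING):
        return bus_map(phys_addr)
    return phys_addr
```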
>>>
>>> The devices I care about aren't actually Xen devices.  They're devices
>>> supplied by QEMU/KVM, booting a Xen hypervisor, which in turn passes
>>> the virtio device (along with every other PCI device) through to dom0.
>>> So this is exactly the same virtio device that regular x86 KVM guests
>>> would see.  The reason that current code fails is that Xen guest
>>> physical addresses aren't the same as the addresses seen by the outer
>>> hypervisor.
>>>
>>> These devices don't know that physical addresses != bus addresses, so
>>> they can't advertise that fact.
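To make the address mismatch concrete, here is a toy model of a Xen PV guest translating its (pseudo-)physical addresses to machine addresses through a p2m table. The table contents and function name are made up for illustration; in reality Xen and the guest kernel maintain this mapping, and the Xen dma_ops mentioned above are what apply it.

```python
# Illustrative model only: a Xen PV guest's "physical" frames (pfns) are
# backed by arbitrary machine frames (mfns), which are what the outer
# hypervisor (QEMU/KVM here) sees.  This p2m table is made up.
PAGE_SHIFT = 12
PAGE_MASK = (1 << PAGE_SHIFT) - 1

P2M = {0x100: 0x7A3, 0x101: 0x015, 0x102: 0x9C0}


def phys_to_machine(phys_addr: int, table=P2M) -> int:
    """Translate a guest-physical address to the outer hypervisor's view."""
    pfn = phys_addr >> PAGE_SHIFT
    offset = phys_addr & PAGE_MASK
    mfn = table[pfn]  # KeyError: the guest does not own that frame
    return (mfn << PAGE_SHIFT) | offset

# A virtio driver that writes 0x101234 straight into a descriptor points
# QEMU at the wrong page: the data really lives at machine address 0x15234.
```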
>>
>> Ah, I see.  Then we will need a Xen-specific hack.
>>
>>> Grr.  This is mostly a result of the fact that virtio_pci devices
>>> aren't really PCI devices.  I still think that virtio_pci shouldn't
>>> have to worry about this; ideally this would all be handled higher up
>>> in the device hierarchy.  x86 already gets this right.
>>
>> Yes.  Adding a feature to say "I am a real PCI device" is possible, but
>> has other issues (particularly as Michael Tsirkin pointed out, what do
>> you do if the driver doesn't understand the feature).
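The problem Michael raised can be sketched with a simplified model of feature negotiation, in which the driver acknowledges only the offered bits it understands. The bit number is the proposed one from above; the feature words are illustrative, and in this model the device gets no chance to refuse the result.

```python
VIRTIO_F_USE_BUS_MAPPING = 25  # bit number proposed in this thread (hypothetical)


def negotiate(device_features: int, driver_features: int) -> int:
    """The driver acks only the offered bits it understands."""
    return device_features & driver_features


def uses_bus_mapping(features: int) -> bool:
    return bool(features >> VIRTIO_F_USE_BUS_MAPPING & 1)


# Illustrative feature words; bit 0 stands in for some unrelated feature.
DEVICE = (1 << VIRTIO_F_USE_BUS_MAPPING) | 1
NEW_DRIVER = (1 << VIRTIO_F_USE_BUS_MAPPING) | 1
OLD_DRIVER = 1  # written before the bit existed
```

An old driver silently drops the bit and carries on passing physical addresses, which is exactly the case a real device behind an IOMMU cannot survive.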
>>
>>> Are there any hypervisors except PPC that use virtio_pci, have IOMMUs
>>> on the pci slot that virtio_pci lives in, and that use physical
>>> addressing?  If not, I think that just quirking PPC will work (at
>>> least until someone wants IOMMU support in virtio_pci on PPC, in which
>>> case doing something using devicetree seems like a reasonable
>>> solution).
>>
>> We can either patch to make PPC weird or make Xen weird.  I'm on the
>> fence.
>>
>> Two questions for Paolo:
>> 1) When QEMU supports an IOMMU on x86, will the virtio devices behind it
>>    respect the IOMMU (do they use the right memory access primitives?).
>>
>> 2) Are we really going to be able to exclude virtio devices from using
>>    the x86 IOMMU in a portable way which will always work?  If it's
>>    per-bus granularity, will qemu really put them on their own PCI bus
>>    and get this right?  Or will it sometimes get it wrong and users will
>>    end up using virtio devices via IOMMU by accident?
>>
>> If the answers are both "yes", then x86 is going to be able to use
>> virtio+IOMMU, so PPC looks like the odd one out.  Otherwise it looks
>> like we're really going to want to stick with the "ignore IOMMU" rule
>> until (handwave future), and we make an exception for Xen.
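The two options can be written down as hypothetical policies; `policy`, `arch` and `xen` are made-up inputs, not anything the kernel actually checks.

```python
def use_dma_api(policy: str, arch: str, xen: bool) -> bool:
    """Return True if virtio should map buffers through the DMA API.

    'quirk_ppc': use the DMA API everywhere except ppc64, whose
                 platform IOMMU is bypassed for virtio devices.
    'quirk_xen': keep the "ignore the IOMMU" rule and make an
                 exception only for Xen guests.
    """
    if policy == "quirk_ppc":
        return arch != "ppc64"
    if policy == "quirk_xen":
        return xen
    raise ValueError(f"unknown policy: {policy}")
```

The two policies only disagree for a non-Xen x86 guest. Without an emulated IOMMU the DMA API there is roughly an identity mapping, so the choice starts to matter once QEMU emulates an x86 IOMMU, which is what the first question above is about.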
> 
> There's a third option: try to make virtio-mmio work everywhere
> (except s390), at least in the long run.  This has other benefits: it
> makes minimal hypervisors simpler, and I think it'll get rid of the limits
> on the number of virtio devices in a system.  ARM is already going
> this direction, and I imagine that PPC support would be
> straightforward (it's already using devicetree).

In my opinion, a uniform "virt" machine for every instruction set would be
very beneficial. I would guess that MMIO is more universally available than
PCI, and as you point out, simpler to implement.
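For a sense of how little a guest needs in order to discover a virtio-mmio device, here is a sketch that parses the first four registers of a window (MagicValue, Version, DeviceID and VendorID at offsets 0x000 to 0x00c, little-endian, as in the virtio-mmio transport). The register window is faked with a byte string and the function name is hypothetical.

```python
import struct

VIRTIO_MMIO_MAGIC = 0x74726976  # "virt" read as a little-endian u32


def probe_window(regs: bytes):
    """Parse the first four registers of a virtio-mmio window.

    Returns None if the magic value is wrong, otherwise a dict with the
    version, device ID and vendor ID.
    """
    magic, version, device_id, vendor_id = struct.unpack_from("<4I", regs, 0)
    if magic != VIRTIO_MMIO_MAGIC:
        return None
    return {"version": version, "device_id": device_id, "vendor_id": vendor_id}


# Fake window for a legacy-layout (version 1) block device (device ID 2);
# the vendor ID is just an example value.
blk = struct.pack("<4I", VIRTIO_MMIO_MAGIC, 1, 2, 0x554D4551)
```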

> Does virtio-mmio have any reasonable way of doing hotplug?  It could
> also eventually make sense to have a standard for virtio on virtio.

I don't think so, but it seems possible. My understanding as a bystander is
that QEMU allocates a fixed number of virtio-mmio transports, maybe a dozen, in
the device tree. Those that aren't hooked up to a real backend such as a block
device or network interface are left as dummy placeholders (they report device
ID 0, which the kernel skips at probe time). One naive approach might be to let
a placeholder tell the kernel that it has turned into a real device.
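A rough sketch of that naive scheme, under the assumption that a slot's DeviceID register could change from 0 (placeholder) to a real device ID after boot, which virtio-mmio does not currently provide for. The rescan trigger (an interrupt, a poll) is left out, and all names are hypothetical.

```python
import struct

VIRTIO_MMIO_MAGIC = 0x74726976
PLACEHOLDER_ID = 0  # device ID 0 marks an empty slot


def device_id(regs: bytes) -> int:
    """Read DeviceID (offset 0x008) after checking the magic value."""
    magic, _version, dev_id = struct.unpack_from("<3I", regs, 0)
    if magic != VIRTIO_MMIO_MAGIC:
        raise ValueError("not a virtio-mmio window")
    return dev_id


def rescan(slots, bound):
    """Return indices of slots that became real devices since the last
    scan, and record them in `bound` so they are only probed once."""
    new = []
    for i, regs in enumerate(slots):
        if i not in bound and device_id(regs) != PLACEHOLDER_ID:
            bound.add(i)
            new.append(i)
    return new


def window(dev_id: int) -> bytes:
    """Fake the first three registers of a legacy-layout window."""
    return struct.pack("<3I", VIRTIO_MMIO_MAGIC, 1, dev_id)
```

Usage: start with `slots = [window(2), window(0), window(0)]`; the first `rescan` binds slot 0, and if slot 2 later reports a network device (ID 1), the next `rescan` returns `[2]`.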

Also, higher-level hotplug, at least for SCSI, sounds possible.

https://bugzilla.redhat.com/show_bug.cgi?id=1123390

Christopher

-- 
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by the Linux Foundation.
