virtualization.lists.linux-foundation.org archive mirror
 help / color / mirror / Atom feed
From: Anthony Liguori <aliguori@us.ibm.com>
To: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Peter Maydell <peter.maydell@linaro.org>,
	kvm@vger.kernel.org, virtualization@lists.linux-foundation.org,
	Stefan Hajnoczi <stefanha@redhat.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	KONRAD Frederic <fred.konrad@greensocs.com>
Subject: Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR
Date: Wed, 05 Jun 2013 15:42:57 -0500	[thread overview]
Message-ID: <87k3m8qofi.fsf@codemonkey.ws> (raw)
In-Reply-To: <20130605194317.GA30923@redhat.com>

"Michael S. Tsirkin" <mst@redhat.com> writes:

> On Wed, Jun 05, 2013 at 01:57:16PM -0500, Anthony Liguori wrote:
>> "Michael S. Tsirkin" <mst@redhat.com> writes:
>> 
>> > On Wed, Jun 05, 2013 at 10:46:15AM -0500, Anthony Liguori wrote:
>> >> Look, it's very simple.
>> > We only need to do it if we do a change that breaks guests.
>> >
>> > Please find a guest that is broken by the patches. You won't find any.
>> 
>> I think the problem in this whole discussion is that we're talking past
>> each other.
>> 
>> Here is my understanding:
>> 
>> 1) PCI-e says that you must be able to disable IO bars and still have a
>> functioning device.
>> 
>> 2) It says (1) because you must size IO bars to 4096 which means that
>> practically speaking, once you enable a dozen or so PIO bars, you run
>> out of PIO space (16 * 4k == 64k and not all that space can be used).
>
>
> Let me add 3 other issues which I mentioned and you seem to miss:
>
> 3) architectures which don't have fast access to IO ports, exist
>    virtio does not work there ATM

Which architectures have PCI but no IO ports?

> 4) setups with many PCI bridges exist and have the same issue
>    as PCI express. virtio does not work there ATM

This is not virtio specific.  This is true for all devices that use IO.

> 5) On x86, even with nested page tables, firmware only decodes
>    the page address on an invalid PTE, not the data. You need to
>    emulate the guest to get at the data. Without
>    nested page tables, we have to do page table walk and emulate
>    to get both address and data. Since this is how MMIO
>    is implemented in kvm on x86, MMIO is much slower than PIO
>    (with nested page tables by a factor of >2, did not test without).

Am well aware of this, this is why we use PIO.

I fully agree with you that when we do MMIO, we should switch the
notification mechanism to avoid encoding anything meaningful as data.

>> virtio-pci uses a IO bars exclusively today.  Existing guest drivers
>> assume that there is an IO bar that contains the virtio-pci registers.
>> So let's consider the following scenarios:
>> 
>> QEMU of today:
>> 
>> 1) qemu -drive file=ubuntu-13.04.img,if=virtio
>> 
>> This works today.  Does adding an MMIO bar at BAR1 break this?
>> Certainly not if the device is behind a PCI bus...
>> 
>> But are we going to put devices behind a PCI-e bus by default?  Are we
>> going to ask the user to choose whether devices are put behind a legacy
>> bus or the express bus?
>> 
>> What happens if we put the device behind a PCI-e bus by default?  Well,
>> it can still work.  That is, until we do something like this:
>> 
>> 2) qemu -drive file=ubuntu-13.04.img,if=virtio -device virtio-rng
>>         -device virtio-balloon..
>> 
>> Such that we have more than a dozen or so devices.  This works
>> perfectly fine today.  It works fine because we've designed virtio to
>> make sure it works fine.  Quoting the spec:
>> 
>> "Configuration space is generally used for rarely-changing or
>>  initialization-time parameters. But it is a limited resource, so it
>>  might be better to use a virtqueue to update configuration information
>>  (the network device does this for filtering, otherwise the table in the
>>  config space could potentially be very large)."
>> 
>> In fact, we can have 100s of PCI devices today without running out of IO
>> space because we're so careful about this.
>> 
>> So if we switch to using PCI-e by default *and* we keep virtio-pci
>> without modifying the device IDs, then very frequently we are going to
>> break existing guests because the drivers they already have no longer
>> work.
>> 
>> A few virtio-serial channels, a few block devices, a couple of network
>> adapters, the balloon and RNG driver, and we hit the IO space limit
>> pretty damn quickly so this is not a contrived scenario at all.  I would
>> expect that we frequently run into this if we don't address this problem.
>> 
>> So we have a few options:
>> 1) Punt all of this complexity to libvirt et al and watch people make
>>    the wrong decisions about when to use PCI-e.  This will become yet
>>    another example of KVM being too hard to configure.
>> 
>> 2) Enable PCI-e by default and just force people to upgrade their
>>    drivers.
>> 
>> 3) Don't use PCI-e by default but still add BAR1 to virtio-pci
>> 
>> 4) Do virtio-pcie, make it PCI-e friendly (drop the IO BAR completely),
>
> We can't do this - it will hurt performance.

Can you explain?  I thought the whole trick with separating out the
virtqueue notification register was to regain the performance?

>>    give
>>    it a new device/vendor ID.   Continue to use virtio-pci for existing
>>    devices potentially adding virtio-{net,blk,...}-pcie variants for
>>    people that care to use them.
>> 
>> I think 1 == 2 == 3 and I view 2 as an ABI breaker.
>
> Why do you think 2 == 3? 2 changes default behaviour. 3 does not.

It doesn't change the default behavior but then we're pushing the
decision of when to use pci-e to the user.  They have to understand that
there can be subtle breakages because the virtio-pci driver may not work
if they are using an old guest.

>> libvirt does like
>> policy so they're going to make a simple decision and always use the
>> same bus by default.  I suspect if we made PCI the default, they might
>> just always set the PCI-e flag just because.
>
> This sounds very strange. But let's assume you are right for
> the sake of the argument ...
>
>> There are hundreds of thousands if not millions of guests with existing
>> virtio-pci drivers.  Forcing them to upgrade better have an extremely
>> good justification.
>> 
>> I think 4 is the best path forward.  It's better for users (guests
>> continue to work as they always have).  There's less confusion about
>> enabling PCI-e support--you must ask for the virtio-pcie variant and you
>> must have a virtio-pcie driver.  It's easy to explain.
>
> I don't think how this changes the situation. libvirt still need
> to set policy and decide which device to use.

But virtio-pcie never exhausts the IO configuration space.  That's the
difference.

And virtio-pcie is a separate driver so presumably libvirt will make
that visible in the XML.  In fact, it should.

>> It also maps to what regular hardware does.  I highly doubt that there
>> are any real PCI cards that made the shift from PCI to PCI-e without
>> bumping at least a revision ID.
>
> Only because the chance it's 100% compatible on the software level is 0.
> It always has some hardware specific quirks.
> No such excuse here.
>
>> It also means we don't need to play games about sometimes enabling IO
>> bars and sometimes not.
>
> This last paragraph is wrong, it ignores the issues 3) to 5) 
> I added above.
>
> If you do take them into account:
> 	- there are reasons to add MMIO BAR to PCI,
> 	  even without PCI express

So far, the only reason you've provided is "it doesn't work on some
architectures."  Which architectures?

> 	- we won't be able to drop IO BAR from virtio

An IO BAR is useless if it means we can't have more than 12 devices.

>
>> Regards,
>> 
>> Anthony Liguori
>> 
>> >
>> >
>> > -- 
>> > MST
>> > --
>> > To unsubscribe from this list: send the line "unsubscribe kvm" in
>> > the body of a message to majordomo@vger.kernel.org
>> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

  parent reply	other threads:[~2013-06-05 20:42 UTC|newest]

Thread overview: 95+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-05-28 16:03 [PATCH RFC] virtio-pci: new config layout: using memory BAR Michael S. Tsirkin
2013-05-28 17:15 ` Anthony Liguori
     [not found] ` <87bo7vvxej.fsf@codemonkey.ws>
2013-05-28 17:32   ` Michael S. Tsirkin
2013-05-28 17:43     ` Paolo Bonzini
2013-05-29  2:02       ` Laszlo Ersek
2013-05-29  4:33       ` Rusty Russell
     [not found]       ` <87mwremmm8.fsf@rustcorp.com.au>
2013-05-29  7:27         ` Paolo Bonzini
2013-05-29  8:05           ` Michael S. Tsirkin
2013-05-29 10:07           ` Laszlo Ersek
2013-05-28 18:53     ` Anthony Liguori
2013-05-28 19:27       ` Michael S. Tsirkin
2013-05-29  4:31   ` Rusty Russell
2013-05-29  8:24     ` Michael S. Tsirkin
2013-05-29  8:52       ` Paolo Bonzini
2013-05-29  9:00       ` Peter Maydell
2013-05-29 10:08         ` Michael S. Tsirkin
2013-05-29 10:53           ` Peter Maydell
2013-05-29 12:16             ` Michael S. Tsirkin
2013-05-29 12:28               ` Paolo Bonzini
2013-05-29 12:37                 ` Michael S. Tsirkin
2013-05-29 12:52     ` Anthony Liguori
2013-05-29 13:24       ` Michael S. Tsirkin
2013-05-29 13:35         ` Peter Maydell
2013-05-29 13:41         ` Paolo Bonzini
2013-05-29 14:02           ` Michael S. Tsirkin
2013-05-29 14:18           ` Anthony Liguori
2013-05-30  7:43           ` Michael S. Tsirkin
2013-05-29 14:16         ` Anthony Liguori
     [not found]         ` <8761y1q3aw.fsf@codemonkey.ws>
2013-05-29 14:30           ` Michael S. Tsirkin
2013-05-29 14:32             ` Paolo Bonzini
2013-05-29 14:52               ` Michael S. Tsirkin
2013-05-29 14:55             ` Anthony Liguori
     [not found]             ` <87k3mhkf7o.fsf@codemonkey.ws>
2013-05-29 16:12               ` Michael S. Tsirkin
2013-05-29 18:16               ` Michael S. Tsirkin
2013-05-30  3:58       ` Rusty Russell
2013-05-30  5:55         ` Michael S. Tsirkin
2013-05-30  7:55         ` Michael S. Tsirkin
2013-06-03  0:17           ` Rusty Russell
2013-05-30 13:53         ` Anthony Liguori
2013-05-30 14:01           ` Michael S. Tsirkin
2013-06-03  0:26             ` Rusty Russell
2013-06-03 10:11               ` Michael S. Tsirkin
2013-06-04  5:31                 ` Rusty Russell
2013-06-04  6:42                   ` Michael S. Tsirkin
2013-06-05  7:19                     ` Rusty Russell
2013-06-05 10:22                       ` Michael S. Tsirkin
2013-06-05 12:59                     ` Anthony Liguori
2013-06-05 14:09                       ` Michael S. Tsirkin
2013-06-05 15:08                         ` Anthony Liguori
2013-06-05 15:19                           ` Michael S. Tsirkin
2013-06-05 15:46                             ` Anthony Liguori
     [not found]                             ` <87bo7ktvaw.fsf@codemonkey.ws>
2013-06-05 16:20                               ` Michael S. Tsirkin
2013-06-05 18:57                                 ` Anthony Liguori
2013-06-05 19:43                                   ` Michael S. Tsirkin
2013-06-05 19:52                                     ` Michael S. Tsirkin
2013-06-05 20:45                                       ` Anthony Liguori
2013-06-05 21:15                                         ` H. Peter Anvin
2013-06-05 21:15                                         ` Michael S. Tsirkin
2013-06-05 20:42                                     ` Anthony Liguori [this message]
2013-06-05 21:14                                       ` Michael S. Tsirkin
2013-06-05 21:53                                         ` Anthony Liguori
     [not found]                                         ` <87d2s0mdh8.fsf@codemonkey.ws>
2013-06-05 22:19                                           ` Benjamin Herrenschmidt
2013-06-05 22:53                                             ` Anthony Liguori
2013-06-05 23:27                                               ` Benjamin Herrenschmidt
2013-06-05 19:54                                   ` Michael S. Tsirkin
2013-06-06  3:42                                   ` Rusty Russell
2013-06-06 14:59                                     ` Anthony Liguori
2013-06-07  1:58                                       ` Rusty Russell
2013-06-07  8:25                                       ` Peter Maydell
2013-06-05 21:10                                 ` H. Peter Anvin
2013-06-05 21:17                                   ` Michael S. Tsirkin
2013-06-05 21:50                                   ` Anthony Liguori
2013-06-05 21:55                                     ` H. Peter Anvin
2013-06-05 22:08                                       ` Anthony Liguori
2013-06-05 23:07                                         ` H. Peter Anvin
2013-06-06  0:41                                           ` Anthony Liguori
2013-06-06  6:34                                             ` Gleb Natapov
2013-06-06 13:53                                               ` H. Peter Anvin
2013-06-06 15:02                                               ` Anthony Liguori
2013-06-06 15:06                                               ` Gerd Hoffmann
2013-06-06 15:10                                                 ` Gleb Natapov
2013-06-06 15:19                                                   ` H. Peter Anvin
2013-06-06 15:22                                                   ` Gerd Hoffmann
2013-07-08  4:25                                                   ` Kevin O'Connor
     [not found]                                               ` <871u8fp9jd.fsf@codemonkey.ws>
2013-06-07 11:30                                                 ` Gleb Natapov
2013-06-11  7:10                                                 ` Michael S. Tsirkin
2013-06-11  7:53                                                   ` Gleb Natapov
2013-06-11  8:02                                                     ` Michael S. Tsirkin
2013-06-11  8:03                                                       ` Gleb Natapov
2013-06-11  8:19                                                         ` Michael S. Tsirkin
2013-06-11  8:22                                                           ` Gleb Natapov
2013-06-11  8:30                                                             ` Michael S. Tsirkin
2013-06-11  8:32                                                               ` Gleb Natapov
2013-06-11  8:04                                                       ` Michael S. Tsirkin
2013-06-06  8:02                   ` Michael S. Tsirkin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87k3m8qofi.fsf@codemonkey.ws \
    --to=aliguori@us.ibm.com \
    --cc=fred.konrad@greensocs.com \
    --cc=kvm@vger.kernel.org \
    --cc=mst@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=peter.maydell@linaro.org \
    --cc=stefanha@redhat.com \
    --cc=virtualization@lists.linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).