* Re: virtio PCI on KVM without IO BARs
2013-02-28 15:24 virtio PCI on KVM without IO BARs Michael S. Tsirkin
@ 2013-02-28 15:43 ` Jan Kiszka
2013-03-04 22:01 ` Marcelo Tosatti
` (2 subsequent siblings)
3 siblings, 0 replies; 10+ messages in thread
From: Jan Kiszka @ 2013-02-28 15:43 UTC (permalink / raw)
To: Michael S. Tsirkin; +Cc: kvm, virtualization
On 2013-02-28 16:24, Michael S. Tsirkin wrote:
> Another problem with PIO is support for physical virtio devices,
> and nested virt: KVM currently programs all PIO accesses
> to cause vm exit, so using this device in a VM will be slow.
Not answering your question, but support for programming direct PIO
access into KVM's I/O bitmap would be feasible. Such a feature may have
some value for assigned devices that use PIO more heavily; so far they
cause lengthy user-space exits.
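For reference, the VMX I/O bitmaps are one bit per port (bitmap A covers
ports 0x0000-0x7fff, bitmap B covers 0x8000-0xffff), and a clear bit lets
the access through without an exit. A minimal sketch of the bitmap
manipulation such a feature would need -- illustrative only, not existing
KVM code:

#include <stdint.h>
#include <string.h>

struct io_bitmaps {
        uint8_t a[4096];        /* ports 0x0000 - 0x7fff */
        uint8_t b[4096];        /* ports 0x8000 - 0xffff */
};

void io_bitmap_init(struct io_bitmaps *bm)
{
        /* all bits set: every port access exits (current KVM behaviour) */
        memset(bm, 0xff, sizeof(*bm));
}

void io_bitmap_allow(struct io_bitmaps *bm, uint16_t port, uint16_t len)
{
        /* clear the bits for [port, port+len) so the guest accesses the
         * ports directly, without a VM exit */
        for (uint32_t p = port; p < (uint32_t)port + len; p++) {
                uint8_t *map = (p < 0x8000) ? bm->a : bm->b;

                map[(p & 0x7fff) / 8] &= (uint8_t)~(1 << (p % 8));
        }
}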
Jan
--
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux
* Re: virtio PCI on KVM without IO BARs
2013-02-28 15:24 virtio PCI on KVM without IO BARs Michael S. Tsirkin
2013-02-28 15:43 ` Jan Kiszka
@ 2013-03-04 22:01 ` Marcelo Tosatti
2013-03-06 0:05 ` H. Peter Anvin
2013-04-29 14:48 ` Don Dutile
3 siblings, 0 replies; 10+ messages in thread
From: Marcelo Tosatti @ 2013-03-04 22:01 UTC (permalink / raw)
To: Michael S. Tsirkin; +Cc: kvm, virtualization
On Thu, Feb 28, 2013 at 05:24:33PM +0200, Michael S. Tsirkin wrote:
> OK we talked about this a while ago, here's
> a summary and some proposals:
> At the moment, virtio PCI uses IO BARs for all accesses.
>
> The reason for IO use is the cost of different VM exit types
> of transactions and their emulation on KVM on x86
> (it would be trivial to use memory BARs on non x86 platforms
> if they don't have PIO).
> Example benchmark (cycles per transaction):
> (io access) outw 1737
> (memory access) movw 4341
> for comparison:
> (hypercall access): vmcall 1566
> (pv memory access) movw_fast 1817 (*explanation of what this is below)
>
> This creates a problem if we want to make virtio devices
> proper PCI express devices with native hotplug support.
> This is because each hotpluggable PCI express device always has
> a PCI express port (port per device),
> where each port is represented by a PCI to PCI bridge.
> In turn, a PCI to PCI bridge claims a 4Kbyte aligned
> range of IO addresses. This means that we can have at
> most 15 such devices, which is a nasty limitation.
>
> Another problem with PIO is support for physical virtio devices,
> and nested virt: KVM currently programs all PIO accesses
> to cause vm exit, so using this device in a VM will be slow.
>
> So we really want to stop using IO BARs completely if at all possible,
> but looking at the table above, switching to memory BAR and movw for
> notifications will not work well.
>
> Possible solutions:
> 1. hypercall instead of PIO
> basically add a hypercall that gets an MMIO address/data
> and does an MMIO write for us.
> We'll want some capability in the device to let guest know
> this is what it should do.
> Pros: even faster than PIO
> Cons: this won't help nested or assigned devices (won't hurt
> them either as it will be conditional on the capability above).
> Cons: need host kernel support, which then has to be maintained
> forever, even if Intel speeds up MMIO exits.
>
> 2. pv memory access
> There are two reasons that memory access is slower:
> - one is that it's handled as an EPT misconfiguration error
> so it is handled by the cpu slow path
> - one is that we need to decode the x86 instruction in
> software, to calculate address/data for the access.
>
> We could agree that guests would use a specific instruction
> for virtio accesses, and fast-path it specifically.
> This is the pv memory access option above.
> Pros: helps assigned devices and nested virt
> Pros: easy to drop if hardware support is there
> Cons: a bit slower than IO
> Cons: need host kernel support
>
> 3. hypervisor assigned IO address
> qemu can reserve IO addresses and assign to virtio devices.
> 2 bytes per device (for notification and ISR access) will be
> enough. So we can reserve 4K and this gets us 2000 devices.
> From KVM perspective, nothing changes.
> We'll want some capability in the device to let guest know
> this is what it should do, and pass the io address.
> One way to reserve the addresses is by using the bridge.
> Pros: no need for host kernel support
> Pros: regular PIO so fast
> Cons: does not help assigned devices, breaks nested virt
>
> Simply counting pros/cons, option 3 seems best. It's also the
> easiest to implement.
Agree.
* Re: virtio PCI on KVM without IO BARs
2013-02-28 15:24 virtio PCI on KVM without IO BARs Michael S. Tsirkin
2013-02-28 15:43 ` Jan Kiszka
2013-03-04 22:01 ` Marcelo Tosatti
@ 2013-03-06 0:05 ` H. Peter Anvin
2013-03-06 7:14 ` H. Peter Anvin
2013-04-29 14:48 ` Don Dutile
3 siblings, 1 reply; 10+ messages in thread
From: H. Peter Anvin @ 2013-03-06 0:05 UTC (permalink / raw)
To: virtualization; +Cc: Jan Kiszka, KVM list, Michael S. Tsirkin
On 02/28/2013 07:24 AM, Michael S. Tsirkin wrote:
>
> 3. hypervisor assigned IO address
> qemu can reserve IO addresses and assign to virtio devices.
> 2 bytes per device (for notification and ISR access) will be
> enough. So we can reserve 4K and this gets us 2000 devices.
> From KVM perspective, nothing changes.
> We'll want some capability in the device to let guest know
> this is what it should do, and pass the io address.
> One way to reserve the addresses is by using the bridge.
> Pros: no need for host kernel support
> Pros: regular PIO so fast
> Cons: does not help assigned devices, breaks nested virt
>
> Simply counting pros/cons, option 3 seems best. It's also the
> easiest to implement.
>
The problem here is the 4K I/O window for IO device BARs in bridges.
Why not simply add a (possibly proprietary) capability to the PCI bridge
to allow a much narrower window? That fits much more nicely into the
device resource assignment on the guest side, and could even be
implemented on a real hardware device -- we can offer it to the PCI-SIG
for standardization, even.
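Purely as an illustration of what such a capability could carry -- the
layout and field names below are made up, only the vendor-specific
capability ID (0x09) is standard:

#include <stdint.h>

#define PCI_CAP_ID_VNDR 0x09    /* standard vendor-specific capability ID */

/* hypothetical fine-grained I/O window capability for a bridge */
struct pci_bridge_fine_io_cap {
        uint8_t  cap_id;        /* PCI_CAP_ID_VNDR */
        uint8_t  cap_next;      /* offset of next capability */
        uint8_t  cap_len;       /* length of this capability */
        uint8_t  reserved;
        uint32_t io_base;       /* byte-granular I/O window base (made up) */
        uint32_t io_limit;      /* byte-granular I/O window limit (made up) */
};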
-hpa
--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.
* Re: virtio PCI on KVM without IO BARs
2013-03-06 0:05 ` H. Peter Anvin
@ 2013-03-06 7:14 ` H. Peter Anvin
2013-03-06 9:21 ` Michael S. Tsirkin
0 siblings, 1 reply; 10+ messages in thread
From: H. Peter Anvin @ 2013-03-06 7:14 UTC (permalink / raw)
To: virtualization; +Cc: Jan Kiszka, KVM list, Michael S. Tsirkin
On 03/05/2013 04:05 PM, H. Peter Anvin wrote:
> On 02/28/2013 07:24 AM, Michael S. Tsirkin wrote:
>>
>> 3. hypervisor assigned IO address
>> qemu can reserve IO addresses and assign to virtio devices.
>> 2 bytes per device (for notification and ISR access) will be
>> enough. So we can reserve 4K and this gets us 2000 devices.
>> From KVM perspective, nothing changes.
>> We'll want some capability in the device to let guest know
>> this is what it should do, and pass the io address.
>> One way to reserve the addresses is by using the bridge.
>> Pros: no need for host kernel support
>> Pros: regular PIO so fast
>> Cons: does not help assigned devices, breaks nested virt
>>
>> Simply counting pros/cons, option 3 seems best. It's also the
>> easiest to implement.
>>
>
> The problem here is the 4K I/O window for IO device BARs in bridges.
> Why not simply add a (possibly proprietary) capability to the PCI bridge
> to allow a much narrower window? That fits much more nicely into the
> device resource assignment on the guest side, and could even be
> implemented on a real hardware device -- we can offer it to the PCI-SIG
> for standardization, even.
>
Just a correction: I'm of course not talking about BARs but of the
bridge windows. The BARs are not a problem; an I/O BAR can cover as
little as four bytes.
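To illustrate: standard BAR sizing (write all ones, read it back, invert
the writable bits) reports such a four-byte I/O BAR without any trouble.
A toy model of a 4-byte I/O BAR with all of bits 31:2 implemented:

#include <stdint.h>
#include <stdio.h>

static uint32_t bar = 0xc001;            /* base 0xc000, bit 0 = I/O space */
static const uint32_t bar_mask = ~0x3u;  /* 4-byte BAR: bits 31:2 writable */

static uint32_t config_read(void) { return bar; }
static void config_write(uint32_t v) { bar = (v & bar_mask) | 0x1; }

int main(void)
{
        uint32_t orig = config_read();

        config_write(0xffffffff);        /* size probe: write all ones */
        uint32_t probe = config_read();
        config_write(orig);              /* restore the original base */

        /* for an I/O BAR the two low type bits are masked off */
        printf("I/O BAR size: %u bytes\n",
               (unsigned)(~(probe & ~0x3u) + 1));       /* prints 4 */
        return 0;
}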
-hpa
--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.
* Re: virtio PCI on KVM without IO BARs
2013-03-06 7:14 ` H. Peter Anvin
@ 2013-03-06 9:21 ` Michael S. Tsirkin
2013-03-06 11:15 ` H. Peter Anvin
0 siblings, 1 reply; 10+ messages in thread
From: Michael S. Tsirkin @ 2013-03-06 9:21 UTC (permalink / raw)
To: H. Peter Anvin; +Cc: Jan Kiszka, KVM list, virtualization
On Tue, Mar 05, 2013 at 11:14:31PM -0800, H. Peter Anvin wrote:
> On 03/05/2013 04:05 PM, H. Peter Anvin wrote:
> > On 02/28/2013 07:24 AM, Michael S. Tsirkin wrote:
> >>
> >> 3. hypervisor assigned IO address
> >> qemu can reserve IO addresses and assign to virtio devices.
> >> 2 bytes per device (for notification and ISR access) will be
> >> enough. So we can reserve 4K and this gets us 2000 devices.
> >> From KVM perspective, nothing changes.
> >> We'll want some capability in the device to let guest know
> >> this is what it should do, and pass the io address.
> >> One way to reserve the addresses is by using the bridge.
> >> Pros: no need for host kernel support
> >> Pros: regular PIO so fast
> >> Cons: does not help assigned devices, breaks nested virt
> >>
> >> Simply counting pros/cons, option 3 seems best. It's also the
> >> easiest to implement.
> >>
> >
> > The problem here is the 4K I/O window for IO device BARs in bridges.
> > Why not simply add a (possibly proprietary) capability to the PCI bridge
> > to allow a much narrower window? That fits much more nicely into the
> > device resource assignment on the guest side, and could even be
> > implemented on a real hardware device -- we can offer it to the PCI-SIG
> > for standardization, even.
> >
>
> Just a correction: I'm of course not talking about BARs but of the
> bridge windows. The BARs are not a problem; an I/O BAR can cover as
> little as four bytes.
>
> -hpa
Right. Though even with better granularity, bridge windows
would still be a (smaller) problem causing fragmentation.
If we were to extend the PCI spec I would go for a bridge without
windows at all: a bridge can snoop on configuration transactions and
responses programming devices behind it and build a full map of address
to device mappings.
In particular, this would be a good fit for an uplink bridge in a PCI
express switch, which is integrated with downlink bridges on the same
silicon, so bridge windows do nothing but add overhead.
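A rough sketch of the idea, purely illustrative (no such bridge behaviour
exists in any spec today): the bridge records the ranges it sees programmed
into downstream BARs and routes by that map instead of by a window:

#include <stdint.h>
#include <stdbool.h>

#define MAX_RANGES 64

struct snooped_range {
        uint64_t base;
        uint64_t size;
        uint8_t  downstream_port;
        bool     valid;
};

static struct snooped_range map[MAX_RANGES];

/* called when the bridge sees a config write programming a downstream BAR */
void snoop_bar_write(uint8_t port, int entry, uint64_t base, uint64_t size)
{
        /* one entry per downstream BAR; a real device would also track
         * BAR sizing cycles to learn 'size' */
        map[entry % MAX_RANGES] =
                (struct snooped_range){ base, size, port, true };
}

/* route a memory/IO transaction: which downstream port claims it? */
int route_address(uint64_t addr)
{
        for (int i = 0; i < MAX_RANGES; i++)
                if (map[i].valid && addr >= map[i].base &&
                    addr < map[i].base + map[i].size)
                        return map[i].downstream_port;
        return -1;      /* not claimed by anything behind this bridge */
}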
> --
> H. Peter Anvin, Intel Open Source Technology Center
> I work for Intel. I don't speak on their behalf.
* Re: virtio PCI on KVM without IO BARs
2013-03-06 9:21 ` Michael S. Tsirkin
@ 2013-03-06 11:15 ` H. Peter Anvin
2013-03-06 12:02 ` Michael S. Tsirkin
0 siblings, 1 reply; 10+ messages in thread
From: H. Peter Anvin @ 2013-03-06 11:15 UTC (permalink / raw)
To: Michael S. Tsirkin; +Cc: Jan Kiszka, KVM list, virtualization
On 03/06/2013 01:21 AM, Michael S. Tsirkin wrote:
>
> Right. Though even with better granularity, bridge windows
> would still be a (smaller) problem causing fragmentation.
>
> If we were to extend the PCI spec I would go for a bridge without
> windows at all: a bridge can snoop on configuration transactions and
> responses programming devices behind it and build a full map of address
> to device mappings.
>
> In particular, this would be a good fit for an uplink bridge in a PCI
> express switch, which is integrated with downlink bridges on the same
> silicon, so bridge windows do nothing but add overhead.
>
True, but the real problem is that the downlink (type 1 header) is
typically on a different piece of silicon than the device BAR (type 0
header).
I am not sure that a snooping-based system will work and not be
prohibitive in its hardware cost on an actual hardware system. I
suspect it would decay into needing a large RAM array in every bridge.
-hpa
* Re: virtio PCI on KVM without IO BARs
2013-03-06 11:15 ` H. Peter Anvin
@ 2013-03-06 12:02 ` Michael S. Tsirkin
0 siblings, 0 replies; 10+ messages in thread
From: Michael S. Tsirkin @ 2013-03-06 12:02 UTC (permalink / raw)
To: H. Peter Anvin; +Cc: Jan Kiszka, KVM list, virtualization
On Wed, Mar 06, 2013 at 03:15:16AM -0800, H. Peter Anvin wrote:
> On 03/06/2013 01:21 AM, Michael S. Tsirkin wrote:
> >
> > Right. Though even with better granularity, bridge windows
> > would still be a (smaller) problem causing fragmentation.
> >
> > If we were to extend the PCI spec I would go for a bridge without
> > windows at all: a bridge can snoop on configuration transactions and
> > responses programming devices behind it and build a full map of address
> > to device mappings.
> >
> > In particular, this would be a good fit for an uplink bridge in a PCI
> > express switch, which is integrated with downlink bridges on the same
> > silicon, so bridge windows do nothing but add overhead.
> >
>
> True, but the real problem is that the downlink (type 1 header) is
> typically on a different piece of silicon than the device BAR (type 0
> header).
>
> I am not sure that a snooping-based system will work and not be
> prohibitive in its hardware cost on an actual hardware system. I
> suspect it would decay into needing a large RAM array in every bridge.
>
> -hpa
This might be more difficult if you allow an arbitrary number of functions
behind a bridge. However, this is not a problem for integrated functions
(e.g. on the same silicon or board).
The bridge could specify which slots host integrated functions and do
snooping for those, and have regular window rules apply to open slots.
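Something like the following toy routing rule, just to illustrate the
split between integrated and open slots (all names are made up):

#include <stdint.h>
#include <stdbool.h>

#define NUM_SLOTS 8

struct slot {
        bool     integrated;             /* function is on the same silicon */
        uint64_t snoop_base, snoop_size; /* range learned by config snooping */
};

struct hybrid_bridge {
        struct slot slots[NUM_SLOTS];
        uint64_t window_base, window_limit; /* classic window for open slots */
};

/* returns the slot that claims 'addr', NUM_SLOTS for "behind the shared
 * window", or -1 if nothing behind this bridge claims it */
int hybrid_route(const struct hybrid_bridge *b, uint64_t addr)
{
        for (int i = 0; i < NUM_SLOTS; i++) {
                const struct slot *s = &b->slots[i];

                if (s->integrated && addr >= s->snoop_base &&
                    addr < s->snoop_base + s->snoop_size)
                        return i;        /* exact per-function match */
        }
        /* open (hot-pluggable) slots keep the ordinary base/limit decode */
        if (addr >= b->window_base && addr <= b->window_limit)
                return NUM_SLOTS;
        return -1;
}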
--
MST
* Re: virtio PCI on KVM without IO BARs
2013-02-28 15:24 virtio PCI on KVM without IO BARs Michael S. Tsirkin
` (2 preceding siblings ...)
2013-03-06 0:05 ` H. Peter Anvin
@ 2013-04-29 14:48 ` Don Dutile
2013-04-29 23:03 ` H. Peter Anvin
3 siblings, 1 reply; 10+ messages in thread
From: Don Dutile @ 2013-04-29 14:48 UTC (permalink / raw)
To: Michael S. Tsirkin; +Cc: kvm, virtualization
On 02/28/2013 10:24 AM, Michael S. Tsirkin wrote:
> OK we talked about this a while ago, here's
> a summary and some proposals:
> At the moment, virtio PCI uses IO BARs for all accesses.
>
> The reason for IO use is the cost of different VM exit types
> of transactions and their emulation on KVM on x86
> (it would be trivial to use memory BARs on non x86 platforms
> if they don't have PIO).
> Example benchmark (cycles per transaction):
> (io access) outw 1737
> (memory access) movw 4341
> for comparison:
> (hypercall access): vmcall 1566
> (pv memory access) movw_fast 1817 (*explanation of what this is below)
>
> This creates a problem if we want to make virtio devices
> proper PCI express devices with native hotplug support.
> This is because each hotpluggable PCI express device always has
> a PCI express port (port per device),
> where each port is represented by a PCI to PCI bridge.
> In turn, a PCI to PCI bridge claims a 4Kbyte aligned
> range of IO addresses. This means that we can have at
> most 15 such devices, which is a nasty limitation.
>
> Another problem with PIO is support for physical virtio devices,
> and nested virt: KVM currently programs all PIO accesses
> to cause vm exit, so using this device in a VM will be slow.
>
> So we really want to stop using IO BARs completely if at all possible,
> but looking at the table above, switching to memory BAR and movw for
> notifications will not work well.
>
> Possible solutions:
> 1. hypercall instead of PIO
> basically add a hypercall that gets an MMIO address/data
> and does an MMIO write for us.
> We'll want some capability in the device to let guest know
> this is what it should do.
> Pros: even faster than PIO
> Cons: this won't help nested or assigned devices (won't hurt
> them either as it will be conditional on the capability above).
> Cons: need host kernel support, which then has to be maintained
> forever, even if Intel speeds up MMIO exits.
>
> 2. pv memory access
> There are two reasons that memory access is slower:
> - one is that it's handled as an EPT misconfiguration error
> so it is handled by the cpu slow path
> - one is that we need to decode the x86 instruction in
> software, to calculate address/data for the access.
>
> We could agree that guests would use a specific instruction
> for virtio accesses, and fast-path it specifically.
> This is the pv memory access option above.
> Pros: helps assigned devices and nested virt
> Pros: easy to drop if hardware support is there
> Cons: a bit slower than IO
> Cons: need host kernel support
>
> 3. hypervisor assigned IO address
> qemu can reserve IO addresses and assign to virtio devices.
> 2 bytes per device (for notification and ISR access) will be
> enough. So we can reserve 4K and this gets us 2000 devices.
> From KVM perspective, nothing changes.
> We'll want some capability in the device to let guest know
> this is what it should do, and pass the io address.
> One way to reserve the addresses is by using the bridge.
> Pros: no need for host kernel support
> Pros: regular PIO so fast
> Cons: does not help assigned devices, breaks nested virt
>
> Simply counting pros/cons, option 3 seems best. It's also the
> easiest to implement.
>
> Comments?
>
Apologies for the late response...
It seems that solution 1 (the hypercall) would be the best option for the
following reasons:
a) (nearly?) every virt technology out there (xen, kvm, vmware, hyperv)
has pv drivers in the major OSes using virt (Windows, Linux),
so having a hypercall table searched, initialized and used for
fast virtio register access is trivially simple to do (a rough sketch
of such a guest-side notify follows after this list).
b) the support can be added with whatever pvdriver set is provided
without impacting OS core support.
c) it's architecture neutral, or can be made architecture neutral.
e.g., inb/outb and PCI ioport support is very different between x86 and
non-x86.
A hypercall interface would not have that dependency/difference.
d) it doesn't require new OS support in standard/core areas for
new standard(s), as another thread proposed; that kind of approach
has a long time delay to get defined and implemented across OSes.
In contrast, a hypercall-defined interface can be independent of standards
bodies, and if built into a pvdriver core it can change and/or adapt
rapidly, and can have additional interface mechanisms for version levels,
which enables cross-hypervisor(-version) migration.
e) the hypercall can be extended to do pv-specific hot add/remove,
eliminating dependencies on emulation support of ACPI-hp or PCIe-hp,
and simply(?) track core interfaces for hot-plug of (class) devices.
f) For migration, hypercall interfaces could be extended for better/faster
migration as well (suspend/resume of the pv device).
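To make (a) concrete, here is a rough sketch of what a guest-side notify
could look like, assuming Linux's existing kvm_hypercall2() helper; the
hypercall number is made up for illustration and is not defined anywhere
today:

#include <linux/kvm_para.h>

#define KVM_HC_VIRTIO_NOTIFY 100        /* hypothetical, not in kvm_para.h */

/* tell the host that queue 'queue_index' of the device identified by
 * 'dev_handle' (e.g. its notify address) has new buffers */
static inline void virtio_hypercall_notify(unsigned long dev_handle,
                                           unsigned long queue_index)
{
        kvm_hypercall2(KVM_HC_VIRTIO_NOTIFY, dev_handle, queue_index);
}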
my (late) 5 cents (I'll admit it was more than 2 cents)... Don
* Re: virtio PCI on KVM without IO BARs
2013-04-29 14:48 ` Don Dutile
@ 2013-04-29 23:03 ` H. Peter Anvin
0 siblings, 0 replies; 10+ messages in thread
From: H. Peter Anvin @ 2013-04-29 23:03 UTC (permalink / raw)
To: Don Dutile; +Cc: virtualization, kvm, Michael S. Tsirkin
On 04/29/2013 07:48 AM, Don Dutile wrote:
>
> c) it's architecture neutral, or can be made architecture neutral.
> e.g., inb/outb and PCI ioport support is very different between x86 and
> non-x86.
> A hypercall interface would not have that dependency/difference.
>
You are joking, right? Hypercalls are if anything *more* architecture
and OS dependent.
-hpa