* [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support
@ 2008-10-22 8:38 Yu Zhao
0 siblings, 0 replies; 54+ messages in thread
From: Yu Zhao @ 2008-10-22 8:38 UTC (permalink / raw)
To: linux-pci
Cc: randy.dunlap, grundler, achiang, matthew, greg, rdreier,
linux-kernel, jbarnes, virtualization, kvm, mingo
Greetings,
Following patches are intended to support SR-IOV capability in the
Linux kernel. With these patches, people can turn a PCI device with
the capability into multiple ones from software perspective, which
will benefit KVM and achieve other purposes such as QoS, security,
and etc.
Changes from v5 to v6:
1, update ABI document to include SR-IOV sysfs entries (Greg KH)
2, fix two coding style problems (Ingo Molnar)
---
[PATCH 1/16 v6] PCI: remove unnecessary arg of pci_update_resource()
[PATCH 2/16 v6] PCI: define PCI resource names in an 'enum'
[PATCH 3/16 v6] PCI: export __pci_read_base
[PATCH 4/16 v6] PCI: make pci_alloc_child_bus() be able to handle NULL bridge
[PATCH 5/16 v6] PCI: add a wrapper for resource_alignment()
[PATCH 6/16 v6] PCI: add a new function to map BAR offset
[PATCH 7/16 v6] PCI: cleanup pcibios_allocate_resources()
[PATCH 8/16 v6] PCI: add boot options to reassign resources
[PATCH 9/16 v6] PCI: add boot option to align MMIO resources
[PATCH 10/16 v6] PCI: cleanup pci_bus_add_devices()
[PATCH 11/16 v6] PCI: split a new function from pci_bus_add_devices()
[PATCH 12/16 v6] PCI: support the SR-IOV capability
[PATCH 13/16 v6] PCI: reserve bus range for SR-IOV device
[PATCH 14/16 v6] PCI: document for SR-IOV user and developer
[PATCH 15/16 v6] PCI: document the SR-IOV sysfs entries
[PATCH 16/16 v6] PCI: document the new PCI boot parameters
---
Single Root I/O Virtualization (SR-IOV) capability defined by PCI-SIG
is intended to enable multiple system software to share PCI hardware
resources. PCI device that supports this capability can be extended
to one Physical Functions plus multiple Virtual Functions. Physical
Function, which could be considered as the "real" PCI device, reflects
the hardware instance and manages all physical resources. Virtual
Functions are associated with a Physical Function and shares physical
resources with the Physical Function.Software can control allocation of
Virtual Functions via registers encapsulated in the capability structure.
SR-IOV specification can be found at
http://www.pcisig.com/members/downloads/specifications/iov/sr-iov1.0_11Sep07.pdf
Devices that support SR-IOV are available from following vendors:
http://download.intel.com/design/network/ProdBrf/320025.pdf
http://www.netxen.com/products/chipsolutions/NX3031.html
http://www.neterion.com/products/x3100.html
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support
[not found] <20081022083809.GA3757@yzhao12-linux.sh.intel.com>
@ 2008-11-06 4:48 ` Greg KH
[not found] ` <20081106044828.GA30417@kroah.com>
1 sibling, 0 replies; 54+ messages in thread
From: Greg KH @ 2008-11-06 4:48 UTC (permalink / raw)
To: Yu Zhao
Cc: randy.dunlap, grundler, achiang, matthew, linux-pci, rdreier,
linux-kernel, jbarnes, virtualization, kvm, mingo
On Wed, Oct 22, 2008 at 04:38:09PM +0800, Yu Zhao wrote:
> Greetings,
>
> Following patches are intended to support SR-IOV capability in the
> Linux kernel. With these patches, people can turn a PCI device with
> the capability into multiple ones from software perspective, which
> will benefit KVM and achieve other purposes such as QoS, security,
> and etc.
Is there any actual users of this API around yet? How was it tested as
there is no hardware to test on? Which drivers are going to have to be
rewritten to take advantage of this new interface?
thanks,
greg k-h
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support
[not found] ` <20081106044828.GA30417@kroah.com>
@ 2008-11-06 15:40 ` H L
[not found] ` <909674.99469.qm@web45112.mail.sp1.yahoo.com>
` (2 subsequent siblings)
3 siblings, 0 replies; 54+ messages in thread
From: H L @ 2008-11-06 15:40 UTC (permalink / raw)
To: Yu Zhao, Greg KH
Cc: randy.dunlap, grundler, achiang, matthew, linux-pci, rdreier,
linux-kernel, jbarnes, virtualization, kvm, mingo
Greetings (from a new lurker to the list),
To your question Greg, "yes" and "sort of" ;-). I have started taking a look at these patches with a strong interest in understanding how they work. I've built a kernel with them and tried out a few things with real SR-IOV hardware.
--
Lance Hartmann
--- On Wed, 11/5/08, Greg KH <greg@kroah.com> wrote:
>
> Is there any actual users of this API around yet? How was
> it tested as
> there is no hardware to test on? Which drivers are going
> to have to be
> rewritten to take advantage of this new interface?
>
> thanks,
>
> greg k-h
> _______________________________________________
> Virtualization mailing list
> Virtualization@lists.linux-foundation.org
> https://lists.linux-foundation.org/mailman/listinfo/virtualization
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support
[not found] ` <909674.99469.qm@web45112.mail.sp1.yahoo.com>
@ 2008-11-06 15:43 ` Greg KH
0 siblings, 0 replies; 54+ messages in thread
From: Greg KH @ 2008-11-06 15:43 UTC (permalink / raw)
To: H L
Cc: randy.dunlap, grundler, achiang, matthew, linux-pci, rdreier,
linux-kernel, jbarnes, virtualization, kvm, mingo
On Thu, Nov 06, 2008 at 07:40:12AM -0800, H L wrote:
>
> Greetings (from a new lurker to the list),
Welcome!
> To your question Greg, "yes" and "sort of" ;-). I have started taking
> a look at these patches with a strong interest in understanding how
> they work. I've built a kernel with them and tried out a few things
> with real SR-IOV hardware.
Did you have to modify individual drivers to take advantage of this
code? It looks like the core code will run on this type of hardware,
but there seems to be no real advantage until a driver is modified to
use it, right?
Or am I missing some great advantage to having this code without
modified drivers?
thanks,
greg k-h
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support
[not found] <20081106154351.GA30459@kroah.com>
@ 2008-11-06 16:41 ` H L
[not found] ` <894107.30288.qm@web45108.mail.sp1.yahoo.com>
` (2 subsequent siblings)
3 siblings, 0 replies; 54+ messages in thread
From: H L @ 2008-11-06 16:41 UTC (permalink / raw)
To: Greg KH
Cc: randy.dunlap, grundler, achiang, matthew, linux-pci, rdreier,
linux-kernel, jbarnes, virtualization, kvm, mingo
I have not modified any existing drivers, but instead I threw together a bare-bones module enabling me to make a call to pci_iov_register() and then poke at an SR-IOV adapter's /sys entries for which no driver was loaded.
It appears from my perusal thus far that drivers using these new SR-IOV patches will require modification; i.e. the driver associated with the Physical Function (PF) will be required to make the pci_iov_register() call along with the requisite notify() function. Essentially this suggests to me a model for the PF driver to perform any "global actions" or setup on behalf of VFs before enabling them after which VF drivers could be associated.
I have so far only seen Yu Zhao's "7-patch" set. I've not yet looked at his subsequently tendered "15-patch" set so I don't know what has changed. The hardware/firmware implementation for any given SR-IOV compatible device, will determine the extent of differences required between a PF driver and a VF driver.
--
Lance Hartmann
--- On Thu, 11/6/08, Greg KH <greg@kroah.com> wrote:
> Date: Thursday, November 6, 2008, 9:43 AM
> On Thu, Nov 06, 2008 at 07:40:12AM -0800, H L wrote:
> >
> > Greetings (from a new lurker to the list),
>
> Welcome!
>
> > To your question Greg, "yes" and "sort
> of" ;-). I have started taking
> > a look at these patches with a strong interest in
> understanding how
> > they work. I've built a kernel with them and
> tried out a few things
> > with real SR-IOV hardware.
>
> Did you have to modify individual drivers to take advantage
> of this
> code? It looks like the core code will run on this type of
> hardware,
> but there seems to be no real advantage until a driver is
> modified to
> use it, right?
>
> Or am I missing some great advantage to having this code
> without
> modified drivers?
>
> thanks,
>
> greg k-h
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support
[not found] ` <894107.30288.qm@web45108.mail.sp1.yahoo.com>
@ 2008-11-06 16:49 ` Greg KH
[not found] ` <20081106164919.GA4099@kroah.com>
1 sibling, 0 replies; 54+ messages in thread
From: Greg KH @ 2008-11-06 16:49 UTC (permalink / raw)
To: H L
Cc: randy.dunlap, grundler, achiang, matthew, linux-pci, rdreier,
linux-kernel, jbarnes, virtualization, kvm, mingo
A: No.
Q: Should I include quotations after my reply?
On Thu, Nov 06, 2008 at 08:41:53AM -0800, H L wrote:
> I have not modified any existing drivers, but instead I threw together
> a bare-bones module enabling me to make a call to pci_iov_register()
> and then poke at an SR-IOV adapter's /sys entries for which no driver
> was loaded.
>
> It appears from my perusal thus far that drivers using these new
> SR-IOV patches will require modification; i.e. the driver associated
> with the Physical Function (PF) will be required to make the
> pci_iov_register() call along with the requisite notify() function.
> Essentially this suggests to me a model for the PF driver to perform
> any "global actions" or setup on behalf of VFs before enabling them
> after which VF drivers could be associated.
Where would the VF drivers have to be associated? On the "pci_dev"
level or on a higher one?
Will all drivers that want to bind to a "VF" device need to be
rewritten?
> I have so far only seen Yu Zhao's "7-patch" set. I've not yet looked
> at his subsequently tendered "15-patch" set so I don't know what has
> changed. The hardware/firmware implementation for any given SR-IOV
> compatible device, will determine the extent of differences required
> between a PF driver and a VF driver.
Yeah, that's what I'm worried/curious about. Without seeing the code
for such a driver, how can we properly evaluate if this infrastructure
is the correct one and the proper way to do all of this?
thanks,
greg k-h
^ permalink raw reply [flat|nested] 54+ messages in thread
* git repository for SR-IOV development?
[not found] <20081106154351.GA30459@kroah.com>
2008-11-06 16:41 ` [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support H L
[not found] ` <894107.30288.qm@web45108.mail.sp1.yahoo.com>
@ 2008-11-06 16:51 ` H L
[not found] ` <1374.36291.qm@web45108.mail.sp1.yahoo.com>
3 siblings, 0 replies; 54+ messages in thread
From: H L @ 2008-11-06 16:51 UTC (permalink / raw)
To: Greg KH
Cc: randy.dunlap, Xen-devel List, grundler, achiang, matthew,
linux-pci, rdreier, linux-kernel, jbarnes, virtualization, kvm,
mingo
Has anyone initiated or given consideration to the creation of a git repository (say, on kernel.org) for SR-IOV development?
--
Lance Hartmann
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: git repository for SR-IOV development?
[not found] ` <1374.36291.qm@web45108.mail.sp1.yahoo.com>
@ 2008-11-06 16:59 ` Greg KH
0 siblings, 0 replies; 54+ messages in thread
From: Greg KH @ 2008-11-06 16:59 UTC (permalink / raw)
To: H L
Cc: randy.dunlap, Xen-devel List, grundler, achiang, matthew,
linux-pci, rdreier, linux-kernel, jbarnes, virtualization, kvm,
mingo
On Thu, Nov 06, 2008 at 08:51:09AM -0800, H L wrote:
>
> Has anyone initiated or given consideration to the creation of a git
> repository (say, on kernel.org) for SR-IOV development?
Why? It's only a few patches, right? Why would it need a whole new git
tree?
thanks,
greg k-h
^ permalink raw reply [flat|nested] 54+ messages in thread
* RE: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support
[not found] ` <20081106164919.GA4099@kroah.com>
@ 2008-11-06 17:38 ` Fischer, Anna
2008-11-06 17:47 ` Matthew Wilcox
` (4 subsequent siblings)
5 siblings, 0 replies; 54+ messages in thread
From: Fischer, Anna @ 2008-11-06 17:38 UTC (permalink / raw)
To: Greg KH, H L
Cc: randy.dunlap@oracle.com, grundler@parisc-linux.org,
Chiang, Alexander, matthew@wil.cx, linux-pci@vger.kernel.org,
rdreier@cisco.com, linux-kernel@vger.kernel.org,
jbarnes@virtuousgeek.org,
virtualization@lists.linux-foundation.org, kvm@vger.kernel.org,
mingo@elte.hu
> On Thu, Nov 06, 2008 at 08:41:53AM -0800, H L wrote:
> > I have not modified any existing drivers, but instead I threw
> together
> > a bare-bones module enabling me to make a call to pci_iov_register()
> > and then poke at an SR-IOV adapter's /sys entries for which no driver
> > was loaded.
> >
> > It appears from my perusal thus far that drivers using these new
> > SR-IOV patches will require modification; i.e. the driver associated
> > with the Physical Function (PF) will be required to make the
> > pci_iov_register() call along with the requisite notify() function.
> > Essentially this suggests to me a model for the PF driver to perform
> > any "global actions" or setup on behalf of VFs before enabling them
> > after which VF drivers could be associated.
>
> Where would the VF drivers have to be associated? On the "pci_dev"
> level or on a higher one?
A VF appears to the Linux OS as a standard (full, additional) PCI device. The driver is associated in the same way as for a normal PCI device. Ideally, you would use SR-IOV devices on a virtualized system, for example, using Xen. A VF can then be assigned to a guest domain as a full PCI device.
> Will all drivers that want to bind to a "VF" device need to be
> rewritten?
Currently, any vendor providing a SR-IOV device needs to provide a PF driver and a VF driver that runs on their hardware. A VF driver does not necessarily need to know much about SR-IOV but just run on the presented PCI device. You might want to have a communication channel between PF and VF driver though, for various reasons, if such a channel is not provided in hardware.
> > I have so far only seen Yu Zhao's "7-patch" set. I've not yet looked
> > at his subsequently tendered "15-patch" set so I don't know what has
> > changed. The hardware/firmware implementation for any given SR-IOV
> > compatible device, will determine the extent of differences required
> > between a PF driver and a VF driver.
>
> Yeah, that's what I'm worried/curious about. Without seeing the code
> for such a driver, how can we properly evaluate if this infrastructure
> is the correct one and the proper way to do all of this?
Yu's API allows a PF driver to register with the Linux PCI code and use it to activate VFs and allocate their resources. The PF driver needs to be modified to work with that API. While you can argue about how that API is supposed to look like, it is clear that such an API is required in some form. The PF driver needs to know when VFs are active as it might want to allocate further (device-specific) resources to VFs or initiate further (device-specific) configurations. While probably a lot of SR-IOV specific code has to be in the PF driver, there is also support required from the Linux PCI subsystem, which is to some extend provided by Yu's patches.
Anna
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support
[not found] ` <20081106164919.GA4099@kroah.com>
2008-11-06 17:38 ` Fischer, Anna
@ 2008-11-06 17:47 ` Matthew Wilcox
[not found] ` <20081106174741.GC11773@parisc-linux.org>
` (3 subsequent siblings)
5 siblings, 0 replies; 54+ messages in thread
From: Matthew Wilcox @ 2008-11-06 17:47 UTC (permalink / raw)
To: Greg KH
Cc: randy.dunlap, Chris Wright, grundler, achiang, linux-pci, rdreier,
linux-kernel, jbarnes, virtualization, kvm, mingo
On Thu, Nov 06, 2008 at 08:49:19AM -0800, Greg KH wrote:
> On Thu, Nov 06, 2008 at 08:41:53AM -0800, H L wrote:
> > I have not modified any existing drivers, but instead I threw together
> > a bare-bones module enabling me to make a call to pci_iov_register()
> > and then poke at an SR-IOV adapter's /sys entries for which no driver
> > was loaded.
> >
> > It appears from my perusal thus far that drivers using these new
> > SR-IOV patches will require modification; i.e. the driver associated
> > with the Physical Function (PF) will be required to make the
> > pci_iov_register() call along with the requisite notify() function.
> > Essentially this suggests to me a model for the PF driver to perform
> > any "global actions" or setup on behalf of VFs before enabling them
> > after which VF drivers could be associated.
>
> Where would the VF drivers have to be associated? On the "pci_dev"
> level or on a higher one?
>
> Will all drivers that want to bind to a "VF" device need to be
> rewritten?
The current model being implemented by my colleagues has separate
drivers for the PF (aka native) and VF devices. I don't personally
believe this is the correct path, but I'm reserving judgement until I
see some code.
I don't think we really know what the One True Usage model is for VF
devices. Chris Wright has some ideas, I have some ideas and Yu Zhao has
some ideas. I bet there's other people who have other ideas too.
--
Matthew Wilcox Intel Open Source Technology Centre
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours. We can't possibly take such
a retrograde step."
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support
[not found] ` <20081106174741.GC11773@parisc-linux.org>
@ 2008-11-06 17:53 ` Greg KH
[not found] ` <20081106175308.GA17027@kroah.com>
1 sibling, 0 replies; 54+ messages in thread
From: Greg KH @ 2008-11-06 17:53 UTC (permalink / raw)
To: Matthew Wilcox
Cc: randy.dunlap, Chris Wright, grundler, achiang, linux-pci, rdreier,
linux-kernel, jbarnes, virtualization, kvm, mingo
On Thu, Nov 06, 2008 at 10:47:41AM -0700, Matthew Wilcox wrote:
> On Thu, Nov 06, 2008 at 08:49:19AM -0800, Greg KH wrote:
> > On Thu, Nov 06, 2008 at 08:41:53AM -0800, H L wrote:
> > > I have not modified any existing drivers, but instead I threw together
> > > a bare-bones module enabling me to make a call to pci_iov_register()
> > > and then poke at an SR-IOV adapter's /sys entries for which no driver
> > > was loaded.
> > >
> > > It appears from my perusal thus far that drivers using these new
> > > SR-IOV patches will require modification; i.e. the driver associated
> > > with the Physical Function (PF) will be required to make the
> > > pci_iov_register() call along with the requisite notify() function.
> > > Essentially this suggests to me a model for the PF driver to perform
> > > any "global actions" or setup on behalf of VFs before enabling them
> > > after which VF drivers could be associated.
> >
> > Where would the VF drivers have to be associated? On the "pci_dev"
> > level or on a higher one?
> >
> > Will all drivers that want to bind to a "VF" device need to be
> > rewritten?
>
> The current model being implemented by my colleagues has separate
> drivers for the PF (aka native) and VF devices. I don't personally
> believe this is the correct path, but I'm reserving judgement until I
> see some code.
Hm, I would like to see that code before we can properly evaluate this
interface. Especially as they are all tightly tied together.
> I don't think we really know what the One True Usage model is for VF
> devices. Chris Wright has some ideas, I have some ideas and Yu Zhao has
> some ideas. I bet there's other people who have other ideas too.
I'd love to hear those ideas.
Rumor has it, there is some Xen code floating around to support this
already, is that true?
thanks,
greg k-h
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support
[not found] ` <0199E0D51A61344794750DC57738F58E5E26F996C4@GVW1118EXC.americas.hpqcorp.net>
@ 2008-11-06 18:03 ` Greg KH
2008-11-06 18:36 ` Matthew Wilcox
` (2 subsequent siblings)
3 siblings, 0 replies; 54+ messages in thread
From: Greg KH @ 2008-11-06 18:03 UTC (permalink / raw)
To: Fischer, Anna
Cc: randy.dunlap@oracle.com, grundler@parisc-linux.org,
Chiang, Alexander, matthew@wil.cx, linux-pci@vger.kernel.org,
rdreier@cisco.com, linux-kernel@vger.kernel.org,
jbarnes@virtuousgeek.org,
virtualization@lists.linux-foundation.org, kvm@vger.kernel.org,
mingo@elte.hu
On Thu, Nov 06, 2008 at 05:38:16PM +0000, Fischer, Anna wrote:
> > On Thu, Nov 06, 2008 at 08:41:53AM -0800, H L wrote:
> > > I have not modified any existing drivers, but instead I threw
> > together
> > > a bare-bones module enabling me to make a call to pci_iov_register()
> > > and then poke at an SR-IOV adapter's /sys entries for which no driver
> > > was loaded.
> > >
> > > It appears from my perusal thus far that drivers using these new
> > > SR-IOV patches will require modification; i.e. the driver associated
> > > with the Physical Function (PF) will be required to make the
> > > pci_iov_register() call along with the requisite notify() function.
> > > Essentially this suggests to me a model for the PF driver to perform
> > > any "global actions" or setup on behalf of VFs before enabling them
> > > after which VF drivers could be associated.
> >
> > Where would the VF drivers have to be associated? On the "pci_dev"
> > level or on a higher one?
>
> A VF appears to the Linux OS as a standard (full, additional) PCI
> device. The driver is associated in the same way as for a normal PCI
> device. Ideally, you would use SR-IOV devices on a virtualized system,
> for example, using Xen. A VF can then be assigned to a guest domain as
> a full PCI device.
It's that "second" part that I'm worried about. How is that going to
happen? Do you have any patches that show this kind of "assignment"?
> > Will all drivers that want to bind to a "VF" device need to be
> > rewritten?
>
> Currently, any vendor providing a SR-IOV device needs to provide a PF
> driver and a VF driver that runs on their hardware.
Are there any such drivers available yet?
> A VF driver does not necessarily need to know much about SR-IOV but
> just run on the presented PCI device. You might want to have a
> communication channel between PF and VF driver though, for various
> reasons, if such a channel is not provided in hardware.
Agreed, but what does that channel look like in Linux?
I have some ideas of what I think it should look like, but if people
already have code, I'd love to see that as well.
> > > I have so far only seen Yu Zhao's "7-patch" set. I've not yet looked
> > > at his subsequently tendered "15-patch" set so I don't know what has
> > > changed. The hardware/firmware implementation for any given SR-IOV
> > > compatible device, will determine the extent of differences required
> > > between a PF driver and a VF driver.
> >
> > Yeah, that's what I'm worried/curious about. Without seeing the code
> > for such a driver, how can we properly evaluate if this infrastructure
> > is the correct one and the proper way to do all of this?
>
> Yu's API allows a PF driver to register with the Linux PCI code and
> use it to activate VFs and allocate their resources. The PF driver
> needs to be modified to work with that API. While you can argue about
> how that API is supposed to look like, it is clear that such an API is
> required in some form.
I totally agree, I'm arguing about what that API looks like :)
I want to see some code...
thanks,
greg k-h
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support
[not found] ` <20081106164919.GA4099@kroah.com>
` (3 preceding siblings ...)
[not found] ` <0199E0D51A61344794750DC57738F58E5E26F996C4@GVW1118EXC.americas.hpqcorp.net>
@ 2008-11-06 18:05 ` H L
[not found] ` <392264.50990.qm@web45103.mail.sp1.yahoo.com>
5 siblings, 0 replies; 54+ messages in thread
From: H L @ 2008-11-06 18:05 UTC (permalink / raw)
To: Greg KH
Cc: randy.dunlap, grundler, achiang, matthew, linux-pci, rdreier,
linux-kernel, jbarnes, virtualization, kvm, mingo
--- On Thu, 11/6/08, Greg KH <greg@kroah.com> wrote:
> On Thu, Nov 06, 2008 at 08:41:53AM -0800, H L wrote:
> > I have not modified any existing drivers, but instead
> I threw together
> > a bare-bones module enabling me to make a call to
> pci_iov_register()
> > and then poke at an SR-IOV adapter's /sys entries
> for which no driver
> > was loaded.
> >
> > It appears from my perusal thus far that drivers using
> these new
> > SR-IOV patches will require modification; i.e. the
> driver associated
> > with the Physical Function (PF) will be required to
> make the
> > pci_iov_register() call along with the requisite
> notify() function.
> > Essentially this suggests to me a model for the PF
> driver to perform
> > any "global actions" or setup on behalf of
> VFs before enabling them
> > after which VF drivers could be associated.
>
> Where would the VF drivers have to be associated? On the
> "pci_dev"
> level or on a higher one?
I have not yet fully grocked Yu Zhao's model to answer this. That said, I would *hope* to find it on the "pci_dev" level.
> Will all drivers that want to bind to a "VF"
> device need to be
> rewritten?
Not necessarily, or perhaps minimally; depends on hardware/firmware and actions the driver wants to take. An example here might assist. Let's just say someone has created, oh, I don't know, maybe an SR-IOV NIC. Now, for 'general' I/O operations to pass network traffic back and forth there would ideally be no difference in the actions and therefore behavior of a PF driver and a VF driver. But, what do you do in the instance a VF wants to change link-speed? As that physical characteristic affects all VFs, how do you handle that? This is where the hardware/firmware implementation part comes to play. If a VF driver performs some actions to initiate the change in link speed, the logic in the adapter could be anything like:
1. Acknowledge the request as if it were really done, but effectively ignore it. The Independent Hardware Vendor (IHV) might dictate that if you want to change any "global" characteristics of an adapter, you may only do so via the PF driver. Granted, this, depending on the device class, may just not be acceptable.
2. Acknowledge the request and then trigger an interrupt to the PF driver to have it assist. The PF driver might then just set the new link-speed, or it could result in a PF driver communicating by some mechanism to all of the VF driver instances that this change of link-speed was requested.
3. Acknowledge the request and perform inner PF and VF communication of this event within the logic of the card (e.g. to "vote" on whether or not to perform this action) with interrupts and associated status delivered to all PF and VF drivers.
The list goes on.
>
> > I have so far only seen Yu Zhao's
> "7-patch" set. I've not yet looked
> > at his subsequently tendered "15-patch" set
> so I don't know what has
> > changed. The hardware/firmware implementation for
> any given SR-IOV
> > compatible device, will determine the extent of
> differences required
> > between a PF driver and a VF driver.
>
> Yeah, that's what I'm worried/curious about.
> Without seeing the code
> for such a driver, how can we properly evaluate if this
> infrastructure
> is the correct one and the proper way to do all of this?
As the example above demonstrates, that's a tough question to answer. Ideally, in my view, there would only be one driver written per SR-IOV device and it would contain the logic to "do the right things" based on whether its running as a PF or VF with that determination easily accomplished by testing the existence of the SR-IOV extended capability. Then, in an effort to minimize (if not eliminate) the complexities of driver-to-driver actions for fielding "global events", contain as much of the logic as is possible within the adapter. Minimizing the efforts required for the device driver writers in my opinion paves the way to greater adoption of this technology.
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support
[not found] ` <392264.50990.qm@web45103.mail.sp1.yahoo.com>
@ 2008-11-06 18:24 ` Greg KH
[not found] ` <20081106182443.GB17782@kroah.com>
1 sibling, 0 replies; 54+ messages in thread
From: Greg KH @ 2008-11-06 18:24 UTC (permalink / raw)
To: H L
Cc: randy.dunlap, grundler, achiang, matthew, linux-pci, rdreier,
linux-kernel, jbarnes, virtualization, kvm, mingo
On Thu, Nov 06, 2008 at 10:05:39AM -0800, H L wrote:
>
> --- On Thu, 11/6/08, Greg KH <greg@kroah.com> wrote:
>
> > On Thu, Nov 06, 2008 at 08:41:53AM -0800, H L wrote:
> > > I have not modified any existing drivers, but instead
> > I threw together
> > > a bare-bones module enabling me to make a call to
> > pci_iov_register()
> > > and then poke at an SR-IOV adapter's /sys entries
> > for which no driver
> > > was loaded.
> > >
> > > It appears from my perusal thus far that drivers using
> > these new
> > > SR-IOV patches will require modification; i.e. the
> > driver associated
> > > with the Physical Function (PF) will be required to
> > make the
> > > pci_iov_register() call along with the requisite
> > notify() function.
> > > Essentially this suggests to me a model for the PF
> > driver to perform
> > > any "global actions" or setup on behalf of
> > VFs before enabling them
> > > after which VF drivers could be associated.
> >
> > Where would the VF drivers have to be associated? On the
> > "pci_dev"
> > level or on a higher one?
>
>
> I have not yet fully grocked Yu Zhao's model to answer this. That
> said, I would *hope* to find it on the "pci_dev" level.
Me too.
> > Will all drivers that want to bind to a "VF"
> > device need to be
> > rewritten?
>
> Not necessarily, or perhaps minimally; depends on hardware/firmware
> and actions the driver wants to take. An example here might assist.
> Let's just say someone has created, oh, I don't know, maybe an SR-IOV
> NIC. Now, for 'general' I/O operations to pass network traffic back
> and forth there would ideally be no difference in the actions and
> therefore behavior of a PF driver and a VF driver. But, what do you
> do in the instance a VF wants to change link-speed? As that physical
> characteristic affects all VFs, how do you handle that? This is where
> the hardware/firmware implementation part comes to play. If a VF
> driver performs some actions to initiate the change in link speed, the
> logic in the adapter could be anything like:
<snip>
Yes, I agree that all of this needs to be done, somehow.
It's that "somehow" that I am interested in trying to see how it works
out.
> >
> > > I have so far only seen Yu Zhao's
> > "7-patch" set. I've not yet looked
> > > at his subsequently tendered "15-patch" set
> > so I don't know what has
> > > changed. The hardware/firmware implementation for
> > any given SR-IOV
> > > compatible device, will determine the extent of
> > differences required
> > > between a PF driver and a VF driver.
> >
> > Yeah, that's what I'm worried/curious about.
> > Without seeing the code
> > for such a driver, how can we properly evaluate if this
> > infrastructure
> > is the correct one and the proper way to do all of this?
>
>
> As the example above demonstrates, that's a tough question to answer.
> Ideally, in my view, there would only be one driver written per SR-IOV
> device and it would contain the logic to "do the right things" based
> on whether its running as a PF or VF with that determination easily
> accomplished by testing the existence of the SR-IOV extended
> capability. Then, in an effort to minimize (if not eliminate) the
> complexities of driver-to-driver actions for fielding "global events",
> contain as much of the logic as is possible within the adapter.
> Minimizing the efforts required for the device driver writers in my
> opinion paves the way to greater adoption of this technology.
Yes, making things easier is the key here.
Perhaps some of this could be hidden with a new bus type for these kinds
of devices? Or a "virtual" bus of pci devices that the original SR-IOV
device creates that corrispond to the individual virtual PCI devices?
If that were the case, then it might be a lot easier in the end.
thanks,
greg k-h
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support
[not found] ` <0199E0D51A61344794750DC57738F58E5E26F996C4@GVW1118EXC.americas.hpqcorp.net>
2008-11-06 18:03 ` Greg KH
@ 2008-11-06 18:36 ` Matthew Wilcox
[not found] ` <20081106180354.GA17429@kroah.com>
[not found] ` <20081106183630.GD11773@parisc-linux.org>
3 siblings, 0 replies; 54+ messages in thread
From: Matthew Wilcox @ 2008-11-06 18:36 UTC (permalink / raw)
To: Fischer, Anna
Cc: randy.dunlap@oracle.com, grundler@parisc-linux.org,
Chiang, Alexander, Greg KH, rdreier@cisco.com,
linux-kernel@vger.kernel.org, jbarnes@virtuousgeek.org,
virtualization@lists.linux-foundation.org, kvm@vger.kernel.org,
linux-pci@vger.kernel.org, mingo@elte.hu
[Anna, can you fix your word-wrapping please? Your lines appear to be
infinitely long which is most unpleasant to reply to]
On Thu, Nov 06, 2008 at 05:38:16PM +0000, Fischer, Anna wrote:
> > Where would the VF drivers have to be associated? On the "pci_dev"
> > level or on a higher one?
>
> A VF appears to the Linux OS as a standard (full, additional) PCI
> device. The driver is associated in the same way as for a normal PCI
> device. Ideally, you would use SR-IOV devices on a virtualized system,
> for example, using Xen. A VF can then be assigned to a guest domain as
> a full PCI device.
It's not clear thats the right solution. If the VF devices are _only_
going to be used by the guest, then arguably, we don't want to create
pci_devs for them in the host. (I think it _is_ the right answer, but I
want to make it clear there's multiple opinions on this).
> > Will all drivers that want to bind to a "VF" device need to be
> > rewritten?
>
> Currently, any vendor providing a SR-IOV device needs to provide a PF
> driver and a VF driver that runs on their hardware. A VF driver does not
> necessarily need to know much about SR-IOV but just run on the presented
> PCI device. You might want to have a communication channel between PF
> and VF driver though, for various reasons, if such a channel is not
> provided in hardware.
That is one model. Another model is to provide one driver that can
handle both PF and VF devices. A third model is to provide, say, a
Windows VF driver and a Xen PF driver and only support Windows-on-Xen.
(This last would probably be an exercise in foot-shooting, but
nevertheless, I've heard it mooted).
> > Yeah, that's what I'm worried/curious about. Without seeing the code
> > for such a driver, how can we properly evaluate if this infrastructure
> > is the correct one and the proper way to do all of this?
>
> Yu's API allows a PF driver to register with the Linux PCI code and use
> it to activate VFs and allocate their resources. The PF driver needs to
> be modified to work with that API. While you can argue about how that API
> is supposed to look like, it is clear that such an API is required in some
> form. The PF driver needs to know when VFs are active as it might want to
> allocate further (device-specific) resources to VFs or initiate further
> (device-specific) configurations. While probably a lot of SR-IOV specific
> code has to be in the PF driver, there is also support required from
> the Linux PCI subsystem, which is to some extend provided by Yu's patches.
Everyone agrees that some support is necessary. The question is exactly
what it looks like. I must confess to not having reviewed this latest
patch series yet -- I'm a little burned out on patch review.
--
Matthew Wilcox Intel Open Source Technology Centre
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours. We can't possibly take such
a retrograde step."
^ permalink raw reply [flat|nested] 54+ messages in thread
* RE: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support
[not found] ` <20081106180354.GA17429@kroah.com>
@ 2008-11-06 20:04 ` Fischer, Anna
2008-11-09 12:44 ` Avi Kivity
[not found] ` <4916DB16.2040709@redhat.com>
2 siblings, 0 replies; 54+ messages in thread
From: Fischer, Anna @ 2008-11-06 20:04 UTC (permalink / raw)
To: Greg KH
Cc: randy.dunlap@oracle.com, grundler@parisc-linux.org,
Chiang, Alexander, matthew@wil.cx, linux-pci@vger.kernel.org,
rdreier@cisco.com, linux-kernel@vger.kernel.org,
jbarnes@virtuousgeek.org,
virtualization@lists.linux-foundation.org, kvm@vger.kernel.org,
mingo@elte.hu
> Subject: Re: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support
>
> On Thu, Nov 06, 2008 at 05:38:16PM +0000, Fischer, Anna wrote:
> > > On Thu, Nov 06, 2008 at 08:41:53AM -0800, H L wrote:
> > > > I have not modified any existing drivers, but instead I threw
> > > together
> > > > a bare-bones module enabling me to make a call to
> pci_iov_register()
> > > > and then poke at an SR-IOV adapter's /sys entries for which no
> driver
> > > > was loaded.
> > > >
> > > > It appears from my perusal thus far that drivers using these new
> > > > SR-IOV patches will require modification; i.e. the driver
> associated
> > > > with the Physical Function (PF) will be required to make the
> > > > pci_iov_register() call along with the requisite notify()
> function.
> > > > Essentially this suggests to me a model for the PF driver to
> perform
> > > > any "global actions" or setup on behalf of VFs before enabling
> them
> > > > after which VF drivers could be associated.
> > >
> > > Where would the VF drivers have to be associated? On the "pci_dev"
> > > level or on a higher one?
> >
> > A VF appears to the Linux OS as a standard (full, additional) PCI
> > device. The driver is associated in the same way as for a normal PCI
> > device. Ideally, you would use SR-IOV devices on a virtualized
> system,
> > for example, using Xen. A VF can then be assigned to a guest domain
> as
> > a full PCI device.
>
> It's that "second" part that I'm worried about. How is that going to
> happen? Do you have any patches that show this kind of "assignment"?
That depends on your setup. Using Xen, you could assign the VF to a guest domain like any other PCI device, e.g. using PCI pass-through. For VMware, KVM, there are standard ways to do that, too. I currently don't see why SR-IOV devices would need any specific, non-standard mechanism for device assignment.
> > > Will all drivers that want to bind to a "VF" device need to be
> > > rewritten?
> >
> > Currently, any vendor providing a SR-IOV device needs to provide a PF
> > driver and a VF driver that runs on their hardware.
>
> Are there any such drivers available yet?
I don't know.
> > A VF driver does not necessarily need to know much about SR-IOV but
> > just run on the presented PCI device. You might want to have a
> > communication channel between PF and VF driver though, for various
> > reasons, if such a channel is not provided in hardware.
>
> Agreed, but what does that channel look like in Linux?
>
> I have some ideas of what I think it should look like, but if people
> already have code, I'd love to see that as well.
At this point I would guess that this code is vendor specific, as are the drivers. The issue I see is that most likely drivers will run in different environments, for example, in Xen the PF driver runs in a driver domain while a VF driver runs in a guest VM. So a communication channel would need to be either Xen specific, or vendor specific. Also, a guest using the VF might run Windows while the PF might be controlled under Linux.
Anna
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support
[not found] ` <20081106175308.GA17027@kroah.com>
@ 2008-11-06 22:24 ` Simon Horman
2008-11-06 22:40 ` Anthony Liguori
` (3 subsequent siblings)
4 siblings, 0 replies; 54+ messages in thread
From: Simon Horman @ 2008-11-06 22:24 UTC (permalink / raw)
To: Greg KH
Cc: randy.dunlap, Chris Wright, grundler, achiang, Matthew Wilcox,
linux-pci, rdreier, linux-kernel, jbarnes, virtualization, kvm,
mingo
On Thu, Nov 06, 2008 at 09:53:08AM -0800, Greg KH wrote:
> On Thu, Nov 06, 2008 at 10:47:41AM -0700, Matthew Wilcox wrote:
> > On Thu, Nov 06, 2008 at 08:49:19AM -0800, Greg KH wrote:
> > > On Thu, Nov 06, 2008 at 08:41:53AM -0800, H L wrote:
> > > > I have not modified any existing drivers, but instead I threw together
> > > > a bare-bones module enabling me to make a call to pci_iov_register()
> > > > and then poke at an SR-IOV adapter's /sys entries for which no driver
> > > > was loaded.
> > > >
> > > > It appears from my perusal thus far that drivers using these new
> > > > SR-IOV patches will require modification; i.e. the driver associated
> > > > with the Physical Function (PF) will be required to make the
> > > > pci_iov_register() call along with the requisite notify() function.
> > > > Essentially this suggests to me a model for the PF driver to perform
> > > > any "global actions" or setup on behalf of VFs before enabling them
> > > > after which VF drivers could be associated.
> > >
> > > Where would the VF drivers have to be associated? On the "pci_dev"
> > > level or on a higher one?
> > >
> > > Will all drivers that want to bind to a "VF" device need to be
> > > rewritten?
> >
> > The current model being implemented by my colleagues has separate
> > drivers for the PF (aka native) and VF devices. I don't personally
> > believe this is the correct path, but I'm reserving judgement until I
> > see some code.
>
> Hm, I would like to see that code before we can properly evaluate this
> interface. Especially as they are all tightly tied together.
>
> > I don't think we really know what the One True Usage model is for VF
> > devices. Chris Wright has some ideas, I have some ideas and Yu Zhao has
> > some ideas. I bet there's other people who have other ideas too.
>
> I'd love to hear those ideas.
>
> Rumor has it, there is some Xen code floating around to support this
> already, is that true?
Xen patches were posted to xen-devel by Yu Zhao on the 29th of September [1].
Unfortunately the only responses that I can find are a) that the patches
were mangled and b) they seem to include changes (by others) that have
been merged into Linux. I have confirmed that both of these concerns
are valid.
I have not yet examined the difference, if any, in the approach taken by Yu
to SR-IOV in Linux and Xen. Unfortunately comparison is less than trivial
due to the gaping gap in kernel versions between Linux-Xen (2.6.18.8) and
Linux itself.
One approach that I was considering in order to familiarise myself with the
code was to backport the v6 Linux patches (this thread) to Linux-Xen. I made a
start on that, but again due to kernel version differences it is non-trivial.
[1] http://lists.xensource.com/archives/html/xen-devel/2008-09/msg00923.html
--
Simon Horman
VA Linux Systems Japan K.K., Sydney, Australia Satellite Office
H: www.vergenet.net/~horms/ W: www.valinux.co.jp/en
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support
[not found] ` <20081106183630.GD11773@parisc-linux.org>
@ 2008-11-06 22:38 ` Anthony Liguori
[not found] ` <491371F0.7020805@codemonkey.ws>
1 sibling, 0 replies; 54+ messages in thread
From: Anthony Liguori @ 2008-11-06 22:38 UTC (permalink / raw)
To: Matthew Wilcox
Cc: randy.dunlap@oracle.com, grundler@parisc-linux.org,
Chiang, Alexander, Greg KH, rdreier@cisco.com,
linux-kernel@vger.kernel.org, jbarnes@virtuousgeek.org,
virtualization@lists.linux-foundation.org, kvm@vger.kernel.org,
linux-pci@vger.kernel.org, mingo@elte.hu
Matthew Wilcox wrote:
> [Anna, can you fix your word-wrapping please? Your lines appear to be
> infinitely long which is most unpleasant to reply to]
>
> On Thu, Nov 06, 2008 at 05:38:16PM +0000, Fischer, Anna wrote:
>
>>> Where would the VF drivers have to be associated? On the "pci_dev"
>>> level or on a higher one?
>>>
>> A VF appears to the Linux OS as a standard (full, additional) PCI
>> device. The driver is associated in the same way as for a normal PCI
>> device. Ideally, you would use SR-IOV devices on a virtualized system,
>> for example, using Xen. A VF can then be assigned to a guest domain as
>> a full PCI device.
>>
>
> It's not clear thats the right solution. If the VF devices are _only_
> going to be used by the guest, then arguably, we don't want to create
> pci_devs for them in the host. (I think it _is_ the right answer, but I
> want to make it clear there's multiple opinions on this).
>
The VFs shouldn't be limited to being used by the guest.
SR-IOV is actually an incredibly painful thing. You need to have a VF
driver in the guest, do hardware pass through, have a PV driver stub in
the guest that's hypervisor specific (a VF is not usable on it's own),
have a device specific backend in the VMM, and if you want to do live
migration, have another PV driver in the guest that you can do teaming
with. Just a mess.
What we would rather do in KVM, is have the VFs appear in the host as
standard network devices. We would then like to back our existing PV
driver to this VF directly bypassing the host networking stack. A key
feature here is being able to fill the VF's receive queue with guest
memory instead of host kernel memory so that you can get zero-copy
receive traffic. This will perform just as well as doing passthrough
(at least) and avoid all that ugliness of dealing with SR-IOV in the guest.
This eliminates all of the mess of various drivers in the guest and all
the associated baggage of doing hardware passthrough.
So IMHO, having VFs be usable in the host is absolutely critical because
I think it's the only reasonable usage model.
Regards,
Anthony Liguori
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support
[not found] ` <20081106175308.GA17027@kroah.com>
2008-11-06 22:24 ` Simon Horman
@ 2008-11-06 22:40 ` Anthony Liguori
2008-11-06 23:54 ` Chris Wright
` (2 subsequent siblings)
4 siblings, 0 replies; 54+ messages in thread
From: Anthony Liguori @ 2008-11-06 22:40 UTC (permalink / raw)
To: Greg KH
Cc: randy.dunlap, Chris Wright, grundler, achiang, Matthew Wilcox,
linux-pci, rdreier, linux-kernel, jbarnes, virtualization, kvm,
mingo
Greg KH wrote:
> On Thu, Nov 06, 2008 at 10:47:41AM -0700, Matthew Wilcox wrote:
>
>> I don't think we really know what the One True Usage model is for VF
>> devices. Chris Wright has some ideas, I have some ideas and Yu Zhao has
>> some ideas. I bet there's other people who have other ideas too.
>>
>
> I'd love to hear those ideas.
>
We've been talking about avoiding hardware passthrough entirely and just
backing a virtio-net backend driver by a dedicated VF in the host. That
avoids a huge amount of guest-facing complexity, let's migration Just
Work, and should give the same level of performance.
Regards,
Anthony Liguori
> Rumor has it, there is some Xen code floating around to support this
> already, is that true?
>
> thanks,
>
> greg k-h
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support
[not found] ` <491371F0.7020805@codemonkey.ws>
@ 2008-11-06 22:58 ` Matthew Wilcox
2008-11-07 1:52 ` Dong, Eddie
` (5 subsequent siblings)
6 siblings, 0 replies; 54+ messages in thread
From: Matthew Wilcox @ 2008-11-06 22:58 UTC (permalink / raw)
To: Anthony Liguori
Cc: randy.dunlap@oracle.com, grundler@parisc-linux.org,
Chiang, Alexander, Greg KH, rdreier@cisco.com,
linux-kernel@vger.kernel.org, jbarnes@virtuousgeek.org,
virtualization@lists.linux-foundation.org, kvm@vger.kernel.org,
linux-pci@vger.kernel.org, mingo@elte.hu
On Thu, Nov 06, 2008 at 04:38:40PM -0600, Anthony Liguori wrote:
> >It's not clear thats the right solution. If the VF devices are _only_
> >going to be used by the guest, then arguably, we don't want to create
> >pci_devs for them in the host. (I think it _is_ the right answer, but I
> >want to make it clear there's multiple opinions on this).
>
> The VFs shouldn't be limited to being used by the guest.
>
> SR-IOV is actually an incredibly painful thing. You need to have a VF
> driver in the guest, do hardware pass through, have a PV driver stub in
> the guest that's hypervisor specific (a VF is not usable on it's own),
> have a device specific backend in the VMM, and if you want to do live
> migration, have another PV driver in the guest that you can do teaming
> with. Just a mess.
Not to mention that you basically have to statically allocate them up
front.
> What we would rather do in KVM, is have the VFs appear in the host as
> standard network devices. We would then like to back our existing PV
> driver to this VF directly bypassing the host networking stack. A key
> feature here is being able to fill the VF's receive queue with guest
> memory instead of host kernel memory so that you can get zero-copy
> receive traffic. This will perform just as well as doing passthrough
> (at least) and avoid all that ugliness of dealing with SR-IOV in the guest.
This argues for ignoring the SR-IOV mess completely. Just have the
host driver expose multiple 'ethN' devices.
> This eliminates all of the mess of various drivers in the guest and all
> the associated baggage of doing hardware passthrough.
>
> So IMHO, having VFs be usable in the host is absolutely critical because
> I think it's the only reasonable usage model.
>
> Regards,
>
> Anthony Liguori
> --
> To unsubscribe from this list: send the line "unsubscribe linux-pci" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
--
Matthew Wilcox Intel Open Source Technology Centre
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours. We can't possibly take such
a retrograde step."
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support
[not found] ` <20081106175308.GA17027@kroah.com>
2008-11-06 22:24 ` Simon Horman
2008-11-06 22:40 ` Anthony Liguori
@ 2008-11-06 23:54 ` Chris Wright
[not found] ` <49137255.9010104@codemonkey.ws>
[not found] ` <20081106235406.GB30790@sequoia.sous-sol.org>
4 siblings, 0 replies; 54+ messages in thread
From: Chris Wright @ 2008-11-06 23:54 UTC (permalink / raw)
To: Greg KH
Cc: randy.dunlap, Chris Wright, grundler, achiang, Matthew Wilcox,
linux-pci, rdreier, linux-kernel, jbarnes, virtualization, kvm,
mingo
* Greg KH (greg@kroah.com) wrote:
> On Thu, Nov 06, 2008 at 10:47:41AM -0700, Matthew Wilcox wrote:
> > On Thu, Nov 06, 2008 at 08:49:19AM -0800, Greg KH wrote:
> > > On Thu, Nov 06, 2008 at 08:41:53AM -0800, H L wrote:
> > > > I have not modified any existing drivers, but instead I threw together
> > > > a bare-bones module enabling me to make a call to pci_iov_register()
> > > > and then poke at an SR-IOV adapter's /sys entries for which no driver
> > > > was loaded.
> > > >
> > > > It appears from my perusal thus far that drivers using these new
> > > > SR-IOV patches will require modification; i.e. the driver associated
> > > > with the Physical Function (PF) will be required to make the
> > > > pci_iov_register() call along with the requisite notify() function.
> > > > Essentially this suggests to me a model for the PF driver to perform
> > > > any "global actions" or setup on behalf of VFs before enabling them
> > > > after which VF drivers could be associated.
> > >
> > > Where would the VF drivers have to be associated? On the "pci_dev"
> > > level or on a higher one?
> > >
> > > Will all drivers that want to bind to a "VF" device need to be
> > > rewritten?
> >
> > The current model being implemented by my colleagues has separate
> > drivers for the PF (aka native) and VF devices. I don't personally
> > believe this is the correct path, but I'm reserving judgement until I
> > see some code.
>
> Hm, I would like to see that code before we can properly evaluate this
> interface. Especially as they are all tightly tied together.
>
> > I don't think we really know what the One True Usage model is for VF
> > devices. Chris Wright has some ideas, I have some ideas and Yu Zhao has
> > some ideas. I bet there's other people who have other ideas too.
>
> I'd love to hear those ideas.
First there's the question of how to represent the VF on the host.
Ideally (IMO) this would show up as a normal interface so that normal tools
can configure the interface. This is not exactly how the first round of
patches were designed.
Second there's the question of reserving the BDF on the host such that
we don't have two drivers (one in the host and one in a guest) trying to
drive the same device (an issue that shows up for device assignment as
well as VF assignment).
Third there's the question of whether the VF can be used in the host at
all.
Fourth there's the question of whether the VF and PF drivers are the
same or separate.
The typical usecase is assigning the VF to the guest directly, so
there's only enough functionality in the host side to allocate a VF,
configure it, and assign it (and propagate AER). This is with separate
PF and VF driver.
As Anthony mentioned, we are interested in allowing the host to use the
VF. This could be useful for containers as well as dedicating a VF (a
set of device resources) to a guest w/out passing it through.
thanks,
-chris
^ permalink raw reply [flat|nested] 54+ messages in thread
* RE: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support
[not found] ` <491371F0.7020805@codemonkey.ws>
2008-11-06 22:58 ` Matthew Wilcox
@ 2008-11-07 1:52 ` Dong, Eddie
2008-11-07 2:08 ` Nakajima, Jun
` (4 subsequent siblings)
6 siblings, 0 replies; 54+ messages in thread
From: Dong, Eddie @ 2008-11-07 1:52 UTC (permalink / raw)
To: Anthony Liguori, Matthew Wilcox
Cc: randy.dunlap@oracle.com, grundler@parisc-linux.org,
Chiang, Alexander, Greg KH, rdreier@cisco.com, Dong, Eddie,
linux-kernel@vger.kernel.org, jbarnes@virtuousgeek.org,
virtualization@lists.linux-foundation.org, kvm@vger.kernel.org,
linux-pci@vger.kernel.org, mingo@elte.hu
> What we would rather do in KVM, is have the VFs appear in
> the host as standard network devices. We would then like
> to back our existing PV driver to this VF directly
> bypassing the host networking stack. A key feature here
> is being able to fill the VF's receive queue with guest
> memory instead of host kernel memory so that you can get
> zero-copy
> receive traffic. This will perform just as well as doing
> passthrough (at least) and avoid all that ugliness of
> dealing with SR-IOV in the guest.
>
Anthony:
This is already addressed by VMDq solution(or so called netchannel2), right? Qing He is debugging the KVM side patch and pretty much close to end.
For this single purpose, we don't need SR-IOV. BTW at least Intel SR-IOV NIC also supports VMDq, so you can achieve this by simply use "native" VMDq enabled driver here, plus the work we are debugging now.
Thx, eddie
^ permalink raw reply [flat|nested] 54+ messages in thread
* RE: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support
[not found] ` <491371F0.7020805@codemonkey.ws>
2008-11-06 22:58 ` Matthew Wilcox
2008-11-07 1:52 ` Dong, Eddie
@ 2008-11-07 2:08 ` Nakajima, Jun
[not found] ` <20081106225854.GA15439@parisc-linux.org>
` (3 subsequent siblings)
6 siblings, 0 replies; 54+ messages in thread
From: Nakajima, Jun @ 2008-11-07 2:08 UTC (permalink / raw)
To: Anthony Liguori, Matthew Wilcox
Cc: randy.dunlap@oracle.com, grundler@parisc-linux.org,
Chiang, Alexander, Greg KH, rdreier@cisco.com,
linux-kernel@vger.kernel.org, jbarnes@virtuousgeek.org,
virtualization@lists.linux-foundation.org, kvm@vger.kernel.org,
linux-pci@vger.kernel.org, mingo@elte.hu
On 11/6/2008 2:38:40 PM, Anthony Liguori wrote:
> Matthew Wilcox wrote:
> > [Anna, can you fix your word-wrapping please? Your lines appear to
> > be infinitely long which is most unpleasant to reply to]
> >
> > On Thu, Nov 06, 2008 at 05:38:16PM +0000, Fischer, Anna wrote:
> >
> > > > Where would the VF drivers have to be associated? On the "pci_dev"
> > > > level or on a higher one?
> > > >
> > > A VF appears to the Linux OS as a standard (full, additional) PCI
> > > device. The driver is associated in the same way as for a normal
> > > PCI device. Ideally, you would use SR-IOV devices on a virtualized
> > > system, for example, using Xen. A VF can then be assigned to a
> > > guest domain as a full PCI device.
> > >
> >
> > It's not clear thats the right solution. If the VF devices are
> > _only_ going to be used by the guest, then arguably, we don't want
> > to create pci_devs for them in the host. (I think it _is_ the right
> > answer, but I want to make it clear there's multiple opinions on this).
> >
>
> The VFs shouldn't be limited to being used by the guest.
>
> SR-IOV is actually an incredibly painful thing. You need to have a VF
> driver in the guest, do hardware pass through, have a PV driver stub
> in the guest that's hypervisor specific (a VF is not usable on it's
> own), have a device specific backend in the VMM, and if you want to do
> live migration, have another PV driver in the guest that you can do
> teaming with. Just a mess.
Actually "a PV driver stub in the guest" _was_ correct; I admit that I stated so at a virt mini summit more than a half year ago ;-). But the things have changed, and such a stub is no longer required (at least in our implementation). The major benefit of VF drivers now is that they are VMM-agnostic.
>
> What we would rather do in KVM, is have the VFs appear in the host as
> standard network devices. We would then like to back our existing PV
> driver to this VF directly bypassing the host networking stack. A key
> feature here is being able to fill the VF's receive queue with guest
> memory instead of host kernel memory so that you can get zero-copy
> receive traffic. This will perform just as well as doing passthrough
> (at
> least) and avoid all that ugliness of dealing with SR-IOV in the guest.
>
> This eliminates all of the mess of various drivers in the guest and
> all the associated baggage of doing hardware passthrough.
>
> So IMHO, having VFs be usable in the host is absolutely critical
> because I think it's the only reasonable usage model.
As Eddie said, VMDq is better for this model, and the feature is already available today. It is much simpler because it was designed for such purposes. It does not require hardware pass-through (e.g. VT-d) or VFs as a PCI device, either.
>
> Regards,
>
> Anthony Liguori
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in the
> body of a message to majordomo@vger.kernel.org More majordomo info at
> http://vger.kernel.org/majordomo-info.html
.
Jun Nakajima | Intel Open Source Technology Center
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support
[not found] ` <20081106044828.GA30417@kroah.com>
2008-11-06 15:40 ` H L
[not found] ` <909674.99469.qm@web45112.mail.sp1.yahoo.com>
@ 2008-11-07 5:18 ` Zhao, Yu
[not found] ` <4913CFBC.4040203@intel.com>
3 siblings, 0 replies; 54+ messages in thread
From: Zhao, Yu @ 2008-11-07 5:18 UTC (permalink / raw)
To: Greg KH
Cc: randy.dunlap@oracle.com, grundler@parisc-linux.org,
achiang@hp.com, matthew@wil.cx, linux-pci@vger.kernel.org,
rdreier@cisco.com, linux-kernel@vger.kernel.org,
jbarnes@virtuousgeek.org,
virtualization@lists.linux-foundation.org, kvm@vger.kernel.org,
mingo@elte.hu
Greg KH wrote:
> On Wed, Oct 22, 2008 at 04:38:09PM +0800, Yu Zhao wrote:
>> Greetings,
>>
>> Following patches are intended to support SR-IOV capability in the
>> Linux kernel. With these patches, people can turn a PCI device with
>> the capability into multiple ones from software perspective, which
>> will benefit KVM and achieve other purposes such as QoS, security,
>> and etc.
>
> Is there any actual users of this API around yet? How was it tested as
> there is no hardware to test on? Which drivers are going to have to be
> rewritten to take advantage of this new interface?
Yes, the API is used by Intel, HP, NextIO and some other anonymous
companies as they rise questions and send me feedback. I haven't seen
their works but I guess some of drivers using SR-IOV API are going to be
released soon.
My test was done with Intel 82576 Gigabit Ethernet Controller. The
product brief is at
http://download.intel.com/design/network/ProdBrf/320025.pdf and the spec
is available at
http://download.intel.com/design/network/datashts/82576_Datasheet_v2p1.pdf
Regards,
Yu
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support
[not found] ` <20081106182443.GB17782@kroah.com>
@ 2008-11-07 6:03 ` Zhao, Yu
0 siblings, 0 replies; 54+ messages in thread
From: Zhao, Yu @ 2008-11-07 6:03 UTC (permalink / raw)
To: Greg KH
Cc: randy.dunlap@oracle.com, grundler@parisc-linux.org,
achiang@hp.com, matthew@wil.cx, linux-pci@vger.kernel.org,
rdreier@cisco.com, linux-kernel@vger.kernel.org,
jbarnes@virtuousgeek.org,
virtualization@lists.linux-foundation.org, kvm@vger.kernel.org,
mingo@elte.hu
Greg KH wrote:
> On Thu, Nov 06, 2008 at 10:05:39AM -0800, H L wrote:
>> --- On Thu, 11/6/08, Greg KH <greg@kroah.com> wrote:
>>
>>> On Thu, Nov 06, 2008 at 08:41:53AM -0800, H L wrote:
>>>> I have not modified any existing drivers, but instead
>>> I threw together
>>>> a bare-bones module enabling me to make a call to
>>> pci_iov_register()
>>>> and then poke at an SR-IOV adapter's /sys entries
>>> for which no driver
>>>> was loaded.
>>>>
>>>> It appears from my perusal thus far that drivers using
>>> these new
>>>> SR-IOV patches will require modification; i.e. the
>>> driver associated
>>>> with the Physical Function (PF) will be required to
>>> make the
>>>> pci_iov_register() call along with the requisite
>>> notify() function.
>>>> Essentially this suggests to me a model for the PF
>>> driver to perform
>>>> any "global actions" or setup on behalf of
>>> VFs before enabling them
>>>> after which VF drivers could be associated.
>>> Where would the VF drivers have to be associated? On the
>>> "pci_dev"
>>> level or on a higher one?
>>
>> I have not yet fully grocked Yu Zhao's model to answer this. That
>> said, I would *hope* to find it on the "pci_dev" level.
>
> Me too.
VF is kind of lightweight PCI device, and it's represented by "struct
pci_dev". VF driver bounds to the "pci_dev" and works in the same way as
other drivers.
>
>>> Will all drivers that want to bind to a "VF"
>>> device need to be
>>> rewritten?
>> Not necessarily, or perhaps minimally; depends on hardware/firmware
>> and actions the driver wants to take. An example here might assist.
>> Let's just say someone has created, oh, I don't know, maybe an SR-IOV
>> NIC. Now, for 'general' I/O operations to pass network traffic back
>> and forth there would ideally be no difference in the actions and
>> therefore behavior of a PF driver and a VF driver. But, what do you
>> do in the instance a VF wants to change link-speed? As that physical
>> characteristic affects all VFs, how do you handle that? This is where
>> the hardware/firmware implementation part comes to play. If a VF
>> driver performs some actions to initiate the change in link speed, the
>> logic in the adapter could be anything like:
>
> <snip>
>
> Yes, I agree that all of this needs to be done, somehow.
>
> It's that "somehow" that I am interested in trying to see how it works
> out.
This is device specific part. VF driver is free to do what it wants to
do with device specific registers and resources, and wouldn't concern us
as far as it behaves as PCI device driver.
>
>>>> I have so far only seen Yu Zhao's
>>> "7-patch" set. I've not yet looked
>>>> at his subsequently tendered "15-patch" set
>>> so I don't know what has
>>>> changed. The hardware/firmware implementation for
>>> any given SR-IOV
>>>> compatible device, will determine the extent of
>>> differences required
>>>> between a PF driver and a VF driver.
>>> Yeah, that's what I'm worried/curious about.
>>> Without seeing the code
>>> for such a driver, how can we properly evaluate if this
>>> infrastructure
>>> is the correct one and the proper way to do all of this?
>>
>> As the example above demonstrates, that's a tough question to answer.
>> Ideally, in my view, there would only be one driver written per SR-IOV
>> device and it would contain the logic to "do the right things" based
>> on whether its running as a PF or VF with that determination easily
>> accomplished by testing the existence of the SR-IOV extended
>> capability. Then, in an effort to minimize (if not eliminate) the
>> complexities of driver-to-driver actions for fielding "global events",
>> contain as much of the logic as is possible within the adapter.
>> Minimizing the efforts required for the device driver writers in my
>> opinion paves the way to greater adoption of this technology.
>
> Yes, making things easier is the key here.
>
> Perhaps some of this could be hidden with a new bus type for these kinds
> of devices? Or a "virtual" bus of pci devices that the original SR-IOV
> device creates that corrispond to the individual virtual PCI devices?
> If that were the case, then it might be a lot easier in the end.
PCI SIG only defines SR-IOV at PCI level, we can't predict what the
hardware vendors would implement at device specific logic level.
An example of SR-IOV NIC: PF may not have network functionality, it only
controls VFs. Because people only want to use VFs in virtual machines,
they don't need network functionality in the environment (e.g.
hypervisor) where PF resides.
Thanks,
Yu
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support
[not found] ` <4913CFBC.4040203@intel.com>
@ 2008-11-07 6:07 ` Greg KH
0 siblings, 0 replies; 54+ messages in thread
From: Greg KH @ 2008-11-07 6:07 UTC (permalink / raw)
To: Zhao, Yu
Cc: randy.dunlap@oracle.com, grundler@parisc-linux.org,
achiang@hp.com, matthew@wil.cx, linux-pci@vger.kernel.org,
rdreier@cisco.com, linux-kernel@vger.kernel.org,
jbarnes@virtuousgeek.org,
virtualization@lists.linux-foundation.org, kvm@vger.kernel.org,
mingo@elte.hu
On Fri, Nov 07, 2008 at 01:18:52PM +0800, Zhao, Yu wrote:
> Greg KH wrote:
>> On Wed, Oct 22, 2008 at 04:38:09PM +0800, Yu Zhao wrote:
>>> Greetings,
>>>
>>> Following patches are intended to support SR-IOV capability in the
>>> Linux kernel. With these patches, people can turn a PCI device with
>>> the capability into multiple ones from software perspective, which
>>> will benefit KVM and achieve other purposes such as QoS, security,
>>> and etc.
>> Is there any actual users of this API around yet? How was it tested as
>> there is no hardware to test on? Which drivers are going to have to be
>> rewritten to take advantage of this new interface?
>
> Yes, the API is used by Intel, HP, NextIO and some other anonymous
> companies as they rise questions and send me feedback. I haven't seen their
> works but I guess some of drivers using SR-IOV API are going to be released
> soon.
Well, we can't merge infrastructure without seeing the users of that
infrastructure, right?
> My test was done with Intel 82576 Gigabit Ethernet Controller. The product
> brief is at http://download.intel.com/design/network/ProdBrf/320025.pdf and
> the spec is available at
> http://download.intel.com/design/network/datashts/82576_Datasheet_v2p1.pdf
Cool, do you have that driver we can see?
How does it interact and handle the kvm and xen issues that have been
posted?
thanks,
greg k-h
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support
[not found] ` <20081106235406.GB30790@sequoia.sous-sol.org>
@ 2008-11-07 6:10 ` Greg KH
2008-11-07 7:06 ` Zhao, Yu
[not found] ` <4913E8E8.5040103@intel.com>
2 siblings, 0 replies; 54+ messages in thread
From: Greg KH @ 2008-11-07 6:10 UTC (permalink / raw)
To: Chris Wright
Cc: randy.dunlap, grundler, achiang, Matthew Wilcox, linux-pci,
rdreier, linux-kernel, jbarnes, virtualization, kvm, mingo
On Thu, Nov 06, 2008 at 03:54:06PM -0800, Chris Wright wrote:
> * Greg KH (greg@kroah.com) wrote:
> > On Thu, Nov 06, 2008 at 10:47:41AM -0700, Matthew Wilcox wrote:
> > > On Thu, Nov 06, 2008 at 08:49:19AM -0800, Greg KH wrote:
> > > > On Thu, Nov 06, 2008 at 08:41:53AM -0800, H L wrote:
> > > > > I have not modified any existing drivers, but instead I threw together
> > > > > a bare-bones module enabling me to make a call to pci_iov_register()
> > > > > and then poke at an SR-IOV adapter's /sys entries for which no driver
> > > > > was loaded.
> > > > >
> > > > > It appears from my perusal thus far that drivers using these new
> > > > > SR-IOV patches will require modification; i.e. the driver associated
> > > > > with the Physical Function (PF) will be required to make the
> > > > > pci_iov_register() call along with the requisite notify() function.
> > > > > Essentially this suggests to me a model for the PF driver to perform
> > > > > any "global actions" or setup on behalf of VFs before enabling them
> > > > > after which VF drivers could be associated.
> > > >
> > > > Where would the VF drivers have to be associated? On the "pci_dev"
> > > > level or on a higher one?
> > > >
> > > > Will all drivers that want to bind to a "VF" device need to be
> > > > rewritten?
> > >
> > > The current model being implemented by my colleagues has separate
> > > drivers for the PF (aka native) and VF devices. I don't personally
> > > believe this is the correct path, but I'm reserving judgement until I
> > > see some code.
> >
> > Hm, I would like to see that code before we can properly evaluate this
> > interface. Especially as they are all tightly tied together.
> >
> > > I don't think we really know what the One True Usage model is for VF
> > > devices. Chris Wright has some ideas, I have some ideas and Yu Zhao has
> > > some ideas. I bet there's other people who have other ideas too.
> >
> > I'd love to hear those ideas.
>
> First there's the question of how to represent the VF on the host.
> Ideally (IMO) this would show up as a normal interface so that normal tools
> can configure the interface. This is not exactly how the first round of
> patches were designed.
>
> Second there's the question of reserving the BDF on the host such that
> we don't have two drivers (one in the host and one in a guest) trying to
> drive the same device (an issue that shows up for device assignment as
> well as VF assignment).
>
> Third there's the question of whether the VF can be used in the host at
> all.
>
> Fourth there's the question of whether the VF and PF drivers are the
> same or separate.
>
> The typical usecase is assigning the VF to the guest directly, so
> there's only enough functionality in the host side to allocate a VF,
> configure it, and assign it (and propagate AER). This is with separate
> PF and VF driver.
>
> As Anthony mentioned, we are interested in allowing the host to use the
> VF. This could be useful for containers as well as dedicating a VF (a
> set of device resources) to a guest w/out passing it through.
All of this looks great. So, with all of these questions, how does the
current code pertain to these issues? It seems like we have a long way
to go...
thanks,
greg k-h
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support
[not found] ` <49137255.9010104@codemonkey.ws>
@ 2008-11-07 6:17 ` Greg KH
2008-11-09 6:41 ` Muli Ben-Yehuda
` (2 subsequent siblings)
3 siblings, 0 replies; 54+ messages in thread
From: Greg KH @ 2008-11-07 6:17 UTC (permalink / raw)
To: Anthony Liguori
Cc: randy.dunlap, Chris Wright, grundler, achiang, Matthew Wilcox,
linux-pci, rdreier, linux-kernel, jbarnes, virtualization, kvm,
mingo
On Thu, Nov 06, 2008 at 04:40:21PM -0600, Anthony Liguori wrote:
> Greg KH wrote:
>> On Thu, Nov 06, 2008 at 10:47:41AM -0700, Matthew Wilcox wrote:
>>
>>> I don't think we really know what the One True Usage model is for VF
>>> devices. Chris Wright has some ideas, I have some ideas and Yu Zhao has
>>> some ideas. I bet there's other people who have other ideas too.
>>>
>>
>> I'd love to hear those ideas.
>>
>
> We've been talking about avoiding hardware passthrough entirely and
> just backing a virtio-net backend driver by a dedicated VF in the
> host. That avoids a huge amount of guest-facing complexity, let's
> migration Just Work, and should give the same level of performance.
Does that involve this patch set? Or a different type of interface.
thanks,
greg k-h
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support
[not found] ` <20081106225854.GA15439@parisc-linux.org>
@ 2008-11-07 6:19 ` Greg KH
[not found] ` <20081107061952.GF3860@kroah.com>
2008-11-09 12:47 ` Avi Kivity
2 siblings, 0 replies; 54+ messages in thread
From: Greg KH @ 2008-11-07 6:19 UTC (permalink / raw)
To: Matthew Wilcox
Cc: randy.dunlap@oracle.com, grundler@parisc-linux.org,
Chiang, Alexander, linux-pci@vger.kernel.org, rdreier@cisco.com,
linux-kernel@vger.kernel.org, jbarnes@virtuousgeek.org,
virtualization@lists.linux-foundation.org, Anthony Liguori,
kvm@vger.kernel.org, mingo@elte.hu
On Thu, Nov 06, 2008 at 03:58:54PM -0700, Matthew Wilcox wrote:
> > What we would rather do in KVM, is have the VFs appear in the host as
> > standard network devices. We would then like to back our existing PV
> > driver to this VF directly bypassing the host networking stack. A key
> > feature here is being able to fill the VF's receive queue with guest
> > memory instead of host kernel memory so that you can get zero-copy
> > receive traffic. This will perform just as well as doing passthrough
> > (at least) and avoid all that ugliness of dealing with SR-IOV in the guest.
>
> This argues for ignoring the SR-IOV mess completely. Just have the
> host driver expose multiple 'ethN' devices.
That would work, but do we want to do that for every different type of
driver?
thanks,
greg k-h
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support
[not found] ` <20081106235406.GB30790@sequoia.sous-sol.org>
2008-11-07 6:10 ` Greg KH
@ 2008-11-07 7:06 ` Zhao, Yu
[not found] ` <4913E8E8.5040103@intel.com>
2 siblings, 0 replies; 54+ messages in thread
From: Zhao, Yu @ 2008-11-07 7:06 UTC (permalink / raw)
To: Chris Wright
Cc: randy.dunlap@oracle.com, grundler@parisc-linux.org,
achiang@hp.com, Matthew Wilcox, Greg KH, rdreier@cisco.com,
linux-kernel@vger.kernel.org, jbarnes@virtuousgeek.org,
virtualization@lists.linux-foundation.org, kvm@vger.kernel.org,
linux-pci@vger.kernel.org, mingo@elte.hu
Chris Wright wrote:
> * Greg KH (greg@kroah.com) wrote:
>> On Thu, Nov 06, 2008 at 10:47:41AM -0700, Matthew Wilcox wrote:
>>> On Thu, Nov 06, 2008 at 08:49:19AM -0800, Greg KH wrote:
>>>> On Thu, Nov 06, 2008 at 08:41:53AM -0800, H L wrote:
>>>>> I have not modified any existing drivers, but instead I threw together
>>>>> a bare-bones module enabling me to make a call to pci_iov_register()
>>>>> and then poke at an SR-IOV adapter's /sys entries for which no driver
>>>>> was loaded.
>>>>>
>>>>> It appears from my perusal thus far that drivers using these new
>>>>> SR-IOV patches will require modification; i.e. the driver associated
>>>>> with the Physical Function (PF) will be required to make the
>>>>> pci_iov_register() call along with the requisite notify() function.
>>>>> Essentially this suggests to me a model for the PF driver to perform
>>>>> any "global actions" or setup on behalf of VFs before enabling them
>>>>> after which VF drivers could be associated.
>>>> Where would the VF drivers have to be associated? On the "pci_dev"
>>>> level or on a higher one?
>>>>
>>>> Will all drivers that want to bind to a "VF" device need to be
>>>> rewritten?
>>> The current model being implemented by my colleagues has separate
>>> drivers for the PF (aka native) and VF devices. I don't personally
>>> believe this is the correct path, but I'm reserving judgement until I
>>> see some code.
>> Hm, I would like to see that code before we can properly evaluate this
>> interface. Especially as they are all tightly tied together.
>>
>>> I don't think we really know what the One True Usage model is for VF
>>> devices. Chris Wright has some ideas, I have some ideas and Yu Zhao has
>>> some ideas. I bet there's other people who have other ideas too.
>> I'd love to hear those ideas.
>
> First there's the question of how to represent the VF on the host.
> Ideally (IMO) this would show up as a normal interface so that normal tools
> can configure the interface. This is not exactly how the first round of
> patches were designed.
Whether the VF can show up as a normal interface is decided by VF
driver. VF is represented by 'pci_dev' at PCI level, so VF driver can be
loaded as normal PCI device driver.
What the software representation (eth, framebuffer, etc.) created by VF
driver is not controlled by SR-IOV framework.
So you definitely can use normal tool to configure the VF if its driver
supports that :-)
>
> Second there's the question of reserving the BDF on the host such that
> we don't have two drivers (one in the host and one in a guest) trying to
> drive the same device (an issue that shows up for device assignment as
> well as VF assignment).
If we don't reserve BDF for the device, they can't work neither in the
host nor the guest.
Without BDF, we can't access the config space of the device, the device
also can't do DMA.
Did I miss your point?
>
> Third there's the question of whether the VF can be used in the host at
> all.
Why can't? My VFs work well in the host as normal PCI devices :-)
>
> Fourth there's the question of whether the VF and PF drivers are the
> same or separate.
As I mentioned in another email of this thread. We can't predict how
hardware vendor creates their SR-IOV device. PCI SIG doesn't define
device specific logics.
So I think the answer of this question is up to the device driver
developers. If PF and VF in a SR-IOV device have similar logics, then
they can combine the driver. Otherwise, e.g., if PF doesn't have real
functionality at all -- it only has registers to control internal
resource allocation for VFs, then the drivers should be separate, right?
>
> The typical usecase is assigning the VF to the guest directly, so
> there's only enough functionality in the host side to allocate a VF,
> configure it, and assign it (and propagate AER). This is with separate
> PF and VF driver.
>
> As Anthony mentioned, we are interested in allowing the host to use the
> VF. This could be useful for containers as well as dedicating a VF (a
> set of device resources) to a guest w/out passing it through.
I've considered the container cases, we don't have problem with running
VF driver in the host.
Thanks,
Yu
^ permalink raw reply [flat|nested] 54+ messages in thread
* RE: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support
[not found] ` <4913E8E8.5040103@intel.com>
@ 2008-11-07 7:29 ` Leonid Grossman
0 siblings, 0 replies; 54+ messages in thread
From: Leonid Grossman @ 2008-11-07 7:29 UTC (permalink / raw)
To: Zhao, Yu, Chris Wright
Cc: randy.dunlap, grundler, achiang, Matthew Wilcox, Greg KH, rdreier,
linux-kernel, jbarnes, virtualization, kvm, linux-pci, mingo
> -----Original Message-----
> From: virtualization-bounces@lists.linux-foundation.org
> [mailto:virtualization-bounces@lists.linux-foundation.org] On Behalf
Of
> Zhao, Yu
> Sent: Thursday, November 06, 2008 11:06 PM
> To: Chris Wright
> Cc: randy.dunlap@oracle.com; grundler@parisc-linux.org;
achiang@hp.com;
> Matthew Wilcox; Greg KH; rdreier@cisco.com;
linux-kernel@vger.kernel.org;
> jbarnes@virtuousgeek.org; virtualization@lists.linux-foundation.org;
> kvm@vger.kernel.org; linux-pci@vger.kernel.org; mingo@elte.hu
> Subject: Re: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support
>
> Chris Wright wrote:
> > * Greg KH (greg@kroah.com) wrote:
> >> On Thu, Nov 06, 2008 at 10:47:41AM -0700, Matthew Wilcox wrote:
> >>> On Thu, Nov 06, 2008 at 08:49:19AM -0800, Greg KH wrote:
> >>>> On Thu, Nov 06, 2008 at 08:41:53AM -0800, H L wrote:
> >>>>> I have not modified any existing drivers, but instead I threw
> together
> >>>>> a bare-bones module enabling me to make a call to
pci_iov_register()
> >>>>> and then poke at an SR-IOV adapter's /sys entries for which no
> driver
> >>>>> was loaded.
> >>>>>
> >>>>> It appears from my perusal thus far that drivers using these new
> >>>>> SR-IOV patches will require modification; i.e. the driver
associated
> >>>>> with the Physical Function (PF) will be required to make the
> >>>>> pci_iov_register() call along with the requisite notify()
function.
> >>>>> Essentially this suggests to me a model for the PF driver to
perform
> >>>>> any "global actions" or setup on behalf of VFs before enabling
them
> >>>>> after which VF drivers could be associated.
> >>>> Where would the VF drivers have to be associated? On the
"pci_dev"
> >>>> level or on a higher one?
> >>>>
> >>>> Will all drivers that want to bind to a "VF" device need to be
> >>>> rewritten?
> >>> The current model being implemented by my colleagues has separate
> >>> drivers for the PF (aka native) and VF devices. I don't
personally
> >>> believe this is the correct path, but I'm reserving judgement
until I
> >>> see some code.
> >> Hm, I would like to see that code before we can properly evaluate
this
> >> interface. Especially as they are all tightly tied together.
> >>
> >>> I don't think we really know what the One True Usage model is for
VF
> >>> devices. Chris Wright has some ideas, I have some ideas and Yu
Zhao
> has
> >>> some ideas. I bet there's other people who have other ideas too.
> >> I'd love to hear those ideas.
> >
> > First there's the question of how to represent the VF on the host.
> > Ideally (IMO) this would show up as a normal interface so that
normal
> tools
> > can configure the interface. This is not exactly how the first
round of
> > patches were designed.
>
> Whether the VF can show up as a normal interface is decided by VF
> driver. VF is represented by 'pci_dev' at PCI level, so VF driver can
be
> loaded as normal PCI device driver.
>
> What the software representation (eth, framebuffer, etc.) created by
VF
> driver is not controlled by SR-IOV framework.
>
> So you definitely can use normal tool to configure the VF if its
driver
> supports that :-)
>
> >
> > Second there's the question of reserving the BDF on the host such
that
> > we don't have two drivers (one in the host and one in a guest)
trying to
> > drive the same device (an issue that shows up for device assignment
as
> > well as VF assignment).
>
> If we don't reserve BDF for the device, they can't work neither in the
> host nor the guest.
>
> Without BDF, we can't access the config space of the device, the
device
> also can't do DMA.
>
> Did I miss your point?
>
> >
> > Third there's the question of whether the VF can be used in the host
at
> > all.
>
> Why can't? My VFs work well in the host as normal PCI devices :-)
>
> >
> > Fourth there's the question of whether the VF and PF drivers are the
> > same or separate.
>
> As I mentioned in another email of this thread. We can't predict how
> hardware vendor creates their SR-IOV device. PCI SIG doesn't define
> device specific logics.
>
> So I think the answer of this question is up to the device driver
> developers. If PF and VF in a SR-IOV device have similar logics, then
> they can combine the driver. Otherwise, e.g., if PF doesn't have real
> functionality at all -- it only has registers to control internal
> resource allocation for VFs, then the drivers should be separate,
right?
Right, this really depends upon the functionality behind a VF. If VF is
done as a subset of netdev interface (for example, a queue pair), then a
split VF/PF driver model and a proprietary communication channel is in
order.
If each VF is done as a complete netdev interface (like in our 10GbE IOV
controllers), then PF and VF drivers could be the same. Each VF can be
independently driven by such "native" netdev driver; this includes the
ability to run a native driver in a guest in passthru mode.
A PF driver in a privileged domain doesn't even have to be present.
>
> >
> > The typical usecase is assigning the VF to the guest directly, so
> > there's only enough functionality in the host side to allocate a VF,
> > configure it, and assign it (and propagate AER). This is with
separate
> > PF and VF driver.
> >
> > As Anthony mentioned, we are interested in allowing the host to use
the
> > VF. This could be useful for containers as well as dedicating a VF
(a
> > set of device resources) to a guest w/out passing it through.
>
> I've considered the container cases, we don't have problem with
running
> VF driver in the host.
>
> Thanks,
> Yu
> _______________________________________________
> Virtualization mailing list
> Virtualization@lists.linux-foundation.org
> https://lists.linux-foundation.org/mailman/listinfo/virtualization
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support
[not found] ` <20081107061700.GD3860@kroah.com>
@ 2008-11-07 7:47 ` Zhao, Yu
2008-11-09 12:58 ` Avi Kivity
[not found] ` <4913F2AA.90405@intel.com>
2 siblings, 0 replies; 54+ messages in thread
From: Zhao, Yu @ 2008-11-07 7:47 UTC (permalink / raw)
To: Greg KH, Anthony Liguori, Leonid.Grossman
Cc: randy.dunlap@oracle.com, Chris Wright, grundler@parisc-linux.org,
achiang@hp.com, Matthew Wilcox, linux-pci@vger.kernel.org,
rdreier@cisco.com, linux-kernel@vger.kernel.org,
jbarnes@virtuousgeek.org,
virtualization@lists.linux-foundation.org, kvm@vger.kernel.org,
mingo@elte.hu
Greg KH wrote:
> On Thu, Nov 06, 2008 at 04:40:21PM -0600, Anthony Liguori wrote:
>> Greg KH wrote:
>>> On Thu, Nov 06, 2008 at 10:47:41AM -0700, Matthew Wilcox wrote:
>>>
>>>> I don't think we really know what the One True Usage model is for VF
>>>> devices. Chris Wright has some ideas, I have some ideas and Yu Zhao has
>>>> some ideas. I bet there's other people who have other ideas too.
>>>>
>>> I'd love to hear those ideas.
>>>
>> We've been talking about avoiding hardware passthrough entirely and
>> just backing a virtio-net backend driver by a dedicated VF in the
>> host. That avoids a huge amount of guest-facing complexity, let's
>> migration Just Work, and should give the same level of performance.
This can be commonly used not only with VF -- devices that have multiple
DMA queues (e.g., Intel VMDq, Neterion Xframe) and even traditional
devices can also take the advantage of this.
CC Rusty Russel in case he has more comments.
>
> Does that involve this patch set? Or a different type of interface.
I think that is a different type of interface. We need to hook the DMA
interface in the device driver to virtio-net backend so the hardware
(normal device, VF, VMDq, etc.) can DMA data to/from the virtio-net backend.
Regards,
Yu
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support
[not found] ` <20081107061952.GF3860@kroah.com>
@ 2008-11-07 15:17 ` Yu Zhao
[not found] ` <49145C14.1050409@uniscape.net>
1 sibling, 0 replies; 54+ messages in thread
From: Yu Zhao @ 2008-11-07 15:17 UTC (permalink / raw)
To: Greg KH, Matthew Wilcox
Cc: randy.dunlap@oracle.com, grundler@parisc-linux.org,
Chiang, Alexander, linux-pci@vger.kernel.org, rdreier@cisco.com,
Leonid.Grossman, linux-kernel@vger.kernel.org,
jbarnes@virtuousgeek.org,
virtualization@lists.linux-foundation.org, eddie.dong,
keir.fraser, Anthony Liguori, kvm@vger.kernel.org, mingo@elte.hu,
avi
While we are arguing what the software model the SR-IOV should be, let
me ask two simple questions first:
1, What does the SR-IOV looks like?
2, Why do we need to support it?
I'm sure people have different understandings from their own view
points. No one is wrong, but, please don't make thing complicated and
don't ignore user requirements.
PCI SIG and hardware vendors create such thing intending to make
hardware resource in one PCI device be shared from different software
instances -- I guess all of us agree with this. No doubt PF is real
function in the PCI device, but VF is different? No, it also has its own
Bus, Device and Function numbers, and PCI configuration space and Memory
Space (MMIO). To be more detailed, it can response to and initiate PCI
Transaction Layer Protocol packets, which means it can do everything a
PF can in PCI level. From these obvious behaviors, we can conclude PCI
SIG model VF as a normal PCI device function, even it's not standalone.
As you know the Linux kernel is the base of various virtual machine
monitors such as KVM, Xen, OpenVZ and VServer. We need SR-IOV support in
the kernel because mostly it helps high-end users (IT departments, HPC,
etc.) to share limited hardware resources among hundreds or even
thousands virtual machines and hence reduce the cost. How can we make
these virtual machine monitors utilize the advantage of SR-IOV without
spending too much effort meanwhile remaining architectural correctness?
I believe making VF represent as much closer as a normal PCI device
(struct pci_dev) is the best way in current situation, because this is
not only what the hardware designers expect us to do but also the usage
model that KVM, Xen and other VMMs have already supported.
I agree that API in the SR-IOV pacth is arguable and the concerns such
as lack of PF driver, etc. are also valid. But I personally think these
stuff are not essential problems to me and other SR-IOV driver
developers. People can refine things but don't want to recreate things
in another totally different way especially that way doesn't bring them
obvious benefits.
As I can see that we are now reaching a point that a decision must be
made, I know this is such difficult thing in an open and free community
but fortunately we have a lot of talented and experienced people here.
So let's make it happen, and keep our loyal users happy!
Thanks,
Yu
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support
[not found] ` <491371F0.7020805@codemonkey.ws>
` (3 preceding siblings ...)
[not found] ` <20081106225854.GA15439@parisc-linux.org>
@ 2008-11-07 15:21 ` Andi Kleen
2008-11-12 22:41 ` Anthony Liguori
[not found] ` <491B5B97.2000407@codemonkey.ws>
2008-11-07 16:01 ` Yu Zhao
[not found] ` <87d4h7pnnm.fsf@basil.nowhere.org>
6 siblings, 2 replies; 54+ messages in thread
From: Andi Kleen @ 2008-11-07 15:21 UTC (permalink / raw)
To: Anthony Liguori
Cc: randy.dunlap@oracle.com, grundler@parisc-linux.org,
Chiang, Alexander, Matthew Wilcox, Greg KH, rdreier@cisco.com,
linux-kernel@vger.kernel.org, jbarnes@virtuousgeek.org,
virtualization@lists.linux-foundation.org, kvm@vger.kernel.org,
linux-pci@vger.kernel.org, mingo@elte.hu
Anthony Liguori <anthony@codemonkey.ws> writes:
>
> What we would rather do in KVM, is have the VFs appear in the host as
> standard network devices. We would then like to back our existing PV
> driver to this VF directly bypassing the host networking stack. A key
> feature here is being able to fill the VF's receive queue with guest
> memory instead of host kernel memory so that you can get zero-copy
> receive traffic. This will perform just as well as doing passthrough
> (at least) and avoid all that ugliness of dealing with SR-IOV in the
> guest.
But you shift a lot of ugliness into the host network stack again.
Not sure that is a good trade off.
Also it would always require context switches and I believe one
of the reasons for the PV/VF model is very low latency IO and having
heavyweight switches to the host and back would be against that.
-Andi
--
ak@linux.intel.com
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support
[not found] ` <491371F0.7020805@codemonkey.ws>
` (4 preceding siblings ...)
2008-11-07 15:21 ` Andi Kleen
@ 2008-11-07 16:01 ` Yu Zhao
[not found] ` <87d4h7pnnm.fsf@basil.nowhere.org>
6 siblings, 0 replies; 54+ messages in thread
From: Yu Zhao @ 2008-11-07 16:01 UTC (permalink / raw)
To: Anthony Liguori
Cc: randy.dunlap@oracle.com, grundler@parisc-linux.org,
Chiang, Alexander, Matthew Wilcox, Greg KH, rdreier@cisco.com,
linux-kernel@vger.kernel.org, jbarnes@virtuousgeek.org,
virtualization@lists.linux-foundation.org, kvm@vger.kernel.org,
linux-pci@vger.kernel.org, mingo@elte.hu
Anthony Liguori wrote:
> Matthew Wilcox wrote:
>> [Anna, can you fix your word-wrapping please? Your lines appear to be
>> infinitely long which is most unpleasant to reply to]
>>
>> On Thu, Nov 06, 2008 at 05:38:16PM +0000, Fischer, Anna wrote:
>>
>>>> Where would the VF drivers have to be associated? On the "pci_dev"
>>>> level or on a higher one?
>>>>
>>> A VF appears to the Linux OS as a standard (full, additional) PCI
>>> device. The driver is associated in the same way as for a normal PCI
>>> device. Ideally, you would use SR-IOV devices on a virtualized system,
>>> for example, using Xen. A VF can then be assigned to a guest domain as
>>> a full PCI device.
>>>
>>
>> It's not clear thats the right solution. If the VF devices are _only_
>> going to be used by the guest, then arguably, we don't want to create
>> pci_devs for them in the host. (I think it _is_ the right answer, but I
>> want to make it clear there's multiple opinions on this).
>>
>
> The VFs shouldn't be limited to being used by the guest.
Yes, VF driver running in the host is supported :-)
>
> SR-IOV is actually an incredibly painful thing. You need to have a VF
> driver in the guest, do hardware pass through, have a PV driver stub in
> the guest that's hypervisor specific (a VF is not usable on it's own),
> have a device specific backend in the VMM, and if you want to do live
> migration, have another PV driver in the guest that you can do teaming
> with. Just a mess.
Actually not so mess. VF driver can be a plain PCI device driver and
doesn't require any backend in the VMM, or hypervisor specific
knowledge, if the hardware is properly designed. In this case PF driver
controls hardware resource allocation for VFs and VF driver can work
without any communication to PF driver or VMM.
>
> What we would rather do in KVM, is have the VFs appear in the host as
> standard network devices. We would then like to back our existing PV
> driver to this VF directly bypassing the host networking stack. A key
> feature here is being able to fill the VF's receive queue with guest
> memory instead of host kernel memory so that you can get zero-copy
> receive traffic. This will perform just as well as doing passthrough
> (at least) and avoid all that ugliness of dealing with SR-IOV in the guest.
If the hardware supports both SR-IOV and IOMMU, I wouldn't suggest
people to do so, because they will get better performance by directly
assigning VF to the guest.
However, lots of low-end machines don't have SR-IOV and IOMMU support.
They may have multi queue NIC, which uses built-in L2 switch to dispense
packets to different DMA queue according to MAC address. They definitely
can benefit a lot if there is software support for the DMA queue hooking
virtio-net backend as you suggested.
>
> This eliminates all of the mess of various drivers in the guest and all
> the associated baggage of doing hardware passthrough.
>
> So IMHO, having VFs be usable in the host is absolutely critical because
> I think it's the only reasonable usage model.
Please don't worry, we have take this usage model as well as container
model into account when designing SR-IOV framework for the kernel.
>
> Regards,
>
> Anthony Liguori
> --
> To unsubscribe from this list: send the line "unsubscribe linux-pci" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support
[not found] ` <49145C14.1050409@uniscape.net>
@ 2008-11-07 18:48 ` Greg KH
[not found] ` <20081107184825.GB2320@kroah.com>
1 sibling, 0 replies; 54+ messages in thread
From: Greg KH @ 2008-11-07 18:48 UTC (permalink / raw)
To: Yu Zhao
Cc: randy.dunlap@oracle.com, grundler@parisc-linux.org,
Chiang, Alexander, Matthew Wilcox, linux-pci@vger.kernel.org,
rdreier@cisco.com, Leonid.Grossman, linux-kernel@vger.kernel.org,
jbarnes@virtuousgeek.org,
virtualization@lists.linux-foundation.org, eddie.dong,
keir.fraser, Anthony Liguori, kvm@vger.kernel.org, mingo@elte.hu,
avi
On Fri, Nov 07, 2008 at 11:17:40PM +0800, Yu Zhao wrote:
> While we are arguing what the software model the SR-IOV should be, let me
> ask two simple questions first:
>
> 1, What does the SR-IOV looks like?
> 2, Why do we need to support it?
I don't think we need to worry about those questions, as we can see what
the SR-IOV interface looks like by looking at the PCI spec, and we know
Linux needs to support it, as Linux needs to support everything :)
(note, community members that can not see the PCI specs at this point in
time, please know that we are working on resolving these issues,
hopefully we will have some good news within a month or so.)
> As you know the Linux kernel is the base of various virtual machine
> monitors such as KVM, Xen, OpenVZ and VServer. We need SR-IOV support in
> the kernel because mostly it helps high-end users (IT departments, HPC,
> etc.) to share limited hardware resources among hundreds or even thousands
> virtual machines and hence reduce the cost. How can we make these virtual
> machine monitors utilize the advantage of SR-IOV without spending too much
> effort meanwhile remaining architectural correctness? I believe making VF
> represent as much closer as a normal PCI device (struct pci_dev) is the
> best way in current situation, because this is not only what the hardware
> designers expect us to do but also the usage model that KVM, Xen and other
> VMMs have already supported.
But would such an api really take advantage of the new IOV interfaces
that are exposed by the new device type?
> I agree that API in the SR-IOV pacth is arguable and the concerns such as
> lack of PF driver, etc. are also valid. But I personally think these stuff
> are not essential problems to me and other SR-IOV driver developers.
How can the lack of a PF driver not be a valid concern at this point in
time? Without such a driver written, how can we know that the SR-IOV
interface as created is sufficient, or that it even works properly?
Here's what I see we need to have before we can evaluate if the IOV core
PCI patches are acceptable:
- a driver that uses this interface
- a PF driver that uses this interface.
Without those, we can't determine if the infrastructure provided by the
IOV core even is sufficient, right?
Rumor has it that there is both of the above things floating around, can
someone please post them to the linux-pci list so that we can see how
this all works together?
thanks,
greg k-h
^ permalink raw reply [flat|nested] 54+ messages in thread
* RE: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support
[not found] ` <20081107184825.GB2320@kroah.com>
@ 2008-11-08 11:09 ` Fischer, Anna
[not found] ` <0199E0D51A61344794750DC57738F58E5E26FF3237@GVW1118EXC.americas.hpqcorp.net>
2008-11-13 7:49 ` Yu Zhao
2 siblings, 0 replies; 54+ messages in thread
From: Fischer, Anna @ 2008-11-08 11:09 UTC (permalink / raw)
To: Greg KH, Yu Zhao
Cc: randy.dunlap@oracle.com, grundler@parisc-linux.org,
Chiang, Alexander, Matthew Wilcox, linux-pci@vger.kernel.org,
rdreier@cisco.com, Leonid.Grossman@neterion.com,
linux-kernel@vger.kernel.org, jbarnes@virtuousgeek.org,
virtualization@lists.linux-foundation.org, eddie.dong@intel.com,
keir.fraser@eu.citrix.com, Anthony Liguori, kvm@vger.kernel.org,
mingo@elte.hu
> Subject: Re: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support
> Importance: High
>
> On Fri, Nov 07, 2008 at 11:17:40PM +0800, Yu Zhao wrote:
> > While we are arguing what the software model the SR-IOV should be,
> let me
> > ask two simple questions first:
> >
> > 1, What does the SR-IOV looks like?
> > 2, Why do we need to support it?
>
> I don't think we need to worry about those questions, as we can see
> what
> the SR-IOV interface looks like by looking at the PCI spec, and we know
> Linux needs to support it, as Linux needs to support everything :)
>
> (note, community members that can not see the PCI specs at this point
> in
> time, please know that we are working on resolving these issues,
> hopefully we will have some good news within a month or so.)
>
> > As you know the Linux kernel is the base of various virtual machine
> > monitors such as KVM, Xen, OpenVZ and VServer. We need SR-IOV support
> in
> > the kernel because mostly it helps high-end users (IT departments,
> HPC,
> > etc.) to share limited hardware resources among hundreds or even
> thousands
> > virtual machines and hence reduce the cost. How can we make these
> virtual
> > machine monitors utilize the advantage of SR-IOV without spending too
> much
> > effort meanwhile remaining architectural correctness? I believe
> making VF
> > represent as much closer as a normal PCI device (struct pci_dev) is
> the
> > best way in current situation, because this is not only what the
> hardware
> > designers expect us to do but also the usage model that KVM, Xen and
> other
> > VMMs have already supported.
>
> But would such an api really take advantage of the new IOV interfaces
> that are exposed by the new device type?
I agree with what Yu says. The idea is to have hardware capabilities to
virtualize a PCI device in a way that those virtual devices can represent
full PCI devices. The advantage of that is that those virtual device can
then be used like any other standard PCI device, meaning we can use existing
OS tools, configuration mechanism etc. to start working with them. Also, when
using a virtualization-based system, e.g. Xen or KVM, we do not need
to introduce new mechanisms to make use of SR-IOV, because we can handle
VFs as full PCI devices.
A virtual PCI device in hardware (a VF) can be as powerful or complex as
you like, or it can be very simple. But the big advantage of SR-IOV is
that hardware presents a complete PCI device to the OS - as opposed to
some resources, or queues, that need specific new configuration and
assignment mechanisms in order to use them with a guest OS (like, for
example, VMDq or similar technologies).
Anna
^ permalink raw reply [flat|nested] 54+ messages in thread
* RE: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support
[not found] ` <0199E0D51A61344794750DC57738F58E5E26FF3237@GVW1118EXC.americas.hpqcorp.net>
@ 2008-11-08 15:37 ` Leonid Grossman
0 siblings, 0 replies; 54+ messages in thread
From: Leonid Grossman @ 2008-11-08 15:37 UTC (permalink / raw)
To: Fischer, Anna, Greg KH, Yu Zhao
Cc: randy.dunlap, grundler, Chiang, Alexander, Matthew Wilcox,
linux-pci, rdreier, eddie.dong, linux-kernel, jbarnes,
virtualization, keir.fraser, Anthony Liguori, kvm, mingo, avi
> -----Original Message-----
> From: Fischer, Anna [mailto:anna.fischer@hp.com]
> Sent: Saturday, November 08, 2008 3:10 AM
> To: Greg KH; Yu Zhao
> Cc: Matthew Wilcox; Anthony Liguori; H L; randy.dunlap@oracle.com;
> grundler@parisc-linux.org; Chiang, Alexander;
linux-pci@vger.kernel.org;
> rdreier@cisco.com; linux-kernel@vger.kernel.org;
jbarnes@virtuousgeek.org;
> virtualization@lists.linux-foundation.org; kvm@vger.kernel.org;
> mingo@elte.hu; keir.fraser@eu.citrix.com; Leonid Grossman;
> eddie.dong@intel.com; jun.nakajima@intel.com; avi@redhat.com
> Subject: RE: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support
>
> > But would such an api really take advantage of the new IOV
interfaces
> > that are exposed by the new device type?
>
> I agree with what Yu says. The idea is to have hardware capabilities
to
> virtualize a PCI device in a way that those virtual devices can
represent
> full PCI devices. The advantage of that is that those virtual device
can
> then be used like any other standard PCI device, meaning we can use
> existing
> OS tools, configuration mechanism etc. to start working with them.
Also,
> when
> using a virtualization-based system, e.g. Xen or KVM, we do not need
> to introduce new mechanisms to make use of SR-IOV, because we can
handle
> VFs as full PCI devices.
>
> A virtual PCI device in hardware (a VF) can be as powerful or complex
as
> you like, or it can be very simple. But the big advantage of SR-IOV is
> that hardware presents a complete PCI device to the OS - as opposed to
> some resources, or queues, that need specific new configuration and
> assignment mechanisms in order to use them with a guest OS (like, for
> example, VMDq or similar technologies).
>
> Anna
Ditto.
Taking netdev interface as an example - a queue pair is a great way to
scale across cpu cores in a single OS image, but it is just not a good
way to share device across multiple OS images.
The best unit of virtualization is a VF that is implemented as a
complete netdev pci device (not a subset of a pci device).
This way, native netdev device drivers can work for direct hw access to
a VF "as is", and most/all Linux networking features (including VMQ)
will work in a guest.
Also, guest migration for netdev interfaces (both direct and virtual)
can be supported via native Linux mechanism (bonding driver), while Dom0
can retain "veto power" over any guest direct interface operation it
deems privileged (vlan, mac address, promisc mode, bandwidth allocation
between VFs, etc.).
Leonid
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support
[not found] ` <49137255.9010104@codemonkey.ws>
2008-11-07 6:17 ` Greg KH
@ 2008-11-09 6:41 ` Muli Ben-Yehuda
[not found] ` <20081107061700.GD3860@kroah.com>
[not found] ` <20081109064147.GD7123@il.ibm.com>
3 siblings, 0 replies; 54+ messages in thread
From: Muli Ben-Yehuda @ 2008-11-09 6:41 UTC (permalink / raw)
To: Anthony Liguori
Cc: randy.dunlap, Chris Wright, grundler, achiang, Matthew Wilcox,
Greg KH, rdreier, linux-kernel, jbarnes, virtualization, kvm,
linux-pci, mingo
On Thu, Nov 06, 2008 at 04:40:21PM -0600, Anthony Liguori wrote:
> We've been talking about avoiding hardware passthrough entirely and
> just backing a virtio-net backend driver by a dedicated VF in the
> host. That avoids a huge amount of guest-facing complexity, let's
> migration Just Work, and should give the same level of performance.
I don't believe that it will, and every benchmark I've seen or have
done so far shows a significant performance gap between virtio and
direct assignment, even on 1G ethernet. I am willing however to
reserve judgement until someone implements your suggestion and
actually measures it, preferably on 10G ethernet.
No doubt device assignment---and SR-IOV in particular---are complex,
but I hardly think ignoring it as you seem to propose is the right
approach.
Cheers,
Muli
--
The First Workshop on I/O Virtualization (WIOV '08)
Dec 2008, San Diego, CA, http://www.usenix.org/wiov08/
<->
SYSTOR 2009---The Israeli Experimental Systems Conference
http://www.haifa.il.ibm.com/conferences/systor2009/
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support
[not found] ` <20081106180354.GA17429@kroah.com>
2008-11-06 20:04 ` Fischer, Anna
@ 2008-11-09 12:44 ` Avi Kivity
[not found] ` <4916DB16.2040709@redhat.com>
2 siblings, 0 replies; 54+ messages in thread
From: Avi Kivity @ 2008-11-09 12:44 UTC (permalink / raw)
To: Greg KH
Cc: randy.dunlap@oracle.com, grundler@parisc-linux.org,
Chiang, Alexander, matthew@wil.cx, linux-pci@vger.kernel.org,
rdreier@cisco.com, linux-kernel@vger.kernel.org,
jbarnes@virtuousgeek.org,
virtualization@lists.linux-foundation.org, kvm@vger.kernel.org,
mingo@elte.hu
Greg KH wrote:
> It's that "second" part that I'm worried about. How is that going to
> happen? Do you have any patches that show this kind of "assignment"?
>
>
For kvm, this is in 2.6.28-rc.
Note there are two ways to assign a device to a guest:
- run the VF driver in the guest: this has the advantage of best
performance, but requires pinning all guest memory, makes live migration
a tricky proposition, and ties the guest to the underlying hardware.
- run the VF driver in the host, and use virtio to connect the guest to
the host: allows paging the guest and allows straightforward live
migration, but reduces performance, and hides any features not exposed
by virtio from the guest.
--
error compiling committee.c: too many arguments to function
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support
[not found] ` <20081106225854.GA15439@parisc-linux.org>
2008-11-07 6:19 ` Greg KH
[not found] ` <20081107061952.GF3860@kroah.com>
@ 2008-11-09 12:47 ` Avi Kivity
2 siblings, 0 replies; 54+ messages in thread
From: Avi Kivity @ 2008-11-09 12:47 UTC (permalink / raw)
To: Matthew Wilcox
Cc: randy.dunlap@oracle.com, kvm@vger.kernel.org,
grundler@parisc-linux.org, Chiang, Alexander, Greg KH,
rdreier@cisco.com, linux-kernel@vger.kernel.org,
jbarnes@virtuousgeek.org,
virtualization@lists.linux-foundation.org, Anthony Liguori,
linux-pci@vger.kernel.org, mingo@elte.hu
Matthew Wilcox wrote:
>> What we would rather do in KVM, is have the VFs appear in the host as
>> standard network devices. We would then like to back our existing PV
>> driver to this VF directly bypassing the host networking stack. A key
>> feature here is being able to fill the VF's receive queue with guest
>> memory instead of host kernel memory so that you can get zero-copy
>> receive traffic. This will perform just as well as doing passthrough
>> (at least) and avoid all that ugliness of dealing with SR-IOV in the guest.
>>
>
> This argues for ignoring the SR-IOV mess completely.
It does, but VF-in-host is not the only model that we want to support.
It's just the most appealing.
There will definitely be people who want to run VF-in-guest.
--
error compiling committee.c: too many arguments to function
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support
[not found] ` <87d4h7pnnm.fsf@basil.nowhere.org>
@ 2008-11-09 12:53 ` Avi Kivity
0 siblings, 0 replies; 54+ messages in thread
From: Avi Kivity @ 2008-11-09 12:53 UTC (permalink / raw)
To: Andi Kleen
Cc: randy.dunlap@oracle.com, kvm@vger.kernel.org,
grundler@parisc-linux.org, Chiang, Alexander, Matthew Wilcox,
Greg KH, rdreier@cisco.com, linux-kernel@vger.kernel.org,
jbarnes@virtuousgeek.org,
virtualization@lists.linux-foundation.org, Anthony Liguori,
linux-pci@vger.kernel.org, mingo@elte.hu
Andi Kleen wrote:
> Anthony Liguori <anthony@codemonkey.ws> writes:
>
>> What we would rather do in KVM, is have the VFs appear in the host as
>> standard network devices. We would then like to back our existing PV
>> driver to this VF directly bypassing the host networking stack. A key
>> feature here is being able to fill the VF's receive queue with guest
>> memory instead of host kernel memory so that you can get zero-copy
>> receive traffic. This will perform just as well as doing passthrough
>> (at least) and avoid all that ugliness of dealing with SR-IOV in the
>> guest.
>>
>
> But you shift a lot of ugliness into the host network stack again.
> Not sure that is a good trade off.
>
The net effect will be positive. We will finally have aio networking
from userspace (can send process memory without resorting to
sendfile()), and we'll be able to assign a queue to a process (which
will enable all sorts of interesting high performance things; basically
VJ channels without kernel involvement).
> Also it would always require context switches and I believe one
> of the reasons for the PV/VF model is very low latency IO and having
> heavyweight switches to the host and back would be against that.
>
It's true that latency would suffer (or alternatively cpu consumption
would increase).
--
error compiling committee.c: too many arguments to function
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support
[not found] ` <20081107061700.GD3860@kroah.com>
2008-11-07 7:47 ` Zhao, Yu
@ 2008-11-09 12:58 ` Avi Kivity
[not found] ` <4913F2AA.90405@intel.com>
2 siblings, 0 replies; 54+ messages in thread
From: Avi Kivity @ 2008-11-09 12:58 UTC (permalink / raw)
To: Greg KH
Cc: randy.dunlap, Chris Wright, grundler, achiang, Matthew Wilcox,
linux-pci, rdreier, linux-kernel, jbarnes, virtualization,
Anthony Liguori, kvm, mingo
Greg KH wrote:
>> We've been talking about avoiding hardware passthrough entirely and
>> just backing a virtio-net backend driver by a dedicated VF in the
>> host. That avoids a huge amount of guest-facing complexity, let's
>> migration Just Work, and should give the same level of performance.
>>
>
> Does that involve this patch set? Or a different type of interface.
>
So long as the VF is exposed as a standalone PCI device, it's the same
interface. In fact you can take a random PCI card and expose it to a
guest this way; it doesn't have to be SR-IOV. Of course, with a
standard PCI card you won't get much sharing (a quad port NIC will be
good for four guests).
We'll need other changes in the network stack, but these are orthogonal
to SR-IOV.
--
error compiling committee.c: too many arguments to function
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support
[not found] ` <20081109064147.GD7123@il.ibm.com>
@ 2008-11-09 13:03 ` Avi Kivity
0 siblings, 0 replies; 54+ messages in thread
From: Avi Kivity @ 2008-11-09 13:03 UTC (permalink / raw)
To: Muli Ben-Yehuda
Cc: randy.dunlap, kvm, grundler, achiang, Matthew Wilcox, Greg KH,
rdreier, linux-kernel, jbarnes, virtualization, Chris Wright,
Anthony Liguori, linux-pci, mingo
Muli Ben-Yehuda wrote:
>> We've been talking about avoiding hardware passthrough entirely and
>> just backing a virtio-net backend driver by a dedicated VF in the
>> host. That avoids a huge amount of guest-facing complexity, let's
>> migration Just Work, and should give the same level of performance.
>>
>
> I don't believe that it will, and every benchmark I've seen or have
> done so far shows a significant performance gap between virtio and
> direct assignment, even on 1G ethernet. I am willing however to
> reserve judgement until someone implements your suggestion and
> actually measures it, preferably on 10G ethernet.
>
Right now virtio copies data, and has other inefficiencies. With a
dedicated VF, we can eliminate the copies.
CPU utilization and latency will be worse. If we can limit the
slowdowns to an acceptable amount, the simplicity and other advantages
of VF-in-host may outweigh the performance degradation.
> No doubt device assignment---and SR-IOV in particular---are complex,
> but I hardly think ignoring it as you seem to propose is the right
> approach.
I agree. We should hedge our bets and support both models.
--
error compiling committee.c: too many arguments to function
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support
[not found] ` <4916DB16.2040709@redhat.com>
@ 2008-11-09 19:25 ` Greg KH
[not found] ` <20081109192505.GA3091@kroah.com>
1 sibling, 0 replies; 54+ messages in thread
From: Greg KH @ 2008-11-09 19:25 UTC (permalink / raw)
To: Avi Kivity
Cc: randy.dunlap@oracle.com, grundler@parisc-linux.org,
Chiang, Alexander, matthew@wil.cx, linux-pci@vger.kernel.org,
rdreier@cisco.com, linux-kernel@vger.kernel.org,
jbarnes@virtuousgeek.org,
virtualization@lists.linux-foundation.org, kvm@vger.kernel.org,
mingo@elte.hu
On Sun, Nov 09, 2008 at 02:44:06PM +0200, Avi Kivity wrote:
> Greg KH wrote:
>> It's that "second" part that I'm worried about. How is that going to
>> happen? Do you have any patches that show this kind of "assignment"?
>>
>>
>
> For kvm, this is in 2.6.28-rc.
Where? I just looked and couldn't find anything, but odds are I was
looking in the wrong place :(
> Note there are two ways to assign a device to a guest:
>
> - run the VF driver in the guest: this has the advantage of best
> performance, but requires pinning all guest memory, makes live migration a
> tricky proposition, and ties the guest to the underlying hardware.
Is this what you would prefer for kvm?
> - run the VF driver in the host, and use virtio to connect the guest to the
> host: allows paging the guest and allows straightforward live migration,
> but reduces performance, and hides any features not exposed by virtio from
> the guest.
thanks,
greg k-h
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support
[not found] ` <20081109192505.GA3091@kroah.com>
@ 2008-11-09 19:37 ` Avi Kivity
2008-11-11 6:08 ` Greg KH
[not found] ` <20081111060845.GB13025@kroah.com>
0 siblings, 2 replies; 54+ messages in thread
From: Avi Kivity @ 2008-11-09 19:37 UTC (permalink / raw)
To: Greg KH
Cc: randy.dunlap@oracle.com, grundler@parisc-linux.org,
Chiang, Alexander, matthew@wil.cx, linux-pci@vger.kernel.org,
rdreier@cisco.com, linux-kernel@vger.kernel.org,
jbarnes@virtuousgeek.org,
virtualization@lists.linux-foundation.org, kvm@vger.kernel.org,
mingo@elte.hu
Greg KH wrote:
> On Sun, Nov 09, 2008 at 02:44:06PM +0200, Avi Kivity wrote:
>
>> Greg KH wrote:
>>
>>> It's that "second" part that I'm worried about. How is that going to
>>> happen? Do you have any patches that show this kind of "assignment"?
>>>
>>>
>>>
>> For kvm, this is in 2.6.28-rc.
>>
>
> Where? I just looked and couldn't find anything, but odds are I was
> looking in the wrong place :(
>
>
arch/x86/kvm/vtd.c: iommu integration (allows assigning the device's
memory resources)
virt/kvm/irq*: interrupt redirection (allows assigning the device's
interrupt resources)
the rest (pci config space, pio redirection) are in userspace.
>> Note there are two ways to assign a device to a guest:
>>
>> - run the VF driver in the guest: this has the advantage of best
>> performance, but requires pinning all guest memory, makes live migration a
>> tricky proposition, and ties the guest to the underlying hardware.
>>
>
> Is this what you would prefer for kvm?
>
>
It's not my personal preference, but it is a supported configuration.
For some use cases it is the only one that makes sense.
Again, VF-in-guest and VF-in-host both have their places. And since
Linux can be both guest and host, it's best if the VF driver knows
nothing about SR-IOV; it's just a pci driver. The PF driver should
emulate anything that SR-IOV does not provide (like missing pci config
space).
--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support
[not found] ` <4913F2AA.90405@intel.com>
@ 2008-11-11 0:18 ` Rusty Russell
2008-11-17 12:01 ` Yu Zhao
0 siblings, 1 reply; 54+ messages in thread
From: Rusty Russell @ 2008-11-11 0:18 UTC (permalink / raw)
To: Zhao, Yu
Cc: randy.dunlap@oracle.com, kvm@vger.kernel.org,
grundler@parisc-linux.org, achiang@hp.com, Matthew Wilcox,
Greg KH, rdreier@cisco.com, Leonid.Grossman,
linux-kernel@vger.kernel.org, jbarnes@virtuousgeek.org,
virtualization@lists.linux-foundation.org, Chris Wright,
Anthony Liguori, linux-pci@vger.kernel.org, mingo@elte.hu
[-- Attachment #1.1: Type: text/plain, Size: 1295 bytes --]
On Friday 07 November 2008 18:17:54 Zhao, Yu wrote:
> Greg KH wrote:
> > On Thu, Nov 06, 2008 at 04:40:21PM -0600, Anthony Liguori wrote:
> >> Greg KH wrote:
> >>> On Thu, Nov 06, 2008 at 10:47:41AM -0700, Matthew Wilcox wrote:
> >>>> I don't think we really know what the One True Usage model is for VF
> >>>> devices. Chris Wright has some ideas, I have some ideas and Yu Zhao
> >>>> has some ideas. I bet there's other people who have other ideas too.
> >>>
> >>> I'd love to hear those ideas.
> >>
> >> We've been talking about avoiding hardware passthrough entirely and
> >> just backing a virtio-net backend driver by a dedicated VF in the
> >> host. That avoids a huge amount of guest-facing complexity, let's
> >> migration Just Work, and should give the same level of performance.
>
> This can be commonly used not only with VF -- devices that have multiple
> DMA queues (e.g., Intel VMDq, Neterion Xframe) and even traditional
> devices can also take the advantage of this.
>
> CC Rusty Russel in case he has more comments.
Yes, even dumb devices could use this mechanism if you wanted to bind an
entire device solely to one guest.
We don't have network infrastructure for this today, but my thought was to do
something in dev_alloc_skb and dev_kfree_skb et al.
Cheers,
Rusty.
[-- Attachment #1.2: Type: text/html, Size: 5805 bytes --]
[-- Attachment #2: Type: text/plain, Size: 184 bytes --]
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support
2008-11-09 19:37 ` Avi Kivity
@ 2008-11-11 6:08 ` Greg KH
[not found] ` <20081111060845.GB13025@kroah.com>
1 sibling, 0 replies; 54+ messages in thread
From: Greg KH @ 2008-11-11 6:08 UTC (permalink / raw)
To: Avi Kivity
Cc: randy.dunlap@oracle.com, grundler@parisc-linux.org,
Chiang, Alexander, matthew@wil.cx, linux-pci@vger.kernel.org,
rdreier@cisco.com, linux-kernel@vger.kernel.org,
jbarnes@virtuousgeek.org,
virtualization@lists.linux-foundation.org, kvm@vger.kernel.org,
mingo@elte.hu
On Sun, Nov 09, 2008 at 09:37:20PM +0200, Avi Kivity wrote:
> Greg KH wrote:
>> On Sun, Nov 09, 2008 at 02:44:06PM +0200, Avi Kivity wrote:
>>
>>> Greg KH wrote:
>>>
>>>> It's that "second" part that I'm worried about. How is that going to
>>>> happen? Do you have any patches that show this kind of "assignment"?
>>>>
>>>>
>>> For kvm, this is in 2.6.28-rc.
>>>
>>
>> Where? I just looked and couldn't find anything, but odds are I was
>> looking in the wrong place :(
>>
>>
>
> arch/x86/kvm/vtd.c: iommu integration (allows assigning the device's memory
> resources)
That file is not in 2.6.28-rc4 :(
> virt/kvm/irq*: interrupt redirection (allows assigning the device's
> interrupt resources)
I only see virt/kvm/irq_comm.c in 2.6.28-rc4.
> the rest (pci config space, pio redirection) are in userspace.
So you don't need these pci core changes at all?
>>> Note there are two ways to assign a device to a guest:
>>>
>>> - run the VF driver in the guest: this has the advantage of best
>>> performance, but requires pinning all guest memory, makes live migration
>>> a tricky proposition, and ties the guest to the underlying hardware.
>>
>> Is this what you would prefer for kvm?
>>
>
> It's not my personal preference, but it is a supported configuration. For
> some use cases it is the only one that makes sense.
>
> Again, VF-in-guest and VF-in-host both have their places. And since Linux
> can be both guest and host, it's best if the VF driver knows nothing about
> SR-IOV; it's just a pci driver. The PF driver should emulate anything that
> SR-IOV does not provide (like missing pci config space).
Yes, we need both.
thanks,
greg k-h
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support
[not found] ` <20081111060845.GB13025@kroah.com>
@ 2008-11-11 9:00 ` Avi Kivity
0 siblings, 0 replies; 54+ messages in thread
From: Avi Kivity @ 2008-11-11 9:00 UTC (permalink / raw)
To: Greg KH
Cc: randy.dunlap@oracle.com, grundler@parisc-linux.org,
Chiang, Alexander, matthew@wil.cx, linux-pci@vger.kernel.org,
rdreier@cisco.com, linux-kernel@vger.kernel.org,
jbarnes@virtuousgeek.org,
virtualization@lists.linux-foundation.org, kvm@vger.kernel.org,
mingo@elte.hu
Greg KH wrote:
>> arch/x86/kvm/vtd.c: iommu integration (allows assigning the device's memory
>> resources)
>>
>
> That file is not in 2.6.28-rc4 :(
>
>
Sorry, was moved to virt/kvm/ for ia64's benefit.
>
>> virt/kvm/irq*: interrupt redirection (allows assigning the device's
>> interrupt resources)
>>
>
> I only see virt/kvm/irq_comm.c in 2.6.28-rc4.
>
>
kvm_main.c in that directory also has some related bits.
>> the rest (pci config space, pio redirection) are in userspace.
>>
>
> So you don't need these pci core changes at all?
>
>
Not beyond those required for SR-IOV and iommu support.
--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support
2008-11-07 15:21 ` Andi Kleen
@ 2008-11-12 22:41 ` Anthony Liguori
[not found] ` <491B5B97.2000407@codemonkey.ws>
1 sibling, 0 replies; 54+ messages in thread
From: Anthony Liguori @ 2008-11-12 22:41 UTC (permalink / raw)
To: Andi Kleen
Cc: randy.dunlap@oracle.com, grundler@parisc-linux.org,
Chiang, Alexander, Matthew Wilcox, Greg KH, rdreier@cisco.com,
linux-kernel@vger.kernel.org, jbarnes@virtuousgeek.org,
virtualization@lists.linux-foundation.org, kvm@vger.kernel.org,
linux-pci@vger.kernel.org, mingo@elte.hu
Andi Kleen wrote:
> Anthony Liguori <anthony@codemonkey.ws> writes:
>> What we would rather do in KVM, is have the VFs appear in the host as
>> standard network devices. We would then like to back our existing PV
>> driver to this VF directly bypassing the host networking stack. A key
>> feature here is being able to fill the VF's receive queue with guest
>> memory instead of host kernel memory so that you can get zero-copy
>> receive traffic. This will perform just as well as doing passthrough
>> (at least) and avoid all that ugliness of dealing with SR-IOV in the
>> guest.
>
> But you shift a lot of ugliness into the host network stack again.
> Not sure that is a good trade off.
>
> Also it would always require context switches and I believe one
> of the reasons for the PV/VF model is very low latency IO and having
> heavyweight switches to the host and back would be against that.
I don't think it's established that PV/VF will have less latency than
using virtio-net. virtio-net requires a world switch to send a group of
packets. The cost of this (if it stays in kernel) is only a few
thousand cycles on the most modern processors.
Using VT-d means that for every DMA fetch that misses in the IOTLB, you
potentially have to do four memory fetches to main memory. There will
be additional packet latency using VT-d compared to native, it's just
not known how much at this time.
Regards,
Anthony Liguori
> -Andi
>
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support
[not found] ` <20081107184825.GB2320@kroah.com>
2008-11-08 11:09 ` Fischer, Anna
[not found] ` <0199E0D51A61344794750DC57738F58E5E26FF3237@GVW1118EXC.americas.hpqcorp.net>
@ 2008-11-13 7:49 ` Yu Zhao
2 siblings, 0 replies; 54+ messages in thread
From: Yu Zhao @ 2008-11-13 7:49 UTC (permalink / raw)
To: Greg KH
Cc: randy.dunlap@oracle.com, grundler@parisc-linux.org,
Chiang, Alexander, Matthew Wilcox, linux-pci@vger.kernel.org,
rdreier@cisco.com, Leonid.Grossman@neterion.com, Yu Zhao,
jbarnes@virtuousgeek.org, linux-kernel@vger.kernel.org,
Dong, Eddie, keir.fraser@eu.citrix.com, Anthony Liguori,
kvm@vger.kernel.org, mingo@elte.hu,
virtualization@lists.linux-foundation.org
On Sat, Nov 08, 2008 at 02:48:25AM +0800, Greg KH wrote:
> On Fri, Nov 07, 2008 at 11:17:40PM +0800, Yu Zhao wrote:
> > While we are arguing what the software model the SR-IOV should be, let me
> > ask two simple questions first:
> >
> > 1, What does the SR-IOV looks like?
> > 2, Why do we need to support it?
>
> I don't think we need to worry about those questions, as we can see what
> the SR-IOV interface looks like by looking at the PCI spec, and we know
> Linux needs to support it, as Linux needs to support everything :)
>
> (note, community members that can not see the PCI specs at this point in
> time, please know that we are working on resolving these issues,
> hopefully we will have some good news within a month or so.)
Thanks for doing this!
>
> > As you know the Linux kernel is the base of various virtual machine
> > monitors such as KVM, Xen, OpenVZ and VServer. We need SR-IOV support in
> > the kernel because mostly it helps high-end users (IT departments, HPC,
> > etc.) to share limited hardware resources among hundreds or even thousands
> > virtual machines and hence reduce the cost. How can we make these virtual
> > machine monitors utilize the advantage of SR-IOV without spending too much
> > effort meanwhile remaining architectural correctness? I believe making VF
> > represent as much closer as a normal PCI device (struct pci_dev) is the
> > best way in current situation, because this is not only what the hardware
> > designers expect us to do but also the usage model that KVM, Xen and other
> > VMMs have already supported.
>
> But would such an api really take advantage of the new IOV interfaces
> that are exposed by the new device type?
The SR-IOV is a very straightforward capability -- it can only reside in
the Physical Function's (the real device) config space and controls the
allocation of the Virtual Function by several registers. What we can do
in the PCI layer is to make the SR-IOV device spawn VF upon user request,
and register VF to the PCI core. The functionality of SR-IOV device (both
the PF and VF) can vary at a large range and their drivers (same as normal
PCI device driver) are responsible for handling device specific stuff.
So it looks like we can get all work done in the PCI layer with only two
interfaces: one for the PF driver to register itself as a SR-IOV capable
driver, expose the sysfs (or ioctl) interface to receive user request, and
allocate 'pci_dev' for VF; another one to cleanup all stuff when the PF
driver unregisters itself (e.g., the driver is removed or the device is
going to power-saving mode.).
>
> > I agree that API in the SR-IOV pacth is arguable and the concerns such as
> > lack of PF driver, etc. are also valid. But I personally think these stuff
> > are not essential problems to me and other SR-IOV driver developers.
>
> How can the lack of a PF driver not be a valid concern at this point in
> time? Without such a driver written, how can we know that the SR-IOV
> interface as created is sufficient, or that it even works properly?
>
> Here's what I see we need to have before we can evaluate if the IOV core
> PCI patches are acceptable:
> - a driver that uses this interface
> - a PF driver that uses this interface.
>
> Without those, we can't determine if the infrastructure provided by the
> IOV core even is sufficient, right?
Yes, using a PF driver to evaluate the SR-IOV core is necessary. And only
the PF driver can use the interface since the VF shouldn't have the SR-IOV
capability in its config space according to the spec.
Regards,
Yu
> Rumor has it that there is both of the above things floating around, can
> someone please post them to the linux-pci list so that we can see how
> this all works together?
>
> thanks,
>
> greg k-h
> --
> To unsubscribe from this list: send the line "unsubscribe linux-pci" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support
[not found] ` <491B5B97.2000407@codemonkey.ws>
@ 2008-11-16 16:04 ` Avi Kivity
[not found] ` <492044A7.3080107@redhat.com>
1 sibling, 0 replies; 54+ messages in thread
From: Avi Kivity @ 2008-11-16 16:04 UTC (permalink / raw)
To: Anthony Liguori
Cc: randy.dunlap@oracle.com, grundler@parisc-linux.org,
Chiang, Alexander, Matthew Wilcox, Greg KH, rdreier@cisco.com,
linux-kernel@vger.kernel.org, jbarnes@virtuousgeek.org,
virtualization@lists.linux-foundation.org, Andi Kleen,
kvm@vger.kernel.org, linux-pci@vger.kernel.org, mingo@elte.hu
Anthony Liguori wrote:
> I don't think it's established that PV/VF will have less latency than
> using virtio-net. virtio-net requires a world switch to send a group
> of packets. The cost of this (if it stays in kernel) is only a few
> thousand cycles on the most modern processors.
>
> Using VT-d means that for every DMA fetch that misses in the IOTLB,
> you potentially have to do four memory fetches to main memory. There
> will be additional packet latency using VT-d compared to native, it's
> just not known how much at this time.
If the IOTLB has intermediate TLB entries like the processor, we're
talking just one or two fetches. That's a lot less than the cacheline
bouncing that virtio and kvm interrupt injection incur right now.
--
error compiling committee.c: too many arguments to function
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support
[not found] ` <492044A7.3080107@redhat.com>
@ 2008-11-17 1:46 ` Zhao, Yu
0 siblings, 0 replies; 54+ messages in thread
From: Zhao, Yu @ 2008-11-17 1:46 UTC (permalink / raw)
To: Avi Kivity
Cc: randy.dunlap@oracle.com, grundler@parisc-linux.org,
Chiang, Alexander, Matthew Wilcox, Greg KH, rdreier@cisco.com,
linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org,
jbarnes@virtuousgeek.org,
virtualization@lists.linux-foundation.org, Andi Kleen,
Anthony Liguori, kvm@vger.kernel.org, mingo@elte.hu
Avi Kivity wrote:
> Anthony Liguori wrote:
>> I don't think it's established that PV/VF will have less latency than
>> using virtio-net. virtio-net requires a world switch to send a group
>> of packets. The cost of this (if it stays in kernel) is only a few
>> thousand cycles on the most modern processors.
>>
>> Using VT-d means that for every DMA fetch that misses in the IOTLB,
>> you potentially have to do four memory fetches to main memory. There
>> will be additional packet latency using VT-d compared to native, it's
>> just not known how much at this time.
>
> If the IOTLB has intermediate TLB entries like the processor, we're
> talking just one or two fetches. That's a lot less than the cacheline
> bouncing that virtio and kvm interrupt injection incur right now.
>
The PCI SIG Address Translation Service (ATS) specifies a way that uses
an Address Translation Cache (ATC) in the Endpoint to reduce the latency.
The Linux kernel support for ATS capability will come soon.
Thanks,
Yu
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support
2008-11-11 0:18 ` Rusty Russell
@ 2008-11-17 12:01 ` Yu Zhao
0 siblings, 0 replies; 54+ messages in thread
From: Yu Zhao @ 2008-11-17 12:01 UTC (permalink / raw)
To: Rusty Russell
Cc: randy.dunlap@oracle.com, kvm@vger.kernel.org,
grundler@parisc-linux.org, achiang@hp.com, Matthew Wilcox,
Greg KH, rdreier@cisco.com, Leonid.Grossman@neterion.com,
linux-kernel@vger.kernel.org, jbarnes@virtuousgeek.org,
virtualization@lists.linux-foundation.org, Chris Wright,
Anthony Liguori, linux-pci@vger.kernel.org, mingo@elte.hu
Rusty Russell wrote:
> On Friday 07 November 2008 18:17:54 Zhao, Yu wrote:
> > Greg KH wrote:
> > > On Thu, Nov 06, 2008 at 04:40:21PM -0600, Anthony Liguori wrote:
> > >> Greg KH wrote:
> > >>> On Thu, Nov 06, 2008 at 10:47:41AM -0700, Matthew Wilcox wrote:
> > >>>> I don't think we really know what the One True Usage model is for VF
> > >>>> devices. Chris Wright has some ideas, I have some ideas and Yu Zhao
> > >>>> has some ideas. I bet there's other people who have other ideas too.
> > >>>
> > >>> I'd love to hear those ideas.
> > >>
> > >> We've been talking about avoiding hardware passthrough entirely and
> > >> just backing a virtio-net backend driver by a dedicated VF in the
> > >> host. That avoids a huge amount of guest-facing complexity, let's
> > >> migration Just Work, and should give the same level of performance.
> >
> > This can be commonly used not only with VF -- devices that have multiple
> > DMA queues (e.g., Intel VMDq, Neterion Xframe) and even traditional
> > devices can also take the advantage of this.
> >
> > CC Rusty Russel in case he has more comments.
>
> Yes, even dumb devices could use this mechanism if you wanted to bind an
> entire device solely to one guest.
>
> We don't have network infrastructure for this today, but my thought was
> to do something in dev_alloc_skb and dev_kfree_skb et al.
Is there any discussion about this on the netdev? Any prototype
available? If not, I'd like to create one and evaluate the performance
of virtio-net solution again the hardware passthrough.
Thanks,
Yu
^ permalink raw reply [flat|nested] 54+ messages in thread
end of thread, other threads:[~2008-11-17 12:01 UTC | newest]
Thread overview: 54+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <20081106154351.GA30459@kroah.com>
2008-11-06 16:41 ` [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support H L
[not found] ` <894107.30288.qm@web45108.mail.sp1.yahoo.com>
2008-11-06 16:49 ` Greg KH
[not found] ` <20081106164919.GA4099@kroah.com>
2008-11-06 17:38 ` Fischer, Anna
2008-11-06 17:47 ` Matthew Wilcox
[not found] ` <20081106174741.GC11773@parisc-linux.org>
2008-11-06 17:53 ` Greg KH
[not found] ` <20081106175308.GA17027@kroah.com>
2008-11-06 22:24 ` Simon Horman
2008-11-06 22:40 ` Anthony Liguori
2008-11-06 23:54 ` Chris Wright
[not found] ` <49137255.9010104@codemonkey.ws>
2008-11-07 6:17 ` Greg KH
2008-11-09 6:41 ` Muli Ben-Yehuda
[not found] ` <20081107061700.GD3860@kroah.com>
2008-11-07 7:47 ` Zhao, Yu
2008-11-09 12:58 ` Avi Kivity
[not found] ` <4913F2AA.90405@intel.com>
2008-11-11 0:18 ` Rusty Russell
2008-11-17 12:01 ` Yu Zhao
[not found] ` <20081109064147.GD7123@il.ibm.com>
2008-11-09 13:03 ` Avi Kivity
[not found] ` <20081106235406.GB30790@sequoia.sous-sol.org>
2008-11-07 6:10 ` Greg KH
2008-11-07 7:06 ` Zhao, Yu
[not found] ` <4913E8E8.5040103@intel.com>
2008-11-07 7:29 ` Leonid Grossman
[not found] ` <0199E0D51A61344794750DC57738F58E5E26F996C4@GVW1118EXC.americas.hpqcorp.net>
2008-11-06 18:03 ` Greg KH
2008-11-06 18:36 ` Matthew Wilcox
[not found] ` <20081106180354.GA17429@kroah.com>
2008-11-06 20:04 ` Fischer, Anna
2008-11-09 12:44 ` Avi Kivity
[not found] ` <4916DB16.2040709@redhat.com>
2008-11-09 19:25 ` Greg KH
[not found] ` <20081109192505.GA3091@kroah.com>
2008-11-09 19:37 ` Avi Kivity
2008-11-11 6:08 ` Greg KH
[not found] ` <20081111060845.GB13025@kroah.com>
2008-11-11 9:00 ` Avi Kivity
[not found] ` <20081106183630.GD11773@parisc-linux.org>
2008-11-06 22:38 ` Anthony Liguori
[not found] ` <491371F0.7020805@codemonkey.ws>
2008-11-06 22:58 ` Matthew Wilcox
2008-11-07 1:52 ` Dong, Eddie
2008-11-07 2:08 ` Nakajima, Jun
[not found] ` <20081106225854.GA15439@parisc-linux.org>
2008-11-07 6:19 ` Greg KH
[not found] ` <20081107061952.GF3860@kroah.com>
2008-11-07 15:17 ` Yu Zhao
[not found] ` <49145C14.1050409@uniscape.net>
2008-11-07 18:48 ` Greg KH
[not found] ` <20081107184825.GB2320@kroah.com>
2008-11-08 11:09 ` Fischer, Anna
[not found] ` <0199E0D51A61344794750DC57738F58E5E26FF3237@GVW1118EXC.americas.hpqcorp.net>
2008-11-08 15:37 ` Leonid Grossman
2008-11-13 7:49 ` Yu Zhao
2008-11-09 12:47 ` Avi Kivity
2008-11-07 15:21 ` Andi Kleen
2008-11-12 22:41 ` Anthony Liguori
[not found] ` <491B5B97.2000407@codemonkey.ws>
2008-11-16 16:04 ` Avi Kivity
[not found] ` <492044A7.3080107@redhat.com>
2008-11-17 1:46 ` Zhao, Yu
2008-11-07 16:01 ` Yu Zhao
[not found] ` <87d4h7pnnm.fsf@basil.nowhere.org>
2008-11-09 12:53 ` Avi Kivity
2008-11-06 18:05 ` H L
[not found] ` <392264.50990.qm@web45103.mail.sp1.yahoo.com>
2008-11-06 18:24 ` Greg KH
[not found] ` <20081106182443.GB17782@kroah.com>
2008-11-07 6:03 ` Zhao, Yu
2008-11-06 16:51 ` git repository for SR-IOV development? H L
[not found] ` <1374.36291.qm@web45108.mail.sp1.yahoo.com>
2008-11-06 16:59 ` Greg KH
[not found] <20081022083809.GA3757@yzhao12-linux.sh.intel.com>
2008-11-06 4:48 ` [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support Greg KH
[not found] ` <20081106044828.GA30417@kroah.com>
2008-11-06 15:40 ` H L
[not found] ` <909674.99469.qm@web45112.mail.sp1.yahoo.com>
2008-11-06 15:43 ` Greg KH
2008-11-07 5:18 ` Zhao, Yu
[not found] ` <4913CFBC.4040203@intel.com>
2008-11-07 6:07 ` Greg KH
2008-10-22 8:38 Yu Zhao
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).