* [PATCH 0/3] Enable SRIOV with KVM
@ 2009-03-04 7:54 Sheng Yang
2009-03-04 7:54 ` [PATCH 1/3] kvm: fix irq 0 assignment Sheng Yang
` (2 more replies)
0 siblings, 3 replies; 10+ messages in thread
From: Sheng Yang @ 2009-03-04 7:54 UTC (permalink / raw)
To: Avi Kivity, Marcelo Tosatti; +Cc: kvm
The first patch have been sent already. The other two deal with some minor
problem of SRIOV's virtual function. With these three userspace patches, SRIOV
can be enabled with KVM(of course, with MSI-X and my previous userspace
patches.)
Thanks.
^ permalink raw reply [flat|nested] 10+ messages in thread* [PATCH 1/3] kvm: fix irq 0 assignment 2009-03-04 7:54 [PATCH 0/3] Enable SRIOV with KVM Sheng Yang @ 2009-03-04 7:54 ` Sheng Yang 2009-03-04 9:53 ` Chris Wedgwood 2009-03-04 19:30 ` Marcelo Tosatti 2009-03-04 7:54 ` [PATCH 2/3] KVM: Fill config with correct VID/DID Sheng Yang 2009-03-04 7:54 ` [PATCH 3/3] kvm: emulate command register for SRIOV virtual function Sheng Yang 2 siblings, 2 replies; 10+ messages in thread From: Sheng Yang @ 2009-03-04 7:54 UTC (permalink / raw) To: Avi Kivity, Marcelo Tosatti; +Cc: kvm, Sheng Yang Shouldn't update assigned irq if host irq is 0, which means uninitialized or don't support INTx. Signed-off-by: Sheng Yang <sheng@linux.intel.com> --- qemu/hw/device-assignment.c | 4 ++++ 1 files changed, 4 insertions(+), 0 deletions(-) diff --git a/qemu/hw/device-assignment.c b/qemu/hw/device-assignment.c index 32fba2a..7f9c789 100644 --- a/qemu/hw/device-assignment.c +++ b/qemu/hw/device-assignment.c @@ -574,6 +574,10 @@ static int assign_irq(AssignedDevInfo *adev) AssignedDevice *dev = adev->assigned_dev; int irq, r = 0; + /* irq 0 means uninitialized */ + if (dev->real_device.irq == 0) + return 0; + irq = pci_map_irq(&dev->dev, dev->intpin); irq = piix_get_irq(irq); -- 1.5.4.5 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH 1/3] kvm: fix irq 0 assignment 2009-03-04 7:54 ` [PATCH 1/3] kvm: fix irq 0 assignment Sheng Yang @ 2009-03-04 9:53 ` Chris Wedgwood 2009-03-04 9:58 ` Sheng Yang 2009-03-04 19:30 ` Marcelo Tosatti 1 sibling, 1 reply; 10+ messages in thread From: Chris Wedgwood @ 2009-03-04 9:53 UTC (permalink / raw) To: Sheng Yang; +Cc: Avi Kivity, Marcelo Tosatti, kvm On Wed, Mar 04, 2009 at 03:54:27PM +0800, Sheng Yang wrote: > Shouldn't update assigned irq if host irq is 0, which means > uninitialized or don't support INTx. Is that generally true? ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 1/3] kvm: fix irq 0 assignment 2009-03-04 9:53 ` Chris Wedgwood @ 2009-03-04 9:58 ` Sheng Yang 2009-03-04 10:02 ` Sheng Yang 0 siblings, 1 reply; 10+ messages in thread From: Sheng Yang @ 2009-03-04 9:58 UTC (permalink / raw) To: Chris Wedgwood; +Cc: Avi Kivity, Marcelo Tosatti, kvm On Wednesday 04 March 2009 17:53:52 Chris Wedgwood wrote: > On Wed, Mar 04, 2009 at 03:54:27PM +0800, Sheng Yang wrote: > > Shouldn't update assigned irq if host irq is 0, which means > > uninitialized or don't support INTx. > > Is that generally true? Host irq 0 is reserved for PIT timer, at least for x86. -- regards Yang, Sheng ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 1/3] kvm: fix irq 0 assignment 2009-03-04 9:58 ` Sheng Yang @ 2009-03-04 10:02 ` Sheng Yang 0 siblings, 0 replies; 10+ messages in thread From: Sheng Yang @ 2009-03-04 10:02 UTC (permalink / raw) To: Chris Wedgwood; +Cc: Avi Kivity, Marcelo Tosatti, kvm On Wednesday 04 March 2009 17:58:31 Sheng Yang wrote: > On Wednesday 04 March 2009 17:53:52 Chris Wedgwood wrote: > > On Wed, Mar 04, 2009 at 03:54:27PM +0800, Sheng Yang wrote: > > > Shouldn't update assigned irq if host irq is 0, which means > > > uninitialized or don't support INTx. > > > > Is that generally true? > > Host irq 0 is reserved for PIT timer, at least for x86. Or in some other condition it would be used for pci/pcie device? And please notice that currently we only support assign with x86 and IA64. (I really feel there is some reason for you ask. :) ) -- regards Yang, Sheng ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 1/3] kvm: fix irq 0 assignment 2009-03-04 7:54 ` [PATCH 1/3] kvm: fix irq 0 assignment Sheng Yang 2009-03-04 9:53 ` Chris Wedgwood @ 2009-03-04 19:30 ` Marcelo Tosatti 2009-03-04 19:41 ` Chris Wright 1 sibling, 1 reply; 10+ messages in thread From: Marcelo Tosatti @ 2009-03-04 19:30 UTC (permalink / raw) To: Sheng Yang; +Cc: Avi Kivity, kvm, Chris Wright On Wed, Mar 04, 2009 at 03:54:27PM +0800, Sheng Yang wrote: > Shouldn't update assigned irq if host irq is 0, which means uninitialized > or don't support INTx. > > Signed-off-by: Sheng Yang <sheng@linux.intel.com> > --- > qemu/hw/device-assignment.c | 4 ++++ > 1 files changed, 4 insertions(+), 0 deletions(-) > > diff --git a/qemu/hw/device-assignment.c b/qemu/hw/device-assignment.c > index 32fba2a..7f9c789 100644 > --- a/qemu/hw/device-assignment.c > +++ b/qemu/hw/device-assignment.c > @@ -574,6 +574,10 @@ static int assign_irq(AssignedDevInfo *adev) > AssignedDevice *dev = adev->assigned_dev; > int irq, r = 0; > > + /* irq 0 means uninitialized */ > + if (dev->real_device.irq == 0) > + return 0; > + > irq = pci_map_irq(&dev->dev, dev->intpin); > irq = piix_get_irq(irq); Sheng, Please do not special case irq 0. The fact that only x86/IA64 are supported at the moment does not mean other architectures can't use it. Also, the kernel code to handle dev/irq assignment is quite messy. It should be only a mechanism, with userspace controlling the policy. For example: if (match->irq_requested_type & KVM_ASSIGNED_DEV_MSIX) current_flags |= KVM_DEV_IRQ_ASSIGN_ENABLE_MSIX; else if ((match->irq_requested_type & KVM_ASSIGNED_DEV_HOST_MSI) && (match->irq_requested_type & KVM_ASSIGNED_DEV_GUEST_MSI)) current_flags |= KVM_DEV_IRQ_ASSIGN_ENABLE_MSI; And then in userspace you have + if (*ctrl_word & PCI_MSIX_ENABLE) { + assigned_irq_data.flags = KVM_DEV_IRQ_ASSIGN_ENABLE_MSIX; + if (assigned_dev_update_msix_mmio(pci_dev) < 0) { + perror("assigned_dev_update_msix_mmio"); + } + } We really need to fix this before merging anything else in this area. So ideally the kernel only provides ioctl commands that do one thing at a time: - assign host device - unassign guest device - assign host irq (INTx or MSI) - assign dev irq (INTx or MSI) - unassign host irq - unassign dev irq So you don't have to make policy decisions like } else { /* * Guest require to disable device MSI, we disable MSI * and * re-enable INTx by default again. Notice it's only for * non-msi2intx. */ assigned_device_update_intx(kvm, adev, airq); return 0; } Because you do in userspace. However, I do not think it is necessary to create new ioctl commands and deprecate the old ones (it seems possible to do this using the flags passed in "struct kvm_assigned_irq") However, the current userspace code probably relies on implicit behaviour by the kernel part. IMHO it is fair to break older userspace. Better change it sooner than later. For example, don't do this if ((changed_flags & KVM_DEV_IRQ_ASSIGN_MSI_ACTION) || (msi2intx && match->dev->msi_enabled)) { But just enable MSI in either host or guest (one at a time). Return an error if you can't. And don't make such assumptions: } else if (assigned_irq->host_irq == 0 && match->dev->irq == 0) /* Host device IRQ 0 means don't support INTx */ To use the current ioctl commands, maybe enforce the "one thing at a time" nature and fail otherwise. Can you please investigate about this possibility? Because talk is cheap and I'm not sure whether you can always separate host/guest IRQ assignment, but probably as long as done carefully. Am I missing any reason why policy can't live outside the kernel? ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 1/3] kvm: fix irq 0 assignment 2009-03-04 19:30 ` Marcelo Tosatti @ 2009-03-04 19:41 ` Chris Wright 2009-03-05 2:47 ` Sheng Yang 0 siblings, 1 reply; 10+ messages in thread From: Chris Wright @ 2009-03-04 19:41 UTC (permalink / raw) To: Marcelo Tosatti; +Cc: Sheng Yang, Avi Kivity, kvm, Chris Wright * Marcelo Tosatti (mtosatti@redhat.com) wrote: > Please do not special case irq 0. The fact that only x86/IA64 are > supported at the moment does not mean other architectures can't > use it. Seems logical to use a flag instead of overloading the irq value. > Also, the kernel code to handle dev/irq assignment is quite messy. It > should be only a mechanism, with userspace controlling the policy. I really agree here. I think some work to simplify/clarify would be time well-spent! ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 1/3] kvm: fix irq 0 assignment 2009-03-04 19:41 ` Chris Wright @ 2009-03-05 2:47 ` Sheng Yang 0 siblings, 0 replies; 10+ messages in thread From: Sheng Yang @ 2009-03-05 2:47 UTC (permalink / raw) To: Chris Wright; +Cc: Marcelo Tosatti, Avi Kivity, kvm On Thursday 05 March 2009 03:41:41 Chris Wright wrote: > * Marcelo Tosatti (mtosatti@redhat.com) wrote: > > Please do not special case irq 0. The fact that only x86/IA64 are > > supported at the moment does not mean other architectures can't > > use it. > > Seems logical to use a flag instead of overloading the irq value. > > > Also, the kernel code to handle dev/irq assignment is quite messy. It > > should be only a mechanism, with userspace controlling the policy. > > I really agree here. I think some work to simplify/clarify would be > time well-spent! So do I... After discussion with Marcelo, I would work with him on this clean up. :) -- regards Yang, Sheng ^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH 2/3] KVM: Fill config with correct VID/DID 2009-03-04 7:54 [PATCH 0/3] Enable SRIOV with KVM Sheng Yang 2009-03-04 7:54 ` [PATCH 1/3] kvm: fix irq 0 assignment Sheng Yang @ 2009-03-04 7:54 ` Sheng Yang 2009-03-04 7:54 ` [PATCH 3/3] kvm: emulate command register for SRIOV virtual function Sheng Yang 2 siblings, 0 replies; 10+ messages in thread From: Sheng Yang @ 2009-03-04 7:54 UTC (permalink / raw) To: Avi Kivity, Marcelo Tosatti; +Cc: kvm, Sheng Yang SRIOV's virtual function didn't show correct Vendor ID/Device ID in config, so we have to fill them manually according to device/vendor file in sysfs. Signed-off-by: Sheng Yang <sheng@linux.intel.com> --- qemu/hw/device-assignment.c | 31 ++++++++++++++++++++++++++++++- 1 files changed, 30 insertions(+), 1 deletions(-) diff --git a/qemu/hw/device-assignment.c b/qemu/hw/device-assignment.c index 7f9c789..867229d 100644 --- a/qemu/hw/device-assignment.c +++ b/qemu/hw/device-assignment.c @@ -317,7 +317,8 @@ static uint32_t assigned_dev_pci_read_config(PCIDevice *d, uint32_t address, ssize_t ret; AssignedDevice *pci_dev = container_of(d, AssignedDevice, dev); - if ((address >= 0x10 && address <= 0x24) || address == 0x34 || + if (address < 0x4 || + (address >= 0x10 && address <= 0x24) || address == 0x34 || address == 0x3c || address == 0x3d || pci_access_cap_config(d, address, len)) { val = pci_default_read_config(d, address, len); @@ -429,6 +430,7 @@ static int get_real_device(AssignedDevice *pci_dev, uint8_t r_bus, int fd, r = 0; FILE *f; unsigned long long start, end, size, flags; + unsigned long id; PCIRegion *rp; PCIDevRegions *dev = &pci_dev->real_device; @@ -488,6 +490,33 @@ again: DEBUG("region %d size %d start 0x%llx type %d resource_fd %d\n", r, rp->size, start, rp->type, rp->resource_fd); } + + fclose(f); + + /* read and fill device ID */ + snprintf(name, sizeof(name), "%svendor", dir); + f = fopen(name, "r"); + if (f == NULL) { + fprintf(stderr, "%s: %s: %m\n", __func__, name); + return 1; + } + if (fscanf(f, "%li\n", &id) == 1) { + pci_dev->dev.config[0] = id & 0xff; + pci_dev->dev.config[1] = (id & 0xff00) >> 8; + } + fclose(f); + + /* read and fill vendor ID */ + snprintf(name, sizeof(name), "%sdevice", dir); + f = fopen(name, "r"); + if (f == NULL) { + fprintf(stderr, "%s: %s: %m\n", __func__, name); + return 1; + } + if (fscanf(f, "%li\n", &id) == 1) { + pci_dev->dev.config[2] = id & 0xff; + pci_dev->dev.config[3] = (id & 0xff00) >> 8; + } fclose(f); dev->region_number = r; -- 1.5.4.5 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH 3/3] kvm: emulate command register for SRIOV virtual function 2009-03-04 7:54 [PATCH 0/3] Enable SRIOV with KVM Sheng Yang 2009-03-04 7:54 ` [PATCH 1/3] kvm: fix irq 0 assignment Sheng Yang 2009-03-04 7:54 ` [PATCH 2/3] KVM: Fill config with correct VID/DID Sheng Yang @ 2009-03-04 7:54 ` Sheng Yang 2 siblings, 0 replies; 10+ messages in thread From: Sheng Yang @ 2009-03-04 7:54 UTC (permalink / raw) To: Avi Kivity, Marcelo Tosatti; +Cc: kvm, Sheng Yang MMIO enable byte would be checked when enabling virtual function, but in fact, the whole virtual function's command register is hard-wired to zero... So when guest read from command register it would only get 0, specially for MMIO enable bit. Then we relay on QEmu to provide a reasonable command register content to guest. Signed-off-by: Sheng Yang <sheng@linux.intel.com> --- qemu/hw/device-assignment.c | 13 ++++++++++++- qemu/hw/device-assignment.h | 1 + 2 files changed, 13 insertions(+), 1 deletions(-) diff --git a/qemu/hw/device-assignment.c b/qemu/hw/device-assignment.c index 867229d..c88214c 100644 --- a/qemu/hw/device-assignment.c +++ b/qemu/hw/device-assignment.c @@ -26,7 +26,10 @@ * Copyright (C) 2008, IBM, Muli Ben-Yehuda (muli@il.ibm.com) */ #include <stdio.h> +#include <unistd.h> #include <sys/io.h> +#include <sys/types.h> +#include <sys/stat.h> #include "qemu-kvm.h" #include "hw.h" #include "pc.h" @@ -317,7 +320,7 @@ static uint32_t assigned_dev_pci_read_config(PCIDevice *d, uint32_t address, ssize_t ret; AssignedDevice *pci_dev = container_of(d, AssignedDevice, dev); - if (address < 0x4 || + if (address < 0x4 || (pci_dev->need_emulate_cmd && address == 0x4) || (address >= 0x10 && address <= 0x24) || address == 0x34 || address == 0x3c || address == 0x3d || pci_access_cap_config(d, address, len)) { @@ -431,6 +434,7 @@ static int get_real_device(AssignedDevice *pci_dev, uint8_t r_bus, FILE *f; unsigned long long start, end, size, flags; unsigned long id; + struct stat statbuf; PCIRegion *rp; PCIDevRegions *dev = &pci_dev->real_device; @@ -519,6 +523,13 @@ again: } fclose(f); + /* dealing with virtual function device */ + snprintf(name, sizeof(name), "%sphysfn/", dir); + if (!stat(name, &statbuf)) + pci_dev->need_emulate_cmd = 1; + else + pci_dev->need_emulate_cmd = 0; + dev->region_number = r; return 0; } diff --git a/qemu/hw/device-assignment.h b/qemu/hw/device-assignment.h index fc1af8f..cdebaa5 100644 --- a/qemu/hw/device-assignment.h +++ b/qemu/hw/device-assignment.h @@ -94,6 +94,7 @@ typedef struct { void *msix_table_page; target_phys_addr_t msix_table_addr; int mmio_index; + int need_emulate_cmd; } AssignedDevice; typedef struct AssignedDevInfo AssignedDevInfo; -- 1.5.4.5 ^ permalink raw reply related [flat|nested] 10+ messages in thread
end of thread, other threads:[~2009-03-05 2:47 UTC | newest] Thread overview: 10+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2009-03-04 7:54 [PATCH 0/3] Enable SRIOV with KVM Sheng Yang 2009-03-04 7:54 ` [PATCH 1/3] kvm: fix irq 0 assignment Sheng Yang 2009-03-04 9:53 ` Chris Wedgwood 2009-03-04 9:58 ` Sheng Yang 2009-03-04 10:02 ` Sheng Yang 2009-03-04 19:30 ` Marcelo Tosatti 2009-03-04 19:41 ` Chris Wright 2009-03-05 2:47 ` Sheng Yang 2009-03-04 7:54 ` [PATCH 2/3] KVM: Fill config with correct VID/DID Sheng Yang 2009-03-04 7:54 ` [PATCH 3/3] kvm: emulate command register for SRIOV virtual function Sheng Yang
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox