public inbox for kvm@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/3] Enable SRIOV with KVM
@ 2009-03-04  7:54 Sheng Yang
  2009-03-04  7:54 ` [PATCH 1/3] kvm: fix irq 0 assignment Sheng Yang
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Sheng Yang @ 2009-03-04  7:54 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti; +Cc: kvm

The first patch have been sent already. The other two deal with some minor
problem of SRIOV's virtual function. With these three userspace patches, SRIOV
can be enabled with KVM(of course, with MSI-X and my previous userspace
patches.)

Thanks.


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH 1/3] kvm: fix irq 0 assignment
  2009-03-04  7:54 [PATCH 0/3] Enable SRIOV with KVM Sheng Yang
@ 2009-03-04  7:54 ` Sheng Yang
  2009-03-04  9:53   ` Chris Wedgwood
  2009-03-04 19:30   ` Marcelo Tosatti
  2009-03-04  7:54 ` [PATCH 2/3] KVM: Fill config with correct VID/DID Sheng Yang
  2009-03-04  7:54 ` [PATCH 3/3] kvm: emulate command register for SRIOV virtual function Sheng Yang
  2 siblings, 2 replies; 10+ messages in thread
From: Sheng Yang @ 2009-03-04  7:54 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti; +Cc: kvm, Sheng Yang

Shouldn't update assigned irq if host irq is 0, which means uninitialized
or don't support INTx.

Signed-off-by: Sheng Yang <sheng@linux.intel.com>
---
 qemu/hw/device-assignment.c |    4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/qemu/hw/device-assignment.c b/qemu/hw/device-assignment.c
index 32fba2a..7f9c789 100644
--- a/qemu/hw/device-assignment.c
+++ b/qemu/hw/device-assignment.c
@@ -574,6 +574,10 @@ static int assign_irq(AssignedDevInfo *adev)
     AssignedDevice *dev = adev->assigned_dev;
     int irq, r = 0;
 
+    /* irq 0 means uninitialized */
+    if (dev->real_device.irq == 0)
+        return 0;
+
     irq = pci_map_irq(&dev->dev, dev->intpin);
     irq = piix_get_irq(irq);
 
-- 
1.5.4.5


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 2/3] KVM: Fill config with correct VID/DID
  2009-03-04  7:54 [PATCH 0/3] Enable SRIOV with KVM Sheng Yang
  2009-03-04  7:54 ` [PATCH 1/3] kvm: fix irq 0 assignment Sheng Yang
@ 2009-03-04  7:54 ` Sheng Yang
  2009-03-04  7:54 ` [PATCH 3/3] kvm: emulate command register for SRIOV virtual function Sheng Yang
  2 siblings, 0 replies; 10+ messages in thread
From: Sheng Yang @ 2009-03-04  7:54 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti; +Cc: kvm, Sheng Yang

SRIOV's virtual function didn't show correct Vendor ID/Device ID in config, so
we have to fill them manually according to device/vendor file in sysfs.

Signed-off-by: Sheng Yang <sheng@linux.intel.com>
---
 qemu/hw/device-assignment.c |   31 ++++++++++++++++++++++++++++++-
 1 files changed, 30 insertions(+), 1 deletions(-)

diff --git a/qemu/hw/device-assignment.c b/qemu/hw/device-assignment.c
index 7f9c789..867229d 100644
--- a/qemu/hw/device-assignment.c
+++ b/qemu/hw/device-assignment.c
@@ -317,7 +317,8 @@ static uint32_t assigned_dev_pci_read_config(PCIDevice *d, uint32_t address,
     ssize_t ret;
     AssignedDevice *pci_dev = container_of(d, AssignedDevice, dev);
 
-    if ((address >= 0x10 && address <= 0x24) || address == 0x34 ||
+    if (address < 0x4 ||
+	(address >= 0x10 && address <= 0x24) || address == 0x34 ||
         address == 0x3c || address == 0x3d ||
         pci_access_cap_config(d, address, len)) {
         val = pci_default_read_config(d, address, len);
@@ -429,6 +430,7 @@ static int get_real_device(AssignedDevice *pci_dev, uint8_t r_bus,
     int fd, r = 0;
     FILE *f;
     unsigned long long start, end, size, flags;
+    unsigned long id;
     PCIRegion *rp;
     PCIDevRegions *dev = &pci_dev->real_device;
 
@@ -488,6 +490,33 @@ again:
         DEBUG("region %d size %d start 0x%llx type %d resource_fd %d\n",
               r, rp->size, start, rp->type, rp->resource_fd);
     }
+
+    fclose(f);
+
+    /* read and fill device ID */
+    snprintf(name, sizeof(name), "%svendor", dir);
+    f = fopen(name, "r");
+    if (f == NULL) {
+        fprintf(stderr, "%s: %s: %m\n", __func__, name);
+        return 1;
+    }
+    if (fscanf(f, "%li\n", &id) == 1) {
+	pci_dev->dev.config[0] = id & 0xff;
+	pci_dev->dev.config[1] = (id & 0xff00) >> 8;
+    }
+    fclose(f);
+
+    /* read and fill vendor ID */
+    snprintf(name, sizeof(name), "%sdevice", dir);
+    f = fopen(name, "r");
+    if (f == NULL) {
+        fprintf(stderr, "%s: %s: %m\n", __func__, name);
+        return 1;
+    }
+    if (fscanf(f, "%li\n", &id) == 1) {
+	pci_dev->dev.config[2] = id & 0xff;
+	pci_dev->dev.config[3] = (id & 0xff00) >> 8;
+    }
     fclose(f);
 
     dev->region_number = r;
-- 
1.5.4.5


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 3/3] kvm: emulate command register for SRIOV virtual function
  2009-03-04  7:54 [PATCH 0/3] Enable SRIOV with KVM Sheng Yang
  2009-03-04  7:54 ` [PATCH 1/3] kvm: fix irq 0 assignment Sheng Yang
  2009-03-04  7:54 ` [PATCH 2/3] KVM: Fill config with correct VID/DID Sheng Yang
@ 2009-03-04  7:54 ` Sheng Yang
  2 siblings, 0 replies; 10+ messages in thread
From: Sheng Yang @ 2009-03-04  7:54 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti; +Cc: kvm, Sheng Yang

MMIO enable byte would be checked when enabling virtual function, but in fact,
the whole virtual function's command register is hard-wired to zero... So when
guest read from command register it would only get 0, specially for MMIO enable
bit.

Then we relay on QEmu to provide a reasonable command register content to guest.

Signed-off-by: Sheng Yang <sheng@linux.intel.com>
---
 qemu/hw/device-assignment.c |   13 ++++++++++++-
 qemu/hw/device-assignment.h |    1 +
 2 files changed, 13 insertions(+), 1 deletions(-)

diff --git a/qemu/hw/device-assignment.c b/qemu/hw/device-assignment.c
index 867229d..c88214c 100644
--- a/qemu/hw/device-assignment.c
+++ b/qemu/hw/device-assignment.c
@@ -26,7 +26,10 @@
  *  Copyright (C) 2008, IBM, Muli Ben-Yehuda (muli@il.ibm.com)
  */
 #include <stdio.h>
+#include <unistd.h>
 #include <sys/io.h>
+#include <sys/types.h>
+#include <sys/stat.h>
 #include "qemu-kvm.h"
 #include "hw.h"
 #include "pc.h"
@@ -317,7 +320,7 @@ static uint32_t assigned_dev_pci_read_config(PCIDevice *d, uint32_t address,
     ssize_t ret;
     AssignedDevice *pci_dev = container_of(d, AssignedDevice, dev);
 
-    if (address < 0x4 ||
+    if (address < 0x4 || (pci_dev->need_emulate_cmd && address == 0x4) ||
 	(address >= 0x10 && address <= 0x24) || address == 0x34 ||
         address == 0x3c || address == 0x3d ||
         pci_access_cap_config(d, address, len)) {
@@ -431,6 +434,7 @@ static int get_real_device(AssignedDevice *pci_dev, uint8_t r_bus,
     FILE *f;
     unsigned long long start, end, size, flags;
     unsigned long id;
+    struct stat statbuf;
     PCIRegion *rp;
     PCIDevRegions *dev = &pci_dev->real_device;
 
@@ -519,6 +523,13 @@ again:
     }
     fclose(f);
 
+    /* dealing with virtual function device */
+    snprintf(name, sizeof(name), "%sphysfn/", dir);
+    if (!stat(name, &statbuf))
+	    pci_dev->need_emulate_cmd = 1;
+    else
+	    pci_dev->need_emulate_cmd = 0;
+
     dev->region_number = r;
     return 0;
 }
diff --git a/qemu/hw/device-assignment.h b/qemu/hw/device-assignment.h
index fc1af8f..cdebaa5 100644
--- a/qemu/hw/device-assignment.h
+++ b/qemu/hw/device-assignment.h
@@ -94,6 +94,7 @@ typedef struct {
     void *msix_table_page;
     target_phys_addr_t msix_table_addr;
     int mmio_index;
+    int need_emulate_cmd;
 } AssignedDevice;
 
 typedef struct AssignedDevInfo AssignedDevInfo;
-- 
1.5.4.5


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/3] kvm: fix irq 0 assignment
  2009-03-04  7:54 ` [PATCH 1/3] kvm: fix irq 0 assignment Sheng Yang
@ 2009-03-04  9:53   ` Chris Wedgwood
  2009-03-04  9:58     ` Sheng Yang
  2009-03-04 19:30   ` Marcelo Tosatti
  1 sibling, 1 reply; 10+ messages in thread
From: Chris Wedgwood @ 2009-03-04  9:53 UTC (permalink / raw)
  To: Sheng Yang; +Cc: Avi Kivity, Marcelo Tosatti, kvm

On Wed, Mar 04, 2009 at 03:54:27PM +0800, Sheng Yang wrote:

> Shouldn't update assigned irq if host irq is 0, which means
> uninitialized or don't support INTx.

Is that generally true?

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/3] kvm: fix irq 0 assignment
  2009-03-04  9:53   ` Chris Wedgwood
@ 2009-03-04  9:58     ` Sheng Yang
  2009-03-04 10:02       ` Sheng Yang
  0 siblings, 1 reply; 10+ messages in thread
From: Sheng Yang @ 2009-03-04  9:58 UTC (permalink / raw)
  To: Chris Wedgwood; +Cc: Avi Kivity, Marcelo Tosatti, kvm

On Wednesday 04 March 2009 17:53:52 Chris Wedgwood wrote:
> On Wed, Mar 04, 2009 at 03:54:27PM +0800, Sheng Yang wrote:
> > Shouldn't update assigned irq if host irq is 0, which means
> > uninitialized or don't support INTx.
>
> Is that generally true?

Host irq 0 is reserved for PIT timer, at least for x86.

-- 
regards
Yang, Sheng
  


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/3] kvm: fix irq 0 assignment
  2009-03-04  9:58     ` Sheng Yang
@ 2009-03-04 10:02       ` Sheng Yang
  0 siblings, 0 replies; 10+ messages in thread
From: Sheng Yang @ 2009-03-04 10:02 UTC (permalink / raw)
  To: Chris Wedgwood; +Cc: Avi Kivity, Marcelo Tosatti, kvm

On Wednesday 04 March 2009 17:58:31 Sheng Yang wrote:
> On Wednesday 04 March 2009 17:53:52 Chris Wedgwood wrote:
> > On Wed, Mar 04, 2009 at 03:54:27PM +0800, Sheng Yang wrote:
> > > Shouldn't update assigned irq if host irq is 0, which means
> > > uninitialized or don't support INTx.
> >
> > Is that generally true?
>
> Host irq 0 is reserved for PIT timer, at least for x86.

Or in some other condition it would be used for pci/pcie device? And please 
notice that currently we only support assign with x86 and IA64. (I really feel 
there is some reason for you ask. :) )

-- 
regards
Yang, Sheng


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/3] kvm: fix irq 0 assignment
  2009-03-04  7:54 ` [PATCH 1/3] kvm: fix irq 0 assignment Sheng Yang
  2009-03-04  9:53   ` Chris Wedgwood
@ 2009-03-04 19:30   ` Marcelo Tosatti
  2009-03-04 19:41     ` Chris Wright
  1 sibling, 1 reply; 10+ messages in thread
From: Marcelo Tosatti @ 2009-03-04 19:30 UTC (permalink / raw)
  To: Sheng Yang; +Cc: Avi Kivity, kvm, Chris Wright

On Wed, Mar 04, 2009 at 03:54:27PM +0800, Sheng Yang wrote:
> Shouldn't update assigned irq if host irq is 0, which means uninitialized
> or don't support INTx.
> 
> Signed-off-by: Sheng Yang <sheng@linux.intel.com>
> ---
>  qemu/hw/device-assignment.c |    4 ++++
>  1 files changed, 4 insertions(+), 0 deletions(-)
> 
> diff --git a/qemu/hw/device-assignment.c b/qemu/hw/device-assignment.c
> index 32fba2a..7f9c789 100644
> --- a/qemu/hw/device-assignment.c
> +++ b/qemu/hw/device-assignment.c
> @@ -574,6 +574,10 @@ static int assign_irq(AssignedDevInfo *adev)
>      AssignedDevice *dev = adev->assigned_dev;
>      int irq, r = 0;
>  
> +    /* irq 0 means uninitialized */
> +    if (dev->real_device.irq == 0)
> +        return 0;
> +
>      irq = pci_map_irq(&dev->dev, dev->intpin);
>      irq = piix_get_irq(irq);

Sheng,

Please do not special case irq 0. The fact that only x86/IA64 are
supported at the moment does not mean other architectures can't 
use it.

Also, the kernel code to handle dev/irq assignment is quite messy. It
should be only a mechanism, with userspace controlling the policy.

For example:

        if (match->irq_requested_type & KVM_ASSIGNED_DEV_MSIX)
                current_flags |= KVM_DEV_IRQ_ASSIGN_ENABLE_MSIX;
        else if ((match->irq_requested_type & KVM_ASSIGNED_DEV_HOST_MSI) &&
                 (match->irq_requested_type & KVM_ASSIGNED_DEV_GUEST_MSI))
                current_flags |= KVM_DEV_IRQ_ASSIGN_ENABLE_MSI;

And then in userspace you have

+    if (*ctrl_word & PCI_MSIX_ENABLE) {
+        assigned_irq_data.flags = KVM_DEV_IRQ_ASSIGN_ENABLE_MSIX;
+        if (assigned_dev_update_msix_mmio(pci_dev) < 0) {
+            perror("assigned_dev_update_msix_mmio");
+        }
+    }

We really need to fix this before merging anything else in this area.

So ideally the kernel only provides ioctl commands that do one thing 
at a time:

- assign host device
- unassign guest device

- assign host irq (INTx or MSI)
- assign dev irq (INTx or MSI)

- unassign host irq
- unassign dev irq

So you don't have to make policy decisions like

        } else {
                /*
                 * Guest require to disable device MSI, we disable MSI
                 * and
                 * re-enable INTx by default again. Notice it's only for
                 * non-msi2intx.
                 */
                assigned_device_update_intx(kvm, adev, airq);
                return 0;
        }

Because you do in userspace. However, I do not think it is necessary to
create new ioctl commands and deprecate the old ones (it seems possible
to do this using the flags passed in "struct kvm_assigned_irq")

However, the current userspace code probably relies on implicit
behaviour by the kernel part. IMHO it is fair to break older 
userspace. Better change it sooner than later.

For example, don't do this

        if ((changed_flags & KVM_DEV_IRQ_ASSIGN_MSI_ACTION) ||
            (msi2intx && match->dev->msi_enabled)) {

But just enable MSI in either host or guest (one at a time). Return an
error if you can't. And don't make such assumptions:

        } else if (assigned_irq->host_irq == 0 && match->dev->irq == 0)
                /* Host device IRQ 0 means don't support INTx */

To use the current ioctl commands, maybe enforce the "one thing at a
time" nature and fail otherwise.

Can you please investigate about this possibility? Because talk is
cheap and I'm not sure whether you can always separate host/guest IRQ
assignment, but probably as long as done carefully.

Am I missing any reason why policy can't live outside the kernel?


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/3] kvm: fix irq 0 assignment
  2009-03-04 19:30   ` Marcelo Tosatti
@ 2009-03-04 19:41     ` Chris Wright
  2009-03-05  2:47       ` Sheng Yang
  0 siblings, 1 reply; 10+ messages in thread
From: Chris Wright @ 2009-03-04 19:41 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Sheng Yang, Avi Kivity, kvm, Chris Wright

* Marcelo Tosatti (mtosatti@redhat.com) wrote:
> Please do not special case irq 0. The fact that only x86/IA64 are
> supported at the moment does not mean other architectures can't 
> use it.

Seems logical to use a flag instead of overloading the irq value.

> Also, the kernel code to handle dev/irq assignment is quite messy. It
> should be only a mechanism, with userspace controlling the policy.

I really agree here.  I think some work to simplify/clarify would be
time well-spent!

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/3] kvm: fix irq 0 assignment
  2009-03-04 19:41     ` Chris Wright
@ 2009-03-05  2:47       ` Sheng Yang
  0 siblings, 0 replies; 10+ messages in thread
From: Sheng Yang @ 2009-03-05  2:47 UTC (permalink / raw)
  To: Chris Wright; +Cc: Marcelo Tosatti, Avi Kivity, kvm

On Thursday 05 March 2009 03:41:41 Chris Wright wrote:
> * Marcelo Tosatti (mtosatti@redhat.com) wrote:
> > Please do not special case irq 0. The fact that only x86/IA64 are
> > supported at the moment does not mean other architectures can't
> > use it.
>
> Seems logical to use a flag instead of overloading the irq value.
>
> > Also, the kernel code to handle dev/irq assignment is quite messy. It
> > should be only a mechanism, with userspace controlling the policy.
>
> I really agree here.  I think some work to simplify/clarify would be
> time well-spent!

So do I... After discussion with Marcelo, I would work with him on this clean 
up. :)

-- 
regards
Yang, Sheng



^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2009-03-05  2:47 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-03-04  7:54 [PATCH 0/3] Enable SRIOV with KVM Sheng Yang
2009-03-04  7:54 ` [PATCH 1/3] kvm: fix irq 0 assignment Sheng Yang
2009-03-04  9:53   ` Chris Wedgwood
2009-03-04  9:58     ` Sheng Yang
2009-03-04 10:02       ` Sheng Yang
2009-03-04 19:30   ` Marcelo Tosatti
2009-03-04 19:41     ` Chris Wright
2009-03-05  2:47       ` Sheng Yang
2009-03-04  7:54 ` [PATCH 2/3] KVM: Fill config with correct VID/DID Sheng Yang
2009-03-04  7:54 ` [PATCH 3/3] kvm: emulate command register for SRIOV virtual function Sheng Yang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox