kvm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/3] MSI-X mask supporting for assigned device(QEmu)
@ 2010-11-04  6:18 Sheng Yang
  2010-11-04  6:18 ` [PATCH 1/3] qemu-kvm: Ioctl for in-kernel mask support Sheng Yang
                   ` (2 more replies)
  0 siblings, 3 replies; 20+ messages in thread
From: Sheng Yang @ 2010-11-04  6:18 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti; +Cc: Michael S. Tsirkin, kvm, Sheng Yang

Sheng Yang (3):
  qemu-kvm: Ioctl for in-kernel mask support
  qemu-kvm: device assignment: Some clean up about MSI-X code
  qemu-kvm: device assignment: emulate MSI-X mask bits

 hw/device-assignment.c |  236 ++++++++++++++++++++++++++++++++++++++++++------
 qemu-kvm.c             |   15 +++
 qemu-kvm.h             |    6 ++
 3 files changed, 227 insertions(+), 30 deletions(-)


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH 1/3] qemu-kvm: Ioctl for in-kernel mask support
  2010-11-04  6:18 [PATCH 0/3] MSI-X mask supporting for assigned device(QEmu) Sheng Yang
@ 2010-11-04  6:18 ` Sheng Yang
  2010-11-04  9:47   ` Michael S. Tsirkin
  2010-11-04  6:18 ` [PATCH 2/3] qemu-kvm: device assignment: Some clean up about MSI-X code Sheng Yang
  2010-11-04  6:18 ` [PATCH 3/3] qemu-kvm: device assignment: emulate MSI-X mask bits Sheng Yang
  2 siblings, 1 reply; 20+ messages in thread
From: Sheng Yang @ 2010-11-04  6:18 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti; +Cc: Michael S. Tsirkin, kvm, Sheng Yang


Signed-off-by: Sheng Yang <sheng@linux.intel.com>
---
 qemu-kvm.c |   15 +++++++++++++++
 qemu-kvm.h |    6 ++++++
 2 files changed, 21 insertions(+), 0 deletions(-)

diff --git a/qemu-kvm.c b/qemu-kvm.c
index 733d0a9..ba6db51 100644
--- a/qemu-kvm.c
+++ b/qemu-kvm.c
@@ -1092,6 +1092,21 @@ int kvm_assign_set_msix_entry(kvm_context_t kvm,
 {
     return kvm_vm_ioctl(kvm_state, KVM_ASSIGN_SET_MSIX_ENTRY, entry);
 }
+
+#ifdef KVM_CAP_DEVICE_MSIX_MASK
+int kvm_assign_get_msix_entry(kvm_context_t kvm,
+                              struct kvm_assigned_msix_entry *entry)
+{
+    return kvm_vm_ioctl(kvm_state, KVM_ASSIGN_GET_MSIX_ENTRY, entry);
+}
+
+int kvm_assign_reg_msix_mmio(kvm_context_t kvm,
+                           struct kvm_assigned_msix_mmio *msix_mmio)
+{
+    return kvm_vm_ioctl(kvm_state, KVM_ASSIGN_REG_MSIX_MMIO, msix_mmio);
+}
+#endif
+
 #endif
 
 #if defined(KVM_CAP_IRQFD) && defined(CONFIG_EVENTFD)
diff --git a/qemu-kvm.h b/qemu-kvm.h
index 9c08ab4..1afdd42 100644
--- a/qemu-kvm.h
+++ b/qemu-kvm.h
@@ -743,6 +743,12 @@ int kvm_assign_set_msix_nr(kvm_context_t kvm,
                            struct kvm_assigned_msix_nr *msix_nr);
 int kvm_assign_set_msix_entry(kvm_context_t kvm,
                               struct kvm_assigned_msix_entry *entry);
+#ifdef KVM_CAP_DEVICE_MSIX_MASK
+int kvm_assign_get_msix_entry(kvm_context_t kvm,
+                              struct kvm_assigned_msix_entry *entry);
+int kvm_assign_reg_msix_mmio(kvm_context_t kvm,
+                           struct kvm_assigned_msix_mmio *msix_mmio);
+#endif
 #endif
 
 #else                           /* !CONFIG_KVM */
-- 
1.7.0.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 2/3] qemu-kvm: device assignment: Some clean up about MSI-X code
  2010-11-04  6:18 [PATCH 0/3] MSI-X mask supporting for assigned device(QEmu) Sheng Yang
  2010-11-04  6:18 ` [PATCH 1/3] qemu-kvm: Ioctl for in-kernel mask support Sheng Yang
@ 2010-11-04  6:18 ` Sheng Yang
  2010-11-04  9:47   ` Michael S. Tsirkin
  2010-11-04  6:18 ` [PATCH 3/3] qemu-kvm: device assignment: emulate MSI-X mask bits Sheng Yang
  2 siblings, 1 reply; 20+ messages in thread
From: Sheng Yang @ 2010-11-04  6:18 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti; +Cc: Michael S. Tsirkin, kvm, Sheng Yang


Signed-off-by: Sheng Yang <sheng@linux.intel.com>
---
 hw/device-assignment.c |   77 ++++++++++++++++++++++++++++++++---------------
 1 files changed, 52 insertions(+), 25 deletions(-)

diff --git a/hw/device-assignment.c b/hw/device-assignment.c
index 2605bd1..8a98876 100644
--- a/hw/device-assignment.c
+++ b/hw/device-assignment.c
@@ -1106,15 +1106,10 @@ static void assigned_dev_update_msi(PCIDevice *pci_dev, unsigned int ctrl_pos)
 #endif
 
 #ifdef KVM_CAP_DEVICE_MSIX
-static int assigned_dev_update_msix_mmio(PCIDevice *pci_dev)
+static int get_msix_entries_max_nr(AssignedDevice *adev)
 {
-    AssignedDevice *adev = container_of(pci_dev, AssignedDevice, dev);
-    uint16_t entries_nr = 0, entries_max_nr;
-    int pos = 0, i, r = 0;
-    uint32_t msg_addr, msg_upper_addr, msg_data, msg_ctrl;
-    struct kvm_assigned_msix_nr msix_nr;
-    struct kvm_assigned_msix_entry msix_entry;
-    void *va = adev->msix_table_page;
+    int pos, entries_max_nr;
+    PCIDevice *pci_dev = &adev->dev;
 
     if (adev->cap.available & ASSIGNED_DEVICE_CAP_MSI)
         pos = pci_dev->cap.start + PCI_CAPABILITY_CONFIG_MSI_LENGTH;
@@ -1125,6 +1120,17 @@ static int assigned_dev_update_msix_mmio(PCIDevice *pci_dev)
     entries_max_nr &= PCI_MSIX_TABSIZE;
     entries_max_nr += 1;
 
+    return entries_max_nr;
+}
+
+static int get_msix_valid_entries_nr(AssignedDevice *adev,
+				     uint16_t entries_max_nr)
+{
+    void *va = adev->msix_table_page;
+    uint32_t msg_data, msg_ctrl;
+    uint16_t entries_nr = 0;
+    int i;
+
     /* Get the usable entry number for allocating */
     for (i = 0; i < entries_max_nr; i++) {
         memcpy(&msg_ctrl, va + i * 16 + 12, 4);
@@ -1134,11 +1140,20 @@ static int assigned_dev_update_msix_mmio(PCIDevice *pci_dev)
             continue;
         entries_nr ++;
     }
+    return entries_nr;
+}
+
+static int assigned_dev_update_msix_mmio(PCIDevice *pci_dev,
+                                         uint16_t entries_nr,
+                                         uint16_t entries_max_nr)
+{
+    AssignedDevice *adev = container_of(pci_dev, AssignedDevice, dev);
+    int i, r = 0;
+    uint32_t msg_addr, msg_upper_addr, msg_data, msg_ctrl;
+    struct kvm_assigned_msix_nr msix_nr;
+    struct kvm_assigned_msix_entry msix_entry;
+    void *va = adev->msix_table_page;
 
-    if (entries_nr == 0) {
-        fprintf(stderr, "MSI-X entry number is zero!\n");
-        return -EINVAL;
-    }
     msix_nr.assigned_dev_id = calc_assigned_dev_id(adev->h_segnr, adev->h_busnr,
                                           (uint8_t)adev->h_devfn);
     msix_nr.entry_nr = entries_nr;
@@ -1180,8 +1195,8 @@ static int assigned_dev_update_msix_mmio(PCIDevice *pci_dev)
         adev->entry[entries_nr].u.msi.address_lo = msg_addr;
         adev->entry[entries_nr].u.msi.address_hi = msg_upper_addr;
         adev->entry[entries_nr].u.msi.data = msg_data;
-        DEBUG("MSI-X data 0x%x, MSI-X addr_lo 0x%x\n!", msg_data, msg_addr);
-	kvm_add_routing_entry(&adev->entry[entries_nr]);
+        DEBUG("MSI-X data 0x%x, MSI-X addr_lo 0x%x!\n", msg_data, msg_addr);
+        kvm_add_routing_entry(&adev->entry[entries_nr]);
 
         msix_entry.gsi = adev->entry[entries_nr].gsi;
         msix_entry.entry = i;
@@ -1190,7 +1205,7 @@ static int assigned_dev_update_msix_mmio(PCIDevice *pci_dev)
             fprintf(stderr, "fail to set MSI-X entry! %s\n", strerror(-r));
             break;
         }
-        DEBUG("MSI-X entry gsi 0x%x, entry %d\n!",
+        DEBUG("MSI-X entry gsi 0x%x, entry %d!\n",
                 msix_entry.gsi, msix_entry.entry);
         entries_nr ++;
     }
@@ -1203,12 +1218,12 @@ static int assigned_dev_update_msix_mmio(PCIDevice *pci_dev)
     return r;
 }
 
-static void assigned_dev_update_msix(PCIDevice *pci_dev, unsigned int ctrl_pos)
+static void assigned_dev_update_msix(PCIDevice *pci_dev, int enable_msix)
 {
     struct kvm_assigned_irq assigned_irq_data;
     AssignedDevice *assigned_dev = container_of(pci_dev, AssignedDevice, dev);
-    uint16_t *ctrl_word = (uint16_t *)(pci_dev->config + ctrl_pos);
     int r;
+    uint16_t entries_nr, entries_max_nr;
 
     memset(&assigned_irq_data, 0, sizeof assigned_irq_data);
     assigned_irq_data.assigned_dev_id  =
@@ -1219,8 +1234,7 @@ static void assigned_dev_update_msix(PCIDevice *pci_dev, unsigned int ctrl_pos)
      * try to catch this by only deassigning irqs if the guest is using
      * MSIX or intends to start. */
     if ((assigned_dev->irq_requested_type & KVM_DEV_IRQ_GUEST_MSIX) ||
-        (*ctrl_word & PCI_MSIX_ENABLE)) {
-
+            enable_msix) {
         assigned_irq_data.flags = assigned_dev->irq_requested_type;
         free_dev_irq_entries(assigned_dev);
         r = kvm_deassign_irq(kvm_context, &assigned_irq_data);
@@ -1231,14 +1245,25 @@ static void assigned_dev_update_msix(PCIDevice *pci_dev, unsigned int ctrl_pos)
         assigned_dev->irq_requested_type = 0;
     }
 
-    if (*ctrl_word & PCI_MSIX_ENABLE) {
-        assigned_irq_data.flags = KVM_DEV_IRQ_HOST_MSIX |
-                                  KVM_DEV_IRQ_GUEST_MSIX;
-
-        if (assigned_dev_update_msix_mmio(pci_dev) < 0) {
+    entries_max_nr = get_msix_entries_max_nr(assigned_dev);
+    if (entries_max_nr == 0) {
+        fprintf(stderr, "assigned_dev_update_msix: MSI-X entries_max_nr == 0");
+        return;
+    }
+    entries_nr = get_msix_valid_entries_nr(assigned_dev, entries_max_nr);
+    if (entries_nr == 0) {
+        if (enable_msix)
+            fprintf(stderr, "MSI-X entry number is zero!\n");
+        return;
+    }
+    if (enable_msix) {
+        if (assigned_dev_update_msix_mmio(pci_dev,
+                    entries_nr, entries_max_nr) < 0) {
             perror("assigned_dev_update_msix_mmio");
             return;
         }
+        assigned_irq_data.flags = KVM_DEV_IRQ_HOST_MSIX |
+                                  KVM_DEV_IRQ_GUEST_MSIX;
         if (kvm_assign_irq(kvm_context, &assigned_irq_data) < 0) {
             perror("assigned_dev_enable_msix: assign irq");
             return;
@@ -1254,6 +1279,7 @@ static void assigned_device_pci_cap_write_config(PCIDevice *pci_dev, uint32_t ad
 {
     AssignedDevice *assigned_dev = container_of(pci_dev, AssignedDevice, dev);
     unsigned int pos = pci_dev->cap.start, ctrl_pos;
+    uint16_t *ctrl_word;
 
     pci_default_cap_write_config(pci_dev, address, val, len);
 #ifdef KVM_CAP_IRQ_ROUTING
@@ -1270,7 +1296,8 @@ static void assigned_device_pci_cap_write_config(PCIDevice *pci_dev, uint32_t ad
         ctrl_pos = pos + 3;
         if (address <= ctrl_pos && address + len > ctrl_pos) {
             ctrl_pos--; /* control is word long */
-            assigned_dev_update_msix(pci_dev, ctrl_pos);
+            ctrl_word = (uint16_t *)(pci_dev->config + ctrl_pos);
+            assigned_dev_update_msix(pci_dev, (*ctrl_word & PCI_MSIX_ENABLE));
 	}
         pos += PCI_CAPABILITY_CONFIG_MSIX_LENGTH;
     }
-- 
1.7.0.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 3/3] qemu-kvm: device assignment: emulate MSI-X mask bits
  2010-11-04  6:18 [PATCH 0/3] MSI-X mask supporting for assigned device(QEmu) Sheng Yang
  2010-11-04  6:18 ` [PATCH 1/3] qemu-kvm: Ioctl for in-kernel mask support Sheng Yang
  2010-11-04  6:18 ` [PATCH 2/3] qemu-kvm: device assignment: Some clean up about MSI-X code Sheng Yang
@ 2010-11-04  6:18 ` Sheng Yang
  2010-11-04  9:44   ` Michael S. Tsirkin
  2010-11-04 10:04   ` Michael S. Tsirkin
  2 siblings, 2 replies; 20+ messages in thread
From: Sheng Yang @ 2010-11-04  6:18 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti; +Cc: Michael S. Tsirkin, kvm, Sheng Yang

This patch emulated MSI-X per vector mask bit on assigned device.

Signed-off-by: Sheng Yang <sheng@linux.intel.com>
---
 hw/device-assignment.c |  161 ++++++++++++++++++++++++++++++++++++++++++++++--
 1 files changed, 155 insertions(+), 6 deletions(-)

diff --git a/hw/device-assignment.c b/hw/device-assignment.c
index 8a98876..639aa0b 100644
--- a/hw/device-assignment.c
+++ b/hw/device-assignment.c
@@ -62,6 +62,11 @@ static void assigned_dev_load_option_rom(AssignedDevice *dev);
 
 static void assigned_dev_unregister_msix_mmio(AssignedDevice *dev);
 
+static uint32_t calc_assigned_dev_id(uint16_t seg, uint8_t bus, uint8_t devfn)
+{
+    return (uint32_t)seg << 16 | (uint32_t)bus << 8 | (uint32_t)devfn;
+}
+
 static uint32_t assigned_dev_ioport_rw(AssignedDevRegion *dev_region,
                                        uint32_t addr, int len, uint32_t *val)
 {
@@ -264,6 +269,9 @@ static void assigned_dev_iomem_map(PCIDevice *pci_dev, int region_num,
     AssignedDevRegion *region = &r_dev->v_addrs[region_num];
     PCIRegion *real_region = &r_dev->real_device.regions[region_num];
     int ret = 0;
+#ifdef KVM_CAP_DEVICE_MSIX_MASK
+    struct kvm_assigned_msix_mmio msix_mmio;
+#endif
 
     DEBUG("e_phys=%08" FMT_PCIBUS " r_virt=%p type=%d len=%08" FMT_PCIBUS " region_num=%d \n",
           e_phys, region->u.r_virtbase, type, e_size, region_num);
@@ -282,6 +290,16 @@ static void assigned_dev_iomem_map(PCIDevice *pci_dev, int region_num,
 
             cpu_register_physical_memory(e_phys + offset,
                     TARGET_PAGE_SIZE, r_dev->mmio_index);
+#ifdef KVM_CAP_DEVICE_MSIX_MASK
+	    memset(&msix_mmio, 0, sizeof(struct kvm_assigned_msix_mmio));
+	    msix_mmio.assigned_dev_id = calc_assigned_dev_id(r_dev->h_segnr,
+			    r_dev->h_busnr, r_dev->h_devfn);
+	    msix_mmio.base_addr = e_phys + offset;
+            /* We required kernel MSI-X support */
+	    ret = kvm_assign_reg_msix_mmio(kvm_context, &msix_mmio);
+	    if (ret)
+                fprintf(stderr, "fail to register in-kernel msix_mmio!\n");
+#endif
         }
     }
 
@@ -824,11 +842,6 @@ static void free_assigned_device(AssignedDevice *dev)
     }
 }
 
-static uint32_t calc_assigned_dev_id(uint16_t seg, uint8_t bus, uint8_t devfn)
-{
-    return (uint32_t)seg << 16 | (uint32_t)bus << 8 | (uint32_t)devfn;
-}
-
 static void assign_failed_examine(AssignedDevice *dev)
 {
     char name[PATH_MAX], dir[PATH_MAX], driver[PATH_MAX] = {}, *ns;
@@ -1123,6 +1136,27 @@ static int get_msix_entries_max_nr(AssignedDevice *adev)
     return entries_max_nr;
 }
 
+#ifdef KVM_CAP_DEVICE_MSIX_MASK
+static int assigned_dev_msix_entry_masked(AssignedDevice *adev, int entry)
+{
+    struct kvm_assigned_msix_entry msix_entry;
+    int r;
+
+    memset(&msix_entry, 0, sizeof msix_entry);
+    msix_entry.assigned_dev_id = calc_assigned_dev_id(adev->h_segnr,
+            adev->h_busnr, (uint8_t)adev->h_devfn);
+    msix_entry.entry = entry;
+    msix_entry.flags = KVM_MSIX_FLAG_QUERY_MASK;
+    r = kvm_assign_get_msix_entry(kvm_context, &msix_entry);
+    if (r) {
+        fprintf(stderr, "assigned_dev_msix_entry_masked: "
+			"Fail to get mask bit of entry %d\n", entry);
+        return 1;
+    }
+    return (msix_entry.flags & KVM_MSIX_FLAG_MASK);
+}
+#endif
+
 static int get_msix_valid_entries_nr(AssignedDevice *adev,
 				     uint16_t entries_max_nr)
 {
@@ -1136,7 +1170,11 @@ static int get_msix_valid_entries_nr(AssignedDevice *adev,
         memcpy(&msg_ctrl, va + i * 16 + 12, 4);
         memcpy(&msg_data, va + i * 16 + 8, 4);
         /* Ignore unused entry even it's unmasked */
+#ifdef KVM_CAP_DEVICE_MSIX_MASK
+        if (assigned_dev_msix_entry_masked(adev, i))
+#else
         if (msg_data == 0)
+#endif
             continue;
         entries_nr ++;
     }
@@ -1165,6 +1203,8 @@ static int assigned_dev_update_msix_mmio(PCIDevice *pci_dev,
     }
 
     free_dev_irq_entries(adev);
+    memset(pci_dev->msix_entry_used, 0, KVM_MAX_MSIX_PER_DEV *
+                                        sizeof(*pci_dev->msix_entry_used));
     adev->irq_entries_nr = entries_nr;
     adev->entry = calloc(entries_nr, sizeof(struct kvm_irq_routing_entry));
     if (!adev->entry) {
@@ -1179,7 +1219,11 @@ static int assigned_dev_update_msix_mmio(PCIDevice *pci_dev,
             break;
         memcpy(&msg_ctrl, va + i * 16 + 12, 4);
         memcpy(&msg_data, va + i * 16 + 8, 4);
+#ifdef KVM_CAP_DEVICE_MSIX_MASK
+        if (assigned_dev_msix_entry_masked(adev, i))
+#else
         if (msg_data == 0)
+#endif
             continue;
 
         memcpy(&msg_addr, va + i * 16, 4);
@@ -1200,6 +1244,7 @@ static int assigned_dev_update_msix_mmio(PCIDevice *pci_dev,
 
         msix_entry.gsi = adev->entry[entries_nr].gsi;
         msix_entry.entry = i;
+        pci_dev->msix_entry_used[i] = 1;
         r = kvm_assign_set_msix_entry(kvm_context, &msix_entry);
         if (r) {
             fprintf(stderr, "fail to set MSI-X entry! %s\n", strerror(-r));
@@ -1243,6 +1288,8 @@ static void assigned_dev_update_msix(PCIDevice *pci_dev, int enable_msix)
             perror("assigned_dev_update_msix: deassign irq");
 
         assigned_dev->irq_requested_type = 0;
+        memset(pci_dev->msix_entry_used, 0, KVM_MAX_MSIX_PER_DEV *
+                                        sizeof(*pci_dev->msix_entry_used));
     }
 
     entries_max_nr = get_msix_entries_max_nr(assigned_dev);
@@ -1250,10 +1297,16 @@ static void assigned_dev_update_msix(PCIDevice *pci_dev, int enable_msix)
         fprintf(stderr, "assigned_dev_update_msix: MSI-X entries_max_nr == 0");
         return;
     }
+    /*
+     * Guest may try to enable MSI-X before setting MSI-X entry done, so
+     * let's wait until guest unmask the entries.
+     */
     entries_nr = get_msix_valid_entries_nr(assigned_dev, entries_max_nr);
     if (entries_nr == 0) {
+#ifndef KVM_CAP_DEVICE_MSIX_MASK
         if (enable_msix)
             fprintf(stderr, "MSI-X entry number is zero!\n");
+#endif
         return;
     }
     if (enable_msix) {
@@ -1297,7 +1350,8 @@ static void assigned_device_pci_cap_write_config(PCIDevice *pci_dev, uint32_t ad
         if (address <= ctrl_pos && address + len > ctrl_pos) {
             ctrl_pos--; /* control is word long */
             ctrl_word = (uint16_t *)(pci_dev->config + ctrl_pos);
-            assigned_dev_update_msix(pci_dev, (*ctrl_word & PCI_MSIX_ENABLE));
+            assigned_dev_update_msix(pci_dev,
+                    (*ctrl_word & PCI_MSIX_ENABLE) && !(*ctrl_word & PCI_MSIX_MASK));
 	}
         pos += PCI_CAPABILITY_CONFIG_MSIX_LENGTH;
     }
@@ -1395,10 +1449,101 @@ static void msix_mmio_writel(void *opaque,
     AssignedDevice *adev = opaque;
     unsigned int offset = addr & 0xfff;
     void *page = adev->msix_table_page;
+#ifdef KVM_CAP_DEVICE_MSIX_MASK
+    int pos, ctrl_word, index;
+    struct kvm_irq_routing_entry new_entry = {};
+    int entry_idx, entries_max_nr, r = 0, i;
+    uint32_t msg_ctrl, msg_data, msg_upper_addr, msg_addr;
+#endif
 
     DEBUG("write to MSI-X entry table mmio offset 0x%lx, val 0x%x\n",
 		    addr, val);
     memcpy((void *)((char *)page + offset), &val, 4);
+
+#ifdef KVM_CAP_DEVICE_MSIX_MASK
+    index = offset / 16;
+
+    /* Check if mask bit is being accessed */
+    memcpy(&msg_addr, (char *)page + index * 16, 4);
+    memcpy(&msg_upper_addr, (char *)page + index * 16 + 4, 4);
+    memcpy(&msg_data, (char *)page + index * 16 + 8, 4);
+    memcpy(&msg_ctrl, (char *)page + index * 16 + 12, 4);
+    DEBUG("MSI-X entries index %d: "
+            "msg_addr 0x%x, msg_upper_addr 0x%x, msg_data 0x%x, vec_ctl 0x%x\n",
+            index, msg_addr, msg_upper_addr, msg_data, msg_ctrl);
+
+    if (adev->cap.available & ASSIGNED_DEVICE_CAP_MSI)
+        pos = adev->dev.cap.start + PCI_CAPABILITY_CONFIG_MSI_LENGTH;
+    else
+        pos = adev->dev.cap.start;
+
+    ctrl_word = *(uint16_t *)(adev->dev.config + pos + 2);
+
+    if (!((ctrl_word & PCI_MSIX_ENABLE) && !(ctrl_word & PCI_MSIX_MASK)))
+        return;
+
+    if (!assigned_dev_msix_entry_masked(adev, index)) {
+        if (!adev->dev.msix_entry_used[index]) {
+            DEBUG("Try to modify unenabled MSI-X entry %d's mask. "
+                    "Reenable MSI-X.\n",
+                    index);
+            assigned_dev_update_msix(&adev->dev, 1);
+        }
+        return;
+    }
+
+    if (!adev->dev.msix_entry_used[index])
+        return;
+
+    /*
+     * We're here only because guest want to modify MSI data/addr, and
+     * kernel would filter those writing with mask bit unset.
+     */
+    entries_max_nr = get_msix_entries_max_nr(adev);
+
+    /*
+     * Find the index of routing entry, it can be different from 'index' if
+     * empty entry existed in between
+     */
+    entry_idx = -1;
+    for (i = 0; i <= index; i++) {
+        if (adev->dev.msix_entry_used[i])
+            entry_idx ++;
+    }
+    if (entry_idx >= entries_max_nr || entry_idx == -1) {
+        fprintf(stderr, "msix_mmio_writel: Entry idx %d exceed limit!\n",
+			entry_idx);
+        return;
+    }
+
+    if (!assigned_dev_msix_entry_masked(adev, index)) {
+        fprintf(stderr, "msix_mmio_writel: Trying write to unmasked entry!\n");
+        return;
+    }
+
+    new_entry.gsi = adev->entry[entry_idx].gsi;
+    new_entry.type = KVM_IRQ_ROUTING_MSI;
+    new_entry.flags = 0;
+    new_entry.u.msi.address_lo = msg_addr;
+    new_entry.u.msi.address_hi = msg_upper_addr;
+    new_entry.u.msi.data = msg_data;
+    if (memcmp(&adev->entry[entry_idx].u.msi, &new_entry.u.msi,
+                sizeof new_entry.u.msi)) {
+        r = kvm_update_routing_entry(&adev->entry[entry_idx], &new_entry);
+        if (r) {
+            perror("msix_mmio_writel: kvm_update_routing_entry failed\n");
+            return;
+        }
+        r = kvm_commit_irq_routes();
+        if (r) {
+            perror("msix_mmio_writel: kvm_commit_irq_routes failed\n");
+            return;
+        }
+    }
+    adev->entry[entry_idx].u.msi.address_lo = msg_addr;
+    adev->entry[entry_idx].u.msi.address_hi = msg_upper_addr;
+    adev->entry[entry_idx].u.msi.data = msg_data;
+#endif
 }
 
 static void msix_mmio_writew(void *opaque,
@@ -1436,6 +1581,8 @@ static int assigned_dev_register_msix_mmio(AssignedDevice *dev)
     memset(dev->msix_table_page, 0, 0x1000);
     dev->mmio_index = cpu_register_io_memory(
                         msix_mmio_read, msix_mmio_write, dev);
+    dev->dev.msix_entry_used = qemu_mallocz(KVM_MAX_MSIX_PER_DEV *
+                                        sizeof *dev->dev.msix_entry_used);
     return 0;
 }
 
@@ -1452,6 +1599,8 @@ static void assigned_dev_unregister_msix_mmio(AssignedDevice *dev)
                 strerror(errno));
     }
     dev->msix_table_page = NULL;
+    free(dev->dev.msix_entry_used);
+    dev->dev.msix_entry_used = NULL;
 }
 
 static int assigned_initfn(struct PCIDevice *pci_dev)
-- 
1.7.0.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH 3/3] qemu-kvm: device assignment: emulate MSI-X mask bits
  2010-11-04  6:18 ` [PATCH 3/3] qemu-kvm: device assignment: emulate MSI-X mask bits Sheng Yang
@ 2010-11-04  9:44   ` Michael S. Tsirkin
  2010-11-05  3:20     ` Sheng Yang
  2010-11-04 10:04   ` Michael S. Tsirkin
  1 sibling, 1 reply; 20+ messages in thread
From: Michael S. Tsirkin @ 2010-11-04  9:44 UTC (permalink / raw)
  To: Sheng Yang; +Cc: Avi Kivity, Marcelo Tosatti, kvm

On Thu, Nov 04, 2010 at 02:18:21PM +0800, Sheng Yang wrote:
> This patch emulated MSI-X per vector mask bit on assigned device.
> 
> Signed-off-by: Sheng Yang <sheng@linux.intel.com>
> ---
>  hw/device-assignment.c |  161 ++++++++++++++++++++++++++++++++++++++++++++++--
>  1 files changed, 155 insertions(+), 6 deletions(-)
> 
> diff --git a/hw/device-assignment.c b/hw/device-assignment.c
> index 8a98876..639aa0b 100644
> --- a/hw/device-assignment.c
> +++ b/hw/device-assignment.c
> @@ -62,6 +62,11 @@ static void assigned_dev_load_option_rom(AssignedDevice *dev);
>  
>  static void assigned_dev_unregister_msix_mmio(AssignedDevice *dev);
>  
> +static uint32_t calc_assigned_dev_id(uint16_t seg, uint8_t bus, uint8_t devfn)
> +{
> +    return (uint32_t)seg << 16 | (uint32_t)bus << 8 | (uint32_t)devfn;
> +}
> +
>  static uint32_t assigned_dev_ioport_rw(AssignedDevRegion *dev_region,
>                                         uint32_t addr, int len, uint32_t *val)
>  {
> @@ -264,6 +269,9 @@ static void assigned_dev_iomem_map(PCIDevice *pci_dev, int region_num,
>      AssignedDevRegion *region = &r_dev->v_addrs[region_num];
>      PCIRegion *real_region = &r_dev->real_device.regions[region_num];
>      int ret = 0;
> +#ifdef KVM_CAP_DEVICE_MSIX_MASK
> +    struct kvm_assigned_msix_mmio msix_mmio;
> +#endif
>  
>      DEBUG("e_phys=%08" FMT_PCIBUS " r_virt=%p type=%d len=%08" FMT_PCIBUS " region_num=%d \n",
>            e_phys, region->u.r_virtbase, type, e_size, region_num);
> @@ -282,6 +290,16 @@ static void assigned_dev_iomem_map(PCIDevice *pci_dev, int region_num,
>  
>              cpu_register_physical_memory(e_phys + offset,
>                      TARGET_PAGE_SIZE, r_dev->mmio_index);
> +#ifdef KVM_CAP_DEVICE_MSIX_MASK
> +	    memset(&msix_mmio, 0, sizeof(struct kvm_assigned_msix_mmio));
> +	    msix_mmio.assigned_dev_id = calc_assigned_dev_id(r_dev->h_segnr,
> +			    r_dev->h_busnr, r_dev->h_devfn);
> +	    msix_mmio.base_addr = e_phys + offset;
> +            /* We required kernel MSI-X support */
> +	    ret = kvm_assign_reg_msix_mmio(kvm_context, &msix_mmio);
> +	    if (ret)
> +                fprintf(stderr, "fail to register in-kernel msix_mmio!\n");
> +#endif
>          }
>      }
>  
> @@ -824,11 +842,6 @@ static void free_assigned_device(AssignedDevice *dev)
>      }
>  }
>  
> -static uint32_t calc_assigned_dev_id(uint16_t seg, uint8_t bus, uint8_t devfn)
> -{
> -    return (uint32_t)seg << 16 | (uint32_t)bus << 8 | (uint32_t)devfn;
> -}
> -
>  static void assign_failed_examine(AssignedDevice *dev)
>  {
>      char name[PATH_MAX], dir[PATH_MAX], driver[PATH_MAX] = {}, *ns;
> @@ -1123,6 +1136,27 @@ static int get_msix_entries_max_nr(AssignedDevice *adev)
>      return entries_max_nr;
>  }
>  
> +#ifdef KVM_CAP_DEVICE_MSIX_MASK
> +static int assigned_dev_msix_entry_masked(AssignedDevice *adev, int entry)
> +{
> +    struct kvm_assigned_msix_entry msix_entry;
> +    int r;
> +
> +    memset(&msix_entry, 0, sizeof msix_entry);
> +    msix_entry.assigned_dev_id = calc_assigned_dev_id(adev->h_segnr,
> +            adev->h_busnr, (uint8_t)adev->h_devfn);
> +    msix_entry.entry = entry;
> +    msix_entry.flags = KVM_MSIX_FLAG_QUERY_MASK;
> +    r = kvm_assign_get_msix_entry(kvm_context, &msix_entry);
> +    if (r) {
> +        fprintf(stderr, "assigned_dev_msix_entry_masked: "
> +			"Fail to get mask bit of entry %d\n", entry);
> +        return 1;

This error handling seems pretty useless. assert?

> +    }
> +    return (msix_entry.flags & KVM_MSIX_FLAG_MASK);
> +}
> +#endif
> +
>  static int get_msix_valid_entries_nr(AssignedDevice *adev,
>  				     uint16_t entries_max_nr)
>  {
> @@ -1136,7 +1170,11 @@ static int get_msix_valid_entries_nr(AssignedDevice *adev,
>          memcpy(&msg_ctrl, va + i * 16 + 12, 4);
>          memcpy(&msg_data, va + i * 16 + 8, 4);
>          /* Ignore unused entry even it's unmasked */
> +#ifdef KVM_CAP_DEVICE_MSIX_MASK
> +        if (assigned_dev_msix_entry_masked(adev, i))
> +#else
>          if (msg_data == 0)
> +#endif

So, we are replacing msg_data == 0 check with masked check?
If yes why not do this for non-KVM_CAP_DEVICE_MSIX_MASK too?


>              continue;
>          entries_nr ++;
>      }
> @@ -1165,6 +1203,8 @@ static int assigned_dev_update_msix_mmio(PCIDevice *pci_dev,
>      }
>  
>      free_dev_irq_entries(adev);
> +    memset(pci_dev->msix_entry_used, 0, KVM_MAX_MSIX_PER_DEV *
> +                                        sizeof(*pci_dev->msix_entry_used));
>      adev->irq_entries_nr = entries_nr;
>      adev->entry = calloc(entries_nr, sizeof(struct kvm_irq_routing_entry));
>      if (!adev->entry) {
> @@ -1179,7 +1219,11 @@ static int assigned_dev_update_msix_mmio(PCIDevice *pci_dev,
>              break;
>          memcpy(&msg_ctrl, va + i * 16 + 12, 4);
>          memcpy(&msg_data, va + i * 16 + 8, 4);
> +#ifdef KVM_CAP_DEVICE_MSIX_MASK
> +        if (assigned_dev_msix_entry_masked(adev, i))
> +#else
>          if (msg_data == 0)
> +#endif

You can't use ifdef to check that kernel supports an ioctl.
You must check this at runtime.

>              continue;
>  
>          memcpy(&msg_addr, va + i * 16, 4);
> @@ -1200,6 +1244,7 @@ static int assigned_dev_update_msix_mmio(PCIDevice *pci_dev,
>  
>          msix_entry.gsi = adev->entry[entries_nr].gsi;
>          msix_entry.entry = i;
> +        pci_dev->msix_entry_used[i] = 1;
>          r = kvm_assign_set_msix_entry(kvm_context, &msix_entry);
>          if (r) {
>              fprintf(stderr, "fail to set MSI-X entry! %s\n", strerror(-r));
> @@ -1243,6 +1288,8 @@ static void assigned_dev_update_msix(PCIDevice *pci_dev, int enable_msix)
>              perror("assigned_dev_update_msix: deassign irq");
>  
>          assigned_dev->irq_requested_type = 0;
> +        memset(pci_dev->msix_entry_used, 0, KVM_MAX_MSIX_PER_DEV *
> +                                        sizeof(*pci_dev->msix_entry_used));
>      }
>  
>      entries_max_nr = get_msix_entries_max_nr(assigned_dev);
> @@ -1250,10 +1297,16 @@ static void assigned_dev_update_msix(PCIDevice *pci_dev, int enable_msix)
>          fprintf(stderr, "assigned_dev_update_msix: MSI-X entries_max_nr == 0");
>          return;
>      }
> +    /*
> +     * Guest may try to enable MSI-X before setting MSI-X entry done, so
> +     * let's wait until guest unmask the entries.
> +     */

Well it can also set up any number of entries, enable msix then
set up more entries. Now what?

>      entries_nr = get_msix_valid_entries_nr(assigned_dev, entries_max_nr);
>      if (entries_nr == 0) {
> +#ifndef KVM_CAP_DEVICE_MSIX_MASK
>          if (enable_msix)
>              fprintf(stderr, "MSI-X entry number is zero!\n");

And what happens then?

> +#endif
>          return;
>      }
>      if (enable_msix) {
> @@ -1297,7 +1350,8 @@ static void assigned_device_pci_cap_write_config(PCIDevice *pci_dev, uint32_t ad
>          if (address <= ctrl_pos && address + len > ctrl_pos) {
>              ctrl_pos--; /* control is word long */
>              ctrl_word = (uint16_t *)(pci_dev->config + ctrl_pos);
> -            assigned_dev_update_msix(pci_dev, (*ctrl_word & PCI_MSIX_ENABLE));
> +            assigned_dev_update_msix(pci_dev,
> +                    (*ctrl_word & PCI_MSIX_ENABLE) && !(*ctrl_word & PCI_MSIX_MASK));
>  	}
>          pos += PCI_CAPABILITY_CONFIG_MSIX_LENGTH;
>      }
> @@ -1395,10 +1449,101 @@ static void msix_mmio_writel(void *opaque,
>      AssignedDevice *adev = opaque;
>      unsigned int offset = addr & 0xfff;
>      void *page = adev->msix_table_page;
> +#ifdef KVM_CAP_DEVICE_MSIX_MASK
> +    int pos, ctrl_word, index;
> +    struct kvm_irq_routing_entry new_entry = {};
> +    int entry_idx, entries_max_nr, r = 0, i;
> +    uint32_t msg_ctrl, msg_data, msg_upper_addr, msg_addr;
> +#endif
>  
>      DEBUG("write to MSI-X entry table mmio offset 0x%lx, val 0x%x\n",
>  		    addr, val);
>      memcpy((void *)((char *)page + offset), &val, 4);
> +
> +#ifdef KVM_CAP_DEVICE_MSIX_MASK
> +    index = offset / 16;
> +
> +    /* Check if mask bit is being accessed */
> +    memcpy(&msg_addr, (char *)page + index * 16, 4);
> +    memcpy(&msg_upper_addr, (char *)page + index * 16 + 4, 4);
> +    memcpy(&msg_data, (char *)page + index * 16 + 8, 4);
> +    memcpy(&msg_ctrl, (char *)page + index * 16 + 12, 4);
> +    DEBUG("MSI-X entries index %d: "
> +            "msg_addr 0x%x, msg_upper_addr 0x%x, msg_data 0x%x, vec_ctl 0x%x\n",
> +            index, msg_addr, msg_upper_addr, msg_data, msg_ctrl);
> +
> +    if (adev->cap.available & ASSIGNED_DEVICE_CAP_MSI)
> +        pos = adev->dev.cap.start + PCI_CAPABILITY_CONFIG_MSI_LENGTH;
> +    else
> +        pos = adev->dev.cap.start;
> +
> +    ctrl_word = *(uint16_t *)(adev->dev.config + pos + 2);
> +
> +    if (!((ctrl_word & PCI_MSIX_ENABLE) && !(ctrl_word & PCI_MSIX_MASK)))
> +        return;
> +
> +    if (!assigned_dev_msix_entry_masked(adev, index)) {
> +        if (!adev->dev.msix_entry_used[index]) {
> +            DEBUG("Try to modify unenabled MSI-X entry %d's mask. "
> +                    "Reenable MSI-X.\n",
> +                    index);
> +            assigned_dev_update_msix(&adev->dev, 1);
> +        }
> +        return;
> +    }
> +
> +    if (!adev->dev.msix_entry_used[index])
> +        return;
> +
> +    /*
> +     * We're here only because guest want to modify MSI data/addr, and
> +     * kernel would filter those writing with mask bit unset.
> +     */
> +    entries_max_nr = get_msix_entries_max_nr(adev);
> +
> +    /*
> +     * Find the index of routing entry, it can be different from 'index' if
> +     * empty entry existed in between
> +     */
> +    entry_idx = -1;
> +    for (i = 0; i <= index; i++) {
> +        if (adev->dev.msix_entry_used[i])
> +            entry_idx ++;
> +    }
> +    if (entry_idx >= entries_max_nr || entry_idx == -1) {
> +        fprintf(stderr, "msix_mmio_writel: Entry idx %d exceed limit!\n",
> +			entry_idx);
> +        return;
> +    }
> +
> +    if (!assigned_dev_msix_entry_masked(adev, index)) {
> +        fprintf(stderr, "msix_mmio_writel: Trying write to unmasked entry!\n");
> +        return;
> +    }
> +
> +    new_entry.gsi = adev->entry[entry_idx].gsi;
> +    new_entry.type = KVM_IRQ_ROUTING_MSI;
> +    new_entry.flags = 0;
> +    new_entry.u.msi.address_lo = msg_addr;
> +    new_entry.u.msi.address_hi = msg_upper_addr;
> +    new_entry.u.msi.data = msg_data;
> +    if (memcmp(&adev->entry[entry_idx].u.msi, &new_entry.u.msi,
> +                sizeof new_entry.u.msi)) {
> +        r = kvm_update_routing_entry(&adev->entry[entry_idx], &new_entry);
> +        if (r) {
> +            perror("msix_mmio_writel: kvm_update_routing_entry failed\n");
> +            return;
> +        }
> +        r = kvm_commit_irq_routes();
> +        if (r) {
> +            perror("msix_mmio_writel: kvm_commit_irq_routes failed\n");
> +            return;
> +        }
> +    }
> +    adev->entry[entry_idx].u.msi.address_lo = msg_addr;
> +    adev->entry[entry_idx].u.msi.address_hi = msg_upper_addr;
> +    adev->entry[entry_idx].u.msi.data = msg_data;
> +#endif
>  }
>  
>  static void msix_mmio_writew(void *opaque,
> @@ -1436,6 +1581,8 @@ static int assigned_dev_register_msix_mmio(AssignedDevice *dev)
>      memset(dev->msix_table_page, 0, 0x1000);
>      dev->mmio_index = cpu_register_io_memory(
>                          msix_mmio_read, msix_mmio_write, dev);
> +    dev->dev.msix_entry_used = qemu_mallocz(KVM_MAX_MSIX_PER_DEV *
> +                                        sizeof *dev->dev.msix_entry_used);
>      return 0;
>  }
>  
> @@ -1452,6 +1599,8 @@ static void assigned_dev_unregister_msix_mmio(AssignedDevice *dev)
>                  strerror(errno));
>      }
>      dev->msix_table_page = NULL;
> +    free(dev->dev.msix_entry_used);
> +    dev->dev.msix_entry_used = NULL;
>  }
>  
>  static int assigned_initfn(struct PCIDevice *pci_dev)
> -- 
> 1.7.0.1

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/3] qemu-kvm: device assignment: Some clean up about MSI-X code
  2010-11-04  6:18 ` [PATCH 2/3] qemu-kvm: device assignment: Some clean up about MSI-X code Sheng Yang
@ 2010-11-04  9:47   ` Michael S. Tsirkin
  2010-11-05  3:08     ` Sheng Yang
  0 siblings, 1 reply; 20+ messages in thread
From: Michael S. Tsirkin @ 2010-11-04  9:47 UTC (permalink / raw)
  To: Sheng Yang; +Cc: Avi Kivity, Marcelo Tosatti, kvm

On Thu, Nov 04, 2010 at 02:18:20PM +0800, Sheng Yang wrote:
> 
> Signed-off-by: Sheng Yang <sheng@linux.intel.com>

If you have the energy for cleanups, how about
removing most of msi/msix code in device-assignment and using msi.c and
msix.c instead?

> ---
>  hw/device-assignment.c |   77 ++++++++++++++++++++++++++++++++---------------
>  1 files changed, 52 insertions(+), 25 deletions(-)
> 
> diff --git a/hw/device-assignment.c b/hw/device-assignment.c
> index 2605bd1..8a98876 100644
> --- a/hw/device-assignment.c
> +++ b/hw/device-assignment.c
> @@ -1106,15 +1106,10 @@ static void assigned_dev_update_msi(PCIDevice *pci_dev, unsigned int ctrl_pos)
>  #endif
>  
>  #ifdef KVM_CAP_DEVICE_MSIX
> -static int assigned_dev_update_msix_mmio(PCIDevice *pci_dev)
> +static int get_msix_entries_max_nr(AssignedDevice *adev)
>  {
> -    AssignedDevice *adev = container_of(pci_dev, AssignedDevice, dev);
> -    uint16_t entries_nr = 0, entries_max_nr;
> -    int pos = 0, i, r = 0;
> -    uint32_t msg_addr, msg_upper_addr, msg_data, msg_ctrl;
> -    struct kvm_assigned_msix_nr msix_nr;
> -    struct kvm_assigned_msix_entry msix_entry;
> -    void *va = adev->msix_table_page;
> +    int pos, entries_max_nr;
> +    PCIDevice *pci_dev = &adev->dev;
>  
>      if (adev->cap.available & ASSIGNED_DEVICE_CAP_MSI)
>          pos = pci_dev->cap.start + PCI_CAPABILITY_CONFIG_MSI_LENGTH;
> @@ -1125,6 +1120,17 @@ static int assigned_dev_update_msix_mmio(PCIDevice *pci_dev)
>      entries_max_nr &= PCI_MSIX_TABSIZE;
>      entries_max_nr += 1;
>  
> +    return entries_max_nr;
> +}
> +
> +static int get_msix_valid_entries_nr(AssignedDevice *adev,
> +				     uint16_t entries_max_nr)
> +{
> +    void *va = adev->msix_table_page;
> +    uint32_t msg_data, msg_ctrl;
> +    uint16_t entries_nr = 0;
> +    int i;
> +
>      /* Get the usable entry number for allocating */
>      for (i = 0; i < entries_max_nr; i++) {
>          memcpy(&msg_ctrl, va + i * 16 + 12, 4);
> @@ -1134,11 +1140,20 @@ static int assigned_dev_update_msix_mmio(PCIDevice *pci_dev)
>              continue;
>          entries_nr ++;
>      }
> +    return entries_nr;
> +}
> +
> +static int assigned_dev_update_msix_mmio(PCIDevice *pci_dev,
> +                                         uint16_t entries_nr,
> +                                         uint16_t entries_max_nr)
> +{
> +    AssignedDevice *adev = container_of(pci_dev, AssignedDevice, dev);
> +    int i, r = 0;
> +    uint32_t msg_addr, msg_upper_addr, msg_data, msg_ctrl;
> +    struct kvm_assigned_msix_nr msix_nr;
> +    struct kvm_assigned_msix_entry msix_entry;
> +    void *va = adev->msix_table_page;
>  
> -    if (entries_nr == 0) {
> -        fprintf(stderr, "MSI-X entry number is zero!\n");
> -        return -EINVAL;
> -    }
>      msix_nr.assigned_dev_id = calc_assigned_dev_id(adev->h_segnr, adev->h_busnr,
>                                            (uint8_t)adev->h_devfn);
>      msix_nr.entry_nr = entries_nr;
> @@ -1180,8 +1195,8 @@ static int assigned_dev_update_msix_mmio(PCIDevice *pci_dev)
>          adev->entry[entries_nr].u.msi.address_lo = msg_addr;
>          adev->entry[entries_nr].u.msi.address_hi = msg_upper_addr;
>          adev->entry[entries_nr].u.msi.data = msg_data;
> -        DEBUG("MSI-X data 0x%x, MSI-X addr_lo 0x%x\n!", msg_data, msg_addr);
> -	kvm_add_routing_entry(&adev->entry[entries_nr]);
> +        DEBUG("MSI-X data 0x%x, MSI-X addr_lo 0x%x!\n", msg_data, msg_addr);
> +        kvm_add_routing_entry(&adev->entry[entries_nr]);
>  
>          msix_entry.gsi = adev->entry[entries_nr].gsi;
>          msix_entry.entry = i;
> @@ -1190,7 +1205,7 @@ static int assigned_dev_update_msix_mmio(PCIDevice *pci_dev)
>              fprintf(stderr, "fail to set MSI-X entry! %s\n", strerror(-r));
>              break;
>          }
> -        DEBUG("MSI-X entry gsi 0x%x, entry %d\n!",
> +        DEBUG("MSI-X entry gsi 0x%x, entry %d!\n",
>                  msix_entry.gsi, msix_entry.entry);
>          entries_nr ++;
>      }
> @@ -1203,12 +1218,12 @@ static int assigned_dev_update_msix_mmio(PCIDevice *pci_dev)
>      return r;
>  }
>  
> -static void assigned_dev_update_msix(PCIDevice *pci_dev, unsigned int ctrl_pos)
> +static void assigned_dev_update_msix(PCIDevice *pci_dev, int enable_msix)
>  {
>      struct kvm_assigned_irq assigned_irq_data;
>      AssignedDevice *assigned_dev = container_of(pci_dev, AssignedDevice, dev);
> -    uint16_t *ctrl_word = (uint16_t *)(pci_dev->config + ctrl_pos);
>      int r;
> +    uint16_t entries_nr, entries_max_nr;
>  
>      memset(&assigned_irq_data, 0, sizeof assigned_irq_data);
>      assigned_irq_data.assigned_dev_id  =
> @@ -1219,8 +1234,7 @@ static void assigned_dev_update_msix(PCIDevice *pci_dev, unsigned int ctrl_pos)
>       * try to catch this by only deassigning irqs if the guest is using
>       * MSIX or intends to start. */
>      if ((assigned_dev->irq_requested_type & KVM_DEV_IRQ_GUEST_MSIX) ||
> -        (*ctrl_word & PCI_MSIX_ENABLE)) {
> -
> +            enable_msix) {
>          assigned_irq_data.flags = assigned_dev->irq_requested_type;
>          free_dev_irq_entries(assigned_dev);
>          r = kvm_deassign_irq(kvm_context, &assigned_irq_data);
> @@ -1231,14 +1245,25 @@ static void assigned_dev_update_msix(PCIDevice *pci_dev, unsigned int ctrl_pos)
>          assigned_dev->irq_requested_type = 0;
>      }
>  
> -    if (*ctrl_word & PCI_MSIX_ENABLE) {
> -        assigned_irq_data.flags = KVM_DEV_IRQ_HOST_MSIX |
> -                                  KVM_DEV_IRQ_GUEST_MSIX;
> -
> -        if (assigned_dev_update_msix_mmio(pci_dev) < 0) {
> +    entries_max_nr = get_msix_entries_max_nr(assigned_dev);
> +    if (entries_max_nr == 0) {
> +        fprintf(stderr, "assigned_dev_update_msix: MSI-X entries_max_nr == 0");
> +        return;
> +    }
> +    entries_nr = get_msix_valid_entries_nr(assigned_dev, entries_max_nr);
> +    if (entries_nr == 0) {
> +        if (enable_msix)
> +            fprintf(stderr, "MSI-X entry number is zero!\n");
> +        return;
> +    }
> +    if (enable_msix) {
> +        if (assigned_dev_update_msix_mmio(pci_dev,
> +                    entries_nr, entries_max_nr) < 0) {
>              perror("assigned_dev_update_msix_mmio");
>              return;
>          }
> +        assigned_irq_data.flags = KVM_DEV_IRQ_HOST_MSIX |
> +                                  KVM_DEV_IRQ_GUEST_MSIX;
>          if (kvm_assign_irq(kvm_context, &assigned_irq_data) < 0) {
>              perror("assigned_dev_enable_msix: assign irq");
>              return;
> @@ -1254,6 +1279,7 @@ static void assigned_device_pci_cap_write_config(PCIDevice *pci_dev, uint32_t ad
>  {
>      AssignedDevice *assigned_dev = container_of(pci_dev, AssignedDevice, dev);
>      unsigned int pos = pci_dev->cap.start, ctrl_pos;
> +    uint16_t *ctrl_word;
>  
>      pci_default_cap_write_config(pci_dev, address, val, len);
>  #ifdef KVM_CAP_IRQ_ROUTING
> @@ -1270,7 +1296,8 @@ static void assigned_device_pci_cap_write_config(PCIDevice *pci_dev, uint32_t ad
>          ctrl_pos = pos + 3;
>          if (address <= ctrl_pos && address + len > ctrl_pos) {
>              ctrl_pos--; /* control is word long */
> -            assigned_dev_update_msix(pci_dev, ctrl_pos);
> +            ctrl_word = (uint16_t *)(pci_dev->config + ctrl_pos);
> +            assigned_dev_update_msix(pci_dev, (*ctrl_word & PCI_MSIX_ENABLE));
>  	}
>          pos += PCI_CAPABILITY_CONFIG_MSIX_LENGTH;
>      }
> -- 
> 1.7.0.1

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/3] qemu-kvm: Ioctl for in-kernel mask support
  2010-11-04  6:18 ` [PATCH 1/3] qemu-kvm: Ioctl for in-kernel mask support Sheng Yang
@ 2010-11-04  9:47   ` Michael S. Tsirkin
  2010-11-05  2:59     ` Sheng Yang
  0 siblings, 1 reply; 20+ messages in thread
From: Michael S. Tsirkin @ 2010-11-04  9:47 UTC (permalink / raw)
  To: Sheng Yang; +Cc: Avi Kivity, Marcelo Tosatti, kvm

On Thu, Nov 04, 2010 at 02:18:19PM +0800, Sheng Yang wrote:
> 
> Signed-off-by: Sheng Yang <sheng@linux.intel.com>
> ---
>  qemu-kvm.c |   15 +++++++++++++++
>  qemu-kvm.h |    6 ++++++
>  2 files changed, 21 insertions(+), 0 deletions(-)
> 
> diff --git a/qemu-kvm.c b/qemu-kvm.c
> index 733d0a9..ba6db51 100644
> --- a/qemu-kvm.c
> +++ b/qemu-kvm.c
> @@ -1092,6 +1092,21 @@ int kvm_assign_set_msix_entry(kvm_context_t kvm,
>  {
>      return kvm_vm_ioctl(kvm_state, KVM_ASSIGN_SET_MSIX_ENTRY, entry);
>  }
> +
> +#ifdef KVM_CAP_DEVICE_MSIX_MASK
> +int kvm_assign_get_msix_entry(kvm_context_t kvm,
> +                              struct kvm_assigned_msix_entry *entry)
> +{
> +    return kvm_vm_ioctl(kvm_state, KVM_ASSIGN_GET_MSIX_ENTRY, entry);
> +}
> +
> +int kvm_assign_reg_msix_mmio(kvm_context_t kvm,
> +                           struct kvm_assigned_msix_mmio *msix_mmio)
> +{
> +    return kvm_vm_ioctl(kvm_state, KVM_ASSIGN_REG_MSIX_MMIO, msix_mmio);
> +}
> +#endif
> +
>  #endif
>  
>  #if defined(KVM_CAP_IRQFD) && defined(CONFIG_EVENTFD)
> diff --git a/qemu-kvm.h b/qemu-kvm.h
> index 9c08ab4..1afdd42 100644
> --- a/qemu-kvm.h
> +++ b/qemu-kvm.h
> @@ -743,6 +743,12 @@ int kvm_assign_set_msix_nr(kvm_context_t kvm,
>                             struct kvm_assigned_msix_nr *msix_nr);
>  int kvm_assign_set_msix_entry(kvm_context_t kvm,
>                                struct kvm_assigned_msix_entry *entry);
> +#ifdef KVM_CAP_DEVICE_MSIX_MASK
> +int kvm_assign_get_msix_entry(kvm_context_t kvm,
> +                              struct kvm_assigned_msix_entry *entry);
> +int kvm_assign_reg_msix_mmio(kvm_context_t kvm,
> +                           struct kvm_assigned_msix_mmio *msix_mmio);
> +#endif
>  #endif
>  
>  #else                           /* !CONFIG_KVM */
> -- 
> 1.7.0.1


We are trying to move away from using ifdefs. Stub these out instead?

-- 
MST

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 3/3] qemu-kvm: device assignment: emulate MSI-X mask bits
  2010-11-04  6:18 ` [PATCH 3/3] qemu-kvm: device assignment: emulate MSI-X mask bits Sheng Yang
  2010-11-04  9:44   ` Michael S. Tsirkin
@ 2010-11-04 10:04   ` Michael S. Tsirkin
  2010-11-05  3:14     ` Sheng Yang
  1 sibling, 1 reply; 20+ messages in thread
From: Michael S. Tsirkin @ 2010-11-04 10:04 UTC (permalink / raw)
  To: Sheng Yang; +Cc: Avi Kivity, Marcelo Tosatti, kvm

On Thu, Nov 04, 2010 at 02:18:21PM +0800, Sheng Yang wrote:
> This patch emulated MSI-X per vector mask bit on assigned device.
> 
> Signed-off-by: Sheng Yang <sheng@linux.intel.com>

Also pls update the in-tree header for the new ioctls.

> ---
>  hw/device-assignment.c |  161 ++++++++++++++++++++++++++++++++++++++++++++++--
>  1 files changed, 155 insertions(+), 6 deletions(-)
> 
> diff --git a/hw/device-assignment.c b/hw/device-assignment.c
> index 8a98876..639aa0b 100644
> --- a/hw/device-assignment.c
> +++ b/hw/device-assignment.c
> @@ -62,6 +62,11 @@ static void assigned_dev_load_option_rom(AssignedDevice *dev);
>  
>  static void assigned_dev_unregister_msix_mmio(AssignedDevice *dev);
>  
> +static uint32_t calc_assigned_dev_id(uint16_t seg, uint8_t bus, uint8_t devfn)
> +{
> +    return (uint32_t)seg << 16 | (uint32_t)bus << 8 | (uint32_t)devfn;
> +}
> +
>  static uint32_t assigned_dev_ioport_rw(AssignedDevRegion *dev_region,
>                                         uint32_t addr, int len, uint32_t *val)
>  {
> @@ -264,6 +269,9 @@ static void assigned_dev_iomem_map(PCIDevice *pci_dev, int region_num,
>      AssignedDevRegion *region = &r_dev->v_addrs[region_num];
>      PCIRegion *real_region = &r_dev->real_device.regions[region_num];
>      int ret = 0;
> +#ifdef KVM_CAP_DEVICE_MSIX_MASK
> +    struct kvm_assigned_msix_mmio msix_mmio;
> +#endif
>  
>      DEBUG("e_phys=%08" FMT_PCIBUS " r_virt=%p type=%d len=%08" FMT_PCIBUS " region_num=%d \n",
>            e_phys, region->u.r_virtbase, type, e_size, region_num);
> @@ -282,6 +290,16 @@ static void assigned_dev_iomem_map(PCIDevice *pci_dev, int region_num,
>  
>              cpu_register_physical_memory(e_phys + offset,
>                      TARGET_PAGE_SIZE, r_dev->mmio_index);
> +#ifdef KVM_CAP_DEVICE_MSIX_MASK
> +	    memset(&msix_mmio, 0, sizeof(struct kvm_assigned_msix_mmio));
> +	    msix_mmio.assigned_dev_id = calc_assigned_dev_id(r_dev->h_segnr,
> +			    r_dev->h_busnr, r_dev->h_devfn);
> +	    msix_mmio.base_addr = e_phys + offset;
> +            /* We required kernel MSI-X support */
> +	    ret = kvm_assign_reg_msix_mmio(kvm_context, &msix_mmio);
> +	    if (ret)
> +                fprintf(stderr, "fail to register in-kernel msix_mmio!\n");
> +#endif
>          }
>      }
>  
> @@ -824,11 +842,6 @@ static void free_assigned_device(AssignedDevice *dev)
>      }
>  }
>  
> -static uint32_t calc_assigned_dev_id(uint16_t seg, uint8_t bus, uint8_t devfn)
> -{
> -    return (uint32_t)seg << 16 | (uint32_t)bus << 8 | (uint32_t)devfn;
> -}
> -
>  static void assign_failed_examine(AssignedDevice *dev)
>  {
>      char name[PATH_MAX], dir[PATH_MAX], driver[PATH_MAX] = {}, *ns;
> @@ -1123,6 +1136,27 @@ static int get_msix_entries_max_nr(AssignedDevice *adev)
>      return entries_max_nr;
>  }
>  
> +#ifdef KVM_CAP_DEVICE_MSIX_MASK
> +static int assigned_dev_msix_entry_masked(AssignedDevice *adev, int entry)
> +{
> +    struct kvm_assigned_msix_entry msix_entry;
> +    int r;
> +
> +    memset(&msix_entry, 0, sizeof msix_entry);
> +    msix_entry.assigned_dev_id = calc_assigned_dev_id(adev->h_segnr,
> +            adev->h_busnr, (uint8_t)adev->h_devfn);
> +    msix_entry.entry = entry;
> +    msix_entry.flags = KVM_MSIX_FLAG_QUERY_MASK;
> +    r = kvm_assign_get_msix_entry(kvm_context, &msix_entry);
> +    if (r) {
> +        fprintf(stderr, "assigned_dev_msix_entry_masked: "
> +			"Fail to get mask bit of entry %d\n", entry);
> +        return 1;
> +    }
> +    return (msix_entry.flags & KVM_MSIX_FLAG_MASK);
> +}
> +#endif
> +
>  static int get_msix_valid_entries_nr(AssignedDevice *adev,
>  				     uint16_t entries_max_nr)
>  {
> @@ -1136,7 +1170,11 @@ static int get_msix_valid_entries_nr(AssignedDevice *adev,
>          memcpy(&msg_ctrl, va + i * 16 + 12, 4);
>          memcpy(&msg_data, va + i * 16 + 8, 4);
>          /* Ignore unused entry even it's unmasked */
> +#ifdef KVM_CAP_DEVICE_MSIX_MASK
> +        if (assigned_dev_msix_entry_masked(adev, i))
> +#else
>          if (msg_data == 0)
> +#endif
>              continue;
>          entries_nr ++;
>      }
> @@ -1165,6 +1203,8 @@ static int assigned_dev_update_msix_mmio(PCIDevice *pci_dev,
>      }
>  
>      free_dev_irq_entries(adev);
> +    memset(pci_dev->msix_entry_used, 0, KVM_MAX_MSIX_PER_DEV *
> +                                        sizeof(*pci_dev->msix_entry_used));
>      adev->irq_entries_nr = entries_nr;
>      adev->entry = calloc(entries_nr, sizeof(struct kvm_irq_routing_entry));
>      if (!adev->entry) {
> @@ -1179,7 +1219,11 @@ static int assigned_dev_update_msix_mmio(PCIDevice *pci_dev,
>              break;
>          memcpy(&msg_ctrl, va + i * 16 + 12, 4);
>          memcpy(&msg_data, va + i * 16 + 8, 4);
> +#ifdef KVM_CAP_DEVICE_MSIX_MASK
> +        if (assigned_dev_msix_entry_masked(adev, i))
> +#else
>          if (msg_data == 0)
> +#endif
>              continue;
>  
>          memcpy(&msg_addr, va + i * 16, 4);
> @@ -1200,6 +1244,7 @@ static int assigned_dev_update_msix_mmio(PCIDevice *pci_dev,
>  
>          msix_entry.gsi = adev->entry[entries_nr].gsi;
>          msix_entry.entry = i;
> +        pci_dev->msix_entry_used[i] = 1;
>          r = kvm_assign_set_msix_entry(kvm_context, &msix_entry);
>          if (r) {
>              fprintf(stderr, "fail to set MSI-X entry! %s\n", strerror(-r));
> @@ -1243,6 +1288,8 @@ static void assigned_dev_update_msix(PCIDevice *pci_dev, int enable_msix)
>              perror("assigned_dev_update_msix: deassign irq");
>  
>          assigned_dev->irq_requested_type = 0;
> +        memset(pci_dev->msix_entry_used, 0, KVM_MAX_MSIX_PER_DEV *
> +                                        sizeof(*pci_dev->msix_entry_used));
>      }
>  
>      entries_max_nr = get_msix_entries_max_nr(assigned_dev);
> @@ -1250,10 +1297,16 @@ static void assigned_dev_update_msix(PCIDevice *pci_dev, int enable_msix)
>          fprintf(stderr, "assigned_dev_update_msix: MSI-X entries_max_nr == 0");
>          return;
>      }
> +    /*
> +     * Guest may try to enable MSI-X before setting MSI-X entry done, so
> +     * let's wait until guest unmask the entries.
> +     */
>      entries_nr = get_msix_valid_entries_nr(assigned_dev, entries_max_nr);
>      if (entries_nr == 0) {
> +#ifndef KVM_CAP_DEVICE_MSIX_MASK
>          if (enable_msix)
>              fprintf(stderr, "MSI-X entry number is zero!\n");
> +#endif
>          return;
>      }
>      if (enable_msix) {
> @@ -1297,7 +1350,8 @@ static void assigned_device_pci_cap_write_config(PCIDevice *pci_dev, uint32_t ad
>          if (address <= ctrl_pos && address + len > ctrl_pos) {
>              ctrl_pos--; /* control is word long */
>              ctrl_word = (uint16_t *)(pci_dev->config + ctrl_pos);
> -            assigned_dev_update_msix(pci_dev, (*ctrl_word & PCI_MSIX_ENABLE));
> +            assigned_dev_update_msix(pci_dev,
> +                    (*ctrl_word & PCI_MSIX_ENABLE) && !(*ctrl_word & PCI_MSIX_MASK));
>  	}
>          pos += PCI_CAPABILITY_CONFIG_MSIX_LENGTH;
>      }
> @@ -1395,10 +1449,101 @@ static void msix_mmio_writel(void *opaque,
>      AssignedDevice *adev = opaque;
>      unsigned int offset = addr & 0xfff;
>      void *page = adev->msix_table_page;
> +#ifdef KVM_CAP_DEVICE_MSIX_MASK
> +    int pos, ctrl_word, index;
> +    struct kvm_irq_routing_entry new_entry = {};
> +    int entry_idx, entries_max_nr, r = 0, i;
> +    uint32_t msg_ctrl, msg_data, msg_upper_addr, msg_addr;
> +#endif
>  
>      DEBUG("write to MSI-X entry table mmio offset 0x%lx, val 0x%x\n",
>  		    addr, val);
>      memcpy((void *)((char *)page + offset), &val, 4);
> +
> +#ifdef KVM_CAP_DEVICE_MSIX_MASK
> +    index = offset / 16;
> +
> +    /* Check if mask bit is being accessed */
> +    memcpy(&msg_addr, (char *)page + index * 16, 4);
> +    memcpy(&msg_upper_addr, (char *)page + index * 16 + 4, 4);
> +    memcpy(&msg_data, (char *)page + index * 16 + 8, 4);
> +    memcpy(&msg_ctrl, (char *)page + index * 16 + 12, 4);
> +    DEBUG("MSI-X entries index %d: "
> +            "msg_addr 0x%x, msg_upper_addr 0x%x, msg_data 0x%x, vec_ctl 0x%x\n",
> +            index, msg_addr, msg_upper_addr, msg_data, msg_ctrl);
> +
> +    if (adev->cap.available & ASSIGNED_DEVICE_CAP_MSI)
> +        pos = adev->dev.cap.start + PCI_CAPABILITY_CONFIG_MSI_LENGTH;
> +    else
> +        pos = adev->dev.cap.start;
> +
> +    ctrl_word = *(uint16_t *)(adev->dev.config + pos + 2);
> +
> +    if (!((ctrl_word & PCI_MSIX_ENABLE) && !(ctrl_word & PCI_MSIX_MASK)))
> +        return;
> +
> +    if (!assigned_dev_msix_entry_masked(adev, index)) {
> +        if (!adev->dev.msix_entry_used[index]) {
> +            DEBUG("Try to modify unenabled MSI-X entry %d's mask. "
> +                    "Reenable MSI-X.\n",
> +                    index);
> +            assigned_dev_update_msix(&adev->dev, 1);
> +        }
> +        return;
> +    }
> +
> +    if (!adev->dev.msix_entry_used[index])
> +        return;
> +
> +    /*
> +     * We're here only because guest want to modify MSI data/addr, and
> +     * kernel would filter those writing with mask bit unset.
> +     */
> +    entries_max_nr = get_msix_entries_max_nr(adev);
> +
> +    /*
> +     * Find the index of routing entry, it can be different from 'index' if
> +     * empty entry existed in between
> +     */
> +    entry_idx = -1;
> +    for (i = 0; i <= index; i++) {
> +        if (adev->dev.msix_entry_used[i])
> +            entry_idx ++;
> +    }
> +    if (entry_idx >= entries_max_nr || entry_idx == -1) {
> +        fprintf(stderr, "msix_mmio_writel: Entry idx %d exceed limit!\n",
> +			entry_idx);
> +        return;
> +    }
> +
> +    if (!assigned_dev_msix_entry_masked(adev, index)) {
> +        fprintf(stderr, "msix_mmio_writel: Trying write to unmasked entry!\n");
> +        return;
> +    }
> +
> +    new_entry.gsi = adev->entry[entry_idx].gsi;
> +    new_entry.type = KVM_IRQ_ROUTING_MSI;
> +    new_entry.flags = 0;
> +    new_entry.u.msi.address_lo = msg_addr;
> +    new_entry.u.msi.address_hi = msg_upper_addr;
> +    new_entry.u.msi.data = msg_data;
> +    if (memcmp(&adev->entry[entry_idx].u.msi, &new_entry.u.msi,
> +                sizeof new_entry.u.msi)) {
> +        r = kvm_update_routing_entry(&adev->entry[entry_idx], &new_entry);
> +        if (r) {
> +            perror("msix_mmio_writel: kvm_update_routing_entry failed\n");
> +            return;
> +        }
> +        r = kvm_commit_irq_routes();
> +        if (r) {
> +            perror("msix_mmio_writel: kvm_commit_irq_routes failed\n");
> +            return;
> +        }
> +    }
> +    adev->entry[entry_idx].u.msi.address_lo = msg_addr;
> +    adev->entry[entry_idx].u.msi.address_hi = msg_upper_addr;
> +    adev->entry[entry_idx].u.msi.data = msg_data;
> +#endif
>  }
>  
>  static void msix_mmio_writew(void *opaque,
> @@ -1436,6 +1581,8 @@ static int assigned_dev_register_msix_mmio(AssignedDevice *dev)
>      memset(dev->msix_table_page, 0, 0x1000);
>      dev->mmio_index = cpu_register_io_memory(
>                          msix_mmio_read, msix_mmio_write, dev);
> +    dev->dev.msix_entry_used = qemu_mallocz(KVM_MAX_MSIX_PER_DEV *
> +                                        sizeof *dev->dev.msix_entry_used);
>      return 0;
>  }
>  
> @@ -1452,6 +1599,8 @@ static void assigned_dev_unregister_msix_mmio(AssignedDevice *dev)
>                  strerror(errno));
>      }
>      dev->msix_table_page = NULL;
> +    free(dev->dev.msix_entry_used);
> +    dev->dev.msix_entry_used = NULL;
>  }
>  
>  static int assigned_initfn(struct PCIDevice *pci_dev)
> -- 
> 1.7.0.1

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/3] qemu-kvm: Ioctl for in-kernel mask support
  2010-11-04  9:47   ` Michael S. Tsirkin
@ 2010-11-05  2:59     ` Sheng Yang
  2010-11-05  8:52       ` Michael S. Tsirkin
  0 siblings, 1 reply; 20+ messages in thread
From: Sheng Yang @ 2010-11-05  2:59 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Avi Kivity, Marcelo Tosatti, kvm

On Thursday 04 November 2010 17:47:58 Michael S. Tsirkin wrote:
> On Thu, Nov 04, 2010 at 02:18:19PM +0800, Sheng Yang wrote:
> > Signed-off-by: Sheng Yang <sheng@linux.intel.com>
> > ---
> > 
> >  qemu-kvm.c |   15 +++++++++++++++
> >  qemu-kvm.h |    6 ++++++
> >  2 files changed, 21 insertions(+), 0 deletions(-)
> > 
> > diff --git a/qemu-kvm.c b/qemu-kvm.c
> > index 733d0a9..ba6db51 100644
> > --- a/qemu-kvm.c
> > +++ b/qemu-kvm.c
> > @@ -1092,6 +1092,21 @@ int kvm_assign_set_msix_entry(kvm_context_t kvm,
> > 
> >  {
> >  
> >      return kvm_vm_ioctl(kvm_state, KVM_ASSIGN_SET_MSIX_ENTRY, entry);
> >  
> >  }
> > 
> > +
> > +#ifdef KVM_CAP_DEVICE_MSIX_MASK
> > +int kvm_assign_get_msix_entry(kvm_context_t kvm,
> > +                              struct kvm_assigned_msix_entry *entry)
> > +{
> > +    return kvm_vm_ioctl(kvm_state, KVM_ASSIGN_GET_MSIX_ENTRY, entry);
> > +}
> > +
> > +int kvm_assign_reg_msix_mmio(kvm_context_t kvm,
> > +                           struct kvm_assigned_msix_mmio *msix_mmio)
> > +{
> > +    return kvm_vm_ioctl(kvm_state, KVM_ASSIGN_REG_MSIX_MMIO, msix_mmio);
> > +}
> > +#endif
> > +
> > 
> >  #endif
> >  
> >  #if defined(KVM_CAP_IRQFD) && defined(CONFIG_EVENTFD)
> > 
> > diff --git a/qemu-kvm.h b/qemu-kvm.h
> > index 9c08ab4..1afdd42 100644
> > --- a/qemu-kvm.h
> > +++ b/qemu-kvm.h
> > @@ -743,6 +743,12 @@ int kvm_assign_set_msix_nr(kvm_context_t kvm,
> > 
> >                             struct kvm_assigned_msix_nr *msix_nr);
> >  
> >  int kvm_assign_set_msix_entry(kvm_context_t kvm,
> >  
> >                                struct kvm_assigned_msix_entry *entry);
> > 
> > +#ifdef KVM_CAP_DEVICE_MSIX_MASK
> > +int kvm_assign_get_msix_entry(kvm_context_t kvm,
> > +                              struct kvm_assigned_msix_entry *entry);
> > +int kvm_assign_reg_msix_mmio(kvm_context_t kvm,
> > +                           struct kvm_assigned_msix_mmio *msix_mmio);
> > +#endif
> > 
> >  #endif
> >  
> >  #else                           /* !CONFIG_KVM */
> 
> We are trying to move away from using ifdefs. Stub these out instead?

Example?

--
regards
Yang, Sheng

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/3] qemu-kvm: device assignment: Some clean up about MSI-X code
  2010-11-04  9:47   ` Michael S. Tsirkin
@ 2010-11-05  3:08     ` Sheng Yang
  0 siblings, 0 replies; 20+ messages in thread
From: Sheng Yang @ 2010-11-05  3:08 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Avi Kivity, Marcelo Tosatti, kvm

On Thursday 04 November 2010 17:47:16 Michael S. Tsirkin wrote:
> On Thu, Nov 04, 2010 at 02:18:20PM +0800, Sheng Yang wrote:
> > Signed-off-by: Sheng Yang <sheng@linux.intel.com>
> 
> If you have the energy for cleanups, how about
> removing most of msi/msix code in device-assignment and using msi.c and
> msix.c instead?

Ah, this one is only used to reduce the review difficulty. I won't mind if you want 
to merge it with the last one...

Use msi.c/msix.c totally another big clean up work to do. Let put it later...

--
regards
Yang, Sheng

> 
> > ---
> > 
> >  hw/device-assignment.c |   77
> >  ++++++++++++++++++++++++++++++++--------------- 1 files changed, 52
> >  insertions(+), 25 deletions(-)
> > 
> > diff --git a/hw/device-assignment.c b/hw/device-assignment.c
> > index 2605bd1..8a98876 100644
> > --- a/hw/device-assignment.c
> > +++ b/hw/device-assignment.c
> > @@ -1106,15 +1106,10 @@ static void assigned_dev_update_msi(PCIDevice
> > *pci_dev, unsigned int ctrl_pos)
> > 
> >  #endif
> >  
> >  #ifdef KVM_CAP_DEVICE_MSIX
> > 
> > -static int assigned_dev_update_msix_mmio(PCIDevice *pci_dev)
> > +static int get_msix_entries_max_nr(AssignedDevice *adev)
> > 
> >  {
> > 
> > -    AssignedDevice *adev = container_of(pci_dev, AssignedDevice, dev);
> > -    uint16_t entries_nr = 0, entries_max_nr;
> > -    int pos = 0, i, r = 0;
> > -    uint32_t msg_addr, msg_upper_addr, msg_data, msg_ctrl;
> > -    struct kvm_assigned_msix_nr msix_nr;
> > -    struct kvm_assigned_msix_entry msix_entry;
> > -    void *va = adev->msix_table_page;
> > +    int pos, entries_max_nr;
> > +    PCIDevice *pci_dev = &adev->dev;
> > 
> >      if (adev->cap.available & ASSIGNED_DEVICE_CAP_MSI)
> >      
> >          pos = pci_dev->cap.start + PCI_CAPABILITY_CONFIG_MSI_LENGTH;
> > 
> > @@ -1125,6 +1120,17 @@ static int assigned_dev_update_msix_mmio(PCIDevice
> > *pci_dev)
> > 
> >      entries_max_nr &= PCI_MSIX_TABSIZE;
> >      entries_max_nr += 1;
> > 
> > +    return entries_max_nr;
> > +}
> > +
> > +static int get_msix_valid_entries_nr(AssignedDevice *adev,
> > +				     uint16_t entries_max_nr)
> > +{
> > +    void *va = adev->msix_table_page;
> > +    uint32_t msg_data, msg_ctrl;
> > +    uint16_t entries_nr = 0;
> > +    int i;
> > +
> > 
> >      /* Get the usable entry number for allocating */
> >      for (i = 0; i < entries_max_nr; i++) {
> >      
> >          memcpy(&msg_ctrl, va + i * 16 + 12, 4);
> > 
> > @@ -1134,11 +1140,20 @@ static int
> > assigned_dev_update_msix_mmio(PCIDevice *pci_dev)
> > 
> >              continue;
> >          
> >          entries_nr ++;
> >      
> >      }
> > 
> > +    return entries_nr;
> > +}
> > +
> > +static int assigned_dev_update_msix_mmio(PCIDevice *pci_dev,
> > +                                         uint16_t entries_nr,
> > +                                         uint16_t entries_max_nr)
> > +{
> > +    AssignedDevice *adev = container_of(pci_dev, AssignedDevice, dev);
> > +    int i, r = 0;
> > +    uint32_t msg_addr, msg_upper_addr, msg_data, msg_ctrl;
> > +    struct kvm_assigned_msix_nr msix_nr;
> > +    struct kvm_assigned_msix_entry msix_entry;
> > +    void *va = adev->msix_table_page;
> > 
> > -    if (entries_nr == 0) {
> > -        fprintf(stderr, "MSI-X entry number is zero!\n");
> > -        return -EINVAL;
> > -    }
> > 
> >      msix_nr.assigned_dev_id = calc_assigned_dev_id(adev->h_segnr,
> >      adev->h_busnr,
> >      
> >                                            (uint8_t)adev->h_devfn);
> >      
> >      msix_nr.entry_nr = entries_nr;
> > 
> > @@ -1180,8 +1195,8 @@ static int assigned_dev_update_msix_mmio(PCIDevice
> > *pci_dev)
> > 
> >          adev->entry[entries_nr].u.msi.address_lo = msg_addr;
> >          adev->entry[entries_nr].u.msi.address_hi = msg_upper_addr;
> >          adev->entry[entries_nr].u.msi.data = msg_data;
> > 
> > -        DEBUG("MSI-X data 0x%x, MSI-X addr_lo 0x%x\n!", msg_data,
> > msg_addr); -	kvm_add_routing_entry(&adev->entry[entries_nr]);
> > +        DEBUG("MSI-X data 0x%x, MSI-X addr_lo 0x%x!\n", msg_data,
> > msg_addr); +        kvm_add_routing_entry(&adev->entry[entries_nr]);
> > 
> >          msix_entry.gsi = adev->entry[entries_nr].gsi;
> >          msix_entry.entry = i;
> > 
> > @@ -1190,7 +1205,7 @@ static int assigned_dev_update_msix_mmio(PCIDevice
> > *pci_dev)
> > 
> >              fprintf(stderr, "fail to set MSI-X entry! %s\n",
> >              strerror(-r)); break;
> >          
> >          }
> > 
> > -        DEBUG("MSI-X entry gsi 0x%x, entry %d\n!",
> > +        DEBUG("MSI-X entry gsi 0x%x, entry %d!\n",
> > 
> >                  msix_entry.gsi, msix_entry.entry);
> >          
> >          entries_nr ++;
> >      
> >      }
> > 
> > @@ -1203,12 +1218,12 @@ static int
> > assigned_dev_update_msix_mmio(PCIDevice *pci_dev)
> > 
> >      return r;
> >  
> >  }
> > 
> > -static void assigned_dev_update_msix(PCIDevice *pci_dev, unsigned int
> > ctrl_pos) +static void assigned_dev_update_msix(PCIDevice *pci_dev, int
> > enable_msix)
> > 
> >  {
> >  
> >      struct kvm_assigned_irq assigned_irq_data;
> >      AssignedDevice *assigned_dev = container_of(pci_dev, AssignedDevice,
> >      dev);
> > 
> > -    uint16_t *ctrl_word = (uint16_t *)(pci_dev->config + ctrl_pos);
> > 
> >      int r;
> > 
> > +    uint16_t entries_nr, entries_max_nr;
> > 
> >      memset(&assigned_irq_data, 0, sizeof assigned_irq_data);
> >      assigned_irq_data.assigned_dev_id  =
> > 
> > @@ -1219,8 +1234,7 @@ static void assigned_dev_update_msix(PCIDevice
> > *pci_dev, unsigned int ctrl_pos)
> > 
> >       * try to catch this by only deassigning irqs if the guest is using
> >       * MSIX or intends to start. */
> >      
> >      if ((assigned_dev->irq_requested_type & KVM_DEV_IRQ_GUEST_MSIX) ||
> > 
> > -        (*ctrl_word & PCI_MSIX_ENABLE)) {
> > -
> > +            enable_msix) {
> > 
> >          assigned_irq_data.flags = assigned_dev->irq_requested_type;
> >          free_dev_irq_entries(assigned_dev);
> >          r = kvm_deassign_irq(kvm_context, &assigned_irq_data);
> > 
> > @@ -1231,14 +1245,25 @@ static void assigned_dev_update_msix(PCIDevice
> > *pci_dev, unsigned int ctrl_pos)
> > 
> >          assigned_dev->irq_requested_type = 0;
> >      
> >      }
> > 
> > -    if (*ctrl_word & PCI_MSIX_ENABLE) {
> > -        assigned_irq_data.flags = KVM_DEV_IRQ_HOST_MSIX |
> > -                                  KVM_DEV_IRQ_GUEST_MSIX;
> > -
> > -        if (assigned_dev_update_msix_mmio(pci_dev) < 0) {
> > +    entries_max_nr = get_msix_entries_max_nr(assigned_dev);
> > +    if (entries_max_nr == 0) {
> > +        fprintf(stderr, "assigned_dev_update_msix: MSI-X entries_max_nr
> > == 0"); +        return;
> > +    }
> > +    entries_nr = get_msix_valid_entries_nr(assigned_dev,
> > entries_max_nr); +    if (entries_nr == 0) {
> > +        if (enable_msix)
> > +            fprintf(stderr, "MSI-X entry number is zero!\n");
> > +        return;
> > +    }
> > +    if (enable_msix) {
> > +        if (assigned_dev_update_msix_mmio(pci_dev,
> > +                    entries_nr, entries_max_nr) < 0) {
> > 
> >              perror("assigned_dev_update_msix_mmio");
> >              return;
> >          
> >          }
> > 
> > +        assigned_irq_data.flags = KVM_DEV_IRQ_HOST_MSIX |
> > +                                  KVM_DEV_IRQ_GUEST_MSIX;
> > 
> >          if (kvm_assign_irq(kvm_context, &assigned_irq_data) < 0) {
> >          
> >              perror("assigned_dev_enable_msix: assign irq");
> >              return;
> > 
> > @@ -1254,6 +1279,7 @@ static void
> > assigned_device_pci_cap_write_config(PCIDevice *pci_dev, uint32_t ad
> > 
> >  {
> >  
> >      AssignedDevice *assigned_dev = container_of(pci_dev, AssignedDevice,
> >      dev); unsigned int pos = pci_dev->cap.start, ctrl_pos;
> > 
> > +    uint16_t *ctrl_word;
> > 
> >      pci_default_cap_write_config(pci_dev, address, val, len);
> >  
> >  #ifdef KVM_CAP_IRQ_ROUTING
> > 
> > @@ -1270,7 +1296,8 @@ static void
> > assigned_device_pci_cap_write_config(PCIDevice *pci_dev, uint32_t ad
> > 
> >          ctrl_pos = pos + 3;
> >          if (address <= ctrl_pos && address + len > ctrl_pos) {
> >          
> >              ctrl_pos--; /* control is word long */
> > 
> > -            assigned_dev_update_msix(pci_dev, ctrl_pos);
> > +            ctrl_word = (uint16_t *)(pci_dev->config + ctrl_pos);
> > +            assigned_dev_update_msix(pci_dev, (*ctrl_word &
> > PCI_MSIX_ENABLE));
> > 
> >  	}
> >  	
> >          pos += PCI_CAPABILITY_CONFIG_MSIX_LENGTH;
> >      
> >      }

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 3/3] qemu-kvm: device assignment: emulate MSI-X mask bits
  2010-11-04 10:04   ` Michael S. Tsirkin
@ 2010-11-05  3:14     ` Sheng Yang
  2010-11-05  8:59       ` Michael S. Tsirkin
  0 siblings, 1 reply; 20+ messages in thread
From: Sheng Yang @ 2010-11-05  3:14 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Avi Kivity, Marcelo Tosatti, kvm

On Thursday 04 November 2010 18:04:08 Michael S. Tsirkin wrote:
> On Thu, Nov 04, 2010 at 02:18:21PM +0800, Sheng Yang wrote:
> > This patch emulated MSI-X per vector mask bit on assigned device.
> > 
> > Signed-off-by: Sheng Yang <sheng@linux.intel.com>
> 
> Also pls update the in-tree header for the new ioctls.

I thought this should be done by maintainer? 

I just use one script to sync them everytime. I don't mind to send the diff out.

--
regards
Yang, Sheng

> 
> > ---
> > 
> >  hw/device-assignment.c |  161
> >  ++++++++++++++++++++++++++++++++++++++++++++++-- 1 files 
changed, 155
> >  insertions(+), 6 deletions(-)
> > 
> > diff --git a/hw/device-assignment.c b/hw/device-assignment.c
> > index 8a98876..639aa0b 100644
> > --- a/hw/device-assignment.c
> > +++ b/hw/device-assignment.c
> > @@ -62,6 +62,11 @@ static void
> > assigned_dev_load_option_rom(AssignedDevice *dev);
> > 
> >  static void assigned_dev_unregister_msix_mmio(AssignedDevice *dev);
> > 
> > +static uint32_t calc_assigned_dev_id(uint16_t seg, uint8_t bus, uint8_t
> > devfn) +{
> > +    return (uint32_t)seg << 16 | (uint32_t)bus << 8 | (uint32_t)devfn;
> > +}
> > +
> > 
> >  static uint32_t assigned_dev_ioport_rw(AssignedDevRegion *dev_region,
> >  
> >                                         uint32_t addr, int len, uint32_t
> >                                         *val)
> >  
> >  {
> > 
> > @@ -264,6 +269,9 @@ static void assigned_dev_iomem_map(PCIDevice
> > *pci_dev, int region_num,
> > 
> >      AssignedDevRegion *region = &r_dev->v_addrs[region_num];
> >      PCIRegion *real_region = &r_dev->real_device.regions[region_num];
> >      int ret = 0;
> > 
> > +#ifdef KVM_CAP_DEVICE_MSIX_MASK
> > +    struct kvm_assigned_msix_mmio msix_mmio;
> > +#endif
> > 
> >      DEBUG("e_phys=%08" FMT_PCIBUS " r_virt=%p type=%d len=%08"
> >      FMT_PCIBUS " region_num=%d \n",
> >      
> >            e_phys, region->u.r_virtbase, type, e_size, region_num);
> > 
> > @@ -282,6 +290,16 @@ static void assigned_dev_iomem_map(PCIDevice
> > *pci_dev, int region_num,
> > 
> >              cpu_register_physical_memory(e_phys + offset,
> >              
> >                      TARGET_PAGE_SIZE, r_dev->mmio_index);
> > 
> > +#ifdef KVM_CAP_DEVICE_MSIX_MASK
> > +	    memset(&msix_mmio, 0, sizeof(struct kvm_assigned_msix_mmio));
> > +	    msix_mmio.assigned_dev_id = calc_assigned_dev_id(r_dev->h_segnr,
> > +			    r_dev->h_busnr, r_dev->h_devfn);
> > +	    msix_mmio.base_addr = e_phys + offset;
> > +            /* We required kernel MSI-X support */
> > +	    ret = kvm_assign_reg_msix_mmio(kvm_context, &msix_mmio);
> > +	    if (ret)
> > +                fprintf(stderr, "fail to register in-kernel
> > msix_mmio!\n"); +#endif
> > 
> >          }
> >      
> >      }
> > 
> > @@ -824,11 +842,6 @@ static void free_assigned_device(AssignedDevice
> > *dev)
> > 
> >      }
> >  
> >  }
> > 
> > -static uint32_t calc_assigned_dev_id(uint16_t seg, uint8_t bus, uint8_t
> > devfn) -{
> > -    return (uint32_t)seg << 16 | (uint32_t)bus << 8 | (uint32_t)devfn;
> > -}
> > -
> > 
> >  static void assign_failed_examine(AssignedDevice *dev)
> >  {
> >  
> >      char name[PATH_MAX], dir[PATH_MAX], driver[PATH_MAX] = {}, *ns;
> > 
> > @@ -1123,6 +1136,27 @@ static int get_msix_entries_max_nr(AssignedDevice
> > *adev)
> > 
> >      return entries_max_nr;
> >  
> >  }
> > 
> > +#ifdef KVM_CAP_DEVICE_MSIX_MASK
> > +static int assigned_dev_msix_entry_masked(AssignedDevice *adev, int
> > entry) +{
> > +    struct kvm_assigned_msix_entry msix_entry;
> > +    int r;
> > +
> > +    memset(&msix_entry, 0, sizeof msix_entry);
> > +    msix_entry.assigned_dev_id = calc_assigned_dev_id(adev->h_segnr,
> > +            adev->h_busnr, (uint8_t)adev->h_devfn);
> > +    msix_entry.entry = entry;
> > +    msix_entry.flags = KVM_MSIX_FLAG_QUERY_MASK;
> > +    r = kvm_assign_get_msix_entry(kvm_context, &msix_entry);
> > +    if (r) {
> > +        fprintf(stderr, "assigned_dev_msix_entry_masked: "
> > +			"Fail to get mask bit of entry %d\n", entry);
> > +        return 1;
> > +    }
> > +    return (msix_entry.flags & KVM_MSIX_FLAG_MASK);
> > +}
> > +#endif
> > +
> > 
> >  static int get_msix_valid_entries_nr(AssignedDevice *adev,
> >  
> >  				     uint16_t entries_max_nr)
> >  
> >  {
> > 
> > @@ -1136,7 +1170,11 @@ static int
> > get_msix_valid_entries_nr(AssignedDevice *adev,
> > 
> >          memcpy(&msg_ctrl, va + i * 16 + 12, 4);
> >          memcpy(&msg_data, va + i * 16 + 8, 4);
> >          /* Ignore unused entry even it's unmasked */
> > 
> > +#ifdef KVM_CAP_DEVICE_MSIX_MASK
> > +        if (assigned_dev_msix_entry_masked(adev, i))
> > +#else
> > 
> >          if (msg_data == 0)
> > 
> > +#endif
> > 
> >              continue;
> >          
> >          entries_nr ++;
> >      
> >      }
> > 
> > @@ -1165,6 +1203,8 @@ static int assigned_dev_update_msix_mmio(PCIDevice
> > *pci_dev,
> > 
> >      }
> >      
> >      free_dev_irq_entries(adev);
> > 
> > +    memset(pci_dev->msix_entry_used, 0, KVM_MAX_MSIX_PER_DEV *
> > +                                       
> > sizeof(*pci_dev->msix_entry_used));
> > 
> >      adev->irq_entries_nr = entries_nr;
> >      adev->entry = calloc(entries_nr, sizeof(struct
> >      kvm_irq_routing_entry)); if (!adev->entry) {
> > 
> > @@ -1179,7 +1219,11 @@ static int assigned_dev_update_msix_mmio(PCIDevice
> > *pci_dev,
> > 
> >              break;
> >          
> >          memcpy(&msg_ctrl, va + i * 16 + 12, 4);
> >          memcpy(&msg_data, va + i * 16 + 8, 4);
> > 
> > +#ifdef KVM_CAP_DEVICE_MSIX_MASK
> > +        if (assigned_dev_msix_entry_masked(adev, i))
> > +#else
> > 
> >          if (msg_data == 0)
> > 
> > +#endif
> > 
> >              continue;
> >          
> >          memcpy(&msg_addr, va + i * 16, 4);
> > 
> > @@ -1200,6 +1244,7 @@ static int assigned_dev_update_msix_mmio(PCIDevice
> > *pci_dev,
> > 
> >          msix_entry.gsi = adev->entry[entries_nr].gsi;
> >          msix_entry.entry = i;
> > 
> > +        pci_dev->msix_entry_used[i] = 1;
> > 
> >          r = kvm_assign_set_msix_entry(kvm_context, &msix_entry);
> >          if (r) {
> >          
> >              fprintf(stderr, "fail to set MSI-X entry! %s\n",
> >              strerror(-r));
> > 
> > @@ -1243,6 +1288,8 @@ static void assigned_dev_update_msix(PCIDevice
> > *pci_dev, int enable_msix)
> > 
> >              perror("assigned_dev_update_msix: deassign irq");
> >          
> >          assigned_dev->irq_requested_type = 0;
> > 
> > +        memset(pci_dev->msix_entry_used, 0, KVM_MAX_MSIX_PER_DEV *
> > +                                       
> > sizeof(*pci_dev->msix_entry_used));
> > 
> >      }
> >      
> >      entries_max_nr = get_msix_entries_max_nr(assigned_dev);
> > 
> > @@ -1250,10 +1297,16 @@ static void assigned_dev_update_msix(PCIDevice
> > *pci_dev, int enable_msix)
> > 
> >          fprintf(stderr, "assigned_dev_update_msix: MSI-X entries_max_nr
> >          == 0"); return;
> >      
> >      }
> > 
> > +    /*
> > +     * Guest may try to enable MSI-X before setting MSI-X entry done, so
> > +     * let's wait until guest unmask the entries.
> > +     */
> > 
> >      entries_nr = get_msix_valid_entries_nr(assigned_dev,
> >      entries_max_nr); if (entries_nr == 0) {
> > 
> > +#ifndef KVM_CAP_DEVICE_MSIX_MASK
> > 
> >          if (enable_msix)
> >          
> >              fprintf(stderr, "MSI-X entry number is zero!\n");
> > 
> > +#endif
> > 
> >          return;
> >      
> >      }
> >      if (enable_msix) {
> > 
> > @@ -1297,7 +1350,8 @@ static void
> > assigned_device_pci_cap_write_config(PCIDevice *pci_dev, uint32_t ad
> > 
> >          if (address <= ctrl_pos && address + len > ctrl_pos) {
> >          
> >              ctrl_pos--; /* control is word long */
> >              ctrl_word = (uint16_t *)(pci_dev->config + ctrl_pos);
> > 
> > -            assigned_dev_update_msix(pci_dev, (*ctrl_word &
> > PCI_MSIX_ENABLE)); +            assigned_dev_update_msix(pci_dev,
> > +                    (*ctrl_word & PCI_MSIX_ENABLE) && !(*ctrl_word &
> > PCI_MSIX_MASK));
> > 
> >  	}
> >  	
> >          pos += PCI_CAPABILITY_CONFIG_MSIX_LENGTH;
> >      
> >      }
> > 
> > @@ -1395,10 +1449,101 @@ static void msix_mmio_writel(void *opaque,
> > 
> >      AssignedDevice *adev = opaque;
> >      unsigned int offset = addr & 0xfff;
> >      void *page = adev->msix_table_page;
> > 
> > +#ifdef KVM_CAP_DEVICE_MSIX_MASK
> > +    int pos, ctrl_word, index;
> > +    struct kvm_irq_routing_entry new_entry = {};
> > +    int entry_idx, entries_max_nr, r = 0, i;
> > +    uint32_t msg_ctrl, msg_data, msg_upper_addr, msg_addr;
> > +#endif
> > 
> >      DEBUG("write to MSI-X entry table mmio offset 0x%lx, val 0x%x\n",
> >      
> >  		    addr, val);
> >      
> >      memcpy((void *)((char *)page + offset), &val, 4);
> > 
> > +
> > +#ifdef KVM_CAP_DEVICE_MSIX_MASK
> > +    index = offset / 16;
> > +
> > +    /* Check if mask bit is being accessed */
> > +    memcpy(&msg_addr, (char *)page + index * 16, 4);
> > +    memcpy(&msg_upper_addr, (char *)page + index * 16 + 4, 4);
> > +    memcpy(&msg_data, (char *)page + index * 16 + 8, 4);
> > +    memcpy(&msg_ctrl, (char *)page + index * 16 + 12, 4);
> > +    DEBUG("MSI-X entries index %d: "
> > +            "msg_addr 0x%x, msg_upper_addr 0x%x, msg_data 0x%x, vec_ctl
> > 0x%x\n", +            index, msg_addr, msg_upper_addr, msg_data,
> > msg_ctrl); +
> > +    if (adev->cap.available & ASSIGNED_DEVICE_CAP_MSI)
> > +        pos = adev->dev.cap.start + PCI_CAPABILITY_CONFIG_MSI_LENGTH;
> > +    else
> > +        pos = adev->dev.cap.start;
> > +
> > +    ctrl_word = *(uint16_t *)(adev->dev.config + pos + 2);
> > +
> > +    if (!((ctrl_word & PCI_MSIX_ENABLE) && !(ctrl_word &
> > PCI_MSIX_MASK))) +        return;
> > +
> > +    if (!assigned_dev_msix_entry_masked(adev, index)) {
> > +        if (!adev->dev.msix_entry_used[index]) {
> > +            DEBUG("Try to modify unenabled MSI-X entry %d's mask. "
> > +                    "Reenable MSI-X.\n",
> > +                    index);
> > +            assigned_dev_update_msix(&adev->dev, 1);
> > +        }
> > +        return;
> > +    }
> > +
> > +    if (!adev->dev.msix_entry_used[index])
> > +        return;
> > +
> > +    /*
> > +     * We're here only because guest want to modify MSI data/addr, and
> > +     * kernel would filter those writing with mask bit unset.
> > +     */
> > +    entries_max_nr = get_msix_entries_max_nr(adev);
> > +
> > +    /*
> > +     * Find the index of routing entry, it can be different from 'index'
> > if +     * empty entry existed in between
> > +     */
> > +    entry_idx = -1;
> > +    for (i = 0; i <= index; i++) {
> > +        if (adev->dev.msix_entry_used[i])
> > +            entry_idx ++;
> > +    }
> > +    if (entry_idx >= entries_max_nr || entry_idx == -1) {
> > +        fprintf(stderr, "msix_mmio_writel: Entry idx %d exceed
> > limit!\n", +			entry_idx);
> > +        return;
> > +    }
> > +
> > +    if (!assigned_dev_msix_entry_masked(adev, index)) {
> > +        fprintf(stderr, "msix_mmio_writel: Trying write to unmasked
> > entry!\n"); +        return;
> > +    }
> > +
> > +    new_entry.gsi = adev->entry[entry_idx].gsi;
> > +    new_entry.type = KVM_IRQ_ROUTING_MSI;
> > +    new_entry.flags = 0;
> > +    new_entry.u.msi.address_lo = msg_addr;
> > +    new_entry.u.msi.address_hi = msg_upper_addr;
> > +    new_entry.u.msi.data = msg_data;
> > +    if (memcmp(&adev->entry[entry_idx].u.msi, &new_entry.u.msi,
> > +                sizeof new_entry.u.msi)) {
> > +        r = kvm_update_routing_entry(&adev->entry[entry_idx],
> > &new_entry); +        if (r) {
> > +            perror("msix_mmio_writel: kvm_update_routing_entry
> > failed\n"); +            return;
> > +        }
> > +        r = kvm_commit_irq_routes();
> > +        if (r) {
> > +            perror("msix_mmio_writel: kvm_commit_irq_routes failed\n");
> > +            return;
> > +        }
> > +    }
> > +    adev->entry[entry_idx].u.msi.address_lo = msg_addr;
> > +    adev->entry[entry_idx].u.msi.address_hi = msg_upper_addr;
> > +    adev->entry[entry_idx].u.msi.data = msg_data;
> > +#endif
> > 
> >  }
> >  
> >  static void msix_mmio_writew(void *opaque,
> > 
> > @@ -1436,6 +1581,8 @@ static int
> > assigned_dev_register_msix_mmio(AssignedDevice *dev)
> > 
> >      memset(dev->msix_table_page, 0, 0x1000);
> >      dev->mmio_index = cpu_register_io_memory(
> >      
> >                          msix_mmio_read, msix_mmio_write, dev);
> > 
> > +    dev->dev.msix_entry_used = qemu_mallocz(KVM_MAX_MSIX_PER_DEV *
> > +                                        sizeof
> > *dev->dev.msix_entry_used);
> > 
> >      return 0;
> >  
> >  }
> > 
> > @@ -1452,6 +1599,8 @@ static void
> > assigned_dev_unregister_msix_mmio(AssignedDevice *dev)
> > 
> >                  strerror(errno));
> >      
> >      }
> >      dev->msix_table_page = NULL;
> > 
> > +    free(dev->dev.msix_entry_used);
> > +    dev->dev.msix_entry_used = NULL;
> > 
> >  }
> >  
> >  static int assigned_initfn(struct PCIDevice *pci_dev)

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 3/3] qemu-kvm: device assignment: emulate MSI-X mask bits
  2010-11-04  9:44   ` Michael S. Tsirkin
@ 2010-11-05  3:20     ` Sheng Yang
  2010-11-05  9:05       ` Michael S. Tsirkin
  0 siblings, 1 reply; 20+ messages in thread
From: Sheng Yang @ 2010-11-05  3:20 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Avi Kivity, Marcelo Tosatti, kvm

On Thursday 04 November 2010 17:44:27 Michael S. Tsirkin wrote:

Thanks very much for reviewing this! Seems nobody cares about userspace one 
before...

> On Thu, Nov 04, 2010 at 02:18:21PM +0800, Sheng Yang wrote:
> > This patch emulated MSI-X per vector mask bit on assigned device.
> > 
> > Signed-off-by: Sheng Yang <sheng@linux.intel.com>
> > ---
> > 
> >  hw/device-assignment.c |  161
> >  ++++++++++++++++++++++++++++++++++++++++++++++-- 1 files 
changed, 155
> >  insertions(+), 6 deletions(-)
> > 
> > diff --git a/hw/device-assignment.c b/hw/device-assignment.c
> > index 8a98876..639aa0b 100644
> > --- a/hw/device-assignment.c
> > +++ b/hw/device-assignment.c
> > @@ -62,6 +62,11 @@ static void
> > assigned_dev_load_option_rom(AssignedDevice *dev);
> > 
> >  static void assigned_dev_unregister_msix_mmio(AssignedDevice *dev);
> > 
> > +static uint32_t calc_assigned_dev_id(uint16_t seg, uint8_t bus, uint8_t
> > devfn) +{
> > +    return (uint32_t)seg << 16 | (uint32_t)bus << 8 | (uint32_t)devfn;
> > +}
> > +
> > 
> >  static uint32_t assigned_dev_ioport_rw(AssignedDevRegion *dev_region,
> >  
> >                                         uint32_t addr, int len, uint32_t
> >                                         *val)
> >  
> >  {
> > 
> > @@ -264,6 +269,9 @@ static void assigned_dev_iomem_map(PCIDevice
> > *pci_dev, int region_num,
> > 
> >      AssignedDevRegion *region = &r_dev->v_addrs[region_num];
> >      PCIRegion *real_region = &r_dev->real_device.regions[region_num];
> >      int ret = 0;
> > 
> > +#ifdef KVM_CAP_DEVICE_MSIX_MASK
> > +    struct kvm_assigned_msix_mmio msix_mmio;
> > +#endif
> > 
> >      DEBUG("e_phys=%08" FMT_PCIBUS " r_virt=%p type=%d len=%08"
> >      FMT_PCIBUS " region_num=%d \n",
> >      
> >            e_phys, region->u.r_virtbase, type, e_size, region_num);
> > 
> > @@ -282,6 +290,16 @@ static void assigned_dev_iomem_map(PCIDevice
> > *pci_dev, int region_num,
> > 
> >              cpu_register_physical_memory(e_phys + offset,
> >              
> >                      TARGET_PAGE_SIZE, r_dev->mmio_index);
> > 
> > +#ifdef KVM_CAP_DEVICE_MSIX_MASK
> > +	    memset(&msix_mmio, 0, sizeof(struct kvm_assigned_msix_mmio));
> > +	    msix_mmio.assigned_dev_id = calc_assigned_dev_id(r_dev->h_segnr,
> > +			    r_dev->h_busnr, r_dev->h_devfn);
> > +	    msix_mmio.base_addr = e_phys + offset;
> > +            /* We required kernel MSI-X support */
> > +	    ret = kvm_assign_reg_msix_mmio(kvm_context, &msix_mmio);
> > +	    if (ret)
> > +                fprintf(stderr, "fail to register in-kernel
> > msix_mmio!\n"); +#endif
> > 
> >          }
> >      
> >      }
> > 
> > @@ -824,11 +842,6 @@ static void free_assigned_device(AssignedDevice
> > *dev)
> > 
> >      }
> >  
> >  }
> > 
> > -static uint32_t calc_assigned_dev_id(uint16_t seg, uint8_t bus, uint8_t
> > devfn) -{
> > -    return (uint32_t)seg << 16 | (uint32_t)bus << 8 | (uint32_t)devfn;
> > -}
> > -
> > 
> >  static void assign_failed_examine(AssignedDevice *dev)
> >  {
> >  
> >      char name[PATH_MAX], dir[PATH_MAX], driver[PATH_MAX] = {}, *ns;
> > 
> > @@ -1123,6 +1136,27 @@ static int get_msix_entries_max_nr(AssignedDevice
> > *adev)
> > 
> >      return entries_max_nr;
> >  
> >  }
> > 
> > +#ifdef KVM_CAP_DEVICE_MSIX_MASK
> > +static int assigned_dev_msix_entry_masked(AssignedDevice *adev, int
> > entry) +{
> > +    struct kvm_assigned_msix_entry msix_entry;
> > +    int r;
> > +
> > +    memset(&msix_entry, 0, sizeof msix_entry);
> > +    msix_entry.assigned_dev_id = calc_assigned_dev_id(adev->h_segnr,
> > +            adev->h_busnr, (uint8_t)adev->h_devfn);
> > +    msix_entry.entry = entry;
> > +    msix_entry.flags = KVM_MSIX_FLAG_QUERY_MASK;
> > +    r = kvm_assign_get_msix_entry(kvm_context, &msix_entry);
> > +    if (r) {
> > +        fprintf(stderr, "assigned_dev_msix_entry_masked: "
> > +			"Fail to get mask bit of entry %d\n", entry);
> > +        return 1;
> 
> This error handling seems pretty useless. assert?

Maybe we can continue with it. Assert seems a little strict.
> 
> > +    }
> > +    return (msix_entry.flags & KVM_MSIX_FLAG_MASK);
> > +}
> > +#endif
> > +
> > 
> >  static int get_msix_valid_entries_nr(AssignedDevice *adev,
> >  
> >  				     uint16_t entries_max_nr)
> >  
> >  {
> > 
> > @@ -1136,7 +1170,11 @@ static int
> > get_msix_valid_entries_nr(AssignedDevice *adev,
> > 
> >          memcpy(&msg_ctrl, va + i * 16 + 12, 4);
> >          memcpy(&msg_data, va + i * 16 + 8, 4);
> >          /* Ignore unused entry even it's unmasked */
> > 
> > +#ifdef KVM_CAP_DEVICE_MSIX_MASK
> > +        if (assigned_dev_msix_entry_masked(adev, i))
> > +#else
> > 
> >          if (msg_data == 0)
> > 
> > +#endif
> 
> So, we are replacing msg_data == 0 check with masked check?
> If yes why not do this for non-KVM_CAP_DEVICE_MSIX_MASK too?

Because the old one doesn't have idea about mask bit... 
> 
> >              continue;
> >          
> >          entries_nr ++;
> >      
> >      }
> > 
> > @@ -1165,6 +1203,8 @@ static int assigned_dev_update_msix_mmio(PCIDevice
> > *pci_dev,
> > 
> >      }
> >      
> >      free_dev_irq_entries(adev);
> > 
> > +    memset(pci_dev->msix_entry_used, 0, KVM_MAX_MSIX_PER_DEV *
> > +                                       
> > sizeof(*pci_dev->msix_entry_used));
> > 
> >      adev->irq_entries_nr = entries_nr;
> >      adev->entry = calloc(entries_nr, sizeof(struct
> >      kvm_irq_routing_entry)); if (!adev->entry) {
> > 
> > @@ -1179,7 +1219,11 @@ static int assigned_dev_update_msix_mmio(PCIDevice
> > *pci_dev,
> > 
> >              break;
> >          
> >          memcpy(&msg_ctrl, va + i * 16 + 12, 4);
> >          memcpy(&msg_data, va + i * 16 + 8, 4);
> > 
> > +#ifdef KVM_CAP_DEVICE_MSIX_MASK
> > +        if (assigned_dev_msix_entry_masked(adev, i))
> > +#else
> > 
> >          if (msg_data == 0)
> > 
> > +#endif
> 
> You can't use ifdef to check that kernel supports an ioctl.
> You must check this at runtime.

Maybe we can convert them in bulk later.
> 
> >              continue;
> >          
> >          memcpy(&msg_addr, va + i * 16, 4);
> > 
> > @@ -1200,6 +1244,7 @@ static int assigned_dev_update_msix_mmio(PCIDevice
> > *pci_dev,
> > 
> >          msix_entry.gsi = adev->entry[entries_nr].gsi;
> >          msix_entry.entry = i;
> > 
> > +        pci_dev->msix_entry_used[i] = 1;
> > 
> >          r = kvm_assign_set_msix_entry(kvm_context, &msix_entry);
> >          if (r) {
> >          
> >              fprintf(stderr, "fail to set MSI-X entry! %s\n",
> >              strerror(-r));
> > 
> > @@ -1243,6 +1288,8 @@ static void assigned_dev_update_msix(PCIDevice
> > *pci_dev, int enable_msix)
> > 
> >              perror("assigned_dev_update_msix: deassign irq");
> >          
> >          assigned_dev->irq_requested_type = 0;
> > 
> > +        memset(pci_dev->msix_entry_used, 0, KVM_MAX_MSIX_PER_DEV *
> > +                                       
> > sizeof(*pci_dev->msix_entry_used));
> > 
> >      }
> >      
> >      entries_max_nr = get_msix_entries_max_nr(assigned_dev);
> > 
> > @@ -1250,10 +1297,16 @@ static void assigned_dev_update_msix(PCIDevice
> > *pci_dev, int enable_msix)
> > 
> >          fprintf(stderr, "assigned_dev_update_msix: MSI-X entries_max_nr
> >          == 0"); return;
> >      
> >      }
> > 
> > +    /*
> > +     * Guest may try to enable MSI-X before setting MSI-X entry done, so
> > +     * let's wait until guest unmask the entries.
> > +     */
> 
> Well it can also set up any number of entries, enable msix then
> set up more entries. Now what?

It's the same. If it want to set up more entries, it have to unmask them. Then we 
would get them.
> 
> >      entries_nr = get_msix_valid_entries_nr(assigned_dev,
> >      entries_max_nr); if (entries_nr == 0) {
> > 
> > +#ifndef KVM_CAP_DEVICE_MSIX_MASK
> > 
> >          if (enable_msix)
> >          
> >              fprintf(stderr, "MSI-X entry number is zero!\n");
> 
> And what happens then?

MSI-X can't work for old ones without MSIX mask support. Reload the guest module 
may help. 

--
regards
Yang, Sheng

> 
> > +#endif
> > 
> >          return;
> >      
> >      }
> >      if (enable_msix) {
> > 
> > @@ -1297,7 +1350,8 @@ static void
> > assigned_device_pci_cap_write_config(PCIDevice *pci_dev, uint32_t ad
> > 
> >          if (address <= ctrl_pos && address + len > ctrl_pos) {
> >          
> >              ctrl_pos--; /* control is word long */
> >              ctrl_word = (uint16_t *)(pci_dev->config + ctrl_pos);
> > 
> > -            assigned_dev_update_msix(pci_dev, (*ctrl_word &
> > PCI_MSIX_ENABLE)); +            assigned_dev_update_msix(pci_dev,
> > +                    (*ctrl_word & PCI_MSIX_ENABLE) && !(*ctrl_word &
> > PCI_MSIX_MASK));
> > 
> >  	}
> >  	
> >          pos += PCI_CAPABILITY_CONFIG_MSIX_LENGTH;
> >      
> >      }
> > 
> > @@ -1395,10 +1449,101 @@ static void msix_mmio_writel(void *opaque,
> > 
> >      AssignedDevice *adev = opaque;
> >      unsigned int offset = addr & 0xfff;
> >      void *page = adev->msix_table_page;
> > 
> > +#ifdef KVM_CAP_DEVICE_MSIX_MASK
> > +    int pos, ctrl_word, index;
> > +    struct kvm_irq_routing_entry new_entry = {};
> > +    int entry_idx, entries_max_nr, r = 0, i;
> > +    uint32_t msg_ctrl, msg_data, msg_upper_addr, msg_addr;
> > +#endif
> > 
> >      DEBUG("write to MSI-X entry table mmio offset 0x%lx, val 0x%x\n",
> >      
> >  		    addr, val);
> >      
> >      memcpy((void *)((char *)page + offset), &val, 4);
> > 
> > +
> > +#ifdef KVM_CAP_DEVICE_MSIX_MASK
> > +    index = offset / 16;
> > +
> > +    /* Check if mask bit is being accessed */
> > +    memcpy(&msg_addr, (char *)page + index * 16, 4);
> > +    memcpy(&msg_upper_addr, (char *)page + index * 16 + 4, 4);
> > +    memcpy(&msg_data, (char *)page + index * 16 + 8, 4);
> > +    memcpy(&msg_ctrl, (char *)page + index * 16 + 12, 4);
> > +    DEBUG("MSI-X entries index %d: "
> > +            "msg_addr 0x%x, msg_upper_addr 0x%x, msg_data 0x%x, vec_ctl
> > 0x%x\n", +            index, msg_addr, msg_upper_addr, msg_data,
> > msg_ctrl); +
> > +    if (adev->cap.available & ASSIGNED_DEVICE_CAP_MSI)
> > +        pos = adev->dev.cap.start + PCI_CAPABILITY_CONFIG_MSI_LENGTH;
> > +    else
> > +        pos = adev->dev.cap.start;
> > +
> > +    ctrl_word = *(uint16_t *)(adev->dev.config + pos + 2);
> > +
> > +    if (!((ctrl_word & PCI_MSIX_ENABLE) && !(ctrl_word &
> > PCI_MSIX_MASK))) +        return;
> > +
> > +    if (!assigned_dev_msix_entry_masked(adev, index)) {
> > +        if (!adev->dev.msix_entry_used[index]) {
> > +            DEBUG("Try to modify unenabled MSI-X entry %d's mask. "
> > +                    "Reenable MSI-X.\n",
> > +                    index);
> > +            assigned_dev_update_msix(&adev->dev, 1);
> > +        }
> > +        return;
> > +    }
> > +
> > +    if (!adev->dev.msix_entry_used[index])
> > +        return;
> > +
> > +    /*
> > +     * We're here only because guest want to modify MSI data/addr, and
> > +     * kernel would filter those writing with mask bit unset.
> > +     */
> > +    entries_max_nr = get_msix_entries_max_nr(adev);
> > +
> > +    /*
> > +     * Find the index of routing entry, it can be different from 'index'
> > if +     * empty entry existed in between
> > +     */
> > +    entry_idx = -1;
> > +    for (i = 0; i <= index; i++) {
> > +        if (adev->dev.msix_entry_used[i])
> > +            entry_idx ++;
> > +    }
> > +    if (entry_idx >= entries_max_nr || entry_idx == -1) {
> > +        fprintf(stderr, "msix_mmio_writel: Entry idx %d exceed
> > limit!\n", +			entry_idx);
> > +        return;
> > +    }
> > +
> > +    if (!assigned_dev_msix_entry_masked(adev, index)) {
> > +        fprintf(stderr, "msix_mmio_writel: Trying write to unmasked
> > entry!\n"); +        return;
> > +    }
> > +
> > +    new_entry.gsi = adev->entry[entry_idx].gsi;
> > +    new_entry.type = KVM_IRQ_ROUTING_MSI;
> > +    new_entry.flags = 0;
> > +    new_entry.u.msi.address_lo = msg_addr;
> > +    new_entry.u.msi.address_hi = msg_upper_addr;
> > +    new_entry.u.msi.data = msg_data;
> > +    if (memcmp(&adev->entry[entry_idx].u.msi, &new_entry.u.msi,
> > +                sizeof new_entry.u.msi)) {
> > +        r = kvm_update_routing_entry(&adev->entry[entry_idx],
> > &new_entry); +        if (r) {
> > +            perror("msix_mmio_writel: kvm_update_routing_entry
> > failed\n"); +            return;
> > +        }
> > +        r = kvm_commit_irq_routes();
> > +        if (r) {
> > +            perror("msix_mmio_writel: kvm_commit_irq_routes failed\n");
> > +            return;
> > +        }
> > +    }
> > +    adev->entry[entry_idx].u.msi.address_lo = msg_addr;
> > +    adev->entry[entry_idx].u.msi.address_hi = msg_upper_addr;
> > +    adev->entry[entry_idx].u.msi.data = msg_data;
> > +#endif
> > 
> >  }
> >  
> >  static void msix_mmio_writew(void *opaque,
> > 
> > @@ -1436,6 +1581,8 @@ static int
> > assigned_dev_register_msix_mmio(AssignedDevice *dev)
> > 
> >      memset(dev->msix_table_page, 0, 0x1000);
> >      dev->mmio_index = cpu_register_io_memory(
> >      
> >                          msix_mmio_read, msix_mmio_write, dev);
> > 
> > +    dev->dev.msix_entry_used = qemu_mallocz(KVM_MAX_MSIX_PER_DEV *
> > +                                        sizeof
> > *dev->dev.msix_entry_used);
> > 
> >      return 0;
> >  
> >  }
> > 
> > @@ -1452,6 +1599,8 @@ static void
> > assigned_dev_unregister_msix_mmio(AssignedDevice *dev)
> > 
> >                  strerror(errno));
> >      
> >      }
> >      dev->msix_table_page = NULL;
> > 
> > +    free(dev->dev.msix_entry_used);
> > +    dev->dev.msix_entry_used = NULL;
> > 
> >  }
> >  
> >  static int assigned_initfn(struct PCIDevice *pci_dev)

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/3] qemu-kvm: Ioctl for in-kernel mask support
  2010-11-05  2:59     ` Sheng Yang
@ 2010-11-05  8:52       ` Michael S. Tsirkin
  2010-11-11  6:20         ` Sheng Yang
  0 siblings, 1 reply; 20+ messages in thread
From: Michael S. Tsirkin @ 2010-11-05  8:52 UTC (permalink / raw)
  To: Sheng Yang; +Cc: Avi Kivity, Marcelo Tosatti, kvm

On Fri, Nov 05, 2010 at 10:59:23AM +0800, Sheng Yang wrote:
> > We are trying to move away from using ifdefs. Stub these out instead?
> 
> Example?

37859f054986fab6b4094cd5950266ae56a6ca6a and
889e30cc18e21f2091b77267dca8096d7dd34f8b that depends on it.

> --
> regards
> Yang, Sheng

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 3/3] qemu-kvm: device assignment: emulate MSI-X mask bits
  2010-11-05  3:14     ` Sheng Yang
@ 2010-11-05  8:59       ` Michael S. Tsirkin
  0 siblings, 0 replies; 20+ messages in thread
From: Michael S. Tsirkin @ 2010-11-05  8:59 UTC (permalink / raw)
  To: Sheng Yang; +Cc: Avi Kivity, Marcelo Tosatti, kvm

On Fri, Nov 05, 2010 at 11:14:48AM +0800, Sheng Yang wrote:
> On Thursday 04 November 2010 18:04:08 Michael S. Tsirkin wrote:
> > On Thu, Nov 04, 2010 at 02:18:21PM +0800, Sheng Yang wrote:
> > > This patch emulated MSI-X per vector mask bit on assigned device.
> > > 
> > > Signed-off-by: Sheng Yang <sheng@linux.intel.com>
> > 
> > Also pls update the in-tree header for the new ioctls.
> 
> I thought this should be done by maintainer? 

No idea. I do this for my patches, no one complained.

> I just use one script to sync them everytime. I don't mind to send the diff out.

Just make sure there's no extra cruft which scripts sometimes produce ...


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 3/3] qemu-kvm: device assignment: emulate MSI-X mask bits
  2010-11-05  3:20     ` Sheng Yang
@ 2010-11-05  9:05       ` Michael S. Tsirkin
  2010-11-05 11:02         ` Sheng Yang
  0 siblings, 1 reply; 20+ messages in thread
From: Michael S. Tsirkin @ 2010-11-05  9:05 UTC (permalink / raw)
  To: Sheng Yang; +Cc: Avi Kivity, Marcelo Tosatti, kvm

On Fri, Nov 05, 2010 at 11:20:37AM +0800, Sheng Yang wrote:
> On Thursday 04 November 2010 17:44:27 Michael S. Tsirkin wrote:
> 
> Thanks very much for reviewing this! Seems nobody cares about userspace one 
> before...
> 
> > On Thu, Nov 04, 2010 at 02:18:21PM +0800, Sheng Yang wrote:
> > > This patch emulated MSI-X per vector mask bit on assigned device.
> > > 
> > > Signed-off-by: Sheng Yang <sheng@linux.intel.com>
> > > ---
> > > 
> > >  hw/device-assignment.c |  161
> > >  ++++++++++++++++++++++++++++++++++++++++++++++-- 1 files 
> changed, 155
> > >  insertions(+), 6 deletions(-)
> > > 
> > > diff --git a/hw/device-assignment.c b/hw/device-assignment.c
> > > index 8a98876..639aa0b 100644
> > > --- a/hw/device-assignment.c
> > > +++ b/hw/device-assignment.c
> > > @@ -62,6 +62,11 @@ static void
> > > assigned_dev_load_option_rom(AssignedDevice *dev);
> > > 
> > >  static void assigned_dev_unregister_msix_mmio(AssignedDevice *dev);
> > > 
> > > +static uint32_t calc_assigned_dev_id(uint16_t seg, uint8_t bus, uint8_t
> > > devfn) +{
> > > +    return (uint32_t)seg << 16 | (uint32_t)bus << 8 | (uint32_t)devfn;
> > > +}
> > > +
> > > 
> > >  static uint32_t assigned_dev_ioport_rw(AssignedDevRegion *dev_region,
> > >  
> > >                                         uint32_t addr, int len, uint32_t
> > >                                         *val)
> > >  
> > >  {
> > > 
> > > @@ -264,6 +269,9 @@ static void assigned_dev_iomem_map(PCIDevice
> > > *pci_dev, int region_num,
> > > 
> > >      AssignedDevRegion *region = &r_dev->v_addrs[region_num];
> > >      PCIRegion *real_region = &r_dev->real_device.regions[region_num];
> > >      int ret = 0;
> > > 
> > > +#ifdef KVM_CAP_DEVICE_MSIX_MASK
> > > +    struct kvm_assigned_msix_mmio msix_mmio;
> > > +#endif
> > > 
> > >      DEBUG("e_phys=%08" FMT_PCIBUS " r_virt=%p type=%d len=%08"
> > >      FMT_PCIBUS " region_num=%d \n",
> > >      
> > >            e_phys, region->u.r_virtbase, type, e_size, region_num);
> > > 
> > > @@ -282,6 +290,16 @@ static void assigned_dev_iomem_map(PCIDevice
> > > *pci_dev, int region_num,
> > > 
> > >              cpu_register_physical_memory(e_phys + offset,
> > >              
> > >                      TARGET_PAGE_SIZE, r_dev->mmio_index);
> > > 
> > > +#ifdef KVM_CAP_DEVICE_MSIX_MASK
> > > +	    memset(&msix_mmio, 0, sizeof(struct kvm_assigned_msix_mmio));
> > > +	    msix_mmio.assigned_dev_id = calc_assigned_dev_id(r_dev->h_segnr,
> > > +			    r_dev->h_busnr, r_dev->h_devfn);
> > > +	    msix_mmio.base_addr = e_phys + offset;
> > > +            /* We required kernel MSI-X support */
> > > +	    ret = kvm_assign_reg_msix_mmio(kvm_context, &msix_mmio);
> > > +	    if (ret)
> > > +                fprintf(stderr, "fail to register in-kernel
> > > msix_mmio!\n"); +#endif
> > > 
> > >          }
> > >      
> > >      }
> > > 
> > > @@ -824,11 +842,6 @@ static void free_assigned_device(AssignedDevice
> > > *dev)
> > > 
> > >      }
> > >  
> > >  }
> > > 
> > > -static uint32_t calc_assigned_dev_id(uint16_t seg, uint8_t bus, uint8_t
> > > devfn) -{
> > > -    return (uint32_t)seg << 16 | (uint32_t)bus << 8 | (uint32_t)devfn;
> > > -}
> > > -
> > > 
> > >  static void assign_failed_examine(AssignedDevice *dev)
> > >  {
> > >  
> > >      char name[PATH_MAX], dir[PATH_MAX], driver[PATH_MAX] = {}, *ns;
> > > 
> > > @@ -1123,6 +1136,27 @@ static int get_msix_entries_max_nr(AssignedDevice
> > > *adev)
> > > 
> > >      return entries_max_nr;
> > >  
> > >  }
> > > 
> > > +#ifdef KVM_CAP_DEVICE_MSIX_MASK
> > > +static int assigned_dev_msix_entry_masked(AssignedDevice *adev, int
> > > entry) +{
> > > +    struct kvm_assigned_msix_entry msix_entry;
> > > +    int r;
> > > +
> > > +    memset(&msix_entry, 0, sizeof msix_entry);
> > > +    msix_entry.assigned_dev_id = calc_assigned_dev_id(adev->h_segnr,
> > > +            adev->h_busnr, (uint8_t)adev->h_devfn);
> > > +    msix_entry.entry = entry;
> > > +    msix_entry.flags = KVM_MSIX_FLAG_QUERY_MASK;
> > > +    r = kvm_assign_get_msix_entry(kvm_context, &msix_entry);
> > > +    if (r) {
> > > +        fprintf(stderr, "assigned_dev_msix_entry_masked: "
> > > +			"Fail to get mask bit of entry %d\n", entry);
> > > +        return 1;
> > 
> > This error handling seems pretty useless. assert?
> 
> Maybe we can continue with it. Assert seems a little strict.

Well need to consider whether to return 1 or 0 then.

> > 
> > > +    }
> > > +    return (msix_entry.flags & KVM_MSIX_FLAG_MASK);
> > > +}
> > > +#endif
> > > +
> > > 
> > >  static int get_msix_valid_entries_nr(AssignedDevice *adev,
> > >  
> > >  				     uint16_t entries_max_nr)
> > >  
> > >  {
> > > 
> > > @@ -1136,7 +1170,11 @@ static int
> > > get_msix_valid_entries_nr(AssignedDevice *adev,
> > > 
> > >          memcpy(&msg_ctrl, va + i * 16 + 12, 4);
> > >          memcpy(&msg_data, va + i * 16 + 8, 4);
> > >          /* Ignore unused entry even it's unmasked */
> > > 
> > > +#ifdef KVM_CAP_DEVICE_MSIX_MASK
> > > +        if (assigned_dev_msix_entry_masked(adev, i))
> > > +#else
> > > 
> > >          if (msg_data == 0)
> > > 
> > > +#endif
> > 
> > So, we are replacing msg_data == 0 check with masked check?
> > If yes why not do this for non-KVM_CAP_DEVICE_MSIX_MASK too?
> 
> Because the old one doesn't have idea about mask bit... 

So track mask bits in userspace.

> > 
> > >              continue;
> > >          
> > >          entries_nr ++;
> > >      
> > >      }
> > > 
> > > @@ -1165,6 +1203,8 @@ static int assigned_dev_update_msix_mmio(PCIDevice
> > > *pci_dev,
> > > 
> > >      }
> > >      
> > >      free_dev_irq_entries(adev);
> > > 
> > > +    memset(pci_dev->msix_entry_used, 0, KVM_MAX_MSIX_PER_DEV *
> > > +                                       
> > > sizeof(*pci_dev->msix_entry_used));
> > > 
> > >      adev->irq_entries_nr = entries_nr;
> > >      adev->entry = calloc(entries_nr, sizeof(struct
> > >      kvm_irq_routing_entry)); if (!adev->entry) {
> > > 
> > > @@ -1179,7 +1219,11 @@ static int assigned_dev_update_msix_mmio(PCIDevice
> > > *pci_dev,
> > > 
> > >              break;
> > >          
> > >          memcpy(&msg_ctrl, va + i * 16 + 12, 4);
> > >          memcpy(&msg_data, va + i * 16 + 8, 4);
> > > 
> > > +#ifdef KVM_CAP_DEVICE_MSIX_MASK
> > > +        if (assigned_dev_msix_entry_masked(adev, i))
> > > +#else
> > > 
> > >          if (msg_data == 0)
> > > 
> > > +#endif
> > 
> > You can't use ifdef to check that kernel supports an ioctl.
> > You must check this at runtime.
> 
> Maybe we can convert them in bulk later.

Well you are adding a bug, all writes on old kernels
will spew out a ton of stderr.
We can't just assume new kernels.

> > 
> > >              continue;
> > >          
> > >          memcpy(&msg_addr, va + i * 16, 4);
> > > 
> > > @@ -1200,6 +1244,7 @@ static int assigned_dev_update_msix_mmio(PCIDevice
> > > *pci_dev,
> > > 
> > >          msix_entry.gsi = adev->entry[entries_nr].gsi;
> > >          msix_entry.entry = i;
> > > 
> > > +        pci_dev->msix_entry_used[i] = 1;
> > > 
> > >          r = kvm_assign_set_msix_entry(kvm_context, &msix_entry);
> > >          if (r) {
> > >          
> > >              fprintf(stderr, "fail to set MSI-X entry! %s\n",
> > >              strerror(-r));
> > > 
> > > @@ -1243,6 +1288,8 @@ static void assigned_dev_update_msix(PCIDevice
> > > *pci_dev, int enable_msix)
> > > 
> > >              perror("assigned_dev_update_msix: deassign irq");
> > >          
> > >          assigned_dev->irq_requested_type = 0;
> > > 
> > > +        memset(pci_dev->msix_entry_used, 0, KVM_MAX_MSIX_PER_DEV *
> > > +                                       
> > > sizeof(*pci_dev->msix_entry_used));
> > > 
> > >      }
> > >      
> > >      entries_max_nr = get_msix_entries_max_nr(assigned_dev);
> > > 
> > > @@ -1250,10 +1297,16 @@ static void assigned_dev_update_msix(PCIDevice
> > > *pci_dev, int enable_msix)
> > > 
> > >          fprintf(stderr, "assigned_dev_update_msix: MSI-X entries_max_nr
> > >          == 0"); return;
> > >      
> > >      }
> > > 
> > > +    /*
> > > +     * Guest may try to enable MSI-X before setting MSI-X entry done, so
> > > +     * let's wait until guest unmask the entries.
> > > +     */
> > 
> > Well it can also set up any number of entries, enable msix then
> > set up more entries. Now what?
> 
> It's the same. If it want to set up more entries, it have to unmask them. Then we 
> would get them.

Why can't we handle the first enable in the same way?

> > 
> > >      entries_nr = get_msix_valid_entries_nr(assigned_dev,
> > >      entries_max_nr); if (entries_nr == 0) {
> > > 
> > > +#ifndef KVM_CAP_DEVICE_MSIX_MASK
> > > 
> > >          if (enable_msix)
> > >          
> > >              fprintf(stderr, "MSI-X entry number is zero!\n");
> > 
> > And what happens then?
> 
> MSI-X can't work for old ones without MSIX mask support.

Old ones?

> Reload the guest module 
> may help. 

Guest might not have any concept of modules.

> --
> regards
> Yang, Sheng

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 3/3] qemu-kvm: device assignment: emulate MSI-X mask bits
  2010-11-05  9:05       ` Michael S. Tsirkin
@ 2010-11-05 11:02         ` Sheng Yang
  2010-11-05 13:54           ` Michael S. Tsirkin
  0 siblings, 1 reply; 20+ messages in thread
From: Sheng Yang @ 2010-11-05 11:02 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Avi Kivity, Marcelo Tosatti, kvm

On Friday 05 November 2010 17:05:32 Michael S. Tsirkin wrote:
> On Fri, Nov 05, 2010 at 11:20:37AM +0800, Sheng Yang wrote:
> > On Thursday 04 November 2010 17:44:27 Michael S. Tsirkin wrote:
> > 
> > Thanks very much for reviewing this! Seems nobody cares about userspace
> > one before...
> > 
> > > On Thu, Nov 04, 2010 at 02:18:21PM +0800, Sheng Yang wrote:
> > > > This patch emulated MSI-X per vector mask bit on assigned device.
> > > > 
> > > > Signed-off-by: Sheng Yang <sheng@linux.intel.com>
> > > > ---
> > > > 
> > > >  hw/device-assignment.c |  161
> > > >  ++++++++++++++++++++++++++++++++++++++++++++++-- 1 files
> > 
> > changed, 155
> > 
> > > >  insertions(+), 6 deletions(-)
> > > > 
> > > > diff --git a/hw/device-assignment.c b/hw/device-assignment.c
> > > > index 8a98876..639aa0b 100644
> > > > --- a/hw/device-assignment.c
> > > > +++ b/hw/device-assignment.c
> > > > @@ -62,6 +62,11 @@ static void
> > > > assigned_dev_load_option_rom(AssignedDevice *dev);
> > > > 
> > > >  static void assigned_dev_unregister_msix_mmio(AssignedDevice *dev);
> > > > 
> > > > +static uint32_t calc_assigned_dev_id(uint16_t seg, uint8_t bus,
> > > > uint8_t devfn) +{
> > > > +    return (uint32_t)seg << 16 | (uint32_t)bus << 8 |
> > > > (uint32_t)devfn; +}
> > > > +
> > > > 
> > > >  static uint32_t assigned_dev_ioport_rw(AssignedDevRegion
> > > >  *dev_region,
> > > >  
> > > >                                         uint32_t addr, int len,
> > > >                                         uint32_t *val)
> > > >  
> > > >  {
> > > > 
> > > > @@ -264,6 +269,9 @@ static void assigned_dev_iomem_map(PCIDevice
> > > > *pci_dev, int region_num,
> > > > 
> > > >      AssignedDevRegion *region = &r_dev->v_addrs[region_num];
> > > >      PCIRegion *real_region =
> > > >      &r_dev->real_device.regions[region_num]; int ret = 0;
> > > > 
> > > > +#ifdef KVM_CAP_DEVICE_MSIX_MASK
> > > > +    struct kvm_assigned_msix_mmio msix_mmio;
> > > > +#endif
> > > > 
> > > >      DEBUG("e_phys=%08" FMT_PCIBUS " r_virt=%p type=%d len=%08"
> > > >      FMT_PCIBUS " region_num=%d \n",
> > > >      
> > > >            e_phys, region->u.r_virtbase, type, e_size, region_num);
> > > > 
> > > > @@ -282,6 +290,16 @@ static void assigned_dev_iomem_map(PCIDevice
> > > > *pci_dev, int region_num,
> > > > 
> > > >              cpu_register_physical_memory(e_phys + offset,
> > > >              
> > > >                      TARGET_PAGE_SIZE, r_dev->mmio_index);
> > > > 
> > > > +#ifdef KVM_CAP_DEVICE_MSIX_MASK
> > > > +	    memset(&msix_mmio, 0, sizeof(struct kvm_assigned_msix_mmio));
> > > > +	    msix_mmio.assigned_dev_id =
> > > > calc_assigned_dev_id(r_dev->h_segnr, +			    r_dev->h_busnr,
> > > > r_dev->h_devfn);
> > > > +	    msix_mmio.base_addr = e_phys + offset;
> > > > +            /* We required kernel MSI-X support */
> > > > +	    ret = kvm_assign_reg_msix_mmio(kvm_context, &msix_mmio);
> > > > +	    if (ret)
> > > > +                fprintf(stderr, "fail to register in-kernel
> > > > msix_mmio!\n"); +#endif
> > > > 
> > > >          }
> > > >      
> > > >      }
> > > > 
> > > > @@ -824,11 +842,6 @@ static void free_assigned_device(AssignedDevice
> > > > *dev)
> > > > 
> > > >      }
> > > >  
> > > >  }
> > > > 
> > > > -static uint32_t calc_assigned_dev_id(uint16_t seg, uint8_t bus,
> > > > uint8_t devfn) -{
> > > > -    return (uint32_t)seg << 16 | (uint32_t)bus << 8 |
> > > > (uint32_t)devfn; -}
> > > > -
> > > > 
> > > >  static void assign_failed_examine(AssignedDevice *dev)
> > > >  {
> > > >  
> > > >      char name[PATH_MAX], dir[PATH_MAX], driver[PATH_MAX] = {}, *ns;
> > > > 
> > > > @@ -1123,6 +1136,27 @@ static int
> > > > get_msix_entries_max_nr(AssignedDevice *adev)
> > > > 
> > > >      return entries_max_nr;
> > > >  
> > > >  }
> > > > 
> > > > +#ifdef KVM_CAP_DEVICE_MSIX_MASK
> > > > +static int assigned_dev_msix_entry_masked(AssignedDevice *adev, int
> > > > entry) +{
> > > > +    struct kvm_assigned_msix_entry msix_entry;
> > > > +    int r;
> > > > +
> > > > +    memset(&msix_entry, 0, sizeof msix_entry);
> > > > +    msix_entry.assigned_dev_id = calc_assigned_dev_id(adev->h_segnr,
> > > > +            adev->h_busnr, (uint8_t)adev->h_devfn);
> > > > +    msix_entry.entry = entry;
> > > > +    msix_entry.flags = KVM_MSIX_FLAG_QUERY_MASK;
> > > > +    r = kvm_assign_get_msix_entry(kvm_context, &msix_entry);
> > > > +    if (r) {
> > > > +        fprintf(stderr, "assigned_dev_msix_entry_masked: "
> > > > +			"Fail to get mask bit of entry %d\n", entry);
> > > > +        return 1;
> > > 
> > > This error handling seems pretty useless. assert?
> > 
> > Maybe we can continue with it. Assert seems a little strict.
> 
> Well need to consider whether to return 1 or 0 then.

Tell it it's masked then routing change may be continue. 

> 
> > > > +    }
> > > > +    return (msix_entry.flags & KVM_MSIX_FLAG_MASK);
> > > > +}
> > > > +#endif
> > > > +
> > > > 
> > > >  static int get_msix_valid_entries_nr(AssignedDevice *adev,
> > > >  
> > > >  				     uint16_t entries_max_nr)
> > > >  
> > > >  {
> > > > 
> > > > @@ -1136,7 +1170,11 @@ static int
> > > > get_msix_valid_entries_nr(AssignedDevice *adev,
> > > > 
> > > >          memcpy(&msg_ctrl, va + i * 16 + 12, 4);
> > > >          memcpy(&msg_data, va + i * 16 + 8, 4);
> > > >          /* Ignore unused entry even it's unmasked */
> > > > 
> > > > +#ifdef KVM_CAP_DEVICE_MSIX_MASK
> > > > +        if (assigned_dev_msix_entry_masked(adev, i))
> > > > +#else
> > > > 
> > > >          if (msg_data == 0)
> > > > 
> > > > +#endif
> > > 
> > > So, we are replacing msg_data == 0 check with masked check?
> > > If yes why not do this for non-KVM_CAP_DEVICE_MSIX_MASK too?
> > 
> > Because the old one doesn't have idea about mask bit...
> 
> So track mask bits in userspace.

That's not the target of this patch. And it still won't work.
> 
> > > >              continue;
> > > >          
> > > >          entries_nr ++;
> > > >      
> > > >      }
> > > > 
> > > > @@ -1165,6 +1203,8 @@ static int
> > > > assigned_dev_update_msix_mmio(PCIDevice *pci_dev,
> > > > 
> > > >      }
> > > >      
> > > >      free_dev_irq_entries(adev);
> > > > 
> > > > +    memset(pci_dev->msix_entry_used, 0, KVM_MAX_MSIX_PER_DEV *
> > > > +
> > > > sizeof(*pci_dev->msix_entry_used));
> > > > 
> > > >      adev->irq_entries_nr = entries_nr;
> > > >      adev->entry = calloc(entries_nr, sizeof(struct
> > > >      kvm_irq_routing_entry)); if (!adev->entry) {
> > > > 
> > > > @@ -1179,7 +1219,11 @@ static int
> > > > assigned_dev_update_msix_mmio(PCIDevice *pci_dev,
> > > > 
> > > >              break;
> > > >          
> > > >          memcpy(&msg_ctrl, va + i * 16 + 12, 4);
> > > >          memcpy(&msg_data, va + i * 16 + 8, 4);
> > > > 
> > > > +#ifdef KVM_CAP_DEVICE_MSIX_MASK
> > > > +        if (assigned_dev_msix_entry_masked(adev, i))
> > > > +#else
> > > > 
> > > >          if (msg_data == 0)
> > > > 
> > > > +#endif
> > > 
> > > You can't use ifdef to check that kernel supports an ioctl.
> > > You must check this at runtime.
> > 
> > Maybe we can convert them in bulk later.
> 
> Well you are adding a bug, all writes on old kernels
> will spew out a ton of stderr.
> We can't just assume new kernels.
> 
> > > >              continue;
> > > >          
> > > >          memcpy(&msg_addr, va + i * 16, 4);
> > > > 
> > > > @@ -1200,6 +1244,7 @@ static int
> > > > assigned_dev_update_msix_mmio(PCIDevice *pci_dev,
> > > > 
> > > >          msix_entry.gsi = adev->entry[entries_nr].gsi;
> > > >          msix_entry.entry = i;
> > > > 
> > > > +        pci_dev->msix_entry_used[i] = 1;
> > > > 
> > > >          r = kvm_assign_set_msix_entry(kvm_context, &msix_entry);
> > > >          if (r) {
> > > >          
> > > >              fprintf(stderr, "fail to set MSI-X entry! %s\n",
> > > >              strerror(-r));
> > > > 
> > > > @@ -1243,6 +1288,8 @@ static void assigned_dev_update_msix(PCIDevice
> > > > *pci_dev, int enable_msix)
> > > > 
> > > >              perror("assigned_dev_update_msix: deassign irq");
> > > >          
> > > >          assigned_dev->irq_requested_type = 0;
> > > > 
> > > > +        memset(pci_dev->msix_entry_used, 0, KVM_MAX_MSIX_PER_DEV *
> > > > +
> > > > sizeof(*pci_dev->msix_entry_used));
> > > > 
> > > >      }
> > > >      
> > > >      entries_max_nr = get_msix_entries_max_nr(assigned_dev);
> > > > 
> > > > @@ -1250,10 +1297,16 @@ static void
> > > > assigned_dev_update_msix(PCIDevice *pci_dev, int enable_msix)
> > > > 
> > > >          fprintf(stderr, "assigned_dev_update_msix: MSI-X
> > > >          entries_max_nr == 0"); return;
> > > >      
> > > >      }
> > > > 
> > > > +    /*
> > > > +     * Guest may try to enable MSI-X before setting MSI-X entry
> > > > done, so +     * let's wait until guest unmask the entries.
> > > > +     */
> > > 
> > > Well it can also set up any number of entries, enable msix then
> > > set up more entries. Now what?
> > 
> > It's the same. If it want to set up more entries, it have to unmask them.
> > Then we would get them.
> 
> Why can't we handle the first enable in the same way?

Don't understand the question, but I guess the answer is pci_enable_msix().
> 
> > > >      entries_nr = get_msix_valid_entries_nr(assigned_dev,
> > > >      entries_max_nr); if (entries_nr == 0) {
> > > > 
> > > > +#ifndef KVM_CAP_DEVICE_MSIX_MASK
> > > > 
> > > >          if (enable_msix)
> > > >          
> > > >              fprintf(stderr, "MSI-X entry number is zero!\n");
> > > 
> > > And what happens then?
> > 
> > MSI-X can't work for old ones without MSIX mask support.
> 
> Old ones?

Old userspace without KVM_CAP_DEVICE_MSIX_MASK
> 
> > Reload the guest module
> > may help.
> 
> Guest might not have any concept of modules.

Just an workaround. Not a suggestion method.

--
regards
Yang, Sheng

> 
> > --
> > regards
> > Yang, Sheng

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 3/3] qemu-kvm: device assignment: emulate MSI-X mask bits
  2010-11-05 11:02         ` Sheng Yang
@ 2010-11-05 13:54           ` Michael S. Tsirkin
  2010-11-08  5:17             ` Sheng Yang
  0 siblings, 1 reply; 20+ messages in thread
From: Michael S. Tsirkin @ 2010-11-05 13:54 UTC (permalink / raw)
  To: Sheng Yang; +Cc: Avi Kivity, Marcelo Tosatti, kvm

On Fri, Nov 05, 2010 at 07:02:02PM +0800, Sheng Yang wrote:
> On Friday 05 November 2010 17:05:32 Michael S. Tsirkin wrote:
> > On Fri, Nov 05, 2010 at 11:20:37AM +0800, Sheng Yang wrote:
> > > On Thursday 04 November 2010 17:44:27 Michael S. Tsirkin wrote:
> > > 
> > > Thanks very much for reviewing this! Seems nobody cares about userspace
> > > one before...
> > > 
> > > > On Thu, Nov 04, 2010 at 02:18:21PM +0800, Sheng Yang wrote:
> > > > > This patch emulated MSI-X per vector mask bit on assigned device.
> > > > > 
> > > > > Signed-off-by: Sheng Yang <sheng@linux.intel.com>
> > > > > ---
> > > > > 
> > > > >  hw/device-assignment.c |  161
> > > > >  ++++++++++++++++++++++++++++++++++++++++++++++-- 1 files
> > > 
> > > changed, 155
> > > 
> > > > >  insertions(+), 6 deletions(-)
> > > > > 
> > > > > diff --git a/hw/device-assignment.c b/hw/device-assignment.c
> > > > > index 8a98876..639aa0b 100644
> > > > > --- a/hw/device-assignment.c
> > > > > +++ b/hw/device-assignment.c
> > > > > @@ -62,6 +62,11 @@ static void
> > > > > assigned_dev_load_option_rom(AssignedDevice *dev);
> > > > > 
> > > > >  static void assigned_dev_unregister_msix_mmio(AssignedDevice *dev);
> > > > > 
> > > > > +static uint32_t calc_assigned_dev_id(uint16_t seg, uint8_t bus,
> > > > > uint8_t devfn) +{
> > > > > +    return (uint32_t)seg << 16 | (uint32_t)bus << 8 |
> > > > > (uint32_t)devfn; +}
> > > > > +
> > > > > 
> > > > >  static uint32_t assigned_dev_ioport_rw(AssignedDevRegion
> > > > >  *dev_region,
> > > > >  
> > > > >                                         uint32_t addr, int len,
> > > > >                                         uint32_t *val)
> > > > >  
> > > > >  {
> > > > > 
> > > > > @@ -264,6 +269,9 @@ static void assigned_dev_iomem_map(PCIDevice
> > > > > *pci_dev, int region_num,
> > > > > 
> > > > >      AssignedDevRegion *region = &r_dev->v_addrs[region_num];
> > > > >      PCIRegion *real_region =
> > > > >      &r_dev->real_device.regions[region_num]; int ret = 0;
> > > > > 
> > > > > +#ifdef KVM_CAP_DEVICE_MSIX_MASK
> > > > > +    struct kvm_assigned_msix_mmio msix_mmio;
> > > > > +#endif
> > > > > 
> > > > >      DEBUG("e_phys=%08" FMT_PCIBUS " r_virt=%p type=%d len=%08"
> > > > >      FMT_PCIBUS " region_num=%d \n",
> > > > >      
> > > > >            e_phys, region->u.r_virtbase, type, e_size, region_num);
> > > > > 
> > > > > @@ -282,6 +290,16 @@ static void assigned_dev_iomem_map(PCIDevice
> > > > > *pci_dev, int region_num,
> > > > > 
> > > > >              cpu_register_physical_memory(e_phys + offset,
> > > > >              
> > > > >                      TARGET_PAGE_SIZE, r_dev->mmio_index);
> > > > > 
> > > > > +#ifdef KVM_CAP_DEVICE_MSIX_MASK
> > > > > +	    memset(&msix_mmio, 0, sizeof(struct kvm_assigned_msix_mmio));
> > > > > +	    msix_mmio.assigned_dev_id =
> > > > > calc_assigned_dev_id(r_dev->h_segnr, +			    r_dev->h_busnr,
> > > > > r_dev->h_devfn);
> > > > > +	    msix_mmio.base_addr = e_phys + offset;
> > > > > +            /* We required kernel MSI-X support */
> > > > > +	    ret = kvm_assign_reg_msix_mmio(kvm_context, &msix_mmio);
> > > > > +	    if (ret)
> > > > > +                fprintf(stderr, "fail to register in-kernel
> > > > > msix_mmio!\n"); +#endif
> > > > > 
> > > > >          }
> > > > >      
> > > > >      }
> > > > > 
> > > > > @@ -824,11 +842,6 @@ static void free_assigned_device(AssignedDevice
> > > > > *dev)
> > > > > 
> > > > >      }
> > > > >  
> > > > >  }
> > > > > 
> > > > > -static uint32_t calc_assigned_dev_id(uint16_t seg, uint8_t bus,
> > > > > uint8_t devfn) -{
> > > > > -    return (uint32_t)seg << 16 | (uint32_t)bus << 8 |
> > > > > (uint32_t)devfn; -}
> > > > > -
> > > > > 
> > > > >  static void assign_failed_examine(AssignedDevice *dev)
> > > > >  {
> > > > >  
> > > > >      char name[PATH_MAX], dir[PATH_MAX], driver[PATH_MAX] = {}, *ns;
> > > > > 
> > > > > @@ -1123,6 +1136,27 @@ static int
> > > > > get_msix_entries_max_nr(AssignedDevice *adev)
> > > > > 
> > > > >      return entries_max_nr;
> > > > >  
> > > > >  }
> > > > > 
> > > > > +#ifdef KVM_CAP_DEVICE_MSIX_MASK
> > > > > +static int assigned_dev_msix_entry_masked(AssignedDevice *adev, int
> > > > > entry) +{
> > > > > +    struct kvm_assigned_msix_entry msix_entry;
> > > > > +    int r;
> > > > > +
> > > > > +    memset(&msix_entry, 0, sizeof msix_entry);
> > > > > +    msix_entry.assigned_dev_id = calc_assigned_dev_id(adev->h_segnr,
> > > > > +            adev->h_busnr, (uint8_t)adev->h_devfn);
> > > > > +    msix_entry.entry = entry;
> > > > > +    msix_entry.flags = KVM_MSIX_FLAG_QUERY_MASK;
> > > > > +    r = kvm_assign_get_msix_entry(kvm_context, &msix_entry);
> > > > > +    if (r) {
> > > > > +        fprintf(stderr, "assigned_dev_msix_entry_masked: "
> > > > > +			"Fail to get mask bit of entry %d\n", entry);
> > > > > +        return 1;
> > > > 
> > > > This error handling seems pretty useless. assert?
> > > 
> > > Maybe we can continue with it. Assert seems a little strict.
> > 
> > Well need to consider whether to return 1 or 0 then.
> 
> Tell it it's masked then routing change may be continue. 

OK, then
1. we probably want to print this only once, not on each write
2. only try doing this if capability is present.

> > 
> > > > > +    }
> > > > > +    return (msix_entry.flags & KVM_MSIX_FLAG_MASK);
> > > > > +}
> > > > > +#endif
> > > > > +
> > > > > 
> > > > >  static int get_msix_valid_entries_nr(AssignedDevice *adev,
> > > > >  
> > > > >  				     uint16_t entries_max_nr)
> > > > >  
> > > > >  {
> > > > > 
> > > > > @@ -1136,7 +1170,11 @@ static int
> > > > > get_msix_valid_entries_nr(AssignedDevice *adev,
> > > > > 
> > > > >          memcpy(&msg_ctrl, va + i * 16 + 12, 4);
> > > > >          memcpy(&msg_data, va + i * 16 + 8, 4);
> > > > >          /* Ignore unused entry even it's unmasked */
> > > > > 
> > > > > +#ifdef KVM_CAP_DEVICE_MSIX_MASK
> > > > > +        if (assigned_dev_msix_entry_masked(adev, i))
> > > > > +#else
> > > > > 
> > > > >          if (msg_data == 0)
> > > > > 
> > > > > +#endif
> > > > 
> > > > So, we are replacing msg_data == 0 check with masked check?
> > > > If yes why not do this for non-KVM_CAP_DEVICE_MSIX_MASK too?
> > > 
> > > Because the old one doesn't have idea about mask bit...
> > 
> > So track mask bits in userspace.
> 
> That's not the target of this patch. And it still won't work.

Sure it will, at least to figure out # of entries.

> > 
> > > > >              continue;
> > > > >          
> > > > >          entries_nr ++;
> > > > >      
> > > > >      }
> > > > > 
> > > > > @@ -1165,6 +1203,8 @@ static int
> > > > > assigned_dev_update_msix_mmio(PCIDevice *pci_dev,
> > > > > 
> > > > >      }
> > > > >      
> > > > >      free_dev_irq_entries(adev);
> > > > > 
> > > > > +    memset(pci_dev->msix_entry_used, 0, KVM_MAX_MSIX_PER_DEV *
> > > > > +
> > > > > sizeof(*pci_dev->msix_entry_used));
> > > > > 
> > > > >      adev->irq_entries_nr = entries_nr;
> > > > >      adev->entry = calloc(entries_nr, sizeof(struct
> > > > >      kvm_irq_routing_entry)); if (!adev->entry) {
> > > > > 
> > > > > @@ -1179,7 +1219,11 @@ static int
> > > > > assigned_dev_update_msix_mmio(PCIDevice *pci_dev,
> > > > > 
> > > > >              break;
> > > > >          
> > > > >          memcpy(&msg_ctrl, va + i * 16 + 12, 4);
> > > > >          memcpy(&msg_data, va + i * 16 + 8, 4);
> > > > > 
> > > > > +#ifdef KVM_CAP_DEVICE_MSIX_MASK
> > > > > +        if (assigned_dev_msix_entry_masked(adev, i))
> > > > > +#else
> > > > > 
> > > > >          if (msg_data == 0)
> > > > > 
> > > > > +#endif
> > > > 
> > > > You can't use ifdef to check that kernel supports an ioctl.
> > > > You must check this at runtime.
> > > 
> > > Maybe we can convert them in bulk later.
> > 
> > Well you are adding a bug, all writes on old kernels
> > will spew out a ton of stderr.
> > We can't just assume new kernels.
> > 
> > > > >              continue;
> > > > >          
> > > > >          memcpy(&msg_addr, va + i * 16, 4);
> > > > > 
> > > > > @@ -1200,6 +1244,7 @@ static int
> > > > > assigned_dev_update_msix_mmio(PCIDevice *pci_dev,
> > > > > 
> > > > >          msix_entry.gsi = adev->entry[entries_nr].gsi;
> > > > >          msix_entry.entry = i;
> > > > > 
> > > > > +        pci_dev->msix_entry_used[i] = 1;
> > > > > 
> > > > >          r = kvm_assign_set_msix_entry(kvm_context, &msix_entry);
> > > > >          if (r) {
> > > > >          
> > > > >              fprintf(stderr, "fail to set MSI-X entry! %s\n",
> > > > >              strerror(-r));
> > > > > 
> > > > > @@ -1243,6 +1288,8 @@ static void assigned_dev_update_msix(PCIDevice
> > > > > *pci_dev, int enable_msix)
> > > > > 
> > > > >              perror("assigned_dev_update_msix: deassign irq");
> > > > >          
> > > > >          assigned_dev->irq_requested_type = 0;
> > > > > 
> > > > > +        memset(pci_dev->msix_entry_used, 0, KVM_MAX_MSIX_PER_DEV *
> > > > > +
> > > > > sizeof(*pci_dev->msix_entry_used));
> > > > > 
> > > > >      }
> > > > >      
> > > > >      entries_max_nr = get_msix_entries_max_nr(assigned_dev);
> > > > > 
> > > > > @@ -1250,10 +1297,16 @@ static void
> > > > > assigned_dev_update_msix(PCIDevice *pci_dev, int enable_msix)
> > > > > 
> > > > >          fprintf(stderr, "assigned_dev_update_msix: MSI-X
> > > > >          entries_max_nr == 0"); return;
> > > > >      
> > > > >      }
> > > > > 
> > > > > +    /*
> > > > > +     * Guest may try to enable MSI-X before setting MSI-X entry
> > > > > done, so +     * let's wait until guest unmask the entries.
> > > > > +     */
> > > > 
> > > > Well it can also set up any number of entries, enable msix then
> > > > set up more entries. Now what?
> > > 
> > > It's the same. If it want to set up more entries, it have to unmask them.
> > > Then we would get them.
> > 
> > Why can't we handle the first enable in the same way?
> 
> Don't understand the question, but I guess the answer is pci_enable_msix().

pci_enable_msix is an internal kernel API to enable msix, isn't it?
We are discussing qemu patch here.

> > 
> > > > >      entries_nr = get_msix_valid_entries_nr(assigned_dev,
> > > > >      entries_max_nr); if (entries_nr == 0) {
> > > > > 
> > > > > +#ifndef KVM_CAP_DEVICE_MSIX_MASK
> > > > > 
> > > > >          if (enable_msix)
> > > > >          
> > > > >              fprintf(stderr, "MSI-X entry number is zero!\n");
> > > > 
> > > > And what happens then?
> > > 
> > > MSI-X can't work for old ones without MSIX mask support.
> > 
> > Old ones?
> 
> Old userspace without KVM_CAP_DEVICE_MSIX_MASK
> > 
> > > Reload the guest module
> > > may help.
> > 
> > Guest might not have any concept of modules.
> 
> Just an workaround. Not a suggestion method.


Look, if there's some failure mode we need a way to recover,
or to make it very unlikely. If this is guest doing something
illegal let's add a comment explaining what it is and why
it's illegal. If this is legal, why the print out?

> --
> regards
> Yang, Sheng
> 
> > 
> > > --
> > > regards
> > > Yang, Sheng

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 3/3] qemu-kvm: device assignment: emulate MSI-X mask bits
  2010-11-05 13:54           ` Michael S. Tsirkin
@ 2010-11-08  5:17             ` Sheng Yang
  2010-11-08  6:27               ` Michael S. Tsirkin
  0 siblings, 1 reply; 20+ messages in thread
From: Sheng Yang @ 2010-11-08  5:17 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Avi Kivity, Marcelo Tosatti, kvm

On Friday 05 November 2010 21:54:16 Michael S. Tsirkin wrote:
> On Fri, Nov 05, 2010 at 07:02:02PM +0800, Sheng Yang wrote:
> > On Friday 05 November 2010 17:05:32 Michael S. Tsirkin wrote:
> > > On Fri, Nov 05, 2010 at 11:20:37AM +0800, Sheng Yang wrote:
> > > > On Thursday 04 November 2010 17:44:27 Michael S. Tsirkin wrote:
> > > > 
> > > > Thanks very much for reviewing this! Seems nobody cares about
> > > > userspace one before...
> > > > 
> > > > > On Thu, Nov 04, 2010 at 02:18:21PM +0800, Sheng Yang wrote:
> > > > > > This patch emulated MSI-X per vector mask bit on assigned device.
> > > > > > 
> > > > > > Signed-off-by: Sheng Yang <sheng@linux.intel.com>
> > > > > > ---
> > > > > > 
> > > > > >  hw/device-assignment.c |  161
> > > > > >  ++++++++++++++++++++++++++++++++++++++++++++++-- 1 
files
> > > > 
> > > > changed, 155
> > > > 
> > > > > >  insertions(+), 6 deletions(-)
> > > > > > 
> > > > > > diff --git a/hw/device-assignment.c b/hw/device-assignment.c
> > > > > > index 8a98876..639aa0b 100644
> > > > > > --- a/hw/device-assignment.c
> > > > > > +++ b/hw/device-assignment.c
> > > > > > @@ -62,6 +62,11 @@ static void
> > > > > > assigned_dev_load_option_rom(AssignedDevice *dev);
> > > > > > 
> > > > > >  static void assigned_dev_unregister_msix_mmio(AssignedDevice
> > > > > >  *dev);
> > > > > > 
> > > > > > +static uint32_t calc_assigned_dev_id(uint16_t seg, uint8_t bus,
> > > > > > uint8_t devfn) +{
> > > > > > +    return (uint32_t)seg << 16 | (uint32_t)bus << 8 |
> > > > > > (uint32_t)devfn; +}
> > > > > > +
> > > > > > 
> > > > > >  static uint32_t assigned_dev_ioport_rw(AssignedDevRegion
> > > > > >  *dev_region,
> > > > > >  
> > > > > >                                         uint32_t addr, int len,
> > > > > >                                         uint32_t *val)
> > > > > >  
> > > > > >  {
> > > > > > 
> > > > > > @@ -264,6 +269,9 @@ static void assigned_dev_iomem_map(PCIDevice
> > > > > > *pci_dev, int region_num,
> > > > > > 
> > > > > >      AssignedDevRegion *region = &r_dev->v_addrs[region_num];
> > > > > >      PCIRegion *real_region =
> > > > > >      &r_dev->real_device.regions[region_num]; int ret = 0;
> > > > > > 
> > > > > > +#ifdef KVM_CAP_DEVICE_MSIX_MASK
> > > > > > +    struct kvm_assigned_msix_mmio msix_mmio;
> > > > > > +#endif
> > > > > > 
> > > > > >      DEBUG("e_phys=%08" FMT_PCIBUS " r_virt=%p type=%d len=%08"
> > > > > >      FMT_PCIBUS " region_num=%d \n",
> > > > > >      
> > > > > >            e_phys, region->u.r_virtbase, type, e_size,
> > > > > >            region_num);
> > > > > > 
> > > > > > @@ -282,6 +290,16 @@ static void assigned_dev_iomem_map(PCIDevice
> > > > > > *pci_dev, int region_num,
> > > > > > 
> > > > > >              cpu_register_physical_memory(e_phys + offset,
> > > > > >              
> > > > > >                      TARGET_PAGE_SIZE, r_dev->mmio_index);
> > > > > > 
> > > > > > +#ifdef KVM_CAP_DEVICE_MSIX_MASK
> > > > > > +	    memset(&msix_mmio, 0, sizeof(struct
> > > > > > kvm_assigned_msix_mmio)); +	    msix_mmio.assigned_dev_id =
> > > > > > calc_assigned_dev_id(r_dev->h_segnr, +			    r_dev-
>h_busnr,
> > > > > > r_dev->h_devfn);
> > > > > > +	    msix_mmio.base_addr = e_phys + offset;
> > > > > > +            /* We required kernel MSI-X support */
> > > > > > +	    ret = kvm_assign_reg_msix_mmio(kvm_context, &msix_mmio);
> > > > > > +	    if (ret)
> > > > > > +                fprintf(stderr, "fail to register in-kernel
> > > > > > msix_mmio!\n"); +#endif
> > > > > > 
> > > > > >          }
> > > > > >      
> > > > > >      }
> > > > > > 
> > > > > > @@ -824,11 +842,6 @@ static void
> > > > > > free_assigned_device(AssignedDevice *dev)
> > > > > > 
> > > > > >      }
> > > > > >  
> > > > > >  }
> > > > > > 
> > > > > > -static uint32_t calc_assigned_dev_id(uint16_t seg, uint8_t bus,
> > > > > > uint8_t devfn) -{
> > > > > > -    return (uint32_t)seg << 16 | (uint32_t)bus << 8 |
> > > > > > (uint32_t)devfn; -}
> > > > > > -
> > > > > > 
> > > > > >  static void assign_failed_examine(AssignedDevice *dev)
> > > > > >  {
> > > > > >  
> > > > > >      char name[PATH_MAX], dir[PATH_MAX], driver[PATH_MAX] = {},
> > > > > >      *ns;
> > > > > > 
> > > > > > @@ -1123,6 +1136,27 @@ static int
> > > > > > get_msix_entries_max_nr(AssignedDevice *adev)
> > > > > > 
> > > > > >      return entries_max_nr;
> > > > > >  
> > > > > >  }
> > > > > > 
> > > > > > +#ifdef KVM_CAP_DEVICE_MSIX_MASK
> > > > > > +static int assigned_dev_msix_entry_masked(AssignedDevice *adev,
> > > > > > int entry) +{
> > > > > > +    struct kvm_assigned_msix_entry msix_entry;
> > > > > > +    int r;
> > > > > > +
> > > > > > +    memset(&msix_entry, 0, sizeof msix_entry);
> > > > > > +    msix_entry.assigned_dev_id =
> > > > > > calc_assigned_dev_id(adev->h_segnr, +            adev->h_busnr,
> > > > > > (uint8_t)adev->h_devfn);
> > > > > > +    msix_entry.entry = entry;
> > > > > > +    msix_entry.flags = KVM_MSIX_FLAG_QUERY_MASK;
> > > > > > +    r = kvm_assign_get_msix_entry(kvm_context, &msix_entry);
> > > > > > +    if (r) {
> > > > > > +        fprintf(stderr, "assigned_dev_msix_entry_masked: "
> > > > > > +			"Fail to get mask bit of entry %d\n", entry);
> > > > > > +        return 1;
> > > > > 
> > > > > This error handling seems pretty useless. assert?
> > > > 
> > > > Maybe we can continue with it. Assert seems a little strict.
> > > 
> > > Well need to consider whether to return 1 or 0 then.
> > 
> > Tell it it's masked then routing change may be continue.
> 
> OK, then
> 1. we probably want to print this only once, not on each write
> 2. only try doing this if capability is present.
> 
> > > > > > +    }
> > > > > > +    return (msix_entry.flags & KVM_MSIX_FLAG_MASK);
> > > > > > +}
> > > > > > +#endif
> > > > > > +
> > > > > > 
> > > > > >  static int get_msix_valid_entries_nr(AssignedDevice *adev,
> > > > > >  
> > > > > >  				     uint16_t entries_max_nr)
> > > > > >  
> > > > > >  {
> > > > > > 
> > > > > > @@ -1136,7 +1170,11 @@ static int
> > > > > > get_msix_valid_entries_nr(AssignedDevice *adev,
> > > > > > 
> > > > > >          memcpy(&msg_ctrl, va + i * 16 + 12, 4);
> > > > > >          memcpy(&msg_data, va + i * 16 + 8, 4);
> > > > > >          /* Ignore unused entry even it's unmasked */
> > > > > > 
> > > > > > +#ifdef KVM_CAP_DEVICE_MSIX_MASK
> > > > > > +        if (assigned_dev_msix_entry_masked(adev, i))
> > > > > > +#else
> > > > > > 
> > > > > >          if (msg_data == 0)
> > > > > > 
> > > > > > +#endif
> > > > > 
> > > > > So, we are replacing msg_data == 0 check with masked check?
> > > > > If yes why not do this for non-KVM_CAP_DEVICE_MSIX_MASK too?
> > > > 
> > > > Because the old one doesn't have idea about mask bit...
> > > 
> > > So track mask bits in userspace.
> > 
> > That's not the target of this patch. And it still won't work.
> 
> Sure it will, at least to figure out # of entries.

OK, we can only tracking them.
 
> > > > > >              continue;
> > > > > >          
> > > > > >          entries_nr ++;
> > > > > >      
> > > > > >      }
> > > > > > 
> > > > > > @@ -1165,6 +1203,8 @@ static int
> > > > > > assigned_dev_update_msix_mmio(PCIDevice *pci_dev,
> > > > > > 
> > > > > >      }
> > > > > >      
> > > > > >      free_dev_irq_entries(adev);
> > > > > > 
> > > > > > +    memset(pci_dev->msix_entry_used, 0, KVM_MAX_MSIX_PER_DEV *
> > > > > > +
> > > > > > sizeof(*pci_dev->msix_entry_used));
> > > > > > 
> > > > > >      adev->irq_entries_nr = entries_nr;
> > > > > >      adev->entry = calloc(entries_nr, sizeof(struct
> > > > > >      kvm_irq_routing_entry)); if (!adev->entry) {
> > > > > > 
> > > > > > @@ -1179,7 +1219,11 @@ static int
> > > > > > assigned_dev_update_msix_mmio(PCIDevice *pci_dev,
> > > > > > 
> > > > > >              break;
> > > > > >          
> > > > > >          memcpy(&msg_ctrl, va + i * 16 + 12, 4);
> > > > > >          memcpy(&msg_data, va + i * 16 + 8, 4);
> > > > > > 
> > > > > > +#ifdef KVM_CAP_DEVICE_MSIX_MASK
> > > > > > +        if (assigned_dev_msix_entry_masked(adev, i))
> > > > > > +#else
> > > > > > 
> > > > > >          if (msg_data == 0)
> > > > > > 
> > > > > > +#endif
> > > > > 
> > > > > You can't use ifdef to check that kernel supports an ioctl.
> > > > > You must check this at runtime.
> > > > 
> > > > Maybe we can convert them in bulk later.
> > > 
> > > Well you are adding a bug, all writes on old kernels
> > > will spew out a ton of stderr.
> > > We can't just assume new kernels.
> > > 
> > > > > >              continue;
> > > > > >          
> > > > > >          memcpy(&msg_addr, va + i * 16, 4);
> > > > > > 
> > > > > > @@ -1200,6 +1244,7 @@ static int
> > > > > > assigned_dev_update_msix_mmio(PCIDevice *pci_dev,
> > > > > > 
> > > > > >          msix_entry.gsi = adev->entry[entries_nr].gsi;
> > > > > >          msix_entry.entry = i;
> > > > > > 
> > > > > > +        pci_dev->msix_entry_used[i] = 1;
> > > > > > 
> > > > > >          r = kvm_assign_set_msix_entry(kvm_context, &msix_entry);
> > > > > >          if (r) {
> > > > > >          
> > > > > >              fprintf(stderr, "fail to set MSI-X entry! %s\n",
> > > > > >              strerror(-r));
> > > > > > 
> > > > > > @@ -1243,6 +1288,8 @@ static void
> > > > > > assigned_dev_update_msix(PCIDevice *pci_dev, int enable_msix)
> > > > > > 
> > > > > >              perror("assigned_dev_update_msix: deassign irq");
> > > > > >          
> > > > > >          assigned_dev->irq_requested_type = 0;
> > > > > > 
> > > > > > +        memset(pci_dev->msix_entry_used, 0, KVM_MAX_MSIX_PER_DEV
> > > > > > * +
> > > > > > sizeof(*pci_dev->msix_entry_used));
> > > > > > 
> > > > > >      }
> > > > > >      
> > > > > >      entries_max_nr = get_msix_entries_max_nr(assigned_dev);
> > > > > > 
> > > > > > @@ -1250,10 +1297,16 @@ static void
> > > > > > assigned_dev_update_msix(PCIDevice *pci_dev, int enable_msix)
> > > > > > 
> > > > > >          fprintf(stderr, "assigned_dev_update_msix: MSI-X
> > > > > >          entries_max_nr == 0"); return;
> > > > > >      
> > > > > >      }
> > > > > > 
> > > > > > +    /*
> > > > > > +     * Guest may try to enable MSI-X before setting MSI-X entry
> > > > > > done, so +     * let's wait until guest unmask the entries.
> > > > > > +     */
> > > > > 
> > > > > Well it can also set up any number of entries, enable msix then
> > > > > set up more entries. Now what?
> > > > 
> > > > It's the same. If it want to set up more entries, it have to unmask
> > > > them. Then we would get them.
> > > 
> > > Why can't we handle the first enable in the same way?
> > 
> > Don't understand the question, but I guess the answer is
> > pci_enable_msix().
> 
> pci_enable_msix is an internal kernel API to enable msix, isn't it?
> We are discussing qemu patch here.

OK, I understand the question now. The others entries would be enabled with 
unmask, but not very first ones. Guest can unmask the entries first, then enable 
MSI-X. We should scan the table when MSI-X enabled, that's why the first one is 
different from others.

> > > > > >      entries_nr = get_msix_valid_entries_nr(assigned_dev,
> > > > > >      entries_max_nr); if (entries_nr == 0) {
> > > > > > 
> > > > > > +#ifndef KVM_CAP_DEVICE_MSIX_MASK
> > > > > > 
> > > > > >          if (enable_msix)
> > > > > >          
> > > > > >              fprintf(stderr, "MSI-X entry number is zero!\n");
> > > > > 
> > > > > And what happens then?
> > > > 
> > > > MSI-X can't work for old ones without MSIX mask support.
> > > 
> > > Old ones?
> > 
> > Old userspace without KVM_CAP_DEVICE_MSIX_MASK
> > 
> > > > Reload the guest module
> > > > may help.
> > > 
> > > Guest might not have any concept of modules.
> > 
> > Just an workaround. Not a suggestion method.
> 
> Look, if there's some failure mode we need a way to recover,
> or to make it very unlikely. If this is guest doing something
> illegal let's add a comment explaining what it is and why
> it's illegal. If this is legal, why the print out?

It's unsupported in the old QEmu. I can change the comment, but I do think it's 
good to let user know something important is wrong. 

Maybe it's better to put it into DEBUG.

--
regards
Yang, Sheng

> 
> > --
> > regards
> > Yang, Sheng
> > 
> > > > --
> > > > regards
> > > > Yang, Sheng

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 3/3] qemu-kvm: device assignment: emulate MSI-X mask bits
  2010-11-08  5:17             ` Sheng Yang
@ 2010-11-08  6:27               ` Michael S. Tsirkin
  0 siblings, 0 replies; 20+ messages in thread
From: Michael S. Tsirkin @ 2010-11-08  6:27 UTC (permalink / raw)
  To: Sheng Yang; +Cc: Avi Kivity, Marcelo Tosatti, kvm

On Mon, Nov 08, 2010 at 01:17:47PM +0800, Sheng Yang wrote:
> > > > > > > @@ -1250,10 +1297,16 @@ static void
> > > > > > > assigned_dev_update_msix(PCIDevice *pci_dev, int enable_msix)
> > > > > > > 
> > > > > > >          fprintf(stderr, "assigned_dev_update_msix: MSI-X
> > > > > > >          entries_max_nr == 0"); return;
> > > > > > >      
> > > > > > >      }
> > > > > > > 
> > > > > > > +    /*
> > > > > > > +     * Guest may try to enable MSI-X before setting MSI-X entry
> > > > > > > done, so +     * let's wait until guest unmask the entries.
> > > > > > > +     */
> > > > > > 
> > > > > > Well it can also set up any number of entries, enable msix then
> > > > > > set up more entries. Now what?
> > > > > 
> > > > > It's the same. If it want to set up more entries, it have to unmask
> > > > > them. Then we would get them.
> > > > 
> > > > Why can't we handle the first enable in the same way?
> > > 
> > > Don't understand the question, but I guess the answer is
> > > pci_enable_msix().
> > 
> > pci_enable_msix is an internal kernel API to enable msix, isn't it?
> > We are discussing qemu patch here.
> 
> OK, I understand the question now. The others entries would be enabled with 
> unmask, but not very first ones. Guest can unmask the entries first, then enable 
> MSI-X. We should scan the table when MSI-X enabled, that's why the first one is 
> different from others.

I see. I think the comment mislead me: you really mean
'Guest can unmask MSI-X entries when MSI-X is disabled,
 don't do anything until MSI-X is enabled'.



> > > > > > >      entries_max_nr); if (entries_nr == 0) {
> > > > > > > 
> > > > > > > +#ifndef KVM_CAP_DEVICE_MSIX_MASK
> > > > > > > 
> > > > > > >          if (enable_msix)
> > > > > > >          
> > > > > > >              fprintf(stderr, "MSI-X entry number is zero!\n");
> > > > > > 
> > > > > > And what happens then?
> > > > > 
> > > > > MSI-X can't work for old ones without MSIX mask support.
> > > > 
> > > > Old ones?
> > > 
> > > Old userspace without KVM_CAP_DEVICE_MSIX_MASK
> > > 
> > > > > Reload the guest module
> > > > > may help.
> > > > 
> > > > Guest might not have any concept of modules.
> > > 
> > > Just an workaround. Not a suggestion method.
> > 
> > Look, if there's some failure mode we need a way to recover,
> > or to make it very unlikely. If this is guest doing something
> > illegal let's add a comment explaining what it is and why
> > it's illegal. If this is legal, why the print out?
> 
> It's unsupported in the old QEmu.


I think I have it. The comment you want is:

/* Error: number of MSI-X entries is zero. Using this device
   requires KVM_CAP_DEVICE_MSIX_MASK support in kvm
   which is missing.
 */

As I said prebiously, we must check runtime capabilities
for this, ifdef is not enough to know kernel supports a feature
because there's no dependency between kernel and qemu packages.

> I can change the comment, but I do think it's 
> good to let user know something important is wrong. 

stderr is really there for developers.
You can't expect the user to understand this message.

> Maybe it's better to put it into DEBUG.
> --
> regards
> Yang, Sheng
> 
> > 
> > > --
> > > regards
> > > Yang, Sheng
> > > 
> > > > > --
> > > > > regards
> > > > > Yang, Sheng

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/3] qemu-kvm: Ioctl for in-kernel mask support
  2010-11-05  8:52       ` Michael S. Tsirkin
@ 2010-11-11  6:20         ` Sheng Yang
  0 siblings, 0 replies; 20+ messages in thread
From: Sheng Yang @ 2010-11-11  6:20 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Avi Kivity, Marcelo Tosatti, kvm

On Friday 05 November 2010 16:52:47 Michael S. Tsirkin wrote:
> On Fri, Nov 05, 2010 at 10:59:23AM +0800, Sheng Yang wrote:
> > > We are trying to move away from using ifdefs. Stub these out instead?
> > 
> > Example?
> 
> 37859f054986fab6b4094cd5950266ae56a6ca6a and
> 889e30cc18e21f2091b77267dca8096d7dd34f8b that depends on it.

Gave it a try, but the parameter of the function depends on certain version of KVM 
header file which still require #ifdef...

I think we can leave it a bit later.

--
regards
Yang, Sheng

> 
> > --
> > regards
> > Yang, Sheng

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2010-11-11  6:20 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-11-04  6:18 [PATCH 0/3] MSI-X mask supporting for assigned device(QEmu) Sheng Yang
2010-11-04  6:18 ` [PATCH 1/3] qemu-kvm: Ioctl for in-kernel mask support Sheng Yang
2010-11-04  9:47   ` Michael S. Tsirkin
2010-11-05  2:59     ` Sheng Yang
2010-11-05  8:52       ` Michael S. Tsirkin
2010-11-11  6:20         ` Sheng Yang
2010-11-04  6:18 ` [PATCH 2/3] qemu-kvm: device assignment: Some clean up about MSI-X code Sheng Yang
2010-11-04  9:47   ` Michael S. Tsirkin
2010-11-05  3:08     ` Sheng Yang
2010-11-04  6:18 ` [PATCH 3/3] qemu-kvm: device assignment: emulate MSI-X mask bits Sheng Yang
2010-11-04  9:44   ` Michael S. Tsirkin
2010-11-05  3:20     ` Sheng Yang
2010-11-05  9:05       ` Michael S. Tsirkin
2010-11-05 11:02         ` Sheng Yang
2010-11-05 13:54           ` Michael S. Tsirkin
2010-11-08  5:17             ` Sheng Yang
2010-11-08  6:27               ` Michael S. Tsirkin
2010-11-04 10:04   ` Michael S. Tsirkin
2010-11-05  3:14     ` Sheng Yang
2010-11-05  8:59       ` Michael S. Tsirkin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).