* [PATCH 0/3] Emulate MSI-X mask bits for assigned devices
@ 2010-09-28 9:44 Sheng Yang
2010-09-28 9:44 ` [PATCH 1/3] KVM: Emulation " Sheng Yang
` (3 more replies)
0 siblings, 4 replies; 26+ messages in thread
From: Sheng Yang @ 2010-09-28 9:44 UTC (permalink / raw)
To: Avi Kivity, Marcelo Tosatti; +Cc: kvm, Sheng Yang
Hi Avi & Marcelo
This patchset would add emulation of MSI-X mask bit for assigned devices in
QEmu.
BTW: We are also purposed an acceleration of MSI-X mask bit for KVM - to get
it done in kernel. That's because sometime MSI-X mask bit was accessed very
frequently by guest and emulation in QEmu cost a lot(tens to hundreds percent
CPU utilization sometime), e.g. guest using Linux 2.6.18 or under some heavy
workload. The method has been proved efficient in Xen, but it would require
VMM to handle MSI-X table. Is that OK with you?
--
regards
Yang, Sheng
^ permalink raw reply [flat|nested] 26+ messages in thread
* [PATCH 1/3] KVM: Emulation MSI-X mask bits for assigned devices
2010-09-28 9:44 [PATCH 0/3] Emulate MSI-X mask bits for assigned devices Sheng Yang
@ 2010-09-28 9:44 ` Sheng Yang
2010-09-28 13:36 ` Michael S. Tsirkin
` (3 more replies)
2010-09-28 9:44 ` [PATCH 2/3] qemu-kvm: device assignment: Some clean up for MSI-X code Sheng Yang
` (2 subsequent siblings)
3 siblings, 4 replies; 26+ messages in thread
From: Sheng Yang @ 2010-09-28 9:44 UTC (permalink / raw)
To: Avi Kivity, Marcelo Tosatti; +Cc: kvm, Sheng Yang
This patch enable per-vector mask for assigned devices using MSI-X.
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
---
arch/x86/kvm/x86.c | 1 +
include/linux/kvm.h | 9 ++++++++-
include/linux/kvm_host.h | 1 +
virt/kvm/assigned-dev.c | 39 +++++++++++++++++++++++++++++++++++++++
4 files changed, 49 insertions(+), 1 deletions(-)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 8412c91..e6933e6 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1927,6 +1927,7 @@ int kvm_dev_ioctl_check_extension(long ext)
case KVM_CAP_DEBUGREGS:
case KVM_CAP_X86_ROBUST_SINGLESTEP:
case KVM_CAP_XSAVE:
+ case KVM_CAP_DEVICE_MSIX_MASK:
r = 1;
break;
case KVM_CAP_COALESCED_MMIO:
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index 919ae53..f2b7cdc 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -540,6 +540,9 @@ struct kvm_ppc_pvinfo {
#endif
#define KVM_CAP_PPC_GET_PVINFO 57
#define KVM_CAP_PPC_IRQ_LEVEL 58
+#ifdef __KVM_HAVE_MSIX
+#define KVM_CAP_DEVICE_MSIX_MASK 59
+#endif
#ifdef KVM_CAP_IRQ_ROUTING
@@ -787,11 +790,15 @@ struct kvm_assigned_msix_nr {
};
#define KVM_MAX_MSIX_PER_DEV 256
+
+#define KVM_MSIX_FLAG_MASK 1
+
struct kvm_assigned_msix_entry {
__u32 assigned_dev_id;
__u32 gsi;
__u16 entry; /* The index of entry in the MSI-X table */
- __u16 padding[3];
+ __u16 flags;
+ __u16 padding[2];
};
#endif /* __LINUX_KVM_H */
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 0b89d00..a43405c 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -415,6 +415,7 @@ struct kvm_irq_ack_notifier {
};
#define KVM_ASSIGNED_MSIX_PENDING 0x1
+#define KVM_ASSIGNED_MSIX_MASK 0x2
struct kvm_guest_msix_entry {
u32 vector;
u16 entry;
diff --git a/virt/kvm/assigned-dev.c b/virt/kvm/assigned-dev.c
index 7c98928..15b8c32 100644
--- a/virt/kvm/assigned-dev.c
+++ b/virt/kvm/assigned-dev.c
@@ -17,6 +17,8 @@
#include <linux/pci.h>
#include <linux/interrupt.h>
#include <linux/slab.h>
+#include <linux/irqnr.h>
+
#include "irq.h"
static struct kvm_assigned_dev_kernel *kvm_find_assigned_dev(struct list_head *head,
@@ -666,6 +668,30 @@ msix_nr_out:
return r;
}
+static void update_msix_mask(struct kvm_assigned_dev_kernel *assigned_dev,
+ int index)
+{
+ int irq;
+
+ if (!assigned_dev->dev->msix_enabled ||
+ !(assigned_dev->irq_requested_type & KVM_DEV_IRQ_HOST_MSIX))
+ return;
+
+ irq = assigned_dev->host_msix_entries[index].vector;
+
+ ASSERT(irq != 0);
+
+ if (assigned_dev->guest_msix_entries[index].flags &
+ KVM_ASSIGNED_MSIX_MASK)
+ disable_irq(irq);
+ else {
+ enable_irq(irq);
+ if (assigned_dev->guest_msix_entries[index].flags &
+ KVM_ASSIGNED_MSIX_PENDING)
+ schedule_work(&assigned_dev->interrupt_work);
+ }
+}
+
static int kvm_vm_ioctl_set_msix_entry(struct kvm *kvm,
struct kvm_assigned_msix_entry *entry)
{
@@ -688,6 +714,19 @@ static int kvm_vm_ioctl_set_msix_entry(struct kvm *kvm,
adev->guest_msix_entries[i].entry = entry->entry;
adev->guest_msix_entries[i].vector = entry->gsi;
adev->host_msix_entries[i].entry = entry->entry;
+ if ((entry->flags & KVM_MSIX_FLAG_MASK) &&
+ !(adev->guest_msix_entries[i].flags &
+ KVM_ASSIGNED_MSIX_MASK)) {
+ adev->guest_msix_entries[i].flags |=
+ KVM_ASSIGNED_MSIX_MASK;
+ update_msix_mask(adev, i);
+ } else if (!(entry->flags & KVM_MSIX_FLAG_MASK) &&
+ (adev->guest_msix_entries[i].flags &
+ KVM_ASSIGNED_MSIX_MASK)) {
+ adev->guest_msix_entries[i].flags &=
+ ~KVM_ASSIGNED_MSIX_MASK;
+ update_msix_mask(adev, i);
+ }
break;
}
if (i == adev->entries_nr) {
--
1.7.0.1
^ permalink raw reply related [flat|nested] 26+ messages in thread
* [PATCH 2/3] qemu-kvm: device assignment: Some clean up for MSI-X code
2010-09-28 9:44 [PATCH 0/3] Emulate MSI-X mask bits for assigned devices Sheng Yang
2010-09-28 9:44 ` [PATCH 1/3] KVM: Emulation " Sheng Yang
@ 2010-09-28 9:44 ` Sheng Yang
2010-09-28 9:44 ` [PATCH 3/3] qemu-kvm: device assignment: emulate MSI-X mask bits Sheng Yang
2010-09-30 16:44 ` [PATCH 0/3] Emulate MSI-X mask bits for assigned devices Marcelo Tosatti
3 siblings, 0 replies; 26+ messages in thread
From: Sheng Yang @ 2010-09-28 9:44 UTC (permalink / raw)
To: Avi Kivity, Marcelo Tosatti; +Cc: kvm, Sheng Yang
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
---
hw/device-assignment.c | 75 ++++++++++++++++++++++++++++++++---------------
1 files changed, 51 insertions(+), 24 deletions(-)
diff --git a/hw/device-assignment.c b/hw/device-assignment.c
index 87f7418..4edae52 100644
--- a/hw/device-assignment.c
+++ b/hw/device-assignment.c
@@ -1129,15 +1129,10 @@ static void assigned_dev_update_msi(PCIDevice *pci_dev, unsigned int ctrl_pos)
#endif
#ifdef KVM_CAP_DEVICE_MSIX
-static int assigned_dev_update_msix_mmio(PCIDevice *pci_dev)
+static int get_msix_entries_max_nr(AssignedDevice *adev)
{
- AssignedDevice *adev = container_of(pci_dev, AssignedDevice, dev);
- uint16_t entries_nr = 0, entries_max_nr;
- int pos = 0, i, r = 0;
- uint32_t msg_addr, msg_upper_addr, msg_data, msg_ctrl;
- struct kvm_assigned_msix_nr msix_nr;
- struct kvm_assigned_msix_entry msix_entry;
- void *va = adev->msix_table_page;
+ int pos, entries_max_nr;
+ PCIDevice *pci_dev = &adev->dev;
if (adev->cap.available & ASSIGNED_DEVICE_CAP_MSI)
pos = pci_dev->cap.start + PCI_CAPABILITY_CONFIG_MSI_LENGTH;
@@ -1148,6 +1143,17 @@ static int assigned_dev_update_msix_mmio(PCIDevice *pci_dev)
entries_max_nr &= PCI_MSIX_TABSIZE;
entries_max_nr += 1;
+ return entries_max_nr;
+}
+
+static int get_msix_valid_entries_nr(AssignedDevice *adev,
+ uint16_t entries_max_nr)
+{
+ void *va = adev->msix_table_page;
+ uint32_t msg_data, msg_ctrl;
+ uint16_t entries_nr = 0;
+ int i;
+
/* Get the usable entry number for allocating */
for (i = 0; i < entries_max_nr; i++) {
memcpy(&msg_ctrl, va + i * 16 + 12, 4);
@@ -1157,11 +1163,20 @@ static int assigned_dev_update_msix_mmio(PCIDevice *pci_dev)
continue;
entries_nr ++;
}
+ return entries_nr;
+}
+
+static int assigned_dev_update_msix_mmio(PCIDevice *pci_dev,
+ uint16_t entries_nr,
+ uint16_t entries_max_nr)
+{
+ AssignedDevice *adev = container_of(pci_dev, AssignedDevice, dev);
+ int i, r = 0;
+ uint32_t msg_addr, msg_upper_addr, msg_data, msg_ctrl;
+ struct kvm_assigned_msix_nr msix_nr;
+ struct kvm_assigned_msix_entry msix_entry;
+ void *va = adev->msix_table_page;
- if (entries_nr == 0) {
- fprintf(stderr, "MSI-X entry number is zero!\n");
- return -EINVAL;
- }
msix_nr.assigned_dev_id = calc_assigned_dev_id(adev->h_segnr, adev->h_busnr,
(uint8_t)adev->h_devfn);
msix_nr.entry_nr = entries_nr;
@@ -1203,8 +1218,8 @@ static int assigned_dev_update_msix_mmio(PCIDevice *pci_dev)
adev->entry[entries_nr].u.msi.address_lo = msg_addr;
adev->entry[entries_nr].u.msi.address_hi = msg_upper_addr;
adev->entry[entries_nr].u.msi.data = msg_data;
- DEBUG("MSI-X data 0x%x, MSI-X addr_lo 0x%x\n!", msg_data, msg_addr);
- kvm_add_routing_entry(kvm_context, &adev->entry[entries_nr]);
+ DEBUG("MSI-X data 0x%x, MSI-X addr_lo 0x%x\n", msg_data, msg_addr);
+ kvm_add_routing_entry(kvm_context, &adev->entry[entries_nr]);
msix_entry.gsi = adev->entry[entries_nr].gsi;
msix_entry.entry = i;
@@ -1226,12 +1241,12 @@ static int assigned_dev_update_msix_mmio(PCIDevice *pci_dev)
return r;
}
-static void assigned_dev_update_msix(PCIDevice *pci_dev, unsigned int ctrl_pos)
+static void assigned_dev_update_msix(PCIDevice *pci_dev, int enable_msix)
{
struct kvm_assigned_irq assigned_irq_data;
AssignedDevice *assigned_dev = container_of(pci_dev, AssignedDevice, dev);
- uint16_t *ctrl_word = (uint16_t *)(pci_dev->config + ctrl_pos);
int r;
+ uint16_t entries_nr, entries_max_nr;
memset(&assigned_irq_data, 0, sizeof assigned_irq_data);
assigned_irq_data.assigned_dev_id =
@@ -1242,8 +1257,7 @@ static void assigned_dev_update_msix(PCIDevice *pci_dev, unsigned int ctrl_pos)
* try to catch this by only deassigning irqs if the guest is using
* MSIX or intends to start. */
if ((assigned_dev->irq_requested_type & KVM_DEV_IRQ_GUEST_MSIX) ||
- (*ctrl_word & PCI_MSIX_ENABLE)) {
-
+ enable_msix) {
assigned_irq_data.flags = assigned_dev->irq_requested_type;
free_dev_irq_entries(assigned_dev);
r = kvm_deassign_irq(kvm_context, &assigned_irq_data);
@@ -1254,14 +1268,25 @@ static void assigned_dev_update_msix(PCIDevice *pci_dev, unsigned int ctrl_pos)
assigned_dev->irq_requested_type = 0;
}
- if (*ctrl_word & PCI_MSIX_ENABLE) {
- assigned_irq_data.flags = KVM_DEV_IRQ_HOST_MSIX |
- KVM_DEV_IRQ_GUEST_MSIX;
-
- if (assigned_dev_update_msix_mmio(pci_dev) < 0) {
+ entries_max_nr = get_msix_entries_max_nr(assigned_dev);
+ if (entries_max_nr == 0) {
+ fprintf(stderr, "assigned_dev_update_msix: MSI-X entries_max_nr == 0");
+ return;
+ }
+ entries_nr = get_msix_valid_entries_nr(assigned_dev, entries_max_nr);
+ if (entries_nr == 0) {
+ if (enable_msix)
+ fprintf(stderr, "MSI-X entry number is zero!\n");
+ return;
+ }
+ if (enable_msix) {
+ if (assigned_dev_update_msix_mmio(pci_dev,
+ entries_nr, entries_max_nr) < 0) {
perror("assigned_dev_update_msix_mmio");
return;
}
+ assigned_irq_data.flags = KVM_DEV_IRQ_HOST_MSIX |
+ KVM_DEV_IRQ_GUEST_MSIX;
if (kvm_assign_irq(kvm_context, &assigned_irq_data) < 0) {
perror("assigned_dev_enable_msix: assign irq");
return;
@@ -1277,6 +1302,7 @@ static void assigned_device_pci_cap_write_config(PCIDevice *pci_dev, uint32_t ad
{
AssignedDevice *assigned_dev = container_of(pci_dev, AssignedDevice, dev);
unsigned int pos = pci_dev->cap.start, ctrl_pos;
+ uint16_t *ctrl_word;
pci_default_cap_write_config(pci_dev, address, val, len);
#ifdef KVM_CAP_IRQ_ROUTING
@@ -1293,7 +1319,8 @@ static void assigned_device_pci_cap_write_config(PCIDevice *pci_dev, uint32_t ad
ctrl_pos = pos + 3;
if (address <= ctrl_pos && address + len > ctrl_pos) {
ctrl_pos--; /* control is word long */
- assigned_dev_update_msix(pci_dev, ctrl_pos);
+ ctrl_word = (uint16_t *)(pci_dev->config + ctrl_pos);
+ assigned_dev_update_msix(pci_dev, (*ctrl_word & PCI_MSIX_ENABLE));
}
pos += PCI_CAPABILITY_CONFIG_MSIX_LENGTH;
}
--
1.7.0.1
^ permalink raw reply related [flat|nested] 26+ messages in thread
* [PATCH 3/3] qemu-kvm: device assignment: emulate MSI-X mask bits
2010-09-28 9:44 [PATCH 0/3] Emulate MSI-X mask bits for assigned devices Sheng Yang
2010-09-28 9:44 ` [PATCH 1/3] KVM: Emulation " Sheng Yang
2010-09-28 9:44 ` [PATCH 2/3] qemu-kvm: device assignment: Some clean up for MSI-X code Sheng Yang
@ 2010-09-28 9:44 ` Sheng Yang
2010-09-30 16:44 ` [PATCH 0/3] Emulate MSI-X mask bits for assigned devices Marcelo Tosatti
3 siblings, 0 replies; 26+ messages in thread
From: Sheng Yang @ 2010-09-28 9:44 UTC (permalink / raw)
To: Avi Kivity, Marcelo Tosatti; +Cc: kvm, Sheng Yang
This patch emulated MSI-X per vector mask bit on assigned device.
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
---
hw/device-assignment.c | 127 +++++++++++++++++++++++++++++++++++++++++++++++-
1 files changed, 126 insertions(+), 1 deletions(-)
diff --git a/hw/device-assignment.c b/hw/device-assignment.c
index 4edae52..564c5ab 100644
--- a/hw/device-assignment.c
+++ b/hw/device-assignment.c
@@ -1146,6 +1146,7 @@ static int get_msix_entries_max_nr(AssignedDevice *adev)
return entries_max_nr;
}
+#define MSIX_VECTOR_MASK 0x1
static int get_msix_valid_entries_nr(AssignedDevice *adev,
uint16_t entries_max_nr)
{
@@ -1159,7 +1160,11 @@ static int get_msix_valid_entries_nr(AssignedDevice *adev,
memcpy(&msg_ctrl, va + i * 16 + 12, 4);
memcpy(&msg_data, va + i * 16 + 8, 4);
/* Ignore unused entry even it's unmasked */
+#ifdef KVM_CAP_DEVICE_MSIX_MASK
+ if (msg_data == 0 || (msg_ctrl & MSIX_VECTOR_MASK))
+#else
if (msg_data == 0)
+#endif
continue;
entries_nr ++;
}
@@ -1188,6 +1193,8 @@ static int assigned_dev_update_msix_mmio(PCIDevice *pci_dev,
}
free_dev_irq_entries(adev);
+ memset(pci_dev->msix_entry_used, 0, KVM_MAX_MSIX_PER_DEV *
+ sizeof(*pci_dev->msix_entry_used));
adev->irq_entries_nr = entries_nr;
adev->entry = calloc(entries_nr, sizeof(struct kvm_irq_routing_entry));
if (!adev->entry) {
@@ -1223,6 +1230,7 @@ static int assigned_dev_update_msix_mmio(PCIDevice *pci_dev,
msix_entry.gsi = adev->entry[entries_nr].gsi;
msix_entry.entry = i;
+ pci_dev->msix_entry_used[i] = 1;
r = kvm_assign_set_msix_entry(kvm_context, &msix_entry);
if (r) {
fprintf(stderr, "fail to set MSI-X entry! %s\n", strerror(-r));
@@ -1266,6 +1274,8 @@ static void assigned_dev_update_msix(PCIDevice *pci_dev, int enable_msix)
perror("assigned_dev_update_msix: deassign irq");
assigned_dev->irq_requested_type = 0;
+ memset(pci_dev->msix_entry_used, 0, KVM_MAX_MSIX_PER_DEV *
+ sizeof(*pci_dev->msix_entry_used));
}
entries_max_nr = get_msix_entries_max_nr(assigned_dev);
@@ -1273,10 +1283,16 @@ static void assigned_dev_update_msix(PCIDevice *pci_dev, int enable_msix)
fprintf(stderr, "assigned_dev_update_msix: MSI-X entries_max_nr == 0");
return;
}
+ /*
+ * Guest may try to enable MSI-X before setting MSI-X entry done, so
+ * let's wait until guest unmask the entries.
+ */
entries_nr = get_msix_valid_entries_nr(assigned_dev, entries_max_nr);
if (entries_nr == 0) {
+#ifndef KVM_CAP_DEVICE_MSIX_MASK
if (enable_msix)
fprintf(stderr, "MSI-X entry number is zero!\n");
+#endif
return;
}
if (enable_msix) {
@@ -1320,7 +1336,8 @@ static void assigned_device_pci_cap_write_config(PCIDevice *pci_dev, uint32_t ad
if (address <= ctrl_pos && address + len > ctrl_pos) {
ctrl_pos--; /* control is word long */
ctrl_word = (uint16_t *)(pci_dev->config + ctrl_pos);
- assigned_dev_update_msix(pci_dev, (*ctrl_word & PCI_MSIX_ENABLE));
+ assigned_dev_update_msix(pci_dev,
+ (*ctrl_word & PCI_MSIX_ENABLE) && !(*ctrl_word & PCI_MSIX_MASK));
}
pos += PCI_CAPABILITY_CONFIG_MSIX_LENGTH;
}
@@ -1412,6 +1429,104 @@ static uint32_t msix_mmio_readw(void *opaque, target_phys_addr_t addr)
(8 * (addr & 3))) & 0xffff;
}
+#ifdef KVM_CAP_DEVICE_MSIX_MASK
+static void msix_mmio_access_mask_bit(AssignedDevice *adev, int index)
+{
+ void *page = adev->msix_table_page;
+ uint32_t msg_ctrl, msg_data, msg_upper_addr, msg_addr;
+ struct kvm_assigned_msix_entry msix_entry;
+ int r = 0, pos, ctrl_word, entry_idx, entries_max_nr;
+ struct kvm_irq_routing_entry old_entry = {};
+
+ memcpy(&msg_addr, (char *)page + index * 16, 4);
+ memcpy(&msg_upper_addr, (char *)page + index * 16 + 4, 4);
+ memcpy(&msg_data, (char *)page + index * 16 + 8, 4);
+ memcpy(&msg_ctrl, (char *)page + index * 16 + 12, 4);
+ DEBUG("Access mask bit: MSI-X entries index %d: "
+ "msg_addr 0x%x, msg_upper_addr 0x%x, msg_data 0x%x, vec_ctl 0x%x\n",
+ index, msg_addr, msg_upper_addr, msg_data, msg_ctrl);
+
+ if (adev->cap.available & ASSIGNED_DEVICE_CAP_MSI)
+ pos = adev->dev.cap.start + PCI_CAPABILITY_CONFIG_MSI_LENGTH;
+ else
+ pos = adev->dev.cap.start;
+
+ ctrl_word = *(uint16_t *)(adev->dev.config + pos + 2);
+
+ if (!((ctrl_word & PCI_MSIX_ENABLE) && !(ctrl_word & PCI_MSIX_MASK)))
+ return;
+
+ if (!adev->dev.msix_entry_used[index] && (msg_ctrl & MSIX_VECTOR_MASK) == 0) {
+ DEBUG("Try to modify unenabled MSI-X entry %d's mask. "
+ "Reenable MSI-X.\n",
+ index);
+ assigned_dev_update_msix(&adev->dev, 1);
+ return;
+ }
+
+ /* find the correlated index of adev->entry */
+ entries_max_nr = get_msix_entries_max_nr(adev);
+ entry_idx = 0;
+ while (entry_idx < entries_max_nr)
+ if (adev->dev.msix_entry_used[index]) {
+ if (entry_idx == index)
+ break;
+ entry_idx ++;
+ }
+ if (entry_idx >= entries_max_nr) {
+ fprintf(stderr, "msix_mmio_access_mask_bit: Entry idx exceed limit!\n");
+ return;
+ }
+
+ msix_entry.assigned_dev_id = calc_assigned_dev_id(adev->h_segnr,
+ adev->h_busnr,
+ (uint8_t)adev->h_devfn);
+ msix_entry.gsi = adev->entry[entry_idx].gsi;
+ msix_entry.entry = index;
+ if (msg_ctrl & MSIX_VECTOR_MASK)
+ msix_entry.flags = KVM_MSIX_FLAG_MASK;
+ else
+ msix_entry.flags = 0;
+ DEBUG("set MSI-X index %d, esi 0x%x, mask %d\n",
+ index, msix_entry.gsi, msix_entry.flags);
+ r = kvm_assign_set_msix_entry(kvm_context, &msix_entry);
+ if (r) {
+ perror("msix_mmio_access_mask_bit: "
+ "fail to set MSI-X entry!");
+ return;
+ }
+
+ if (msix_entry.flags == 0) {
+ old_entry.gsi = adev->entry[entry_idx].gsi;
+ old_entry.type = KVM_IRQ_ROUTING_MSI;
+ old_entry.flags = 0;
+ old_entry.u.msi.address_lo = adev->entry[entry_idx].u.msi.address_lo;
+ old_entry.u.msi.address_hi = adev->entry[entry_idx].u.msi.address_hi;
+ old_entry.u.msi.data = adev->entry[entry_idx].u.msi.data;
+ adev->entry[entry_idx].u.msi.address_lo = msg_addr;
+ adev->entry[entry_idx].u.msi.address_hi = msg_upper_addr;
+ adev->entry[entry_idx].u.msi.data = msg_data;
+ if (memcmp(&adev->entry[entry_idx].u.msi, &old_entry.u.msi,
+ sizeof old_entry.u.msi)) {
+ int r;
+ r = kvm_update_routing_entry(kvm_context, &old_entry,
+ &adev->entry[entry_idx]);
+ if (r) {
+ perror("msix_mmio_access_mask_bit: "
+ "kvm_update_routing_entry failed\n");
+ return;
+ }
+ r = kvm_commit_irq_routes(kvm_context);
+ if (r) {
+ perror("msix_mmio_access_mask_bit: "
+ "kvm_commit_irq_routes failed\n");
+ return;
+ }
+ }
+ }
+}
+#endif
+
static void msix_mmio_writel(void *opaque,
target_phys_addr_t addr, uint32_t val)
{
@@ -1422,6 +1537,12 @@ static void msix_mmio_writel(void *opaque,
DEBUG("write to MSI-X entry table mmio offset 0x%lx, val 0x%x\n",
addr, val);
memcpy((void *)((char *)page + offset), &val, 4);
+
+#ifdef KVM_CAP_DEVICE_MSIX_MASK
+ /* Check if mask bit is being accessed */
+ if (offset % 16 == 12)
+ msix_mmio_access_mask_bit(adev, offset / 16);
+#endif
}
static void msix_mmio_writew(void *opaque,
@@ -1459,6 +1580,8 @@ static int assigned_dev_register_msix_mmio(AssignedDevice *dev)
memset(dev->msix_table_page, 0, 0x1000);
dev->mmio_index = cpu_register_io_memory(
msix_mmio_read, msix_mmio_write, dev);
+ dev->dev.msix_entry_used = qemu_mallocz(KVM_MAX_MSIX_PER_DEV *
+ sizeof *dev->dev.msix_entry_used);
return 0;
}
@@ -1475,6 +1598,8 @@ static void assigned_dev_unregister_msix_mmio(AssignedDevice *dev)
strerror(errno));
}
dev->msix_table_page = NULL;
+ free(dev->dev.msix_entry_used);
+ dev->dev.msix_entry_used = NULL;
}
static int assigned_initfn(struct PCIDevice *pci_dev)
--
1.7.0.1
^ permalink raw reply related [flat|nested] 26+ messages in thread
* Re: [PATCH 1/3] KVM: Emulation MSI-X mask bits for assigned devices
2010-09-28 9:44 ` [PATCH 1/3] KVM: Emulation " Sheng Yang
@ 2010-09-28 13:36 ` Michael S. Tsirkin
2010-09-29 1:17 ` Sheng Yang
2010-09-29 8:36 ` Avi Kivity
` (2 subsequent siblings)
3 siblings, 1 reply; 26+ messages in thread
From: Michael S. Tsirkin @ 2010-09-28 13:36 UTC (permalink / raw)
To: Sheng Yang; +Cc: Avi Kivity, Marcelo Tosatti, kvm
On Tue, Sep 28, 2010 at 05:44:10PM +0800, Sheng Yang wrote:
> This patch enable per-vector mask for assigned devices using MSI-X.
>
> Signed-off-by: Sheng Yang <sheng@linux.intel.com>
> ---
> arch/x86/kvm/x86.c | 1 +
> include/linux/kvm.h | 9 ++++++++-
> include/linux/kvm_host.h | 1 +
> virt/kvm/assigned-dev.c | 39 +++++++++++++++++++++++++++++++++++++++
> 4 files changed, 49 insertions(+), 1 deletions(-)
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 8412c91..e6933e6 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -1927,6 +1927,7 @@ int kvm_dev_ioctl_check_extension(long ext)
> case KVM_CAP_DEBUGREGS:
> case KVM_CAP_X86_ROBUST_SINGLESTEP:
> case KVM_CAP_XSAVE:
> + case KVM_CAP_DEVICE_MSIX_MASK:
> r = 1;
> break;
> case KVM_CAP_COALESCED_MMIO:
> diff --git a/include/linux/kvm.h b/include/linux/kvm.h
> index 919ae53..f2b7cdc 100644
> --- a/include/linux/kvm.h
> +++ b/include/linux/kvm.h
> @@ -540,6 +540,9 @@ struct kvm_ppc_pvinfo {
> #endif
> #define KVM_CAP_PPC_GET_PVINFO 57
> #define KVM_CAP_PPC_IRQ_LEVEL 58
> +#ifdef __KVM_HAVE_MSIX
> +#define KVM_CAP_DEVICE_MSIX_MASK 59
> +#endif
>
> #ifdef KVM_CAP_IRQ_ROUTING
>
> @@ -787,11 +790,15 @@ struct kvm_assigned_msix_nr {
> };
>
> #define KVM_MAX_MSIX_PER_DEV 256
> +
> +#define KVM_MSIX_FLAG_MASK 1
> +
> struct kvm_assigned_msix_entry {
> __u32 assigned_dev_id;
> __u32 gsi;
> __u16 entry; /* The index of entry in the MSI-X table */
> - __u16 padding[3];
> + __u16 flags;
> + __u16 padding[2];
> };
>
> #endif /* __LINUX_KVM_H */
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 0b89d00..a43405c 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -415,6 +415,7 @@ struct kvm_irq_ack_notifier {
> };
>
> #define KVM_ASSIGNED_MSIX_PENDING 0x1
> +#define KVM_ASSIGNED_MSIX_MASK 0x2
> struct kvm_guest_msix_entry {
> u32 vector;
> u16 entry;
> diff --git a/virt/kvm/assigned-dev.c b/virt/kvm/assigned-dev.c
> index 7c98928..15b8c32 100644
> --- a/virt/kvm/assigned-dev.c
> +++ b/virt/kvm/assigned-dev.c
> @@ -17,6 +17,8 @@
> #include <linux/pci.h>
> #include <linux/interrupt.h>
> #include <linux/slab.h>
> +#include <linux/irqnr.h>
> +
> #include "irq.h"
>
> static struct kvm_assigned_dev_kernel *kvm_find_assigned_dev(struct list_head *head,
> @@ -666,6 +668,30 @@ msix_nr_out:
> return r;
> }
>
> +static void update_msix_mask(struct kvm_assigned_dev_kernel *assigned_dev,
> + int index)
> +{
> + int irq;
> +
> + if (!assigned_dev->dev->msix_enabled ||
> + !(assigned_dev->irq_requested_type & KVM_DEV_IRQ_HOST_MSIX))
> + return;
> +
> + irq = assigned_dev->host_msix_entries[index].vector;
> +
> + ASSERT(irq != 0);
> +
> + if (assigned_dev->guest_msix_entries[index].flags &
> + KVM_ASSIGNED_MSIX_MASK)
> + disable_irq(irq);
> + else {
> + enable_irq(irq);
> + if (assigned_dev->guest_msix_entries[index].flags &
> + KVM_ASSIGNED_MSIX_PENDING)
> + schedule_work(&assigned_dev->interrupt_work);
> + }
Hmm, won't this lose interrupts which were sent while bit was pending?
It is also pretty heavy if as you say guests touch the mask a lot.
I think we must keep the interrupt disabled, just set a bit
and delay interrupt injection until vector is unmasked
or deleted. The interface to do this will need more thought:
e.g. how can userspace clear this bit then?
> +}
> +
> static int kvm_vm_ioctl_set_msix_entry(struct kvm *kvm,
> struct kvm_assigned_msix_entry *entry)
> {
> @@ -688,6 +714,19 @@ static int kvm_vm_ioctl_set_msix_entry(struct kvm *kvm,
> adev->guest_msix_entries[i].entry = entry->entry;
> adev->guest_msix_entries[i].vector = entry->gsi;
> adev->host_msix_entries[i].entry = entry->entry;
> + if ((entry->flags & KVM_MSIX_FLAG_MASK) &&
> + !(adev->guest_msix_entries[i].flags &
> + KVM_ASSIGNED_MSIX_MASK)) {
> + adev->guest_msix_entries[i].flags |=
> + KVM_ASSIGNED_MSIX_MASK;
> + update_msix_mask(adev, i);
> + } else if (!(entry->flags & KVM_MSIX_FLAG_MASK) &&
> + (adev->guest_msix_entries[i].flags &
> + KVM_ASSIGNED_MSIX_MASK)) {
> + adev->guest_msix_entries[i].flags &=
> + ~KVM_ASSIGNED_MSIX_MASK;
> + update_msix_mask(adev, i);
> + }
> break;
> }
> if (i == adev->entries_nr) {
> --
> 1.7.0.1
>
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 1/3] KVM: Emulation MSI-X mask bits for assigned devices
2010-09-28 13:36 ` Michael S. Tsirkin
@ 2010-09-29 1:17 ` Sheng Yang
2010-09-30 16:25 ` Marcelo Tosatti
0 siblings, 1 reply; 26+ messages in thread
From: Sheng Yang @ 2010-09-29 1:17 UTC (permalink / raw)
To: Michael S. Tsirkin; +Cc: Avi Kivity, Marcelo Tosatti, kvm
On Tuesday 28 September 2010 21:36:00 Michael S. Tsirkin wrote:
> On Tue, Sep 28, 2010 at 05:44:10PM +0800, Sheng Yang wrote:
> > This patch enable per-vector mask for assigned devices using MSI-X.
> >
> > Signed-off-by: Sheng Yang <sheng@linux.intel.com>
> > ---
> >
> > arch/x86/kvm/x86.c | 1 +
> > include/linux/kvm.h | 9 ++++++++-
> > include/linux/kvm_host.h | 1 +
> > virt/kvm/assigned-dev.c | 39
+++++++++++++++++++++++++++++++++++++++
> > 4 files changed, 49 insertions(+), 1 deletions(-)
> >
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index 8412c91..e6933e6 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -1927,6 +1927,7 @@ int kvm_dev_ioctl_check_extension(long ext)
> >
> > case KVM_CAP_DEBUGREGS:
> > case KVM_CAP_X86_ROBUST_SINGLESTEP:
> >
> > case KVM_CAP_XSAVE:
> > + case KVM_CAP_DEVICE_MSIX_MASK:
> > r = 1;
> > break;
> >
> > case KVM_CAP_COALESCED_MMIO:
> > diff --git a/include/linux/kvm.h b/include/linux/kvm.h
> > index 919ae53..f2b7cdc 100644
> > --- a/include/linux/kvm.h
> > +++ b/include/linux/kvm.h
> > @@ -540,6 +540,9 @@ struct kvm_ppc_pvinfo {
> >
> > #endif
> > #define KVM_CAP_PPC_GET_PVINFO 57
> > #define KVM_CAP_PPC_IRQ_LEVEL 58
> >
> > +#ifdef __KVM_HAVE_MSIX
> > +#define KVM_CAP_DEVICE_MSIX_MASK 59
> > +#endif
> >
> > #ifdef KVM_CAP_IRQ_ROUTING
> >
> > @@ -787,11 +790,15 @@ struct kvm_assigned_msix_nr {
> >
> > };
> >
> > #define KVM_MAX_MSIX_PER_DEV 256
> >
> > +
> > +#define KVM_MSIX_FLAG_MASK 1
> > +
> >
> > struct kvm_assigned_msix_entry {
> >
> > __u32 assigned_dev_id;
> > __u32 gsi;
> > __u16 entry; /* The index of entry in the MSI-X table */
> >
> > - __u16 padding[3];
> > + __u16 flags;
> > + __u16 padding[2];
> >
> > };
> >
> > #endif /* __LINUX_KVM_H */
> >
> > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> > index 0b89d00..a43405c 100644
> > --- a/include/linux/kvm_host.h
> > +++ b/include/linux/kvm_host.h
> > @@ -415,6 +415,7 @@ struct kvm_irq_ack_notifier {
> >
> > };
> >
> > #define KVM_ASSIGNED_MSIX_PENDING 0x1
> >
> > +#define KVM_ASSIGNED_MSIX_MASK 0x2
> >
> > struct kvm_guest_msix_entry {
> >
> > u32 vector;
> > u16 entry;
> >
> > diff --git a/virt/kvm/assigned-dev.c b/virt/kvm/assigned-dev.c
> > index 7c98928..15b8c32 100644
> > --- a/virt/kvm/assigned-dev.c
> > +++ b/virt/kvm/assigned-dev.c
> > @@ -17,6 +17,8 @@
> >
> > #include <linux/pci.h>
> > #include <linux/interrupt.h>
> > #include <linux/slab.h>
> >
> > +#include <linux/irqnr.h>
> > +
> >
> > #include "irq.h"
> >
> > static struct kvm_assigned_dev_kernel *kvm_find_assigned_dev(struct
> > list_head *head,
> >
> > @@ -666,6 +668,30 @@ msix_nr_out:
> > return r;
> >
> > }
> >
> > +static void update_msix_mask(struct kvm_assigned_dev_kernel
> > *assigned_dev, + int index)
> > +{
> > + int irq;
> > +
> > + if (!assigned_dev->dev->msix_enabled ||
> > + !(assigned_dev->irq_requested_type & KVM_DEV_IRQ_HOST_MSIX))
> > + return;
> > +
> > + irq = assigned_dev->host_msix_entries[index].vector;
> > +
> > + ASSERT(irq != 0);
> > +
> > + if (assigned_dev->guest_msix_entries[index].flags &
> > + KVM_ASSIGNED_MSIX_MASK)
> > + disable_irq(irq);
> > + else {
> > + enable_irq(irq);
> > + if (assigned_dev->guest_msix_entries[index].flags &
> > + KVM_ASSIGNED_MSIX_PENDING)
> > + schedule_work(&assigned_dev->interrupt_work);
> > + }
>
> Hmm, won't this lose interrupts which were sent while bit was pending?
> It is also pretty heavy if as you say guests touch the mask a lot.
> I think we must keep the interrupt disabled, just set a bit
> and delay interrupt injection until vector is unmasked
> or deleted. The interface to do this will need more thought:
> e.g. how can userspace clear this bit then?
I think it's fine. Because we didn't modify pending bit here, and the interrupt
handler would schedule an work to check it regardless of if the IRQ is disable. By
this meaning, no interrupt would be lose.
But the checking for pending bit on unmask action seems unnecessary? Because if
the bit is set, then definitely one work has been scheduled to deal with it.
--
regards
Yang, Sheng
>
> > +}
> > +
> >
> > static int kvm_vm_ioctl_set_msix_entry(struct kvm *kvm,
> >
> > struct kvm_assigned_msix_entry *entry)
> >
> > {
> >
> > @@ -688,6 +714,19 @@ static int kvm_vm_ioctl_set_msix_entry(struct kvm
> > *kvm,
> >
> > adev->guest_msix_entries[i].entry = entry->entry;
> > adev->guest_msix_entries[i].vector = entry->gsi;
> > adev->host_msix_entries[i].entry = entry->entry;
> >
> > + if ((entry->flags & KVM_MSIX_FLAG_MASK) &&
> > + !(adev->guest_msix_entries[i].flags &
> > + KVM_ASSIGNED_MSIX_MASK)) {
> > + adev->guest_msix_entries[i].flags |=
> > + KVM_ASSIGNED_MSIX_MASK;
> > + update_msix_mask(adev, i);
> > + } else if (!(entry->flags & KVM_MSIX_FLAG_MASK) &&
> > + (adev->guest_msix_entries[i].flags &
> > + KVM_ASSIGNED_MSIX_MASK)) {
> > + adev->guest_msix_entries[i].flags &=
> > + ~KVM_ASSIGNED_MSIX_MASK;
> > + update_msix_mask(adev, i);
> > + }
> >
> > break;
> >
> > }
> >
> > if (i == adev->entries_nr) {
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 1/3] KVM: Emulation MSI-X mask bits for assigned devices
2010-09-28 9:44 ` [PATCH 1/3] KVM: Emulation " Sheng Yang
2010-09-28 13:36 ` Michael S. Tsirkin
@ 2010-09-29 8:36 ` Avi Kivity
2010-09-30 5:41 ` Sheng Yang
2010-09-30 16:18 ` Marcelo Tosatti
2010-10-03 11:12 ` Michael S. Tsirkin
3 siblings, 1 reply; 26+ messages in thread
From: Avi Kivity @ 2010-09-29 8:36 UTC (permalink / raw)
To: Sheng Yang; +Cc: Marcelo Tosatti, kvm
On 09/28/2010 11:44 AM, Sheng Yang wrote:
> This patch enable per-vector mask for assigned devices using MSI-X.
>
> +
> struct kvm_assigned_msix_entry {
> __u32 assigned_dev_id;
> __u32 gsi;
> __u16 entry; /* The index of entry in the MSI-X table */
> - __u16 padding[3];
> + __u16 flags;
> + __u16 padding[2];
> };
>
Given that this field wasn't a flag field previously, we can't expect it
to be zero. So before we can check it, userspace needs to tell us that
is knows the field is not padding, but a flags field.
You can use KVM_ENABLE_CAP for that.
--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 1/3] KVM: Emulation MSI-X mask bits for assigned devices
2010-09-29 8:36 ` Avi Kivity
@ 2010-09-30 5:41 ` Sheng Yang
2010-10-03 13:03 ` Avi Kivity
0 siblings, 1 reply; 26+ messages in thread
From: Sheng Yang @ 2010-09-30 5:41 UTC (permalink / raw)
To: Avi Kivity; +Cc: Marcelo Tosatti, kvm
On Wednesday 29 September 2010 16:36:25 Avi Kivity wrote:
> On 09/28/2010 11:44 AM, Sheng Yang wrote:
> > This patch enable per-vector mask for assigned devices using MSI-X.
> >
> > +
> >
> > struct kvm_assigned_msix_entry {
> >
> > __u32 assigned_dev_id;
> > __u32 gsi;
> > __u16 entry; /* The index of entry in the MSI-X table */
> >
> > - __u16 padding[3];
> > + __u16 flags;
> > + __u16 padding[2];
> >
> > };
>
> Given that this field wasn't a flag field previously, we can't expect it
> to be zero. So before we can check it, userspace needs to tell us that
> is knows the field is not padding, but a flags field.
>
> You can use KVM_ENABLE_CAP for that.
OK, I would use it. And it's per vcpu ioctl though, I would like to use it on
setting for per vm.
P.S. I would be offline till Oct.11 due to our holiday.
--
regards
Yang, Sheng
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 1/3] KVM: Emulation MSI-X mask bits for assigned devices
2010-09-28 9:44 ` [PATCH 1/3] KVM: Emulation " Sheng Yang
2010-09-28 13:36 ` Michael S. Tsirkin
2010-09-29 8:36 ` Avi Kivity
@ 2010-09-30 16:18 ` Marcelo Tosatti
2010-10-11 9:49 ` Sheng Yang
2010-10-03 11:12 ` Michael S. Tsirkin
3 siblings, 1 reply; 26+ messages in thread
From: Marcelo Tosatti @ 2010-09-30 16:18 UTC (permalink / raw)
To: Sheng Yang; +Cc: Avi Kivity, kvm
On Tue, Sep 28, 2010 at 05:44:10PM +0800, Sheng Yang wrote:
> This patch enable per-vector mask for assigned devices using MSI-X.
>
> Signed-off-by: Sheng Yang <sheng@linux.intel.com>
> ---
> arch/x86/kvm/x86.c | 1 +
> include/linux/kvm.h | 9 ++++++++-
> include/linux/kvm_host.h | 1 +
> virt/kvm/assigned-dev.c | 39 +++++++++++++++++++++++++++++++++++++++
> 4 files changed, 49 insertions(+), 1 deletions(-)
>
> #include <linux/slab.h>
> +#include <linux/irqnr.h>
> +
> #include "irq.h"
>
> static struct kvm_assigned_dev_kernel *kvm_find_assigned_dev(struct list_head *head,
> @@ -666,6 +668,30 @@ msix_nr_out:
> return r;
> }
>
> +static void update_msix_mask(struct kvm_assigned_dev_kernel *assigned_dev,
> + int index)
> +{
> + int irq;
> +
> + if (!assigned_dev->dev->msix_enabled ||
> + !(assigned_dev->irq_requested_type & KVM_DEV_IRQ_HOST_MSIX))
> + return;
> +
> + irq = assigned_dev->host_msix_entries[index].vector;
> +
> + ASSERT(irq != 0);
> +
> + if (assigned_dev->guest_msix_entries[index].flags &
> + KVM_ASSIGNED_MSIX_MASK)
> + disable_irq(irq);
> + else {
> + enable_irq(irq);
> + if (assigned_dev->guest_msix_entries[index].flags &
> + KVM_ASSIGNED_MSIX_PENDING)
> + schedule_work(&assigned_dev->interrupt_work);
> + }
> +}
> +
Should flush the workqueue after disabling irq. As you say, the
schedule_work should be unnecessary.
Also must be careful with races.
> static int kvm_vm_ioctl_set_msix_entry(struct kvm *kvm,
> struct kvm_assigned_msix_entry *entry)
> {
> @@ -688,6 +714,19 @@ static int kvm_vm_ioctl_set_msix_entry(struct kvm *kvm,
> adev->guest_msix_entries[i].entry = entry->entry;
> adev->guest_msix_entries[i].vector = entry->gsi;
> adev->host_msix_entries[i].entry = entry->entry;
> + if ((entry->flags & KVM_MSIX_FLAG_MASK) &&
> + !(adev->guest_msix_entries[i].flags &
> + KVM_ASSIGNED_MSIX_MASK)) {
> + adev->guest_msix_entries[i].flags |=
> + KVM_ASSIGNED_MSIX_MASK;
> + update_msix_mask(adev, i);
> + } else if (!(entry->flags & KVM_MSIX_FLAG_MASK) &&
> + (adev->guest_msix_entries[i].flags &
> + KVM_ASSIGNED_MSIX_MASK)) {
> + adev->guest_msix_entries[i].flags &=
> + ~KVM_ASSIGNED_MSIX_MASK;
> + update_msix_mask(adev, i);
> + }
> break;
> }
> if (i == adev->entries_nr) {
> --
> 1.7.0.1
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 1/3] KVM: Emulation MSI-X mask bits for assigned devices
2010-09-29 1:17 ` Sheng Yang
@ 2010-09-30 16:25 ` Marcelo Tosatti
0 siblings, 0 replies; 26+ messages in thread
From: Marcelo Tosatti @ 2010-09-30 16:25 UTC (permalink / raw)
To: Sheng Yang; +Cc: Michael S. Tsirkin, Avi Kivity, kvm
On Wed, Sep 29, 2010 at 09:17:14AM +0800, Sheng Yang wrote:
> > > + else {
> > > + enable_irq(irq);
> > > + if (assigned_dev->guest_msix_entries[index].flags &
> > > + KVM_ASSIGNED_MSIX_PENDING)
> > > + schedule_work(&assigned_dev->interrupt_work);
> > > + }
> >
> > Hmm, won't this lose interrupts which were sent while bit was pending?
> > It is also pretty heavy if as you say guests touch the mask a lot.
> > I think we must keep the interrupt disabled, just set a bit
> > and delay interrupt injection until vector is unmasked
> > or deleted. The interface to do this will need more thought:
> > e.g. how can userspace clear this bit then?
>
> I think it's fine. Because we didn't modify pending bit here, and the interrupt
> handler would schedule an work to check it regardless of if the IRQ is disable. By
> this meaning, no interrupt would be lose.
AFAICS unmasking the host irq will trigger pending message. If thats
correct, it should be fine.
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 0/3] Emulate MSI-X mask bits for assigned devices
2010-09-28 9:44 [PATCH 0/3] Emulate MSI-X mask bits for assigned devices Sheng Yang
` (2 preceding siblings ...)
2010-09-28 9:44 ` [PATCH 3/3] qemu-kvm: device assignment: emulate MSI-X mask bits Sheng Yang
@ 2010-09-30 16:44 ` Marcelo Tosatti
2010-10-11 9:34 ` Sheng Yang
3 siblings, 1 reply; 26+ messages in thread
From: Marcelo Tosatti @ 2010-09-30 16:44 UTC (permalink / raw)
To: Sheng Yang; +Cc: Avi Kivity, kvm
On Tue, Sep 28, 2010 at 05:44:09PM +0800, Sheng Yang wrote:
> Hi Avi & Marcelo
>
> This patchset would add emulation of MSI-X mask bit for assigned devices in
> QEmu.
>
> BTW: We are also purposed an acceleration of MSI-X mask bit for KVM - to get
> it done in kernel. That's because sometime MSI-X mask bit was accessed very
> frequently by guest and emulation in QEmu cost a lot(tens to hundreds percent
> CPU utilization sometime), e.g. guest using Linux 2.6.18 or under some heavy
> workload. The method has been proved efficient in Xen, but it would require
> VMM to handle MSI-X table. Is that OK with you?
It would be good to understand where that cost is, before. Eg. do you
commit kvm irq route frequently when guest unmasks?
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 1/3] KVM: Emulation MSI-X mask bits for assigned devices
2010-09-28 9:44 ` [PATCH 1/3] KVM: Emulation " Sheng Yang
` (2 preceding siblings ...)
2010-09-30 16:18 ` Marcelo Tosatti
@ 2010-10-03 11:12 ` Michael S. Tsirkin
2010-10-11 9:28 ` Sheng Yang
3 siblings, 1 reply; 26+ messages in thread
From: Michael S. Tsirkin @ 2010-10-03 11:12 UTC (permalink / raw)
To: Sheng Yang; +Cc: Avi Kivity, Marcelo Tosatti, kvm
On Tue, Sep 28, 2010 at 05:44:10PM +0800, Sheng Yang wrote:
> This patch enable per-vector mask for assigned devices using MSI-X.
>
> Signed-off-by: Sheng Yang <sheng@linux.intel.com>
I think I see an issue here, noted below. Some general comments:
- The mask bit seems broken for injecting interrupts from
userspace (with interrupts/ioctls).
I think we must test it on injection path.
- We'll need a way to support the pending bit.
Disabling interrupts will not let us do it.
> ---
> arch/x86/kvm/x86.c | 1 +
> include/linux/kvm.h | 9 ++++++++-
> include/linux/kvm_host.h | 1 +
> virt/kvm/assigned-dev.c | 39 +++++++++++++++++++++++++++++++++++++++
> 4 files changed, 49 insertions(+), 1 deletions(-)
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 8412c91..e6933e6 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -1927,6 +1927,7 @@ int kvm_dev_ioctl_check_extension(long ext)
> case KVM_CAP_DEBUGREGS:
> case KVM_CAP_X86_ROBUST_SINGLESTEP:
> case KVM_CAP_XSAVE:
> + case KVM_CAP_DEVICE_MSIX_MASK:
> r = 1;
> break;
> case KVM_CAP_COALESCED_MMIO:
> diff --git a/include/linux/kvm.h b/include/linux/kvm.h
> index 919ae53..f2b7cdc 100644
> --- a/include/linux/kvm.h
> +++ b/include/linux/kvm.h
> @@ -540,6 +540,9 @@ struct kvm_ppc_pvinfo {
> #endif
> #define KVM_CAP_PPC_GET_PVINFO 57
> #define KVM_CAP_PPC_IRQ_LEVEL 58
> +#ifdef __KVM_HAVE_MSIX
> +#define KVM_CAP_DEVICE_MSIX_MASK 59
> +#endif
>
> #ifdef KVM_CAP_IRQ_ROUTING
>
> @@ -787,11 +790,15 @@ struct kvm_assigned_msix_nr {
> };
>
> #define KVM_MAX_MSIX_PER_DEV 256
> +
> +#define KVM_MSIX_FLAG_MASK 1
> +
> struct kvm_assigned_msix_entry {
> __u32 assigned_dev_id;
> __u32 gsi;
> __u16 entry; /* The index of entry in the MSI-X table */
> - __u16 padding[3];
> + __u16 flags;
> + __u16 padding[2];
> };
>
> #endif /* __LINUX_KVM_H */
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 0b89d00..a43405c 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -415,6 +415,7 @@ struct kvm_irq_ack_notifier {
> };
>
> #define KVM_ASSIGNED_MSIX_PENDING 0x1
> +#define KVM_ASSIGNED_MSIX_MASK 0x2
> struct kvm_guest_msix_entry {
> u32 vector;
> u16 entry;
> diff --git a/virt/kvm/assigned-dev.c b/virt/kvm/assigned-dev.c
> index 7c98928..15b8c32 100644
> --- a/virt/kvm/assigned-dev.c
> +++ b/virt/kvm/assigned-dev.c
> @@ -17,6 +17,8 @@
> #include <linux/pci.h>
> #include <linux/interrupt.h>
> #include <linux/slab.h>
> +#include <linux/irqnr.h>
> +
> #include "irq.h"
>
> static struct kvm_assigned_dev_kernel *kvm_find_assigned_dev(struct list_head *head,
> @@ -666,6 +668,30 @@ msix_nr_out:
> return r;
> }
>
> +static void update_msix_mask(struct kvm_assigned_dev_kernel *assigned_dev,
> + int index)
> +{
> + int irq;
> +
> + if (!assigned_dev->dev->msix_enabled ||
> + !(assigned_dev->irq_requested_type & KVM_DEV_IRQ_HOST_MSIX))
> + return;
> +
> + irq = assigned_dev->host_msix_entries[index].vector;
> +
> + ASSERT(irq != 0);
> +
> + if (assigned_dev->guest_msix_entries[index].flags &
> + KVM_ASSIGNED_MSIX_MASK)
> + disable_irq(irq);
> + else {
> + enable_irq(irq);
> + if (assigned_dev->guest_msix_entries[index].flags &
> + KVM_ASSIGNED_MSIX_PENDING)
> + schedule_work(&assigned_dev->interrupt_work);
> + }
> +}
> +
What happens if guest masks an entry and then we hot-unplug the
device and remove it from guest? It looks like interrupt
will stay disabled?
> static int kvm_vm_ioctl_set_msix_entry(struct kvm *kvm,
> struct kvm_assigned_msix_entry *entry)
> {
> @@ -688,6 +714,19 @@ static int kvm_vm_ioctl_set_msix_entry(struct kvm *kvm,
> adev->guest_msix_entries[i].entry = entry->entry;
> adev->guest_msix_entries[i].vector = entry->gsi;
> adev->host_msix_entries[i].entry = entry->entry;
> + if ((entry->flags & KVM_MSIX_FLAG_MASK) &&
> + !(adev->guest_msix_entries[i].flags &
> + KVM_ASSIGNED_MSIX_MASK)) {
> + adev->guest_msix_entries[i].flags |=
> + KVM_ASSIGNED_MSIX_MASK;
> + update_msix_mask(adev, i);
> + } else if (!(entry->flags & KVM_MSIX_FLAG_MASK) &&
> + (adev->guest_msix_entries[i].flags &
> + KVM_ASSIGNED_MSIX_MASK)) {
> + adev->guest_msix_entries[i].flags &=
> + ~KVM_ASSIGNED_MSIX_MASK;
> + update_msix_mask(adev, i);
> + }
> break;
> }
> if (i == adev->entries_nr) {
> --
> 1.7.0.1
>
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 1/3] KVM: Emulation MSI-X mask bits for assigned devices
2010-09-30 5:41 ` Sheng Yang
@ 2010-10-03 13:03 ` Avi Kivity
2010-10-11 9:32 ` Sheng Yang
0 siblings, 1 reply; 26+ messages in thread
From: Avi Kivity @ 2010-10-03 13:03 UTC (permalink / raw)
To: Sheng Yang; +Cc: Marcelo Tosatti, kvm
On 09/30/2010 07:41 AM, Sheng Yang wrote:
> On Wednesday 29 September 2010 16:36:25 Avi Kivity wrote:
> > On 09/28/2010 11:44 AM, Sheng Yang wrote:
> > > This patch enable per-vector mask for assigned devices using MSI-X.
> > >
> > > +
> > >
> > > struct kvm_assigned_msix_entry {
> > >
> > > __u32 assigned_dev_id;
> > > __u32 gsi;
> > > __u16 entry; /* The index of entry in the MSI-X table */
> > >
> > > - __u16 padding[3];
> > > + __u16 flags;
> > > + __u16 padding[2];
> > >
> > > };
> >
> > Given that this field wasn't a flag field previously, we can't expect it
> > to be zero. So before we can check it, userspace needs to tell us that
> > is knows the field is not padding, but a flags field.
> >
> > You can use KVM_ENABLE_CAP for that.
>
> OK, I would use it. And it's per vcpu ioctl though, I would like to use it on
> setting for per vm.
Yes. It's easier to have KVM_ASSIGN_SET_MSIX_ENTRY2 that knows about
the flags field.
btw, this should be documented in Documentation/kvm/api.txt.
--
error compiling committee.c: too many arguments to function
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 1/3] KVM: Emulation MSI-X mask bits for assigned devices
2010-10-03 11:12 ` Michael S. Tsirkin
@ 2010-10-11 9:28 ` Sheng Yang
2010-10-11 10:01 ` Michael S. Tsirkin
0 siblings, 1 reply; 26+ messages in thread
From: Sheng Yang @ 2010-10-11 9:28 UTC (permalink / raw)
To: Michael S. Tsirkin; +Cc: Avi Kivity, Marcelo Tosatti, kvm
On Sunday 03 October 2010 19:12:47 Michael S. Tsirkin wrote:
> On Tue, Sep 28, 2010 at 05:44:10PM +0800, Sheng Yang wrote:
> > This patch enable per-vector mask for assigned devices using MSI-X.
> >
> > Signed-off-by: Sheng Yang <sheng@linux.intel.com>
>
> I think I see an issue here, noted below. Some general comments:
> - The mask bit seems broken for injecting interrupts from
> userspace (with interrupts/ioctls).
> I think we must test it on injection path.
I am not quite understand how it related to userspace interrupt injection here...
This patch only cover assigned devices for now.
> - We'll need a way to support the pending bit.
> Disabling interrupts will not let us do it.
We may need a way to support pending bit, though I don't know which guest has used
it... And we can still know if there is interrupt pending by check the real
hardware's pending bit if it's necessary. (And we haven't seen any problem by
leaving the bit 0 so far, and it's not in this patch's scope.)
> > ---
> >
> > arch/x86/kvm/x86.c | 1 +
> > include/linux/kvm.h | 9 ++++++++-
> > include/linux/kvm_host.h | 1 +
> > virt/kvm/assigned-dev.c | 39
+++++++++++++++++++++++++++++++++++++++
> > 4 files changed, 49 insertions(+), 1 deletions(-)
> >
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index 8412c91..e6933e6 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -1927,6 +1927,7 @@ int kvm_dev_ioctl_check_extension(long ext)
> >
> > case KVM_CAP_DEBUGREGS:
> > case KVM_CAP_X86_ROBUST_SINGLESTEP:
> >
> > case KVM_CAP_XSAVE:
> > + case KVM_CAP_DEVICE_MSIX_MASK:
> > r = 1;
> > break;
> >
> > case KVM_CAP_COALESCED_MMIO:
> > diff --git a/include/linux/kvm.h b/include/linux/kvm.h
> > index 919ae53..f2b7cdc 100644
> > --- a/include/linux/kvm.h
> > +++ b/include/linux/kvm.h
> > @@ -540,6 +540,9 @@ struct kvm_ppc_pvinfo {
> >
> > #endif
> > #define KVM_CAP_PPC_GET_PVINFO 57
> > #define KVM_CAP_PPC_IRQ_LEVEL 58
> >
> > +#ifdef __KVM_HAVE_MSIX
> > +#define KVM_CAP_DEVICE_MSIX_MASK 59
> > +#endif
> >
> > #ifdef KVM_CAP_IRQ_ROUTING
> >
> > @@ -787,11 +790,15 @@ struct kvm_assigned_msix_nr {
> >
> > };
> >
> > #define KVM_MAX_MSIX_PER_DEV 256
> >
> > +
> > +#define KVM_MSIX_FLAG_MASK 1
> > +
> >
> > struct kvm_assigned_msix_entry {
> >
> > __u32 assigned_dev_id;
> > __u32 gsi;
> > __u16 entry; /* The index of entry in the MSI-X table */
> >
> > - __u16 padding[3];
> > + __u16 flags;
> > + __u16 padding[2];
> >
> > };
> >
> > #endif /* __LINUX_KVM_H */
> >
> > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> > index 0b89d00..a43405c 100644
> > --- a/include/linux/kvm_host.h
> > +++ b/include/linux/kvm_host.h
> > @@ -415,6 +415,7 @@ struct kvm_irq_ack_notifier {
> >
> > };
> >
> > #define KVM_ASSIGNED_MSIX_PENDING 0x1
> >
> > +#define KVM_ASSIGNED_MSIX_MASK 0x2
> >
> > struct kvm_guest_msix_entry {
> >
> > u32 vector;
> > u16 entry;
> >
> > diff --git a/virt/kvm/assigned-dev.c b/virt/kvm/assigned-dev.c
> > index 7c98928..15b8c32 100644
> > --- a/virt/kvm/assigned-dev.c
> > +++ b/virt/kvm/assigned-dev.c
> > @@ -17,6 +17,8 @@
> >
> > #include <linux/pci.h>
> > #include <linux/interrupt.h>
> > #include <linux/slab.h>
> >
> > +#include <linux/irqnr.h>
> > +
> >
> > #include "irq.h"
> >
> > static struct kvm_assigned_dev_kernel *kvm_find_assigned_dev(struct
> > list_head *head,
> >
> > @@ -666,6 +668,30 @@ msix_nr_out:
> > return r;
> >
> > }
> >
> > +static void update_msix_mask(struct kvm_assigned_dev_kernel
> > *assigned_dev, + int index)
> > +{
> > + int irq;
> > +
> > + if (!assigned_dev->dev->msix_enabled ||
> > + !(assigned_dev->irq_requested_type & KVM_DEV_IRQ_HOST_MSIX))
> > + return;
> > +
> > + irq = assigned_dev->host_msix_entries[index].vector;
> > +
> > + ASSERT(irq != 0);
> > +
> > + if (assigned_dev->guest_msix_entries[index].flags &
> > + KVM_ASSIGNED_MSIX_MASK)
> > + disable_irq(irq);
> > + else {
> > + enable_irq(irq);
> > + if (assigned_dev->guest_msix_entries[index].flags &
> > + KVM_ASSIGNED_MSIX_PENDING)
> > + schedule_work(&assigned_dev->interrupt_work);
> > + }
> > +}
> > +
>
> What happens if guest masks an entry and then we hot-unplug the
> device and remove it from guest? It looks like interrupt
> will stay disabled?
I don't think so. pci_disable_msix() which was called in hot-unplug path would
recycle all IRQs used by the device. It should be the same as VM shutdown.
Also before the IRQ was recycled, I believe the same dynamic IRQ wouldn't be used
by other devices.
--
regards
Yang, Sheng
>
> > static int kvm_vm_ioctl_set_msix_entry(struct kvm *kvm,
> >
> > struct kvm_assigned_msix_entry *entry)
> >
> > {
> >
> > @@ -688,6 +714,19 @@ static int kvm_vm_ioctl_set_msix_entry(struct kvm
> > *kvm,
> >
> > adev->guest_msix_entries[i].entry = entry->entry;
> > adev->guest_msix_entries[i].vector = entry->gsi;
> > adev->host_msix_entries[i].entry = entry->entry;
> >
> > + if ((entry->flags & KVM_MSIX_FLAG_MASK) &&
> > + !(adev->guest_msix_entries[i].flags &
> > + KVM_ASSIGNED_MSIX_MASK)) {
> > + adev->guest_msix_entries[i].flags |=
> > + KVM_ASSIGNED_MSIX_MASK;
> > + update_msix_mask(adev, i);
> > + } else if (!(entry->flags & KVM_MSIX_FLAG_MASK) &&
> > + (adev->guest_msix_entries[i].flags &
> > + KVM_ASSIGNED_MSIX_MASK)) {
> > + adev->guest_msix_entries[i].flags &=
> > + ~KVM_ASSIGNED_MSIX_MASK;
> > + update_msix_mask(adev, i);
> > + }
> >
> > break;
> >
> > }
> >
> > if (i == adev->entries_nr) {
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 1/3] KVM: Emulation MSI-X mask bits for assigned devices
2010-10-03 13:03 ` Avi Kivity
@ 2010-10-11 9:32 ` Sheng Yang
0 siblings, 0 replies; 26+ messages in thread
From: Sheng Yang @ 2010-10-11 9:32 UTC (permalink / raw)
To: Avi Kivity; +Cc: Marcelo Tosatti, kvm
On Sunday 03 October 2010 21:03:16 Avi Kivity wrote:
> On 09/30/2010 07:41 AM, Sheng Yang wrote:
> > On Wednesday 29 September 2010 16:36:25 Avi Kivity wrote:
> > > On 09/28/2010 11:44 AM, Sheng Yang wrote:
> > > > This patch enable per-vector mask for assigned devices using MSI-X.
> > > >
> > > > +
> > > >
> > > > struct kvm_assigned_msix_entry {
> > > >
> > > > __u32 assigned_dev_id;
> > > > __u32 gsi;
> > > > __u16 entry; /* The index of entry in the MSI-X table */
> > > >
> > > > - __u16 padding[3];
> > > > + __u16 flags;
> > > > + __u16 padding[2];
> > > >
> > > > };
> > >
> > > Given that this field wasn't a flag field previously, we can't expect
> > > it to be zero. So before we can check it, userspace needs to tell us
> > > that is knows the field is not padding, but a flags field.
> > >
> > > You can use KVM_ENABLE_CAP for that.
> >
> > OK, I would use it. And it's per vcpu ioctl though, I would like to use
> > it on setting for per vm.
>
> Yes. It's easier to have KVM_ASSIGN_SET_MSIX_ENTRY2 that knows about
> the flags field.
Well, maybe it's a waste when we still have room in old one...
I think KVM_ENABLE_CAP maybe good enough.
>
> btw, this should be documented in Documentation/kvm/api.txt.
OK.
--
regards
Yang, Sheng
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 0/3] Emulate MSI-X mask bits for assigned devices
2010-09-30 16:44 ` [PATCH 0/3] Emulate MSI-X mask bits for assigned devices Marcelo Tosatti
@ 2010-10-11 9:34 ` Sheng Yang
0 siblings, 0 replies; 26+ messages in thread
From: Sheng Yang @ 2010-10-11 9:34 UTC (permalink / raw)
To: Marcelo Tosatti; +Cc: Avi Kivity, kvm
On Friday 01 October 2010 00:44:51 Marcelo Tosatti wrote:
> On Tue, Sep 28, 2010 at 05:44:09PM +0800, Sheng Yang wrote:
> > Hi Avi & Marcelo
> >
> > This patchset would add emulation of MSI-X mask bit for assigned devices
> > in QEmu.
> >
> > BTW: We are also purposed an acceleration of MSI-X mask bit for KVM - to
> > get it done in kernel. That's because sometime MSI-X mask bit was
> > accessed very frequently by guest and emulation in QEmu cost a lot(tens
> > to hundreds percent CPU utilization sometime), e.g. guest using Linux
> > 2.6.18 or under some heavy workload. The method has been proved
> > efficient in Xen, but it would require VMM to handle MSI-X table. Is
> > that OK with you?
>
> It would be good to understand where that cost is, before. Eg. do you
> commit kvm irq route frequently when guest unmasks?
I don't think it would cause performance issue, but maybe need another bulk of
codes in kernel. I would post the in-kernel patch later, which is easier for
discuss. :)
--
regards
Yang, Sheng
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 1/3] KVM: Emulation MSI-X mask bits for assigned devices
2010-09-30 16:18 ` Marcelo Tosatti
@ 2010-10-11 9:49 ` Sheng Yang
2010-10-11 17:11 ` Marcelo Tosatti
0 siblings, 1 reply; 26+ messages in thread
From: Sheng Yang @ 2010-10-11 9:49 UTC (permalink / raw)
To: Marcelo Tosatti; +Cc: Avi Kivity, kvm
On Friday 01 October 2010 00:18:10 Marcelo Tosatti wrote:
> On Tue, Sep 28, 2010 at 05:44:10PM +0800, Sheng Yang wrote:
> > This patch enable per-vector mask for assigned devices using MSI-X.
> >
> > Signed-off-by: Sheng Yang <sheng@linux.intel.com>
> > ---
> >
> > arch/x86/kvm/x86.c | 1 +
> > include/linux/kvm.h | 9 ++++++++-
> > include/linux/kvm_host.h | 1 +
> > virt/kvm/assigned-dev.c | 39
+++++++++++++++++++++++++++++++++++++++
> > 4 files changed, 49 insertions(+), 1 deletions(-)
> >
> >
> > #include <linux/slab.h>
> >
> > +#include <linux/irqnr.h>
> > +
> >
> > #include "irq.h"
> >
> > static struct kvm_assigned_dev_kernel *kvm_find_assigned_dev(struct
> > list_head *head,
> >
> > @@ -666,6 +668,30 @@ msix_nr_out:
> > return r;
> >
> > }
> >
> > +static void update_msix_mask(struct kvm_assigned_dev_kernel
> > *assigned_dev, + int index)
> > +{
> > + int irq;
> > +
> > + if (!assigned_dev->dev->msix_enabled ||
> > + !(assigned_dev->irq_requested_type & KVM_DEV_IRQ_HOST_MSIX))
> > + return;
> > +
> > + irq = assigned_dev->host_msix_entries[index].vector;
> > +
> > + ASSERT(irq != 0);
> > +
> > + if (assigned_dev->guest_msix_entries[index].flags &
> > + KVM_ASSIGNED_MSIX_MASK)
> > + disable_irq(irq);
> > + else {
> > + enable_irq(irq);
> > + if (assigned_dev->guest_msix_entries[index].flags &
> > + KVM_ASSIGNED_MSIX_PENDING)
> > + schedule_work(&assigned_dev->interrupt_work);
> > + }
> > +}
> > +
>
> Should flush the workqueue after disabling irq. As you say, the
> schedule_work should be unnecessary.
>
> Also must be careful with races.
Do we need to flush the workqueue? I think the return of writing MSI-X mask bit
doesn't have implicit meaning that we have handled all pending interrupts. So I
think leave the work later should also be fine.
--
regards
Yang, Sheng
> > static int kvm_vm_ioctl_set_msix_entry(struct kvm *kvm,
> >
> > struct kvm_assigned_msix_entry *entry)
> >
> > {
> >
> > @@ -688,6 +714,19 @@ static int kvm_vm_ioctl_set_msix_entry(struct kvm
> > *kvm,
> >
> > adev->guest_msix_entries[i].entry = entry->entry;
> > adev->guest_msix_entries[i].vector = entry->gsi;
> > adev->host_msix_entries[i].entry = entry->entry;
> >
> > + if ((entry->flags & KVM_MSIX_FLAG_MASK) &&
> > + !(adev->guest_msix_entries[i].flags &
> > + KVM_ASSIGNED_MSIX_MASK)) {
> > + adev->guest_msix_entries[i].flags |=
> > + KVM_ASSIGNED_MSIX_MASK;
> > + update_msix_mask(adev, i);
> > + } else if (!(entry->flags & KVM_MSIX_FLAG_MASK) &&
> > + (adev->guest_msix_entries[i].flags &
> > + KVM_ASSIGNED_MSIX_MASK)) {
> > + adev->guest_msix_entries[i].flags &=
> > + ~KVM_ASSIGNED_MSIX_MASK;
> > + update_msix_mask(adev, i);
> > + }
> >
> > break;
> >
> > }
> >
> > if (i == adev->entries_nr) {
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 1/3] KVM: Emulation MSI-X mask bits for assigned devices
2010-10-11 9:28 ` Sheng Yang
@ 2010-10-11 10:01 ` Michael S. Tsirkin
2010-10-12 6:49 ` Sheng Yang
0 siblings, 1 reply; 26+ messages in thread
From: Michael S. Tsirkin @ 2010-10-11 10:01 UTC (permalink / raw)
To: Sheng Yang; +Cc: Avi Kivity, Marcelo Tosatti, kvm
On Mon, Oct 11, 2010 at 05:28:30PM +0800, Sheng Yang wrote:
> On Sunday 03 October 2010 19:12:47 Michael S. Tsirkin wrote:
> > On Tue, Sep 28, 2010 at 05:44:10PM +0800, Sheng Yang wrote:
> > > This patch enable per-vector mask for assigned devices using MSI-X.
> > >
> > > Signed-off-by: Sheng Yang <sheng@linux.intel.com>
> >
> > I think I see an issue here, noted below. Some general comments:
> > - The mask bit seems broken for injecting interrupts from
> > userspace (with interrupts/ioctls).
> > I think we must test it on injection path.
>
> I am not quite understand how it related to userspace interrupt injection here...
> This patch only cover assigned devices for now.
Well, this is a kernel/userspace interface, if it's broken for userspace
injection now we'll have to go through pain to fix it in a compatible
way later when we want to use it for userspace injection.
You might want to ask why we want the kernel to support making
userspace-injected interrupts when userspace can just avoid injecting
them, and the answer would be that with irqfd the injection might be
handled in a separate process.
We currently handle this by destroying irqfd when irq is masked,
an ioctl instead would be much faster.
> > - We'll need a way to support the pending bit.
> > Disabling interrupts will not let us do it.
>
> We may need a way to support pending bit, though I don't know which guest has used
> it... And we can still know if there is interrupt pending by check the real
> hardware's pending bit if it's necessary.
That's what I'm saying: since instead of masking the vector in hardware
you disable irq in the APIC, the pending bit that we read from hardware
will not have the correct value.
If we fix this, pending bit handling can be done by userspace.
> (And we haven't seen any problem by
> leaving the bit 0 so far, and it's not in this patch's scope.)
I don't know about anyone using this, either, but the PCI spec does
require support of polling mode where the pending bit is polled instead
of interrupts. So yes, not a high priority to implement, but let's give
the way we intend to support this in the future some thought.
> > > ---
> > >
> > > arch/x86/kvm/x86.c | 1 +
> > > include/linux/kvm.h | 9 ++++++++-
> > > include/linux/kvm_host.h | 1 +
> > > virt/kvm/assigned-dev.c | 39
> +++++++++++++++++++++++++++++++++++++++
> > > 4 files changed, 49 insertions(+), 1 deletions(-)
> > >
> > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > > index 8412c91..e6933e6 100644
> > > --- a/arch/x86/kvm/x86.c
> > > +++ b/arch/x86/kvm/x86.c
> > > @@ -1927,6 +1927,7 @@ int kvm_dev_ioctl_check_extension(long ext)
> > >
> > > case KVM_CAP_DEBUGREGS:
> > > case KVM_CAP_X86_ROBUST_SINGLESTEP:
> > >
> > > case KVM_CAP_XSAVE:
> > > + case KVM_CAP_DEVICE_MSIX_MASK:
> > > r = 1;
> > > break;
> > >
> > > case KVM_CAP_COALESCED_MMIO:
> > > diff --git a/include/linux/kvm.h b/include/linux/kvm.h
> > > index 919ae53..f2b7cdc 100644
> > > --- a/include/linux/kvm.h
> > > +++ b/include/linux/kvm.h
> > > @@ -540,6 +540,9 @@ struct kvm_ppc_pvinfo {
> > >
> > > #endif
> > > #define KVM_CAP_PPC_GET_PVINFO 57
> > > #define KVM_CAP_PPC_IRQ_LEVEL 58
> > >
> > > +#ifdef __KVM_HAVE_MSIX
> > > +#define KVM_CAP_DEVICE_MSIX_MASK 59
> > > +#endif
> > >
> > > #ifdef KVM_CAP_IRQ_ROUTING
> > >
> > > @@ -787,11 +790,15 @@ struct kvm_assigned_msix_nr {
> > >
> > > };
> > >
> > > #define KVM_MAX_MSIX_PER_DEV 256
> > >
> > > +
> > > +#define KVM_MSIX_FLAG_MASK 1
> > > +
> > >
> > > struct kvm_assigned_msix_entry {
> > >
> > > __u32 assigned_dev_id;
> > > __u32 gsi;
> > > __u16 entry; /* The index of entry in the MSI-X table */
> > >
> > > - __u16 padding[3];
> > > + __u16 flags;
> > > + __u16 padding[2];
> > >
> > > };
> > >
> > > #endif /* __LINUX_KVM_H */
> > >
> > > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> > > index 0b89d00..a43405c 100644
> > > --- a/include/linux/kvm_host.h
> > > +++ b/include/linux/kvm_host.h
> > > @@ -415,6 +415,7 @@ struct kvm_irq_ack_notifier {
> > >
> > > };
> > >
> > > #define KVM_ASSIGNED_MSIX_PENDING 0x1
> > >
> > > +#define KVM_ASSIGNED_MSIX_MASK 0x2
> > >
> > > struct kvm_guest_msix_entry {
> > >
> > > u32 vector;
> > > u16 entry;
> > >
> > > diff --git a/virt/kvm/assigned-dev.c b/virt/kvm/assigned-dev.c
> > > index 7c98928..15b8c32 100644
> > > --- a/virt/kvm/assigned-dev.c
> > > +++ b/virt/kvm/assigned-dev.c
> > > @@ -17,6 +17,8 @@
> > >
> > > #include <linux/pci.h>
> > > #include <linux/interrupt.h>
> > > #include <linux/slab.h>
> > >
> > > +#include <linux/irqnr.h>
> > > +
> > >
> > > #include "irq.h"
> > >
> > > static struct kvm_assigned_dev_kernel *kvm_find_assigned_dev(struct
> > > list_head *head,
> > >
> > > @@ -666,6 +668,30 @@ msix_nr_out:
> > > return r;
> > >
> > > }
> > >
> > > +static void update_msix_mask(struct kvm_assigned_dev_kernel
> > > *assigned_dev, + int index)
> > > +{
> > > + int irq;
> > > +
> > > + if (!assigned_dev->dev->msix_enabled ||
> > > + !(assigned_dev->irq_requested_type & KVM_DEV_IRQ_HOST_MSIX))
> > > + return;
> > > +
> > > + irq = assigned_dev->host_msix_entries[index].vector;
> > > +
> > > + ASSERT(irq != 0);
> > > +
> > > + if (assigned_dev->guest_msix_entries[index].flags &
> > > + KVM_ASSIGNED_MSIX_MASK)
> > > + disable_irq(irq);
> > > + else {
> > > + enable_irq(irq);
> > > + if (assigned_dev->guest_msix_entries[index].flags &
> > > + KVM_ASSIGNED_MSIX_PENDING)
> > > + schedule_work(&assigned_dev->interrupt_work);
> > > + }
> > > +}
> > > +
> >
> > What happens if guest masks an entry and then we hot-unplug the
> > device and remove it from guest? It looks like interrupt
> > will stay disabled?
>
> I don't think so. pci_disable_msix() which was called in hot-unplug path would
> recycle all IRQs used by the device. It should be the same as VM shutdown.
>
> Also before the IRQ was recycled, I believe the same dynamic IRQ wouldn't be used
> by other devices.
>
> --
> regards
> Yang, Sheng
>
> >
> > > static int kvm_vm_ioctl_set_msix_entry(struct kvm *kvm,
> > >
> > > struct kvm_assigned_msix_entry *entry)
> > >
> > > {
> > >
> > > @@ -688,6 +714,19 @@ static int kvm_vm_ioctl_set_msix_entry(struct kvm
> > > *kvm,
> > >
> > > adev->guest_msix_entries[i].entry = entry->entry;
> > > adev->guest_msix_entries[i].vector = entry->gsi;
> > > adev->host_msix_entries[i].entry = entry->entry;
> > >
> > > + if ((entry->flags & KVM_MSIX_FLAG_MASK) &&
> > > + !(adev->guest_msix_entries[i].flags &
> > > + KVM_ASSIGNED_MSIX_MASK)) {
> > > + adev->guest_msix_entries[i].flags |=
> > > + KVM_ASSIGNED_MSIX_MASK;
> > > + update_msix_mask(adev, i);
> > > + } else if (!(entry->flags & KVM_MSIX_FLAG_MASK) &&
> > > + (adev->guest_msix_entries[i].flags &
> > > + KVM_ASSIGNED_MSIX_MASK)) {
> > > + adev->guest_msix_entries[i].flags &=
> > > + ~KVM_ASSIGNED_MSIX_MASK;
> > > + update_msix_mask(adev, i);
> > > + }
> > >
> > > break;
> > >
> > > }
> > >
> > > if (i == adev->entries_nr) {
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 1/3] KVM: Emulation MSI-X mask bits for assigned devices
2010-10-11 9:49 ` Sheng Yang
@ 2010-10-11 17:11 ` Marcelo Tosatti
0 siblings, 0 replies; 26+ messages in thread
From: Marcelo Tosatti @ 2010-10-11 17:11 UTC (permalink / raw)
To: Sheng Yang; +Cc: Avi Kivity, kvm
On Mon, Oct 11, 2010 at 05:49:17PM +0800, Sheng Yang wrote:
> On Friday 01 October 2010 00:18:10 Marcelo Tosatti wrote:
> > On Tue, Sep 28, 2010 at 05:44:10PM +0800, Sheng Yang wrote:
> > > This patch enable per-vector mask for assigned devices using MSI-X.
> > >
> > > Signed-off-by: Sheng Yang <sheng@linux.intel.com>
> > > ---
> > >
> > > arch/x86/kvm/x86.c | 1 +
> > > include/linux/kvm.h | 9 ++++++++-
> > > include/linux/kvm_host.h | 1 +
> > > virt/kvm/assigned-dev.c | 39
> +++++++++++++++++++++++++++++++++++++++
> > > 4 files changed, 49 insertions(+), 1 deletions(-)
> > >
> > >
> > > #include <linux/slab.h>
> > >
> > > +#include <linux/irqnr.h>
> > > +
> > >
> > > #include "irq.h"
> > >
> > > static struct kvm_assigned_dev_kernel *kvm_find_assigned_dev(struct
> > > list_head *head,
> > >
> > > @@ -666,6 +668,30 @@ msix_nr_out:
> > > return r;
> > >
> > > }
> > >
> > > +static void update_msix_mask(struct kvm_assigned_dev_kernel
> > > *assigned_dev, + int index)
> > > +{
> > > + int irq;
> > > +
> > > + if (!assigned_dev->dev->msix_enabled ||
> > > + !(assigned_dev->irq_requested_type & KVM_DEV_IRQ_HOST_MSIX))
> > > + return;
> > > +
> > > + irq = assigned_dev->host_msix_entries[index].vector;
> > > +
> > > + ASSERT(irq != 0);
> > > +
> > > + if (assigned_dev->guest_msix_entries[index].flags &
> > > + KVM_ASSIGNED_MSIX_MASK)
> > > + disable_irq(irq);
> > > + else {
> > > + enable_irq(irq);
> > > + if (assigned_dev->guest_msix_entries[index].flags &
> > > + KVM_ASSIGNED_MSIX_PENDING)
> > > + schedule_work(&assigned_dev->interrupt_work);
> > > + }
> > > +}
> > > +
> >
> > Should flush the workqueue after disabling irq. As you say, the
> > schedule_work should be unnecessary.
> >
> > Also must be careful with races.
>
> Do we need to flush the workqueue? I think the return of writing MSI-X mask bit
> doesn't have implicit meaning that we have handled all pending interrupts. So I
> think leave the work later should also be fine.
The point is the interrupt will be delivered to the guest after the mask
bit is set. AFAICS, it should either be either before or after (before
is easier to implement, flushing the workqueue).
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 1/3] KVM: Emulation MSI-X mask bits for assigned devices
2010-10-11 10:01 ` Michael S. Tsirkin
@ 2010-10-12 6:49 ` Sheng Yang
2010-10-12 18:28 ` Michael S. Tsirkin
2010-10-12 18:30 ` Michael S. Tsirkin
0 siblings, 2 replies; 26+ messages in thread
From: Sheng Yang @ 2010-10-12 6:49 UTC (permalink / raw)
To: Michael S. Tsirkin; +Cc: Avi Kivity, Marcelo Tosatti, kvm
On Monday 11 October 2010 18:01:00 Michael S. Tsirkin wrote:
> On Mon, Oct 11, 2010 at 05:28:30PM +0800, Sheng Yang wrote:
> > On Sunday 03 October 2010 19:12:47 Michael S. Tsirkin wrote:
> > > On Tue, Sep 28, 2010 at 05:44:10PM +0800, Sheng Yang wrote:
> > > > This patch enable per-vector mask for assigned devices using MSI-X.
> > > >
> > > > Signed-off-by: Sheng Yang <sheng@linux.intel.com>
> > >
> > > I think I see an issue here, noted below. Some general comments:
> > > - The mask bit seems broken for injecting interrupts from
> > >
> > > userspace (with interrupts/ioctls).
> > > I think we must test it on injection path.
> >
> > I am not quite understand how it related to userspace interrupt injection
> > here... This patch only cover assigned devices for now.
>
> Well, this is a kernel/userspace interface, if it's broken for userspace
> injection now we'll have to go through pain to fix it in a compatible
> way later when we want to use it for userspace injection.
> You might want to ask why we want the kernel to support making
> userspace-injected interrupts when userspace can just avoid injecting
> them, and the answer would be that with irqfd the injection might be
> handled in a separate process.
OK, I've understood how it related to userspace interrupt injection. But I still
can't see why the interface is broken...
>
> We currently handle this by destroying irqfd when irq is masked,
> an ioctl instead would be much faster.
>
> > > - We'll need a way to support the pending bit.
> > >
> > > Disabling interrupts will not let us do it.
> >
> > We may need a way to support pending bit, though I don't know which guest
> > has used it... And we can still know if there is interrupt pending by
> > check the real hardware's pending bit if it's necessary.
>
> That's what I'm saying: since instead of masking the vector in hardware
> you disable irq in the APIC, the pending bit that we read from hardware
> will not have the correct value.
Are you sure? This disable_irq() has nothing to do with APIC. The disable callback
in msi_chip didn't do anything but mark the IRQ status as IRQ_DISABLED, and the
follow interrupt(if there are any) would be acked and masked, using mask callback
in msi_chip.
irq_to_desc() need to be exported for my initial version, in order to use mask
callback. But later I think it would be clear and better if we use general IRQ
function to do it. And I don't think the current solution would prevent us from
reading hardware pending bits.
--
regards
Yang, Sheng
>
> If we fix this, pending bit handling can be done by userspace.
>
> > (And we haven't seen any problem by
> > leaving the bit 0 so far, and it's not in this patch's scope.)
>
> I don't know about anyone using this, either, but the PCI spec does
> require support of polling mode where the pending bit is polled instead
> of interrupts. So yes, not a high priority to implement, but let's give
> the way we intend to support this in the future some thought.
>
> > > > ---
> > > >
> > > > arch/x86/kvm/x86.c | 1 +
> > > > include/linux/kvm.h | 9 ++++++++-
> > > > include/linux/kvm_host.h | 1 +
> > > > virt/kvm/assigned-dev.c | 39
> >
> > +++++++++++++++++++++++++++++++++++++++
> >
> > > > 4 files changed, 49 insertions(+), 1 deletions(-)
> > > >
> > > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > > > index 8412c91..e6933e6 100644
> > > > --- a/arch/x86/kvm/x86.c
> > > > +++ b/arch/x86/kvm/x86.c
> > > > @@ -1927,6 +1927,7 @@ int kvm_dev_ioctl_check_extension(long ext)
> > > >
> > > > case KVM_CAP_DEBUGREGS:
> > > > case KVM_CAP_X86_ROBUST_SINGLESTEP:
> > > >
> > > > case KVM_CAP_XSAVE:
> > > > + case KVM_CAP_DEVICE_MSIX_MASK:
> > > > r = 1;
> > > > break;
> > > >
> > > > case KVM_CAP_COALESCED_MMIO:
> > > > diff --git a/include/linux/kvm.h b/include/linux/kvm.h
> > > > index 919ae53..f2b7cdc 100644
> > > > --- a/include/linux/kvm.h
> > > > +++ b/include/linux/kvm.h
> > > > @@ -540,6 +540,9 @@ struct kvm_ppc_pvinfo {
> > > >
> > > > #endif
> > > > #define KVM_CAP_PPC_GET_PVINFO 57
> > > > #define KVM_CAP_PPC_IRQ_LEVEL 58
> > > >
> > > > +#ifdef __KVM_HAVE_MSIX
> > > > +#define KVM_CAP_DEVICE_MSIX_MASK 59
> > > > +#endif
> > > >
> > > > #ifdef KVM_CAP_IRQ_ROUTING
> > > >
> > > > @@ -787,11 +790,15 @@ struct kvm_assigned_msix_nr {
> > > >
> > > > };
> > > >
> > > > #define KVM_MAX_MSIX_PER_DEV 256
> > > >
> > > > +
> > > > +#define KVM_MSIX_FLAG_MASK 1
> > > > +
> > > >
> > > > struct kvm_assigned_msix_entry {
> > > >
> > > > __u32 assigned_dev_id;
> > > > __u32 gsi;
> > > > __u16 entry; /* The index of entry in the MSI-X table */
> > > >
> > > > - __u16 padding[3];
> > > > + __u16 flags;
> > > > + __u16 padding[2];
> > > >
> > > > };
> > > >
> > > > #endif /* __LINUX_KVM_H */
> > > >
> > > > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> > > > index 0b89d00..a43405c 100644
> > > > --- a/include/linux/kvm_host.h
> > > > +++ b/include/linux/kvm_host.h
> > > > @@ -415,6 +415,7 @@ struct kvm_irq_ack_notifier {
> > > >
> > > > };
> > > >
> > > > #define KVM_ASSIGNED_MSIX_PENDING 0x1
> > > >
> > > > +#define KVM_ASSIGNED_MSIX_MASK 0x2
> > > >
> > > > struct kvm_guest_msix_entry {
> > > >
> > > > u32 vector;
> > > > u16 entry;
> > > >
> > > > diff --git a/virt/kvm/assigned-dev.c b/virt/kvm/assigned-dev.c
> > > > index 7c98928..15b8c32 100644
> > > > --- a/virt/kvm/assigned-dev.c
> > > > +++ b/virt/kvm/assigned-dev.c
> > > > @@ -17,6 +17,8 @@
> > > >
> > > > #include <linux/pci.h>
> > > > #include <linux/interrupt.h>
> > > > #include <linux/slab.h>
> > > >
> > > > +#include <linux/irqnr.h>
> > > > +
> > > >
> > > > #include "irq.h"
> > > >
> > > > static struct kvm_assigned_dev_kernel *kvm_find_assigned_dev(struct
> > > > list_head *head,
> > > >
> > > > @@ -666,6 +668,30 @@ msix_nr_out:
> > > > return r;
> > > >
> > > > }
> > > >
> > > > +static void update_msix_mask(struct kvm_assigned_dev_kernel
> > > > *assigned_dev, + int index)
> > > > +{
> > > > + int irq;
> > > > +
> > > > + if (!assigned_dev->dev->msix_enabled ||
> > > > + !(assigned_dev->irq_requested_type & KVM_DEV_IRQ_HOST_MSIX))
> > > > + return;
> > > > +
> > > > + irq = assigned_dev->host_msix_entries[index].vector;
> > > > +
> > > > + ASSERT(irq != 0);
> > > > +
> > > > + if (assigned_dev->guest_msix_entries[index].flags &
> > > > + KVM_ASSIGNED_MSIX_MASK)
> > > > + disable_irq(irq);
> > > > + else {
> > > > + enable_irq(irq);
> > > > + if (assigned_dev->guest_msix_entries[index].flags &
> > > > + KVM_ASSIGNED_MSIX_PENDING)
> > > > + schedule_work(&assigned_dev->interrupt_work);
> > > > + }
> > > > +}
> > > > +
> > >
> > > What happens if guest masks an entry and then we hot-unplug the
> > > device and remove it from guest? It looks like interrupt
> > > will stay disabled?
> >
> > I don't think so. pci_disable_msix() which was called in hot-unplug path
> > would recycle all IRQs used by the device. It should be the same as VM
> > shutdown.
> >
> > Also before the IRQ was recycled, I believe the same dynamic IRQ wouldn't
> > be used by other devices.
> >
> > --
> > regards
> > Yang, Sheng
> >
> > > > static int kvm_vm_ioctl_set_msix_entry(struct kvm *kvm,
> > > >
> > > > struct kvm_assigned_msix_entry *entry)
> > > >
> > > > {
> > > >
> > > > @@ -688,6 +714,19 @@ static int kvm_vm_ioctl_set_msix_entry(struct
> > > > kvm *kvm,
> > > >
> > > > adev->guest_msix_entries[i].entry = entry->entry;
> > > > adev->guest_msix_entries[i].vector = entry->gsi;
> > > > adev->host_msix_entries[i].entry = entry->entry;
> > > >
> > > > + if ((entry->flags & KVM_MSIX_FLAG_MASK) &&
> > > > + !(adev->guest_msix_entries[i].flags &
> > > > + KVM_ASSIGNED_MSIX_MASK)) {
> > > > + adev->guest_msix_entries[i].flags |=
> > > > + KVM_ASSIGNED_MSIX_MASK;
> > > > + update_msix_mask(adev, i);
> > > > + } else if (!(entry->flags & KVM_MSIX_FLAG_MASK) &&
> > > > + (adev->guest_msix_entries[i].flags &
> > > > + KVM_ASSIGNED_MSIX_MASK)) {
> > > > + adev->guest_msix_entries[i].flags &=
> > > > + ~KVM_ASSIGNED_MSIX_MASK;
> > > > + update_msix_mask(adev, i);
> > > > + }
> > > >
> > > > break;
> > > >
> > > > }
> > > >
> > > > if (i == adev->entries_nr) {
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 1/3] KVM: Emulation MSI-X mask bits for assigned devices
2010-10-12 6:49 ` Sheng Yang
@ 2010-10-12 18:28 ` Michael S. Tsirkin
2010-10-13 1:03 ` Sheng Yang
2010-10-12 18:30 ` Michael S. Tsirkin
1 sibling, 1 reply; 26+ messages in thread
From: Michael S. Tsirkin @ 2010-10-12 18:28 UTC (permalink / raw)
To: Sheng Yang; +Cc: Avi Kivity, Marcelo Tosatti, kvm
On Tue, Oct 12, 2010 at 02:49:58PM +0800, Sheng Yang wrote:
> On Monday 11 October 2010 18:01:00 Michael S. Tsirkin wrote:
> > On Mon, Oct 11, 2010 at 05:28:30PM +0800, Sheng Yang wrote:
> > > On Sunday 03 October 2010 19:12:47 Michael S. Tsirkin wrote:
> > > > On Tue, Sep 28, 2010 at 05:44:10PM +0800, Sheng Yang wrote:
> > > > > This patch enable per-vector mask for assigned devices using MSI-X.
> > > > >
> > > > > Signed-off-by: Sheng Yang <sheng@linux.intel.com>
> > > >
> > > > I think I see an issue here, noted below. Some general comments:
> > > > - The mask bit seems broken for injecting interrupts from
> > > >
> > > > userspace (with interrupts/ioctls).
> > > > I think we must test it on injection path.
> > >
> > > I am not quite understand how it related to userspace interrupt injection
> > > here... This patch only cover assigned devices for now.
> >
> > Well, this is a kernel/userspace interface, if it's broken for userspace
> > injection now we'll have to go through pain to fix it in a compatible
> > way later when we want to use it for userspace injection.
> > You might want to ask why we want the kernel to support making
> > userspace-injected interrupts when userspace can just avoid injecting
> > them, and the answer would be that with irqfd the injection might be
> > handled in a separate process.
>
> OK, I've understood how it related to userspace interrupt injection. But I still
> can't see why the interface is broken...
> >
> > We currently handle this by destroying irqfd when irq is masked,
> > an ioctl instead would be much faster.
> >
> > > > - We'll need a way to support the pending bit.
> > > >
> > > > Disabling interrupts will not let us do it.
> > >
> > > We may need a way to support pending bit, though I don't know which guest
> > > has used it... And we can still know if there is interrupt pending by
> > > check the real hardware's pending bit if it's necessary.
> >
> > That's what I'm saying: since instead of masking the vector in hardware
> > you disable irq in the APIC, the pending bit that we read from hardware
> > will not have the correct value.
>
> Are you sure? This disable_irq() has nothing to do with APIC. The disable callback
> in msi_chip didn't do anything but mark the IRQ status as IRQ_DISABLED, and the
> follow interrupt(if there are any) would be acked and masked, using mask callback
> in msi_chip.
>
> irq_to_desc() need to be exported for my initial version, in order to use mask
> callback. But later I think it would be clear and better if we use general IRQ
> function to do it. And I don't think the current solution would prevent us from
> reading hardware pending bits.
Not sure, I'll try to look into code later, but just based on this
description:
on real hardware:
mask
interrupt
results in both pending bit being set
on guest with assigned device
mask
interrupt
results in mask being set but pending bit not set
(as interrupt was already sent)
So if we try to look at pending bits looks like we'll miss
some interrupts. No?
>
> --
> regards
> Yang, Sheng
>
> >
> > If we fix this, pending bit handling can be done by userspace.
> >
> > > (And we haven't seen any problem by
> > > leaving the bit 0 so far, and it's not in this patch's scope.)
> >
> > I don't know about anyone using this, either, but the PCI spec does
> > require support of polling mode where the pending bit is polled instead
> > of interrupts. So yes, not a high priority to implement, but let's give
> > the way we intend to support this in the future some thought.
> >
> > > > > ---
> > > > >
> > > > > arch/x86/kvm/x86.c | 1 +
> > > > > include/linux/kvm.h | 9 ++++++++-
> > > > > include/linux/kvm_host.h | 1 +
> > > > > virt/kvm/assigned-dev.c | 39
> > >
> > > +++++++++++++++++++++++++++++++++++++++
> > >
> > > > > 4 files changed, 49 insertions(+), 1 deletions(-)
> > > > >
> > > > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > > > > index 8412c91..e6933e6 100644
> > > > > --- a/arch/x86/kvm/x86.c
> > > > > +++ b/arch/x86/kvm/x86.c
> > > > > @@ -1927,6 +1927,7 @@ int kvm_dev_ioctl_check_extension(long ext)
> > > > >
> > > > > case KVM_CAP_DEBUGREGS:
> > > > > case KVM_CAP_X86_ROBUST_SINGLESTEP:
> > > > >
> > > > > case KVM_CAP_XSAVE:
> > > > > + case KVM_CAP_DEVICE_MSIX_MASK:
> > > > > r = 1;
> > > > > break;
> > > > >
> > > > > case KVM_CAP_COALESCED_MMIO:
> > > > > diff --git a/include/linux/kvm.h b/include/linux/kvm.h
> > > > > index 919ae53..f2b7cdc 100644
> > > > > --- a/include/linux/kvm.h
> > > > > +++ b/include/linux/kvm.h
> > > > > @@ -540,6 +540,9 @@ struct kvm_ppc_pvinfo {
> > > > >
> > > > > #endif
> > > > > #define KVM_CAP_PPC_GET_PVINFO 57
> > > > > #define KVM_CAP_PPC_IRQ_LEVEL 58
> > > > >
> > > > > +#ifdef __KVM_HAVE_MSIX
> > > > > +#define KVM_CAP_DEVICE_MSIX_MASK 59
> > > > > +#endif
> > > > >
> > > > > #ifdef KVM_CAP_IRQ_ROUTING
> > > > >
> > > > > @@ -787,11 +790,15 @@ struct kvm_assigned_msix_nr {
> > > > >
> > > > > };
> > > > >
> > > > > #define KVM_MAX_MSIX_PER_DEV 256
> > > > >
> > > > > +
> > > > > +#define KVM_MSIX_FLAG_MASK 1
> > > > > +
> > > > >
> > > > > struct kvm_assigned_msix_entry {
> > > > >
> > > > > __u32 assigned_dev_id;
> > > > > __u32 gsi;
> > > > > __u16 entry; /* The index of entry in the MSI-X table */
> > > > >
> > > > > - __u16 padding[3];
> > > > > + __u16 flags;
> > > > > + __u16 padding[2];
> > > > >
> > > > > };
> > > > >
> > > > > #endif /* __LINUX_KVM_H */
> > > > >
> > > > > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> > > > > index 0b89d00..a43405c 100644
> > > > > --- a/include/linux/kvm_host.h
> > > > > +++ b/include/linux/kvm_host.h
> > > > > @@ -415,6 +415,7 @@ struct kvm_irq_ack_notifier {
> > > > >
> > > > > };
> > > > >
> > > > > #define KVM_ASSIGNED_MSIX_PENDING 0x1
> > > > >
> > > > > +#define KVM_ASSIGNED_MSIX_MASK 0x2
> > > > >
> > > > > struct kvm_guest_msix_entry {
> > > > >
> > > > > u32 vector;
> > > > > u16 entry;
> > > > >
> > > > > diff --git a/virt/kvm/assigned-dev.c b/virt/kvm/assigned-dev.c
> > > > > index 7c98928..15b8c32 100644
> > > > > --- a/virt/kvm/assigned-dev.c
> > > > > +++ b/virt/kvm/assigned-dev.c
> > > > > @@ -17,6 +17,8 @@
> > > > >
> > > > > #include <linux/pci.h>
> > > > > #include <linux/interrupt.h>
> > > > > #include <linux/slab.h>
> > > > >
> > > > > +#include <linux/irqnr.h>
> > > > > +
> > > > >
> > > > > #include "irq.h"
> > > > >
> > > > > static struct kvm_assigned_dev_kernel *kvm_find_assigned_dev(struct
> > > > > list_head *head,
> > > > >
> > > > > @@ -666,6 +668,30 @@ msix_nr_out:
> > > > > return r;
> > > > >
> > > > > }
> > > > >
> > > > > +static void update_msix_mask(struct kvm_assigned_dev_kernel
> > > > > *assigned_dev, + int index)
> > > > > +{
> > > > > + int irq;
> > > > > +
> > > > > + if (!assigned_dev->dev->msix_enabled ||
> > > > > + !(assigned_dev->irq_requested_type & KVM_DEV_IRQ_HOST_MSIX))
> > > > > + return;
> > > > > +
> > > > > + irq = assigned_dev->host_msix_entries[index].vector;
> > > > > +
> > > > > + ASSERT(irq != 0);
> > > > > +
> > > > > + if (assigned_dev->guest_msix_entries[index].flags &
> > > > > + KVM_ASSIGNED_MSIX_MASK)
> > > > > + disable_irq(irq);
> > > > > + else {
> > > > > + enable_irq(irq);
> > > > > + if (assigned_dev->guest_msix_entries[index].flags &
> > > > > + KVM_ASSIGNED_MSIX_PENDING)
> > > > > + schedule_work(&assigned_dev->interrupt_work);
> > > > > + }
> > > > > +}
> > > > > +
> > > >
> > > > What happens if guest masks an entry and then we hot-unplug the
> > > > device and remove it from guest? It looks like interrupt
> > > > will stay disabled?
> > >
> > > I don't think so. pci_disable_msix() which was called in hot-unplug path
> > > would recycle all IRQs used by the device. It should be the same as VM
> > > shutdown.
> > >
> > > Also before the IRQ was recycled, I believe the same dynamic IRQ wouldn't
> > > be used by other devices.
> > >
> > > --
> > > regards
> > > Yang, Sheng
> > >
> > > > > static int kvm_vm_ioctl_set_msix_entry(struct kvm *kvm,
> > > > >
> > > > > struct kvm_assigned_msix_entry *entry)
> > > > >
> > > > > {
> > > > >
> > > > > @@ -688,6 +714,19 @@ static int kvm_vm_ioctl_set_msix_entry(struct
> > > > > kvm *kvm,
> > > > >
> > > > > adev->guest_msix_entries[i].entry = entry->entry;
> > > > > adev->guest_msix_entries[i].vector = entry->gsi;
> > > > > adev->host_msix_entries[i].entry = entry->entry;
> > > > >
> > > > > + if ((entry->flags & KVM_MSIX_FLAG_MASK) &&
> > > > > + !(adev->guest_msix_entries[i].flags &
> > > > > + KVM_ASSIGNED_MSIX_MASK)) {
> > > > > + adev->guest_msix_entries[i].flags |=
> > > > > + KVM_ASSIGNED_MSIX_MASK;
> > > > > + update_msix_mask(adev, i);
> > > > > + } else if (!(entry->flags & KVM_MSIX_FLAG_MASK) &&
> > > > > + (adev->guest_msix_entries[i].flags &
> > > > > + KVM_ASSIGNED_MSIX_MASK)) {
> > > > > + adev->guest_msix_entries[i].flags &=
> > > > > + ~KVM_ASSIGNED_MSIX_MASK;
> > > > > + update_msix_mask(adev, i);
> > > > > + }
> > > > >
> > > > > break;
> > > > >
> > > > > }
> > > > >
> > > > > if (i == adev->entries_nr) {
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 1/3] KVM: Emulation MSI-X mask bits for assigned devices
2010-10-12 6:49 ` Sheng Yang
2010-10-12 18:28 ` Michael S. Tsirkin
@ 2010-10-12 18:30 ` Michael S. Tsirkin
2010-10-13 0:58 ` Sheng Yang
1 sibling, 1 reply; 26+ messages in thread
From: Michael S. Tsirkin @ 2010-10-12 18:30 UTC (permalink / raw)
To: Sheng Yang; +Cc: Avi Kivity, Marcelo Tosatti, kvm
On Tue, Oct 12, 2010 at 02:49:58PM +0800, Sheng Yang wrote:
> On Monday 11 October 2010 18:01:00 Michael S. Tsirkin wrote:
> > On Mon, Oct 11, 2010 at 05:28:30PM +0800, Sheng Yang wrote:
> > > On Sunday 03 October 2010 19:12:47 Michael S. Tsirkin wrote:
> > > > On Tue, Sep 28, 2010 at 05:44:10PM +0800, Sheng Yang wrote:
> > > > > This patch enable per-vector mask for assigned devices using MSI-X.
> > > > >
> > > > > Signed-off-by: Sheng Yang <sheng@linux.intel.com>
> > > >
> > > > I think I see an issue here, noted below. Some general comments:
> > > > - The mask bit seems broken for injecting interrupts from
> > > >
> > > > userspace (with interrupts/ioctls).
> > > > I think we must test it on injection path.
> > >
> > > I am not quite understand how it related to userspace interrupt injection
> > > here... This patch only cover assigned devices for now.
> >
> > Well, this is a kernel/userspace interface, if it's broken for userspace
> > injection now we'll have to go through pain to fix it in a compatible
> > way later when we want to use it for userspace injection.
> > You might want to ask why we want the kernel to support making
> > userspace-injected interrupts when userspace can just avoid injecting
> > them, and the answer would be that with irqfd the injection might be
> > handled in a separate process.
>
> OK, I've understood how it related to userspace interrupt injection. But I still
> can't see why the interface is broken...
That's easy to explain: mask vector, then inject interrupt with an ioctl or irqfd:
it still goes in even though it's masked.
--
MST
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 1/3] KVM: Emulation MSI-X mask bits for assigned devices
2010-10-12 18:30 ` Michael S. Tsirkin
@ 2010-10-13 0:58 ` Sheng Yang
2010-10-13 12:03 ` Michael S. Tsirkin
0 siblings, 1 reply; 26+ messages in thread
From: Sheng Yang @ 2010-10-13 0:58 UTC (permalink / raw)
To: Michael S. Tsirkin; +Cc: Avi Kivity, Marcelo Tosatti, kvm
On Wednesday 13 October 2010 02:30:30 Michael S. Tsirkin wrote:
> On Tue, Oct 12, 2010 at 02:49:58PM +0800, Sheng Yang wrote:
> > On Monday 11 October 2010 18:01:00 Michael S. Tsirkin wrote:
> > > On Mon, Oct 11, 2010 at 05:28:30PM +0800, Sheng Yang wrote:
> > > > On Sunday 03 October 2010 19:12:47 Michael S. Tsirkin wrote:
> > > > > On Tue, Sep 28, 2010 at 05:44:10PM +0800, Sheng Yang wrote:
> > > > > > This patch enable per-vector mask for assigned devices using
> > > > > > MSI-X.
> > > > > >
> > > > > > Signed-off-by: Sheng Yang <sheng@linux.intel.com>
> > > > >
> > > > > I think I see an issue here, noted below. Some general comments:
> > > > > - The mask bit seems broken for injecting interrupts from
> > > > >
> > > > > userspace (with interrupts/ioctls).
> > > > > I think we must test it on injection path.
> > > >
> > > > I am not quite understand how it related to userspace interrupt
> > > > injection here... This patch only cover assigned devices for now.
> > >
> > > Well, this is a kernel/userspace interface, if it's broken for
> > > userspace injection now we'll have to go through pain to fix it in a
> > > compatible way later when we want to use it for userspace injection.
> > > You might want to ask why we want the kernel to support making
> > > userspace-injected interrupts when userspace can just avoid injecting
> > > them, and the answer would be that with irqfd the injection might be
> > > handled in a separate process.
> >
> > OK, I've understood how it related to userspace interrupt injection. But
> > I still can't see why the interface is broken...
>
> That's easy to explain: mask vector, then inject interrupt with an ioctl or
> irqfd: it still goes in even though it's masked.
Lol... Obviously we have different definitions of the word "broken".
To my understanding, first, this interface is not new, just a part of old interface
for assigned devices; and it covered current situation well. Second, this
interface didn't cover your situation which doesn't exist currently. Third, this
interface didn't prevent you from implementing solution for userspace interrupt
injection mask in the future. So I don't understand how can you say it's "broken".
Currently I think this interface is direct and elegant.
And could you elaborate your alternative way of doing this? Of course I would
consider it.
--
regards
Yang, Sheng
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 1/3] KVM: Emulation MSI-X mask bits for assigned devices
2010-10-12 18:28 ` Michael S. Tsirkin
@ 2010-10-13 1:03 ` Sheng Yang
2010-10-13 10:35 ` Michael S. Tsirkin
0 siblings, 1 reply; 26+ messages in thread
From: Sheng Yang @ 2010-10-13 1:03 UTC (permalink / raw)
To: Michael S. Tsirkin; +Cc: Avi Kivity, Marcelo Tosatti, kvm
On Wednesday 13 October 2010 02:28:59 Michael S. Tsirkin wrote:
> On Tue, Oct 12, 2010 at 02:49:58PM +0800, Sheng Yang wrote:
> > On Monday 11 October 2010 18:01:00 Michael S. Tsirkin wrote:
> > > On Mon, Oct 11, 2010 at 05:28:30PM +0800, Sheng Yang wrote:
> > > > On Sunday 03 October 2010 19:12:47 Michael S. Tsirkin wrote:
> > > > > On Tue, Sep 28, 2010 at 05:44:10PM +0800, Sheng Yang wrote:
> > > > > > This patch enable per-vector mask for assigned devices using
> > > > > > MSI-X.
> > > > > >
> > > > > > Signed-off-by: Sheng Yang <sheng@linux.intel.com>
> > > > >
> > > > > I think I see an issue here, noted below. Some general comments:
> > > > > - The mask bit seems broken for injecting interrupts from
> > > > >
> > > > > userspace (with interrupts/ioctls).
> > > > > I think we must test it on injection path.
> > > >
> > > > I am not quite understand how it related to userspace interrupt
> > > > injection here... This patch only cover assigned devices for now.
> > >
> > > Well, this is a kernel/userspace interface, if it's broken for
> > > userspace injection now we'll have to go through pain to fix it in a
> > > compatible way later when we want to use it for userspace injection.
> > > You might want to ask why we want the kernel to support making
> > > userspace-injected interrupts when userspace can just avoid injecting
> > > them, and the answer would be that with irqfd the injection might be
> > > handled in a separate process.
> >
> > OK, I've understood how it related to userspace interrupt injection. But
> > I still can't see why the interface is broken...
> >
> > > We currently handle this by destroying irqfd when irq is masked,
> > > an ioctl instead would be much faster.
> > >
> > > > > - We'll need a way to support the pending bit.
> > > > >
> > > > > Disabling interrupts will not let us do it.
> > > >
> > > > We may need a way to support pending bit, though I don't know which
> > > > guest has used it... And we can still know if there is interrupt
> > > > pending by check the real hardware's pending bit if it's necessary.
> > >
> > > That's what I'm saying: since instead of masking the vector in hardware
> > > you disable irq in the APIC, the pending bit that we read from hardware
> > > will not have the correct value.
> >
> > Are you sure? This disable_irq() has nothing to do with APIC. The disable
> > callback in msi_chip didn't do anything but mark the IRQ status as
> > IRQ_DISABLED, and the follow interrupt(if there are any) would be acked
> > and masked, using mask callback in msi_chip.
> >
> > irq_to_desc() need to be exported for my initial version, in order to use
> > mask callback. But later I think it would be clear and better if we use
> > general IRQ function to do it. And I don't think the current solution
> > would prevent us from reading hardware pending bits.
>
> Not sure, I'll try to look into code later, but just based on this
> description:
>
> on real hardware:
> mask
> interrupt
> results in both pending bit being set
>
> on guest with assigned device
> mask
> interrupt
> results in mask being set but pending bit not set
> (as interrupt was already sent)
>
> So if we try to look at pending bits looks like we'll miss
> some interrupts. No?
Yes, there is one interrupt would fail to set the pending bit. But I don't think
it worth export/adding new interface for core irq handling functions now... Still,
I don't think pending bit matters much. And with current code, we can still make
pending bit works well on the most condition. I think that's good enough.
--
regards
Yang, Sheng
>
> > --
> > regards
> > Yang, Sheng
> >
> > > If we fix this, pending bit handling can be done by userspace.
> > >
> > > > (And we haven't seen any problem by
> > > > leaving the bit 0 so far, and it's not in this patch's scope.)
> > >
> > > I don't know about anyone using this, either, but the PCI spec does
> > > require support of polling mode where the pending bit is polled instead
> > > of interrupts. So yes, not a high priority to implement, but let's give
> > > the way we intend to support this in the future some thought.
> > >
> > > > > > ---
> > > > > >
> > > > > > arch/x86/kvm/x86.c | 1 +
> > > > > > include/linux/kvm.h | 9 ++++++++-
> > > > > > include/linux/kvm_host.h | 1 +
> > > > > > virt/kvm/assigned-dev.c | 39
> > > >
> > > > +++++++++++++++++++++++++++++++++++++++
> > > >
> > > > > > 4 files changed, 49 insertions(+), 1 deletions(-)
> > > > > >
> > > > > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > > > > > index 8412c91..e6933e6 100644
> > > > > > --- a/arch/x86/kvm/x86.c
> > > > > > +++ b/arch/x86/kvm/x86.c
> > > > > > @@ -1927,6 +1927,7 @@ int kvm_dev_ioctl_check_extension(long ext)
> > > > > >
> > > > > > case KVM_CAP_DEBUGREGS:
> > > > > > case KVM_CAP_X86_ROBUST_SINGLESTEP:
> > > > > >
> > > > > > case KVM_CAP_XSAVE:
> > > > > > + case KVM_CAP_DEVICE_MSIX_MASK:
> > > > > > r = 1;
> > > > > > break;
> > > > > >
> > > > > > case KVM_CAP_COALESCED_MMIO:
> > > > > > diff --git a/include/linux/kvm.h b/include/linux/kvm.h
> > > > > > index 919ae53..f2b7cdc 100644
> > > > > > --- a/include/linux/kvm.h
> > > > > > +++ b/include/linux/kvm.h
> > > > > > @@ -540,6 +540,9 @@ struct kvm_ppc_pvinfo {
> > > > > >
> > > > > > #endif
> > > > > > #define KVM_CAP_PPC_GET_PVINFO 57
> > > > > > #define KVM_CAP_PPC_IRQ_LEVEL 58
> > > > > >
> > > > > > +#ifdef __KVM_HAVE_MSIX
> > > > > > +#define KVM_CAP_DEVICE_MSIX_MASK 59
> > > > > > +#endif
> > > > > >
> > > > > > #ifdef KVM_CAP_IRQ_ROUTING
> > > > > >
> > > > > > @@ -787,11 +790,15 @@ struct kvm_assigned_msix_nr {
> > > > > >
> > > > > > };
> > > > > >
> > > > > > #define KVM_MAX_MSIX_PER_DEV 256
> > > > > >
> > > > > > +
> > > > > > +#define KVM_MSIX_FLAG_MASK 1
> > > > > > +
> > > > > >
> > > > > > struct kvm_assigned_msix_entry {
> > > > > >
> > > > > > __u32 assigned_dev_id;
> > > > > > __u32 gsi;
> > > > > > __u16 entry; /* The index of entry in the MSI-X table */
> > > > > >
> > > > > > - __u16 padding[3];
> > > > > > + __u16 flags;
> > > > > > + __u16 padding[2];
> > > > > >
> > > > > > };
> > > > > >
> > > > > > #endif /* __LINUX_KVM_H */
> > > > > >
> > > > > > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> > > > > > index 0b89d00..a43405c 100644
> > > > > > --- a/include/linux/kvm_host.h
> > > > > > +++ b/include/linux/kvm_host.h
> > > > > > @@ -415,6 +415,7 @@ struct kvm_irq_ack_notifier {
> > > > > >
> > > > > > };
> > > > > >
> > > > > > #define KVM_ASSIGNED_MSIX_PENDING 0x1
> > > > > >
> > > > > > +#define KVM_ASSIGNED_MSIX_MASK 0x2
> > > > > >
> > > > > > struct kvm_guest_msix_entry {
> > > > > >
> > > > > > u32 vector;
> > > > > > u16 entry;
> > > > > >
> > > > > > diff --git a/virt/kvm/assigned-dev.c b/virt/kvm/assigned-dev.c
> > > > > > index 7c98928..15b8c32 100644
> > > > > > --- a/virt/kvm/assigned-dev.c
> > > > > > +++ b/virt/kvm/assigned-dev.c
> > > > > > @@ -17,6 +17,8 @@
> > > > > >
> > > > > > #include <linux/pci.h>
> > > > > > #include <linux/interrupt.h>
> > > > > > #include <linux/slab.h>
> > > > > >
> > > > > > +#include <linux/irqnr.h>
> > > > > > +
> > > > > >
> > > > > > #include "irq.h"
> > > > > >
> > > > > > static struct kvm_assigned_dev_kernel
> > > > > > *kvm_find_assigned_dev(struct list_head *head,
> > > > > >
> > > > > > @@ -666,6 +668,30 @@ msix_nr_out:
> > > > > > return r;
> > > > > >
> > > > > > }
> > > > > >
> > > > > > +static void update_msix_mask(struct kvm_assigned_dev_kernel
> > > > > > *assigned_dev, + int index)
> > > > > > +{
> > > > > > + int irq;
> > > > > > +
> > > > > > + if (!assigned_dev->dev->msix_enabled ||
> > > > > > + !(assigned_dev->irq_requested_type &
> > > > > > KVM_DEV_IRQ_HOST_MSIX)) + return;
> > > > > > +
> > > > > > + irq = assigned_dev->host_msix_entries[index].vector;
> > > > > > +
> > > > > > + ASSERT(irq != 0);
> > > > > > +
> > > > > > + if (assigned_dev->guest_msix_entries[index].flags &
> > > > > > + KVM_ASSIGNED_MSIX_MASK)
> > > > > > + disable_irq(irq);
> > > > > > + else {
> > > > > > + enable_irq(irq);
> > > > > > + if (assigned_dev->guest_msix_entries[index].flags &
> > > > > > + KVM_ASSIGNED_MSIX_PENDING)
> > > > > > + schedule_work(&assigned_dev->interrupt_work);
> > > > > > + }
> > > > > > +}
> > > > > > +
> > > > >
> > > > > What happens if guest masks an entry and then we hot-unplug the
> > > > > device and remove it from guest? It looks like interrupt
> > > > > will stay disabled?
> > > >
> > > > I don't think so. pci_disable_msix() which was called in hot-unplug
> > > > path would recycle all IRQs used by the device. It should be the
> > > > same as VM shutdown.
> > > >
> > > > Also before the IRQ was recycled, I believe the same dynamic IRQ
> > > > wouldn't be used by other devices.
> > > >
> > > > --
> > > > regards
> > > > Yang, Sheng
> > > >
> > > > > > static int kvm_vm_ioctl_set_msix_entry(struct kvm *kvm,
> > > > > >
> > > > > > struct kvm_assigned_msix_entry *entry)
> > > > > >
> > > > > > {
> > > > > >
> > > > > > @@ -688,6 +714,19 @@ static int
> > > > > > kvm_vm_ioctl_set_msix_entry(struct kvm *kvm,
> > > > > >
> > > > > > adev->guest_msix_entries[i].entry = entry->entry;
> > > > > > adev->guest_msix_entries[i].vector = entry->gsi;
> > > > > > adev->host_msix_entries[i].entry = entry->entry;
> > > > > >
> > > > > > + if ((entry->flags & KVM_MSIX_FLAG_MASK) &&
> > > > > > + !(adev->guest_msix_entries[i].flags &
> > > > > > + KVM_ASSIGNED_MSIX_MASK)) {
> > > > > > + adev->guest_msix_entries[i].flags |=
> > > > > > + KVM_ASSIGNED_MSIX_MASK;
> > > > > > + update_msix_mask(adev, i);
> > > > > > + } else if (!(entry->flags & KVM_MSIX_FLAG_MASK) &&
> > > > > > + (adev->guest_msix_entries[i].flags &
> > > > > > + KVM_ASSIGNED_MSIX_MASK)) {
> > > > > > + adev->guest_msix_entries[i].flags &=
> > > > > > + ~KVM_ASSIGNED_MSIX_MASK;
> > > > > > + update_msix_mask(adev, i);
> > > > > > + }
> > > > > >
> > > > > > break;
> > > > > >
> > > > > > }
> > > > > >
> > > > > > if (i == adev->entries_nr) {
> >
> > --
> > To unsubscribe from this list: send the line "unsubscribe kvm" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 1/3] KVM: Emulation MSI-X mask bits for assigned devices
2010-10-13 1:03 ` Sheng Yang
@ 2010-10-13 10:35 ` Michael S. Tsirkin
0 siblings, 0 replies; 26+ messages in thread
From: Michael S. Tsirkin @ 2010-10-13 10:35 UTC (permalink / raw)
To: Sheng Yang; +Cc: Avi Kivity, Marcelo Tosatti, kvm
On Wed, Oct 13, 2010 at 09:03:11AM +0800, Sheng Yang wrote:
> On Wednesday 13 October 2010 02:28:59 Michael S. Tsirkin wrote:
> > On Tue, Oct 12, 2010 at 02:49:58PM +0800, Sheng Yang wrote:
> > > On Monday 11 October 2010 18:01:00 Michael S. Tsirkin wrote:
> > > > On Mon, Oct 11, 2010 at 05:28:30PM +0800, Sheng Yang wrote:
> > > > > On Sunday 03 October 2010 19:12:47 Michael S. Tsirkin wrote:
> > > > > > On Tue, Sep 28, 2010 at 05:44:10PM +0800, Sheng Yang wrote:
> > > > > > > This patch enable per-vector mask for assigned devices using
> > > > > > > MSI-X.
> > > > > > >
> > > > > > > Signed-off-by: Sheng Yang <sheng@linux.intel.com>
> > > > > >
> > > > > > I think I see an issue here, noted below. Some general comments:
> > > > > > - The mask bit seems broken for injecting interrupts from
> > > > > >
> > > > > > userspace (with interrupts/ioctls).
> > > > > > I think we must test it on injection path.
> > > > >
> > > > > I am not quite understand how it related to userspace interrupt
> > > > > injection here... This patch only cover assigned devices for now.
> > > >
> > > > Well, this is a kernel/userspace interface, if it's broken for
> > > > userspace injection now we'll have to go through pain to fix it in a
> > > > compatible way later when we want to use it for userspace injection.
> > > > You might want to ask why we want the kernel to support making
> > > > userspace-injected interrupts when userspace can just avoid injecting
> > > > them, and the answer would be that with irqfd the injection might be
> > > > handled in a separate process.
> > >
> > > OK, I've understood how it related to userspace interrupt injection. But
> > > I still can't see why the interface is broken...
> > >
> > > > We currently handle this by destroying irqfd when irq is masked,
> > > > an ioctl instead would be much faster.
> > > >
> > > > > > - We'll need a way to support the pending bit.
> > > > > >
> > > > > > Disabling interrupts will not let us do it.
> > > > >
> > > > > We may need a way to support pending bit, though I don't know which
> > > > > guest has used it... And we can still know if there is interrupt
> > > > > pending by check the real hardware's pending bit if it's necessary.
> > > >
> > > > That's what I'm saying: since instead of masking the vector in hardware
> > > > you disable irq in the APIC, the pending bit that we read from hardware
> > > > will not have the correct value.
> > >
> > > Are you sure? This disable_irq() has nothing to do with APIC. The disable
> > > callback in msi_chip didn't do anything but mark the IRQ status as
> > > IRQ_DISABLED, and the follow interrupt(if there are any) would be acked
> > > and masked, using mask callback in msi_chip.
> > >
> > > irq_to_desc() need to be exported for my initial version, in order to use
> > > mask callback. But later I think it would be clear and better if we use
> > > general IRQ function to do it. And I don't think the current solution
> > > would prevent us from reading hardware pending bits.
> >
> > Not sure, I'll try to look into code later, but just based on this
> > description:
> >
> > on real hardware:
> > mask
> > interrupt
> > results in both pending bit being set
> >
> > on guest with assigned device
> > mask
> > interrupt
> > results in mask being set but pending bit not set
> > (as interrupt was already sent)
> >
> > So if we try to look at pending bits looks like we'll miss
> > some interrupts. No?
>
> Yes, there is one interrupt would fail to set the pending bit. But I don't think
> it worth export/adding new interface for core irq handling functions now...
Yea, maybe. For assigned devices pending bit is not currently
implemented, so at least it's not a regression, and we can delay
fixing this until we have VFIO.
> Still,
> I don't think pending bit matters much. And with current code, we can still make
> pending bit works well on the most condition. I think that's good enough.
Hmm, my guess is it's probably better to have a constant 0 there than a
subtle race that will only trigger under load.
> --
> regards
> Yang, Sheng
>
> >
> > > --
> > > regards
> > > Yang, Sheng
> > >
> > > > If we fix this, pending bit handling can be done by userspace.
> > > >
> > > > > (And we haven't seen any problem by
> > > > > leaving the bit 0 so far, and it's not in this patch's scope.)
> > > >
> > > > I don't know about anyone using this, either, but the PCI spec does
> > > > require support of polling mode where the pending bit is polled instead
> > > > of interrupts. So yes, not a high priority to implement, but let's give
> > > > the way we intend to support this in the future some thought.
> > > >
> > > > > > > ---
> > > > > > >
> > > > > > > arch/x86/kvm/x86.c | 1 +
> > > > > > > include/linux/kvm.h | 9 ++++++++-
> > > > > > > include/linux/kvm_host.h | 1 +
> > > > > > > virt/kvm/assigned-dev.c | 39
> > > > >
> > > > > +++++++++++++++++++++++++++++++++++++++
> > > > >
> > > > > > > 4 files changed, 49 insertions(+), 1 deletions(-)
> > > > > > >
> > > > > > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > > > > > > index 8412c91..e6933e6 100644
> > > > > > > --- a/arch/x86/kvm/x86.c
> > > > > > > +++ b/arch/x86/kvm/x86.c
> > > > > > > @@ -1927,6 +1927,7 @@ int kvm_dev_ioctl_check_extension(long ext)
> > > > > > >
> > > > > > > case KVM_CAP_DEBUGREGS:
> > > > > > > case KVM_CAP_X86_ROBUST_SINGLESTEP:
> > > > > > >
> > > > > > > case KVM_CAP_XSAVE:
> > > > > > > + case KVM_CAP_DEVICE_MSIX_MASK:
> > > > > > > r = 1;
> > > > > > > break;
> > > > > > >
> > > > > > > case KVM_CAP_COALESCED_MMIO:
> > > > > > > diff --git a/include/linux/kvm.h b/include/linux/kvm.h
> > > > > > > index 919ae53..f2b7cdc 100644
> > > > > > > --- a/include/linux/kvm.h
> > > > > > > +++ b/include/linux/kvm.h
> > > > > > > @@ -540,6 +540,9 @@ struct kvm_ppc_pvinfo {
> > > > > > >
> > > > > > > #endif
> > > > > > > #define KVM_CAP_PPC_GET_PVINFO 57
> > > > > > > #define KVM_CAP_PPC_IRQ_LEVEL 58
> > > > > > >
> > > > > > > +#ifdef __KVM_HAVE_MSIX
> > > > > > > +#define KVM_CAP_DEVICE_MSIX_MASK 59
> > > > > > > +#endif
> > > > > > >
> > > > > > > #ifdef KVM_CAP_IRQ_ROUTING
> > > > > > >
> > > > > > > @@ -787,11 +790,15 @@ struct kvm_assigned_msix_nr {
> > > > > > >
> > > > > > > };
> > > > > > >
> > > > > > > #define KVM_MAX_MSIX_PER_DEV 256
> > > > > > >
> > > > > > > +
> > > > > > > +#define KVM_MSIX_FLAG_MASK 1
> > > > > > > +
> > > > > > >
> > > > > > > struct kvm_assigned_msix_entry {
> > > > > > >
> > > > > > > __u32 assigned_dev_id;
> > > > > > > __u32 gsi;
> > > > > > > __u16 entry; /* The index of entry in the MSI-X table */
> > > > > > >
> > > > > > > - __u16 padding[3];
> > > > > > > + __u16 flags;
> > > > > > > + __u16 padding[2];
> > > > > > >
> > > > > > > };
> > > > > > >
> > > > > > > #endif /* __LINUX_KVM_H */
> > > > > > >
> > > > > > > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> > > > > > > index 0b89d00..a43405c 100644
> > > > > > > --- a/include/linux/kvm_host.h
> > > > > > > +++ b/include/linux/kvm_host.h
> > > > > > > @@ -415,6 +415,7 @@ struct kvm_irq_ack_notifier {
> > > > > > >
> > > > > > > };
> > > > > > >
> > > > > > > #define KVM_ASSIGNED_MSIX_PENDING 0x1
> > > > > > >
> > > > > > > +#define KVM_ASSIGNED_MSIX_MASK 0x2
> > > > > > >
> > > > > > > struct kvm_guest_msix_entry {
> > > > > > >
> > > > > > > u32 vector;
> > > > > > > u16 entry;
> > > > > > >
> > > > > > > diff --git a/virt/kvm/assigned-dev.c b/virt/kvm/assigned-dev.c
> > > > > > > index 7c98928..15b8c32 100644
> > > > > > > --- a/virt/kvm/assigned-dev.c
> > > > > > > +++ b/virt/kvm/assigned-dev.c
> > > > > > > @@ -17,6 +17,8 @@
> > > > > > >
> > > > > > > #include <linux/pci.h>
> > > > > > > #include <linux/interrupt.h>
> > > > > > > #include <linux/slab.h>
> > > > > > >
> > > > > > > +#include <linux/irqnr.h>
> > > > > > > +
> > > > > > >
> > > > > > > #include "irq.h"
> > > > > > >
> > > > > > > static struct kvm_assigned_dev_kernel
> > > > > > > *kvm_find_assigned_dev(struct list_head *head,
> > > > > > >
> > > > > > > @@ -666,6 +668,30 @@ msix_nr_out:
> > > > > > > return r;
> > > > > > >
> > > > > > > }
> > > > > > >
> > > > > > > +static void update_msix_mask(struct kvm_assigned_dev_kernel
> > > > > > > *assigned_dev, + int index)
> > > > > > > +{
> > > > > > > + int irq;
> > > > > > > +
> > > > > > > + if (!assigned_dev->dev->msix_enabled ||
> > > > > > > + !(assigned_dev->irq_requested_type &
> > > > > > > KVM_DEV_IRQ_HOST_MSIX)) + return;
> > > > > > > +
> > > > > > > + irq = assigned_dev->host_msix_entries[index].vector;
> > > > > > > +
> > > > > > > + ASSERT(irq != 0);
> > > > > > > +
> > > > > > > + if (assigned_dev->guest_msix_entries[index].flags &
> > > > > > > + KVM_ASSIGNED_MSIX_MASK)
> > > > > > > + disable_irq(irq);
> > > > > > > + else {
> > > > > > > + enable_irq(irq);
> > > > > > > + if (assigned_dev->guest_msix_entries[index].flags &
> > > > > > > + KVM_ASSIGNED_MSIX_PENDING)
> > > > > > > + schedule_work(&assigned_dev->interrupt_work);
> > > > > > > + }
> > > > > > > +}
> > > > > > > +
> > > > > >
> > > > > > What happens if guest masks an entry and then we hot-unplug the
> > > > > > device and remove it from guest? It looks like interrupt
> > > > > > will stay disabled?
> > > > >
> > > > > I don't think so. pci_disable_msix() which was called in hot-unplug
> > > > > path would recycle all IRQs used by the device. It should be the
> > > > > same as VM shutdown.
> > > > >
> > > > > Also before the IRQ was recycled, I believe the same dynamic IRQ
> > > > > wouldn't be used by other devices.
> > > > >
> > > > > --
> > > > > regards
> > > > > Yang, Sheng
> > > > >
> > > > > > > static int kvm_vm_ioctl_set_msix_entry(struct kvm *kvm,
> > > > > > >
> > > > > > > struct kvm_assigned_msix_entry *entry)
> > > > > > >
> > > > > > > {
> > > > > > >
> > > > > > > @@ -688,6 +714,19 @@ static int
> > > > > > > kvm_vm_ioctl_set_msix_entry(struct kvm *kvm,
> > > > > > >
> > > > > > > adev->guest_msix_entries[i].entry = entry->entry;
> > > > > > > adev->guest_msix_entries[i].vector = entry->gsi;
> > > > > > > adev->host_msix_entries[i].entry = entry->entry;
> > > > > > >
> > > > > > > + if ((entry->flags & KVM_MSIX_FLAG_MASK) &&
> > > > > > > + !(adev->guest_msix_entries[i].flags &
> > > > > > > + KVM_ASSIGNED_MSIX_MASK)) {
> > > > > > > + adev->guest_msix_entries[i].flags |=
> > > > > > > + KVM_ASSIGNED_MSIX_MASK;
> > > > > > > + update_msix_mask(adev, i);
> > > > > > > + } else if (!(entry->flags & KVM_MSIX_FLAG_MASK) &&
> > > > > > > + (adev->guest_msix_entries[i].flags &
> > > > > > > + KVM_ASSIGNED_MSIX_MASK)) {
> > > > > > > + adev->guest_msix_entries[i].flags &=
> > > > > > > + ~KVM_ASSIGNED_MSIX_MASK;
> > > > > > > + update_msix_mask(adev, i);
> > > > > > > + }
> > > > > > >
> > > > > > > break;
> > > > > > >
> > > > > > > }
> > > > > > >
> > > > > > > if (i == adev->entries_nr) {
> > >
> > > --
> > > To unsubscribe from this list: send the line "unsubscribe kvm" in
> > > the body of a message to majordomo@vger.kernel.org
> > > More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 1/3] KVM: Emulation MSI-X mask bits for assigned devices
2010-10-13 0:58 ` Sheng Yang
@ 2010-10-13 12:03 ` Michael S. Tsirkin
0 siblings, 0 replies; 26+ messages in thread
From: Michael S. Tsirkin @ 2010-10-13 12:03 UTC (permalink / raw)
To: Sheng Yang; +Cc: Avi Kivity, Marcelo Tosatti, kvm
On Wed, Oct 13, 2010 at 08:58:27AM +0800, Sheng Yang wrote:
> On Wednesday 13 October 2010 02:30:30 Michael S. Tsirkin wrote:
> > On Tue, Oct 12, 2010 at 02:49:58PM +0800, Sheng Yang wrote:
> > > On Monday 11 October 2010 18:01:00 Michael S. Tsirkin wrote:
> > > > On Mon, Oct 11, 2010 at 05:28:30PM +0800, Sheng Yang wrote:
> > > > > On Sunday 03 October 2010 19:12:47 Michael S. Tsirkin wrote:
> > > > > > On Tue, Sep 28, 2010 at 05:44:10PM +0800, Sheng Yang wrote:
> > > > > > > This patch enable per-vector mask for assigned devices using
> > > > > > > MSI-X.
> > > > > > >
> > > > > > > Signed-off-by: Sheng Yang <sheng@linux.intel.com>
> > > > > >
> > > > > > I think I see an issue here, noted below. Some general comments:
> > > > > > - The mask bit seems broken for injecting interrupts from
> > > > > >
> > > > > > userspace (with interrupts/ioctls).
> > > > > > I think we must test it on injection path.
> > > > >
> > > > > I am not quite understand how it related to userspace interrupt
> > > > > injection here... This patch only cover assigned devices for now.
> > > >
> > > > Well, this is a kernel/userspace interface, if it's broken for
> > > > userspace injection now we'll have to go through pain to fix it in a
> > > > compatible way later when we want to use it for userspace injection.
> > > > You might want to ask why we want the kernel to support making
> > > > userspace-injected interrupts when userspace can just avoid injecting
> > > > them, and the answer would be that with irqfd the injection might be
> > > > handled in a separate process.
> > >
> > > OK, I've understood how it related to userspace interrupt injection. But
> > > I still can't see why the interface is broken...
> >
> > That's easy to explain: mask vector, then inject interrupt with an ioctl or
> > irqfd: it still goes in even though it's masked.
>
> Lol... Obviously we have different definitions of the word "broken".
>
> To my understanding, first, this interface is not new, just a part of old interface
> for assigned devices; and it covered current situation well. Second, this
> interface didn't cover your situation which doesn't exist currently. Third, this
> interface didn't prevent you from implementing solution for userspace interrupt
> injection mask in the future. So I don't understand how can you say it's "broken".
> Currently I think this interface is direct and elegant.
Yes, I didn't make this clear: no problem with the interface:
just add a way to mask the vector. It is just that the implementation
proposed only works for interrupts used by assigned devices. I would like
for it to work for irqfd as well. Your patch would not work for this case.
>
> And could you elaborate your alternative way of doing this? Of course I would
> consider it.
What I was trying to suggest is one of 2 options:
1. mask the vector in the device.
Basically we just want an API to force mask directly
instead of waiting for an interrupt.
As you say this needs core linux changes so it might be harder,
but the advantage is that we follow the guest closely.
2. assuming 1 is just too hard to implement, here's option 2:
keep existing patch, but make irqfd work
Have an ioctl to set a flag in the routing table in KVM,
consult this flag when you are injecting the interrupt,
if masked set pending bit. When unmasked test and clear pending bit and
reinject if it was set.
Hope this helps.
> --
> regards
> Yang, Sheng
^ permalink raw reply [flat|nested] 26+ messages in thread
end of thread, other threads:[~2010-10-13 12:09 UTC | newest]
Thread overview: 26+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-09-28 9:44 [PATCH 0/3] Emulate MSI-X mask bits for assigned devices Sheng Yang
2010-09-28 9:44 ` [PATCH 1/3] KVM: Emulation " Sheng Yang
2010-09-28 13:36 ` Michael S. Tsirkin
2010-09-29 1:17 ` Sheng Yang
2010-09-30 16:25 ` Marcelo Tosatti
2010-09-29 8:36 ` Avi Kivity
2010-09-30 5:41 ` Sheng Yang
2010-10-03 13:03 ` Avi Kivity
2010-10-11 9:32 ` Sheng Yang
2010-09-30 16:18 ` Marcelo Tosatti
2010-10-11 9:49 ` Sheng Yang
2010-10-11 17:11 ` Marcelo Tosatti
2010-10-03 11:12 ` Michael S. Tsirkin
2010-10-11 9:28 ` Sheng Yang
2010-10-11 10:01 ` Michael S. Tsirkin
2010-10-12 6:49 ` Sheng Yang
2010-10-12 18:28 ` Michael S. Tsirkin
2010-10-13 1:03 ` Sheng Yang
2010-10-13 10:35 ` Michael S. Tsirkin
2010-10-12 18:30 ` Michael S. Tsirkin
2010-10-13 0:58 ` Sheng Yang
2010-10-13 12:03 ` Michael S. Tsirkin
2010-09-28 9:44 ` [PATCH 2/3] qemu-kvm: device assignment: Some clean up for MSI-X code Sheng Yang
2010-09-28 9:44 ` [PATCH 3/3] qemu-kvm: device assignment: emulate MSI-X mask bits Sheng Yang
2010-09-30 16:44 ` [PATCH 0/3] Emulate MSI-X mask bits for assigned devices Marcelo Tosatti
2010-10-11 9:34 ` Sheng Yang
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox