qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [Qemu-devel] [v15 00/15] vfio-pci: pass the aer error to guest
@ 2016-01-05  1:20 Cao jin
  2016-01-05  1:20 ` [Qemu-devel] [v15 01/15] vfio: extract vfio_get_hot_reset_info as a single function Cao jin
                   ` (14 more replies)
  0 siblings, 15 replies; 20+ messages in thread
From: Cao jin @ 2016-01-05  1:20 UTC (permalink / raw)
  To: qemu-devel; +Cc: chen.fan.fnst, alex.williamson, mst

From: Chen Fan <chen.fan.fnst@cn.fujitsu.com>

For now, for vfio pci passthough devices when qemu receives
an error from host aer report, currentlly just terminate the guest,
but usually user want to know what error occurred but stopping the
guest, so this patches add aer capability support for vfio device,
and pass the error to guest, and have guest driver to recover
from the error.

v14-v15:
   1. add device hot reset callback
   2. add bus_in_reset for vfio device to avoid multi do host bus reset

v13-v14:
   1. for multifunction device, requiring all functions enable AER.(9/13)
   2. due to all affected functions receive error signal, ignore no
      error occurred function. (12/13)

v12-v13:
   1. since support multifuncion hotplug, here add callback to enable aer.
   2. add pci device pre+post reset for aer host reset.

Chen Fan (15):
  vfio: extract vfio_get_hot_reset_info as a single function
  vfio: squeeze out vfio_pci_do_hot_reset for support bus reset
  pcie: modify the capability size assert
  vfio: make the 4 bytes aligned for capability size
  vfio: add pcie extanded capability support
  aer: impove pcie_aer_init to support vfio device
  vfio: add aer support for vfio device
  vfio: add check host bus reset is support or not
  add check reset mechanism when hotplug vfio device
  pci: Introduce device hot reset
  vfio: add hot reset callback
  vfio: add bus in reset flag
  pcie_aer: expose pcie_aer_msg() interface
  vfio-pci: pass the aer error to guest
  vfio: add 'aer' property to expose aercap

 hw/core/qdev.c                     |  24 ++
 hw/pci-bridge/ioh3420.c            |   2 +-
 hw/pci-bridge/xio3130_downstream.c |   2 +-
 hw/pci-bridge/xio3130_upstream.c   |   2 +-
 hw/pci/pci.c                       |  29 ++
 hw/pci/pci_bridge.c                |   2 +-
 hw/pci/pcie.c                      |   2 +-
 hw/pci/pcie_aer.c                  |   6 +-
 hw/vfio/pci.c                      | 622 +++++++++++++++++++++++++++++++++----
 hw/vfio/pci.h                      |   7 +
 include/hw/pci/pci_bus.h           |   5 +
 include/hw/pci/pcie_aer.h          |   3 +-
 include/hw/qdev-core.h             |   3 +
 include/hw/vfio/vfio-common.h      |   1 +
 14 files changed, 637 insertions(+), 73 deletions(-)

-- 
1.9.3

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Qemu-devel] [v15 01/15] vfio: extract vfio_get_hot_reset_info as a single function
  2016-01-05  1:20 [Qemu-devel] [v15 00/15] vfio-pci: pass the aer error to guest Cao jin
@ 2016-01-05  1:20 ` Cao jin
  2016-01-05  1:20 ` [Qemu-devel] [v15 02/15] vfio: squeeze out vfio_pci_do_hot_reset for support bus reset Cao jin
                   ` (13 subsequent siblings)
  14 siblings, 0 replies; 20+ messages in thread
From: Cao jin @ 2016-01-05  1:20 UTC (permalink / raw)
  To: qemu-devel; +Cc: chen.fan.fnst, alex.williamson, mst

From: Chen Fan <chen.fan.fnst@cn.fujitsu.com>

the function is used to get affected devices by bus reset.
so here extract it, and can used for aer soon.

Signed-off-by: Chen Fan <chen.fan.fnst@cn.fujitsu.com>
---
 hw/vfio/pci.c | 66 +++++++++++++++++++++++++++++++++++++++++++----------------
 1 file changed, 48 insertions(+), 18 deletions(-)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 1fb868c..efcd3cd 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -1654,6 +1654,51 @@ static void vfio_check_af_flr(VFIOPCIDevice *vdev, uint8_t pos)
     }
 }
 
+/*
+ * return negative with errno, return 0 on success.
+ * if success, the point of ret_info fill with the affected device reset info.
+ *
+ */
+static int vfio_get_hot_reset_info(VFIOPCIDevice *vdev,
+                                   struct vfio_pci_hot_reset_info **ret_info)
+{
+    struct vfio_pci_hot_reset_info *info;
+    int ret, count;
+
+    *ret_info = NULL;
+
+    info = g_malloc0(sizeof(*info));
+    info->argsz = sizeof(*info);
+
+    ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_GET_PCI_HOT_RESET_INFO, info);
+    if (ret && errno != ENOSPC) {
+        ret = -errno;
+        goto error;
+    }
+
+    count = info->count;
+
+    info = g_realloc(info, sizeof(*info) +
+                     (count * sizeof(struct vfio_pci_dependent_device)));
+    info->argsz = sizeof(*info) +
+                  (count * sizeof(struct vfio_pci_dependent_device));
+
+    ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_GET_PCI_HOT_RESET_INFO, info);
+    if (ret) {
+        ret = -errno;
+        error_report("vfio: hot reset info failed: %m");
+        goto error;
+    }
+
+    *ret_info = info;
+    info = NULL;
+
+    return 0;
+error:
+    g_free(info);
+    return ret;
+}
+
 static int vfio_add_std_cap(VFIOPCIDevice *vdev, uint8_t pos)
 {
     PCIDevice *pdev = &vdev->pdev;
@@ -1793,7 +1838,7 @@ static bool vfio_pci_host_match(PCIHostDeviceAddress *host1,
 static int vfio_pci_hot_reset(VFIOPCIDevice *vdev, bool single)
 {
     VFIOGroup *group;
-    struct vfio_pci_hot_reset_info *info;
+    struct vfio_pci_hot_reset_info *info = NULL;
     struct vfio_pci_dependent_device *devices;
     struct vfio_pci_hot_reset *reset;
     int32_t *fds;
@@ -1805,12 +1850,8 @@ static int vfio_pci_hot_reset(VFIOPCIDevice *vdev, bool single)
     vfio_pci_pre_reset(vdev);
     vdev->vbasedev.needs_reset = false;
 
-    info = g_malloc0(sizeof(*info));
-    info->argsz = sizeof(*info);
-
-    ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_GET_PCI_HOT_RESET_INFO, info);
-    if (ret && errno != ENOSPC) {
-        ret = -errno;
+    ret = vfio_get_hot_reset_info(vdev, &info);
+    if (ret) {
         if (!vdev->has_pm_reset) {
             error_report("vfio: Cannot reset device %04x:%02x:%02x.%x, "
                          "no available reset mechanism.", vdev->host.domain,
@@ -1819,18 +1860,7 @@ static int vfio_pci_hot_reset(VFIOPCIDevice *vdev, bool single)
         goto out_single;
     }
 
-    count = info->count;
-    info = g_realloc(info, sizeof(*info) + (count * sizeof(*devices)));
-    info->argsz = sizeof(*info) + (count * sizeof(*devices));
     devices = &info->devices[0];
-
-    ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_GET_PCI_HOT_RESET_INFO, info);
-    if (ret) {
-        ret = -errno;
-        error_report("vfio: hot reset info failed: %m");
-        goto out_single;
-    }
-
     trace_vfio_pci_hot_reset_has_dep_devices(vdev->vbasedev.name);
 
     /* Verify that we have all the groups required */
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [Qemu-devel] [v15 02/15] vfio: squeeze out vfio_pci_do_hot_reset for support bus reset
  2016-01-05  1:20 [Qemu-devel] [v15 00/15] vfio-pci: pass the aer error to guest Cao jin
  2016-01-05  1:20 ` [Qemu-devel] [v15 01/15] vfio: extract vfio_get_hot_reset_info as a single function Cao jin
@ 2016-01-05  1:20 ` Cao jin
  2016-01-05  1:20 ` [Qemu-devel] [v15 03/15] pcie: modify the capability size assert Cao jin
                   ` (12 subsequent siblings)
  14 siblings, 0 replies; 20+ messages in thread
From: Cao jin @ 2016-01-05  1:20 UTC (permalink / raw)
  To: qemu-devel; +Cc: chen.fan.fnst, alex.williamson, mst

From: Chen Fan <chen.fan.fnst@cn.fujitsu.com>

squeeze out vfio_pci_do_hot_reset to do host bus reset when AER recovery.

Signed-off-by: Chen Fan <chen.fan.fnst@cn.fujitsu.com>
---
 hw/vfio/pci.c | 75 +++++++++++++++++++++++++++++++++++------------------------
 1 file changed, 44 insertions(+), 31 deletions(-)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index efcd3cd..a63cf85 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -1699,6 +1699,48 @@ error:
     return ret;
 }
 
+static int vfio_pci_do_hot_reset(VFIOPCIDevice *vdev,
+                                 struct vfio_pci_hot_reset_info *info)
+{
+    VFIOGroup *group;
+    struct vfio_pci_hot_reset *reset;
+    int32_t *fds;
+    int ret, i, count;
+    struct vfio_pci_dependent_device *devices;
+
+    /* Determine how many group fds need to be passed */
+    count = 0;
+    devices = &info->devices[0];
+    QLIST_FOREACH(group, &vfio_group_list, next) {
+        for (i = 0; i < info->count; i++) {
+            if (group->groupid == devices[i].group_id) {
+                count++;
+                break;
+            }
+        }
+    }
+
+    reset = g_malloc0(sizeof(*reset) + (count * sizeof(*fds)));
+    reset->argsz = sizeof(*reset) + (count * sizeof(*fds));
+    fds = &reset->group_fds[0];
+
+    /* Fill in group fds */
+    QLIST_FOREACH(group, &vfio_group_list, next) {
+        for (i = 0; i < info->count; i++) {
+            if (group->groupid == devices[i].group_id) {
+                fds[reset->count++] = group->fd;
+                break;
+            }
+        }
+    }
+
+    /* Bus reset! */
+    ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_PCI_HOT_RESET, reset);
+    g_free(reset);
+
+    return ret;
+}
+
 static int vfio_add_std_cap(VFIOPCIDevice *vdev, uint8_t pos)
 {
     PCIDevice *pdev = &vdev->pdev;
@@ -1840,9 +1882,7 @@ static int vfio_pci_hot_reset(VFIOPCIDevice *vdev, bool single)
     VFIOGroup *group;
     struct vfio_pci_hot_reset_info *info = NULL;
     struct vfio_pci_dependent_device *devices;
-    struct vfio_pci_hot_reset *reset;
-    int32_t *fds;
-    int ret, i, count;
+    int ret, i;
     bool multi = false;
 
     trace_vfio_pci_hot_reset(vdev->vbasedev.name, single ? "one" : "multi");
@@ -1921,34 +1961,7 @@ static int vfio_pci_hot_reset(VFIOPCIDevice *vdev, bool single)
         goto out_single;
     }
 
-    /* Determine how many group fds need to be passed */
-    count = 0;
-    QLIST_FOREACH(group, &vfio_group_list, next) {
-        for (i = 0; i < info->count; i++) {
-            if (group->groupid == devices[i].group_id) {
-                count++;
-                break;
-            }
-        }
-    }
-
-    reset = g_malloc0(sizeof(*reset) + (count * sizeof(*fds)));
-    reset->argsz = sizeof(*reset) + (count * sizeof(*fds));
-    fds = &reset->group_fds[0];
-
-    /* Fill in group fds */
-    QLIST_FOREACH(group, &vfio_group_list, next) {
-        for (i = 0; i < info->count; i++) {
-            if (group->groupid == devices[i].group_id) {
-                fds[reset->count++] = group->fd;
-                break;
-            }
-        }
-    }
-
-    /* Bus reset! */
-    ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_PCI_HOT_RESET, reset);
-    g_free(reset);
+    ret = vfio_pci_do_hot_reset(vdev, info);
 
     trace_vfio_pci_hot_reset_result(vdev->vbasedev.name,
                                     ret ? "%m" : "Success");
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [Qemu-devel] [v15 03/15] pcie: modify the capability size assert
  2016-01-05  1:20 [Qemu-devel] [v15 00/15] vfio-pci: pass the aer error to guest Cao jin
  2016-01-05  1:20 ` [Qemu-devel] [v15 01/15] vfio: extract vfio_get_hot_reset_info as a single function Cao jin
  2016-01-05  1:20 ` [Qemu-devel] [v15 02/15] vfio: squeeze out vfio_pci_do_hot_reset for support bus reset Cao jin
@ 2016-01-05  1:20 ` Cao jin
  2016-01-05  1:20 ` [Qemu-devel] [v15 04/15] vfio: make the 4 bytes aligned for capability size Cao jin
                   ` (11 subsequent siblings)
  14 siblings, 0 replies; 20+ messages in thread
From: Cao jin @ 2016-01-05  1:20 UTC (permalink / raw)
  To: qemu-devel; +Cc: chen.fan.fnst, alex.williamson, mst

From: Chen Fan <chen.fan.fnst@cn.fujitsu.com>

 Device's Offset and size can reach PCIE_CONFIG_SPACE_SIZE,
 fix the corresponding assert.

Signed-off-by: Chen Fan <chen.fan.fnst@cn.fujitsu.com>
Reviewed-by: Marcel Apfelbaum <marcel@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
---
 hw/pci/pcie.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c
index 0eab29d..8f4c0e5 100644
--- a/hw/pci/pcie.c
+++ b/hw/pci/pcie.c
@@ -607,7 +607,7 @@ void pcie_add_capability(PCIDevice *dev,
 
     assert(offset >= PCI_CONFIG_SPACE_SIZE);
     assert(offset < offset + size);
-    assert(offset + size < PCIE_CONFIG_SPACE_SIZE);
+    assert(offset + size <= PCIE_CONFIG_SPACE_SIZE);
     assert(size >= 8);
     assert(pci_is_express(dev));
 
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [Qemu-devel] [v15 04/15] vfio: make the 4 bytes aligned for capability size
  2016-01-05  1:20 [Qemu-devel] [v15 00/15] vfio-pci: pass the aer error to guest Cao jin
                   ` (2 preceding siblings ...)
  2016-01-05  1:20 ` [Qemu-devel] [v15 03/15] pcie: modify the capability size assert Cao jin
@ 2016-01-05  1:20 ` Cao jin
  2016-01-05  1:20 ` [Qemu-devel] [v15 05/15] vfio: add pcie extanded capability support Cao jin
                   ` (10 subsequent siblings)
  14 siblings, 0 replies; 20+ messages in thread
From: Cao jin @ 2016-01-05  1:20 UTC (permalink / raw)
  To: qemu-devel; +Cc: chen.fan.fnst, alex.williamson, mst

From: Chen Fan <chen.fan.fnst@cn.fujitsu.com>

this function search the capability from the end, the last
size should 0x100 - pos, not 0xff - pos.

Signed-off-by: Chen Fan <chen.fan.fnst@cn.fujitsu.com>
---
 hw/vfio/pci.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index a63cf85..288f2c7 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -1469,7 +1469,8 @@ static void vfio_unmap_bars(VFIOPCIDevice *vdev)
  */
 static uint8_t vfio_std_cap_max_size(PCIDevice *pdev, uint8_t pos)
 {
-    uint8_t tmp, next = 0xff;
+    uint8_t tmp;
+    uint16_t next = PCI_CONFIG_SPACE_SIZE;
 
     for (tmp = pdev->config[PCI_CAPABILITY_LIST]; tmp;
          tmp = pdev->config[tmp + 1]) {
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [Qemu-devel] [v15 05/15] vfio: add pcie extanded capability support
  2016-01-05  1:20 [Qemu-devel] [v15 00/15] vfio-pci: pass the aer error to guest Cao jin
                   ` (3 preceding siblings ...)
  2016-01-05  1:20 ` [Qemu-devel] [v15 04/15] vfio: make the 4 bytes aligned for capability size Cao jin
@ 2016-01-05  1:20 ` Cao jin
  2016-01-05  1:20 ` [Qemu-devel] [v15 06/15] aer: impove pcie_aer_init to support vfio device Cao jin
                   ` (9 subsequent siblings)
  14 siblings, 0 replies; 20+ messages in thread
From: Cao jin @ 2016-01-05  1:20 UTC (permalink / raw)
  To: qemu-devel; +Cc: chen.fan.fnst, alex.williamson, mst

From: Chen Fan <chen.fan.fnst@cn.fujitsu.com>

For vfio pcie device, we could expose the extended capability on
PCIE bus. in order to avoid config space broken, we introduce
a copy config for parsing extended caps. and rebuild the pcie
extended config space.

Signed-off-by: Chen Fan <chen.fan.fnst@cn.fujitsu.com>
---
 hw/vfio/pci.c | 70 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 69 insertions(+), 1 deletion(-)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 288f2c7..64b0867 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -1482,6 +1482,21 @@ static uint8_t vfio_std_cap_max_size(PCIDevice *pdev, uint8_t pos)
     return next - pos;
 }
 
+
+static uint16_t vfio_ext_cap_max_size(const uint8_t *config, uint16_t pos)
+{
+    uint16_t tmp, next = PCIE_CONFIG_SPACE_SIZE;
+
+    for (tmp = PCI_CONFIG_SPACE_SIZE; tmp;
+        tmp = PCI_EXT_CAP_NEXT(pci_get_long(config + tmp))) {
+        if (tmp > pos && tmp < next) {
+            next = tmp;
+        }
+    }
+
+    return next - pos;
+}
+
 static void vfio_set_word_bits(uint8_t *buf, uint16_t val, uint16_t mask)
 {
     pci_set_word(buf, (pci_get_word(buf) & ~mask) | val);
@@ -1817,16 +1832,69 @@ static int vfio_add_std_cap(VFIOPCIDevice *vdev, uint8_t pos)
     return 0;
 }
 
+static int vfio_add_ext_cap(VFIOPCIDevice *vdev)
+{
+    PCIDevice *pdev = &vdev->pdev;
+    uint32_t header;
+    uint16_t cap_id, next, size;
+    uint8_t cap_ver;
+    uint8_t *config;
+
+    /*
+     * In order to avoid breaking config space, create a copy to
+     * use for parsing extended capabilities.
+     */
+    config = g_memdup(pdev->config, vdev->config_size);
+
+    for (next = PCI_CONFIG_SPACE_SIZE; next;
+         next = PCI_EXT_CAP_NEXT(pci_get_long(config + next))) {
+        header = pci_get_long(config + next);
+        cap_id = PCI_EXT_CAP_ID(header);
+        cap_ver = PCI_EXT_CAP_VER(header);
+
+        /*
+         * If it becomes important to configure extended capabilities to their
+         * actual size, use this as the default when it's something we don't
+         * recognize. Since QEMU doesn't actually handle many of the config
+         * accesses, exact size doesn't seem worthwhile.
+         */
+        size = vfio_ext_cap_max_size(config, next);
+
+        pcie_add_capability(pdev, cap_id, cap_ver, next, size);
+        pci_set_long(dev->config + next, PCI_EXT_CAP(cap_id, cap_ver, 0));
+
+        /* Use emulated next pointer to allow dropping extended caps */
+        pci_long_test_and_set_mask(vdev->emulated_config_bits + next,
+                                   PCI_EXT_CAP_NEXT_MASK);
+    }
+
+    g_free(config);
+    return 0;
+}
+
 static int vfio_add_capabilities(VFIOPCIDevice *vdev)
 {
     PCIDevice *pdev = &vdev->pdev;
+    int ret;
 
     if (!(pdev->config[PCI_STATUS] & PCI_STATUS_CAP_LIST) ||
         !pdev->config[PCI_CAPABILITY_LIST]) {
         return 0; /* Nothing to add */
     }
 
-    return vfio_add_std_cap(vdev, pdev->config[PCI_CAPABILITY_LIST]);
+    ret = vfio_add_std_cap(vdev, pdev->config[PCI_CAPABILITY_LIST]);
+    if (ret) {
+        return ret;
+    }
+
+    /* on PCI bus, it doesn't make sense to expose extended capabilities. */
+    if (!pci_is_express(pdev) ||
+        !pci_bus_is_express(pdev->bus) ||
+        !pci_get_long(pdev->config + PCI_CONFIG_SPACE_SIZE)) {
+        return 0;
+    }
+
+    return vfio_add_ext_cap(vdev);
 }
 
 static void vfio_pci_pre_reset(VFIOPCIDevice *vdev)
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [Qemu-devel] [v15 06/15] aer: impove pcie_aer_init to support vfio device
  2016-01-05  1:20 [Qemu-devel] [v15 00/15] vfio-pci: pass the aer error to guest Cao jin
                   ` (4 preceding siblings ...)
  2016-01-05  1:20 ` [Qemu-devel] [v15 05/15] vfio: add pcie extanded capability support Cao jin
@ 2016-01-05  1:20 ` Cao jin
  2016-01-05  1:20 ` [Qemu-devel] [v15 07/15] vfio: add aer support for " Cao jin
                   ` (8 subsequent siblings)
  14 siblings, 0 replies; 20+ messages in thread
From: Cao jin @ 2016-01-05  1:20 UTC (permalink / raw)
  To: qemu-devel; +Cc: chen.fan.fnst, alex.williamson, mst

From: Chen Fan <chen.fan.fnst@cn.fujitsu.com>

pcie_aer_init was used to emulate an aer capability for pcie device,
but for vfio device, the aer config space size is mutable and is not
always equal to PCI_ERR_SIZEOF(0x48). it depends on where the TLP Prefix
register required, so here we add a size argument.

Signed-off-by: Chen Fan <chen.fan.fnst@cn.fujitsu.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
---
 hw/pci-bridge/ioh3420.c            | 2 +-
 hw/pci-bridge/xio3130_downstream.c | 2 +-
 hw/pci-bridge/xio3130_upstream.c   | 2 +-
 hw/pci/pcie_aer.c                  | 4 ++--
 include/hw/pci/pcie_aer.h          | 2 +-
 5 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/hw/pci-bridge/ioh3420.c b/hw/pci-bridge/ioh3420.c
index cce2fdd..4d9cd3f 100644
--- a/hw/pci-bridge/ioh3420.c
+++ b/hw/pci-bridge/ioh3420.c
@@ -129,7 +129,7 @@ static int ioh3420_initfn(PCIDevice *d)
         goto err_pcie_cap;
     }
     pcie_cap_root_init(d);
-    rc = pcie_aer_init(d, IOH_EP_AER_OFFSET);
+    rc = pcie_aer_init(d, IOH_EP_AER_OFFSET, PCI_ERR_SIZEOF);
     if (rc < 0) {
         goto err;
     }
diff --git a/hw/pci-bridge/xio3130_downstream.c b/hw/pci-bridge/xio3130_downstream.c
index b3a6479..9737041 100644
--- a/hw/pci-bridge/xio3130_downstream.c
+++ b/hw/pci-bridge/xio3130_downstream.c
@@ -92,7 +92,7 @@ static int xio3130_downstream_initfn(PCIDevice *d)
         goto err_pcie_cap;
     }
     pcie_cap_arifwd_init(d);
-    rc = pcie_aer_init(d, XIO3130_AER_OFFSET);
+    rc = pcie_aer_init(d, XIO3130_AER_OFFSET, PCI_ERR_SIZEOF);
     if (rc < 0) {
         goto err;
     }
diff --git a/hw/pci-bridge/xio3130_upstream.c b/hw/pci-bridge/xio3130_upstream.c
index eada582..4d7f894 100644
--- a/hw/pci-bridge/xio3130_upstream.c
+++ b/hw/pci-bridge/xio3130_upstream.c
@@ -81,7 +81,7 @@ static int xio3130_upstream_initfn(PCIDevice *d)
     }
     pcie_cap_flr_init(d);
     pcie_cap_deverr_init(d);
-    rc = pcie_aer_init(d, XIO3130_AER_OFFSET);
+    rc = pcie_aer_init(d, XIO3130_AER_OFFSET, PCI_ERR_SIZEOF);
     if (rc < 0) {
         goto err;
     }
diff --git a/hw/pci/pcie_aer.c b/hw/pci/pcie_aer.c
index 98d2c18..45f351b 100644
--- a/hw/pci/pcie_aer.c
+++ b/hw/pci/pcie_aer.c
@@ -94,12 +94,12 @@ static void aer_log_clear_all_err(PCIEAERLog *aer_log)
     aer_log->log_num = 0;
 }
 
-int pcie_aer_init(PCIDevice *dev, uint16_t offset)
+int pcie_aer_init(PCIDevice *dev, uint16_t offset, uint16_t size)
 {
     PCIExpressDevice *exp;
 
     pcie_add_capability(dev, PCI_EXT_CAP_ID_ERR, PCI_ERR_VER,
-                        offset, PCI_ERR_SIZEOF);
+                        offset, size);
     exp = &dev->exp;
     exp->aer_cap = offset;
 
diff --git a/include/hw/pci/pcie_aer.h b/include/hw/pci/pcie_aer.h
index 2fb8388..156acb0 100644
--- a/include/hw/pci/pcie_aer.h
+++ b/include/hw/pci/pcie_aer.h
@@ -87,7 +87,7 @@ struct PCIEAERErr {
 
 extern const VMStateDescription vmstate_pcie_aer_log;
 
-int pcie_aer_init(PCIDevice *dev, uint16_t offset);
+int pcie_aer_init(PCIDevice *dev, uint16_t offset, uint16_t size);
 void pcie_aer_exit(PCIDevice *dev);
 void pcie_aer_write_config(PCIDevice *dev,
                            uint32_t addr, uint32_t val, int len);
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [Qemu-devel] [v15 07/15] vfio: add aer support for vfio device
  2016-01-05  1:20 [Qemu-devel] [v15 00/15] vfio-pci: pass the aer error to guest Cao jin
                   ` (5 preceding siblings ...)
  2016-01-05  1:20 ` [Qemu-devel] [v15 06/15] aer: impove pcie_aer_init to support vfio device Cao jin
@ 2016-01-05  1:20 ` Cao jin
  2016-01-05  1:20 ` [Qemu-devel] [v15 08/15] vfio: add check host bus reset is support or not Cao jin
                   ` (7 subsequent siblings)
  14 siblings, 0 replies; 20+ messages in thread
From: Cao jin @ 2016-01-05  1:20 UTC (permalink / raw)
  To: qemu-devel; +Cc: chen.fan.fnst, alex.williamson, mst

From: Chen Fan <chen.fan.fnst@cn.fujitsu.com>

Calling pcie_aer_init to initilize aer related registers for
vfio device, then reload physical related registers to expose
device capability.

Signed-off-by: Chen Fan <chen.fan.fnst@cn.fujitsu.com>
---
 hw/vfio/pci.c | 81 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++---
 hw/vfio/pci.h |  3 +++
 2 files changed, 81 insertions(+), 3 deletions(-)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 64b0867..38b0aa5 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -1832,6 +1832,62 @@ static int vfio_add_std_cap(VFIOPCIDevice *vdev, uint8_t pos)
     return 0;
 }
 
+static int vfio_setup_aer(VFIOPCIDevice *vdev, uint8_t cap_ver,
+                          int pos, uint16_t size)
+{
+    PCIDevice *pdev = &vdev->pdev;
+    PCIDevice *dev_iter;
+    uint8_t type;
+    uint32_t errcap;
+
+    if (!(vdev->features & VFIO_FEATURE_ENABLE_AER)) {
+        pcie_add_capability(pdev, PCI_EXT_CAP_ID_ERR,
+                            cap_ver, pos, size);
+        return 0;
+    }
+
+    dev_iter = pci_bridge_get_device(pdev->bus);
+    if (!dev_iter) {
+        goto error;
+    }
+
+    while (dev_iter) {
+        type = pcie_cap_get_type(dev_iter);
+        if ((type != PCI_EXP_TYPE_ROOT_PORT &&
+             type != PCI_EXP_TYPE_UPSTREAM &&
+             type != PCI_EXP_TYPE_DOWNSTREAM)) {
+            goto error;
+        }
+
+        if (!dev_iter->exp.aer_cap) {
+            goto error;
+        }
+
+        dev_iter = pci_bridge_get_device(dev_iter->bus);
+    }
+
+    errcap = vfio_pci_read_config(pdev, pos + PCI_ERR_CAP, 4);
+    /*
+     * The ability to record multiple headers is depending on
+     * the state of the Multiple Header Recording Capable bit and
+     * enabled by the Multiple Header Recording Enable bit.
+     */
+    if ((errcap & PCI_ERR_CAP_MHRC) &&
+        (errcap & PCI_ERR_CAP_MHRE)) {
+        pdev->exp.aer_log.log_max = PCIE_AER_LOG_MAX_DEFAULT;
+    } else {
+        pdev->exp.aer_log.log_max = 0;
+    }
+
+    pcie_cap_deverr_init(pdev);
+    return pcie_aer_init(pdev, pos, size);
+
+error:
+    error_report("vfio: Unable to enable AER for device %s, parent bus "
+                 "does not support AER signaling", vdev->vbasedev.name);
+    return -1;
+}
+
 static int vfio_add_ext_cap(VFIOPCIDevice *vdev)
 {
     PCIDevice *pdev = &vdev->pdev;
@@ -1839,6 +1895,7 @@ static int vfio_add_ext_cap(VFIOPCIDevice *vdev)
     uint16_t cap_id, next, size;
     uint8_t cap_ver;
     uint8_t *config;
+    int ret = 0;
 
     /*
      * In order to avoid breaking config space, create a copy to
@@ -1860,16 +1917,29 @@ static int vfio_add_ext_cap(VFIOPCIDevice *vdev)
          */
         size = vfio_ext_cap_max_size(config, next);
 
-        pcie_add_capability(pdev, cap_id, cap_ver, next, size);
-        pci_set_long(dev->config + next, PCI_EXT_CAP(cap_id, cap_ver, 0));
+        switch (cap_id) {
+        case PCI_EXT_CAP_ID_ERR:
+            ret = vfio_setup_aer(vdev, cap_ver, next, size);
+            break;
+        default:
+            pcie_add_capability(pdev, cap_id, cap_ver, next, size);
+            break;
+        }
+
+        if (ret) {
+            goto out;
+        }
+
+        pci_set_long(pdev->config + next, PCI_EXT_CAP(cap_id, cap_ver, 0));
 
         /* Use emulated next pointer to allow dropping extended caps */
         pci_long_test_and_set_mask(vdev->emulated_config_bits + next,
                                    PCI_EXT_CAP_NEXT_MASK);
     }
 
+out:
     g_free(config);
-    return 0;
+    return ret;
 }
 
 static int vfio_add_capabilities(VFIOPCIDevice *vdev)
@@ -2624,6 +2694,11 @@ static int vfio_initfn(PCIDevice *pdev)
         goto out_teardown;
     }
 
+    if ((vdev->features & VFIO_FEATURE_ENABLE_AER) &&
+        !pdev->exp.aer_cap) {
+        goto out_teardown;
+    }
+
     /* QEMU emulates all of MSI & MSIX */
     if (pdev->cap_present & QEMU_PCI_CAP_MSIX) {
         memset(vdev->emulated_config_bits + pdev->msix_cap, 0xff,
diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h
index f004d52..48c1f69 100644
--- a/hw/vfio/pci.h
+++ b/hw/vfio/pci.h
@@ -15,6 +15,7 @@
 #include "qemu-common.h"
 #include "exec/memory.h"
 #include "hw/pci/pci.h"
+#include "hw/pci/pci_bridge.h"
 #include "hw/vfio/vfio-common.h"
 #include "qemu/event_notifier.h"
 #include "qemu/queue.h"
@@ -127,6 +128,8 @@ typedef struct VFIOPCIDevice {
 #define VFIO_FEATURE_ENABLE_VGA (1 << VFIO_FEATURE_ENABLE_VGA_BIT)
 #define VFIO_FEATURE_ENABLE_REQ_BIT 1
 #define VFIO_FEATURE_ENABLE_REQ (1 << VFIO_FEATURE_ENABLE_REQ_BIT)
+#define VFIO_FEATURE_ENABLE_AER_BIT 2
+#define VFIO_FEATURE_ENABLE_AER (1 << VFIO_FEATURE_ENABLE_AER_BIT)
     int32_t bootindex;
     uint8_t pm_cap;
     bool has_vga;
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [Qemu-devel] [v15 08/15] vfio: add check host bus reset is support or not
  2016-01-05  1:20 [Qemu-devel] [v15 00/15] vfio-pci: pass the aer error to guest Cao jin
                   ` (6 preceding siblings ...)
  2016-01-05  1:20 ` [Qemu-devel] [v15 07/15] vfio: add aer support for " Cao jin
@ 2016-01-05  1:20 ` Cao jin
  2016-01-05  1:20 ` [Qemu-devel] [v15 09/15] add check reset mechanism when hotplug vfio device Cao jin
                   ` (6 subsequent siblings)
  14 siblings, 0 replies; 20+ messages in thread
From: Cao jin @ 2016-01-05  1:20 UTC (permalink / raw)
  To: qemu-devel; +Cc: chen.fan.fnst, alex.williamson, mst

From: Chen Fan <chen.fan.fnst@cn.fujitsu.com>

when init vfio devices done, we should test all the devices supported
aer whether conflict with others. For each one, get the hot reset
info for the affected device list.  For each affected device, all
should attach to the VM and on/below the same bus. also, we should test
all of the non-AER supporting vfio-pci devices on or below the target
bus to verify they have a reset mechanism.

Signed-off-by: Chen Fan <chen.fan.fnst@cn.fujitsu.com>
---
 hw/vfio/pci.c | 238 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++--
 hw/vfio/pci.h |   1 +
 2 files changed, 232 insertions(+), 7 deletions(-)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 38b0aa5..16ab0e3 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -1832,6 +1832,218 @@ static int vfio_add_std_cap(VFIOPCIDevice *vdev, uint8_t pos)
     return 0;
 }
 
+static bool vfio_pci_host_slot_match(PCIHostDeviceAddress *host1,
+                                     PCIHostDeviceAddress *host2)
+{
+    return (host1->domain == host2->domain && host1->bus == host2->bus &&
+            host1->slot == host2->slot);
+}
+
+static bool vfio_pci_host_match(PCIHostDeviceAddress *host1,
+                                PCIHostDeviceAddress *host2)
+{
+    return (vfio_pci_host_slot_match(host1, host2) &&
+            host1->function == host2->function);
+}
+
+struct VFIODeviceFind {
+    PCIDevice *pdev;
+    bool found;
+};
+
+static void vfio_check_device_noreset(PCIBus *bus, PCIDevice *pdev,
+                                      void *opaque)
+{
+    DeviceState *dev = DEVICE(pdev);
+    DeviceClass *dc = DEVICE_GET_CLASS(dev);
+    VFIOPCIDevice *vdev;
+    struct VFIODeviceFind *find = opaque;
+
+    if (find->found) {
+        return;
+    }
+
+    if (!object_dynamic_cast(OBJECT(dev), "vfio-pci")) {
+        if (!dc->reset) {
+            goto found;
+        }
+        return;
+    }
+    vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
+    if (!(vdev->features & VFIO_FEATURE_ENABLE_AER) &&
+        !vdev->vbasedev.reset_works) {
+        goto found;
+    }
+
+    return;
+found:
+    find->pdev = pdev;
+    find->found = true;
+}
+
+static void device_find(PCIBus *bus, PCIDevice *pdev, void *opaque)
+{
+    struct VFIODeviceFind *find = opaque;
+
+    if (find->found) {
+        return;
+    }
+
+    if (pdev == find->pdev) {
+        find->found = true;
+    }
+}
+
+static int vfio_check_host_bus_reset(VFIOPCIDevice *vdev)
+{
+    PCIBus *bus = vdev->pdev.bus;
+    struct vfio_pci_hot_reset_info *info = NULL;
+    struct vfio_pci_dependent_device *devices;
+    VFIOGroup *group;
+    struct VFIODeviceFind find;
+    int ret, i;
+
+    ret = vfio_get_hot_reset_info(vdev, &info);
+    if (ret) {
+        error_report("vfio: Cannot enable AER for device %s,"
+                     " device does not support hot reset.",
+                     vdev->vbasedev.name);
+        goto out;
+    }
+
+    /* List all affected devices by bus reset */
+    devices = &info->devices[0];
+
+    /* Verify that we have all the groups required */
+    for (i = 0; i < info->count; i++) {
+        PCIHostDeviceAddress host;
+        VFIOPCIDevice *tmp;
+        VFIODevice *vbasedev_iter;
+        bool found = false;
+
+        host.domain = devices[i].segment;
+        host.bus = devices[i].bus;
+        host.slot = PCI_SLOT(devices[i].devfn);
+        host.function = PCI_FUNC(devices[i].devfn);
+
+        /* Skip the current device */
+        if (vfio_pci_host_match(&host, &vdev->host)) {
+            continue;
+        }
+
+        /* Ensure we own the group of the affected device */
+        QLIST_FOREACH(group, &vfio_group_list, next) {
+            if (group->groupid == devices[i].group_id) {
+                break;
+            }
+        }
+
+        if (!group) {
+            error_report("vfio: Cannot enable AER for device %s, "
+                         "depends on group %d which is not owned.",
+                         vdev->vbasedev.name, devices[i].group_id);
+            ret = -1;
+            goto out;
+        }
+
+        /* Ensure affected devices for reset on/blow the bus */
+        QLIST_FOREACH(vbasedev_iter, &group->device_list, next) {
+            if (vbasedev_iter->type != VFIO_DEVICE_TYPE_PCI) {
+                continue;
+            }
+            tmp = container_of(vbasedev_iter, VFIOPCIDevice, vbasedev);
+            if (vfio_pci_host_match(&host, &tmp->host)) {
+                PCIDevice *pci = PCI_DEVICE(tmp);
+
+                /*
+                 * AER errors may be broadcast to all functions of a multi-
+                 * function endpoint.  If any of those sibling functions are
+                 * also assigned, they need to have AER enabled or else an
+                 * error may continue to cause a vm_stop condition.  IOW,
+                 * AER setup of this function would be pointless.
+                 */
+                if (vfio_pci_host_slot_match(&vdev->host, &tmp->host) &&
+                    !(tmp->features & VFIO_FEATURE_ENABLE_AER)) {
+                    error_report("vfio: Cannot enable AER for device %s, on same slot"
+                                 " the dependent device %s which does not enable AER.",
+                                 vdev->vbasedev.name, tmp->vbasedev.name);
+                    ret = -1;
+                    goto out;
+                }
+
+                find.pdev = pci;
+                find.found = false;
+                pci_for_each_device(bus, pci_bus_num(bus),
+                                    device_find, &find);
+                if (!find.found) {
+                    error_report("vfio: Cannot enable AER for device %s, "
+                                 "the dependent device %s is not under the same bus",
+                                 vdev->vbasedev.name, tmp->vbasedev.name);
+                    ret = -1;
+                    goto out;
+                }
+                found = true;
+                break;
+            }
+        }
+
+        /* Ensure all affected devices assigned to VM */
+        if (!found) {
+            error_report("vfio: Cannot enable AER for device %s, "
+                         "the dependent device %04x:%02x:%02x.%x "
+                         "is not assigned to VM.",
+                         vdev->vbasedev.name, host.domain, host.bus,
+                         host.slot, host.function);
+            ret = -1;
+            goto out;
+        }
+    }
+
+    /*
+     * Check the all pci devices on or below the target bus
+     * have a reset mechanism at least.
+     */
+    find.pdev = NULL;
+    find.found = false;
+    pci_for_each_device(bus, pci_bus_num(bus),
+                        vfio_check_device_noreset, &find);
+    if (find.found) {
+        error_report("vfio: Cannot enable AER for device %s, "
+                     "the affected device %s does not have a reset mechanism.",
+                     vdev->vbasedev.name, find.pdev->name);
+        ret = -1;
+        goto out;
+    }
+
+    ret = 0;
+out:
+    g_free(info);
+    return ret;
+}
+
+static int vfio_check_devices_host_bus_reset(void)
+{
+    VFIOGroup *group;
+    VFIODevice *vbasedev;
+    VFIOPCIDevice *vdev;
+
+    /* Check All vfio-pci devices if have bus reset capability */
+    QLIST_FOREACH(group, &vfio_group_list, next) {
+        QLIST_FOREACH(vbasedev, &group->device_list, next) {
+            if (vbasedev->type != VFIO_DEVICE_TYPE_PCI) {
+                continue;
+            }
+            vdev = container_of(vbasedev, VFIOPCIDevice, vbasedev);
+            if ((vdev->features & VFIO_FEATURE_ENABLE_AER) &&
+                vfio_check_host_bus_reset(vdev)) {
+                return -1;
+            }
+        }
+    }
+
+    return 0;
+}
+
 static int vfio_setup_aer(VFIOPCIDevice *vdev, uint8_t cap_ver,
                           int pos, uint16_t size)
 {
@@ -2009,13 +2221,6 @@ static void vfio_pci_post_reset(VFIOPCIDevice *vdev)
     vfio_intx_enable(vdev);
 }
 
-static bool vfio_pci_host_match(PCIHostDeviceAddress *host1,
-                                PCIHostDeviceAddress *host2)
-{
-    return (host1->domain == host2->domain && host1->bus == host2->bus &&
-            host1->slot == host2->slot && host1->function == host2->function);
-}
-
 static int vfio_pci_hot_reset(VFIOPCIDevice *vdev, bool single)
 {
     VFIOGroup *group;
@@ -2521,6 +2726,20 @@ static void vfio_unregister_req_notifier(VFIOPCIDevice *vdev)
     vdev->req_enabled = false;
 }
 
+static void vfio_pci_machine_done_notify(Notifier *notifier, void *unused)
+{
+    int ret;
+
+    ret = vfio_check_devices_host_bus_reset();
+    if (ret) {
+        exit(1);
+    }
+}
+
+static Notifier machine_notifier = {
+    .notify = vfio_pci_machine_done_notify,
+};
+
 static int vfio_initfn(PCIDevice *pdev)
 {
     VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
@@ -2867,6 +3086,11 @@ static const TypeInfo vfio_pci_dev_info = {
 static void register_vfio_pci_dev_type(void)
 {
     type_register_static(&vfio_pci_dev_info);
+    /*
+     * Register notifier when machine init is done, since we need
+     * check the configration manner after all vfio device are inited.
+     */
+    qemu_add_machine_init_done_notifier(&machine_notifier);
 }
 
 type_init(register_vfio_pci_dev_type)
diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h
index 48c1f69..59ae194 100644
--- a/hw/vfio/pci.h
+++ b/hw/vfio/pci.h
@@ -15,6 +15,7 @@
 #include "qemu-common.h"
 #include "exec/memory.h"
 #include "hw/pci/pci.h"
+#include "hw/pci/pci_bus.h"
 #include "hw/pci/pci_bridge.h"
 #include "hw/vfio/vfio-common.h"
 #include "qemu/event_notifier.h"
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [Qemu-devel] [v15 09/15] add check reset mechanism when hotplug vfio device
  2016-01-05  1:20 [Qemu-devel] [v15 00/15] vfio-pci: pass the aer error to guest Cao jin
                   ` (7 preceding siblings ...)
  2016-01-05  1:20 ` [Qemu-devel] [v15 08/15] vfio: add check host bus reset is support or not Cao jin
@ 2016-01-05  1:20 ` Cao jin
  2016-01-05  1:20 ` [Qemu-devel] [v15 10/15] pci: Introduce device hot reset Cao jin
                   ` (5 subsequent siblings)
  14 siblings, 0 replies; 20+ messages in thread
From: Cao jin @ 2016-01-05  1:20 UTC (permalink / raw)
  To: qemu-devel; +Cc: chen.fan.fnst, alex.williamson, mst

From: Chen Fan <chen.fan.fnst@cn.fujitsu.com>

Since we support multi-function hotplug. the function 0 indicate
the closure of the slot, so we have the chance to do the check.

Signed-off-by: Chen Fan <chen.fan.fnst@cn.fujitsu.com>
---
 hw/pci/pci.c             | 29 +++++++++++++++++++++++++++++
 hw/vfio/pci.c            | 19 +++++++++++++++++++
 hw/vfio/pci.h            |  2 ++
 include/hw/pci/pci_bus.h |  5 +++++
 4 files changed, 55 insertions(+)

diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index 168b9cc..f6ca6ef 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -81,6 +81,7 @@ static void pci_bus_realize(BusState *qbus, Error **errp)
     PCIBus *bus = PCI_BUS(qbus);
 
     vmstate_register(NULL, -1, &vmstate_pcibus, bus);
+    notifier_with_return_list_init(&bus->hotplug_notifiers);
 }
 
 static void pci_bus_unrealize(BusState *qbus, Error **errp)
@@ -1835,6 +1836,22 @@ PCIDevice *pci_find_device(PCIBus *bus, int bus_num, uint8_t devfn)
     return bus->devices[devfn];
 }
 
+void pci_bus_add_hotplug_notifier(PCIBus *bus, NotifierWithReturn *notify)
+{
+    notifier_with_return_list_add(&bus->hotplug_notifiers, notify);
+}
+
+void pci_bus_remove_hotplug_notifier(NotifierWithReturn *notifier)
+{
+    notifier_with_return_remove(notifier);
+}
+
+static int pci_bus_hotplug_notifier(PCIBus *bus, void *opaque)
+{
+    return notifier_with_return_list_notify(&bus->hotplug_notifiers,
+                                            opaque);
+}
+
 static void pci_qdev_realize(DeviceState *qdev, Error **errp)
 {
     PCIDevice *pci_dev = (PCIDevice *)qdev;
@@ -1877,6 +1894,18 @@ static void pci_qdev_realize(DeviceState *qdev, Error **errp)
         pci_qdev_unrealize(DEVICE(pci_dev), NULL);
         return;
     }
+
+    /*
+     *  If the function is func 0, indicate the closure of the slot.
+     *  signal the callback.
+     */
+    if (DEVICE(pci_dev)->hotplugged &&
+        pci_get_function_0(pci_dev) == pci_dev &&
+        pci_bus_hotplug_notifier(bus, pci_dev)) {
+        error_setg(errp, "failed to hotplug function 0");
+        pci_qdev_unrealize(DEVICE(pci_dev), NULL);
+        return;
+    }
 }
 
 static void pci_default_realize(PCIDevice *dev, Error **errp)
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 16ab0e3..ff25c9b 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -2044,6 +2044,19 @@ static int vfio_check_devices_host_bus_reset(void)
     return 0;
 }
 
+static int vfio_check_bus_reset(NotifierWithReturn *n, void *opaque)
+{
+    VFIOPCIDevice *vdev = container_of(n, VFIOPCIDevice, hotplug_notifier);
+    PCIDevice *pci_dev = PCI_DEVICE(vdev);
+    PCIDevice *pci_func0 = opaque;
+
+    if (pci_get_function_0(pci_dev) != pci_func0) {
+        return 0;
+    }
+
+    return vfio_check_host_bus_reset(vdev);
+}
+
 static int vfio_setup_aer(VFIOPCIDevice *vdev, uint8_t cap_ver,
                           int pos, uint16_t size)
 {
@@ -2091,6 +2104,9 @@ static int vfio_setup_aer(VFIOPCIDevice *vdev, uint8_t cap_ver,
         pdev->exp.aer_log.log_max = 0;
     }
 
+    vdev->hotplug_notifier.notify = vfio_check_bus_reset;
+    pci_bus_add_hotplug_notifier(pdev->bus, &vdev->hotplug_notifier);
+
     pcie_cap_deverr_init(pdev);
     return pcie_aer_init(pdev, pos, size);
 
@@ -2972,6 +2988,9 @@ static void vfio_exitfn(PCIDevice *pdev)
     vfio_unregister_req_notifier(vdev);
     vfio_unregister_err_notifier(vdev);
     pci_device_set_intx_routing_notifier(&vdev->pdev, NULL);
+    if (vdev->features & VFIO_FEATURE_ENABLE_AER) {
+        pci_bus_remove_hotplug_notifier(&vdev->hotplug_notifier);
+    }
     vfio_disable_interrupts(vdev);
     if (vdev->intx.mmap_timer) {
         timer_free(vdev->intx.mmap_timer);
diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h
index 59ae194..b385f07 100644
--- a/hw/vfio/pci.h
+++ b/hw/vfio/pci.h
@@ -142,6 +142,8 @@ typedef struct VFIOPCIDevice {
     bool no_kvm_intx;
     bool no_kvm_msi;
     bool no_kvm_msix;
+
+    NotifierWithReturn hotplug_notifier;
 } VFIOPCIDevice;
 
 uint32_t vfio_pci_read_config(PCIDevice *pdev, uint32_t addr, int len);
diff --git a/include/hw/pci/pci_bus.h b/include/hw/pci/pci_bus.h
index 403fec6..7812fa9 100644
--- a/include/hw/pci/pci_bus.h
+++ b/include/hw/pci/pci_bus.h
@@ -39,8 +39,13 @@ struct PCIBus {
        Keep a count of the number of devices with raised IRQs.  */
     int nirq;
     int *irq_count;
+
+    NotifierWithReturnList hotplug_notifiers;
 };
 
+void pci_bus_add_hotplug_notifier(PCIBus *bus, NotifierWithReturn *notify);
+void pci_bus_remove_hotplug_notifier(NotifierWithReturn *notify);
+
 typedef struct PCIBridgeWindows PCIBridgeWindows;
 
 /*
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [Qemu-devel] [v15 10/15] pci: Introduce device hot reset
  2016-01-05  1:20 [Qemu-devel] [v15 00/15] vfio-pci: pass the aer error to guest Cao jin
                   ` (8 preceding siblings ...)
  2016-01-05  1:20 ` [Qemu-devel] [v15 09/15] add check reset mechanism when hotplug vfio device Cao jin
@ 2016-01-05  1:20 ` Cao jin
  2016-01-05  1:20 ` [Qemu-devel] [v15 11/15] vfio: add hot reset callback Cao jin
                   ` (4 subsequent siblings)
  14 siblings, 0 replies; 20+ messages in thread
From: Cao jin @ 2016-01-05  1:20 UTC (permalink / raw)
  To: qemu-devel; +Cc: chen.fan.fnst, alex.williamson, mst

From: Chen Fan <chen.fan.fnst@cn.fujitsu.com>

The secondary bus reset in bridge control register setting trigger
a hot reset, Specially for vfio device, we usually need to do a hot
reset for the host bus other than the device reset.

Signed-off-by: Chen Fan <chen.fan.fnst@cn.fujitsu.com>
---
 hw/core/qdev.c         | 24 ++++++++++++++++++++++++
 hw/pci/pci_bridge.c    |  2 +-
 include/hw/qdev-core.h |  3 +++
 3 files changed, 28 insertions(+), 1 deletion(-)

diff --git a/hw/core/qdev.c b/hw/core/qdev.c
index b3ad467..9c48bae 100644
--- a/hw/core/qdev.c
+++ b/hw/core/qdev.c
@@ -311,6 +311,13 @@ static int qdev_reset_one(DeviceState *dev, void *opaque)
     return 0;
 }
 
+static int qdev_hot_reset_one(DeviceState *dev, void *opaque)
+{
+    device_hot_reset(dev);
+
+    return 0;
+}
+
 static int qbus_reset_one(BusState *bus, void *opaque)
 {
     BusClass *bc = BUS_GET_CLASS(bus);
@@ -335,6 +342,11 @@ void qbus_reset_all(BusState *bus)
     qbus_walk_children(bus, NULL, NULL, qdev_reset_one, qbus_reset_one, NULL);
 }
 
+void qbus_hot_reset_all(BusState *bus)
+{
+    qbus_walk_children(bus, NULL, NULL, qdev_hot_reset_one, qbus_reset_one, NULL);
+}
+
 void qbus_reset_all_fn(void *opaque)
 {
     BusState *bus = opaque;
@@ -1284,6 +1296,18 @@ void device_reset(DeviceState *dev)
     }
 }
 
+void device_hot_reset(DeviceState *dev)
+{
+    DeviceClass *klass = DEVICE_GET_CLASS(dev);
+
+    if (klass->hot_reset) {
+        klass->hot_reset(dev);
+        return;
+    }
+
+    device_reset(dev);
+}
+
 Object *qdev_get_machine(void)
 {
     static Object *dev;
diff --git a/hw/pci/pci_bridge.c b/hw/pci/pci_bridge.c
index 40c97b1..f1903db 100644
--- a/hw/pci/pci_bridge.c
+++ b/hw/pci/pci_bridge.c
@@ -268,7 +268,7 @@ void pci_bridge_write_config(PCIDevice *d,
     newctl = pci_get_word(d->config + PCI_BRIDGE_CONTROL);
     if (~oldctl & newctl & PCI_BRIDGE_CTL_BUS_RESET) {
         /* Trigger hot reset on 0->1 transition. */
-        qbus_reset_all(&s->sec_bus.qbus);
+        qbus_hot_reset_all(&s->sec_bus.qbus);
     }
 }
 
diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h
index c537969..e9fe4b3 100644
--- a/include/hw/qdev-core.h
+++ b/include/hw/qdev-core.h
@@ -131,6 +131,7 @@ typedef struct DeviceClass {
 
     /* callbacks */
     void (*reset)(DeviceState *dev);
+    void (*hot_reset)(DeviceState *dev);
     DeviceRealize realize;
     DeviceUnrealize unrealize;
 
@@ -351,6 +352,7 @@ void qdev_reset_all_fn(void *opaque);
  */
 void qbus_reset_all(BusState *bus);
 void qbus_reset_all_fn(void *opaque);
+void qbus_hot_reset_all(BusState *bus);
 
 /* This should go away once we get rid of the NULL bus hack */
 BusState *sysbus_get_default(void);
@@ -372,6 +374,7 @@ void qdev_machine_init(void);
  * Reset a single device (by calling the reset method).
  */
 void device_reset(DeviceState *dev);
+void device_hot_reset(DeviceState *dev);
 
 const struct VMStateDescription *qdev_get_vmsd(DeviceState *dev);
 
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [Qemu-devel] [v15 11/15] vfio: add hot reset callback
  2016-01-05  1:20 [Qemu-devel] [v15 00/15] vfio-pci: pass the aer error to guest Cao jin
                   ` (9 preceding siblings ...)
  2016-01-05  1:20 ` [Qemu-devel] [v15 10/15] pci: Introduce device hot reset Cao jin
@ 2016-01-05  1:20 ` Cao jin
  2016-01-05  1:20 ` [Qemu-devel] [v15 12/15] vfio: add bus in reset flag Cao jin
                   ` (3 subsequent siblings)
  14 siblings, 0 replies; 20+ messages in thread
From: Cao jin @ 2016-01-05  1:20 UTC (permalink / raw)
  To: qemu-devel; +Cc: chen.fan.fnst, alex.williamson, mst

From: Chen Fan <chen.fan.fnst@cn.fujitsu.com>

For Vfio device, Once need to recovery devices by bus reset such as AER,
we always need to reset the host bus to recovery the devices under the bus,
so we need to specify to do host bus reset.
---
 hw/vfio/pci.c | 15 +++++++++++++++
 hw/vfio/pci.h |  1 +
 2 files changed, 16 insertions(+)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index ff25c9b..ee88db3 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -1914,6 +1914,8 @@ static int vfio_check_host_bus_reset(VFIOPCIDevice *vdev)
     /* List all affected devices by bus reset */
     devices = &info->devices[0];
 
+    vdev->single_depend_dev = (info->count == 1);
+
     /* Verify that we have all the groups required */
     for (i = 0; i < info->count; i++) {
         PCIHostDeviceAddress host;
@@ -3035,6 +3037,18 @@ post_reset:
     vfio_pci_post_reset(vdev);
 }
 
+static void vfio_pci_device_hot_reset(DeviceState *dev)
+{
+    PCIDevice *pdev = DO_UPCAST(PCIDevice, qdev, dev);
+    VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
+
+    if (vdev->features & VFIO_FEATURE_ENABLE_AER) {
+        vfio_pci_hot_reset(vdev, vdev->single_depend_dev);
+    } else {
+        vfio_pci_reset(dev);
+    }
+}
+
 static void vfio_instance_init(Object *obj)
 {
     PCIDevice *pci_dev = PCI_DEVICE(obj);
@@ -3082,6 +3096,7 @@ static void vfio_pci_dev_class_init(ObjectClass *klass, void *data)
     PCIDeviceClass *pdc = PCI_DEVICE_CLASS(klass);
 
     dc->reset = vfio_pci_reset;
+    dc->hot_reset = vfio_pci_device_hot_reset;
     dc->props = vfio_pci_dev_properties;
     dc->vmsd = &vfio_pci_vmstate;
     dc->desc = "VFIO-based PCI device assignment";
diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h
index b385f07..6186e62 100644
--- a/hw/vfio/pci.h
+++ b/hw/vfio/pci.h
@@ -142,6 +142,7 @@ typedef struct VFIOPCIDevice {
     bool no_kvm_intx;
     bool no_kvm_msi;
     bool no_kvm_msix;
+    bool single_depend_dev;
 
     NotifierWithReturn hotplug_notifier;
 } VFIOPCIDevice;
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [Qemu-devel] [v15 12/15] vfio: add bus in reset flag
  2016-01-05  1:20 [Qemu-devel] [v15 00/15] vfio-pci: pass the aer error to guest Cao jin
                   ` (10 preceding siblings ...)
  2016-01-05  1:20 ` [Qemu-devel] [v15 11/15] vfio: add hot reset callback Cao jin
@ 2016-01-05  1:20 ` Cao jin
  2016-01-05 19:58   ` Alex Williamson
  2016-01-05  1:20 ` [Qemu-devel] [v15 13/15] pcie_aer: expose pcie_aer_msg() interface Cao jin
                   ` (2 subsequent siblings)
  14 siblings, 1 reply; 20+ messages in thread
From: Cao jin @ 2016-01-05  1:20 UTC (permalink / raw)
  To: qemu-devel; +Cc: chen.fan.fnst, alex.williamson, mst

From: Chen Fan <chen.fan.fnst@cn.fujitsu.com>

mark the host bus be in reset. avoid multiple devices trigger the
host bus reset many times.

Signed-off-by: Chen Fan <chen.fan.fnst@cn.fujitsu.com>
---
 hw/vfio/pci.c                 | 6 ++++++
 include/hw/vfio/vfio-common.h | 1 +
 2 files changed, 7 insertions(+)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index ee88db3..aa0d945 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -2249,6 +2249,11 @@ static int vfio_pci_hot_reset(VFIOPCIDevice *vdev, bool single)
 
     trace_vfio_pci_hot_reset(vdev->vbasedev.name, single ? "one" : "multi");
 
+    if (vdev->vbasedev.bus_in_reset) {
+        vdev->vbasedev.bus_in_reset = false;
+        return 0;
+    }
+
     vfio_pci_pre_reset(vdev);
     vdev->vbasedev.needs_reset = false;
 
@@ -2312,6 +2317,7 @@ static int vfio_pci_hot_reset(VFIOPCIDevice *vdev, bool single)
                 }
                 vfio_pci_pre_reset(tmp);
                 tmp->vbasedev.needs_reset = false;
+                tmp->vbasedev.bus_in_reset = true;
                 multi = true;
                 break;
             }
diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index f037f3c..44b19d7 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -95,6 +95,7 @@ typedef struct VFIODevice {
     bool reset_works;
     bool needs_reset;
     bool no_mmap;
+    bool bus_in_reset;
     VFIODeviceOps *ops;
     unsigned int num_irqs;
     unsigned int num_regions;
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [Qemu-devel] [v15 13/15] pcie_aer: expose pcie_aer_msg() interface
  2016-01-05  1:20 [Qemu-devel] [v15 00/15] vfio-pci: pass the aer error to guest Cao jin
                   ` (11 preceding siblings ...)
  2016-01-05  1:20 ` [Qemu-devel] [v15 12/15] vfio: add bus in reset flag Cao jin
@ 2016-01-05  1:20 ` Cao jin
  2016-01-05  1:20 ` [Qemu-devel] [v15 14/15] vfio-pci: pass the aer error to guest Cao jin
  2016-01-05  1:20 ` [Qemu-devel] [v15 15/15] vfio: add 'aer' property to expose aercap Cao jin
  14 siblings, 0 replies; 20+ messages in thread
From: Cao jin @ 2016-01-05  1:20 UTC (permalink / raw)
  To: qemu-devel; +Cc: chen.fan.fnst, alex.williamson, mst

From: Chen Fan <chen.fan.fnst@cn.fujitsu.com>

For vfio device, we need to propagate the aer error to
Guest OS. we use the pcie_aer_msg() to send aer error
to guest.

Signed-off-by: Chen Fan <chen.fan.fnst@cn.fujitsu.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
---
 hw/pci/pcie_aer.c         | 2 +-
 include/hw/pci/pcie_aer.h | 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/hw/pci/pcie_aer.c b/hw/pci/pcie_aer.c
index 45f351b..fbbd7d2 100644
--- a/hw/pci/pcie_aer.c
+++ b/hw/pci/pcie_aer.c
@@ -370,7 +370,7 @@ static void pcie_aer_msg_root_port(PCIDevice *dev, const PCIEAERMsg *msg)
  *
  * Walk up the bus tree from the device, propagate the error message.
  */
-static void pcie_aer_msg(PCIDevice *dev, const PCIEAERMsg *msg)
+void pcie_aer_msg(PCIDevice *dev, const PCIEAERMsg *msg)
 {
     uint8_t type;
 
diff --git a/include/hw/pci/pcie_aer.h b/include/hw/pci/pcie_aer.h
index 156acb0..c2ee4e2 100644
--- a/include/hw/pci/pcie_aer.h
+++ b/include/hw/pci/pcie_aer.h
@@ -102,5 +102,6 @@ void pcie_aer_root_write_config(PCIDevice *dev,
 
 /* error injection */
 int pcie_aer_inject_error(PCIDevice *dev, const PCIEAERErr *err);
+void pcie_aer_msg(PCIDevice *dev, const PCIEAERMsg *msg);
 
 #endif /* QEMU_PCIE_AER_H */
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [Qemu-devel] [v15 14/15] vfio-pci: pass the aer error to guest
  2016-01-05  1:20 [Qemu-devel] [v15 00/15] vfio-pci: pass the aer error to guest Cao jin
                   ` (12 preceding siblings ...)
  2016-01-05  1:20 ` [Qemu-devel] [v15 13/15] pcie_aer: expose pcie_aer_msg() interface Cao jin
@ 2016-01-05  1:20 ` Cao jin
  2016-01-05  1:20 ` [Qemu-devel] [v15 15/15] vfio: add 'aer' property to expose aercap Cao jin
  14 siblings, 0 replies; 20+ messages in thread
From: Cao jin @ 2016-01-05  1:20 UTC (permalink / raw)
  To: qemu-devel; +Cc: chen.fan.fnst, alex.williamson, mst

From: Chen Fan <chen.fan.fnst@cn.fujitsu.com>

when the vfio device encounters an uncorrectable error in host,
the vfio_pci driver will signal the eventfd registered by this
vfio device, the results in the qemu eventfd handler getting
invoked.

this patch is to pass the error to guest and have the guest driver
recover from the error.

Signed-off-by: Chen Fan <chen.fan.fnst@cn.fujitsu.com>
---
 hw/vfio/pci.c | 53 +++++++++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 47 insertions(+), 6 deletions(-)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index aa0d945..bc81132 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -2557,18 +2557,59 @@ static void vfio_put_device(VFIOPCIDevice *vdev)
 static void vfio_err_notifier_handler(void *opaque)
 {
     VFIOPCIDevice *vdev = opaque;
+    PCIDevice *dev = &vdev->pdev;
+    PCIEAERMsg msg = {
+        .severity = 0,
+        .source_id = (pci_bus_num(dev->bus) << 8) | dev->devfn,
+    };
 
     if (!event_notifier_test_and_clear(&vdev->err_notifier)) {
         return;
     }
 
     /*
-     * TBD. Retrieve the error details and decide what action
-     * needs to be taken. One of the actions could be to pass
-     * the error to the guest and have the guest driver recover
-     * from the error. This requires that PCIe capabilities be
-     * exposed to the guest. For now, we just terminate the
-     * guest to contain the error.
+     * in case the real hardware configration has been changed,
+     * here we should recheck the bus reset capability.
+     */
+    if ((vdev->features & VFIO_FEATURE_ENABLE_AER) &&
+        vfio_check_host_bus_reset(vdev)) {
+        goto stop;
+    }
+    /*
+     * we should read the error details from the real hardware
+     * configuration spaces, here we only need to do is signaling
+     * to guest an uncorrectable error has occurred.
+     */
+    if ((vdev->features & VFIO_FEATURE_ENABLE_AER) &&
+        dev->exp.aer_cap) {
+        uint8_t *aer_cap = dev->config + dev->exp.aer_cap;
+        uint32_t uncor_status;
+        bool isfatal;
+
+        uncor_status = vfio_pci_read_config(dev,
+                           dev->exp.aer_cap + PCI_ERR_UNCOR_STATUS, 4);
+
+        /*
+         * if we receive the error signal but not this device, we can
+         * just ignore it.
+         */
+        if (!(uncor_status & ~0UL)) {
+            return;
+        }
+
+        isfatal = uncor_status & pci_get_long(aer_cap + PCI_ERR_UNCOR_SEVER);
+
+        msg.severity = isfatal ? PCI_ERR_ROOT_CMD_FATAL_EN :
+                                 PCI_ERR_ROOT_CMD_NONFATAL_EN;
+
+        pcie_aer_msg(dev, &msg);
+        return;
+    }
+
+stop:
+    /*
+     * If the aer capability is not exposed to the guest. we just
+     * terminate the guest to contain the error.
      */
 
     error_report("%s(%04x:%02x:%02x.%x) Unrecoverable error detected.  "
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [Qemu-devel] [v15 15/15] vfio: add 'aer' property to expose aercap
  2016-01-05  1:20 [Qemu-devel] [v15 00/15] vfio-pci: pass the aer error to guest Cao jin
                   ` (13 preceding siblings ...)
  2016-01-05  1:20 ` [Qemu-devel] [v15 14/15] vfio-pci: pass the aer error to guest Cao jin
@ 2016-01-05  1:20 ` Cao jin
  14 siblings, 0 replies; 20+ messages in thread
From: Cao jin @ 2016-01-05  1:20 UTC (permalink / raw)
  To: qemu-devel; +Cc: chen.fan.fnst, alex.williamson, mst

From: Chen Fan <chen.fan.fnst@cn.fujitsu.com>

add 'aer' property to let user able to decide whether expose
the aer capability. by default we should disable aer feature,
because it needs configuration restrictions.

Signed-off-by: Chen Fan <chen.fan.fnst@cn.fujitsu.com>
---
 hw/vfio/pci.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index bc81132..e800cf8 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -3124,6 +3124,8 @@ static Property vfio_pci_dev_properties[] = {
                        sub_vendor_id, PCI_ANY_ID),
     DEFINE_PROP_UINT32("x-pci-sub-device-id", VFIOPCIDevice,
                        sub_device_id, PCI_ANY_ID),
+    DEFINE_PROP_BIT("aer", VFIOPCIDevice, features,
+                    VFIO_FEATURE_ENABLE_AER_BIT, false),
     /*
      * TODO - support passed fds... is this necessary?
      * DEFINE_PROP_STRING("vfiofd", VFIOPCIDevice, vfiofd_name),
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [Qemu-devel] [v15 12/15] vfio: add bus in reset flag
  2016-01-05  1:20 ` [Qemu-devel] [v15 12/15] vfio: add bus in reset flag Cao jin
@ 2016-01-05 19:58   ` Alex Williamson
  2016-01-06  2:13     ` Chen Fan
  0 siblings, 1 reply; 20+ messages in thread
From: Alex Williamson @ 2016-01-05 19:58 UTC (permalink / raw)
  To: Cao jin, qemu-devel; +Cc: chen.fan.fnst, mst

On Tue, 2016-01-05 at 09:20 +0800, Cao jin wrote:
> From: Chen Fan <chen.fan.fnst@cn.fujitsu.com>
> 
> mark the host bus be in reset. avoid multiple devices trigger the
> host bus reset many times.
> 
> Signed-off-by: Chen Fan <chen.fan.fnst@cn.fujitsu.com>
> ---
>  hw/vfio/pci.c                 | 6 ++++++
>  include/hw/vfio/vfio-common.h | 1 +
>  2 files changed, 7 insertions(+)
> 
> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> index ee88db3..aa0d945 100644
> --- a/hw/vfio/pci.c
> +++ b/hw/vfio/pci.c
> @@ -2249,6 +2249,11 @@ static int vfio_pci_hot_reset(VFIOPCIDevice
> *vdev, bool single)
>  
>      trace_vfio_pci_hot_reset(vdev->vbasedev.name, single ? "one" :
> "multi");
>  
> +    if (vdev->vbasedev.bus_in_reset) {
> +        vdev->vbasedev.bus_in_reset = false;
> +        return 0;
> +    }
> +
>      vfio_pci_pre_reset(vdev);
>      vdev->vbasedev.needs_reset = false;
>  
> @@ -2312,6 +2317,7 @@ static int vfio_pci_hot_reset(VFIOPCIDevice
> *vdev, bool single)
>                  }
>                  vfio_pci_pre_reset(tmp);
>                  tmp->vbasedev.needs_reset = false;
> +                tmp->vbasedev.bus_in_reset = true;
>                  multi = true;
>                  break;
>              }
> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-
> common.h
> index f037f3c..44b19d7 100644
> --- a/include/hw/vfio/vfio-common.h
> +++ b/include/hw/vfio/vfio-common.h
> @@ -95,6 +95,7 @@ typedef struct VFIODevice {
>      bool reset_works;
>      bool needs_reset;
>      bool no_mmap;
> +    bool bus_in_reset;
>      VFIODeviceOps *ops;
>      unsigned int num_irqs;
>      unsigned int num_regions;

I imagine this should be a VFIOPCIDevice field, it has no use in the
common code.  The name is also a bit confusing; when I suggested a
bus_in_reset flag, I was referring to a property on the bus itself that
the existing device_reset could query to switch modes rather than add a
separate callback as you've done in this series.  This works, but it's
perhaps more intrusive than I was thinking.  It will need to get
approval by qdev folks.

In any case, this bus_in_reset field is tracking whether a device has
already been reset as part of a hot reset, sort of a more bus-based
version with opposite polarity of needs_reset.  It doesn't actually
track the bus reset state at all, it tracks whether we should skip the
next call to hot reset for that device.  So it should probably be
something like VFIOPCIDevice.skip_hot_reset (though that's not a great
name either).

I also wonder if a "hot" reset callback in qbus is really too PCI
centered, should it just be "bus_reset"?

Finally, it would be great if you could mention in the cover email
which patches are new or more than superficially modified from the
previous version so we can more easily focus on the new code to review.
 Thanks!

Alex

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Qemu-devel] [v15 12/15] vfio: add bus in reset flag
  2016-01-05 19:58   ` Alex Williamson
@ 2016-01-06  2:13     ` Chen Fan
  2016-01-06 16:44       ` Alex Williamson
  0 siblings, 1 reply; 20+ messages in thread
From: Chen Fan @ 2016-01-06  2:13 UTC (permalink / raw)
  To: Alex Williamson, Cao jin, qemu-devel; +Cc: mst


On 01/06/2016 03:58 AM, Alex Williamson wrote:
> On Tue, 2016-01-05 at 09:20 +0800, Cao jin wrote:
>> From: Chen Fan <chen.fan.fnst@cn.fujitsu.com>
>>
>> mark the host bus be in reset. avoid multiple devices trigger the
>> host bus reset many times.
>>
>> Signed-off-by: Chen Fan <chen.fan.fnst@cn.fujitsu.com>
>> ---
>>   hw/vfio/pci.c                 | 6 ++++++
>>   include/hw/vfio/vfio-common.h | 1 +
>>   2 files changed, 7 insertions(+)
>>
>> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
>> index ee88db3..aa0d945 100644
>> --- a/hw/vfio/pci.c
>> +++ b/hw/vfio/pci.c
>> @@ -2249,6 +2249,11 @@ static int vfio_pci_hot_reset(VFIOPCIDevice
>> *vdev, bool single)
>>   
>>       trace_vfio_pci_hot_reset(vdev->vbasedev.name, single ? "one" :
>> "multi");
>>   
>> +    if (vdev->vbasedev.bus_in_reset) {
>> +        vdev->vbasedev.bus_in_reset = false;
>> +        return 0;
>> +    }
>> +
>>       vfio_pci_pre_reset(vdev);
>>       vdev->vbasedev.needs_reset = false;
>>   
>> @@ -2312,6 +2317,7 @@ static int vfio_pci_hot_reset(VFIOPCIDevice
>> *vdev, bool single)
>>                   }
>>                   vfio_pci_pre_reset(tmp);
>>                   tmp->vbasedev.needs_reset = false;
>> +                tmp->vbasedev.bus_in_reset = true;
>>                   multi = true;
>>                   break;
>>               }
>> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-
>> common.h
>> index f037f3c..44b19d7 100644
>> --- a/include/hw/vfio/vfio-common.h
>> +++ b/include/hw/vfio/vfio-common.h
>> @@ -95,6 +95,7 @@ typedef struct VFIODevice {
>>       bool reset_works;
>>       bool needs_reset;
>>       bool no_mmap;
>> +    bool bus_in_reset;
>>       VFIODeviceOps *ops;
>>       unsigned int num_irqs;
>>       unsigned int num_regions;
> I imagine this should be a VFIOPCIDevice field, it has no use in the
> common code.  The name is also a bit confusing; when I suggested a
> bus_in_reset flag, I was referring to a property on the bus itself that
> the existing device_reset could query to switch modes rather than add a
> separate callback as you've done in this series.  This works, but it's
> perhaps more intrusive than I was thinking.  It will need to get
> approval by qdev folks.
maybe I don't get your point. I just think add a bus_in_reset flag in bus
has no much sense. for instance, if assigning device A and B from
different host bus into a same virtual bus. assume all check passed.
then if device A aer occurs. we should reset virtual bus to recover
the device A, we also need to reset the device B and do device B host
bus reset. but here the bus_in_reset just denote the device B not need
to do host bus reset, it's incorrect. right?

>
> In any case, this bus_in_reset field is tracking whether a device has
> already been reset as part of a hot reset, sort of a more bus-based
> version with opposite polarity of needs_reset.  It doesn't actually
> track the bus reset state at all, it tracks whether we should skip the
> next call to hot reset for that device.  So it should probably be
> something like VFIOPCIDevice.skip_hot_reset (though that's not a great
> name either).
>
> I also wonder if a "hot" reset callback in qbus is really too PCI
> centered, should it just be "bus_reset"?
>
> Finally, it would be great if you could mention in the cover email
> which patches are new or more than superficially modified from the
> previous version so we can more easily focus on the new code to review.
>   Thanks!

Oh I am sorry, thanks for your mention. I will detail the change
from next version.

Thanks,
Chen

>
> Alex
>
>
> .
>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Qemu-devel] [v15 12/15] vfio: add bus in reset flag
  2016-01-06  2:13     ` Chen Fan
@ 2016-01-06 16:44       ` Alex Williamson
  2016-01-12  1:13         ` Chen Fan
  0 siblings, 1 reply; 20+ messages in thread
From: Alex Williamson @ 2016-01-06 16:44 UTC (permalink / raw)
  To: Chen Fan, Cao jin, qemu-devel; +Cc: mst

On Wed, 2016-01-06 at 10:13 +0800, Chen Fan wrote:
> On 01/06/2016 03:58 AM, Alex Williamson wrote:
> > On Tue, 2016-01-05 at 09:20 +0800, Cao jin wrote:
> > > From: Chen Fan <chen.fan.fnst@cn.fujitsu.com>
> > > 
> > > mark the host bus be in reset. avoid multiple devices trigger the
> > > host bus reset many times.
> > > 
> > > Signed-off-by: Chen Fan <chen.fan.fnst@cn.fujitsu.com>
> > > ---
> > >   hw/vfio/pci.c                 | 6 ++++++
> > >   include/hw/vfio/vfio-common.h | 1 +
> > >   2 files changed, 7 insertions(+)
> > > 
> > > diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> > > index ee88db3..aa0d945 100644
> > > --- a/hw/vfio/pci.c
> > > +++ b/hw/vfio/pci.c
> > > @@ -2249,6 +2249,11 @@ static int
> > > vfio_pci_hot_reset(VFIOPCIDevice
> > > *vdev, bool single)
> > >   
> > >       trace_vfio_pci_hot_reset(vdev->vbasedev.name, single ?
> > > "one" :
> > > "multi");
> > >   
> > > +    if (vdev->vbasedev.bus_in_reset) {
> > > +        vdev->vbasedev.bus_in_reset = false;
> > > +        return 0;
> > > +    }
> > > +
> > >       vfio_pci_pre_reset(vdev);
> > >       vdev->vbasedev.needs_reset = false;
> > >   
> > > @@ -2312,6 +2317,7 @@ static int vfio_pci_hot_reset(VFIOPCIDevice
> > > *vdev, bool single)
> > >                   }
> > >                   vfio_pci_pre_reset(tmp);
> > >                   tmp->vbasedev.needs_reset = false;
> > > +                tmp->vbasedev.bus_in_reset = true;
> > >                   multi = true;
> > >                   break;
> > >               }
> > > diff --git a/include/hw/vfio/vfio-common.h
> > > b/include/hw/vfio/vfio-
> > > common.h
> > > index f037f3c..44b19d7 100644
> > > --- a/include/hw/vfio/vfio-common.h
> > > +++ b/include/hw/vfio/vfio-common.h
> > > @@ -95,6 +95,7 @@ typedef struct VFIODevice {
> > >       bool reset_works;
> > >       bool needs_reset;
> > >       bool no_mmap;
> > > +    bool bus_in_reset;
> > >       VFIODeviceOps *ops;
> > >       unsigned int num_irqs;
> > >       unsigned int num_regions;
> > I imagine this should be a VFIOPCIDevice field, it has no use in
> > the
> > common code.  The name is also a bit confusing; when I suggested a
> > bus_in_reset flag, I was referring to a property on the bus itself
> > that
> > the existing device_reset could query to switch modes rather than
> > add a
> > separate callback as you've done in this series.  This works, but
> > it's
> > perhaps more intrusive than I was thinking.  It will need to get
> > approval by qdev folks.
> maybe I don't get your point. I just think add a bus_in_reset flag in
> bus
> has no much sense. for instance, if assigning device A and B from
> different host bus into a same virtual bus. assume all check passed.
> then if device A aer occurs. we should reset virtual bus to recover
> the device A, we also need to reset the device B and do device B host
> bus reset. but here the bus_in_reset just denote the device B not
> need
> to do host bus reset, it's incorrect. right?

Let's take an example of the state of this flag on the device to see
why the name doesn't make sense to me.  We have a dual port card with
devices A and B, both on the same host bus.  We start a hot reset on A
and we have the following states:

A.bus_in_reset = false
B.bus_in_reset = false

Well, that's not accurate.  As we complete the hot reset we tag device
B as already being reset:

A.bus_in_reset = false
B.bus_in_reset = true

That's not accurate either, they're both in the same bus hierarchy,
they should not be different.  Later hot reset is called on B and we're
back to:

A.bus_in_reset = false
B.bus_in_reset = false

So I agree with your algorithm, but the variable is not tracking the
state of the bus being in reset, it's tracking whether to skip the next
reset call.

The separate hot (bus) reset in DeviceClass seems unnecessary too, this
is where I think we could work entirely within the PCI code w/o new
qbus/qdev interfaces.  Imagine pci_bridge_write_config()
calls qbus_walk_children() directly instead of calling
qbus_reset_all().  The pre_busfn() could set a flag on the PCIBus to
indicate the bus is in reset.  qdev_reset_one() could be used as the
post_devfn() and the post_busfn() would call qdev_reset_one() followed
by a clear of the flag.  The modification to vfio is then simply that
if we're resetting an AER device and the bus is in reset, we know we
can do a hot reset.  It simply allows us to test which reset type is
occurring within the existing reset callback rather than adding a new
one.

Going back to my idea of a sequence ID, if we had:

bool PCIBus.bus_in_reset
uint PCIBus.bus_reset_seqid

The pre_busfn would do:

PCIBus.bus_in_reset = true
PCIBus.bus_reset_seqid++

Then we could add:

uint VFIOPCIDevice.last_bus_reset_seqid

vfio_pci_reset() would test (PCIBus.bus_in_reset && VFIOPCIDevice.AER)
to know whether to do a hot reset.  vfio_pci_hot_reset() would skip
devices for which (VFIOPCIDevice.last_bus_reset_seqid ==
PCIBus.bus_reset_seqid) and for each device reset would set
VFIOPCIDevice.last_bus_reset_seqid = PCIBus.bus_reset_seqid.

That feels like a much more deterministic solution if MST is willing to
support it in the PCI specific BusState.  Thanks,

Alex

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Qemu-devel] [v15 12/15] vfio: add bus in reset flag
  2016-01-06 16:44       ` Alex Williamson
@ 2016-01-12  1:13         ` Chen Fan
  0 siblings, 0 replies; 20+ messages in thread
From: Chen Fan @ 2016-01-12  1:13 UTC (permalink / raw)
  To: Alex Williamson, Cao jin, qemu-devel; +Cc: mst


On 01/07/2016 12:44 AM, Alex Williamson wrote:
> On Wed, 2016-01-06 at 10:13 +0800, Chen Fan wrote:
>> On 01/06/2016 03:58 AM, Alex Williamson wrote:
>>> On Tue, 2016-01-05 at 09:20 +0800, Cao jin wrote:
>>>> From: Chen Fan <chen.fan.fnst@cn.fujitsu.com>
>>>>
>>>> mark the host bus be in reset. avoid multiple devices trigger the
>>>> host bus reset many times.
>>>>
>>>> Signed-off-by: Chen Fan <chen.fan.fnst@cn.fujitsu.com>
>>>> ---
>>>>    hw/vfio/pci.c                 | 6 ++++++
>>>>    include/hw/vfio/vfio-common.h | 1 +
>>>>    2 files changed, 7 insertions(+)
>>>>
>>>> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
>>>> index ee88db3..aa0d945 100644
>>>> --- a/hw/vfio/pci.c
>>>> +++ b/hw/vfio/pci.c
>>>> @@ -2249,6 +2249,11 @@ static int
>>>> vfio_pci_hot_reset(VFIOPCIDevice
>>>> *vdev, bool single)
>>>>    
>>>>        trace_vfio_pci_hot_reset(vdev->vbasedev.name, single ?
>>>> "one" :
>>>> "multi");
>>>>    
>>>> +    if (vdev->vbasedev.bus_in_reset) {
>>>> +        vdev->vbasedev.bus_in_reset = false;
>>>> +        return 0;
>>>> +    }
>>>> +
>>>>        vfio_pci_pre_reset(vdev);
>>>>        vdev->vbasedev.needs_reset = false;
>>>>    
>>>> @@ -2312,6 +2317,7 @@ static int vfio_pci_hot_reset(VFIOPCIDevice
>>>> *vdev, bool single)
>>>>                    }
>>>>                    vfio_pci_pre_reset(tmp);
>>>>                    tmp->vbasedev.needs_reset = false;
>>>> +                tmp->vbasedev.bus_in_reset = true;
>>>>                    multi = true;
>>>>                    break;
>>>>                }
>>>> diff --git a/include/hw/vfio/vfio-common.h
>>>> b/include/hw/vfio/vfio-
>>>> common.h
>>>> index f037f3c..44b19d7 100644
>>>> --- a/include/hw/vfio/vfio-common.h
>>>> +++ b/include/hw/vfio/vfio-common.h
>>>> @@ -95,6 +95,7 @@ typedef struct VFIODevice {
>>>>        bool reset_works;
>>>>        bool needs_reset;
>>>>        bool no_mmap;
>>>> +    bool bus_in_reset;
>>>>        VFIODeviceOps *ops;
>>>>        unsigned int num_irqs;
>>>>        unsigned int num_regions;
>>> I imagine this should be a VFIOPCIDevice field, it has no use in
>>> the
>>> common code.  The name is also a bit confusing; when I suggested a
>>> bus_in_reset flag, I was referring to a property on the bus itself
>>> that
>>> the existing device_reset could query to switch modes rather than
>>> add a
>>> separate callback as you've done in this series.  This works, but
>>> it's
>>> perhaps more intrusive than I was thinking.  It will need to get
>>> approval by qdev folks.
>> maybe I don't get your point. I just think add a bus_in_reset flag in
>> bus
>> has no much sense. for instance, if assigning device A and B from
>> different host bus into a same virtual bus. assume all check passed.
>> then if device A aer occurs. we should reset virtual bus to recover
>> the device A, we also need to reset the device B and do device B host
>> bus reset. but here the bus_in_reset just denote the device B not
>> need
>> to do host bus reset, it's incorrect. right?
> Let's take an example of the state of this flag on the device to see
> why the name doesn't make sense to me.  We have a dual port card with
> devices A and B, both on the same host bus.  We start a hot reset on A
> and we have the following states:
>
> A.bus_in_reset = false
> B.bus_in_reset = false
>
> Well, that's not accurate.  As we complete the hot reset we tag device
> B as already being reset:
>
> A.bus_in_reset = false
> B.bus_in_reset = true
>
> That's not accurate either, they're both in the same bus hierarchy,
> they should not be different.  Later hot reset is called on B and we're
> back to:
>
> A.bus_in_reset = false
> B.bus_in_reset = false
>
> So I agree with your algorithm, but the variable is not tracking the
> state of the bus being in reset, it's tracking whether to skip the next
> reset call.
>
> The separate hot (bus) reset in DeviceClass seems unnecessary too, this
> is where I think we could work entirely within the PCI code w/o new
> qbus/qdev interfaces.  Imagine pci_bridge_write_config()
> calls qbus_walk_children() directly instead of calling
> qbus_reset_all().  The pre_busfn() could set a flag on the PCIBus to
> indicate the bus is in reset.  qdev_reset_one() could be used as the
> post_devfn() and the post_busfn() would call qdev_reset_one() followed
> by a clear of the flag.  The modification to vfio is then simply that
> if we're resetting an AER device and the bus is in reset, we know we
> can do a hot reset.  It simply allows us to test which reset type is
> occurring within the existing reset callback rather than adding a new
> one.
>
> Going back to my idea of a sequence ID, if we had:
>
> bool PCIBus.bus_in_reset
> uint PCIBus.bus_reset_seqid
>
> The pre_busfn would do:
>
> PCIBus.bus_in_reset = true
> PCIBus.bus_reset_seqid++
>
> Then we could add:
>
> uint VFIOPCIDevice.last_bus_reset_seqid
>
> vfio_pci_reset() would test (PCIBus.bus_in_reset && VFIOPCIDevice.AER)
> to know whether to do a hot reset.  vfio_pci_hot_reset() would skip
> devices for which (VFIOPCIDevice.last_bus_reset_seqid ==
> PCIBus.bus_reset_seqid) and for each device reset would set
> VFIOPCIDevice.last_bus_reset_seqid = PCIBus.bus_reset_seqid.
>
> That feels like a much more deterministic solution if MST is willing to
> support it in the PCI specific BusState.  Thanks,

Thanks for your suggestion. I will send out a new version soon.

Chen

>
> Alex
>
>
> .
>

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2016-01-12  1:18 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-01-05  1:20 [Qemu-devel] [v15 00/15] vfio-pci: pass the aer error to guest Cao jin
2016-01-05  1:20 ` [Qemu-devel] [v15 01/15] vfio: extract vfio_get_hot_reset_info as a single function Cao jin
2016-01-05  1:20 ` [Qemu-devel] [v15 02/15] vfio: squeeze out vfio_pci_do_hot_reset for support bus reset Cao jin
2016-01-05  1:20 ` [Qemu-devel] [v15 03/15] pcie: modify the capability size assert Cao jin
2016-01-05  1:20 ` [Qemu-devel] [v15 04/15] vfio: make the 4 bytes aligned for capability size Cao jin
2016-01-05  1:20 ` [Qemu-devel] [v15 05/15] vfio: add pcie extanded capability support Cao jin
2016-01-05  1:20 ` [Qemu-devel] [v15 06/15] aer: impove pcie_aer_init to support vfio device Cao jin
2016-01-05  1:20 ` [Qemu-devel] [v15 07/15] vfio: add aer support for " Cao jin
2016-01-05  1:20 ` [Qemu-devel] [v15 08/15] vfio: add check host bus reset is support or not Cao jin
2016-01-05  1:20 ` [Qemu-devel] [v15 09/15] add check reset mechanism when hotplug vfio device Cao jin
2016-01-05  1:20 ` [Qemu-devel] [v15 10/15] pci: Introduce device hot reset Cao jin
2016-01-05  1:20 ` [Qemu-devel] [v15 11/15] vfio: add hot reset callback Cao jin
2016-01-05  1:20 ` [Qemu-devel] [v15 12/15] vfio: add bus in reset flag Cao jin
2016-01-05 19:58   ` Alex Williamson
2016-01-06  2:13     ` Chen Fan
2016-01-06 16:44       ` Alex Williamson
2016-01-12  1:13         ` Chen Fan
2016-01-05  1:20 ` [Qemu-devel] [v15 13/15] pcie_aer: expose pcie_aer_msg() interface Cao jin
2016-01-05  1:20 ` [Qemu-devel] [v15 14/15] vfio-pci: pass the aer error to guest Cao jin
2016-01-05  1:20 ` [Qemu-devel] [v15 15/15] vfio: add 'aer' property to expose aercap Cao jin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).